Access secrets securely

DataPull can read secrets (database credentials, access keys, client certificates, etc.) securely, without exposing them in plain text in the input JSON, provided the secrets are stored in a secret store that accepts AWS IAM principals for authentication.

Client Certificates

Client certificates can be stored as JKS files in an SSE-S3 encrypted S3 bucket, and DataPull can access them if DataPull is installed in AWS.

Store certificates in S3 bucket

Store the client certificates as JKS files in an S3 bucket that is accessible to the emr_ec2_datapull_role IAM role (or your custom EMR EC2 role) that was created while installing DataPull on AWS. We recommend storing the JKS files in an SSE-S3 encrypted S3 bucket, and using an EMR Security Configuration that has access to that bucket.
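As a minimal sketch of the upload step, the Python helper below asks S3 to apply SSE-S3 (AES256) encryption when the JKS file is stored. It is not part of DataPull; the function name is hypothetical, the bucket and key names are placeholders, and it assumes you pass in a boto3 S3 client whose credentials are allowed to write to the bucket.

```python
def upload_jks_with_sse_s3(s3_client, local_path, bucket, key):
    """Upload a JKS file and request SSE-S3 (AES256) encryption at rest.

    s3_client is expected to be a boto3 S3 client (assumption: boto3 is
    installed and credentials with write access to the bucket are configured).
    Returns the s3:// URI of the uploaded object.
    """
    s3_client.upload_file(
        local_path,
        bucket,
        key,
        # Ask S3 to encrypt the object with S3-managed keys (SSE-S3).
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )
    return f"s3://{bucket}/{key}"


# Usage (placeholder names, requires boto3 and AWS credentials):
#   import boto3
#   upload_jks_with_sse_s3(
#       boto3.client("s3"),
#       "some_keystore_file.jks",
#       "some_bucket_name",
#       "some_path/some_keystore_file.jks",
#   )
```

The returned s3:// URI is the kind of value that goes into the jksfiles array of the input JSON.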

Copy certificates to EMR using jksfiles input json array

Most Spark connectors that use SSL client authentication require the JKS certificate files to be available on all the executors of the EMR Spark cluster. DataPull makes this possible through the jksfiles array attribute in the input JSON. For example, if a Kafka topic requires JKS certificate files for authentication, they can be specified as shown in the example below.

{
...
  "destination": {
    "platform": "kafka",
    "bootstrapservers": "broker_server_host_name:broker_port",
    "schemaregistries": "schema_registry_url:schema_registry_port",
    "topic": "some_topic_name",
    "keystorepath": "/mnt/bootstrapfiles/some_keystore_file.jks",
    "truststorepath": "/mnt/bootstrapfiles/some_truststore_file.jks",
    "keystorepassword": "inlinesecret{{\"secretstore\": \"aws_secrets_manager\", \"secretname\": \"some_secret_path\", \"secretkeyname\": \"some_secret_key_name\"}}",
    "truststorepassword": "inlinesecret{{\"secretstore\": \"aws_secrets_manager\", \"secretname\": \"some_secret_path\", \"secretkeyname\": \"some_secret_key_name\"}}",
    "keypassword": "inlinesecret{{\"secretstore\": \"aws_secrets_manager\", \"secretname\": \"some_secret_path\", \"secretkeyname\": \"some_secret_key_name\"}}",
    "jksfiles": [
      "s3://some_bucket_name/some_path/some_keystore_file.jks",
      "s3://some_bucket_name/some_path/some_truststore_file.jks"
    ]
  }
...        
}
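In the example above, each keystorepath/truststorepath value uses the file name of an entry in jksfiles, placed under /mnt/bootstrapfiles/. Assuming that convention holds (it is inferred from the example, not stated as a rule), a small hypothetical Python helper can derive the local path from the S3 URI so the two values stay in sync.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: taken from the paths in the example input JSON.
BOOTSTRAP_DIR = "/mnt/bootstrapfiles"


def local_jks_path(s3_uri, bootstrap_dir=BOOTSTRAP_DIR):
    """Map an s3:// URI listed in jksfiles to the local path used in the example."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    file_name = posixpath.basename(parsed.path)
    if not file_name:
        raise ValueError(f"no file name in {s3_uri!r}")
    return posixpath.join(bootstrap_dir, file_name)


jks_files = [
    "s3://some_bucket_name/some_path/some_keystore_file.jks",
    "s3://some_bucket_name/some_path/some_truststore_file.jks",
]
for uri in jks_files:
    print(uri, "->", local_jks_path(uri))
```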

Other secrets

Retrieve secrets from AWS Secrets Manager

First, store the secret under a secret name/path in AWS Secrets Manager that is accessible to DataPull's emr_ec2_datapull_role IAM role (or your custom EMR EC2 role) that was created while installing DataPull on AWS. Secrets in AWS Secrets Manager are usually stored either as a JSON object or as a binary key/value array. In such cases, specify the key of the value you want within the secret object (the secretkeyname attribute in the example above).
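To check that a secret is laid out as expected before referencing it, the lookup of one key inside a secret can be sketched in Python as follows. This is not DataPull's implementation; the function name is hypothetical, and it assumes a boto3 Secrets Manager client and a secret whose payload (string or binary) is a UTF-8 JSON object.

```python
import json


def read_secret_key(secrets_client, secret_name, secret_key_name):
    """Fetch a secret and return the value stored under one of its keys.

    secrets_client is expected to be a boto3 Secrets Manager client
    (assumption). The secret payload is assumed to be a JSON object,
    stored either as SecretString or as UTF-8 encoded SecretBinary.
    """
    response = secrets_client.get_secret_value(SecretId=secret_name)
    raw = response.get("SecretString")
    if raw is None:
        # Binary secrets are returned as bytes.
        raw = response["SecretBinary"].decode("utf-8")
    return json.loads(raw)[secret_key_name]


# Usage (placeholder names, requires boto3 and AWS credentials):
#   import boto3
#   read_secret_key(
#       boto3.client("secretsmanager"),
#       "some_secret_path",
#       "some_secret_key_name",
#   )
```

Running this under the same IAM role that DataPull uses is one way to confirm that the role can read the secret.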

Next, use the inlinesecret{{}} syntax anywhere in the input JSON where you want the secret to be substituted. Examples of this syntax can be found in the partial input JSON above.
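Because the secret reference is itself JSON embedded in a JSON string, escaping the inner quotes by hand is error-prone. The sketch below builds the reference in Python and lets json.dumps handle the escaping. The helper name inline_secret is hypothetical; the attribute names and the aws_secrets_manager value are copied from the example above.

```python
import json


def inline_secret(secret_name, secret_key_name, secret_store="aws_secrets_manager"):
    """Build an inlinesecret{{...}} reference string for the input JSON."""
    reference = {
        "secretstore": secret_store,
        "secretname": secret_name,
        "secretkeyname": secret_key_name,
    }
    # json.dumps supplies the inner braces; one more pair is wrapped around it.
    return "inlinesecret{" + json.dumps(reference) + "}"


# Partial destination block; the remaining attributes are omitted here.
destination = {
    "platform": "kafka",
    "keystorepassword": inline_secret("some_secret_path", "some_secret_key_name"),
}

# Serialising the whole document escapes the inner quotes automatically.
print(json.dumps({"destination": destination}, indent=2))
```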