Hi
There are two ways that you can use DynamoDB as a source to process data with an OpenSearch Ingestion pipeline: with or without a full initial snapshot.
A full initial snapshot is a backup of the table, which DynamoDB uploads to Amazon S3. From there, an OpenSearch Ingestion pipeline sends it to one index in a domain, or partitions it across multiple indexes in a domain. To keep the data in DynamoDB and OpenSearch consistent, the pipeline then syncs all of the create, update, and delete events on the DynamoDB table with the documents saved in the OpenSearch index or indexes.
When you use a full initial snapshot, your OpenSearch Ingestion pipeline first ingests the snapshot and then starts reading data from DynamoDB Streams. It eventually catches up and maintains near real-time data consistency between DynamoDB and OpenSearch. When you choose this option, you must enable both point-in-time recovery (PITR) and DynamoDB Streams on your table.
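If you also manage the table with Terraform, enabling both prerequisites could look like the following sketch. The table name, hash key, and billing mode here are placeholder assumptions; only the stream and PITR settings are the point:

# Hypothetical table definition; only stream_enabled, stream_view_type,
# and point_in_time_recovery matter for the zero-ETL prerequisites.
resource "aws_dynamodb_table" "my_table" {
  name         = "MyTable"        # placeholder
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "pk"             # placeholder

  attribute {
    name = "pk"
    type = "S"
  }

  # DynamoDB Streams: required for the pipeline's change-data-capture phase
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  # PITR: required for the full initial snapshot (export to S3)
  point_in_time_recovery {
    enabled = true
  }
}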
Below is the blueprint used by default when you create a zero-ETL integration from the console, which is the one with the full initial snapshot.
So in your case, you will have to use Terraform to create the OpenSearch Ingestion pipeline with this blueprint (a Terraform sketch follows it):
version: "2"
dynamodb-pipeline:
  source:
    dynamodb:
      acknowledgments: true
      tables:
        # REQUIRED: Supply the DynamoDB table ARN and whether export or stream processing is needed, or both
        - table_arn: "<<arn:aws:dynamodb:us-east-1:123456789012:table/MyTable>>"
          # Remove the stream block if only export is needed
          stream:
            start_position: "LATEST"
          # Remove the export block if only stream is needed
          export:
            # REQUIRED for export: Specify the name of an existing S3 bucket for DynamoDB to write export data files to
            s3_bucket: "<<my-bucket>>"
            # Specify the region of the S3 bucket
            s3_region: "<<us-east-1>>"
            # Optionally set the name of a prefix that DynamoDB export data files are written to in the bucket.
            s3_prefix: "ddb-to-opensearch-export/"
      aws:
        # REQUIRED: Provide the role to assume that has the necessary permissions to DynamoDB, OpenSearch, and S3.
        sts_role_arn: "<<arn:aws:iam::123456789012:role/Example-Role>>"
        # Provide the region to use for aws credentials
        region: "<<us-east-1>>"
  sink:
    - opensearch:
        # REQUIRED: Provide an AWS OpenSearch endpoint
        hosts: [ "<<https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com>>" ]
        index: "table-index"
        index_type: custom
        document_id: "${getMetadata(\"primary_key\")}"
        action: "${getMetadata(\"opensearch_action\")}"
        document_version: "${getMetadata(\"document_version\")}"
        document_version_type: "external"
        aws:
          # REQUIRED: Provide a Role ARN with access to the domain. This role should have a trust relationship with osis-pipelines.amazonaws.com
          sts_role_arn: "<<arn:aws:iam::123456789012:role/Example-Role>>"
          # Provide the region of the domain.
          region: "<<us-east-1>>"
          # Enable the 'serverless' flag if the sink is an Amazon OpenSearch Serverless collection
          serverless: false
          # serverless_options:
            # Specify a name here to create or update network policy for the serverless collection
            # network_policy_name: "network-policy-name"
        # Optional: Enable the S3 DLQ to capture any failed requests in an S3 bucket. Delete this entire block if you don't want a DLQ. This is recommended as a best practice for all pipelines.
        dlq:
          s3:
            # Provide an S3 bucket
            bucket: "<<your-dlq-bucket-name>>"
            # Provide a key path prefix for the failed requests
            # key_path_prefix: "dynamodb-pipeline/dlq"
            # Provide the region of the bucket.
            region: "<<us-east-1>>"
            # Provide a Role ARN with access to the bucket. This role should have a trust relationship with osis-pipelines.amazonaws.com
            sts_role_arn: "<<arn:aws:iam::123456789012:role/Example-Role>>"
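To create the pipeline itself from Terraform, the AWS provider has an aws_osis_pipeline resource. Here is a minimal sketch, assuming the blueprint above is saved as pipeline.yaml (with the <<placeholders>> replaced by your real ARNs, bucket, and endpoint); the pipeline name and OCU limits are illustrative values:

# Minimal sketch of an OpenSearch Ingestion pipeline in Terraform.
resource "aws_osis_pipeline" "dynamodb_to_opensearch" {
  pipeline_name = "dynamodb-pipeline"   # illustrative name

  # OpenSearch Compute Units (OCUs); tune for your table's throughput
  min_units = 1
  max_units = 4

  # Same YAML body the console blueprint uses, pasted unchanged
  pipeline_configuration_body = file("${path.module}/pipeline.yaml")
}

Because pipeline_configuration_body takes the same YAML as the console, you can develop the blueprint in the console first and then move it into Terraform as-is.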
Some references:
- https://aws.amazon.com/blogs/aws/amazon-dynamodb-zero-etl-integration-with-amazon-opensearch-service-is-now-generally-available/
- https://docs.aws.amazon.com/opensearch-service/latest/developerguide/configure-client-ddb.html
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/OpenSearchIngestionForDynamoDB.html
Great, thanks for this. I have created an ingestion pipeline, which is active, so I don't need to separately/manually do an initial export of my DynamoDB table, as the ingestion pipeline will do this? Also, how can I test whether my pipeline is working as expected? When I click on the Dashboard URL for my pipeline, I get a timeout error. Thanks,
Hi @Will
I missed the notification for this comment. Are you still facing the timeout error?