DynamoDB table clone into OpenSearch

I am looking to create a copy of a DynamoDB table in OpenSearch (so important requests can be sent to our DB and queries can be sent to the DB copy in OpenSearch). I am doing this in Terraform. Do I not need to do an initial snapshot of the database? I thought the process was to take an initial snapshot of the DB, which is then stored in S3, and any changes to the DB are then streamed to the OpenSearch domain. Or will the ingestion pipeline (Amazon DynamoDB zero-ETL integration) do this?

1 Answer

Hi

There are two ways that you can use DynamoDB as a source to process data using the ingestion pipeline—with and without a full initial snapshot.

A full initial snapshot is a backup of a table. DynamoDB uploads this snapshot to Amazon S3. From there, an OpenSearch Ingestion pipeline sends it to one index in a domain, or partitions it to multiple indexes in a domain. To keep the data in DynamoDB and OpenSearch consistent, the pipeline syncs all of the create, update, and delete events in the DynamoDB table with the documents saved in the OpenSearch index or indexes.

When you use a full initial snapshot, your OpenSearch Ingestion pipeline first ingests the snapshot and then starts reading data from DynamoDB Streams. It eventually catches up and maintains near real-time data consistency between DynamoDB and OpenSearch. When you choose this option, you must enable both PITR and DynamoDB streams on your table.
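
To illustrate, here is a minimal Terraform sketch of a table with both of these enabled, assuming a simple table managed by the aws_dynamodb_table resource (the table name and key are placeholders, not taken from your setup):

resource "aws_dynamodb_table" "example" {
  name         = "MyTable" # placeholder table name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "id"      # placeholder partition key

  attribute {
    name = "id"
    type = "S"
  }

  # PITR is required for the full initial snapshot (export) option
  point_in_time_recovery {
    enabled = true
  }

  # Streams are required so the pipeline can sync changes after the snapshot
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"
}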

Below is the blueprint used by default when you create zero-ETL integrations from the console, which uses the full initial snapshot.

So in your case, you will have to use Terraform to create the OpenSearch Ingestion pipeline with this blueprint (a Terraform sketch follows the blueprint below):

version: "2"
dynamodb-pipeline:
  source:
    dynamodb:
      acknowledgments: true
      tables:
        # REQUIRED: Supply the DynamoDB table ARN and whether export or stream processing is needed, or both
        - table_arn: "<<arn:aws:dynamodb:us-east-1:123456789012:table/MyTable>>"
          # Remove the stream block if only export is needed
          stream:
            start_position: "LATEST"
          # Remove the export block if only stream is needed
          export:
            # REQUIRED for export: Specify the name of an existing S3 bucket for DynamoDB to write export data files to
            s3_bucket: "<<my-bucket>>"
            # Specify the region of the S3 bucket
            s3_region: "<<us-east-1>>"
            # Optionally set the name of a prefix that DynamoDB export data files are written to in the bucket.
            s3_prefix: "ddb-to-opensearch-export/"
      aws:
        # REQUIRED: Provide the role to assume that has the necessary permissions to DynamoDB, OpenSearch, and S3.
        sts_role_arn: "<<arn:aws:iam::123456789012:role/Example-Role>>"
        # Provide the region to use for aws credentials
        region: "<<us-east-1>>"
  sink:
    - opensearch:
        # REQUIRED: Provide an AWS OpenSearch endpoint
        hosts: [ "<<https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com>>" ]
        index: "table-index"
        index_type: custom
        document_id: "${getMetadata(\"primary_key\")}"
        action: "${getMetadata(\"opensearch_action\")}"
        document_version: "${getMetadata(\"document_version\")}"
        document_version_type: "external"
        aws:
          # REQUIRED: Provide a Role ARN with access to the domain. This role should have a trust relationship with osis-pipelines.amazonaws.com
          sts_role_arn: "<<arn:aws:iam::123456789012:role/Example-Role>>"
          # Provide the region of the domain.
          region: "<<us-east-1>>"
          # Enable the 'serverless' flag if the sink is an Amazon OpenSearch Serverless collection
          serverless: false
          # serverless_options:
            # Specify a name here to create or update network policy for the serverless collection
            # network_policy_name: "network-policy-name"
        # Optional: Enable the S3 DLQ to capture any failed requests in an S3 bucket. Delete this entire block if you don't want a DLQ. A DLQ is recommended as a best practice for all pipelines.
        dlq:
          s3:
            # Provide an S3 bucket
            bucket: "<<your-dlq-bucket-name>>"
            # Provide a key path prefix for the failed requests
            # key_path_prefix: "dynamodb-pipeline/dlq"
            # Provide the region of the bucket.
            region: "<<us-east-1>>"
            # Provide a Role ARN with access to the bucket. This role should have a trust relationship with osis-pipelines.amazonaws.com
            sts_role_arn: "<<arn:aws:iam::123456789012:role/Example-Role>>"
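
If you are creating the pipeline itself with Terraform rather than the console, a minimal sketch could look like the one below, assuming an AWS provider version that includes the aws_osis_pipeline resource and that you save the blueprint above (with your ARNs, bucket, and domain endpoint filled in) as dynamodb-pipeline.yaml next to your Terraform files; the pipeline name and capacity values are placeholders:

resource "aws_osis_pipeline" "ddb_to_opensearch" {
  pipeline_name = "ddb-to-opensearch" # placeholder name
  min_units     = 1                   # minimum Ingestion OCUs
  max_units     = 4                   # maximum Ingestion OCUs

  # The blueprint above, stored alongside this Terraform configuration
  pipeline_configuration_body = file("${path.module}/dynamodb-pipeline.yaml")
}

Once the pipeline is active, the export block means it handles the initial export to S3 and then keeps the index in sync from the stream, as described above, so you should not need to run a separate manual export.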

Some references:
https://aws.amazon.com/blogs/aws/amazon-dynamodb-zero-etl-integration-with-amazon-opensearch-service-is-now-generally-available/
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/configure-client-ddb.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/OpenSearchIngestionForDynamoDB.html

AWS
EXPERT
answered 2 months ago
  • Great, thanks for this. I have created an ingestion pipeline which is active, so I don't need to separately/manually do an initial export of my DynamoDB table, as the ingestion pipeline will do this? Also, how can I test if my pipeline is working as expected? When I click on the Dashboard URL for my pipeline I get a timeout error. Thanks,

  • Hi @Will

    I missed the notification for this comment. Are you still facing the timeout error?
