AWS OpenSearch Serverless - Bulk API - "Document ID is not supported in create/index operation request"

There seems to be conflicting documentation about OpenSearch's bulk API, specifically about the "Create" and "Index" actions.

Error

If I run a simple bulk request with the following body:

{"index": {"_id": "63a3b4074244c5b760010f1f", "_index": "index-client"}}
{"message": "My log message"}

I get the following error in the response:

{
  'items': [
    {
      'index': {
        'status': 400,
        '_id': '63a3b4074244c5b760010f1f',
        'error': {
          'reason': 'Document ID is not supported in create/index operation request',
          'type': 'illegal_argument_exception'
        },
        '_index': 'index-client'
      }
    }
  ],
  'errors': True,
  'took': 0
}
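
Because the failure is reported per item, with only "errors" set at the top level, it can help to pull the failed items out of the response explicitly. The following is a minimal sketch, assuming the response has already been parsed into a Python dict of the shape shown above; failed_bulk_items is a hypothetical helper name, not part of any client library.

```python
def failed_bulk_items(response):
    """Collect (action, _id, reason) for each failed item in a parsed bulk response."""
    failures = []
    if not response.get("errors"):
        return failures  # top-level flag says every item succeeded
    for item in response.get("items", []):
        # Each item is a one-key dict keyed by the action name, e.g. {"index": {...}}.
        for action, result in item.items():
            error = result.get("error")
            if error:
                failures.append((action, result.get("_id"), error.get("reason")))
    return failures


# The response from the question, written as a Python dict.
response = {
    "items": [
        {
            "index": {
                "status": 400,
                "_id": "63a3b4074244c5b760010f1f",
                "error": {
                    "reason": "Document ID is not supported in create/index operation request",
                    "type": "illegal_argument_exception",
                },
                "_index": "index-client",
            }
        }
    ],
    "errors": True,
    "took": 0,
}

for action, doc_id, reason in failed_bulk_items(response):
    print(f"{action} {doc_id}: {reason}")
```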

Question

Everything I can find (see docs below for reference) suggests that the Bulk API for the supported 2.0.x version of OpenSearch looks like the following:

POST _bulk
{ "delete": { "_index": "movies", "_id": "tt2229499" } }
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }

with the Newline Delimited JSON (see https://www.ndjson.org) format separating actions and documents. If we focus on the "Create" and "Index" actions, these allow you to specify an "_id" field in the action. For "Index" actions, this will: "create a document if it doesn’t yet exist and replace the document if it already exists."
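
For reference, this is roughly how such a newline-delimited body can be assembled in code. It is a sketch only: build_bulk_body and its tuple layout are hypothetical names chosen for illustration, and the index and IDs are the ones from the documentation example above.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, document) tuples into an NDJSON bulk body.

    document is None for actions that carry no document line, such as "delete".
    """
    lines = []
    for action, metadata, document in operations:
        lines.append(json.dumps({action: metadata}))  # action line
        if document is not None:
            lines.append(json.dumps(document))  # document line
    # The bulk API expects the body to end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("delete", {"_index": "movies", "_id": "tt2229499"}, None),
    ("index", {"_index": "movies", "_id": "tt1979320"}, {"title": "Rush", "year": 2013}),
    ("update", {"_index": "movies", "_id": "tt0816711"}, {"doc": {"title": "World War Z"}}),
])
print(body, end="")
```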

So, how is the "Index" action supposed to work if the action can't take an "_id" to correlate it with an existing document?

You might think that maybe it pulls the "_id" field from the document itself instead of from the action, but this will lead to the following error:

Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters.

Googling "Document ID is not supported in create/index operation request" reveals nothing. I'm at a loss for how to do Bulk Index operations for AWS OpenSearch Serverless.

Documentation

OpenSearch docs

(Version 2.0 because "Serverless collections currently run OpenSearch version 2.0.x.")

AWS docs

Elastic docs

  • I have switched to a domain-based OpenSearch deployment and the problem goes away. It seems to be caused by the Serverless service not adhering to the OpenSearch bulk API.

1 Answer

Amazon OpenSearch Serverless supports ingestion with a document "_id" only if the collection type is search. Ingesting documents with an ID is not supported for time-series collections.

Can you confirm what kind of collection you are creating? This is documented in the AWS docs under aoss:WriteDocument here: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-genref.html#serverless-operations, where the operation is listed as PUT <index>/_create/<id> (for search collection types only).
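
As a rough illustration of that restriction, a client could leave "_id" out of the action line unless it is writing to a search collection. This is a sketch under assumptions: bulk_index_lines is a hypothetical helper, and the "search" and "timeseries" strings are illustrative labels rather than values taken from an AWS API.

```python
import json


def bulk_index_lines(index, document, doc_id=None, collection_type="search"):
    """Return the two NDJSON lines (action, document) for one bulk "index" operation.

    collection_type is an illustrative label ("search" or "timeseries").
    """
    metadata = {"_index": index}
    # Only send a caller-supplied ID when the collection type is search.
    if doc_id is not None and collection_type == "search":
        metadata["_id"] = doc_id
    return [json.dumps({"index": metadata}), json.dumps(document)]


doc = {"message": "My log message"}
search_lines = bulk_index_lines(
    "index-client", doc, doc_id="63a3b4074244c5b760010f1f", collection_type="search"
)
timeseries_lines = bulk_index_lines(
    "index-client", doc, doc_id="63a3b4074244c5b760010f1f", collection_type="timeseries"
)
print("\n".join(search_lines))
print("\n".join(timeseries_lines))
```

Dropping the ID means the caller no longer controls which existing document an "index" action replaces, so this avoids the error rather than reproducing the same behaviour.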

Let me know if this helps.

AWS
answered a year ago
  • I believe you are correct - I was using a time-series collection. I didn't see that comment in the documentation, nor did I see the note in the collection differences here: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html#serverless-usecase. Thank you.

  • What would be the recommended practice to ensure idempotency of records being batched into an index?

  • I am new to AOSS. I have a Serverless vector search OpenSearch collection, and I am trying to use LangChain to add embeddings to it.

    My code -

    def opensearchvector(docs, embedding_function, opensearch_url, awsauth, index_name):
        return OpenSearchVectorSearch.from_documents(
            docs,
            embedding_function,
            opensearch_url=opensearch_url,
            http_auth=awsauth,
            use_ssl=True,
            verify_certs=True,
            http_compress=True,
            connection_class=RequestsHttpConnection,
            index_name=index_name,
            engine="faiss",
        )

    This worked for me a few days ago, but recently it has started giving me an illegal_argument_exception: ('1 document(s) failed to index.', [{'index': {'_index': 'json-test-index', '_id': 'cec36b7c-7140-45cb-aaa0-ddede89ed5af', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Document ID is not supported in create/index operation request'}. I am not adding any indexes to my documents, but I still get this error.

  • Hi, any updates on this? I am running into the exact same issue as @Ayush Bhura.
