AWS OpenSearch Serverless - Bulk API - "Document ID is not supported in create/index operation request"


There seems to be conflicting documentation on OpenSearch's bulk API, specifically regarding the "Create" and "Index" actions.

Error

If I run a simple bulk request with the following body:

{"index": {"_id": "63a3b4074244c5b760010f1f", "_index": "index-client"}}
{"message": "My log message"}

I get the following error in the response:

{
  'items': [
    {
      'index': {
        'status': 400,
        '_id': '63a3b4074244c5b760010f1f',
        'error': {
          'reason': 'Document ID is not supported in create/index operation request',
          'type': 'illegal_argument_exception'
        },
        '_index': 'index-client'
      }
    }
  ],
  'errors': True,
  'took': 0
}
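
As the response above shows, each entry under 'items' carries its own status and error, so failures have to be collected item by item. A minimal sketch in Python, assuming the response has already been parsed into a dict shaped like the one above; the helper name `failed_items` is made up for illustration and is not part of any client library:

```python
def failed_items(response):
    """Return (action, index, id, reason) for every bulk item that failed."""
    failures = []
    for item in response.get("items", []):
        # Each item is a one-key dict such as {"index": {...}} or {"create": {...}}.
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (
                        action,
                        result.get("_index"),
                        result.get("_id"),
                        result["error"].get("reason"),
                    )
                )
    return failures


# The response from the question, written as a Python dict.
response = {
    "items": [
        {
            "index": {
                "status": 400,
                "_id": "63a3b4074244c5b760010f1f",
                "error": {
                    "reason": "Document ID is not supported in create/index operation request",
                    "type": "illegal_argument_exception",
                },
                "_index": "index-client",
            }
        }
    ],
    "errors": True,
    "took": 0,
}

for failure in failed_items(response):
    print(failure)
```

Checking the top-level 'errors' flag first would avoid the loop entirely when nothing failed.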

Question

Everything I can find (see docs below for reference) suggests that the Bulk API for the supported 2.0.x version of OpenSearch looks like the following:

POST _bulk
{ "delete": { "_index": "movies", "_id": "tt2229499" } }
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }

with the Newline Delimited JSON (see https://www.ndjson.org) format separating actions and documents. If we focus on the "Create" and "Index" actions, these allow you to specify an "_id" field in the action. For "Index" actions, this will: "create a document if it doesn’t yet exist and replace the document if it already exists."
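
To make the format concrete, here is a rough sketch of how such a body could be assembled in Python using only the standard library. The function name `to_bulk_body` and the tuple layout are assumptions chosen for illustration, not part of any OpenSearch client:

```python
import json


def to_bulk_body(operations):
    """Serialize (action, metadata, document) tuples as an NDJSON bulk body."""
    lines = []
    for action, metadata, document in operations:
        # The action line, e.g. {"index": {"_index": "movies", "_id": "tt1979320"}}
        lines.append(json.dumps({action: metadata}))
        # "delete" has no document line; the other actions are followed by one.
        if document is not None:
            lines.append(json.dumps(document))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


operations = [
    ("delete", {"_index": "movies", "_id": "tt2229499"}, None),
    ("index", {"_index": "movies", "_id": "tt1979320"}, {"title": "Rush", "year": 2013}),
]

body = to_bulk_body(operations)
print(body, end="")
```

Each action and each document ends up on its own line, which is what distinguishes NDJSON from a single JSON array.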

So, how is the "Index" action supposed to work if the action can't take an "_id" to correlate it with an existing document?

You might think that maybe it pulls the "_id" field from the document itself instead of from the action, but this will lead to the following error:

Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters.

Googling "Document ID is not supported in create/index operation request" reveals nothing. I'm at a loss for how to do Bulk Index operations for AWS OpenSearch Serverless.

Documentation

OpenSearch docs

(Version 2.0 because "Serverless collections currently run OpenSearch version 2.0.x.")

AWS docs

Elastic docs

  • I have switched to a domain-based OpenSearch deployment and the problem goes away. It seems to be caused by the Serverless service not adhering to the OpenSearch bulk API.

Kent
asked a year ago, 2826 views
1 Answer

Amazon OpenSearch Serverless supports ingestion with a document ID only if the collection type is search. Ingestion with an ID is not supported for time-series collections.

Can you confirm what kind of collection you are creating? This is also documented in the AWS docs under aoss:WriteDocument here: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-genref.html#serverless-operations, which lists PUT <index>/_create/<id> (for search collection types only).

Let me know if this helps.
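
As a rough illustration of that distinction, the sketch below builds "index" actions that carry an "_id" only for search collections and omit it otherwise. The function name `bulk_index_body` and the `collection_type` labels are hypothetical and supplied by the caller. The sketch also assumes the service assigns its own ID when none is sent, as standard OpenSearch does, which the thread does not confirm for Serverless:

```python
import json


def bulk_index_body(index, docs, collection_type):
    """Build an NDJSON bulk body of "index" actions.

    docs is an iterable of (doc_id, source) pairs. collection_type is a
    hypothetical label chosen by the caller: "search" or "timeseries".
    """
    lines = []
    for doc_id, source in docs:
        metadata = {"_index": index}
        # Per the answer above, only search collections accept a
        # caller-supplied document ID, so it is dropped for anything else.
        if collection_type == "search" and doc_id is not None:
            metadata["_id"] = doc_id
        lines.append(json.dumps({"index": metadata}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


docs = [("63a3b4074244c5b760010f1f", {"message": "My log message"})]
print(bulk_index_body("index-client", docs, "timeseries"), end="")
```

Without an "_id" the "index" action can only add documents, not replace existing ones, so this trades away the upsert behavior described in the question.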

AWS
answered a year ago
  • I believe you are correct - I was using a time-series collection. I didn't see that comment in the documentation, nor did I see the note in the collection differences here: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html#serverless-usecase. Thank you.

  • What would be the recommended practice to ensure idempotency of records being batched into an index?

  • I am new to AOSS. I have a serverless OpenSearch vector collection, and I am trying to use LangChain to add embeddings to it.

    My code -

    def opensearchvector(docs, embedding_function, opensearch_url, awsauth, index_name):
        return OpenSearchVectorSearch.from_documents(
            docs,
            embedding_function,
            opensearch_url=opensearch_url,
            http_auth=awsauth,
            use_ssl=True,
            verify_certs=True,
            http_compress=True,
            connection_class=RequestsHttpConnection,
            index_name=index_name,
            engine="faiss",
        )

    This worked for me a few days ago, but recently it has started giving me an illegal_argument_exception - ('1 document(s) failed to index.', [{'index': {'_index': 'json-test-index', '_id': 'cec36b7c-7140-45cb-aaa0-ddede89ed5af', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Document ID is not supported in create/index operation request'}. I am not adding any IDs to my documents, but I still get this error.

  • Hi, any updates on this? I am running into the exact same issue as @Ayush Bhura.
