Given a DocumentDB collection called test
, with 200 documents of the following shape (i.e. containing a simple key/value pair):
{ "num" : 1 }
and an index on the key num
:
{ v: 4, key: { num: 1 }, name: 'num-idx', ns: 'db.test2' }
the index is not used for the following query:
db.test.find({'num': {$exists: true}})
...
winningPlan: { stage: 'COLLSCAN' }
This is consistent with the AWS documentation which states:
Amazon DocumentDB does not currently support the ability to use indexes with the $ne, $nin, $nor, $not, $exists, $distinct, and $elemMatch operators. As a result, utilizing these operators will result in collection scans
However when I recreate the index, specifying it is sparse:
{ v: 4, key: { num: 1 }, name: 'num-idx', ns: 'db.test2', sparse: true }
then the index IS used:
db.test.find({'num': {$exists: true}})
...
winningPlan: { stage: 'IXSCAN', indexName: 'num-idx', direction: 'forward' } }
According to the MongoDB documentation, making an index sparse has the following implications:
- documents missing the indexed keys will not be present in the index
- this makes the index smaller in memory
- you cannot use it for queries containing
{ $exists: false }
There is no mention of the sparsity of an index being a requirement in order for it to be used in the case of { $exists: true }
queries.
The closest to an answer I can find is https://docs.aws.amazon.com/documentdb/latest/developerguide/functional-differences.html#functional-differences.sparse-index piece of AWS documentation which states:
To use a sparse index that you have created in a query, you must use the $exists clause on the fields that cover the index. If you omit $exists, Amazon DocumentDB does not use the sparse index
However, this clashes with the previous documentation quote.
Behaviour for MongoDB (db.test.find({'num': {$exists: <VALUE>}})):
Head | exists:true | exists:false |
---|
sparse index | IDX | COLLSCAN |
non sparse index | IDX | IDX |
Behaviour for DocumentDB (db.test.find({'num': {$exists: <VALUE>}})):
Head | exists:true | exists:false |
---|
sparse index | IDX | COLLSCAN |
non sparse index | COLLSCAN | COLLSCAN |
Is this an unintended loophole?
Can I rely on this behaviour to guarantee that a query containing an { $exists: true }
clause is able to use an index if it is sparse?
Yes, it would be better if the documentation was made clearer but combining these 2 pieces of information in one place:
"Amazon DocumentDB does not currently support the ability to use indexes with the $ne, $nin, $nor, $not, $exists, $distinct, and $elemMatch operators. As a result, utilizing these operators will result in collection scans. Except in the case of sparse indexes which can be used for queries containing a
$exists: true
clause".I am still interested in knowing why this behaviour exists though. As I mentioned above, it doesn't make sense that the sparsity of the index affects whether or not it can be used in a query containing
$exists:true
. This is what I meant by 'unintended loophole'. Because it doesn't make sense, I wondered if this was intentional behaviour. And if not, someone might come along and fix it and suddenly sparse indexes stop working for particular queries.