You can silence this behavior by starting the mongod instance with the following command:
mongod --setParameter failIndexKeyTooLong=false
or by running the following command from the mongo shell:
db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )
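If you are sure that your field will exceed the limit only rarely, then one way to solve this is to split the field that overflows the index key into parts by byte length, each part staying under the limit. For example, I would split a field val into a tuple of fields val_1, val_2, and so on. Mongo stores text as valid UTF-8, which means you need a function that can split UTF-8 strings properly, without cutting a multi-byte character in half.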
def split_utf8(s, n):
    """Yield the UTF-8 encoding of s in chunks of at most n bytes, never cutting a character in half.

    (b & 0xc0) == 0x80 checks whether byte b is a continuation byte, i.e. a
    byte inside a multi-byte sequence rather than the header byte that says
    how many bytes the sequence has.

    An interesting aside: bytes in a UTF-8 stream can be classified as follows:
    - High bit set to 0: a single-byte value.
    - Two high bits set to 10: a continuation byte.
    - Otherwise: the first byte of a multi-byte sequence; the number of
      leading 1 bits gives the total length of the sequence (110... means
      two bytes, 1110... means three bytes, etc.).

    The chunks are bytes; call .decode('utf-8') on each one to get text back.
    """
    if n < 4:
        # A chunk must be able to hold the longest UTF-8 sequence (4 bytes).
        raise ValueError('n must be at least 4')
    s = s.encode('utf-8')
    while len(s) > n:
        k = n
        # Back up until s[k] starts a character, so the cut is on a boundary.
        while (s[k] & 0xc0) == 0x80:
            k -= 1
        yield s[:k]
        s = s[k:]
    yield s
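As a rough sketch of how the chunks could be turned into the val_1, val_2, ... fields before saving a document: chunks_to_fields is a hypothetical helper name (not part of any MongoDB driver), and it assumes the chunks are UTF-8 bytes cut on character boundaries, as a splitter like the one above produces.

```python
def chunks_to_fields(chunks, prefix="val"):
    """Map UTF-8 byte chunks to numbered text fields: val_1, val_2, ..."""
    return {
        "{}_{}".format(prefix, i): chunk.decode("utf-8")
        for i, chunk in enumerate(chunks, start=1)
    }


# Hand-written chunks standing in for the output of a UTF-8-aware splitter.
chunks = [b"abc", "d\u00e9".encode("utf-8")]
fields = chunks_to_fields(chunks)
```

The resulting dict can be merged into the document, so each part lands in its own indexable field.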
Then you can define your compound index:
db.coll.ensureIndex({val_1: 1, val_2: 1, ...}, {background: true})
or multiple indexes, one per val_i:
db.coll.ensureIndex({val_1: 1}, {background: true})
db.coll.ensureIndex({val_2: 1}, {background: true})
...
db.coll.ensureIndex({val_i: 1}, {background: true})
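Important: if you plan to use your field in a compound index, be careful with the second argument of the split_utf8 function. For each document, you need to subtract the total byte size of every other field value that makes up the index key, e.g. for the index (a:1, b:1, val:1) use 1024 - sizeof(value(a)) - sizeof(value(b)).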
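The arithmetic above could be sketched like this. The names utf8_size and val_budget are hypothetical, and the sketch assumes the other indexed values are strings and counts only their UTF-8 bytes; an index entry also carries some structural overhead per field, so in practice leave extra margin below the computed budget.

```python
def utf8_size(value):
    """Number of bytes the string takes when encoded as UTF-8."""
    return len(value.encode("utf-8"))


def val_budget(other_values, limit=1024):
    """Bytes left for val once the other indexed string values are counted."""
    return limit - sum(utf8_size(v) for v in other_values)


# Example for an index (a:1, b:1, val:1) with a = "alice" and b = "café".
budget = val_budget(["alice", "caf\u00e9"])
```

The result would then be passed as the second argument when splitting val for that document.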
In any other case, use a hashed or text index.