How to improve retrieval query performance in ArangoDB 2.7 as the number of documents in a single collection increases

【Question title】: How to improve the retrieve Query performance in ArangoDB 2.7 with increasing the number of documents within a single collection
【Posted】: 2016-02-08 16:31:25
【Question description】:

I have stored data in ArangoDB 2.7 in the following format:

    {"content": "Book.xml", "type": "string", "name": "name", "key": 102}
    {"content": "D:/XMLexample/Book.xml", "type": "string", "name": "location", "key": 102}
    {"content": "xml", "type": "string", "name": "mime-type", "key": 102}
    {"content": 4130, "type": "string", "name": "size", "key": 102}
    {"content": "Sun Aug 25 07:53:32 2013", "type": "string", "name": "created_date", "key": 102}
    {"content": "Wed Jan 23 09:14:07 2013", "type": "string", "name": "modified_date", "key": 102}
    {"content": "catalog", "type": "tag", "name": "root", "key": 102}
    {"content": "book", "type": "string", "name": "tag", "key": 103} 
    {"content": "bk101", "type": {"py/type": "__builtin__.str"}, "name": "id", "key": 103}
    {"content": "Gambardella, Matthew", "type": {"py/type": "__builtin__.str"}, "name": "author", "key": 1031} 
    {"content": "XML Developer's Guide", "type": {"py/type": "__builtin__.str"}, "name": "title", "key": 1031}
    {"content": "Computer", "type": {"py/type": "__builtin__.str"}, "name": "genre", "key": 1031}
    {"content": "44.95", "type": {"py/type": "__builtin__.str"}, "name": "price", "key": 1031}
    {"content": "2000-10-01", "type": {"py/type": "__builtin__.str"}, "name": "publish_date", "key": 1031}
    {"content": "An in-depth look at creating applications with XML.", "type": {"py/type": "__builtin__.str"}, "name": "description", "key": 1031}

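As I increased the number of documents to 1000, 10000, 100000, 1000000, 10000000 and so on, the average query response time grew with the document count, ranging from 0.2 seconds to 3.0 seconds. I have already created a hash index on this collection. My question is whether this response time can be reduced as the number of documents grows.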

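I have also created a fulltext index on the content attribute, and the same thing happens with fulltext search: the response time ranges from 0.05 seconds to 0.3 seconds.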

Is there any way to reduce the response time further?

【Comments】:

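What query are you running on the data?
The query has the form: FOR k IN DSP FOR p IN k.data FILTER p.name == "modified_date" || p.type == "string" RETURN p
Does the answer meet your needs? If not, what is missing? If it does, could you mark it as accepted?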

Tags: python arangodb aql

【Solution 1】:

Indexes cannot be used in the first level of nested FOR statements. However, starting with ArangoDB 2.8, you can use array indices:

The values you query are data[*].name and data[*].type, so let's create indexes for them:

    db.DSP.ensureIndex({type:"hash", fields: ['data[*].type']});
    db.DSP.ensureIndex({type:"hash", fields: ['data[*].name']});
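
To picture why an index on `data[*].name` helps, an array index can be thought of as a map from each array element's value to the documents that contain it. The following Python snippet is only a conceptual sketch, not ArangoDB's actual implementation; the sample documents, the `build_array_index` helper, and the assumption that each DSP document holds a `data` array are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample documents; assumes each DSP document holds a "data" array
docs = {
    "doc1": {"data": [{"name": "modified_date", "type": "string"},
                      {"name": "root", "type": "tag"}]},
    "doc2": {"data": [{"name": "author", "type": "str"}]},
}

def build_array_index(documents, attribute):
    """Map every value of data[*].<attribute> to the keys of the documents containing it."""
    index = defaultdict(set)
    for key, doc in documents.items():
        for element in doc.get("data", []):
            if attribute in element:
                index[element[attribute]].add(key)
    return index

name_index = build_array_index(docs, "name")

# A lookup touches only the matching documents instead of scanning every document
matching = name_index.get("modified_date", set())
print(sorted(matching))
```

With such a structure, the cost of finding candidate documents depends on the number of matches rather than on the size of the whole collection.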

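Now let's reformulate the query so that it can make use of this index. We start with a simpler version to experiment with, and use explain to verify that it actually uses the index: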

    db._explain('FOR k IN DSP FILTER "modified_date" IN k.data[*].name RETURN k')
    Query string:
     FOR k IN DSP FILTER "modified_date" IN k.data[*].name RETURN k

    Execution plan:
     Id   NodeType        Est.   Comment
      1   SingletonNode      1   * ROOT
      6   IndexNode          1     - FOR k IN DSP   /* hash index scan */
      5   ReturnNode         1       - RETURN k

    Indexes used:
     By   Type   Collection   Unique   Sparse   Selectivity   Fields               Ranges
      6   hash   DSP          false    false       100.00 %   [ `data[*].name` ]   ("modified_date" in k.`data`[*].`name`)

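So we can see that filtering on the array condition works, which means only the documents you actually want to inspect are passed into the inner loop: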

    FOR k IN DSP FILTER "modified_date" IN k.data[*].name || "string" IN k.data[*].type
      FOR p IN k.data FILTER p.name == "modified_date" || p.type == "string" RETURN p
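
As a sanity check on the rewrite, the pre-filtered query should return exactly the same elements as the original nested loop; it only skips documents that cannot contribute a match. The Python sketch below models both forms over in-memory data. The sample collection and the function names are hypothetical, and the snippet does not talk to ArangoDB at all.

```python
# Hypothetical sample data; assumes each DSP document holds a "data" array
collection = [
    {"data": [{"name": "modified_date", "type": "string"},
              {"name": "root", "type": "tag"}]},
    {"data": [{"name": "id", "type": "str"}]},
    {"data": [{"name": "size", "type": "string"}]},
]

def matches(p):
    # Inner FILTER: p.name == "modified_date" || p.type == "string"
    return p.get("name") == "modified_date" or p.get("type") == "string"

def nested_scan(coll):
    # Original form: FOR k IN DSP FOR p IN k.data FILTER ... RETURN p
    return [p for k in coll for p in k["data"] if matches(p)]

def prefiltered(coll):
    # Rewritten form: the outer FILTER discards documents before the inner loop runs
    result = []
    for k in coll:
        names = [p.get("name") for p in k["data"]]
        types = [p.get("type") for p in k["data"]]
        if "modified_date" in names or "string" in types:
            result.extend(p for p in k["data"] if matches(p))
    return result

same = nested_scan(collection) == prefiltered(collection)
print(same)
```

In this model the second document never reaches the inner loop, which is the saving the outer FILTER is meant to provide once it can be served from an index.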
