【Question title】: Fulltext search in ArangoDB using AQL and Python
【Posted】: 2015-11-10 11:36:03
【Question】:

I have stored data in ArangoDB in the following format:

{"data": [
{
  "content": "maindb",
  "type": "string",
  "name": "db_name",
  "key": "1745085839"
},
{
  "type": "id",
  "name": "rel",
  "content": "1745085840",
  "key": "1745085839"
},
{
  "content": "user",
  "type": "string",
  "name": "rel_name",
  "key": "1745085840"
},
{
  "type": "id",
  "name": "tuple",
  "content": "174508584001",
  "key": "1745085840"
},
{
  "type": "id",
  "name": "tuple",
  "content": "174508584002",
  "key": "1745085840"
},
{
  "type": "id",
  "name": "tuple",
  "content": "174508584003",
  "key": "1745085840"
},
{
  "type": "id",
  "name": "tuple",
  "content": "174508584004",
  "key": "1745085840"
},
{
  "type": "id",
  "name": "tuple",
  "content": "174508584005",
  "key": "1745085840"
},
{
  "type": "id",
  "name": "tuple",
  "content": "174508584006",
  "key": "1745085840"
},
{
  "type": "id",
  "name": "tuple",
  "content": "174508584007",
  "key": "1745085840"
},
{
  "content": "dspclient",
  "type": "varchar",
  "name": "username",
  "key": "174508584001"
},
{
  "content": "12345",
  "type": "varchar",
  "name": "password",
  "key": "174508584001"
},
{
  "content": "12345",
  "type": "varchar",
  "name": "cpassword",
  "key": "174508584001"
},
{
  "content": "n",
  "type": "varchar",
  "name": "PostgreSQL",
  "key": "174508584001"
},
{
  "content": "n",
  "name": "IBMDB2",
  "type": "varchar",
  "key": "174508584001"
},
{
  "content": "n",
  "name": "MySQL",
  "type": "varchar",
  "key": "174508584001"
},
{
  "content": "n",
  "type": "varchar",
  "name": "SQLServer",
  "key": "174508584001"
},
{
  "content": "n",
  "name": "Hadoop",
  "type": "varchar",
  "key": "174508584001"
},
{
  "content": "None",
  "name": "dir1",
  "type": "varchar",
  "key": "174508584001"
},
{
  "content": "None",
  "name": "dir2",
  "type": "varchar",
  "key": "174508584001"
},
{
  "content": "None",
  "name": "dir3",
  "type": "varchar",
  "key": "174508584001"
},
{
  "content": "None",
  "name": "dir4",
  "type": "varchar",
  "key": "174508584001"
},
{
  "type": "inet",
  "name": "ipaddr",
  "content": "1921680103",
  "key": "174508584001"
},
{
  "content": "y",
  "name": "status",
  "type": "varchar",
  "key": "174508584001"
},
{
  "content": "None",
  "type": "varchar",
  "name": "logintime",
  "key": "174508584001"
},
{
  "content": "None",
  "type": "varchar",
  "name": "logindate",
  "key": "174508584001"
},
{
  "content": "None",
  "type": "varchar",
  "name": "logouttime",
  "key": "174508584001"
},
{
  "content": "client",
  "type": "varchar",
  "name": "user_type",
  "key": "174508584001"
},
{
  "content": "royal",
  "type": "varchar",
  "name": "username",
  "key": "174508584002"
},
{
  "content": "12345",
  "type": "varchar",
  "name": "password",
  "key": "174508584002"
},
{
  "content": "12345",
  "type": "varchar",
  "name": "cpassword",
  "key": "174508584002"
},
{
  "content": "n",
  "type": "varchar",
  "name": "PostgreSQL",
  "key": "174508584002"
},
{
  "content": "n",
  "name": "IBMDB2",
  "type": "varchar",
  "key": "174508584002"
},
{
  "content": "n",
  "name": "MySQL",
  "type": "varchar",
  "key": "174508584002"
},
{
  "content": "n",
  "type": "varchar",
  "name": "SQLServer",
  "key": "174508584002"
},
{
  "content": "n",
  "name": "Hadoop",
  "type": "varchar",
  "key": "174508584002"
},
{
  "content": "None",
  "name": "dir1",
  "type": "varchar",
  "key": "174508584002"
},
{
  "content": "None",
  "name": "dir2",
  "type": "varchar",
  "key": "174508584002"
},
{
  "content": "None",
  "name": "dir3",
  "type": "varchar",
  "key": "174508584002"
},
{
  "content": "None",
  "name": "dir4",
  "type": "varchar",
  "key": "174508584002"
},
{
  "type": "inet",
  "name": "ipaddr",
  "content": "1921680105",
  "key": "174508584002"
},
{
  "content": "y",
  "name": "status",
  "type": "varchar",
  "key": "174508584002"
},
{
  "content": "190835899000",
  "type": "varchar",
  "name": "logintime",
  "key": "174508584002"
},
{
  "content": "20151002",
  "type": "varchar",
  "name": "logindate",
  "key": "174508584002"
},
{
  "content": "None",
  "type": "varchar",
  "name": "logouttime",
  "key": "174508584002"
},
{
  "content": "client",
  "type": "varchar",
  "name": "user_type",
  "key": "174508584002"
},
{
  "content": "abc",
  "type": "varchar",
  "name": "username",
  "key": "174508584003"
},
{
  "content": "12345",
  "type": "varchar",
  "name": "password",
  "key": "174508584003"
},
{
  "content": "12345",
  "type": "varchar",
  "name": "cpassword",
  "key": "174508584003"
},
{
  "content": "n",
  "type": "varchar",
  "name": "PostgreSQL",
  "key": "174508584003"
},
{
  "content": "n",
  "name": "IBMDB2",
  "type": "varchar",
  "key": "174508584003"
}]}

To run a fulltext search, I created an index on the content attribute from my Python script with:

c.DSP.ensureFulltextIndex("content");

where c is the database and DSP is the collection name. I am now trying to search the dataset above with the following query:

FOR doc IN FULLTEXT(DSP, "content", "username") RETURN doc

This fails with the error:

[1571] in function 'FULLTEXT()': no suitable fulltext index found for fulltext query on 'DSP' (while executing)

Please tell me what the problem is, and what the syntax should be when I run this query from a Python script.

Thanks...

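【Comments】:

  • Did you check the admin interface? Collections->yourCollection[ (I) ]->Indexes - you should find your index in that list.
  • Thanks dothebert, I have now created the index through the ArangoDB interface... but the query above returns an empty list [] as the result....
  • If my eyeball grep works properly, you don't have any content attribute containing the string username? See the fulltext index example.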
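
The point made in the last comment can be checked offline. The sketch below copies a few documents from the question and looks for "username" as a whole word in each attribute. The `words` and `whole_word_matches` helpers are hypothetical names introduced here, and the regex split is only a rough stand-in for how a fulltext index tokenizes text, not ArangoDB's actual tokenizer.

```python
import re

# A few documents copied from the question's data set.
SAMPLE_DOCS = [
    {"content": "maindb", "type": "string", "name": "db_name", "key": "1745085839"},
    {"content": "user", "type": "string", "name": "rel_name", "key": "1745085840"},
    {"content": "dspclient", "type": "varchar", "name": "username", "key": "174508584001"},
    {"content": "royal", "type": "varchar", "name": "username", "key": "174508584002"},
    {"content": "abc", "type": "varchar", "name": "username", "key": "174508584003"},
]


def words(value):
    """Split a value into lowercase words (rough stand-in for index tokenization)."""
    return set(re.findall(r"\w+", str(value).lower()))


def whole_word_matches(docs, attribute, term):
    """Return the documents whose `attribute` contains `term` as a whole word."""
    return [doc for doc in docs if term.lower() in words(doc.get(attribute, ""))]


in_content = whole_word_matches(SAMPLE_DOCS, "content", "username")
in_name = whole_word_matches(SAMPLE_DOCS, "name", "username")
royal = whole_word_matches(SAMPLE_DOCS, "content", "royal")

print(len(in_content))                    # 0: "username" never occurs in content
print(len(in_name))                       # 3: it only occurs in the name attribute
print([doc["content"] for doc in royal])  # ['royal']
```

In this sample "username" appears only under `name`, so a fulltext query against `content` has nothing to return, while a word such as "royal" that really is stored in `content` would match.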

Tags: python full-text-search arangodb aql


【Answer 1】:

Using the 10 minutes tutorial and the driver documentation,

I got it working like this:

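from pyArango.connection import Connection

# Connect to the local server with pyArango's default settings.
c = Connection()
db = c.createDatabase(name="testdb")
DSP = db.createCollection(name="DSP")

# Create the fulltext index on "content" before querying it.
DSP.ensureFulltextIndex(fields=["content"])

doc = DSP.createDocument({"content": "test bla"})
doc.save()

# The second positional argument (10) is kept exactly as in the original call.
print(db.AQLQuery('''FOR doc IN FULLTEXT(DSP, "content", "bla") RETURN doc''', 10))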

which results in:

[{u'_key': u'1241175138503', u'content': u'test bla', u'_rev': u'1241175138503', u'_id': u'DSP/1241175138503'}]
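
Since the question also asks how to issue the query from a Python script, here is a minimal sketch that builds the AQL string with the search word passed as a bind parameter instead of being pasted into the query text. `build_fulltext_query` is a hypothetical helper, not part of pyArango; the `prefix:` qualifier is ArangoDB's fulltext prefix-search syntax; the identifier check is a deliberately conservative assumption and rejects some names ArangoDB would accept.

```python
import re

# Conservative check for names that are spliced into the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_fulltext_query(collection, attribute, word, prefix=False):
    """Return (aql, bind_vars) for a FULLTEXT() lookup of a single word."""
    for label, value in (("collection", collection), ("attribute", attribute)):
        if not _IDENTIFIER.match(value):
            raise ValueError("unsafe %s name: %r" % (label, value))
    # "prefix:" asks the fulltext index for words starting with `word`.
    term = ("prefix:" if prefix else "") + word
    aql = 'FOR doc IN FULLTEXT(%s, "%s", @term) RETURN doc' % (collection, attribute)
    return aql, {"term": term}


aql, bind_vars = build_fulltext_query("DSP", "content", "bla")
print(aql)        # FOR doc IN FULLTEXT(DSP, "content", @term) RETURN doc
print(bind_vars)  # {'term': 'bla'}

prefix_aql, prefix_vars = build_fulltext_query("DSP", "content", "dsp", prefix=True)
print(prefix_vars)  # {'term': 'prefix:dsp'}

# With a live pyArango database handle `db` (not created here), the pair could
# be passed on roughly like this; the keyword names are assumed from pyArango:
# results = db.AQLQuery(aql, rawResults=True, batchSize=10, bindVars=bind_vars)
```

Keeping the search word in a bind parameter means user input never has to be quoted by hand inside the AQL string.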

I re-verified the steps from the Python prompt using arangosh:

arangosh> db._useDatabase("testdb")
arangosh [testdb]> db.DSP.getIndexes()
[ 
  { 
    "id" : "DSP/0", 
    "type" : "primary", 
    "fields" : [ 
      "_key" 
    ], 
    "selectivityEstimate" : 1, 
    "unique" : true, 
    "sparse" : false 
  }, 
  { 
    "id" : "DSP/1241140928711", 
    "type" : "hash", 
    "fields" : [ 
      "content" 
    ], 
    "selectivityEstimate" : 1, 
    "unique" : false, 
    "sparse" : true 
  }, 
  { 
    "id" : "DSP/1241142960327", 
    "type" : "fulltext", 
    "fields" : [ 
      "content" 
    ], 
    "unique" : false, 
    "sparse" : true, 
    "minLength" : 2 
  } 
]
arangosh [testdb]> db.DSP.toArray()
[ 
  { 
    "content" : "test bla", 
    "_id" : "DSP/1241175138503", 
    "_rev" : "1241175138503", 
    "_key" : "1241175138503" 
  } 
]
arangosh [testdb]> db._query('FOR doc IN FULLTEXT(DSP, "content", "bla") RETURN doc')
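
Error 1571 from the question means FULLTEXT() found no fulltext index on the requested attribute, so the index listing is the first thing to check. The sketch below runs that check over index descriptions shaped like the getIndexes() output in this answer. `has_fulltext_index` is a hypothetical helper, the dictionaries are copied from the listing, and fetching them from a server is not shown.

```python
# Index descriptions copied from the getIndexes() listing in this answer.
INDEXES = [
    {"id": "DSP/0", "type": "primary", "fields": ["_key"],
     "selectivityEstimate": 1, "unique": True, "sparse": False},
    {"id": "DSP/1241140928711", "type": "hash", "fields": ["content"],
     "selectivityEstimate": 1, "unique": False, "sparse": True},
    {"id": "DSP/1241142960327", "type": "fulltext", "fields": ["content"],
     "unique": False, "sparse": True, "minLength": 2},
]


def has_fulltext_index(indexes, attribute):
    """True if any index is of type 'fulltext' and covers `attribute`."""
    return any(
        idx.get("type") == "fulltext" and attribute in idx.get("fields", [])
        for idx in indexes
    )


print(has_fulltext_index(INDEXES, "content"))      # True
print(has_fulltext_index(INDEXES[:2], "content"))  # False: only primary and hash
print(has_fulltext_index(INDEXES, "name"))         # False: no index covers "name"
```

A hash index on `content` is not enough on its own: only an entry whose type is fulltext satisfies FULLTEXT().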

【Comments】:

  • Did you re-verify the output in the arangosh window?
  • Yes, I have re-verified my output.
  • I've been trying this on Debian Jessie with Python 2.7.9; /usr/local/lib/python2.7/dist-packages/pyArango-1.0.3-py2.7.egg; ArangoDB 2.6.7 and ArangoDB 2.8 beta, and I get similar output from the Python script: python /tmp/test.py [{u'_key': u'22654861461', u'content': u'test bla', u'_rev': u'22654861461', u'_id': u'DSP/22654861461'}]