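When you index data into Elasticsearch without supplying a custom ID, Elasticsearch creates a new ID for every document you index.
So, because you did not provide an ID, Elasticsearch generated one automatically.
But you also want to check whether a Name already exists. There are two ways to do that:
- Index the data without passing an _id for each document. Afterwards, you have to search on the Name field to see whether a document exists.
- Index the data with your own _id for each document. Then look the document up by _id.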
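For completeness, the first approach could look roughly like the sketch below. It assumes the index uses Elasticsearch's default dynamic mapping, under which a string field such as Name also gets an exact-match `Name.keyword` sub-field. The helper name `name_exists` is made up for illustration, and the `body=` argument assumes a 7.x-style Python client.

```python
def name_exists(es, index, name):
    """Return True if at least one document in `index` has exactly this Name.

    `es` is an Elasticsearch client instance (assumed 7.x-style API).
    """
    # A term query on the keyword sub-field matches the whole string exactly
    query = {"query": {"term": {"Name.keyword": name}}}
    # count is enough when only existence matters
    return es.count(index=index, body=query)["count"] > 0
```

Keep in mind that search and count only see documents after an index refresh, so a name indexed a moment ago may not be found yet. Lookups by _id do not have that delay.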
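I will demonstrate the second approach, creating our own IDs. Since you are searching on the Name field, I will hash it with MD5 to generate the _id. (Any hash function would work.)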
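As a small sketch of that idea, the ID derivation can be pulled into a helper. The name `doc_id` is hypothetical and only used here for illustration.

```python
import hashlib


def doc_id(name):
    """Derive a deterministic _id from a Name (hypothetical helper)."""
    # str.encode() defaults to UTF-8; MD5 serves only as a stable
    # fingerprint here, not as a security measure
    return hashlib.md5(name.encode()).hexdigest()


# The same name always maps to the same _id
print(doc_id("Dr. Messi") == doc_id("Dr. Messi"))
# An MD5 hex digest is 32 characters long
print(len(doc_id("Dr. Messi")))
```

Note that the hash is taken over the exact string, so names that differ only in case or whitespace end up with different IDs. Normalizing the name first (for example with `strip()` and `lower()`) would be one way to handle that, if it suits the data.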
First, index the data:
import hashlib
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.cluster.health()  # quick check that the cluster is reachable

records = [
    {'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
]

index_name = "my-index_1"
# ignore=400 suppresses the error raised when the index already exists
es.indices.create(index=index_name, ignore=400)

for record in records:
    # Use the MD5 hex digest of Name as the document _id
    es.index(index=index_name, body=record, id=hashlib.md5(record['Name'].encode()).hexdigest())
Output (contents of the index):
[{'_index': 'my-index_1',
'_type': '_doc',
'_id': '1164c423bc4e2fcb75697c3031af9ef1',
'_score': 1.0,
'_source': {'Name': 'Dr. Christopher DeSimone',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '672ae14197a135c39eab759be8b0597f',
'_score': 1.0,
'_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '85702447f9e9ea010054eaf0555ce79c',
'_score': 1.0,
'_source': {'Name': 'Dr. Bernard M. Aaron',
'Specialised and Location': 'Health'}}]
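The answer does not show the call that produced this listing. One way to get a similar list of hits, sketched here as an assumption rather than the original code, is a `match_all` search after refreshing the index. The helper name `list_documents` is made up, and `body=` again assumes a 7.x-style client.

```python
def list_documents(es, index):
    """Return the hits of a match_all search on `index`.

    Without an explicit size, Elasticsearch returns at most 10 hits.
    """
    # Make recently indexed documents visible to search
    es.indices.refresh(index=index)
    response = es.search(index=index, body={"query": {"match_all": {}}})
    return response["hits"]["hits"]
```

Calling `list_documents(es, index_name)` after the indexing loop would then return the documents as a list of dictionaries with `_index`, `_id`, `_score` and `_source` keys.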
Next, index new data:
from elasticsearch import NotFoundError

records = [
    {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

for record in records:
    doc_id = hashlib.md5(record['Name'].encode()).hexdigest()
    try:
        # es.get raises NotFoundError when no document has this _id
        es.get(index=index_name, id=doc_id)
    except NotFoundError:
        print("Record Not found")
        # Only index the record when it is not in the index yet
        es.index(index=index_name, body=record, id=doc_id)
Output (contents of the index):
[{'_index': 'my-index_1',
'_type': '_doc',
'_id': '1164c423bc4e2fcb75697c3031af9ef1',
'_score': 1.0,
'_source': {'Name': 'Dr. Christopher DeSimone',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '672ae14197a135c39eab759be8b0597f',
'_score': 1.0,
'_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '85702447f9e9ea010054eaf0555ce79c',
'_score': 1.0,
'_source': {'Name': 'Dr. Bernard M. Aaron',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': 'e2e0f463145568471097ff027b18b40d',
'_score': 1.0,
'_source': {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '23bb4f1a3a41efe7f4cab8a80d766708',
'_score': 1.0,
'_source': {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'}}]
As you can see, the Dr. Bernard M. Aaron record was not indexed again, because it already exists.
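The check-then-index step can also be written without exception handling. The sketch below uses the client's `exists` call instead of `get`. The helper name `index_if_absent` is hypothetical, and `body=` assumes the same 7.x-style client as the code above.

```python
import hashlib


def index_if_absent(es, index, record):
    """Index `record` only if no document with its Name-derived _id exists.

    Returns True if the record was indexed, False if it was already there.
    """
    doc_id = hashlib.md5(record["Name"].encode()).hexdigest()
    # exists only checks for the document and does not fetch its _source
    if es.exists(index=index, id=doc_id):
        return False
    es.index(index=index, body=record, id=doc_id)
    return True
```

A check followed by a write is not atomic, so two clients indexing the same name at the same moment could both pass the check. If that matters, passing `op_type="create"` to `es.index` makes Elasticsearch itself reject a document whose _id already exists with a 409 conflict.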