【Posted】: 2020-02-27 16:18:20
【Question】:
I am trying to search PubMed using search terms taken from a CSV file. I combine the terms into a form that Biopython's Entrez module can understand, like this:
term1 = ['"' + name + " AND " + disease + '"' for name, disease in zip(names, diseases)]
Here `names` and `diseases` are the lists of values that I combine into each ESearch query. I then run the search with the following code:
from Bio import Entrez
Entrez.email = "theofficialvelocifaptor@gmail.com"
for entry in range(0, len(term1)):
    handle = Entrez.esearch(db="pubmed", term=term1[entry], retmax="10")
    record = Entrez.read(handle)
    record["IdList"]
    print("The first 10 are\n{}".format(record["IdList"]))
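For comparison, here is a minimal sketch of the same loop that iterates over the terms directly and keeps each term next to its result, so it is easier to see which query returned nothing. The names `pubmed_ids`, `run_searches`, and `fake_fetch`, as well as the placeholder e-mail address, are hypothetical and not part of the original code. `fake_fetch` is only an offline stand-in so the loop can be tried without network access.

```python
def pubmed_ids(term, retmax=10):
    """Return up to retmax PubMed IDs for one query (needs Biopython and network access)."""
    from Bio import Entrez  # imported here so the rest of the sketch runs without Biopython

    Entrez.email = "you@example.org"  # placeholder address, replace with your own
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return list(record["IdList"])


def run_searches(terms, fetch_ids=pubmed_ids):
    """Run fetch_ids on every term and return (term, ids) pairs in input order."""
    results = []
    for term in terms:  # iterate over the terms directly, no index needed
        results.append((term, fetch_ids(term)))
    return results


def fake_fetch(term):
    """Offline stand-in for pubmed_ids (hypothetical, for illustration only)."""
    return ["id-for-" + term]


demo = run_searches(["first query", "second query"], fetch_ids=fake_fetch)
for term, ids in demo:
    print("{!r}: {}".format(term, ids))
```

Calling `run_searches(term1)` without the `fetch_ids` argument would send the real queries to PubMed. Printing the term together with its ID list shows at a glance which search strings come back empty.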
What I expect the code to do is run the search for every entry in the list stored in term1. However, this is the output I get:
['Botanical name', 'Asystasia salicifalia', 'Asystasia salicifalia', 'Asystasia salicifalia', 'Barleria strigosa', 'Justicia procumbens', 'Justicia procumbens', 'Strobilanthes auriculata', 'Thunbergia laurifolia', 'Thunbergia similis']
['Disease', 'Puerperal illness', 'Puerperium', 'Puerperal disorder', 'Tonic', 'Lumbago', 'Itching', 'Malnutrition', 'Detoxificant', 'Tonic']
The first 10 are
['31849133', '31751652', '31359527', '31178344', '31057654', '30725751', '28476677', '27798405', '27174082', '26923540']
The first 10 are
[]
The first 10 are
[]
The first 10 are
[]
The first 10 are
[]
The first 10 are
[]
The first 10 are
The first 10 are
[]
The first 10 are
[]
The first 10 are
[]
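Two things stand out in this output. The first element of each printed list ('Botanical name' and 'Disease') looks like the CSV header row, and every query after the first returns an empty list. One possible cause, stated here as an assumption rather than a confirmed diagnosis, is that wrapping the whole expression in double quotes makes PubMed handle `name AND disease` as a single phrase instead of two terms joined by a boolean AND. The sketch below shows one way to build the terms with the header dropped and each part quoted separately. The helper name `build_terms` and its `has_header` parameter are hypothetical.

```python
def build_terms(names, diseases, has_header=True):
    """Pair each name with its disease and build one PubMed query string per pair."""
    pairs = list(zip(names, diseases))
    if has_header:
        pairs = pairs[1:]  # drop the ('Botanical name', 'Disease') header row
    # Quote each part on its own so that AND stays outside the quotes.
    return ['"{}" AND "{}"'.format(name.strip(), disease.strip())
            for name, disease in pairs]


# Sample values taken from the lists printed in the question.
names = ["Botanical name", "Barleria strigosa", "Justicia procumbens"]
diseases = ["Disease", "Tonic", "Lumbago"]

terms = build_terms(names, diseases)
for term in terms:
    print(term)
```

Whether the rebuilt terms return any hits still depends on the spelling of the names and on what PubMed actually indexes, so an empty list can also simply mean that no article matches.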
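Surely I am missing something, because the iteration seems to be cut short prematurely. At the time of writing I have already spent a full 5 hours on this, and I feel silly. I should also mention that I am new to Python, so if I have made an obvious mistake, I cannot see it.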
【Comments】:
Tags: python web-scraping bioinformatics biopython