Your code does find the link you want:
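```python
# Selenium 4 syntax; the find_element_by_xpath helpers were removed in Selenium 4.3
from selenium.webdriver.common.by import By

g = '12.5'
elem7 = driver.find_element(By.XPATH, "//a[contains(., 'leaflet') and contains(., '" + g + "')]")
print(elem7.text)
link2 = elem7.get_attribute('href')
print(link2)
```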
Result:

```
ALMOGRAN 12.5 MG FILM-COATED TABLETS
leaflet MAH BRAND_PLPI 20774-1629.pdf
https://mhraproductsprod.blob.core.windows.net/docs-20200406/bf115fe972a98836c2af4072a77e2aaa04bcfa24
```
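But it is always the first matching link on the page, because you used `find_element`. If you test your search criteria with `find_elements`, you will see that it actually finds all 4 links.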
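```python
from selenium.webdriver.common.by import By

g = '12.5'
elements = driver.find_elements(By.XPATH, "//a[contains(., 'leaflet') and contains(., '" + g + "')]")
# loop over every matching link, not just the first one
for element in elements:
    print(element.text)
    link2 = element.get_attribute('href')
    print(link2)
```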
Result:

```
ALMOGRAN 12.5 MG FILM-COATED TABLETS
leaflet MAH BRAND_PLPI 20774-1629.pdf
https://mhraproductsprod.blob.core.windows.net/docs-20200406/bf115fe972a98836c2af4072a77e2aaa04bcfa24
ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS
leaflet MAH BRAND_PLPI 20636-1099.pdf
https://mhraproductsprod.blob.core.windows.net/docs-20200406/3ef1e68659b17b5cbc6bed7988dd7d32d8ff5258
ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS
leaflet MAH BRAND_PLPI 18799-1153.pdf
https://mhraproductsprod.blob.core.windows.net/docs-20200406/91cb21381c4aea438fe87a49cfc4abd3557f7614
ALMOGRAN 12.5 MG FILM-COATED TABLETS,ALMOTRIPTAN 12.5 MG FILM-COATED TABLETS
leaflet MAH BRAND_PLPI 20636-2661.pdf
https://mhraproductsprod.blob.core.windows.net/docs-20200406/f30be6ab9c3e85abd455da4262ffaf5932813014
```
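So if your medicine is not the first one on the page, your code will pick up a different medicine. You can make your search criteria more specific. Another option is to use `find_elements` and add something like `if "extra criteria" in element.text: ...` inside the for loop.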
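A minimal sketch of the second option (filtering inside the loop). So that it runs without a browser, `FakeElement` is a hypothetical stand-in that mimics only the two members used here, `.text` and `get_attribute('href')`; with a real driver you would pass the list returned by `driver.find_elements(...)` instead. `filter_leaflets` is likewise a made-up helper name. The extra criterion in the example is the PL number from the leaflet file name, which is unique per link in the results above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement: only .text and get_attribute()."""
    text: str
    attrs: dict = field(default_factory=dict)

    def get_attribute(self, name):
        # Selenium's get_attribute returns None when the attribute is absent; mimic that.
        return self.attrs.get(name)


def filter_leaflets(elements, extra_criteria):
    """Return (text, href) pairs for elements whose text contains extra_criteria."""
    matches = []
    for element in elements:
        if extra_criteria in element.text:
            matches.append((element.text, element.get_attribute('href')))
    return matches


BASE = 'https://mhraproductsprod.blob.core.windows.net/docs-20200406/'
elements = [
    FakeElement('ALMOGRAN 12.5 MG FILM-COATED TABLETS\nleaflet MAH BRAND_PLPI 20774-1629.pdf',
                {'href': BASE + 'bf115fe972a98836c2af4072a77e2aaa04bcfa24'}),
    FakeElement('ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS\nleaflet MAH BRAND_PLPI 20636-1099.pdf',
                {'href': BASE + '3ef1e68659b17b5cbc6bed7988dd7d32d8ff5258'}),
    FakeElement('ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS\nleaflet MAH BRAND_PLPI 18799-1153.pdf',
                {'href': BASE + '91cb21381c4aea438fe87a49cfc4abd3557f7614'}),
    FakeElement('ALMOGRAN 12.5 MG FILM-COATED TABLETS,ALMOTRIPTAN 12.5 MG FILM-COATED TABLETS\n'
                'leaflet MAH BRAND_PLPI 20636-2661.pdf',
                {'href': BASE + 'f30be6ab9c3e85abd455da4262ffaf5932813014'}),
]

# 'ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS' matches two links;
# the PL number '20636-1099' picks exactly one of them.
for text, href in filter_leaflets(elements, '20636-1099'):
    print(text.replace('\n', ' - '), href)
```

The filter is a plain substring test, the same as the `if ... in element.text` idea above, so the more distinctive the criterion, the fewer links survive.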
This searches the page for each of the medicines:
```python
from selenium.webdriver.common.by import By


# ------------------------------------------------------------------
def find_medicine_leaflet(page_meds, med_name):
    """Print every leaflet link whose text contains med_name."""
    print(f'\n------- searching for {med_name} -------')
    nr_found = 0
    for page_med in page_meds:
        # use the parameter med_name, not the global loop variable medicine_name
        if med_name in page_med.text:
            nr_found += 1
            text = page_med.text.replace('\n', ' - ')
            print(f"{nr_found} {text}\n leaflet url: {page_med.get_attribute('href')}")
    print(f'------- {nr_found} found -------\n')
# ------------------------------------------------------------------

medicine_names = [
    'ALMOGRAN 12.5 MG FILM-COATED TABLETS',
    'ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS',
    'ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS',
    'ALMOGRAN 12.5 MG FILM-COATED TABLETS,ALMOTRIPTAN 12.5 MG FILM-COATED TABLETS',
]

g = '12.5'
# collect all matching leaflet links once (driver is your already-open WebDriver),
# then search that list for each medicine name
page_medicines = driver.find_elements(By.XPATH, "//a[contains(., 'leaflet') and contains(., '" + g + "')]")
for medicine_name in medicine_names:
    find_medicine_leaflet(page_medicines, medicine_name)
```
Result:

```
------- searching for ALMOGRAN 12.5 MG FILM-COATED TABLETS -------
1 ALMOGRAN 12.5 MG FILM-COATED TABLETS - leaflet MAH BRAND_PLPI 20774-1629.pdf
 leaflet url: https://mhraproductsprod.blob.core.windows.net/docs-20200406/bf115fe972a98836c2af4072a77e2aaa04bcfa24
2 ALMOGRAN 12.5 MG FILM-COATED TABLETS,ALMOTRIPTAN 12.5 MG FILM-COATED TABLETS - leaflet MAH BRAND_PLPI 20636-2661.pdf
 leaflet url: https://mhraproductsprod.blob.core.windows.net/docs-20200406/f30be6ab9c3e85abd455da4262ffaf5932813014
------- 2 found -------
------- searching for ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS -------
1 ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS - leaflet MAH BRAND_PLPI 20636-1099.pdf
 leaflet url: https://mhraproductsprod.blob.core.windows.net/docs-20200406/3ef1e68659b17b5cbc6bed7988dd7d32d8ff5258
2 ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS - leaflet MAH BRAND_PLPI 18799-1153.pdf
 leaflet url: https://mhraproductsprod.blob.core.windows.net/docs-20200406/91cb21381c4aea438fe87a49cfc4abd3557f7614
------- 2 found -------
------- searching for ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS -------
1 ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS - leaflet MAH BRAND_PLPI 20636-1099.pdf
 leaflet url: https://mhraproductsprod.blob.core.windows.net/docs-20200406/3ef1e68659b17b5cbc6bed7988dd7d32d8ff5258
2 ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS - leaflet MAH BRAND_PLPI 18799-1153.pdf
 leaflet url: https://mhraproductsprod.blob.core.windows.net/docs-20200406/91cb21381c4aea438fe87a49cfc4abd3557f7614
------- 2 found -------
------- searching for ALMOGRAN 12.5 MG FILM-COATED TABLETS,ALMOTRIPTAN 12.5 MG FILM-COATED TABLETS -------
1 ALMOGRAN 12.5 MG FILM-COATED TABLETS,ALMOTRIPTAN 12.5 MG FILM-COATED TABLETS - leaflet MAH BRAND_PLPI 20636-2661.pdf
 leaflet url: https://mhraproductsprod.blob.core.windows.net/docs-20200406/f30be6ab9c3e85abd455da4262ffaf5932813014
------- 1 found -------
```
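As far as I can tell, it doesn't miss anything. Because some medicine names overlap, a search can return 2 leaflets, but every leaflet is found.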
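The overlap comes from the substring test: 'ALMOGRAN 12.5 MG FILM-COATED TABLETS' is also contained in 'ALMOGRAN 12.5 MG FILM-COATED TABLETS,ALMOTRIPTAN 12.5 MG FILM-COATED TABLETS'. If you only want exact name matches, one option is to compare against the first line of the link text, which in the results above holds the product name. This is a sketch under that assumption (link text shaped like `name\nleaflet ....pdf`, as in the output shown); `product_name` and `exact_matches` are hypothetical helpers that work on plain strings, so you can try them without a browser and pass `page_med.text` when using Selenium.

```python
def product_name(link_text):
    """First line of the link text; assumed to be the product name."""
    return link_text.split('\n', 1)[0].strip()


def exact_matches(link_texts, med_name):
    """Return the link texts whose product name equals med_name exactly."""
    return [t for t in link_texts if product_name(t) == med_name]


# Link texts copied from the results above.
link_texts = [
    'ALMOGRAN 12.5 MG FILM-COATED TABLETS\nleaflet MAH BRAND_PLPI 20774-1629.pdf',
    'ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS\nleaflet MAH BRAND_PLPI 20636-1099.pdf',
    'ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS\nleaflet MAH BRAND_PLPI 18799-1153.pdf',
    'ALMOGRAN 12.5 MG FILM-COATED TABLETS,ALMOTRIPTAN 12.5 MG FILM-COATED TABLETS\n'
    'leaflet MAH BRAND_PLPI 20636-2661.pdf',
]

for name in ['ALMOGRAN 12.5 MG FILM-COATED TABLETS',
             'ALMOTRIPTAN 12.5MG TABLETS,ALMOGRAN 12.5MG TABLETS']:
    found = exact_matches(link_texts, name)
    print(f'{name}: {len(found)} found')
```

With exact matching, 'ALMOGRAN 12.5 MG FILM-COATED TABLETS' matches one link instead of two. The ALMOTRIPTAN name still matches two, because two different leaflets carry exactly that name; there the PL number in the file name is what tells them apart.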