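【Posted】: 2021-08-20 10:26:30
【Question】:
I am trying to scrape data (medicine names) from https://www.1mg.com/drugs-all-medicines. The listing has 841 pages with 30 items per page, but my code only picks up 20 items per page. I don't know what causes this or how to fix it. This is the code I am using: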
import requests
import json
import io
from bs4 import BeautifulSoup

medicine_name = []
f = io.open('data.txt', 'a', encoding='utf-8')
for i in range(1, 842):
    url = "https://www.1mg.com/drugs-all-medicines?page=" + str(i)
    r = requests.get(url)
    HTMLcontent = r.content
    soup = BeautifulSoup(HTMLcontent, 'html.parser')
    # Parse the contents of the first <script> tag on the page as JSON
    json_data = json.loads(
        soup.select_one("script").string
    )
    for data in json_data['itemListElement']:
        medicine_name.append(data['name'])
        f.write('\n' + data['name'])
    print("parsed --> " + str(len(medicine_name)) + " from page No. --> " + str(i) + "")
    medicine_name = []  # reset the per-page list before the next page
f.close()
This is the output I get:
PS E:\Practice\Python\1mg Scraper> & D:/Python396/python.exe "e:/Practice/Python/1mg Scraper/tool.py"
parsed --> 20 from page No. --> 1
parsed --> 20 from page No. --> 2
parsed --> 20 from page No. --> 3
parsed --> 20 from page No. --> 4
parsed --> 20 from page No. --> 5
parsed --> 20 from page No. --> 6
parsed --> 20 from page No. --> 7
parsed --> 20 from page No. --> 8
parsed --> 20 from page No. --> 9
...................................
<-----------Upto------------------>
...................................
parsed --> 20 from page No. --> 837
parsed --> 20 from page No. --> 838
parsed --> 20 from page No. --> 839
parsed --> 20 from page No. --> 840
parsed --> 20 from page No. --> 841
I was expecting output like this:
parsed --> 30 from page No. --> xxx
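One way to narrow this down is to check how many entries the embedded JSON itself contains, independent of the download loop. The sketch below is only a diagnostic under stated assumptions: the helper name count_items_per_script and the sample HTML are made up for illustration and are not the real 1mg markup. It walks every <script> tag, skips those that are not JSON, and reports how many itemListElement entries each remaining one holds.

```python
import json
from bs4 import BeautifulSoup


def count_items_per_script(html):
    """Return (script_index, item_count) for each <script> holding an item list."""
    soup = BeautifulSoup(html, "html.parser")
    counts = []
    for index, script in enumerate(soup.find_all("script")):
        text = script.string
        if not text:
            continue  # external or empty script tag
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue  # ordinary JavaScript, not JSON
        if isinstance(data, dict) and "itemListElement" in data:
            counts.append((index, len(data["itemListElement"])))
    return counts


# Hypothetical sample page, invented for illustration only.
sample_html = """
<html><head>
<script type="application/ld+json">
{"itemListElement": [{"name": "Medicine A"}, {"name": "Medicine B"}]}
</script>
<script>var x = 1;</script>
</head><body></body></html>
"""

print(count_items_per_script(sample_html))
```

Calling the same helper on r.content for a real page would show whether the JSON block already stops at 20 entries. If it does, the loop is not dropping anything, and the remaining names would have to come from some other part of the page.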
【Discussion】:
Tags: python-3.x web-scraping beautifulsoup