【发布时间】:2020-04-28 09:04:57
【问题描述】:
我正在尝试使用下面的脚本从webpage 获取一些 json 响应。以下是在该站点中填充结果的步骤。点击位于webpage 底部的同意 按钮,然后点击EDIT SEARCH 按钮并最后在 SHOW RESULTS 按钮上,无需更改任何内容。
我试过这样:
import requests
from bs4 import BeautifulSoup
url = 'http://finra-markets.morningstar.com/BondCenter/Results.jsp'
post_url = 'http://finra-markets.morningstar.com/bondSearch.jsp'
payload = {
'postData': {'Keywords':[]},
'ticker': '',
'startDate': '',
'endDate': '',
'showResultsAs': 'B',
'debtOrAssetClass': '1,2',
'spdsType': ''
}
payload_second = {
'count': '20',
'searchtype': 'B',
'query': {"Keywords":[{"Name":"debtOrAssetClass","Value":"3,6"},{"Name":"showResultsAs","Value":"B"}]},
'sortfield': 'issuerName',
'sorttype': '1',
'start': '0',
'curPage': '1'
}
with requests.Session() as s:
s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36'
s.headers['Referer'] = 'http://finra-markets.morningstar.com/BondCenter/UserAgreement.jsp'
r = s.post(url,json=payload)
s.headers['Access-Control-Allow-Headers'] = r.headers['Access-Control-Allow-Headers']
s.headers['cf-request-id'] = r.headers['cf-request-id']
s.headers['CF-RAY'] = r.headers['CF-RAY']
s.headers['X-Requested-With'] = 'XMLHttpRequest'
s.headers['Origin'] = 'http://finra-markets.morningstar.com'
s.headers['Referer'] = 'http://finra-markets.morningstar.com/BondCenter/Results.jsp'
r = s.post(post_url,json=payload_second)
print(r.content)
这是我运行上面的脚本时得到的结果:
b'\n\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\n\n\n{}'
如何使脚本填充该站点的预期结果?
附:我不希望使用 selenium 来完成这项工作。
【问题讨论】:
标签: python python-3.x web-scraping http-post