[Posted]: 2022-01-25 12:52:06
[Question]:
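I am scraping the web with a piece of code to get NSE corporate announcements. The problem is that the URL I use in this code returns only 20 items at a time, so most of the hundreds of announcements published each day are missed.
I would like a solution to this so that I get all of the earlier announcements as well, not only the most recent 20. Here is my code: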
import logging
from datetime import date

import pandas as pd
import requests

logger = logging.getLogger(__name__)
today = date.today()

# Browser-like request headers sent with both requests.
__request_headers = {
    'Host': 'www.nseindia.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}

try:
    nse_url = 'https://www.nseindia.com/'
    url = 'https://www.nseindia.com/api/corporate-announcements?index=equities'
    # Load the home page first to obtain the cookies the API call needs.
    resp = requests.get(url=nse_url, headers=__request_headers)
    if resp.ok:
        req_cookies = dict(nsit=resp.cookies['nsit'], nseappid=resp.cookies['nseappid'], ak_bmsc=resp.cookies['ak_bmsc'])
        tresp = requests.get(url=url, headers=__request_headers, cookies=req_cookies)
        result = tresp.json()
        result = pd.DataFrame(result)
        # Drop the columns that are not needed and give the rest readable names.
        result.drop(['difference', 'dt', 'exchdisstime', 'csvName', 'old_new', 'orgid', 'seq_id', 'sm_isin', 'bflag', 'symbol', 'sort_date'], axis=1, inplace=True)
        result.rename(columns={'an_dt': 'DateandTime', 'attchmntFile': 'Source', 'attchmntText': 'Topic', 'desc': 'Type', 'smIndustry': 'Sector', 'sm_name': 'Company Name'}, inplace=True)
        # Split the combined date/time string on whitespace into two columns.
        result[['Date', 'Time']] = result.DateandTime.str.split(expand=True)
        result.to_csv(f'{today.day}-{today.month}-CA.csv', index=True)
        print(result)
        # Has no effect unless the data contains a 'NIFTY' column.
        res_data = result["NIFTY"]["data"] if "NIFTY" in result and "data" in result["NIFTY"] else []
        if res_data is not None and len(res_data) > 0:
            __top_list = res_data
            print(__top_list)
except OSError as err:
    logger.error('Unable to fetch data: %s', err)
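One possible workaround, if the endpoint really does return only the latest 20 items, is to poll it repeatedly and merge each batch into a local store, skipping anything already saved. This is a minimal sketch of that idea, not a feature of the NSE API. It assumes the JSON response is a list of records and that `seq_id` (one of the columns dropped in the code above) uniquely identifies an announcement. The helper names `merge_announcements` and `update_store` and the JSON store file are hypothetical.

```python
import json
from pathlib import Path


def merge_announcements(stored, batch, key="seq_id"):
    """Return the stored records plus any record in batch whose key is new.

    Assumption: `key` uniquely identifies an announcement.
    """
    seen = {rec[key] for rec in stored}
    merged = list(stored)
    for rec in batch:
        if rec[key] not in seen:
            seen.add(rec[key])
            merged.append(rec)
    return merged


def update_store(path, batch, key="seq_id"):
    """Load the JSON store at path (if it exists), merge in batch, save it back."""
    path = Path(path)
    stored = json.loads(path.read_text()) if path.exists() else []
    merged = merge_announcements(stored, batch, key)
    path.write_text(json.dumps(merged))
    return merged
```

In the script above, this would mean calling `update_store('announcements.json', tresp.json())` on each run, before `seq_id` is dropped, and building the DataFrame from the returned list. It only catches everything if the script runs often enough that fewer than 20 new announcements appear between runs.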
[Discussion]:
Tags: python pandas web-scraping