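【Posted】: 2020-11-05 16:21:34
【Problem description】:
I want to scrape some data from a website, so I wrote code that builds a list containing all the records. I then want to extract some elements from those records to build a DataFrame.
However, some information is missing from the DataFrame. The list of all records has data for 2012 through 2019, but the DataFrame only has 2018 and 2019. I tried different approaches to fix this. In the end I found that the problem does not occur if I don't use zip. Why is that, and is there a solution that doesn't use zip?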
import requests
import pandas as pd
records = []
tickers = ['AAL']
url_metrics = 'https://stockrow.com/api/companies/{}/financials.json?ticker={}&dimension=A&section=Growth'
indicators_url = 'https://stockrow.com/api/indicators.json'
# scrape all data and append to a list - all_records
for s in tickers:
    # map indicator id -> indicator metadata so ids can be replaced by names
    indicators = {i['id']: i for i in requests.get(indicators_url).json()}
    all_records = []
    for d in requests.get(url_metrics.format(s, s)).json():
        d['id'] = indicators[d['id']]['name']
        all_records.append(d)
    gross_profit_growth = next(d for d in all_records if 'Gross Profit Growth' in d['id'])
    operating_income_growth = next(d for d in all_records if 'Operating Income Growth' in d['id'])
    net_income_growth = next(d for d in all_records if 'Net Income Growth' in d['id'])
    diluted_eps_growth = next(d for d in all_records if 'EPS Growth (diluted)' in d['id'])
    operating_cash_flow_growth = next(d for d in all_records if 'Operating Cash Flow Growth' in d['id'])
    # extract values from all_records and create the dataframe
    for (k1, v1), (_, v2), (_, v3), (_, v4), (_, v5) in zip(gross_profit_growth.items(), operating_income_growth.items(), net_income_growth.items(), diluted_eps_growth.items(), operating_cash_flow_growth.items()):
        if k1 == 'id':  # skip the name entry; the remaining keys are dates
            continue
        records.append({
            'symbol': s,
            'date': k1,
            'gross_profit_growth%': v1,
            'operating_income_growth%': v2,
            'net_income_growth%': v3,
            'diluted_eps_growth%': v4,
            'operating_cash_flow_growth%': v5
        })
df = pd.DataFrame(records)
df.head(50)
The result is incorrect. It only has data for 2018 and 2019, but it should have data for 2012 through 2019.
symbol date gross_profit_growth% operating_income_growth% net_income_growth% diluted_eps_growth% operating_cash_flow_growth%
0 AAL 2019-12-31 0.0405 -0.1539 -0.0112 0.2508 0.0798
1 AAL 2018-12-31 -0.0876 -0.2463 0.0 -0.2231 -0.2553
My expected result:
symbol date gross_profit_growth% operating_income_growth% net_income_growth% diluted_eps_growth% operating_cash_flow_growth%
0 AAL 31/12/2019 0.0405 0.154 0.1941 0.2508 0.0798
1 AAL 31/12/2018 -0.0876 -0.3723 0.1014 -0.2231 -0.2553
2 AAL 31/12/2017 -0.0165 -0.1638 -0.5039 -0.1892 -0.2728
3 AAL 31/12/2016 -0.079 -0.1844 -0.6604 -0.5655 0.044
4 AAL 31/12/2015 0.1983 0.4601 1.6405 1.8168 1.0289
5 AAL 31/12/2014 0.7305 2.0372 2.5714 1.2308 3.563
6 AAL 31/12/2013 0.3575 8.4527 0.0224 nan -0.4747
7 AAL 31/12/2012 0.1688 1.1427 0.052 nan 0.7295
8 AAL 31/12/2011 0.0588 -4.3669 -3.2017 nan -0.4013
9 AAL 31/12/2010 0.3413 1.3068 0.6792 nan 0.3344
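The missing rows are consistent with how the built-in zip behaves: it pairs items by position and stops as soon as the shortest input runs out. The sketch below assumes that at least one of the five dicts has fewer date entries than the others, which has not been confirmed against the live API. The two dicts and their values are hypothetical stand-ins with the same shape as the scraped records. The second half shows one possible way to avoid zip by looking each date up by key.

```python
# Hypothetical sample data: same shape as the scraped records, made-up values.
gross = {'id': 'Gross Profit Growth',
         '2019-12-31': 0.04, '2018-12-31': -0.09, '2017-12-31': -0.02}
operating = {'id': 'Operating Income Growth',
             '2019-12-31': 0.15, '2018-12-31': -0.37}

# zip pairs items by position and stops at the shortest input,
# so the 2017 entry of `gross` is silently dropped.
zipped = [k for (k, _), _ in zip(gross.items(), operating.items()) if k != 'id']

# Aligning by key instead: collect every date that appears in any dict,
# then look each date up with dict.get, which returns None when it is absent.
series = {'gross_profit_growth%': gross, 'operating_income_growth%': operating}
dates = sorted({k for d in series.values() for k in d if k != 'id'}, reverse=True)
records = [
    {'symbol': 'AAL', 'date': date,
     **{col: d.get(date) for col, d in series.items()}}
    for date in dates
]
```

Passing `records` to `pd.DataFrame(records)`, as the original code does, would then give one row per date, with missing values where an indicator has no entry for that date.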
【Discussion】:
- Try using zip_longest from itertools?
- Hi. I tried using zip_longest instead of zip, and it says name 'zip_longest' is not defined.
- I'm new to Python... I tried importing itertools and using itertools.zip_longest, but it raises TypeError: cannot unpack non-iterable NoneType object. May I know why? Thank you in advance.
Tags: python pandas dataframe web-scraping
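The two errors in the comments above can be reproduced and avoided on small inputs. The NameError goes away once zip_longest is imported from itertools. The TypeError is what one would expect when the inputs have different lengths: zip_longest pads the shorter ones with None by default, and None cannot be unpacked into `(_, v2)`. A minimal sketch, using hypothetical dicts with made-up values, passes a tuple as `fillvalue` so every position stays unpackable:

```python
from itertools import zip_longest

# Hypothetical sample data: same shape as the scraped records, made-up values.
gross = {'id': 'Gross Profit Growth',
         '2019-12-31': 0.04, '2018-12-31': -0.09, '2017-12-31': -0.02}
operating = {'id': 'Operating Income Growth',
             '2019-12-31': 0.15, '2018-12-31': -0.37}

rows = []
# fillvalue=(None, None) replaces the default None padding, so the
# `(_, v2)` pattern can still be unpacked once `operating` is exhausted.
for (k1, v1), (_, v2) in zip_longest(gross.items(), operating.items(),
                                     fillvalue=(None, None)):
    if k1 == 'id':
        continue
    rows.append({'date': k1, 'gross': v1, 'operating': v2})
```

Pairing is still positional here. If the first dict were the shorter one, or if the dicts listed their dates in different orders, the rows would carry a None date or mix values from different years, so looking values up by date key is the safer route.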