【Question title】: Using the zip function returns missing results
【Posted】: 2020-11-05 16:21:34
【Question】:

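I want to scrape some data from a website, so I wrote code that builds a list containing all the records. I then want to extract a few elements from those records to create a DataFrame.

However, some information is missing from the DataFrame. The list of all records contains data from 2012 to 2019, but the DataFrame only contains 2018 and 2019. I tried different ways to fix this and eventually found that the problem does not occur if I do not use the zip function. Why does this happen, and is there a solution that does not use zip?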

import requests
import pandas as pd


records = []
tickers = ['AAL']

url_metrics = 'https://stockrow.com/api/companies/{}/financials.json?ticker={}&dimension=A&section=Growth'
indicators_url = 'https://stockrow.com/api/indicators.json'

# scrape all data and append to a list - all_records

for s in tickers:
    
    indicators = {i['id']: i for i in requests.get(indicators_url).json()}
    all_records = []
    
    for d in requests.get(url_metrics.format(s,s)).json():
        d['id'] = indicators[d['id']]['name']
        all_records.append(d)
    
    gross_profit_growth = next(d for d in all_records if 'Gross Profit Growth' in d['id'])
    operating_income_growth = next(d for d in all_records if 'Operating Income Growth' in d['id'])
    net_income_growth = next(d for d in all_records if 'Net Income Growth' in d['id'])
    diluted_eps_growth = next(d for d in all_records if 'EPS Growth (diluted)' in d['id'])
    operating_cash_flow_growth = next(d for d in all_records if 'Operating Cash Flow Growth' in d['id']) 
        
# extract values from all_records and create the dataframe

    for (k1, v1), (_, v2), (_, v3), (_, v4), (_, v5) in zip(gross_profit_growth.items(), operating_income_growth.items(), net_income_growth.items(), diluted_eps_growth.items(), operating_cash_flow_growth.items()):
        if k1 in ('id'):
            continue

        records.append({
            'symbol' : s,
            'date' : k1,
            'gross_profit_growth%': v1,
            'operating_income_growth%': v2,
            'net_income_growth%': v3,
            'diluted_eps_growth%' : v4,
            'operating_cash_flow_growth%' : v5
        })

    
df = pd.DataFrame(records)
df.head(50)

The result is incorrect: it only contains data for 2018 and 2019, but it should contain data from 2012 to 2019.

   symbol  date        gross_profit_growth%  operating_income_growth%  net_income_growth%  diluted_eps_growth%  operating_cash_flow_growth%
0  AAL     2019-12-31  0.0405                -0.1539                   -0.0112             0.2508               0.0798
1  AAL     2018-12-31  -0.0876               -0.2463                   0.0                 -0.2231              -0.2553
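
A likely cause can be sketched with made-up data. `zip` stops as soon as its shortest input is exhausted, and `'...' in d['id']` is a substring test, so a lookup can return a different indicator that has fewer dates. The indicator name `Non-Operating Income Growth` and all values below are hypothetical, chosen only to reproduce the effect; the real API data may differ.

```python
# Hypothetical records; the real API response will differ.
all_records = [
    {'id': 'Gross Profit Growth', '2019-12-31': 0.04, '2018-12-31': -0.09, '2017-12-31': -0.02},
    # Hypothetical indicator whose name merely contains the search text
    {'id': 'Non-Operating Income Growth', '2019-12-31': -0.15, '2018-12-31': -0.25},
    {'id': 'Operating Income Growth', '2019-12-31': 0.15, '2018-12-31': -0.37, '2017-12-31': -0.16},
]

gross = next(d for d in all_records if 'Gross Profit Growth' in d['id'])
# Substring test: returns the first record whose name contains the text
loose = next(d for d in all_records if 'Operating Income Growth' in d['id'])
# Equality test: returns only the record with exactly this name
exact = next(d for d in all_records if 'Operating Income Growth' == d['id'])

# zip yields pairs only until the shortest input runs out
rows_loose = [k for (k, _), _ in zip(gross.items(), loose.items()) if k != 'id']
rows_exact = [k for (k, _), _ in zip(gross.items(), exact.items()) if k != 'id']

print(loose['id'])   # the substring test picked the shorter, unintended record
print(rows_loose)    # dates that survive when zipping with the shorter record
print(rows_exact)    # dates that survive when both records have the same length
```

In this sketch the shorter record silently drops the 2017 row for every column, which mirrors the truncated DataFrame shown above.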

My expected result:

   symbol  date        gross_profit_growth%  operating_income_growth%  net_income_growth%  diluted_eps_growth%  operating_cash_flow_growth%
0  AAL     31/12/2019  0.0405                0.154                     0.1941              0.2508               0.0798
1  AAL     31/12/2018  -0.0876               -0.3723                   0.1014              -0.2231              -0.2553
2  AAL     31/12/2017  -0.0165               -0.1638                   -0.5039             -0.1892              -0.2728
3  AAL     31/12/2016  -0.079                -0.1844                   -0.6604             -0.5655              0.044
4  AAL     31/12/2015  0.1983                0.4601                    1.6405              1.8168               1.0289
5  AAL     31/12/2014  0.7305                2.0372                    2.5714              1.2308               3.563
6  AAL     31/12/2013  0.3575                8.4527                    0.0224              nan                  -0.4747
7  AAL     31/12/2012  0.1688                1.1427                    0.052               nan                  0.7295
8  AAL     31/12/2011  0.0588                -4.3669                   -3.2017             nan                  -0.4013
9  AAL     31/12/2010  0.3413                1.3068                    0.6792              nan                  0.3344

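【Comments】:

  • Try using zip_longest from itertools.
  • Hi. I tried using zip_longest instead of zip, and it says name 'zip_longest' is not defined.
  • I'm new to Python... I tried importing itertools and using itertools.zip_longest, but it returns TypeError: cannot unpack non-iterable NoneType object. May I know why? Thank you in advance.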
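
Both errors in the comments above can be reproduced in a small sketch. `zip_longest` has to be imported from `itertools` before use, and by default it pads exhausted inputs with `None`, which cannot be unpacked into a `(key, value)` pair. Passing a two-element tuple as `fillvalue` keeps the unpacking pattern valid. The dictionaries `a` and `b` are hypothetical stand-ins for two metric records.

```python
from itertools import zip_longest

# Hypothetical date -> value mappings of different lengths
a = {'2019-12-31': 0.04, '2018-12-31': -0.09, '2017-12-31': -0.02}
b = {'2019-12-31': 0.15, '2018-12-31': -0.37}

# With the default fill value (None), unpacking the padded item fails
try:
    for (k1, v1), (_, v2) in zip_longest(a.items(), b.items()):
        pass
    error = None
except TypeError as exc:
    error = str(exc)

# A two-element fill value can be unpacked like any other (key, value) pair
rows = [
    (k1, v1, v2)
    for (k1, v1), (_, v2) in zip_longest(a.items(), b.items(), fillvalue=(None, None))
]

print(error)
print(rows)
```

Note that padding only avoids the exception: values are still paired by position rather than by date, so a record with missing or differently ordered dates would still end up in the wrong rows.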

Tags: python pandas dataframe web-scraping


【Solution 1】:
import requests
import pandas as pd

records = []
tickers = ['A', 'AAL', 'AAPL']

url_metrics = 'https://stockrow.com/api/companies/{}/financials.json?ticker={}&dimension=A&section=Growth'
indicators_url = 'https://stockrow.com/api/indicators.json'

for s in tickers:
    print('Getting data for ticker: {}'.format(s))
    indicators = {i['id']: i for i in requests.get(indicators_url).json()}
    all_records = []
    
    for d in requests.get(url_metrics.format(s,s)).json():
        d['id'] = indicators[d['id']]['name']
        all_records.append(d)
    
    # Match indicator names exactly (==) rather than by substring (in)
    gross_profit_growth = next(d for d in all_records if 'Gross Profit Growth' == d['id'])
    operating_income_growth = next(d for d in all_records if 'Operating Income Growth' == d['id'])
    net_income_growth = next(d for d in all_records if 'Net Income Growth' == d['id'])
    eps_growth_diluted = next(d for d in all_records if 'EPS Growth (diluted)' == d['id'])
    operating_cash_flow_growth = next(d for d in all_records if 'Operating Cash Flow Growth' == d['id'])

    # Drop the 'id' entry so that only date -> value pairs remain
    del gross_profit_growth['id']
    del operating_income_growth['id']
    del net_income_growth['id']
    del eps_growth_diluted['id']
    del operating_cash_flow_growth['id']

    # Build one single-column DataFrame per metric, indexed by date
    d1 = pd.DataFrame({'date': gross_profit_growth.keys(), 'gross_profit_growth%': gross_profit_growth.values()}).set_index('date')
    d2 = pd.DataFrame({'date': operating_income_growth.keys(), 'operating_income_growth%': operating_income_growth.values()}).set_index('date')
    d3 = pd.DataFrame({'date': net_income_growth.keys(), 'net_income_growth%': net_income_growth.values()}).set_index('date')
    d4 = pd.DataFrame({'date': eps_growth_diluted.keys(), 'diluted_eps_growth%': eps_growth_diluted.values()}).set_index('date')
    d5 = pd.DataFrame({'date': operating_cash_flow_growth.keys(), 'operating_cash_flow_growth%': operating_cash_flow_growth.values()}).set_index('date')

    # Join the metrics column-wise; rows are aligned on the date index
    d = pd.concat([d1, d2, d3, d4, d5], axis=1)
    d['symbol'] = s
    records.append(d)

df = pd.concat(records)
print(df)

Prints:

           gross_profit_growth% operating_income_growth% net_income_growth% diluted_eps_growth% operating_cash_flow_growth% symbol
2019-10-31  0.0466               0.0409                   2.3892             2.4742              -0.0607                     A    
2018-10-31  0.1171               0.1202                   -0.538             -0.5381             0.2227                      A    
2017-10-31  0.0919               0.3122                   0.4805             0.5                 0.1211                      A    
2016-10-31  0.0764               0.1782                   0.1521             0.1765              0.5488                      A    
2015-10-31  0.0329               0.2458                   -0.2696            -0.1905             -0.2996                     A    
2014-10-31  0.0362               0.0855                   -0.252             -0.3                -0.3655                     A    
2013-10-31  -0.4709              -0.655                   -0.3634            -0.3578             -0.0619                     A    
2012-10-31  0.0213               0.0448                   0.1393             0.1474              -0.0254                     A    
2011-10-31  0.2044               0.8922                   0.4795             0.6102              0.7549                      A    
2019-12-31  0.0405               0.154                    0.1941             0.2508              0.0798                      AAL  
2018-12-31  -0.0876              -0.3723                  0.1014             -0.2231             -0.2553                     AAL  
2017-12-31  -0.0165              -0.1638                  -0.5039            -0.1892             -0.2728                     AAL  
2016-12-31  -0.079               -0.1844                  -0.6604            -0.5655             0.044                       AAL  
2015-12-31  0.1983               0.4601                   1.6405             1.8168              1.0289                      AAL  
2014-12-31  0.7305               2.0372                   2.5714             1.2308              3.563                       AAL  
2013-12-31  0.3575               8.4527                   0.0224             NaN                 -0.4747                     AAL  
2012-12-31  0.1688               1.1427                   0.052              NaN                 0.7295                      AAL  
2011-12-31  0.0588               -4.3669                  -3.2017            NaN                 -0.4013                     AAL  
2010-12-31  0.3413               1.3068                   0.6792             NaN                 0.3344                      AAL  
2020-09-30  0.0667               0.0369                   0.039              NaN                 0.1626                      AAPL 
2019-09-30  -0.0338              -0.0983                  -0.0718            -0.0017             -0.1039                     AAPL 
2018-09-30  0.1548               0.1557                   0.2312             0.2932              0.2057                      AAPL 
2017-09-30  0.0466               0.022                    0.0583             0.1083              -0.0303                     AAPL 
2016-09-30  -0.1                 -0.1573                  -0.1443            -0.0987             -0.185                      AAPL 
2015-09-30  0.3273               0.3567                   0.3514             0.4295              0.3609                      AAPL 
2014-09-30  0.0969               0.0715                   0.0668             0.1358              0.1127                      AAPL 
2013-09-30  -0.0635              -0.113                   -0.1125            -0.0996             0.0553                      AAPL 
2012-09-30  0.567                0.6348                   0.6099             0.595               0.3551                      AAPL 
2011-09-30  0.706                0.8379                   0.8499             0.827               1.0182                      AAPL 
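
The same date-based alignment can also be sketched in plain Python, without `zip` or `pd.concat`, by looking each value up by its date key. This is only an illustration under the assumption that every metric is a plain date -> value dictionary with the `'id'` entry already removed; the `metrics` dictionary and its values are hypothetical.

```python
# Hypothetical date -> value mappings; the 'id' entry has already been removed
metrics = {
    'gross_profit_growth%': {'2019-12-31': 0.04, '2018-12-31': -0.09, '2017-12-31': -0.02},
    'diluted_eps_growth%': {'2019-12-31': 0.25, '2018-12-31': -0.22},
}

# Collect every date that appears in any metric, newest first
# (ISO-formatted date strings sort chronologically as plain strings)
dates = sorted({date for values in metrics.values() for date in values}, reverse=True)

records = []
for date in dates:
    row = {'symbol': 'AAL', 'date': date}
    for name, values in metrics.items():
        # Look the value up by date; a missing date yields None instead of shifting rows
        row[name] = values.get(date)
    records.append(row)

for row in records:
    print(row)
```

The resulting list of dictionaries has the same shape as `records` in the question, so it could be passed to `pd.DataFrame(records)` in the same way.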

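【Comments】:

  • Hello, Anjrej. Thank you for helping me again. However, what if I don't want to change the search elements? Could you give me some hints on how to solve the problem?
  • @janicewww Can you edit your question and put the expected output there?
  • Thanks for your help. However, it returns a TypeError. I am studying your code and trying to resolve it. Thanks again.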