【问题标题】:How to segregate data into two dataframe如何将数据分成两个数据框
【发布时间】:2020-04-18 18:31:45
【问题描述】:

我有一个带有 json 数据的链接,我想从中将数据分成两个数据框。

我的代码如下:

import pandas as pd
import requests

pd.set_option('display.max_rows', 50000)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 10000)

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
data = requests.get(url, headers=headers).json()

for x in range(len(data['records']['data'])):
    print(data['records']['data'][x])

输出包含如下行,其中“CE”和“PE”数据可用:

{'strikePrice': 13900, 'expiryDate': '23-Apr-2020', 'PE': {'strikePrice': 13900, 'expiryDate': '23-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY23-04-2020PE13900.00', 'openInterest': 1597, 'changeinOpenInterest': 1589, 'pchangeinOpenInterest': 19862.5, 'totalTradedVolume': 9101, 'impliedVolatility': 110.76, 'lastPrice': 1.65, 'change': -1.5, 'pChange': -47.61904761904762, 'totalBuyQuantity': 49800, 'totalSellQuantity': 17560, 'bidQty': 40, 'bidprice': 1.75, 'askQty': 460, 'askPrice': 2.35, 'underlyingValue': 20681.45}, 'CE': {'strikePrice': 13900, 'expiryDate': '23-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY23-04-2020CE13900.00', 'openInterest': 0, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 2, 'impliedVolatility': 162.07, 'lastPrice': 6901.1, 'change': 3502.4000000000005, 'pChange': 103.05116662253218, 'totalBuyQuantity': 2620, 'totalSellQuantity': 2620, 'bidQty': 200, 'bidprice': 6629.85, 'askQty': 200, 'askPrice': 7208.75, 'underlyingValue': 20681.45}}

{'strikePrice': 13900, 'expiryDate': '30-Apr-2020', 'PE': {'strikePrice': 13900, 'expiryDate': '30-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY30-04-2020PE13900.00', 'openInterest': 989, 'changeinOpenInterest': 12, 'pchangeinOpenInterest': 1.2282497441146367, 'totalTradedVolume': 134, 'impliedVolatility': 98.26, 'lastPrice': 16.3, 'change': -4.899999999999999, 'pChange': -23.113207547169807, 'totalBuyQuantity': 32900, 'totalSellQuantity': 4100, 'bidQty': 20, 'bidprice': 16.3, 'askQty': 20, 'askPrice': 17, 'underlyingValue': 20681.45}, 'CE': {'strikePrice': 13900, 'expiryDate': '30-Apr-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY30-04-2020CE13900.00', 'openInterest': 1, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 5000, 'change': -5000, 'pChange': -100, 'totalBuyQuantity': 2640, 'totalSellQuantity': 2840, 'bidQty': 20, 'bidprice': 6242.05, 'askQty': 20, 'askPrice': 7401.65, 'underlyingValue': 20681.45}}

{'strikePrice': 13900, 'expiryDate': '14-May-2020', 'PE': {'strikePrice': 13900, 'expiryDate': '14-May-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY14-05-2020PE13900.00', 'openInterest': 0, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 0, 'change': 0, 'pChange': -100, 'totalBuyQuantity': 100, 'totalSellQuantity': 0, 'bidQty': 100, 'bidprice': 0.3, 'askQty': 0, 'askPrice': 0, 'underlyingValue': 20681.45}, 'CE': {'strikePrice': 13900, 'expiryDate': '14-May-2020', 'underlying': 'BANKNIFTY', 'identifier': 'OPTIDXBANKNIFTY14-05-2020CE13900.00', 'openInterest': 0, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 0, 'change': 0, 'pChange': -100, 'totalBuyQuantity': 2420, 'totalSellQuantity': 2420, 'bidQty': 2420, 'bidprice': 6223.45, 'askQty': 2420, 'askPrice': 7565.05, 'underlyingValue': 20681.45}}

我想将 CE 和 PE 值存储在两个单独的数据框中,列名为

['strikePrice','expiryDate', 'underlying', 'identifier', 'openInterest', 'changeinOpenInterest', 'pchangeinOpenInterest', 'totalTradedVolume','impliedVolatility', 'lastPrice', 'change', 'pChange', 'totalBuyQuantity', 'totalSellQuantity', 'bidQty', 'bidprice', 'askQty', 'askPrice', 'underlyingValue']

【问题讨论】:

    标签: python json python-3.x pandas dataframe


    【解决方案1】:

    与来自itertoolscollections 的一些朋友一起使用list comprehension 应该有助于将数据放入单独的数据帧中:

    headers = {
        'User-Agent':
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
    }
    url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
    data = requests.get(url, headers=headers).json()
    
    #list comprehension here
    #note that it is nested, since the data we are truly keen on
    #is embedded in a list
    #this will give us a tuple of the key, and the dataframe of the values
    #as the values are dictionaries
    res = [[(key, pd.DataFrame.from_dict(value, orient='index').T)
            for key, value in entry.items()
            if key in ['PE', 'CE']]
           for entry in data['records']['data']]
    
    from collections import defaultdict
    from itertools import chain
    d = defaultdict(list)
    
    #group the values into a pair
    #they will be a combined list of dataframes
    #belonging to either PE or CE    
    for k, v in chain.from_iterable(res):
        d[k].append(v)
    
    #now we can merge the values
    #and keep our result as a dictionary
    #this allows us to access PE or CE via keys   
    result = {key: pd.concat(values) for key, values in d.items()}
    
    #now, we can access either PE or CE
    #dataframe is quite long, so this is a small part of it
    result['PE'].iloc[:3,:5]
    
    
     strikePrice    expiryDate  underlying  identifier                openInterest
    0   13900      23-Apr-2020  BANKNIFTY   OPTIDXBANKNIFTY23-04-2020PE13900.00 1597
    0   13900      30-Apr-2020  BANKNIFTY   OPTIDXBANKNIFTY30-04-2020PE13900.00 989
    0   13900      14-May-2020  BANKNIFTY   OPTIDXBANKNIFTY14-05-2020PE13900.00 0
    

    【讨论】:

      【解决方案2】:

      假设json_obj 是您在上面粘贴的输出, 转换可以简单地使用 -

      PE

      pe = pd.DataFrame.from_dict(json_obj['PE'], orient='index').reset_index()
      pe.columns = ['param', 'value'] # Optional renaming of columns
      print(pe)
      
      """
                          param                                value
      0             strikePrice                                13900
      1              expiryDate                          23-Apr-2020
      2              underlying                            BANKNIFTY
      3              identifier  OPTIDXBANKNIFTY23-04-2020PE13900.00
      4            openInterest                                 1597
      5    changeinOpenInterest                                 1589
      6   pchangeinOpenInterest                              19862.5
      7       totalTradedVolume                                 9101
      8       impliedVolatility                               110.76
      9               lastPrice                                 1.65
      10                 change                                 -1.5
      11                pChange                              -47.619
      12       totalBuyQuantity                                49800
      13      totalSellQuantity                                17560
      14                 bidQty                                   40
      15               bidprice                                 1.75
      16                 askQty                                  460
      17               askPrice                                 2.35
      18        underlyingValue                              20681.5
      """
      

      同样,重复上述步骤,将CE 转换为数据框。

      【讨论】:

      • 先生,我已经编辑了我的问题以使我的问题更加清晰
      • 先生,我希望这里的所有参数都作为列名和各个列下的值
      【解决方案3】:

      我已经解决了如下问题。有没有其他有效的方法来获得预期的结果?

      import pandas as pd
      import requests
      
      pd.set_option('display.max_rows', 50000)
      pd.set_option('display.max_columns', 100)
      pd.set_option('display.width', 10000)
      
      headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
      url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
      records = requests.get(url, headers=headers).json()
      
      df_PE = pd.DataFrame()
      df_CE = pd.DataFrame()
      # print(len(records['records']['data']))
      for x in range(len(records['records']['data'])):
          if records['records']['data'][x]['expiryDate'] == "23-Apr-2020":
              try:
                  PE = pd.DataFrame([records['records']['data'][x]['PE']])
                  df_PE = df_PE.append(PE)
              except Exception as err:
                  pass
      
              try:
                  CE = pd.DataFrame([records['records']['data'][x]['CE']])
                  df_CE = df_CE.append(CE)
              except Exception as err:
                  pass
      
      print(df_PE)
      print(df_CE)
      

      【讨论】:

        【解决方案4】:

        这是一个示例,显示了完成您正在寻找的东西的一种方法:

        # initialize dataframes to hold data
        pe_df = pd.DataFrame()
        ce_df = pd.DataFrame()
        
        # loop through records
        for x in range(len(data['records']['data'])):
        
            # extract the record
            record = data['records']['data'][x]
            # print the record
            print(record)
        
            # extract the 'PE' and 'CE' data
            pe_row = record['PE']
            ce_row = record['CE']
        
            # append the row to the appropriate dataframe
            pe_df = pe_df.append(pd.Series(pe_row), ignore_index=True)
            ce_df = ce_df.append(pd.Series(ce_row), ignore_index=True)
        

        我假设记录数据采用您在问题中显示的形式。

        【讨论】:

        • 先生,我已经编辑了我的问题以使我的问题更加清晰
        猜你喜欢
        • 2020-10-24
        • 1970-01-01
        • 2019-02-09
        • 2020-11-14
        • 2021-06-03
        • 1970-01-01
        • 1970-01-01
        • 2017-10-24
        • 1970-01-01
        相关资源
        最近更新 更多