【Question title】: How to create a list of lists from a DataFrame
【Posted】: 2021-04-14 10:11:27
【Question description】:

I have a DataFrame df, and I want to convert it into a list of lists.

    left_side                                  right_side                             similarity
0114600043776001 loan payment receipt         0421209017073500 loan payment receipt     0.689008
0114600043776001 loan payment receipt         0421209017073500 loan payment receipt     0.689008
vat onverve*issuance fee*506108               vat onverve*issuance fee*5061087       0.743522
vat onverve*issuance fee*506108               verve*issuance fee*506108*********1112    0.684342
verve*issuance fee*506108                     verve*issuance fee*506108*********8296    0.717817
verve*issuance fee*506108                     vat onverve*issuance fee*506108**         0.684342

maint fee recovery jun 2018                   vat maint fee recovery jun 2018          0.896607
maint fee recovery jun 2018                  vat maint fee recovery jun 2018         0.896607
maint fee recovery jun 2018                  vat maint fee recovery jun 2018         0.896607

The expected output should look like this:

[[0114600043776001 loan payment receipt, 0421209017073500 loan payment receipt,
  0421209017073500 loan payment receipt],
[vat onverve*issuance fee*506108, vat onverve*issuance fee*5061087, 
  verve*issuance fee*506108*********1112], 
[verve*issuance fee*506108*********8296, verve*issuance fee*506108,
  vat onverve*issuance fee*506108**], ...]

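I tried grouping the df above by the left_side column and converting the resulting df to a list, but the output is not what I expected. I would appreciate your help with this.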

grouup_df = df.groupby(['left_side']).right_side.sum().to_frame()

grouup_df.values.tolist()

The output looks like this:

['0421209017073500 loan payment receipt0421209017073500 loan payment receipt0421209017073500 loan payment receipt0421209017073500 loan payment receipt0421209017073500 loan payment receipt0421209017073500 loan payment receipt']
['vat maint fee recovery jun 2018vat maint fee recovery jun 2018vat maint fee recovery jun 2018maint fee recovery jul 2018maint fee recovery oct 2018maint fee recovery jul 2018maint fee recovery jul 2018']

【Question comments】:

Tags: python pandas


【Solution 1】:

You can use df.groupby:

>>> [[k, *g] for k, g in df.groupby('left_side', sort=False)['right_side']]

[['0114600043776001 loan payment receipt',
  '0421209017073500 loan payment receipt',
  '0421209017073500 loan payment receipt'],
 ['vat onverve*issuance fee*506108',
  'vat onverve*issuance fee*5061087',
  'verve*issuance fee*506108*********1112'],
 ['verve*issuance fee*506108',
  'verve*issuance fee*506108*********8296',
  'vat onverve*issuance fee*506108**'],
 ['maint fee recovery jun 2018',
  'vat maint fee recovery jun 2018',
  'vat maint fee recovery jun 2018',
  'vat maint fee recovery jun 2018']]
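
As a self-contained sketch of the approach above: the snippet below rebuilds a small df from a few rows of the question (the similarity column is left out, which is an assumption made only for brevity). It also shows a variant using agg(list), which is not part of the original answer. The attempt in the question used .sum(), which on string values concatenates them into one long string; that is why the groups came out glued together.

```python
import pandas as pd

# A few rows copied from the question; similarity is omitted for brevity.
df = pd.DataFrame({
    "left_side": [
        "0114600043776001 loan payment receipt",
        "0114600043776001 loan payment receipt",
        "vat onverve*issuance fee*506108",
        "vat onverve*issuance fee*506108",
    ],
    "right_side": [
        "0421209017073500 loan payment receipt",
        "0421209017073500 loan payment receipt",
        "vat onverve*issuance fee*5061087",
        "verve*issuance fee*506108*********1112",
    ],
})

# Iterating the grouped column yields (key, Series) pairs;
# *g unpacks the Series values after the key.
grouped = [[k, *g] for k, g in df.groupby("left_side", sort=False)["right_side"]]

# Variant (not from the answer): collect each group's values with agg(list),
# then prepend the group key.
as_lists = df.groupby("left_side", sort=False)["right_side"].agg(list)
alternative = [[k, *v] for k, v in as_lists.items()]

print(grouped)
```

sort=False keeps the groups in order of first appearance rather than sorting them by key.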

【Comments】:

    【Solution 2】:
    import pandas as pd
    
    # Placeholder data standing in for the real columns
    dfold = {'left_side': ['string','string','string','string'],
                'right_side': ['string','string','string','string']
                }
    
    df = pd.DataFrame(dfold, columns= ['left_side', 'right_side'])
    print(df)
    # Convert every row of the DataFrame into its own inner list
    df_list = df.values.tolist()
    print(df_list)
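
To make the behaviour of df.values.tolist() concrete, here is a minimal sketch with hypothetical placeholder values ("a", "b", "x", "y", "z" are not from the question). It produces one inner list per row, so rows sharing the same left_side are not merged; the grouped comprehension is shown next to it for comparison.

```python
import pandas as pd

# Hypothetical toy data, chosen so the two results are easy to compare.
df = pd.DataFrame({
    "left_side": ["a", "a", "b"],
    "right_side": ["x", "y", "z"],
})

# One inner list per row: [['a', 'x'], ['a', 'y'], ['b', 'z']]
rows = df.values.tolist()

# One inner list per left_side value: [['a', 'x', 'y'], ['b', 'z']]
grouped = [[k, *g] for k, g in df.groupby("left_side", sort=False)["right_side"]]

print(rows)
print(grouped)
```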
    

    【Comments】:

      【Solution 3】:

      I believe you are looking for the to_records() method on the DataFrame. Try df.to_records(); you can find its documentation here
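
A minimal sketch of what to_records() returns, using hypothetical placeholder values. It yields a NumPy record array with one record per row; calling tolist() on it gives tuples rather than lists, and nothing is grouped by left_side.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
df = pd.DataFrame({"left_side": ["a", "b"], "right_side": ["x", "y"]})

# index=False leaves the row index out of each record.
records = df.to_records(index=False)

# Each row becomes a tuple: [('a', 'x'), ('b', 'y')]
as_tuples = records.tolist()
print(as_tuples)
```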

      【Comments】:

      • I think the to_records() method creates a list of tuples from the DataFrame, which differs from the expected output.