【Question title】:Convert list of strings to Pandas DataFrame
【Posted】:2020-11-20 13:44:28
【Question description】:

I have a list that looks like this:

Sum = ['* Report_type         Leach\n',
       '* Result_text         Concentration \n',
       '* Run_Id              179\n',
       '* Location            MUENSTER\n',
       '* Meteo_station       KREM-M\n',
       '* Soil_type           KREM\n',
       '* Crop_calendar       SUGARBEET\n',
       '* Substance           ABC\n',
       '* Application_scheme  DRY\n',
       '* Deposition_scheme   No\n',
       '* Results             0.0001\n'
       ]

I want to convert it into a pandas DataFrame like this:

df =    
        col1                col2
0       Report_type         Leach          
1       Result_text         Concentration
2       Run_Id              179                
3       Location            MUENSTER      
4       Meteo_station       KREM-M             
5       Soil_type           KREM       
6       Crop_calendar       SUGARBEET     
7       Substance           ABC                
8       Application_scheme  DRY                
9       Deposition_scheme   No                 
10      Results             0.0001

The first column in each list entry has a fixed character width.

【Question comments】:

Tags: python pandas string list dataframe


【Solution 1】:

If I understand correctly:

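import pandas as pd  # strip() below drops the trailing "\n", which maxsplit would otherwise leave in col2
df = pd.DataFrame([i.strip().split(maxsplit=2)[1:] for i in Sum], columns=['col1', 'col2'])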

Output:

                  col1           col2
0          Report_type          Leach
1          Result_text  Concentration
2               Run_Id            179
3             Location       MUENSTER
4        Meteo_station         KREM-M
5            Soil_type           KREM
6        Crop_calendar      SUGARBEET
7            Substance            ABC
8   Application_scheme            DRY
9    Deposition_scheme             No
10             Results         0.0001

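【Comments】:

  • Nice solution, but what if the second column contains multiple words? For example, "Concentration of water" instead of "Concentration". It would then split at every whitespace and create two more columns.
  • Hey @Bob, I edited my answer so that it splits at most twice. If the value has multiple words, the whole phrase stays in the second column.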
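The multi-word case raised in the comments can be checked with a small sketch. The sample lines below are hypothetical (the "Concentration of water" value comes from the comment, not from the original data), and the variable names `lines` and `rows` are illustrative only. It assumes every line starts with the `*` marker followed by a single-word key.

```python
import pandas as pd

# Hypothetical sample: the second entry has a multi-word value
lines = [
    '* Report_type         Leach\n',
    '* Result_text         Concentration of water \n',
    '* Run_Id              179\n',
]

# strip() removes the trailing newline and spaces; maxsplit=2 yields at most
# three pieces: the '*' marker, the key, and the rest of the line
rows = [line.strip().split(maxsplit=2)[1:] for line in lines]
df = pd.DataFrame(rows, columns=['col1', 'col2'])
print(df)
```

Calling `strip()` first matters because, once `maxsplit` is reached, `str.split` leaves any trailing whitespace attached to the last piece.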
【Solution 2】:

Use the pandas str methods

For example:

data = ['* Report_type         Leach\n',
       '* Result_text         Concentration \n',
       '* Run_Id              179\n',
       '* Location            MUENSTER\n',
       '* Meteo_station       KREM-M\n',
       '* Soil_type           KREM\n',
       '* Crop_calendar       SUGARBEET\n',
       '* Substance           ABC\n',
       '* Application_scheme  DRY\n',
       '* Deposition_scheme   No\n',
       '* Results             0.0001\n'
       ]

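import pandas as pd
# Start with a single column that holds the raw lines
df = pd.DataFrame({"Col": data})
# Strip the leading "*" marker and spaces, then split on whitespace into two columns
df[['col1', 'col2']] = df.pop('Col').str.strip(" * ").str.split(expand=True)
print(df)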

Output:

                  col1           col2
0          Report_type          Leach
1          Result_text  Concentration
2               Run_Id            179
3             Location       MUENSTER
4        Meteo_station         KREM-M
5            Soil_type           KREM
6        Crop_calendar      SUGARBEET
7            Substance            ABC
8   Application_scheme            DRY
9    Deposition_scheme             No
10             Results         0.0001
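If a value can contain several words, splitting on every whitespace produces more than two columns and the two-column assignment fails. One possible variant, sketched here under the assumption that the key never contains spaces, limits the split to a single cut with `n=1`. The sample data and the multi-word value are hypothetical.

```python
import pandas as pd

# Hypothetical sample with a multi-word value in the second entry
data = [
    '* Report_type         Leach\n',
    '* Result_text         Concentration of water \n',
    '* Results             0.0001\n',
]

s = pd.Series(data)
# Drop the leading '*' marker, surrounding spaces, and the trailing newline
s = s.str.strip('* \n')
# n=1 limits the split to one cut, so everything after the key stays together
df = s.str.split(n=1, expand=True)
df.columns = ['col1', 'col2']
print(df)
```

Both columns hold strings, so a numeric value such as 0.0001 would still need an explicit conversion if numbers are required.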

【Comments】:
