【Question title】: combine data in pandas
【Posted】: 2015-10-31 08:20:06
【Question】:

I have a pandas DataFrame like this:

index   integer_2_x  integer_2_y
0        49348          NaN
1        26005          NaN
2            5          NaN
3          NaN           26
4        26129          NaN
5          129          NaN
6          NaN           26
7          NaN           17
8        60657          NaN
9        17031          NaN

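I want to create a third column that takes whichever value is present in the first or second column and drops the NaNs, so that it looks like this. How can I do that?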

index   integer_2_z
0        49348
1        26005
2            5
3           26
4        26129
5          129
6           26
7           17
8        60657
9        17031

【Question comments】:

    Tags: python pandas


    【Solution 1】:

    One way is to use the update function.

    import pandas as pd
    import numpy as np
    
    # some artificial data
    # ========================
    df = pd.DataFrame({'X':[10,20,np.nan,40,np.nan], 'Y':[np.nan,np.nan,30,np.nan,50]})
    print(df)
    
    
        X   Y
    0  10 NaN
    1  20 NaN
    2 NaN  30
    3  40 NaN
    4 NaN  50    
    
    # processing
    # =======================
    # start from a copy of column X
    z = df['X'].copy()
    # overwrite z with every non-NaN value in column Y (here that fills exactly the gaps in X)
    z.update(df['Y'])
    df['Z'] = z
    print(df)
    
        X   Y   Z
    0  10 NaN  10
    1  20 NaN  20
    2 NaN  30  30
    3  40 NaN  40
    4 NaN  50  50    
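
One detail worth checking before relying on update: it overwrites with every non-NaN value from the other Series, not only where the target is missing. The sketch below uses made-up data (not the asker's) in which one row has a value in both columns, and compares the result with fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has a value in BOTH columns.
df = pd.DataFrame({"X": [10.0, 20.0, np.nan], "Y": [np.nan, 99.0, 30.0]})

# update: start from X, then overwrite with every non-NaN value of Y.
via_update = df["X"].copy()
via_update.update(df["Y"])

# fillna: keep X wherever it is present, use Y only to fill the gaps.
via_fillna = df["X"].fillna(df["Y"])

print(via_update.tolist())  # [10.0, 99.0, 30.0]
print(via_fillna.tolist())  # [10.0, 20.0, 30.0]
```

For the data in the question the two agree, because no row has a value in both columns.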
    

    【Comments】:

      【Solution 2】:

      I used Series.combine, described at http://pandas.pydata.org/pandas-docs/stable/basics.html#general-dataframe-combine

      import pandas as pd
      import numpy as np
      df = pd.read_csv("data", sep=r"\s+")  # the question's data, cut and pasted into a file named 'data'
      # element by element: keep x unless it is NaN, in which case take y
      df["integer_2_z"] = df["integer_2_x"].combine(df["integer_2_y"], lambda x, y: y if pd.isnull(x) else x)
      

      Output

             index  integer_2_x  integer_2_y  integer_2_z
      0      0        49348          NaN        49348
      1      1        26005          NaN        26005
      2      2            5          NaN            5
      3      3          NaN           26           26
      4      4        26129          NaN        26129
      5      5          129          NaN          129
      6      6          NaN           26           26
      7      7          NaN           17           17
      8      8        60657          NaN        60657
      9      9        17031          NaN        17031
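
If the lambda feels heavy, pandas also offers Series.combine_first, which should do the same thing for this case: it keeps the caller's values and takes the other Series' values only where the caller is NaN. A minimal sketch, using the first five rows of the question's data typed in by hand rather than read from a file:

```python
import numpy as np
import pandas as pd

# First five rows of the question's data, typed in by hand.
df = pd.DataFrame({
    "integer_2_x": [49348, 26005, 5, np.nan, 26129],
    "integer_2_y": [np.nan, np.nan, np.nan, 26, np.nan],
})

# Keep integer_2_x where present; fall back to integer_2_y where it is NaN.
df["integer_2_z"] = df["integer_2_x"].combine_first(df["integer_2_y"])
print(df)
```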
      

      【Comments】:

      【Solution 3】:

      Maybe you can simply use the fillna function.

      import pandas as pd
      import numpy as np
      
      # Creating the DataFrame
      df = pd.DataFrame({'integer_2_x': [49348, 26005, 5, np.nan, 26129, 129, np.nan, np.nan, 60657, 17031],
                     'integer_2_y': [np.nan, np.nan, np.nan, 26, np.nan, np.nan, 26, 17, np.nan, np.nan]})
      
      # Using fillna to fill a new column
      df['integer_2_z'] = df['integer_2_x'].fillna(df['integer_2_y'])
      
      # Printing the result below; you can also drop the x and y columns if they are no longer needed
      print(df)
      
         integer_2_x  integer_2_y  integer_2_z
      0        49348          NaN        49348
      1        26005          NaN        26005
      2            5          NaN            5
      3          NaN           26           26
      4        26129          NaN        26129
      5          129          NaN          129
      6          NaN           26           26
      7          NaN           17           17
      8        60657          NaN        60657
      9        17031          NaN        17031
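
Because the source columns contain NaN, pandas stores them as floats, so the merged column is float as well. If the goal is the integer column shown in the question, a possible follow-up is to cast after filling and then drop the two source columns. This is only a sketch, and it assumes no row is NaN in both columns (the names `merged` and `result` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "integer_2_x": [49348, 26005, 5, np.nan, 26129, 129, np.nan, np.nan, 60657, 17031],
    "integer_2_y": [np.nan, np.nan, np.nan, 26, np.nan, np.nan, 26, 17, np.nan, np.nan],
})

merged = df["integer_2_x"].fillna(df["integer_2_y"])

# Assumption: every row has a value in at least one column.
# A remaining NaN would make astype(int) fail, so check first.
if merged.isna().any():
    raise ValueError("some rows are NaN in both columns")

df["integer_2_z"] = merged.astype(int)

# Keep only the merged column.
result = df.drop(columns=["integer_2_x", "integer_2_y"])
print(result)
```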
      

      【Comments】:
