【Question Title】: Pandas DataFrame: get header names based on values
【Posted】: 2019-04-01 22:17:06
【Question】:

I have a Pandas DataFrame like the one below (I am using Python with Pandas).

+------------+---------+---------+----------+--------+
| Movie Name | English | Chinese | Japanese | Korean |
+------------+---------+---------+----------+--------+
| A          |       1 |       0 |        0 |      0 |
| B          |       0 |       1 |        1 |      0 |
| C          |       0 |       1 |        1 |      1 |
| D          |       1 |       0 |        0 |      0 |
| E          |       0 |       1 |        0 |      0 |
+------------+---------+---------+----------+--------+

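I want to transform it as shown below, joining the header names of the columns whose value is 1 (each value is either 0 or 1).

Expected output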

+------------+-------------------------+
| Movie Name |        Languages        |
+------------+-------------------------+
| A          | English                 |
| B          | Chinese,Japanese        |
| C          | Chinese,Japanese,Korean |
| D          | English                 |
| E          | Chinese                 |
+------------+-------------------------+

【Comments】:

    Tags: python pandas


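    【Solution 1】:

    First make Movie Name the index with DataFrame.set_index, then use DataFrame.dot to matrix-multiply the 0/1 values by the column names (each with a comma appended). Strip the trailing comma with Series.str.rstrip, and finally call Series.reset_index to get a two-column DataFrame.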

    df = df.set_index('Movie Name')
    df1 = df.dot(df.columns + ',').str.rstrip(',').reset_index(name='Languages')
    print (df1)
      Movie Name                Languages
    0          A                  English
    1          B         Chinese,Japanese
    2          C  Chinese,Japanese,Korean
    3          D                  English
    4          E                  Chinese
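
Why a matrix product yields joined names may not be obvious. The following is a minimal pure-Python sketch of what the dot product computes for each row, assuming the same sample data as the question; the helper name row_dot is hypothetical and is not part of pandas. It relies only on the facts that 1 * s returns s and 0 * s returns an empty string.

```python
# Column names and 0/1 flags copied from the question's sample table.
columns = ["English", "Chinese", "Japanese", "Korean"]
rows = {
    "A": [1, 0, 0, 0],
    "B": [0, 1, 1, 0],
    "C": [0, 1, 1, 1],
    "D": [1, 0, 0, 0],
    "E": [0, 1, 0, 0],
}

def row_dot(flags, names):
    """Hypothetical helper: the per-row 'dot product' of flags and names.

    Each term is flag * (name + ","), so a 1 contributes "name," and a 0
    contributes "". Concatenating the terms plays the role of the sum.
    """
    joined = "".join(flag * (name + ",") for flag, name in zip(flags, names))
    return joined.rstrip(",")  # drop the single trailing comma

languages = {movie: row_dot(flags, columns) for movie, flags in rows.items()}
print(languages)
```

Because the comma is appended to every name before multiplying, only one trailing comma ever needs to be stripped, no matter which columns are set.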
    

    【Comments】:

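      【Solution 2】:

      IIUC, melt the frame first; the problem then becomes a groupby problem. (The code below assumes the first column is named MovieName, without a space.)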

      s=df.melt('MovieName').query('value==1').groupby('MovieName').variable.agg(','.join)
      df['New']=df.MovieName.map(s)
      df
      Out[690]: 
        MovieName  English           ...             Korean                      New
      0         A        1           ...                  0                  English
      1         B        0           ...                  0         Chinese,Japanese
      2         C        0           ...                  1  Chinese,Japanese,Korean
      3         D        1           ...                  0                  English
      4         E        0           ...                  0                  Chinese
      [5 rows x 6 columns]
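
For reference, here is a self-contained sketch of the same melt-then-groupby idea, written against the question's column name (Movie Name, with a space). The intermediate names long_form, spoken, Language and flag are chosen here for illustration only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Movie Name": ["A", "B", "C", "D", "E"],
    "English":  [1, 0, 0, 1, 0],
    "Chinese":  [0, 1, 1, 0, 1],
    "Japanese": [0, 1, 1, 0, 0],
    "Korean":   [0, 0, 1, 0, 0],
})

# Wide -> long: one row per (movie, language) pair.
long_form = df.melt(id_vars="Movie Name", var_name="Language", value_name="flag")

# Keep only the pairs that are switched on, then join the names per movie.
spoken = long_form[long_form["flag"] == 1]
languages = spoken.groupby("Movie Name")["Language"].agg(",".join)

result = languages.reset_index(name="Languages")
print(result)
```

One thing to keep in mind with this approach: a movie whose flags are all 0 has no rows left after filtering, so it is absent from the grouped result (and becomes NaN if the result is mapped back onto the original frame).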
      

      【Comments】:

        【Solution 3】:

        You can use:

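        # Multiply the boolean mask by the column names: True keeps the name, False gives ''.
        # filter(None, ...) drops the empty strings, so a 0 between two 1s does not produce a doubled comma.
        df['languages'] = (df.eq(1)*df.columns).apply(lambda x : ','.join(filter(None, x)), axis = 1)
        df
        
          Movie Name  English  Chinese  Japanese  Korean                languages
        0          A        1        0         0       0                  English
        1          B        0        1         1       0         Chinese,Japanese
        2          C        0        1         1       1  Chinese,Japanese,Korean
        3          D        1        0         0       0                  English
        4          E        0        1         0       0                  Chinese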
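
A variation on the boolean-mask idea is to select the matching column names directly instead of multiplying booleans by strings, so no empty strings are produced in the first place. The sketch below is one possible way to do that, not the answerer's code. The list lang_cols and the extra row F (English and Japanese only) are assumptions added here to show a row whose 1s are not adjacent.

```python
import pandas as pd

df = pd.DataFrame({
    # Rows A-E come from the question; "F" is a hypothetical extra row.
    "Movie Name": ["A", "B", "C", "D", "E", "F"],
    "English":  [1, 0, 0, 1, 0, 1],
    "Chinese":  [0, 1, 1, 0, 1, 0],
    "Japanese": [0, 1, 1, 0, 0, 1],
    "Korean":   [0, 0, 1, 0, 0, 0],
})

# Restrict the mask to the language columns so "Movie Name" is not compared.
lang_cols = ["English", "Chinese", "Japanese", "Korean"]
mask = df[lang_cols].eq(1)

# For each row, row[row] keeps only the True entries; their index holds the names.
df["Languages"] = mask.apply(lambda row: ",".join(row[row].index), axis=1)
print(df[["Movie Name", "Languages"]])
```

Row-wise apply is easy to read but runs a Python function per row, so for large frames a vectorized approach may be preferable.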
        

        【Comments】:

          【Solution 4】:

          This can be done with pandas.Series.str.cat. You can read more about it here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.cat.html

          import pandas as pd
          import numpy as np
          
          df=pd.DataFrame({'Movie Name':['A','B','C','D','E'],'English':[1,0,0,1,0],'Chinese':[0,1,1,0,1],'Japanese':[0,1,1,0,0],'Korean':[0,0,1,0,0]})
          # Replace each 1 with its column name and each 0 with NaN
          df=df.replace(1,df.columns.to_series())
          df=df.replace(0,np.nan)
          # str.cat omits NaN by default, so only the flagged column names are joined
          df['Languages']=df[['English','Chinese','Japanese','Korean']].apply(lambda x: x.str.cat(sep=","),axis=1)
          df=df.drop(columns=['English','Chinese','Japanese','Korean'])
          

          Result:

            Movie Name                Languages
          0          A                  English
          1          B         Chinese,Japanese
          2          C  Chinese,Japanese,Korean
          3          D                  English
          4          E                  Chinese
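
The approach above depends on how Series.str.cat treats missing values. As a small illustration on a single hand-built row (the Series below is made up for this sketch), calling str.cat with only sep omits NaN entries, while passing na_rep substitutes a placeholder for them.

```python
import numpy as np
import pandas as pd

# One row after the replace steps: names where the flag was 1, NaN where it was 0.
row = pd.Series(["English", np.nan, "Japanese", np.nan])

# With no na_rep, missing values are left out of the concatenation.
joined = row.str.cat(sep=",")

# With na_rep, each missing value is replaced by the given placeholder instead.
with_placeholder = row.str.cat(sep=",", na_rep="-")

print(joined)
print(with_placeholder)
```

This is why the 0 values are converted to NaN rather than to empty strings: empty strings would still be joined and leave stray commas.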
          

          【Comments】:
