【Question title】: A variable with two columns in pandas
【Posted】: 2018-08-02 04:45:11
【Question】:

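I'm not sure how my variable df['education'] ends up with two identical columns when printed, rather than just one. When I check the variable's type, it says it is a Series, but how can a Series have two columns?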

 df2['education']

                education            education
0        Higher education     Higher education
1        Higher education     Higher education
2        Higher education     Higher education
3        Higher education     Higher education
4        Higher education     Higher education
5        Higher education     Higher education
6        Higher education     Higher education
7        Higher education     Higher education

[4743 rows x 2 columns]

How can I merge the two, or keep just one column?

【Comments】:

    Tags: python-3.x pandas duplicates series


    【Solution 1】:

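    The problem is duplicate column names: selecting by a name that appears more than once returns every column with that name, so the result is a DataFrame rather than a single Series.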
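
    To make the behaviour concrete, and to address the "keep just one column" part of the question, here is a minimal sketch. The toy values are made up for illustration, and dropping repeats with `df.columns.duplicated()` is one possible approach, not something the answer itself proposes; it assumes the duplicated columns really hold the same data, so nothing is lost by keeping the first.

```python
import pandas as pd

# Toy frame with a duplicated column name (values are illustrative only).
df = pd.DataFrame([["a", 4, 7], ["b", 5, 8], ["c", 4, 9]],
                  columns=["id", "education", "education"])

# Selecting a duplicated label returns every matching column: a DataFrame.
selected = df["education"]
print(type(selected).__name__, selected.shape)

# Keep only the first occurrence of each column name.
deduped = df.loc[:, ~df.columns.duplicated()]

# The same selection now yields a single Series.
single = deduped["education"]
print(type(single).__name__, list(deduped.columns))
```

    If the repeated columns hold different data, renaming them (as in the cumcount approach further down) is safer than dropping.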

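    If the data comes from read_csv, one solution is to upgrade pandas to 0.19.0 or later, which avoids the problem by default: it creates two columns, education and education.1.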
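
    As a quick check of the read_csv behaviour described above, the sketch below reads a small in-memory CSV whose header repeats a name. The CSV text is a made-up stand-in for the asker's file, and it assumes a reasonably recent pandas.

```python
import io

import pandas as pd

# Hypothetical CSV content with a repeated header name.
csv_text = "id,education,education\na,4,7\nb,5,8\nc,4,9\n"

# read_csv renames the repeated header instead of keeping two identical names.
df = pd.read_csv(io.StringIO(csv_text))
columns = list(df.columns)
print(columns)
```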

    Another solution is to use cumcount to append a counter to the duplicated column names:

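    import pandas as pd

    # Build a sample frame, then force a duplicated column name.
    df = pd.DataFrame({'A': list('abc'),
                       'B': [4, 5, 4],
                       'C': [7, 8, 9]})
    df.columns = ['id', 'education', 'education']
    print(df)
      id  education  education
    0  a          4          7
    1  b          5          8
    2  c          4          9

    # Count occurrences of each name: '0' for the first, '1' for the second, ...
    s = df.columns.to_series()
    count = s.groupby(s).cumcount().astype(str)

    # Leave first occurrences unchanged; append '.<count>' to later ones.
    df.columns = s.mask(count != '0', s + '.' + count)
    print(df)
      id  education  education.1
    0  a          4            7
    1  b          5            8
    2  c          4            9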
    

    【Comments】:
