【Question title】: Pandas - Columns not read though present
【Posted】: 2017-09-02 13:32:46
【Question】:

I have the following data set.

url, team1, team2, win_toss, bat_or_bowl, outcome, win_game, date,day_n_night, ground, rain, duckworth_lewis, match_id, type_of_match
"espncricinfo-t20/145227.html","Western Australia","Victoria","Victoria","bat","Western Australia won by 8 wickets (with 47 balls remaining)","Western Australia"," Jan 12 2005","1"," Western Australia Cricket Association Ground,Perth","0","0","145227","T20"
"espncricinfo-t20/212961.html","Australian Institute of Sports","New Zealand Academy","New Zealand Academy","bowl","Match tied",""," Jul 7 2005 ","0"," Albury Oval, Brisbane","0","0","212961","T20"
"espncricinfo-t20/216598.html","Air India","New South Wales","Air India","bowl","Air India won by 7 wickets (with 5 balls remaining)","Air India"," Aug 19 2005 ","0"," M Chinnaswamy Stadium, Bangalore","0","0","216598","T20"
"espncricinfo-t20/216620.html","Karnataka State Cricket Association XI","Bradman XI","Bradman XI","bowl","Karnataka State Cricket Association XI won by 33 runs","Karnataka State Cricket Association XI"," Aug 20 2005 ","0"," M Chinnaswamy Stadium, Bangalore","0","0","216620","T20"
"espncricinfo-t20/216633.html","Chemplast","Bradman XI","Chemplast","bat","Bradman XI won by 6 wickets (with 13 balls remaining)","Bradman XI"," Aug 20 2005 ","0"," M Chinnaswamy Stadium, Bangalore","0","0","216633","T20"

This is from the Python console:


>>> import pandas as pd
>>> df = pd.read_csv("sample.txt" , quotechar = '\"')
>>> df.shape
(9, 14)


>>> df.columns
Index([u'url', u' team1', u' team2', u' win_toss', u' bat_or_bowl',
       u' outcome', u' win_game', u' date', u' day_n_night', u' ground',
       u' rain', u' duckworth_lewis', u' match_id', u' type_of_match'],
      dtype='object')


>>> df.url.head()
0    espncricinfo-t20/145227.html
1    espncricinfo-t20/212961.html
2    espncricinfo-t20/216598.html
3    espncricinfo-t20/216620.html
4    espncricinfo-t20/216633.html
Name: url, dtype: object


>>> df.team1.head()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python27/lib/python2.7/site-packages/pandas/core/generic.py", line 2744, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'team1'



>>> df.iloc[1:2]
                            url                           team1  \
1  espncricinfo-t20/212961.html  Australian Institute of Sports

                 team2             win_toss  bat_or_bowl     outcome  \
1  New Zealand Academy  New Zealand Academy         bowl  Match tied

   win_game          date   day_n_night                  ground   rain  \
1       NaN   Jul 7 2005              0   Albury Oval, Brisbane      0

    duckworth_lewis   match_id  type_of_match
1                 0     212961            T20

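We can see that the column team1 exists, but I cannot retrieve it from the DataFrame. I get this error for every column except the first. Can anyone help me find the problem? Thanks.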

【Comments】:

    Tags: python python-2.7 pandas numpy


    【Solution 1】:

    The column names contain leading whitespace, which you need to remove with strip:

    df.columns = df.columns.str.strip()
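
As a minimal sketch of the fix above, the following reproduces the problem on a small in-memory sample and then strips the labels. The three-column sample and the use of `io.StringIO` are assumptions made for illustration (the question reads from `sample.txt` under Python 2.7; this sketch assumes Python 3).

```python
import io
import pandas as pd

# Hypothetical sample mimicking the question's header style:
# a space follows each comma in the header row, but not in the data rows.
raw = 'url, team1, team2\n"a.html","Western Australia","Victoria"\n'

df = pd.read_csv(io.StringIO(raw), quotechar='"')
print(list(df.columns))  # labels after the first still carry a leading space

# Strip surrounding whitespace from every column label.
df.columns = df.columns.str.strip()
print(list(df.columns))

# Attribute access now resolves because the label is exactly 'team1'.
print(df.team1.iloc[0])
```

Stripping after the read leaves the cell values untouched; only the labels change.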
    

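    【Comments】:

    • I will try this. Thank you. But why does df.shape show the correct number of columns?
    • columns is an array of items, so its length is fine. The only problem is the items themselves: some contain spaces, e.g. u' team1', u' team2', where u'team1', u'team2' are needed.
    • Thank you, +1 for showing a new way to handle this. I will go with EdChum's answer because it looks more elegant. Thanks for the reply!
    • Glad I could help.
    【Solution 2】: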
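
The point made in the comments above (the column count is right, only the labels are off) can be checked with a short sketch. The two-column sample is hypothetical, and Python 3 is assumed.

```python
import io
import pandas as pd

# Hypothetical sample: the header has a space after the comma.
raw = 'url, team1\n"a.html","Western Australia"\n'
df = pd.read_csv(io.StringIO(raw), quotechar='"')

# shape only counts rows and labels, so the stray space does not affect it.
print(df.shape)

# The label that exists is ' team1' (with the space), not 'team1'.
print("team1" in df.columns)
print(" team1" in df.columns)

# Bracket access with the exact label, space included, still works.
print(df[" team1"].iloc[0])
```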

    You have a leading space:

    u' team1'
    

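    in the column names, so attribute access such as df.team1 raises the AttributeError shown in the question (and df['team1'] would raise a KeyError).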

    Do this instead:

    pd.read_csv("sample.txt" , quotechar = '\"', skipinitialspace=True)
    

    so that the CSV is read with the spaces following each delimiter skipped.

    docs
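
A minimal sketch of this approach on an in-memory sample follows. The sample rows and `io.StringIO` are assumptions for illustration, and Python 3 is assumed. Note that `skipinitialspace` applies to whitespace directly after a delimiter; spaces that sit inside quoted values (such as the date strings in the question) are, as far as I can tell, left in place, so the sketch strips that column explicitly.

```python
import io
import pandas as pd

# Hypothetical sample: spaces after the commas in the header,
# plus a quoted value that itself starts with a space.
raw = (
    'url, team1, date\n'
    '"a.html","Western Australia"," Jan 12 2005"\n'
)

# skipinitialspace=True drops the whitespace that follows each delimiter,
# so the header labels come out clean at read time.
df2 = pd.read_csv(io.StringIO(raw), quotechar='"', skipinitialspace=True)
print(list(df2.columns))

# Whitespace inside the quoted cell values is handled separately here.
df2["date"] = df2["date"].str.strip()
print(df2["date"].iloc[0])
```

Compared with renaming the columns afterwards, this keeps the cleanup inside the `read_csv` call itself.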

    【Comments】:

    • Thanks, EdChum. This looks more elegant.