【Posted】: 2015-10-01 18:43:39
【Question】:
I am following the book Python for Data Analysis. It tells me to download the all-candidates file from http://www.fec.gov/disclosurep/PDownload.do and load it with pandas:
import pandas as pd
fec = pd.read_csv('P00000001-ALL.csv')
However, the actual file has changed since the book was written. The old file (available at https://github.com/pydata/pydata-book/blob/master/ch09/P00000001-ALL.csv) loads fine:
fec = pd.read_csv('../pydata-book/ch09/P00000001-ALL.csv')
But the new file loads incorrectly, because the columns appear to be shifted (the value of the first column is dropped):
cmte_id P60008059
cand_id Bush, Jeb
cand_nm EASTON, AMY KELLY MRS.
contbr_nm KEY BISCAYNE
contbr_city FL
contbr_st 331491716
contbr_zip HOMEMAKER
contbr_employer HOMEMAKER
contbr_occupation 2700
contb_receipt_amt 26-JUN-15
contb_receipt_dt NaN
receipt_desc NaN
memo_cd NaN
memo_text SA17A
form_tp 1024106
file_num SA17.114991
tran_id P2016
election_tp NaN
The actual row is:
C00579458,"P60008059","Bush, Jeb","EASTON, AMY KELLY MRS.","KEY BISCAYNE","FL","331491716","HOMEMAKER","HOMEMAKER",2700,26-JUN-15,"","","","SA17A","1024106","SA17.114991","P2016",
So C00579458 is getting lost somewhere.
The header looks like this:
cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp
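One thing worth checking is whether the header and the data row have the same number of fields. The sketch below is only a diagnostic, not part of the book's code: it parses the header and the row quoted above with the standard csv module and counts the fields. The variable names are made up for illustration.

```python
import csv
import io

# Header and data row copied verbatim from the question.
header = (
    "cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,"
    "contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,"
    "receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp"
)
row = (
    'C00579458,"P60008059","Bush, Jeb","EASTON, AMY KELLY MRS.",'
    '"KEY BISCAYNE","FL","331491716","HOMEMAKER","HOMEMAKER",2700,26-JUN-15,'
    '"","","","SA17A","1024106","SA17.114991","P2016",'
)

# csv.reader respects the quoting, so "Bush, Jeb" counts as one field.
header_fields = next(csv.reader(io.StringIO(header)))
row_fields = next(csv.reader(io.StringIO(row)))

print(len(header_fields), len(row_fields))  # prints: 18 19
```

The row ends with a trailing comma, so it parses to one more field (an empty one) than the header has names.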
【Comments】:
-
Could you add a few lines of the CSV that causes the problem, including its header, along with the exact output you get for those lines?
-
Hi Anand, the header and one row are shown above. Do you need me to add a few more rows?
-
When you inspect the DataFrame, is the first element being treated as the index?
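A quick way to test the index theory is sketched below. It assumes the cause is the trailing comma on each data row, which gives the row one more field than the header; in that situation pandas uses the first column as the index by default, and `index_col=False` is the documented `read_csv` option for turning that off. The in-memory `sample` string stands in for the real file and holds only the header and the single row from the question.

```python
import io

import pandas as pd

# Header plus the one data row from the question (note the trailing comma).
sample = (
    "cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,"
    "contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,"
    "receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp\n"
    'C00579458,"P60008059","Bush, Jeb","EASTON, AMY KELLY MRS.",'
    '"KEY BISCAYNE","FL","331491716","HOMEMAKER","HOMEMAKER",2700,26-JUN-15,'
    '"","","","SA17A","1024106","SA17.114991","P2016",\n'
)

# Default behaviour: the extra field makes pandas treat column 0 as the index,
# so every remaining value lands one column to the left.
shifted = pd.read_csv(io.StringIO(sample))
print(shifted.index[0], shifted["cmte_id"].iloc[0])

# index_col=False keeps column 0 as data; the empty trailing field is dropped.
# Recent pandas versions may emit a ParserWarning about the length mismatch.
fixed = pd.read_csv(io.StringIO(sample), index_col=False)
print(fixed["cmte_id"].iloc[0], fixed["cand_id"].iloc[0])
```

If this reproduces the shift, the same keyword should apply to the real file, e.g. `pd.read_csv('P00000001-ALL.csv', index_col=False)`.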