【Posted】: 2019-09-25 17:04:53
【Question】:
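I am trying to extract a few columns from a csv file. This is a simplified version of the large panel data set I am working with. Opened in Excel, it looks something like the sample below. However, when I run my code I get the error message "ValueError: too many values to unpack (expected 4)". I have included my file as an image only to make it easier to view.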
companyID  year  company_age  debt_TA  gcp
654001     2000  49           0.14     0
654001     2001  50           0.17     0
654001     2002  51           0.23     1
112089     2013  38           0.11     0
112089     2014  39           0.13     0
342980     2007  54           0.15     0
342980     2008  55           0.22     1
I have searched for and tried several answers about this kind of error, but so far none of them has worked for me. My code is shown below.
import csv
import numpy as np
from sklearn import feature_extraction


def parseFile(filename):
    companies = list()
    with open(filename) as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='"')
        for index, line in enumerate(reader):
            #print index, line
            if (index > 0 and index < 150):
                CompanyID, year, company_age, gcp = line
                #print company_name
                company = {
                    'CompanyID': CompanyID,
                    'year': year,
                    'company_age': company_age,
                    'gcp': int(gcp),
                }
                companies.append(company)
    return companies


def extract_year_features(companies):
    year_list = list()
    for company in companies:
        year_list.append(company['year'] * 10)
    tweet_vectorizer = feature_extraction.text.CountVectorizer()
    X = tweet_vectorizer.fit_transform(year_list).toarray()
    return X


def extract_company_age_features(companies):
    company_age_list = list()
    for company in companies:
        company_age_list.append(company['company_age'] * 10)
    tweet_vectorizer = feature_extraction.text.CountVectorizer()
    X = tweet_vectorizer.fit_transform(company_age_list).toarray()
    return X


def extract_all_features(companies):
    return np.concatenate((extract_year_features(companies),
                           extract_company_age_features(companies)),
                          axis=1)


def generate_target(companies):
    y = [company['gcp'] for company in companies]
    return np.array(y)


companies = parseFile("sample.csv")
X = extract_all_features(companies)
y = generate_target(companies)
#credit to G.Li
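Can anyone point out what I am doing wrong? I am a Python beginner and have tried several answers to similar questions, but none of them worked for me. Thanks in advance.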
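Judging from the sample in the question, a likely cause is that every row has five fields (companyID, year, company_age, the debt column, gcp) while the unpacking line names only four targets, which is exactly what "too many values to unpack (expected 4)" reports. Below is a minimal sketch that sidesteps the problem by selecting columns by header name with `csv.DictReader`, so extra columns no longer matter. It assumes the file is comma-delimited and that the header row is spelled as in the sample; `SAMPLE`, `WANTED`, and `parse_rows` are hypothetical names used only for illustration, and the `debt_TA` spelling is a guess at the real header.

```python
import csv
import io

# Sample rows copied from the question. The header spelling (including
# "debt_TA") is an assumption about the real file.
SAMPLE = """companyID,year,company_age,debt_TA,gcp
654001,2000,49,0.14,0
654001,2001,50,0.17,0
654001,2002,51,0.23,1
"""

# Columns to keep; any other column in the file is ignored.
WANTED = ("companyID", "year", "company_age", "gcp")


def parse_rows(csvfile, limit=149):
    """Read up to `limit` data rows, keeping only the WANTED columns."""
    companies = []
    reader = csv.DictReader(csvfile)  # the first row is used as the header
    for index, row in enumerate(reader):
        if index >= limit:
            break
        company = {name: row[name] for name in WANTED}
        company["gcp"] = int(company["gcp"])  # target label as an integer
        companies.append(company)
    return companies


# io.StringIO stands in for open("sample.csv") so the sketch is self-contained.
companies = parse_rows(io.StringIO(SAMPLE))
print(companies[0])
```

If you would rather keep the original `csv.reader` loop, the equivalent change is to unpack five names on that line (for example `CompanyID, year, company_age, debt_TA, gcp = line`) and simply not use the extra one.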
【Comments】:
-
Suggestion: use pandas to import and manipulate csv files.
-
Thank you both, Andrejs Cainikovs and user9940344. I will look at your suggestions and see how they work out.
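For reference, the pandas suggestion above might look roughly like the following. This is only a sketch under assumptions: the header names are taken from the sample in the question (the `debt_TA` spelling is a guess), `SAMPLE` is a hypothetical stand-in for the real file, and the feature matrix here is just the raw numeric columns rather than the `CountVectorizer` encoding used in the question's code.

```python
import io

import pandas as pd

# Sample rows copied from the question; header spelling is an assumption.
SAMPLE = """companyID,year,company_age,debt_TA,gcp
654001,2000,49,0.14,0
654001,2001,50,0.17,0
654001,2002,51,0.23,1
"""

# usecols keeps only the named columns, so the extra debt column is skipped.
# Replace io.StringIO(SAMPLE) with "sample.csv" to read the real file.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    usecols=["companyID", "year", "company_age", "gcp"],
)

X = df[["year", "company_age"]].to_numpy()  # raw numeric features
y = df["gcp"].to_numpy()                    # target labels
print(df.head())
```

Selecting columns by name means the code keeps working if more columns are added to the file later.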
Tags: python python-3.x