【Posted】: 2017-04-16 20:10:13
【Question】:
I am trying to build a document-term matrix as a Pandas DataFrame. Here is my code so far:
df_profession['Athlete_Clean'] = df_profession['Athlete Biographies'].str.lower()
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()]))
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].str.split()
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation]
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')]
profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index)
profession_dtm_athlete
When I run this code, I get the following error:
'list' object has no attribute 'lower'
How can I get rid of this error?
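A likely cause, judging only from the code shown: after `.str.split()` every row of `Athlete_Clean` holds a list of tokens, while `CountVectorizer` by default expects each document to be a single string and calls `.lower()` on it. The sketch below is one possible way around this, not necessarily the intended design: it keeps each biography as a string, strips digits with pandas, and leaves lowercasing, punctuation handling, and stop-word removal to `CountVectorizer`. The sample `df_profession` rows are made up for illustration, and scikit-learn's built-in English stop-word list is swapped in for the NLTK list used in the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for df_profession; the real data is not shown in the question.
df_profession = pd.DataFrame({
    "Athlete Biographies": [
        "He won 3 gold medals in 2008.",
        "She is a tennis player, and she won the title!",
    ]
})

# Remove digits but keep each document as ONE string (no .str.split()).
df_profession["Athlete_Clean"] = (
    df_profession["Athlete Biographies"].str.replace(r"\d+", "", regex=True)
)

# CountVectorizer lowercases, tokenizes (which drops punctuation),
# and removes stop words on its own.
countvec = CountVectorizer(lowercase=True, stop_words="english")
counts = countvec.fit_transform(df_profession["Athlete_Clean"])

# Newer scikit-learn releases expose get_feature_names_out();
# older ones only have get_feature_names().
try:
    terms = countvec.get_feature_names_out()
except AttributeError:
    terms = countvec.get_feature_names()

# Use the index of the same frame the text came from.
profession_dtm_athlete = pd.DataFrame(
    counts.toarray(), columns=terms, index=df_profession.index
)
print(profession_dtm_athlete)
```

If the pre-tokenized lists are needed for other reasons, an alternative is to pass a callable such as `analyzer=lambda tokens: tokens` to `CountVectorizer`, so that it accepts each list as already analyzed and skips its own lowercasing step.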
【Comments】:
Tags: python pandas text text-analysis