Attribute Error when creating Document Term Matrix

【Question Title】: Attribute Error when creating Document Term Matrix
【Posted】: 2017-04-16 20:10:13
【Question Description】:

I am trying to create a document term matrix represented as a Pandas DataFrame. This is my code so far:

df_profession['Athlete_Clean'] = df_profession['Athlete Biographies'].str.lower()
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()]))
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].str.split()
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation]
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')]

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index)
profession_dtm_athlete

When I run this code, I get the following error:

'list' object has no attribute 'lower'

How can I get rid of this error?

【Question Comments】:

Tags: python pandas text text-analysis

【Solution 1】:

Wrap the list objects in str() to convert them to strings:

df_profession['Athlete_Clean'] = str(df_profession['Athlete Biographies']).lower()
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()]))
df_profession['Athlete_Clean'] = str(df_profession['Athlete_Clean']).split()
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation]
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')]

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index)
profession_dtm_athlete
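
A caveat on the approach above: str() applied to a whole pandas Series returns the printed representation of the Series as one string, not one string per row, so the cleaned column no longer lines up with the original rows. The original AttributeError most likely comes from the vectorizer, which by default calls .lower() on each document and therefore expects strings rather than the token lists produced by .str.split(). What follows is a minimal sketch of one alternative, assuming countvec is scikit-learn's CountVectorizer: keep each biography as a single string and let the vectorizer handle tokenization, punctuation, and stop words. The sample data and the strip_digits helper are hypothetical, and stop_words="english" uses scikit-learn's built-in list rather than NLTK's.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the asker's df_profession.
df_profession = pd.DataFrame({
    "Athlete Biographies": [
        "Born in 1985, she won 3 gold medals.",
        "He played football and won the league in 2010.",
    ]
})


def strip_digits(text):
    """Remove every digit but keep the result a single string."""
    return re.sub(r"\d+", "", text)


# Each row stays a str, which is what the vectorizer expects.
df_profession["Athlete_Clean"] = (
    df_profession["Athlete Biographies"].str.lower().apply(strip_digits)
)

# The default token pattern skips punctuation; stop_words="english"
# drops scikit-learn's built-in English stop words (an assumption here,
# the question used NLTK's list).
countvec = CountVectorizer(stop_words="english")
counts = countvec.fit_transform(df_profession["Athlete_Clean"])

# Recover the column names in column order from the fitted vocabulary.
feature_names = sorted(countvec.vocabulary_, key=countvec.vocabulary_.get)

# Index by the same DataFrame that supplied the documents.
profession_dtm_athlete = pd.DataFrame(
    counts.toarray(), columns=feature_names, index=df_profession.index
)
print(profession_dtm_athlete)
```

On scikit-learn 1.0 and later, countvec.get_feature_names_out() returns the same column names; get_feature_names() was removed in 1.2. Note also that the index comes from df_profession rather than a separate df, so the row count always matches the matrix.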

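【Comments】:

So this seems to have resolved that issue, but now I get "ValueError: Length of values does not match length of index". Any suggestions as to why this might be happening?
That error is raised inside the pandas library, so I'm not sure. It might be worth asking a new question. If you do, I'd suggest using the dataframe tag.
OK, thanks JacobIRR. I will go ahead and create a new question about this new error.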
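
One plausible reading of the ValueError in the comments, sketched below with hypothetical sample data: str(...).split() on a whole Series produces one flat list of tokens taken from the Series' printed form, and pandas refuses to assign a list to a column when its length differs from the number of rows. The .str accessor, by contrast, works row by row and keeps the lengths aligned.

```python
import pandas as pd

# Hypothetical two-row frame standing in for df_profession.
df = pd.DataFrame({"bio": ["Alpha beta", "Gamma delta epsilon"]})

# str() on the Series yields its printed representation (index labels,
# values, and the trailing "Name: ..., dtype: ..." line) as one string.
tokens = str(df["bio"]).split()

try:
    # The flat token list is longer than the 2-row index.
    df["clean"] = tokens
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(len(tokens), len(df))
print(error_message)

# Row-wise alternative: one list of tokens per row, same length as df.
df["clean"] = df["bio"].str.lower().str.split()
print(df["clean"].tolist())
```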