【Posted】: 2020-06-25 07:57:19
【Problem description】:
I have a DataFrame in the following format (link to the csv file):
       image_name  caption_number                                            caption
0  1000092795.jpg               0  Two young guys with shaggy hair look at their...
1  1000092795.jpg               1  Two young , White males are outside near many...
2  1000092795.jpg               2   Two men in green shirts are standing in a yard .
3  1000092795.jpg               3       A man in a blue shirt standing in a garden .
4  1000092795.jpg               4            Two friends enjoy time spent together .
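I want to add another column, keywords, that holds keywords extracted from each caption with an NLP keyword-extraction method.
Here is what I have tried: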
import nltk
import pandas as pd
from nltk.corpus import stopwords

df = pd.read_csv('results.csv', delimiter='|')
df.columns = ['image_name', 'caption_number', 'caption']
stop_words = stopwords.words('english')

def get_keywords(row):
    some_text = row['caption']
    lowered = some_text.lower()
    # Tokenize the lowercased text so capitalized stop words are filtered too.
    tokens = nltk.tokenize.word_tokenize(lowered)
    keywords = [keyword for keyword in tokens if keyword.isalpha() and keyword not in stop_words]
    keywords_string = ','.join(keywords)
    return keywords_string

# This is the call that fails.
df['Keywords'] = df['caption'].apply(get_keywords, axis=1)
The code above raises the error: get_keywords() got an unexpected keyword argument 'axis'
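For context, this error is what you would expect when `axis` is passed to `Series.apply`: `axis` is a parameter of `DataFrame.apply`, while `Series.apply` forwards any extra keyword arguments to the function itself. The following is a minimal sketch of that behavior, using a hypothetical helper `shout` and two captions from the sample above rather than the original `get_keywords` and CSV.

```python
import pandas as pd

captions = pd.Series(["Two men in green shirts", "A man in a blue shirt"])

def shout(text):
    # Hypothetical helper: receives one cell value (a str), not a row.
    return text.upper()

# Series.apply passes unknown keyword arguments on to the function,
# so axis=1 becomes shout(text, axis=1) and raises TypeError.
try:
    captions.apply(shout, axis=1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Without axis, each element is passed straight to the function.
result = captions.apply(shout)
print(result.tolist())
```

Selecting a single column with `df['caption']` gives a Series, so the function is called once per caption string and there is no axis to choose.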
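【Comments】:
-
What result do you get? What is wrong with it? What is your question?
-
I get an error:
get_keywords() got an unexpected keyword argument 'axis'
-
What happens when you write it with double brackets, df[['caption']].apply(get_keywords, axis=1), or omit the axis keyword? You are implicitly collapsing the DataFrame to a Series.
-
If I use double square brackets, I get ("'float' object has no attribute 'lower'", 'occurred at index 19999'). When I remove the axis keyword, I get string indices must be integers.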
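Both follow-up errors look consistent with two separate issues. Presumably the row at index 19999 has a missing caption, which pandas stores as NaN (a float), and when the function is applied to the `caption` Series without `axis` it receives a plain string, so `row['caption']` indexes a string with a string. Below is a hedged sketch of one way to handle both. It assumes a small hard-coded `STOP_WORDS` set and a regex tokenizer as stand-ins for `stopwords.words('english')` and `nltk.tokenize.word_tokenize`, so it runs without NLTK data, and the NaN row is invented to mimic the missing caption.

```python
import re
import pandas as pd

# Assumption: a tiny stand-in for nltk's stopwords.words('english').
STOP_WORDS = {"a", "an", "the", "in", "at", "are", "is", "with", "and", "their", "near"}

def get_keywords(caption):
    # The function now takes the cell value itself, not a row.
    # Missing captions arrive as NaN (a float); return an empty string for them.
    if not isinstance(caption, str):
        return ""
    # Assumption: a simple regex tokenizer instead of nltk.tokenize.word_tokenize.
    tokens = re.findall(r"[a-z]+", caption.lower())
    return ",".join(token for token in tokens if token not in STOP_WORDS)

df = pd.DataFrame({
    "image_name": ["1000092795.jpg"] * 3,
    "caption_number": [2, 3, 4],
    "caption": [
        "Two men in green shirts are standing in a yard .",
        "A man in a blue shirt standing in a garden .",
        float("nan"),  # invented row imitating a missing caption
    ],
})

# Apply to the Series, with no axis argument.
df["keywords"] = df["caption"].apply(get_keywords)
print(df["keywords"].tolist())
```

With the real data, the same shape should work by swapping the NLTK tokenizer and stop word list back in; whether to return an empty string or drop rows with missing captions is a design choice that depends on how the keywords are used later.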