How can I show a specific word in a data set? Answers

【Question title】: How can I show a specific word in a data set?
【Posted】: 2021-06-02 21:11:30
【Question】:

I have just started learning Python. I have a question about matching some words in my data set, which comes from Excel.

words_list contains the words I want to find in the data set.

words_list = ('tried','mobile','abc')

df was read from Excel, with a single column selected.

df =

0        to make it possible or easier for someone to do ...  
1        unable to acquire a buffer item very likely ...  
2        The organization has tried to make...  
3        Broadway tried a variety of mobile Phone for the..

I would like to get a result like this:

'None',
'None',
'tried',
'tried','mobile'

This is what I tried in Jupyter:

list = [ ]
for word in df: 
    if any(aa in word for aa in words_list):
        list.append(word) 
    else:
        list.append('None')

print(list)

But the result shows the whole sentence from df:

'None'  
'None'  
'The organization has tried to make...'  
'Broadway tried a variety of mobile Phone for the..'

Can I show only the words from the word list in the result? Sorry for my English, and thank you all.

【Comments】:

What exact output format do you want? Please write it out precisely.

Tags: python python-3.x list find

【Solution 1】:

I suggest operating on the DataFrame directly (that should always be your first idea: use the power of pandas).

import pandas as pd

words_list = {'tried', 'mobile', 'abc'}

df = pd.DataFrame({'col': ['to make it possible or easier for someone to do',
                           'unable to acquire a buffer item very likely',
                           'The organization has tried to make',
                           'Broadway tried a variety of mobile Phone for the']})

df['matches'] = df['col'].str.split().apply(lambda x: set(x) & words_list)
print(df)


                                                col          matches
0   to make it possible or easier for someone to do               {}
1       unable to acquire a buffer item very likely               {}
2                The organization has tried to make          {tried}
3  Broadway tried a variety of mobile Phone for the  {mobile, tried}
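
To unpack what the lambda does for each row, here is a minimal plain-Python sketch of the same three steps applied to a single sentence. The names sentence, tokens, token_set and matches are illustrative only and are not part of the answer above.

```python
words_list = {'tried', 'mobile', 'abc'}
sentence = 'Broadway tried a variety of mobile Phone for the'

# Step 1: what .str.split() does per row: cut the sentence at whitespace.
tokens = sentence.split()

# Step 2: set(x) drops duplicate tokens and makes set operations available.
token_set = set(tokens)

# Step 3: & is set intersection: keep only the tokens that are also in words_list.
matches = token_set & words_list

print(sorted(matches))  # ['mobile', 'tried']
```

Note that this comparison is exact and case-sensitive: a token such as 'Tried' or 'tried.' would not match 'tried' unless the text is lowercased and stripped of punctuation first.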

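【Comments】:

Some explanation of what happens inside the lambda would be helpful, since the OP is a beginner.
Thanks @azro! If I want to remove the braces around each word in the matches column, what should I do? I tried this code, but it doesn't work: Remove=df['matches'].replace(['{', '}'],[' ',' '], regex=True)
I need to pick the words in matches for sentiment analysis, and the sentiment analysis does not accept sets: TypeError: unhashable type: 'set'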
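
On the braces question: the cells in the matches column hold Python set objects, and the braces are only how a set is displayed, so a string replace has nothing to act on. A minimal sketch of one way around this is to convert each set to a string or a list. The list named matches below is a hypothetical stand-in for the column values, and set_to_text is an illustrative helper, not something from the thread.

```python
# Hypothetical stand-in for the values held in the 'matches' column.
matches = [set(), set(), {'tried'}, {'mobile', 'tried'}]

def set_to_text(words):
    """Join a set of words into one comma-separated string; empty set gives 'None'."""
    return ', '.join(sorted(words)) if words else 'None'

# Plain strings, with no braces left.
as_text = [set_to_text(m) for m in matches]

# Sorted lists, if the individual words are needed afterwards.
as_lists = [sorted(m) for m in matches]

print(as_text)   # ['None', 'None', 'tried', 'mobile, tried']
print(as_lists)  # [[], [], ['tried'], ['mobile', 'tried']]

# With pandas, the same helper could be applied per cell:
# df['matches'].apply(set_to_text)
```

Strings and lists of strings are more likely to be accepted by downstream code than sets, though whether that resolves the TypeError depends on what the sentiment analysis library expects.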

【Solution 2】:

The reason the whole row is printed has to do with your:

for word in df:

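Your "word" variable actually holds the entire row. The code then checks the whole row to see whether it contains any of your search words. If it does, it effectively says: "Yes, I found ____ in this row, so append the row to your list."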
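
A small sketch of the point above, using one row from the question: any(...) only reports whether at least one search word was found, so the words themselves have to be collected separately. The names row, found and hits are illustrative.

```python
words_list = ('tried', 'mobile', 'abc')
row = 'Broadway tried a variety of mobile Phone for the'

# any(...) collapses all the individual checks into a single True/False,
# so it cannot tell you which word matched.
found = any(w in row for w in words_list)
print(found)  # True

# Collecting the matching words keeps the information the question asks for.
hits = [w for w in words_list if w in row]
print(hits)  # ['tried', 'mobile']

# Caveat: "in" on a string is a substring test, not a whole-word test.
print('tried' in 'untried')  # True
```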

It sounds like what you want to do is split the row into words first, and then check.

results = []

# df is the selected column and words_list the search words from the question
for line in df:
    words = line.split()
    found = False
    for word in words_list:
        if word in words:
            found = True
            results.append(word)
    # append "None" if none of the search words occurred in this line
    if not found:
        results.append("None")

print(results)
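
The loop above puts every hit into one flat list, so it is no longer visible which row a word came from. If one entry per row is wanted, as in the desired output in the question, a possible variant is sketched below. The list named rows is a hypothetical stand-in for the Excel column, reusing the sentences from Solution 1.

```python
words_list = ('tried', 'mobile', 'abc')

# Hypothetical stand-in for the column read from Excel.
rows = [
    'to make it possible or easier for someone to do',
    'unable to acquire a buffer item very likely',
    'The organization has tried to make',
    'Broadway tried a variety of mobile Phone for the',
]

results = []
for line in rows:
    words = line.split()  # split the row into whitespace-separated words
    hits = [w for w in words_list if w in words]  # whole-word matches only
    results.append(hits if hits else 'None')  # one entry per row

print(results)
# ['None', 'None', ['tried'], ['tried', 'mobile']]
```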

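As a side note, when working with lists you may want to use pprint instead of print. It prints lists, dictionaries and similar objects in a layout that is easier to read. pprint is part of the Python standard library, so nothing extra needs to be installed. Usage looks like this: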

from pprint import pprint

dictionary = {'firstkey':'firstval','secondkey':'secondval','thirdkey':'thirdval'}

pprint(dictionary)
