【Question title】: Remove stopwords list from list in Python (Natural Language Processing)
【Posted】: 2019-01-17 10:27:59
【Question description】:

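I have been trying to remove stopwords using Python 3, but my code does not seem to work. I would like to know how to remove the stopwords from the list below. The sample structure is as follows: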

    from nltk.corpus import stopwords

    word_split1=[['amazon','brand','- ','solimo','premium','almonds',',','250g','by','solimo'],
    ['hersheys','cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
    ['jbl','t450bt','extra','bass','wireless','on-ear','headphones','with','mic','white','by','jbl','and']]

I am trying to remove the stopwords with the code below. I would appreciate it if someone could help me fix the problem. Here is the code:

    stop_words = set(stopwords.words('english'))

    filtered_words=[]
    for i in word_split1:
        if i not in stop_words:
            filtered_words.append(i)

I get the error:

    Traceback (most recent call last):
      File "<ipython-input-451-747407cf6734>", line 3, in <module>
        if i not in stop_words:
    TypeError: unhashable type: 'list'

【Question discussion】:

    Tags: python-3.x nlp stanford-nlp opennlp


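    【Solution 1】:

    You have a list of lists, so each i in your loop is itself a list. Checking it against a set requires hashing it, which lists do not support.

    Try: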

    from nltk.corpus import stopwords

    word_split1=[['amazon','brand','- ','solimo','premium','almonds',',','250g','by','solimo'],
                 ['hersheys','cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
                 ['jbl','t450bt','extra','bass','wireless','on-ear','headphones','with','mic','white','by','jbl','and']]
    stop_words = set(stopwords.words('english'))
    filtered_words=[]
    for i in word_split1:
        for j in i:
            if j not in stop_words:
                filtered_words.append(j)
    

    Or flatten your list.

    For example:

    from itertools import chain
    from nltk.corpus import stopwords

    word_split1=[['amazon','brand','- ','solimo','premium','almonds',',','250g','by','solimo'],
                 ['hersheys','cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
                 ['jbl','t450bt','extra','bass','wireless','on-ear','headphones','with','mic','white','by','jbl','and']]
    stop_words = set(stopwords.words('english'))
    filtered_words=[]
    for i in chain.from_iterable(word_split1):
        if i not in stop_words:
            filtered_words.append(i)
    

    # Or, equivalently, as a list comprehension:
    filtered_words = [i for i in chain.from_iterable(word_split1) if i not in stop_words]
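
To see what the flattened approach produces without downloading the NLTK corpus, here is a minimal sketch. The small STOP_WORDS set is a hypothetical stand-in for set(stopwords.words('english')), and the input is shortened from the question's data.

```python
from itertools import chain

# Assumption: a tiny stand-in for set(stopwords.words('english')),
# used only so this sketch runs without the NLTK corpus installed.
STOP_WORDS = {'by', 'with', 'and'}

# Shortened version of the nested token lists from the question.
word_split1 = [
    ['hersheys', 'cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
    ['jbl', 'wireless', 'headphones', 'with', 'mic', 'by', 'jbl', 'and'],
]

# Flatten the nested lists, then keep only tokens that are not stop words.
filtered_words = [
    token
    for token in chain.from_iterable(word_split1)
    if token not in STOP_WORDS
]

print(filtered_words)
```

Punctuation tokens such as ',' remain in this sketch because they are not in the stand-in set; if they should be dropped too, they would need a separate check.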
    

    【Discussion】:

    • Hello Rakesh, thank you for your support and contribution to my question. I got exactly the output I wanted with the following code:
          stopset = set(stopwords.words('english'))
          clean_models = []
          for m in word_split1:
              stop_m = [i for i in m if str(i).lower() not in stopset]
              clean_models.append(stop_m)
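
The approach in the comment above differs from the flattened versions: it keeps one filtered list per original sublist and compares tokens case-insensitively. A minimal sketch of that idea follows; the function name remove_stopwords_grouped and the small STOP_WORDS set are hypothetical, the latter standing in for set(stopwords.words('english')).

```python
# Assumption: a tiny stand-in for set(stopwords.words('english')),
# used only so this sketch runs without the NLTK corpus installed.
STOP_WORDS = {'by', 'with', 'and'}


def remove_stopwords_grouped(token_lists, stop_words):
    """Return a new list of lists with stop words removed from each sublist.

    Tokens are lowercased only for the comparison; the returned tokens
    keep their original casing.
    """
    return [
        [token for token in tokens if str(token).lower() not in stop_words]
        for tokens in token_lists
    ]


token_lists = [
    ['Hersheys', 'Cocoa', 'By', 'Hersheys'],
    ['jbl', 'headphones', 'WITH', 'mic'],
]
clean_models = remove_stopwords_grouped(token_lists, STOP_WORDS)
print(clean_models)
```

Keeping the grouping is useful when each sublist represents one item (here, one product title) and the filtered tokens must stay associated with it.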
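    【Solution 2】:

    The list is two-dimensional, so your membership test is trying to hash a list. Flatten it into a one-dimensional list first, and then your code will work: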

    word_split1 = [j for x in word_split1 for j in x] 
    
    stop_words = set(stopwords.words('english'))
    
    filtered_words=[]
    for i in word_split1:
        if i not in stop_words:
            filtered_words.append(i)
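
The hashing problem described in this answer can be reproduced in isolation. The following is a minimal sketch, with a hypothetical stand-in stop-word set in place of the NLTK one, showing that the membership test fails for a sublist and succeeds once the list is flattened.

```python
# Assumption: a tiny stand-in for set(stopwords.words('english')).
stop_words = {'by', 'with', 'and'}

nested = [['cocoa', 'by', 'hersheys'], ['jbl', 'and']]

# Testing a whole sublist against a set requires hashing the sublist,
# and lists are not hashable, so Python raises TypeError.
try:
    nested[0] in stop_words
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# After flattening, every element is a string, which is hashable.
flat = [token for sublist in nested for token in sublist]
filtered = [token for token in flat if token not in stop_words]
print(filtered)
```

The exact wording of the error message can vary between Python versions, so the sketch prints it instead of assuming it.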
    

    【Discussion】:

    • Thanks for the support, @Specbug