【Question title】: How do I get a filtered list from a csv file?
【Posted】: 2019-10-02 18:29:09
【Question description】:

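I am working on a homework assignment about csv files, and I have been asked to write a filter that checks for keywords. Earlier I created a list of dictionaries, and now I have to check each dictionary for the keywords. If a keyword is found, I am supposed to append that dictionary to another list, called the filtered list.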

For context, the homework uses Donald Trump's Facebook posts. Here is a sample of the data:

[{'link_name': 'Timeline Photos',
'num_angrys': '7',
'num_comments': '543',
'num_hahas': '17',
'num_likes': '6178',
'num_loves': '572',
'num_reactions': '6813',
'num_sads': '0',
'num_shares': '359',
'num_wows': '39',
'status_id': '153080620724_10157915294545725',
'status_link': 'https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10157915294545725/?type=3',
'status_message': 'Beautiful evening in Wisconsin- THANK YOU for your incredible support tonight! Everyone get out on November 8th - and VOTE! LETS MAKE AMERICA GREAT AGAIN! -DJT',
'status_published': '10/17/2016 20:56:51',
'status_type': 'photo'},
{'link_name': '',
'num_angrys': '5211',
'num_comments': '3644',
'num_hahas': '75',
'num_likes': '26649',
'num_loves': '487',
'num_reactions': '33768',
'num_sads': '191',
'num_shares': '17653',
'num_wows': '1155',
'status_id': '153080620724_10157914483265725',
'status_link': 'https://www.facebook.com/DonaldTrump/videos/10157914483265725/',
'status_message': "The State Department's quid pro quo scheme proves how CORRUPT our system is. Attempting to protect Crooked Hillary, NOT our American service members or national security information, is absolutely DISGRACEFUL. The American people deserve so much better. On November 8th, we will END this RIGGED system once and for all!",
'status_published': '10/17/2016 18:00:41',
'status_type': 'video'}]
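
For readers who want to reproduce a list like the one above, here is a minimal sketch using `csv.DictReader`. The inline `SAMPLE_CSV` text and the helper name `load_status_updates` are assumptions made for illustration; the question does not show how the list was actually built, and only a few of the columns are included.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real csv file;
# only a few of the columns from the question are included.
SAMPLE_CSV = """status_id,status_message,status_type,num_likes
1,Beautiful evening in Wisconsin,photo,6178
2,The system is RIGGED,video,26649
"""


def load_status_updates(csv_file):
    """Read an open csv file object into a list of dicts, one per row."""
    reader = csv.DictReader(csv_file)
    return [dict(row) for row in reader]


status_updates = load_status_updates(io.StringIO(SAMPLE_CSV))
print(status_updates[0]["status_message"])
```

With a file on disk, the same helper can be given the object returned by `open(path, newline="")`. Every value comes back as a string, which matches the quoted numbers in the sample data.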

This is my code so far:

from nltk.tokenize import sent_tokenize, word_tokenize
def get_update_with_keywords(status_updates, keywords, case_sensitive = "false"):
    # your code here
    with open(input_file, 'r') as infile:
        filtered_status_updates = []
        for row in status_updates:
            tokens = word_tokenize(row["status_message"])
            if tokens == keywords:
                filtered_status_updates.append(row)
        return filtered_status_updates 

keywords = ["clinton", "obama"] 
get_update_with_keywords(status_updates, keywords)

But I keep getting this output:

[]

I think this is because I am trying to append the whole dictionary to the list?!

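【Question discussion】:

  • It looks like you need to use any() together with in to check for membership, rather than comparing the keyword list to the token list with ==. With the code as written, you will only find a match when the token list of the whole post is exactly the keyword list, e.g. a post that says nothing but "clinton obama".
  • @G.Anderson How would I do that? Sorry, I'm very new to python and don't really know what I'm doing. Edit: just saw your link, thanks!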
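
The membership check suggested in the comment above might look roughly like the following. This is a sketch, not the commenter's own code: it swaps `word_tokenize` for a plain `str.split()` plus punctuation stripping so that it runs without NLTK, it treats `case_sensitive` as a boolean rather than the string `"false"`, and the two-post `sample` list is made up for illustration.

```python
import string


def get_update_with_keywords(status_updates, keywords, case_sensitive=False):
    """Return the dicts whose status_message contains at least one keyword."""
    if not case_sensitive:
        keywords = [k.lower() for k in keywords]
    filtered = []
    for row in status_updates:
        message = row["status_message"]
        if not case_sensitive:
            message = message.lower()
        # str.split() is a simplification standing in for word_tokenize;
        # strip surrounding punctuation so "Hillary," still matches "hillary".
        tokens = {tok.strip(string.punctuation) for tok in message.split()}
        # Keep the row if ANY keyword appears among its tokens.
        if any(keyword in tokens for keyword in keywords):
            filtered.append(row)
    return filtered


# Hypothetical sample with only the field the filter looks at.
sample = [
    {"status_message": "Beautiful evening in Wisconsin - THANK YOU!"},
    {"status_message": "Attempting to protect Crooked Hillary, NOT our service members."},
]
result = get_update_with_keywords(sample, ["hillary", "obama"])
print(len(result))
```

Appending the whole dictionary is fine; the empty result in the question comes from the `==` comparison, which is only true when the two lists are identical.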

Tags: python python-3.x


【Solution 1】:

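Use the function below to check whether the keywords are contained in the token list instead, so your

if tokens == keywords:

would become

if sublist(keywords, tokens):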

def sublist(ls1, ls2):
    '''
    >>> sublist([], [1,2,3])
    True
    >>> sublist([1,2,3,4], [2,5,3])
    True
    >>> sublist([1,2,3,4], [0,3,2])
    False
    >>> sublist([1,2,3,4], [1,2,5,6,7,8,5,76,4,3])
    False
    '''
    def get_all_in(one, another):
        # Yield the elements of `one` that also occur in `another`, in order.
        for element in one:
            if element in another:
                yield element

    # The shared elements must appear in the same order in both lists.
    for x1, x2 in zip(get_all_in(ls1, ls2), get_all_in(ls2, ls1)):
        if x1 != x2:
            return False

    return True

Credit: "check list if its a sublist"
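
One behaviour is worth checking before using `sublist` as the filter condition: when the two lists share no elements at all, `zip` has nothing to compare and the function returns True. The sketch below repeats the function from this answer and runs it on three made-up token lists; the token lists and the `has_keyword` guard are assumptions added for illustration, not part of the original answer.

```python
def sublist(ls1, ls2):
    """Same function as in the answer, repeated so this block runs on its own."""
    def get_all_in(one, another):
        for element in one:
            if element in another:
                yield element

    for x1, x2 in zip(get_all_in(ls1, ls2), get_all_in(ls2, ls1)):
        if x1 != x2:
            return False
    return True


keywords = ["clinton", "obama"]

# Shared elements appear in the same order in both lists.
same_order = sublist(keywords, ["clinton", "and", "obama"])
# No shared elements: nothing is compared, so the result is still True.
no_overlap = sublist(keywords, ["make", "america", "great"])
# Shared elements appear in a different order.
reversed_order = sublist(keywords, ["obama", "and", "clinton"])
print(same_order, no_overlap, reversed_order)


def has_keyword(keywords, tokens):
    """Hypothetical guard: require at least one keyword among the tokens."""
    return any(keyword in tokens for keyword in keywords)


print(has_keyword(keywords, ["make", "america", "great"]))
```

If the goal is "the post mentions at least one keyword", a guard like `has_keyword` on its own may be closer to what the assignment asks for.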

【Discussion】:
