【Question title】: Counting occurrences of a string in a column of a csv file
【Posted】: 2016-08-22 16:54:00
【Question】:

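I have a large csv file (over 66k rows) and I want to count how many times a string appears in each row. I am interested in one column in particular: each row of that column contains a short sentence, like this: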

Example of data:
Sam ate an apple and she felt great
Jill thinks the sky is purple but Bob says it's blue
Ralph wants to go apple picking this fall

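I know how to do this for a text file, but I'm having trouble applying the same technique to a csv. I've been using pandas and have tried a few approaches, but they either raise errors or return an empty DataFrame.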

Attempts:
import pandas

my_file = "NEISS2014.csv"
df = pandas.read_csv(my_file)

df.groupby(df['sentence'].map(lambda x:'apple' if 'apple' in x else x)).sum()
df[df['sentence'].str.contains("apple") == True]
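The question mentions already knowing how to do this for a plain text file. As a pandas-free alternative (a swapped-in technique, not one of the attempts above), the same per-line `str.count` idea can be applied to a single CSV column with the standard library's `csv.DictReader`. This is a minimal sketch that assumes the column header is `sentence`. The `id` column, the in-memory `SAMPLE_CSV` text and the helper `count_per_row` are hypothetical stand-ins for the real `NEISS2014.csv`.

```python
import csv
import io

# Hypothetical in-memory stand-in for NEISS2014.csv; with the real file you
# would pass open("NEISS2014.csv", newline="") instead of io.StringIO(...).
SAMPLE_CSV = """id,sentence
1,Sam ate an apple and she felt great
2,Jill thinks the sky is purple but Bob says it's blue
3,Ralph wants to go apple picking this fall
"""

def count_per_row(f, column, word):
    """Return, for each row, how many times `word` occurs in `column` (hypothetical helper)."""
    reader = csv.DictReader(f)  # maps each row to {header: value}
    return [row[column].count(word) for row in reader]

counts = count_per_row(io.StringIO(SAMPLE_CSV), "sentence", "apple")
print(counts)       # [1, 0, 1]
print(sum(counts))  # 2
```

`str.count` here is plain, non-overlapping substring counting, so it behaves the same way it would on the lines of a text file.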

If anyone could help me debug this, I'd really appreciate it!

【Question comments】:

    Tags: python string csv pandas


    【Solution 1】:

    I think you can use str.count on the sentence column:

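    import pandas as pd

    # sample data; with the real file use: df = pd.read_csv("NEISS2014.csv")
    # (row 0 contains 'apple' three times to show the counting)
    df = pd.DataFrame({'sentence': [
        'Sam ate an apple and she felt great apple apple',
        "Jill thinks the sky is purple but Bob says it's blue",
        'Ralph wants to go apple picking this fall']})

    print(df)
    #                                            sentence
    #0    Sam ate an apple and she felt great apple apple
    #1  Jill thinks the sky is purple but Bob says it'...
    #2          Ralph wants to go apple picking this fall

    print(df.columns)
    #Index(['sentence'], dtype='object')

    # str.count returns the number of matches of the pattern in each row
    df['count'] = df['sentence'].str.count('apple')
    print(df)
    #                                            sentence  count
    #0    Sam ate an apple and she felt great apple apple      3
    #1  Jill thinks the sky is purple but Bob says it'...      0
    #2          Ralph wants to go apple picking this fall      1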
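`Series.str.count` interprets its argument as a regular expression and counts every match, so plain `'apple'` also matches inside a longer word such as "pineapple". The sketch below is an illustrative extension, not part of the original answer. It uses a hypothetical sample with a "pineapple" row and compares substring counting with whole-word counting via `\b` word boundaries. It also sums the column to get a total across all rows.

```python
import pandas as pd

# Hypothetical sample; the "pineapple" row shows the difference between
# substring matching and whole-word matching.
df = pd.DataFrame({'sentence': [
    'Sam ate an apple and she felt great apple apple',
    'Ralph wants to go apple picking this fall',
    'Jill bought a pineapple',
]})

# The pattern is a regex; 'apple' matches anywhere, including inside 'pineapple'.
df['substring_count'] = df['sentence'].str.count('apple')
# \b anchors the match at word boundaries, so 'pineapple' is not counted.
df['word_count'] = df['sentence'].str.count(r'\bapple\b')

print(df[['substring_count', 'word_count']])
# Total number of whole-word occurrences across the whole column.
print('total:', df['word_count'].sum())
```

For case-insensitive counting, `str.count` also accepts a `flags` argument, for example `flags=re.IGNORECASE` after `import re`.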
    

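    【Comments】:

    • If my answer helped, don't forget to accept it. Thanks.
    • I tried this before, but it didn't work. When I run the code I get an error: AttributeError: 'DataFrame' object has no attribute 'sentence'
    • Try changing df.sentence to df['sentence']
    • df['sentence'].str.count("apple") finally worked! I'm not sure why I needed double quotes around apple, but that seems to have fixed it! Thanks!
    • @CjV - Use L = ['apple','orange','pear'] and then df['count'] = df['sentence'].str.count('|'.join(L))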
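Two notes on the comments above. In Python, `'apple'` and `"apple"` are the same string, so the quote style alone does not change the result. The last comment works because `'|'.join(L)` builds the regex `apple|orange|pear`, and `str.count` counts matches of any alternative. The following minimal sketch expands that idea. It uses a hypothetical sample and adds `re.escape` as a precaution, in case a search term ever contains regex metacharacters such as `.` or `+`. It also shows an optional per-word breakdown.

```python
import re
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'sentence': [
    'Sam ate an apple and she felt great',
    'Ralph wants to go apple picking this fall',
    'Jill had an orange and a pear',
]})

L = ['apple', 'orange', 'pear']

# Combined count: the pattern becomes 'apple|orange|pear'.
# re.escape leaves plain words unchanged but protects regex metacharacters.
pattern = '|'.join(re.escape(w) for w in L)
df['count'] = df['sentence'].str.count(pattern)

# Optional: one count column per word.
for w in L:
    df[w] = df['sentence'].str.count(re.escape(w))

print(df)
```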