【Question title】:Remove entire rows from df if the word occurs
【Posted】:2022-11-30 22:45:18
【Question description】:

Stop-word list:

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was" ]

df:

words frequency
the company 10
green energy 9
founded in 8
gases for 8
electricity 5

I want to delete the entire row if it contains any of the given stop words. In this example the output should be:

words frequency
green energy 9
electricity 5

【Question comments】:

    Tags: python pandas dataframe


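    【Solution 1】:

    The | character has a special meaning: in a regular expression (which str.contains uses by default) it means "or", so you need to escape it before you can use it in the stop-word list. You escape it with a backslash (see more here).

    That said, you can do this: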

    import re
    stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]
    pattern = '|'.join(re.escape(w) for w in stop_w)  # escape regex metacharacters such as "|"
    df.loc[~df['words'].str.contains(pattern)]
    

    Prints:

              words  frequency
    1  green energy          9
    4   electricity          5
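
The escaping point above can be checked directly. The sketch below rebuilds the question's DataFrame and compares three masks: an unescaped join, an escaped join, and a whole-token match. The whole-token variant is not part of the original answer; it is added here on the assumption that "contains a stop word" means a whitespace-separated token, since substring matching would also drop a hypothetical row such as "solar panels" (it contains "a").

```python
import re
import pandas as pd

df = pd.DataFrame({
    "words": ["the company", "green energy", "founded in", "gases for", "electricity"],
    "frequency": [10, 9, 8, 8, 5],
})
stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

# Unescaped join: the "|" entry produces an empty alternative ("the|||and"),
# and an empty alternative matches every string, so every row is flagged.
naive = df["words"].str.contains("|".join(stop_w))

# Escaped join: each stop word is matched literally, but still as a substring.
escaped = df["words"].str.contains("|".join(re.escape(w) for w in stop_w))

# Whole-token match (an assumption about the intent): split on whitespace
# and check whether any token is in the stop-word set.
stop_set = set(stop_w)
has_stop = df["words"].apply(lambda s: not stop_set.isdisjoint(s.split()))
result = df[~has_stop]
print(result)
```

For the sample data the escaped and whole-token masks agree; they would diverge as soon as a stop word appears inside a longer word.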
    

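    【Comments】:

      【Solution 2】:

      You can create sub_df like this:

      mask = df["words"].str.split().apply(lambda toks: any(t in stop_w for t in toks))  # True if any token is a stop word
      sub_df = df[~mask]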
      

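      Or get the index labels of the rows you want to delete:

      idx = df[df["words"].str.split().apply(lambda toks: any(t in stop_w for t in toks))].index
      df.drop(idx)  # returns a new DataFrame; df itself is unchanged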
      

      https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
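
As a variation on the index-based approach, the following is a minimal sketch that avoids the Python-level lambda by exploding the tokens and using isin. It assumes the same sample data as the question and that stop words should match whole whitespace-separated tokens; the names tokens, idx and sub_df are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "words": ["the company", "green energy", "founded in", "gases for", "electricity"],
    "frequency": [10, 9, 8, 8, 5],
})
stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

# One row per token; the original row label is repeated in the index.
tokens = df["words"].str.split().explode()

# Labels of rows in which at least one token is a stop word.
idx = tokens.index[tokens.isin(stop_w).to_numpy()].unique()

# drop returns a new DataFrame and leaves df untouched.
sub_df = df.drop(idx)
print(sub_df)
```

Because no regular expression is involved here, the "&" and "|" entries need no escaping.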

      【Comments】:
