【Question title】: Parse specific text from CSV rows into a new column using Python
【Posted】: 2020-09-08 20:44:01
【Question】:

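I have a CSV file with a field that contains blocks of text, and I want to extract part of that text into a new column. For example, my CSV looks like this: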

    house, paint, status-text
    house1, green, this house is nice it gets a status of result: PASS this is good
    house2, red, this house is not too nice it gets a status of result: FAIL this is bad
    house3, blue, this house is the best it gets a status of result: PASS this is great

I want to run a simple regex that extracts the status from "result: PASS" or "result: FAIL" into a new column, so the CSV then looks like this:

    house, paint, status-text, status
    house1, green, this house is nice it gets a status of result: PASS this is good, PASS
    house2, red, this house is not too nice it gets a status of result: FAIL this is bad, FAIL
    house3, blue, this house is the best it gets a status of result: PASS this is great, PASS

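I was thinking of using a Pandas DataFrame, but I'm not sure how to parse out PASS/FAIL and move it into its own column. The sample has 3 rows, but the real file may grow to hundreds. Any example of how to do this on a small sample would be much appreciated.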
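A minimal sketch of the regex approach described above, assuming the data has the same shape as the sample: pandas' `Series.str.extract` copies the `PASS`/`FAIL` capture group into a new column. The sample is read from an in-memory string via `io.StringIO` only for illustration; with a real file you would pass its path to `pd.read_csv` instead. `skipinitialspace=True` is an assumption about the file: it drops the space after each comma so the header is `status-text` rather than ` status-text`.

```python
import io

import pandas as pd

# Sample data from the question; with a real file, pass its path to read_csv
SAMPLE = """house, paint, status-text
house1, green, this house is nice it gets a status of result: PASS this is good
house2, red, this house is not too nice it gets a status of result: FAIL this is bad
house3, blue, this house is the best it gets a status of result: PASS this is great
"""

# skipinitialspace drops the blank after each comma in headers and values
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)

# The capture group keeps only the word after "result:"; rows with no match get NaN
df["status"] = df["status-text"].str.extract(r"result:\s*(PASS|FAIL)", expand=False)

print(df[["house", "status"]])

# Render the table, including the new column, back out as CSV text
print(df.to_csv(index=False))
```

Because the pattern is anchored on `result:`, a stray "PASS" elsewhere in the text is not picked up, and the same code works unchanged for hundreds of rows.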

【Question comments】:

    Tags: python-3.x pandas csv parsing


    【Solution 1】:

    You can load the CSV into a pandas DataFrame and then do this:

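        import numpy as np
        import pandas as pd
    
        # "houses.csv" is a placeholder for your file; skipinitialspace strips the
        # blank after each comma so the column is named "status-text"
        df = pd.read_csv("houses.csv", skipinitialspace=True)
    
        conditions = [
            df["status-text"].str.contains("result: PASS"),
            df["status-text"].str.contains("result: FAIL"),
        ]
    
        choices = ["PASS", "FAIL"]
    
        # Rows that match neither condition get the default (None)
        df["status"] = np.select(conditions, choices, default=None)
    
        print(df)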
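To see what `default=None` does, here is a small self-contained check. The rows are made-up data, and the third one (`result: UNKNOWN`) is hypothetical and deliberately matches neither condition: `np.select` takes the choice of the first true condition and falls back to the default otherwise, so that row gets `None` instead of being forced into PASS or FAIL.

```python
import numpy as np
import pandas as pd

# Made-up rows; the last one matches neither condition on purpose
df = pd.DataFrame({
    "status-text": [
        "it gets a status of result: PASS this is good",
        "it gets a status of result: FAIL this is bad",
        "it gets a status of result: UNKNOWN",
    ]
})

conditions = [
    df["status-text"].str.contains("result: PASS"),
    df["status-text"].str.contains("result: FAIL"),
]
choices = ["PASS", "FAIL"]

# First true condition wins; rows with no true condition get the default
status = np.select(conditions, choices, default=None)
df["status"] = status

print(status.tolist())  # ['PASS', 'FAIL', None]
```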
    

    【Discussion】:

    • Wow, this is a great, easy-to-understand, and elegant solution. I was able to integrate it into my POC. Thanks a lot!

    【Solution 2】:

    You can use np.where:

    import numpy as np
    
    # Any row whose text does not contain 'PASS' is labelled 'FAIL'
    df['status'] = np.where(df["status-text"].str.contains('PASS'), 'PASS', 'FAIL')
    df
        house   paint                                        status-text status
    0  house1   green   this house is nice it gets a status of result...   PASS
    1  house2     red   this house is not too nice it gets a status o...   FAIL
    2  house3    blue   this house is the best it gets a status of re...   PASS
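A sketch of the trade-off with the single `np.where` above, on made-up rows: it labels every row without "PASS" as FAIL, including rows that record no result at all. Nesting a second `np.where` and matching on the `result:` prefix keeps such rows distinguishable; the `"UNKNOWN"` label is a hypothetical choice for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Made-up rows: the last one has no PASS/FAIL result at all
text = pd.Series([
    "result: PASS this is good",
    "result: FAIL this is bad",
    "no result recorded",
])

# Single np.where: anything without "PASS" becomes "FAIL"
loose = np.where(text.str.contains("PASS"), "PASS", "FAIL")

# Nested np.where: only an explicit "result: FAIL" becomes "FAIL"
strict = np.where(
    text.str.contains("result: PASS"), "PASS",
    np.where(text.str.contains("result: FAIL"), "FAIL", "UNKNOWN"),
)

print(loose.tolist())   # ['PASS', 'FAIL', 'FAIL']
print(strict.tolist())  # ['PASS', 'FAIL', 'UNKNOWN']
```

If every row is guaranteed to contain exactly one of the two results, the single `np.where` is enough.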
    

    【Discussion】:
