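【Posted】:2019-11-02 20:15:32
【Question】:
Suppose a cell in my dataset (a CSV file) contains the following text:
I want to extract the word or phrase that follows each of the keywords Decision and reason. When the value sits on the same line as the keyword, I can do it like this: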
import pandas as pd

text = '''Decision: Postpone\n\nreason:- medical history - information obtained from attending physician\n\nto review with current assessment from Dr Cynthia Dominguez regarding medical history, and current CBC showing actual number of platelet count\n\nmib: F\n'''
keywords = ['decision', 'reason']
# One row, one column per keyword; 0 marks "not found", object dtype allows strings
new_df = pd.DataFrame(0, index=[0], columns=keywords, dtype=object)
for cell in text.split('\n'):
    for keyword in keywords:
        # Case-insensitive keyword match on a line shaped like "key: value"
        if keyword in cell.lower() and ':' in cell:
            # Keep the text after the first colon; .loc avoids chained assignment
            new_df.loc[0, keyword] = cell.split(':', 1)[1]
new_df
In some cells, however, the word or phrase starts on a new line after the keyword, and in that case this program fails to extract it:
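import pandas as pd

text = '''Decision: Postpone\n\nreason: \n- medical history \n- information obtained from attending physician\n\nto review with current assessment from Dr Cynthia Dominguez regarding medical history, and current CBC showing actual number of platelet count\n\nmib: F\n'''
keywords = ['decision', 'reason']
# One row, one column per keyword; 0 marks "not found", object dtype allows strings
new_df = pd.DataFrame(0, index=[0], columns=keywords, dtype=object)
for cell in text.split('\n'):
    for keyword in keywords:
        # Case-insensitive keyword match on a line shaped like "key: value"
        if keyword in cell.lower() and ':' in cell:
            # For "reason: " only the blank rest of that line is stored;
            # the bullet lines that follow are never looked at
            new_df.loc[0, keyword] = cell.split(':', 1)[1]
new_df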
How can I fix this?
【Discussion】:
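One possible approach, offered as a sketch and not as a definitive answer: instead of looking at each line in isolation, remember which keyword was seen last and keep appending the following lines to it until the section ends. The helper name `extract_fields` is made up for this illustration. It assumes that a section ends at the first blank line or at the next line shaped like `name: ...`, which fits the two sample texts in the question but may not fit every cell in the real data.

```python
import re

def extract_fields(text, keywords):
    """Collect the text after each keyword, including continuation lines.

    Assumption: a section ends at a blank line or at the next "name:" line.
    """
    # A header line is a single word followed by a colon, e.g. "reason: ..."
    header = re.compile(r'^\s*(\w+)\s*:(.*)$')
    parts = {k.lower(): [] for k in keywords}
    current = None  # keyword whose section is currently being read
    for line in text.split('\n'):
        match = header.match(line)
        if match:
            name = match.group(1).lower()
            # Start a new section; headers we do not want (e.g. "mib") close it
            current = name if name in parts else None
            line = match.group(2)  # text on the same line, after the colon
        elif not line.strip():
            current = None  # a blank line closes the current section
            continue
        if current is not None and line.strip():
            parts[current].append(line.strip())
    # Join multi-line values into a single string per keyword
    return {k: ' '.join(v) for k, v in parts.items()}


text = '''Decision: Postpone\n\nreason: \n- medical history \n- information obtained from attending physician\n\nto review with current assessment from Dr Cynthia Dominguez regarding medical history, and current CBC showing actual number of platelet count\n\nmib: F\n'''
fields = extract_fields(text, ['decision', 'reason'])
print(fields)
```

The returned dictionary has one entry per keyword, so it can be turned into a one-row frame with `pd.DataFrame([fields])`. Keywords that never appear map to an empty string here, where the original code left a `0`.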
Tags: python pandas text pattern-matching text-processing