【Posted】: 2019-12-24 04:43:09
【Question】:
Background
I have the following df
import pandas as pd
df = pd.DataFrame({'Text' : ['But the here is \nBase ID: 666666 \nDate is Here 123456 ',
'999998 For \nBase ID: 123456 \nDate there',
'So so \nBase ID: 939393 \nDate hey the 123455 ',],
'ID': [1,2,3],
'P_ID': ['A','B','C'],
})
Output
ID P_ID Text
0 1 A But the here is \nBase ID: 666666 \nDate is Here 123456
1 2 B 999998 For \nBase ID: 123456 \nDate there
2 3 C So so \nBase ID: 939393 \nDate hey the 123455
Tried
I tried the following to **BLOCK** the 6-digit number between \nBase ID: and \nDate
df['New_Text'] = df['Text'].str.replace('ID:(.+?)','ID:**BLOCK**')
I get the following
ID P_ID Text New_Text
0 But the here is \nBase ID:**BLOCK**666666 \nDate is Here 123456
1 999998 For \nBase ID:**BLOCK**123456 \nDate there
2 So so \nBase ID:**BLOCK**939393 \nDate hey the 123455
But this is not quite what I want
Desired output
ID P_ID Text New_Text
0 But the here is \nBase ID:**BLOCK** \nDate is Here 123456
1 999998 For \nBase ID:**BLOCK** \nDate there
2 So so \nBase ID:**BLOCK** \nDate hey the 123455
Question
How do I tweak the str.replace('ID:(.+?)','ID:**BLOCK**') part of my code to get the desired output?
【Comments】:
-
Try
ID:\s*(\S+)
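
To see why this pattern helps, here is a minimal sketch that uses the standard re module on the three strings from the question (the names texts, lazy and masked are made up for illustration). The lazy group (.+?) in the original attempt is satisfied by a single character, so it only consumes the space after "ID:" and leaves the digits behind. \s*\S+ skips any whitespace and then consumes the whole run of non-whitespace characters. The capture group in the suggested pattern is dropped here because the replacement never refers to it.

```python
import re

# The three 'Text' values from the question's DataFrame.
texts = [
    'But the here is \nBase ID: 666666 \nDate is Here 123456 ',
    '999998 For \nBase ID: 123456 \nDate there',
    'So so \nBase ID: 939393 \nDate hey the 123455 ',
]

# Original attempt: (.+?) is lazy, so it matches exactly one character
# (the space after "ID:") and the 6 digits are left in place.
lazy = [re.sub(r'ID:(.+?)', 'ID:**BLOCK**', t) for t in texts]

# Suggested pattern: optional whitespace, then the full run of
# non-whitespace characters, which here is the 6-digit number.
masked = [re.sub(r'ID:\s*\S+', 'ID:**BLOCK**', t) for t in texts]

for before, after in zip(lazy, masked):
    print(repr(before))
    print(repr(after))
```

With the df from the question, the same pattern should work as df['Text'].str.replace(r'ID:\s*\S+', 'ID:**BLOCK**', regex=True). Passing regex=True explicitly matters on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.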
Tags: python regex pandas text replace