【Question title】: Alter number string in pandas column
【Posted】: 2019-11-23 16:32:57
【Question description】:

Background

I have a sample df whose Text column contains zero, one, or more than one ABC number per row

import pandas as pd

# Sample frame: the Text column holds zero, one, or several "ABC: <number>" tokens
df = pd.DataFrame({
    'Text': ['Jon J Smith  ABC: 1111111 is this here',
             'ABC: 1234567 Mary Lisa Rider found here',
             'Jane A Doe is also here',
             'ABC: 2222222 Tom T Tucker is here ABC: 2222222 too'],
    'P_ID': [1, 2, 3, 4],
    'N_ID': ['A1', 'A2', 'A3', 'A4'],
})

# rearrange columns
df = df[['Text', 'N_ID', 'P_ID']]
df

                            Text                      N_ID  P_ID
0   Jon J Smith ABC: 1111111 is this here               A1  1
1   ABC: 1234567 Mary Lisa Rider found here             A2  2
2   Jane A Doe is also here                             A3  3
3   ABC: 2222222 Tom T Tucker is here ABC: 2222222...   A4  4

Goal

1) Change each ABC number in the Text column (e.g. ABC: 1111111) to ABC: **BLOCK**

2) Create a new column Text_ABC that contains this output

Desired output

    Text                                                 N_ID   P_ID   Text_ABC
0   Jon J Smith ABC: 1111111 is this here                A1     1      Jon J Smith ABC: **BLOCK** is this here
1   ABC: 1234567 Mary Lisa Rider found here              A2     2      ABC: **BLOCK** Mary Lisa Rider found here
2   Jane A Doe is also here                              A3     3      Jane A Doe is also here
3   ABC: 2222222 Tom T Tucker is here ABC: 2222222 too   A4     4      ABC: **BLOCK** Tom T Tucker is here ABC: **BLOCK** too

Question

How do I achieve my desired output?

【Discussion】:

    Tags: python string pandas text apply


    【Solution 1】:

    If you want to replace every run of digits, you can do this:

    df['Text_ABC'] = df['Text'].replace(r'\d+', '***BLOCK***', regex=True)
    

    But if you want to be more specific and replace only the digits that follow ABC:, you can use this instead:

    df['Text_ABC'] = df['Text'].replace(r'ABC: \d+', 'ABC: ***BLOCK***', regex=True)
    

    This gives you:

    df
                                                    Text  P_ID N_ID                                           Text_ABC
    0             Jon J Smith  ABC: 1111111 is this here     1   A1           Jon J Smith  ABC: ***BLOCK*** is this here
    1            ABC: 1234567 Mary Lisa Rider found here     2   A2          ABC: ***BLOCK*** Mary Lisa Rider found here
    2                            Jane A Doe is also here     3   A3                            Jane A Doe is also here
    3  ABC: 2222222 Tom T Tucker is here ABC: 2222222...     4   A4  ABC: ***BLOCK*** Tom T Tucker is here ABC: ***BLOCK...
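
To check the result end to end, here is a minimal sketch that rebuilds only the Text column (values as shown in the output above) and compares `Series.replace` with `Series.str.replace`. The variable names `via_replace` and `via_str_replace` are illustrative and not part of the original answer.

```python
import pandas as pd

# Text values as displayed in the answer's output; the other columns are omitted.
df = pd.DataFrame({
    'Text': [
        'Jon J Smith  ABC: 1111111 is this here',
        'ABC: 1234567 Mary Lisa Rider found here',
        'Jane A Doe is also here',
        'ABC: 2222222 Tom T Tucker is here ABC: 2222222 too',
    ]
})

# Series.replace with regex=True, as in the answer.
via_replace = df['Text'].replace(r'ABC: \d+', 'ABC: ***BLOCK***', regex=True)

# Series.str.replace is an alternative spelling. regex=True is passed explicitly
# because the default for that parameter differs between pandas versions.
via_str_replace = df['Text'].str.replace(r'ABC: \d+', 'ABC: ***BLOCK***', regex=True)

df['Text_ABC'] = via_replace
print(df['Text_ABC'].tolist())
```

Both calls should produce the same column here; rows without an ABC number pass through unchanged.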
    

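    As a regular expression, \d+ means "match one or more consecutive digits", so using it in replace means "replace each run of one or more consecutive digits with ***BLOCK***".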
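
The difference between the two patterns can be seen with the standard `re` module alone. This is a small sketch; the string `mixed` (with its "Room 12") is a made-up example, not data from the question.

```python
import re

text = 'ABC: 2222222 Tom T Tucker is here ABC: 2222222 too'

# \d+ matches each maximal run of consecutive digits.
digit_runs = re.findall(r'\d+', text)
print(digit_runs)  # ['2222222', '2222222']

# Hypothetical string containing a number that is not an ABC number.
mixed = 'Room 12 ABC: 1111111'

# The bare pattern replaces every digit run, including the unrelated "12".
all_blocked = re.sub(r'\d+', '***BLOCK***', mixed)
print(all_blocked)  # Room ***BLOCK*** ABC: ***BLOCK***

# Anchoring on the literal prefix "ABC: " leaves other digits alone.
abc_blocked = re.sub(r'ABC: \d+', 'ABC: ***BLOCK***', mixed)
print(abc_blocked)  # Room 12 ABC: ***BLOCK***
```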

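    【Comments】:

    • Thanks for the explanation of \d+. Question: if we were instead trying to replace an MRN string that is "followed by one or more consecutive characters (not digits)", e.g. MRN: Jon J Smith, what would the regex analogous to \d+ be?
    • MRN.* will match MRN and, I think, all of the characters that follow it
    • I tried that, but it didn't quite work. I'll just ask another SO question. Thanks anyway!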
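
One possible reason MRN.* is unsatisfying is that .* is greedy and runs to the end of the string. The sketch below illustrates this on a made-up sentence and shows one bounded alternative. The sample text, the `name_pattern` variable, and the assumption that a name is a run of capitalized words are all hypothetical and do not come from the thread.

```python
import re

# Hypothetical sample; the MRN format is assumed, not taken from the thread.
text = 'MRN: Jon J Smith is here'

# Greedy .* runs to the end of the string, so the trailing words are lost too.
greedy = re.sub(r'MRN.*', 'MRN: ***BLOCK***', text)
print(greedy)  # MRN: ***BLOCK***

# Assumption: the name is a run of space-separated capitalized words.
name_pattern = r'MRN: [A-Z]\w*(?: [A-Z]\w*)*'
bounded = re.sub(name_pattern, 'MRN: ***BLOCK***', text)
print(bounded)  # MRN: ***BLOCK*** is here
```

A pattern like this only works when the text after the name starts with a lowercase word, so real data would likely need a different boundary.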