【Question title】:Create Variable of Extracted Substring Before and After Character
【Posted】:2021-08-03 13:30:46
【Question description】:

In a data frame, I have a variable 'RESULT' that contains a large amount of text that looks like this:

Transfer Handoff Entered On:  07/25/2021 2:45 EDT    Performed On:  07/25/2021 2:45 EDT by 
LAST, FIRST RN  Handoff   Clinician Relationship to Patient :   Nurse   
Clinician Receiving Report :   Handoff Transfer Type :   Intra-hospital   Sending Unit :   EMTC   Date, Time Report Given :   07/25/2021 2:45 EDT   
Receiving Unit :   4 Main Access   Intrahospital Transfer Mode :   Wheelchair   Transfer 
Notifications :   Father   Transfer Note :   report given via IPASS method    

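I would like to extract the text after "Receiving Unit :" and before "Intrahospital" into its own separate column, producing a data frame like this: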

PT_FIN       RESULT                              RECEIVING_UNIT
124324        *All of the text from above*        4 Main Access

I did some research and found a number of similar posts, but I am not quite sure how to get working code out of them. One of them:

Extract elements from data column (String) before and after character

【Discussion】:

    Tags: python pandas


    【Solution 1】:

    You can use the pandas function str.extract(), as follows:

    df['RECEIVING_UNIT'] = df['RESULT'].str.extract(r'Receiving Unit\s*:(.*?)\s*Intrahospital')
    

    Demo

    import pandas as pd

    df = pd.DataFrame({'RESULT': ['Receiving Unit :   4 Main Access   Intrahospital Transfer Mode :   Wheelchair   Transfer ']})
    
    df['RECEIVING_UNIT'] = df['RESULT'].str.extract(r'Receiving Unit\s*:(.*?)\s*Intrahospital')
    
    print(df)
    
                                                                                          RESULT    RECEIVING_UNIT
    0  Receiving Unit :   4 Main Access   Intrahospital Transfer Mode :   Wheelchair   Transfer      4 Main Access
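
One detail worth noting about the pattern above: the capture group starts immediately after the colon, so the extracted value keeps the spaces that follow it. The sketch below is a variant, not part of the original answer, that adds `\s*` after the colon to drop them and passes `expand=False` so a Series is returned. The second row and the PT_FIN value 124325 are made up for illustration; they show that rows without a match come back as NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical two-row frame: the first row contains both labels, the second does not.
df = pd.DataFrame({
    "PT_FIN": [124324, 124325],
    "RESULT": [
        "Sending Unit :   EMTC   Receiving Unit :   4 Main Access   "
        "Intrahospital Transfer Mode :   Wheelchair",
        "Transfer Note :   report given via IPASS method",
    ],
})

# \s* on both sides of the group keeps surrounding whitespace out of the capture.
pattern = r"Receiving Unit\s*:\s*(.*?)\s*Intrahospital"

# expand=False returns a Series when the pattern has a single capture group.
df["RECEIVING_UNIT"] = df["RESULT"].str.extract(pattern, expand=False)

receiving_units = df["RECEIVING_UNIT"].tolist()
```

If the real text can contain line breaks between the two labels, passing `flags=re.DOTALL` to `str.extract` may be needed, since `.` does not match newlines by default.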
    
    

    【Comments】:

      【Solution 2】:

      You can use a regular expression to match and extract the text in between.

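      import re

      def get_text(txt):
          # Capture the text between the two labels, without surrounding whitespace
          match = re.search(r'Receiving Unit\s*:\s*(.*?)\s*Intrahospital Transfer Mode', txt)
          # re.search returns None when a row has no match; avoid calling .group on it
          return match.group(1) if match else None

      df['RECEIVING_UNIT'] = [get_text(x) for x in df['RESULT']]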
      

      If your df is large, SeaBean's solution will probably be more efficient.

      【Comments】:
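
To check that the row-by-row approach and `str.extract` agree before choosing between them, something like the following sketch could be used. The names `UNIT_RE` and `get_unit` and the three sample rows are hypothetical, and the helper assumes that missing or non-string cells should yield None.

```python
import re

import pandas as pd

# Compile once so the pattern is not re-parsed for every row.
UNIT_RE = re.compile(r"Receiving Unit\s*:\s*(.*?)\s*Intrahospital Transfer Mode")


def get_unit(txt):
    # Non-string cells (e.g. NaN) and rows without both labels yield None.
    if not isinstance(txt, str):
        return None
    match = UNIT_RE.search(txt)
    return match.group(1) if match else None


# Hypothetical sample: one matching row, one non-matching row, one missing value.
df = pd.DataFrame({"RESULT": [
    "Receiving Unit :   4 Main Access   Intrahospital Transfer Mode :   Wheelchair",
    "Transfer Note :   report given via IPASS method",
    None,
]})

looped = df["RESULT"].map(get_unit)  # Python-level call per row
vectorized = df["RESULT"].str.extract(UNIT_RE.pattern, expand=False)  # pandas string method
```

Both results hold the same value for the first row and missing values for the other two; timing each on a sample of the real data would show whether the difference matters in practice.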
