【Question title】: Pandas - Split data and create columns when string occurs
【Posted】: 2021-07-08 14:24:07
【Question】:

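I want to read a text file (see below) and create columns only for the English leagues. The idea is to find every alias whose name starts with "England_", create a new column with that alias as its header, and put the player names in the rows below it. Note that the first alias appears in the text file as "Aliases for" rather than "Alias name".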

"-----------------------------------------------------------------------------------------------------------" 
"-                                            NEW TEAM                                                    -" 
"-----------------------------------------------------------------------------------------------------------" 
Europe Players
17/04/2019
07:59 p.m.

Aliases for England_Premier League

-------------------------------------------------------------------------------
Harry Kane
Mohamed Salah
Kevin De Bruyne

The command completed successfully.

Alias name     England_Division 1
Comment        Teams

Members

-------------------------------------------------------------------------------
Will Grigg
Jonson Clarke-Harris
Jerry Yates
Ivan Toney
Troy Parrott
The command completed successfully.

Alias name     Spanish La Liga
Comment        

Members

-------------------------------------------------------------------------------
Lionel Messi
Luis Suarez
Cristiano Ronaldo
Sergio Ramos
The command completed successfully.

Alias name     England_Division 2
Comment        

Members

-------------------------------------------------------------------------------
Eoin Doyle
Matt Watters
James Vughan
The command completed successfully.

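Here is my current code for reading the data:

import pandas as pd

df = pd.read_csv(r'Desktop\SampleData.txt', sep='\n', header=None)

This gives me a pandas DataFrame with a single column. I am fairly new to Python, so how can I get the result below? Should I use a delimiter when reading the file?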

England_Premier League    England_Division 1      England_Division 2
Harry Kane                Will Grigg              Eoin Doyle
Mohamed Salah             Jonson Clarke-Harris    Matt Watters
Kevin De Bruyne           Jerry Yates             James Vughan
                          Ivan Toney
                          Troy Parrott

【Comments on the question】:

    Tags: python pandas string text


    【Solution 1】:

    You can use the re module for this task. For example:

    import re
    import pandas as pd
    
    
    txt = """
    "-----------------------------------------------------------------------------------------------------------" 
    "-                                            NEW TEAM                                                    -" 
    "-----------------------------------------------------------------------------------------------------------" 
    Europe Players
    17/04/2019
    07:59 p.m.
    
    Aliases for England_Premier League
    
    -------------------------------------------------------------------------------
    Harry Kane
    Mohamed Salah
    Kevin De Bruyne
    
    The command completed successfully.
    
    Alias name     England_Division 1
    Comment        Teams
    
    Members
    
    -------------------------------------------------------------------------------
    Will Grigg
    Jonson Clarke-Harris
    Jerry Yates
    Ivan Toney
    Troy Parrott
    The command completed successfully.
    
    Alias name     Spanish La Liga
    Comment        
    
    Members
    
    -------------------------------------------------------------------------------
    Lionel Messi
    Luis Suarez
    Cristiano Ronaldo
    Sergio Ramos
    The command completed successfully.
    
    Alias name     England_Division 2
    Comment        
    
    Members
    
    -------------------------------------------------------------------------------
    Eoin Doyle
    Matt Watters
    James Vughan
    The command completed successfully.
    """
    
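    # Alias line: "Aliases for <name>" or "Alias name <name>"; capture <name>.
    # re.M makes ^ and $ match at the start and end of every line.
    r_competitions = re.compile(r"^Alias(?:es for| name)\s*(.*?)$", flags=re.M)
    # Member block: everything between a line of dashes and "The command".
    # re.S lets . match newlines, so the group can span several lines.
    r_names = re.compile(r"^-+$\s*(.*?)\s*The command", flags=re.M | re.S)
    
    dfs = []
    # Pair each alias with its member block, in order of appearance.
    for comp, names in zip(r_competitions.findall(txt), r_names.findall(txt)):
        # Keep only the English leagues.
        if "England" not in comp:
            continue
        # One single-column DataFrame per league, one row per player.
        data = [{comp: n} for n in names.split("\n")]
        dfs.append(pd.DataFrame(data))
    
    # Place the columns side by side; shorter columns are padded with "".
    print(pd.concat(dfs, axis=1).fillna(""))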
    

    Prints:

      England_Premier League    England_Division 1 England_Division 2
    0             Harry Kane            Will Grigg         Eoin Doyle
    1          Mohamed Salah  Jonson Clarke-Harris       Matt Watters
    2        Kevin De Bruyne           Jerry Yates       James Vughan
    3                                   Ivan Toney                   
    4                                 Troy Parrott                   
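
The same idea can be wrapped in a small function that filters on the "England_" prefix mentioned in the question and builds the frame from a dict of Series. This is only a sketch: the helper name leagues_to_frame, its prefix parameter, and the shortened sample text are assumptions made for illustration, not part of the original answer.

```python
import re

import pandas as pd

# Shortened sample in the same layout as the file from the question.
sample = """Aliases for England_Premier League

-------------------------------------------------------------------------------
Harry Kane
Mohamed Salah

The command completed successfully.

Alias name     Spanish La Liga
Comment

Members

-------------------------------------------------------------------------------
Lionel Messi
The command completed successfully.

Alias name     England_Division 2
Comment

Members

-------------------------------------------------------------------------------
Eoin Doyle
Matt Watters
James Vughan
The command completed successfully.
"""


def leagues_to_frame(text, prefix="England_"):
    """Hypothetical helper: one column per alias whose name starts with prefix."""
    r_competitions = re.compile(r"^Alias(?:es for| name)\s*(.*?)$", flags=re.M)
    r_names = re.compile(r"^-+$\s*(.*?)\s*The command", flags=re.M | re.S)

    columns = {}
    for comp, names in zip(r_competitions.findall(text), r_names.findall(text)):
        comp = comp.strip()
        if not comp.startswith(prefix):
            continue
        # Drop empty lines and surrounding whitespace from the member block.
        players = [n.strip() for n in names.splitlines() if n.strip()]
        columns[comp] = pd.Series(players)

    # Series of different lengths are aligned on their index; gaps become "".
    return pd.DataFrame(columns).fillna("")


df = leagues_to_frame(sample)
print(df)
```

Using startswith instead of a substring test keeps out any alias that merely contains the word "England" somewhere in its name, which seems closer to what the question describes.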
    

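    【Discussion】:

    • Great answer.
    • @andrej This looks perfect. I just have a couple of questions, if you don't mind. Is "txt" what you call the original text file? And if it isn't too much trouble, could you add a few simple comments so I can see exactly what each line of code is doing?
    • @PythonBeginner txt is just a variable name. You can load the file with, for example, txt = open("your_file.txt", "r").read()
    • @AndrejKesely Thanks. I'll have to do some research on the for loop now to see how it extracts the data.
    • @AndrejKesely Would you mind walking me through the re.compile calls that are used?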
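
As a rough walkthrough of the two re.compile calls asked about in the comments, the sketch below runs both patterns on a shortened sample and shows what each one captures. The sample text is an assumption made for illustration; the patterns are the ones from the answer, with a redundant inner group removed.

```python
import re

# Shortened sample in the same layout as the file from the question.
sample = """Aliases for England_Premier League

-------------------------------------------------------------------------------
Harry Kane
Mohamed Salah

The command completed successfully.

Alias name     England_Division 1
Comment        Teams

Members

-------------------------------------------------------------------------------
Will Grigg
Ivan Toney
The command completed successfully.
"""

# re.compile turns a pattern string into a reusable pattern object.
# With re.M, ^ and $ match at the start and end of every line, so this
# finds lines starting with "Aliases for" or "Alias name" and captures
# the rest of the line (the alias name).
r_competitions = re.compile(r"^Alias(?:es for| name)\s*(.*?)$", flags=re.M)

# With re.S, "." also matches newlines, so the group can span several
# lines: everything between a line made only of dashes and the text
# "The command", with surrounding whitespace left out of the group.
r_names = re.compile(r"^-+$\s*(.*?)\s*The command", flags=re.M | re.S)

# findall returns the captured group for every match, in file order.
competitions = r_competitions.findall(sample)
blocks = r_names.findall(sample)

print(competitions)  # ['England_Premier League', 'England_Division 1']
print(blocks)        # ['Harry Kane\nMohamed Salah', 'Will Grigg\nIvan Toney']

# zip pairs the first alias with the first block, the second with the
# second, and so on; splitting a block on newlines gives the players.
for comp, block in zip(competitions, blocks):
    print(comp, block.split("\n"))
```

The pairing relies on every alias line being followed by exactly one dashed member block; if a file breaks that pattern, the two lists would drift out of step.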