
[Question title]: Split a pandas column and append the new results to the dataframe
[Posted]: 2018-01-03 17:23:03
[Question]:

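How can I split a pandas column and append the new results to the dataframe? I'd also like the results to have no leading or trailing whitespace.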

Example of the output I want:

col1         col2   col3
Smith, John  Smith  John
Smith, John  Smith  John

I've been trying this, but the lambda functions don't assign the results the way I want:

df_split = df1['col1'].apply(lambda x: pd.Series(x.split(',')))
df1['col2']= df_split.apply(lambda x: x[0])
df1['col3']= df_split.apply(lambda x: x[1])

What I end up with instead:

col2  col3
Smith Smith
John  John
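The output above is what DataFrame.apply does by default: it calls the lambda once per column (axis=0), so x[0] picks row 0 of each column of df_split rather than piece 0 of each row, and the result ends up indexed by df_split's column labels. A minimal sketch that keeps the lambda approach but applies it element-wise to the original Series, assuming a two-row df1 like the example, looks like this:

```python
import pandas as pd

# Sample data matching the desired output in the question
df1 = pd.DataFrame({"col1": ["Smith, John", "Smith, John"]})

# Series.apply calls the lambda once per value (not once per column),
# and strip() removes the space left after the comma
df1["col2"] = df1["col1"].apply(lambda x: x.split(",")[0].strip())
df1["col3"] = df1["col1"].apply(lambda x: x.split(",")[1].strip())

print(df1)
```

The vectorised .str methods in the answers below do the same thing without a Python-level lambda.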

[Question discussion]:

Tags: python pandas lambda

[Answer 1]:

Use Series.str.split(..., expand=True):

df[['col2', 'col3']] = df.col1.str.split(r',\s+', expand=True); df

          col1   col2  col3
0  Smith, John  Smith  John
1  Smith, John  Smith  John
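One caveat with the one-liner above: if a value contains more than one comma (for example a hypothetical "Smith, John, Jr."), str.split returns more than two columns and the assignment to ['col2', 'col3'] fails. A sketch that limits the split to the first comma with n=1, assuming values of the form "Last, First...":

```python
import pandas as pd

# "Smith, John, Jr." is a made-up value with an extra comma
df = pd.DataFrame({"col1": ["Smith, John", "Smith,  John", "Smith, John, Jr."]})

# n=1 splits only at the first comma-plus-whitespace, so there are always two pieces
df[["col2", "col3"]] = df["col1"].str.split(r",\s+", n=1, expand=True)

print(df)
```

Everything after the first comma stays in col3, which is usually the safer failure mode for names.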

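[Discussion]:

Thanks! What does \s+ do? \s is whitespace, but what does the + mean?
@OptimusPrime + means "one or more", so \s+ also handles several consecutive spaces (pre-emptively) :)
Also, as a new user, it helps to know that you can accept answers.
What does expand=True do?
@OptimusPrime It puts each split item in its own column, instead of returning a single column of lists.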
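The two comments above can be checked with a small sketch; the sample values (including the one with three spaces after the comma) are made up for illustration:

```python
import pandas as pd

s = pd.Series(["Smith, John", "Smith,   John"])

# expand=False (the default): one column holding a list per row
as_lists = s.str.split(r",\s+")
print(as_lists.tolist())      # [['Smith', 'John'], ['Smith', 'John']]

# expand=True: a DataFrame with one column per split piece
as_frame = s.str.split(r",\s+", expand=True)
print(as_frame.shape)         # (2, 2)

# Without the +, only a single whitespace character is consumed
one_space = s.str.split(r",\s", expand=True)
print(one_space[1].tolist())  # ['John', '  John']
```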

[Answer 2]:

We can use the Series.str.extract() method:

In [157]: df[['col2','col3']] = df['col1'].str.extract(r'(\w+),\s*(\w+)', expand=True)

In [158]: df
Out[158]:
                 col1        col2   col3
0         Smith, John       Smith   John
1         Smith, John       Smith   John
2  Mustermann,    Max  Mustermann    Max
3          Last,First        Last  First

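(\w+),\s*(\w+) is a regular expression (RegEx): each (\w+) group captures a run of word characters (letters, digits, underscore), and ,\s* matches the comma followed by zero or more whitespace characters. Each capture group becomes one output column.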
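A variant worth knowing: if the capture groups are named with (?P<name>...), str.extract uses those names as the column labels, so the result can be joined back without listing the target columns separately. A minimal sketch, reusing three of the sample values above:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["Smith, John", "Mustermann,    Max", "Last,First"]})

# Named groups (?P<name>...) become the column names of the extracted DataFrame
parts = df["col1"].str.extract(r"(?P<col2>\w+),\s*(?P<col3>\w+)")

# join aligns on the index, appending col2 and col3 to df
df = df.join(parts)
print(df)
```

Because \w does not match hyphens or apostrophes, a first name like "Mary-Jane" would be cut at the hyphen; a broader pattern such as r"(?P<col2>[^,]+),\s*(?P<col3>.+)" is one possible alternative if that matters.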

[Discussion]:

[Answer 3]:

If you only want to keep the first string after the split, use the following:

df['col2'] = df['col1'].str.split(',', n=1).str[0]

          col1   col2
0  Smith, John  Smith  
1  Smith, John  Smith
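The same .str[...] indexing also retrieves the remainder. A sketch assuming the question's two-row data, where the second piece is stripped of the space left after the comma:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["Smith, John", "Smith, John"]})

# split(..., n=1) yields at most two pieces per row; .str[0] takes the first
df["col2"] = df["col1"].str.split(",", n=1).str[0]

# .str[1] takes the second piece; strip() removes the leading space
df["col3"] = df["col1"].str.split(",", n=1).str[1].str.strip()

print(df)
```

Rows without a comma have no second piece, so .str[1] gives NaN for them rather than raising an error.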

[Discussion]: