[Posted]: 2021-02-19 12:56:16
[Question]:
I have the following DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Steve Smith', 'Joe Nadal',
                            'Roger Federer'],
                   'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                                        '1955-08-14Price and Sons',
                                        '2000-06-28Pruitt, Bush and Mcguir']})
# str.split drops the text that matches the delimiter pattern, so the date is
# discarded and the piece in front of it is an empty string.
df[['data_time','full_company_name']] = df['birthdat/company'].str.split('[0-9]{4}-[0-9]{2}-[0-9]{2}', expand=True)
df
With my code, I get the following:
____|____Name______|__birthdat/company_______________|_birthdate_|____company___________
0 |Steve Smith |1995-01-26Sharp, Reed and Crane | |Sharp, Reed and Crane
1 |Joe Nadal |1955-08-14Price and Sons | |Price and Sons
2 |Roger Federer |2000-06-28Pruitt, Bush and Mcguir| |Pruitt, Bush and Mcguir
What I want is for the part that matches this regex ('[0-9]{4}-[0-9]{2}-[0-9]{2}') to go into a birthdate column, and the rest to go into the "full_company_name" column:
____|____Name______|_birthdate_|____company_name_______
0 |Steve Smith |1995-01-26 |Sharp, Reed and Crane
1 |Joe Nadal |1955-08-14 |Price and Sons
2 |Roger Federer |2000-06-28 |Pruitt, Bush and Mcguir
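One possible way to get the table above is to capture both parts with `str.extract` instead of splitting on the date. This is only a sketch: the column names `birthdate` and `company_name` are taken from the desired output, and the pattern assumes every value starts with a date.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Steve Smith', 'Joe Nadal', 'Roger Federer'],
    'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                         '1955-08-14Price and Sons',
                         '2000-06-28Pruitt, Bush and Mcguir'],
})

# Group 1 captures the leading date, group 2 captures everything after it.
pattern = r'^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)$'
parts = df['birthdat/company'].str.extract(pattern)

df['birthdate'] = parts[0]
df['company_name'] = parts[1]

# Keep only the columns shown in the desired output.
result = df[['Name', 'birthdate', 'company_name']]
print(result)
```

Unlike `str.split`, which throws away whatever the pattern matches, `str.extract` returns one column per capture group, so the date is kept.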
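Updated question: how do I handle missing values for the birthdate or the company name, for example birthdate/company = "NaApple" or birthdate/company = "2003-01-15Na"? The missing-value marker is not limited to "Na".
[Discussion]:
Tags: python python-3.x regex pandas dataframe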
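A rough sketch for the missing-value case follows. It assumes the placeholder strings are known in advance; the list `MISSING_TOKENS` is hypothetical and would need to be replaced with the markers that actually occur in the data. A value that starts with neither a date nor one of these tokens will not match and ends up as NaN in both columns.

```python
import re
import pandas as pd

# Hypothetical placeholder strings that stand for a missing value.
MISSING_TOKENS = ['Na', 'N/A', 'None']

raw = pd.Series(['1995-01-26Sharp, Reed and Crane',
                 'NaApple',
                 '2003-01-15Na'])

# The string starts with either a date (captured) or a placeholder (not captured);
# whatever follows is the company name.
missing = '|'.join(re.escape(token) for token in MISSING_TOKENS)
pattern = rf'^(?:(?P<birthdate>[0-9]{{4}}-[0-9]{{2}}-[0-9]{{2}})|{missing})(?P<company_name>.*)$'
parts = raw.str.extract(pattern)

# A company name that is empty or itself a placeholder is treated as missing.
is_missing = parts['company_name'].isin(MISSING_TOKENS + [''])
parts['company_name'] = parts['company_name'].mask(is_missing)
print(parts)
```

Note that this cannot tell a placeholder apart from a real company name that happens to begin with the same characters (for example a company whose name starts with "Na"), so the token list should be as specific as the data allows.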