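You rejected the first answer I posted and then updated the question with more requirements, so please note: this site is not a free code-writing service. Also, your code link doesn't work (at least not at the moment).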
Given:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'First_Name': {0: 'Greg', 1: 'Greg', 2: 'John', 3: 'John', 4: 'Ryan', 5: 'Ryan'},
    'Last_Name': {0: 'Li', 1: 'Li', 2: 'Doe', 3: 'Doe', 4: 'Lin', 5: 'Lin'},
    'ContactID': {0: 123, 1: 1877, 2: 566, 3: 234, 4: 789, 5: 52},
    'Last_Modified_Date': {0: '2021-04-08', 1: '2019-05-06', 2: '2018-02-03',
                           3: '2014-05-07', 4: '2019-06-07', 5: '2018-06-07'},
    'Email': {0: 'grey.li@gmail.com', 1: 'grey.li@gmail.com', 2: 'Johndeo@yahoo.com',
              3: 'Johndeo@aol.net', 4: 'lin@hotmail.com', 5: np.nan},
    'Address': {0: '44 Sherman', 1: np.nan, 2: '87 Branch Ave', 3: '87 Branch Ave',
                4: '84 Newport', 5: np.nan},
    'Phone': {0: '999-999-9999', 1: np.nan, 2: '890-523-4667', 3: np.nan,
              4: '678-900-000', 5: '678-900-000'},
})
print(df)
Try:
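# Parse the dates and sort oldest-first, so 'last' below picks the most recent row
df['Last_Modified_Date'] = pd.to_datetime(df['Last_Modified_Date'], format='%Y-%m-%d')
df = df.sort_values(by='Last_Modified_Date')
# Copy every ContactID as a string so the IDs can be joined together
df['AllContactID'] = df['ContactID'].map(str)
# Use empty strings instead of NaN; ', '.join raises a TypeError on float NaN
df = df.replace(np.nan, '', regex=False)
# One row per person: latest date and ID, all other values joined with ', '
df = (df.groupby(by=['First_Name', 'Last_Name'], as_index=False)
        .agg({'Last_Modified_Date': 'last', 'ContactID': 'last',
              'Email': ', '.join, 'Address': ', '.join, 'Phone': ', '.join,
              'AllContactID': ', '.join}))
# Collapse an adjacent duplicate "x, x" into ", x"
df = df.replace(r'(.*?)(,\s)\1', r', \1', regex=True)
# Drop a leading ", " (left by the previous step or by an empty oldest value)
df = df.replace(r'^, (.*)$', r'\1', regex=True)
# Drop a trailing ", " (left by an empty most recent value)
df = df.replace(r', $', r'', regex=True)
# Optional: turn AllContactID into a list of ints
#df['AllContactID'] = df.AllContactID.apply(lambda x: x.split(', '))
#df['AllContactID'] = df.AllContactID.apply(lambda x: list(map(int, x)))
print(df)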
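The regex clean-up above only collapses duplicates that sit next to each other, and it can misfire when one value is a prefix of the next: `'ab, abc'` is rewritten to `', abc'` and then to `'abc'`, silently losing `'ab'`. One alternative, sketched here under the assumption that you want every distinct non-empty value kept in date order, is to deduplicate inside the aggregation instead of post-processing with regexes. The helper `join_unique` is a hypothetical name for illustration, not a pandas function; `AllContactID` is built as a list of ints directly, which is what the commented-out lines above aim for.

```python
import numpy as np
import pandas as pd


def join_unique(values):
    """Hypothetical helper: join distinct, non-empty values with ', ' in first-seen order."""
    seen = []
    for v in values:
        # Skip NaN and empty strings; compare whole values, not substrings
        if pd.notna(v) and v != '' and v not in seen:
            seen.append(v)
    return ', '.join(map(str, seen))


df = pd.DataFrame({
    'First_Name': ['Greg', 'Greg', 'John', 'John', 'Ryan', 'Ryan'],
    'Last_Name': ['Li', 'Li', 'Doe', 'Doe', 'Lin', 'Lin'],
    'ContactID': [123, 1877, 566, 234, 789, 52],
    'Last_Modified_Date': ['2021-04-08', '2019-05-06', '2018-02-03',
                           '2014-05-07', '2019-06-07', '2018-06-07'],
    'Email': ['grey.li@gmail.com', 'grey.li@gmail.com', 'Johndeo@yahoo.com',
              'Johndeo@aol.net', 'lin@hotmail.com', np.nan],
    'Address': ['44 Sherman', np.nan, '87 Branch Ave', '87 Branch Ave',
                '84 Newport', np.nan],
    'Phone': ['999-999-9999', np.nan, '890-523-4667', np.nan,
              '678-900-000', '678-900-000'],
})

# Oldest first, so 'last' picks the most recent row and joins run in date order
df['Last_Modified_Date'] = pd.to_datetime(df['Last_Modified_Date'], format='%Y-%m-%d')
df = df.sort_values(by='Last_Modified_Date')
df['AllContactID'] = df['ContactID']

result = df.groupby(['First_Name', 'Last_Name'], as_index=False).agg({
    'Last_Modified_Date': 'last',
    'ContactID': 'last',
    'Email': join_unique,
    'Address': join_unique,
    'Phone': join_unique,
    # Collect every ID of the person as a plain Python list of ints
    'AllContactID': lambda ids: ids.tolist(),
})
print(result)
```

Because `join_unique` compares whole values, `'ab'` and `'abc'` both survive, and repeats that are not adjacent are dropped as well. NaN never reaches `', '.join`, so the `replace(np.nan, '')` step is not needed in this variant.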