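【Posted】: 2021-09-07 13:32:47
【Question】:
In this example, each row has 7 columns in total. I group by AccountID and Last Name, which together identify one person; differing row values for Contract, Address, City, and State represent a new location for that AccountID/Last Name.
I want to put each AccountID/Last Name on a single row, together with one or more sets of Contract, Address, City, and State.
The current data looks like this: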
| Contract | AccountID | Last Name | First Name | Address | City | State |
|---|---|---|---|---|---|---|
| 622 | 1234 | Pitt | Brad | 466 7th Ave | Park Slope | NY |
| 28974 | 1234 | Pitt | Brad | 1901 Vine Street | Philadelphia | PA |
| 54122 | 4321 | Ford | Henry | 93 Booth Dr | Nutley | NJ |
| 622 | 2345 | Rhodes | Dusty | 1 Public Library Plaze | Stamford | CT |
| 28974 | 2345 | Rhodes | Dusty | 1001 Kings Highway | Cherry Hill | NJ |
| 54122 | 2345 | Rhodes | Dusty | 444 Amsterdamn Ave | Upper West Side | NY |
I would like the data to look like this:
| AccountID | Last Name | First Name | Contract_1 | Address_1 | City_1 | State_1 | Contract_2 | Address_2 | City_2 | State_2 | Contract_3 | Address_3 | City_3 | State_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1234 | Pitt | Brad | 622 | 466 7th Ave | Park Slope | NY | 28974.0 | 1901 Vine Street | Philadelphia | PA | | | | |
| 4321 | Ford | Henry | 54122 | 93 Booth Dr | Nutley | NJ | | | | | | | | |
| 2345 | Rhodes | Dusty | 622 | 1 Public Library Plaze | Stamford | CT | 28974.0 | 1001 Kings Highway | Cherry Hill | NJ | 54122.0 | 444 Amsterdamn Ave | Upper West Side | NY |
Here is what I have done so far. I have been reworking Step 5 onward for a week. Any suggestions?
# Step 1
import pandas as pd
import numpy as np
# read from "my clipboard"
df = pd.read_clipboard()
df
# Step 2 - join the four location fields with '|', then split on '|' into a list of 4 items
df['Contract_State'] = (df['Contract'].astype(str) + '|' + df['Address'] + '|' + df['City'] + '|' + df['State']).str.split('|')
df['Contract'] = df['Contract'].astype(str)
df['AccountID'] = df['AccountID'].astype(str)
# Step 3 - groupby
df2 = pd.DataFrame(df.groupby(['AccountID', 'Last Name']).Contract_State.apply(list)).reset_index()
df2
# Step 4 - flatten the lists
df2['Contract_State'] = df2['Contract_State'].apply(lambda x: np.array(x).flatten())
df2
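# Step 5 - each location contributes 4 elements (Contract, Address, City, State),
# so the longest list holds 4 * (maximum number of locations) elements
num_columns = df2['Contract_State'].apply(len).max()
num_columns
# Step 6 - name the columns Contract_1, Address_1, City_1, State_1, Contract_2, ...
columns = [f'{name}_{i}' for i in range(1, num_columns // 4 + 1) for name in ('Contract', 'Address', 'City', 'State')]
# rows with fewer locations are padded with missing values
df3 = pd.DataFrame([list(row) for row in df2['Contract_State']], columns=columns)
df3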
# Step 7 - concatenate df2 with contracts, then drop the column "Contract_State"
df4 = pd.concat([df2, df3], join='inner', axis='columns').drop('Contract_State', axis='columns')
df4
【Discussion】:
- As you can see, there are many ways to reshape your table, but the most obvious trick is to use groupby and cumcount.
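A minimal sketch of the groupby and cumcount idea, assuming the sample data from the question. The helper column `n`, the names `keys`, `values`, and `wide`, and the choice of `unstack` to do the pivot are illustrative assumptions, not something the commenter spelled out. `cumcount` numbers each person's rows 1, 2, 3, and so on, and that number becomes the column suffix.

```python
import pandas as pd

# Sample data copied from the question's table (built inline instead of read_clipboard).
df = pd.DataFrame({
    "Contract": [622, 28974, 54122, 622, 28974, 54122],
    "AccountID": [1234, 1234, 4321, 2345, 2345, 2345],
    "Last Name": ["Pitt", "Pitt", "Ford", "Rhodes", "Rhodes", "Rhodes"],
    "First Name": ["Brad", "Brad", "Henry", "Dusty", "Dusty", "Dusty"],
    "Address": ["466 7th Ave", "1901 Vine Street", "93 Booth Dr",
                "1 Public Library Plaze", "1001 Kings Highway", "444 Amsterdamn Ave"],
    "City": ["Park Slope", "Philadelphia", "Nutley",
             "Stamford", "Cherry Hill", "Upper West Side"],
    "State": ["NY", "PA", "NJ", "CT", "NJ", "NY"],
})

keys = ["AccountID", "Last Name", "First Name"]    # identify one person
values = ["Contract", "Address", "City", "State"]  # one set per location

# Number each row within its person: 1 for the first location, 2 for the second, ...
# ("n" is a hypothetical helper column name.)
df["n"] = df.groupby(keys).cumcount() + 1

# Move the location number into the columns: one row per person,
# columns become a (field, location number) MultiIndex.
wide = df.set_index(keys + ["n"])[values].unstack("n")

# Reorder so that all fields of location 1 come first, then location 2, ...
ordered = [(v, n) for n in range(1, int(df["n"].max()) + 1) for v in values]
wide = wide.reindex(columns=pd.MultiIndex.from_tuples(ordered))

# Flatten the column names to Contract_1, Address_1, ... and restore the key columns.
wide.columns = [f"{v}_{n}" for v, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

People with fewer locations get missing values in the trailing columns. Including First Name in `keys` keeps it in the result; this assumes it never varies within an AccountID/Last Name pair.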
Tags: python pandas pandas-groupby