【发布时间】:2018-09-03 19:03:35
【问题描述】:
假设我有一个简单的 Pandas DataFrame,其中一列包含国家名称,另一列包含一些值。例如:
# Import Python Libraries
import numpy as np
import pandas as pd
# Create Sample DataFrame
df = pd.DataFrame(data={'Country': ['United States','United States','United States','United States', \
'United States','United States','United States','United States', \
'United States','United States','United States','United States', \
'Canada','Canada','Canada','Canada','Canada','Canada','Mexico', \
'Mexico','Mexico','Mexico','England','England','England','England', \
'England','England','England','England','England','England','England', \
'England','England','England','France','France','France','Spain','Germany', \
'Germany','Germany','Germany','Germany','Germany','Germany','Germany', \
'Germany','Germany'], 'Value': np.random.randint(1000, size=50)})
生成:
print(df.head())
Index Country Value
0 United States 943
1 United States 567
2 United States 534
3 United States 700
4 United States 470
我的问题是,在 Python 中,将这个 DataFrame 转换为每个国家/地区都有自己的列并且该国家/地区的所有值都列在该列中的最简单方法是什么?换句话说,我如何轻松创建一个 DataFrame,其中列数是“Country”列中国家/地区的唯一计数,并且每列的长度将根据相应国家/地区出现在原始 DataFrame 中的次数而有所不同?
这是提供解决方案的示例代码:
# Store Unique Country Names in Variable
columns = df['Country'].unique()
# Create Individual Country DataFrames
df_0 = df[df['Country'] == columns[0]]['Value'].values.tolist()
df_1 = df[df['Country'] == columns[1]]['Value'].values.tolist()
df_2 = df[df['Country'] == columns[2]]['Value'].values.tolist()
df_3 = df[df['Country'] == columns[3]]['Value'].values.tolist()
df_4 = df[df['Country'] == columns[4]]['Value'].values.tolist()
df_5 = df[df['Country'] == columns[5]]['Value'].values.tolist()
df_6 = df[df['Country'] == columns[6]]['Value'].values.tolist()
# Create Desired Output DataFrame
data_dict = {columns[0]: df_0, columns[1]: df_1, columns[2]: df_2, columns[3]: df_3, columns[4]: df_4, columns[5]: df_5, columns[6]: df_6}
new_df = pd.DataFrame({k:pd.Series(v[:len(df)]) for k,v in data_dict.items()})
生成:
print(new_df)
United States Canada Mexico England France Spain Germany
0 838.0 135.0 496.0 568.0 71.0 588.0 811.0
1 57.0 118.0 268.0 716.0 422.0 NaN 107.0
2 953.0 396.0 850.0 860.0 707.0 NaN 318.0
3 251.0 294.0 815.0 888.0 NaN NaN 633.0
4 127.0 466.0 NaN 869.0 NaN NaN 910.0
5 892.0 824.0 NaN 776.0 NaN NaN 472.0
6 11.0 NaN NaN 508.0 NaN NaN 466.0
7 563.0 NaN NaN 299.0 NaN NaN 200.0
8 864.0 NaN NaN 568.0 NaN NaN 637.0
9 810.0 NaN NaN 78.0 NaN NaN 392.0
10 268.0 NaN NaN 106.0 NaN NaN NaN
11 389.0 NaN NaN 153.0 NaN NaN NaN
12 NaN NaN NaN 217.0 NaN NaN NaN
13 NaN NaN NaN 941.0 NaN NaN NaN
虽然上述代码有效,但对于较大的数据集显然不是一个可行的解决方案。从原始 DataFrame 生成此结果的最有效方法是什么?
谢谢!
【问题讨论】:
标签: python python-3.x pandas dataframe matplotlib