How to convert the data as following in python? (Answers)

【Question title】: How to convert the data as following in python?
【Posted】: 2017-01-21 08:16:21
【Question description】:

I have some data in a csv file in the following format.

   Id   Category
    1   A
    2   B
    3   C
    4   B
    5   C
    6   d

I want to convert it to the following format and save it as another csv file.

Id  A   B   C   D   E
1   1   0   0   0   0
2   0   1   0   0   0
3   0   0   1   0   0
4   0   1   0   0   0
5   0   0   1   0   0
6   0   0   0   1   0

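【Question comments】:

Possible duplicate of "Dummy variables when not all categories are present"
This is called One Hot Encoding; you can use sklearn's OneHotEncoder class to do it.
@ayhan That is a similar question, but how do I pass the .csv file that holds my data instead of passing the data directly? Thanks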

Tags: python python-3.x pandas text-processing spyder

【Solution 1】:

Try pd.get_dummies()

>> df = pd.read_csv(<path_to_file>, sep=',', encoding='utf-8', header=0)

>> df
   Id   Category
0   1          A
1   2          B
2   3          C
3   4          B
4   5          C
5   6          d

>> pd.get_dummies(df.Category)

This encodes Category and gives you the new columns:

A B C d

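However, it will not "fix" d -> D, and it will not give you any column that cannot be derived from the values in Category (such as E in the desired output).

I suggest you look at the solution posted in the earlier comments.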
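For reference, here is a minimal sketch of one way to handle both gaps. It assumes the full set of expected categories (A to E) is known in advance; the names all_categories, normalized and encoded are only illustrative and do not come from the original post. The case is normalized with str.upper(), and reindex() adds any column that never occurs in the data.

```python
import pandas as pd

# Sample data mirroring the question, including the lowercase "d".
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6],
                   "Category": ["A", "B", "C", "B", "C", "d"]})

# Assumption: the complete list of expected categories is known up front.
all_categories = ["A", "B", "C", "D", "E"]

# Normalize case so that "d" is treated as "D".
normalized = df["Category"].str.upper()

# Encode, then reindex the columns so that categories missing from the
# data (here "E") still appear as all-zero columns.
dummies = pd.get_dummies(normalized, dtype=int).reindex(
    columns=all_categories, fill_value=0)

# Put Id back in front of the encoded columns.
encoded = pd.concat([df[["Id"]], dummies], axis=1)
print(encoded)
```

If the category list is not known ahead of time, the reindex step can simply be dropped; only the case normalization is then applied.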

Edit

# Load data from .CSV with pd.read_csv() as demonstrated above

In [13]: df
Out[13]: 
  Category  Id
0        A   1
1        B   2
2        C   3
3        B   4
4        C   5
5        D   6

## One-liner for hot-encoding, then concatenating to original dataframe 
## and finally dropping the old column 'Category', you can skip the 
## last part if you want to keep original column as well.
In [14]: df = pd.concat([df, pd.get_dummies(df.Category)], axis=1).drop('Category', axis=1)

In [15]: df
Out[15]: 
   Id    A    B    C    D
0   1  1.0  0.0  0.0  0.0
1   2  0.0  1.0  0.0  0.0
2   3  0.0  0.0  1.0  0.0
3   4  0.0  1.0  0.0  0.0
4   5  0.0  0.0  1.0  0.0
5   6  0.0  0.0  0.0  1.0

## Write to file
In [16]: df.to_csv(<output_path>, sep='\t', encoding='utf-8', index=None)

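As you can see, this is not a transpose; it only adds the one-hot encoding of the Category column to each row.

Whether Excel accepts the final data or not is, unfortunately, something Pandas can do nothing about.

I hope this helps.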
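The whole round trip can be sketched as below. This is only an illustration under a few assumptions: io.StringIO stands in for the real input and output files, the separator is a comma rather than a tab, and dtype=int is passed so that the encoded columns hold 0/1 integers as in the format the question asks for.

```python
import io
import pandas as pd

# Stand-in for the input file; in practice pass a file path to read_csv.
csv_text = "Id,Category\n1,A\n2,B\n3,C\n4,B\n5,C\n6,D\n"
df = pd.read_csv(io.StringIO(csv_text))

# One-hot encode Category as integers and keep Id next to the new columns.
encoded = pd.concat(
    [df[["Id"]], pd.get_dummies(df["Category"], dtype=int)], axis=1)

# Write comma-separated output without the row index.
buffer = io.StringIO()
encoded.to_csv(buffer, index=False)
output_text = buffer.getvalue()
print(output_text)
```

Replacing the StringIO buffer with an output path writes the same content to disk.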

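【Comments】:

Please check the updated answer. For a complete solution, I suggest you also look at the link provided in the first comment under the original post.
I got the transpose I needed by using df.transpose(). Thank you :)
@MohitVellanki If you found this answer useful, please accept it so that it is clear the question has been answered.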

【Solution 2】:

Use a pivot table (updated to include .csv read/write):

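import pandas as pd
path = 'the path to your file'
df = pd.read_csv(path)

# your original dataframe
# Category  Id
# 0        A   1
# 1        B   2
# 2        C   3
# 3        B   4
# 4        C   5
# 5        D   6

# pivot table: one row per Id, one column per Category, counts as values
result = df.pivot_table(index=['Id'], columns='Category', fill_value=0, aggfunc='size')

# save the pivoted result (not the original df) to file;
# forward slashes avoid '\f' being read as an escape sequence
result.to_csv('path/filename.csv') #e.g. 'C:\\Users\\you\\Documents\\filename.csv'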

Output:

Category  A  B  C  D
Id                  
1         1  0  0  0
2         0  1  0  0
3         0  0  1  0
4         0  1  0  0
5         0  0  1  0
6         0  0  0  1
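As a side note, the same table can also be built with pd.crosstab, which is a different call from the pivot_table used in this answer. The sketch below assumes the data has already been loaded and uses the illustrative names table and flat. Keeping the result in its own variable makes it clear which object should be passed to to_csv.

```python
import pandas as pd

# Sample data mirroring the dataframe shown in this answer.
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6],
                   "Category": ["A", "B", "C", "B", "C", "D"]})

# crosstab counts how often each (Id, Category) pair occurs.
table = pd.crosstab(df["Id"], df["Category"])

# reset_index turns Id back into an ordinary column, and clearing the
# columns name removes the "Category" label from the header.
flat = table.reset_index()
flat.columns.name = None
print(flat)
```

Calling flat.to_csv(..., index=False) then writes a header of Id followed by one column per category.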

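【Comments】:

See the update in the solution: df.to_csv('path/filename.csv') #e.g. 'C:\\Users\\you\\Documents\\filename.csv'
The file is not converted. It only adds another Id column before the existing columns.
What do you mean by "converted"? Do you want to replace "d" with "D"?
No. It was not converted from multi-label to binary.
When you run my code above, what does df look like on your side?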