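[Posted]: 2017-09-29 17:54:18
[Question]:
I have a MasterList dataframe, and I merge other datasets into it in a loop. Every time I merge a new dataset, pandas creates a new suffixed column such as Day_x or Day_y instead of filling the existing one. How can I keep these as a single column?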
import pandas as pd

MasterList = pd.DataFrame(data=[['0001'], ['0002'], ['0003'], ['0004']], columns=['Order Number'])
customer_file1 = pd.DataFrame(data=[['0003', 'M'], ['0004', 'W']], columns=['Order Number', 'Day'])
customer_file2 = pd.DataFrame(data=[['0001', 'T'], ['0002', 'S']], columns=['Order Number', 'Day'])

# Left-merge each customer file into the master list, one at a time
for x in [customer_file1, customer_file2]:
    MasterList = pd.merge(MasterList, x, how='left', left_on='Order Number', right_on='Order Number')

print(MasterList)
Output:
  Order Number Day_x Day_y
0         0001   NaN     T
1         0002   NaN     S
2         0003     M   NaN
3         0004     W   NaN
Desired output:
  Order Number Day
0         0001   T
1         0002   S
2         0003   M
3         0004   W
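For this simplified case, one possible approach (a sketch, not necessarily the best fit for the real workflow) is to stack the customer files first so that only one Day column exists, and then merge once. It assumes all customer files are available at the same time and that each order number appears in at most one of them; the name all_customers is introduced here only for illustration.

```python
import pandas as pd

MasterList = pd.DataFrame(data=[['0001'], ['0002'], ['0003'], ['0004']],
                          columns=['Order Number'])
customer_file1 = pd.DataFrame(data=[['0003', 'M'], ['0004', 'W']],
                              columns=['Order Number', 'Day'])
customer_file2 = pd.DataFrame(data=[['0001', 'T'], ['0002', 'S']],
                              columns=['Order Number', 'Day'])

# Stack the customer files vertically so there is a single 'Day' column
# (assumption: an order number appears in only one customer file).
all_customers = pd.concat([customer_file1, customer_file2], ignore_index=True)

# A single left merge then produces one 'Day' column with no _x/_y suffixes.
result = pd.merge(MasterList, all_customers, how='left', on='Order Number')
print(result)
```

Because the merge happens only once, pandas never sees two overlapping Day columns and therefore adds no suffixes.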
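Edit: People asked for more data because I oversimplified my example. I know that Year and Day don't really make sense in a purchase dataset, and that's fine. Each customer file actually comes from a query against a different database, so I would like to query one database, merge its data, and forget about it, rather than query all the customer databases, concatenate the results, and then merge.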
import pandas as pd

MasterList = pd.DataFrame(data=[['0001', '2015'], ['0002', '2015'], ['0003', '2016'], ['0004', '2015'], ['0005', '2017'], ['0006', '2018']], columns=['Order Number', 'Year'])
customer_file1 = pd.DataFrame(data=[['0003', 'M'], ['0004', 'W']], columns=['Order Number', 'Day'])
customer_file2 = pd.DataFrame(data=[['0001', 'T'], ['0002', 'S']], columns=['Order Number', 'Day'])
customer_file3 = pd.DataFrame(data=[['0005', 'T'], ['0006', 'S']], columns=['Order Number', 'Day'])

# Left-merge each customer file into the master list, one at a time
for x in [customer_file1, customer_file2, customer_file3]:
    MasterList = pd.merge(MasterList, x, how='left', left_on='Order Number', right_on='Order Number')

print(MasterList)
Output:
  Order Number  Year Day_x Day_y  Day
0         0001  2015   NaN     T  NaN
1         0002  2015   NaN     S  NaN
2         0003  2016     M   NaN  NaN
3         0004  2015     W   NaN  NaN
4         0005  2017   NaN   NaN    T
5         0006  2018   NaN   NaN    S
Desired output:
  Order Number  Year Day
0         0001  2015   T
1         0002  2015   S
2         0003  2016   M
3         0004  2015   W
4         0005  2017   T
5         0006  2018   S
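A minimal sketch of one way to keep the merge-and-forget loop while ending up with a single Day column: merge with an explicit suffix for the incoming columns, fill the gaps in the existing column with Series.combine_first, and drop the temporary column. The helper name merge_and_fill and the '_new' suffix are hypothetical, introduced only for this example. The sketch assumes each order number appears at most once per customer file; if two files supply a value for the same order, the value merged first is kept.

```python
import pandas as pd

MasterList = pd.DataFrame(
    data=[['0001', '2015'], ['0002', '2015'], ['0003', '2016'],
          ['0004', '2015'], ['0005', '2017'], ['0006', '2018']],
    columns=['Order Number', 'Year'])
customer_file1 = pd.DataFrame(data=[['0003', 'M'], ['0004', 'W']],
                              columns=['Order Number', 'Day'])
customer_file2 = pd.DataFrame(data=[['0001', 'T'], ['0002', 'S']],
                              columns=['Order Number', 'Day'])
customer_file3 = pd.DataFrame(data=[['0005', 'T'], ['0006', 'S']],
                              columns=['Order Number', 'Day'])


def merge_and_fill(master, customer, key='Order Number'):
    """Hypothetical helper: left-merge `customer` into `master` and fold
    any overlapping columns back into the existing ones."""
    # Existing columns keep their names; incoming duplicates get '_new'.
    merged = pd.merge(master, customer, how='left', on=key,
                      suffixes=('', '_new'))
    for col in customer.columns:
        new_col = col + '_new'
        if new_col in merged.columns:
            # Keep values already present; fill NaN from the new file.
            merged[col] = merged[col].combine_first(merged[new_col])
            merged = merged.drop(new_col, axis=1)
    return merged


# Each customer file can be merged and then discarded.
for x in [customer_file1, customer_file2, customer_file3]:
    MasterList = merge_and_fill(MasterList, x)

print(MasterList)
```

On the first pass there is no Day column in MasterList yet, so the merge adds it unsuffixed; on later passes the '_new' column only exists temporarily inside the helper.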
[Comments]:
Tags: python python-2.7 pandas