【Question Title】: Merge data frame issue
【Posted】: 2018-05-30 05:16:31
【Question】:

I have 2 data frames.

The first one is:

+------------------------------------------+
|       ID             CustomerType Choice |
+------------------------------------------+
| 0    1.0               Durability    OEM |
| 1    2.0                    Price    OEM |
| 2    3.0               Durability    OEM |
| 3    4.0               Durability    OEM |
| 4    5.0               Durability    OEM |
| 5    6.0  ManufacturerCredibility    OEM |
| 6    7.0                 Warranty    OEM |
| 7    8.0  ManufacturerCredibility    OEM |
| 8    9.0               Durability    OEM |
| 9   10.0                    Price    OEM |
| 10  11.0               Durability    TPN |
| 11  12.0                 Warranty    OEM |
| 12  13.0               Durability    TPN |
+------------------------------------------+

The second one is:

+--------------------------------------------------------+
|        Price  Durability  Warranty  Manufacture   Type |
+--------------------------------------------------------+
| OEM     1.00         4.0       4.0          4.0    OEM |
| TPN     0.80         4.0       1.0          1.0    TPN |
| Reman   0.55         4.0       0.5          1.0  Reman |
| Reuse   0.45         2.5       0.0          1.0  Reuse |
+--------------------------------------------------------+

I need to join these two data frames using 'Choice' from the first data frame and 'Type' from the second.

Currently I am using

data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')

and I get this awkward result:

+------------------------------------------------------------------------------+
|     Price  Durability  Warranty  Manufacture             CustomerType Choice |
+------------------------------------------------------------------------------+
| 0     1.0         4.0       4.0          4.0               Durability    OEM |
| 1     1.0         4.0       4.0          4.0                    Price    OEM |
| 2     1.0         4.0       4.0          4.0               Durability    OEM |
| 3     1.0         4.0       4.0          4.0               Durability    OEM |
| 4     1.0         4.0       4.0          4.0               Durability    OEM |
| 5     1.0         4.0       4.0          4.0  ManufacturerCredibility    OEM |
| 6     1.0         4.0       4.0          4.0                 Warranty    OEM |
| 7     1.0         4.0       4.0          4.0  ManufacturerCredibility    OEM |
| 8     1.0         4.0       4.0          4.0               Durability    OEM |
| 9     1.0         4.0       4.0          4.0                    Price    OEM |
| 10    1.0         4.0       4.0          4.0                 Warranty    OEM |
| 11    1.0         4.0       4.0          4.0                    Price    OEM |
| 12    1.0         4.0       4.0          4.0                 Warranty    OEM |
| 13    1.0         4.0       4.0          4.0  ManufacturerCredibility    OEM |
+------------------------------------------------------------------------------+

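From the result table we can see that every row contains the OEM data from the second data frame. What am I doing wrong here?

【Comments】:

  • I am new to programming. In my first data frame, at index 10, CustomerType = Durability and Choice = TPN, but in my result frame the CustomerType and Choice at index 10 are different. They should match the first data frame, and the Price, Durability, Warranty and Manufacture values should change to the TPN values from the second table. Sorry about my English.
  • I understand now; for me it works fine. What is your pandas version?
  • It seems it was my mistake. When I use head(), the result looks sorted by price (min:max), and I could only see the OEM type because it has the highest price value. After using print, I can see the other data at the bottom. Is there any way to stop the sorting when merging? Thanks.
  • I think a left join does not sort the output. It only appends the new columns.
  • Does it sort for you?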

Tags: python pandas numpy dataframe


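【Answer 1】:

I get somewhat different output from your sample data: the last row and the third row from the end are merged correctly, and the left join does not sort by the merge column either: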

data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')
print (data)
      ID             CustomerType Choice  ...   Warranty  Manufacture  Type
0    1.0               Durability    OEM  ...        4.0          4.0   OEM
1    2.0                    Price    OEM  ...        4.0          4.0   OEM
2    3.0               Durability    OEM  ...        4.0          4.0   OEM
3    4.0               Durability    OEM  ...        4.0          4.0   OEM
4    5.0               Durability    OEM  ...        4.0          4.0   OEM
5    6.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
6    7.0                 Warranty    OEM  ...        4.0          4.0   OEM
7    8.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
8    9.0               Durability    OEM  ...        4.0          4.0   OEM
9   10.0                    Price    OEM  ...        4.0          4.0   OEM
10  11.0               Durability    TPN  ...        1.0          1.0   TPN
11  12.0                 Warranty    OEM  ...        4.0          4.0   OEM
12  13.0               Durability    TPN  ...        1.0          1.0   TPN

[13 rows x 8 columns]
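
To make the left join above easy to reproduce, here is a minimal sketch that rebuilds both frames from the tables in the question. It assumes the second frame's index simply repeats the 'Type' values, as the printed table suggests. Because head() only shows the first five rows (all OEM here), the sketch counts the merged 'Type' values and prints the TPN rows directly instead.

```python
import pandas as pd

# Survey answers, copied from the first table in the question.
survey = pd.DataFrame({
    "ID": [float(i) for i in range(1, 14)],
    "CustomerType": [
        "Durability", "Price", "Durability", "Durability", "Durability",
        "ManufacturerCredibility", "Warranty", "ManufacturerCredibility",
        "Durability", "Price", "Durability", "Warranty", "Durability",
    ],
    "Choice": ["OEM"] * 10 + ["TPN", "OEM", "TPN"],
})

# Attribute table, copied from the second table in the question.
# Assumption: the index just repeats the 'Type' column.
rel_attr = pd.DataFrame(
    {
        "Price": [1.00, 0.80, 0.55, 0.45],
        "Durability": [4.0, 4.0, 4.0, 2.5],
        "Warranty": [4.0, 1.0, 0.5, 0.0],
        "Manufacture": [4.0, 1.0, 1.0, 1.0],
        "Type": ["OEM", "TPN", "Reman", "Reuse"],
    },
    index=["OEM", "TPN", "Reman", "Reuse"],
)

# Left join: one output row per survey row, in the survey's order.
data = pd.merge(survey, rel_attr, left_on="Choice", right_on="Type", how="left")

# Check the whole result rather than only the first rows shown by head().
print(data["Type"].value_counts())
print(data.loc[data["Choice"] == "TPN", ["ID", "Choice", "Price", "Warranty"]])
```

If the TPN rows carry Price 0.80 and Warranty 1.0, the merge picked up the right attributes and only the preview was misleading.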

If the rows do come out reordered, it looks like an inner join (the default) is being used:

data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type')
#same as
#data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='inner')
print (data)
      ID             CustomerType Choice  ...   Warranty  Manufacture  Type
0    1.0               Durability    OEM  ...        4.0          4.0   OEM
1    2.0                    Price    OEM  ...        4.0          4.0   OEM
2    3.0               Durability    OEM  ...        4.0          4.0   OEM
3    4.0               Durability    OEM  ...        4.0          4.0   OEM
4    5.0               Durability    OEM  ...        4.0          4.0   OEM
5    6.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
6    7.0                 Warranty    OEM  ...        4.0          4.0   OEM
7    8.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
8    9.0               Durability    OEM  ...        4.0          4.0   OEM
9   10.0                    Price    OEM  ...        4.0          4.0   OEM
10  12.0                 Warranty    OEM  ...        4.0          4.0   OEM
11  11.0               Durability    TPN  ...        1.0          1.0   TPN
12  13.0               Durability    TPN  ...        1.0          1.0   TPN

[13 rows x 8 columns]
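
If the original row order matters and an inner join is still wanted, one option is to record each row's position before merging and sort on it afterwards. The sketch below uses a small made-up pair of frames, and the helper column name `_pos` is hypothetical. Row order after an inner merge appears to differ between pandas versions, so sorting explicitly avoids relying on either behaviour.

```python
import pandas as pd

# Small illustrative frames (not the question's data).
survey = pd.DataFrame({
    "ID": [1.0, 2.0, 3.0, 4.0],
    "Choice": ["OEM", "TPN", "OEM", "TPN"],
})
rel_attr = pd.DataFrame({
    "Type": ["OEM", "TPN"],
    "Price": [1.00, 0.80],
})

inner = (
    survey.assign(_pos=range(len(survey)))   # remember the original position
    .merge(rel_attr, left_on="Choice", right_on="Type", how="inner")
    .sort_values("_pos")                     # restore the survey's row order
    .drop(columns="_pos")                    # remove the helper column
    .reset_index(drop=True)
)
print(inner)
```

With how='left' this extra step should not be needed, since a left join keeps the order of the left frame's rows.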

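【Comments】:

  • @Yikes - OK, so the question is whether you need a left join or an inner join? In my opinion the difference is explained in the docs, with good samples.
  • Thanks, I will read it soon. In my case both joins do what I need. The only problem is the sorting. Thanks for the help.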