【问题标题】:Python Code for Merging Two Spreadsheets that is similar to the "Index-Match-Match" function in Excel用于合并两个电子表格的 Python 代码,类似于 Excel 中的“索引匹配匹配”函数
【发布时间】:2016-03-12 17:12:00
【问题描述】:

我有两个单独的电子表格并使用 pandas 将它们转换为数据框。我希望从电子表格 2 中提取信息并将该信息合并到电子表格 1 中。我知道如何通过使用“索引匹配匹配”函数在 excel 中提取信息,但我无法在 Python 中解决这个问题因为它似乎不适合合并或连接方法......或者我只是错过了连接。

电子表格 1 如下所示: Spreadsheet 1

电子表格 2 如下所示: Spreadsheet 2

最终结果的头部是这样的:

Region Branch Master_Product_Name Retrieved_#

  • 北,弗雷斯诺,产品 1:5,060
  • 北,弗雷斯诺,产品 2:2,429
  • 北,萨克拉门托,产品 3:405
  • 北,弗雷斯诺,产品 6:10

程序会插入一个新列,并根据列和行的索引插入数据以到达单个数据点。因此,对于我的示例中的第一行,即 Fresno 和 Product 1... 它将在电子表格中查找顶行的第二行,直到它到达 Product 1,然后向下直到找到 Fresno 并插入 5,060。

在当前场景下,拥有 Region 列并不重要,但我可以设想具有基于多个条件进行检索的能力。

【问题讨论】:

  • 欢迎来到 SO Matt_L。可悲的是,您的要求不是很清楚。您是否正在寻求有关 Python 代码的帮助?如果是这样,您能否将其包含在您的问题中,以便有人可以提供有针对性的建议。问候,
  • 谢谢!我正在寻找python代码。
  • SO 不是代码编写服务。当您的代码出现问题时,我们会为您提供帮助,而不是为您编写代码。
  • 谢谢亚历克斯。我很抱歉,如果有人愿意为我指出正确的方向来自己研究它,我很乐意这样做。我一直无法找出我遇到的合并/加入方案的任何文档。这是合并两个信息被索引的电子表格,而不是在一列中。
  • 如果您将数据放入 Pandas 数据帧中,您可能需要从这里开始:pandas.pydata.org/pandas-docs/stable/merging.html

标签: python excel pandas indexing dataframe


【解决方案1】:

IIUC 你可以从Spreadsheet 2 创建stack 数据框df2。然后你需要从行转换的rename 列 - Master Product Name

如果您需要RegionMaster Product NameBranch 列中的merge,则必须将Branch 列都小写以进行匹配。

df1 = pd.read_csv('s1.csv', sep=";", index_col=None)
print df1

   Region         Branch Master Product Name
0   North         FRESNO           Product 1
1   North         FRESNO           Product 2
2   North     SACRAMENTO           Product 2
3   North         FRESNO          Product 12
4   North  INLAND EMPIRE          Product 13
5   North         FRESNO           Product 4
6   North         FRESNO           Product 5
7   North      SAN DIEGO           Product 3
8   North         FRESNO           Product 3
9   North         FRESNO           Product 3
10  North    LOS ANGELES           Product 3
11  North         FRESNO           Product 6
12  North         FRESNO           Product 7
13  North         FRESNO           Product 8
14  North  SAN FRANCISCO           Product 8
15  North         FRESNO          Product 11
16  North         FRESNO           Product 9
17  South    LOS ANGELES          Product 12
18  South    LOS ANGELES           Product 3
19  North  SAN FRANCISCO           Product 1
20  North  SAN FRANCISCO           Product 2
21  North  SAN FRANCISCO          Product 12
22  North  SAN FRANCISCO          Product 13
23  North  SAN FRANCISCO           Product 4
24  North  SAN FRANCISCO           Product 5
25  South       SAN JOSE           Product 3
26  South       SAN JOSE           Product 3
df2 = pd.read_csv('s2.csv', sep=";", index_col=None)
print df2
          Branch Region Product 1 Product 2  Product 3  Product 4  Product 5  \
0         Fresno  North     5 060     2 429        675        540         10   
1     Sacramento  North     5 534     2 344        405        324        118   
2  San Francisco  North     2 924     1 167        111         89         32   
3       San Jose  North     4 039     2 019        179        143        102   
4    Los Angeles  South     4 453     2 273        302        298         -2   
5  Inland Empire  South     4 680     2 091        404        400         21   
6  Orange County  South     3 167     1 493         95         94         41   
7      San Diego  South     5 468     2 439        282        279         16   

   Product 6  
0         10  
1        117  
2         32  
3        102  
4         76  
5         41  
6         67  
7        188  
df2 = df2.set_index(['Region', 'Branch'])
df2 = df2.stack()
df2 = df2.reset_index(name = 'Count')

df2 = df2.rename(columns={'level_2':'Master Product Name'})
print df2

   Region         Branch Master Product Name  Count
0   North         Fresno           Product 1  5 060
1   North         Fresno           Product 2  2 429
2   North         Fresno           Product 3    675
3   North         Fresno           Product 4    540
4   North         Fresno           Product 5     10
5   North         Fresno           Product 6     10
6   North     Sacramento           Product 1  5 534
7   North     Sacramento           Product 2  2 344
8   North     Sacramento           Product 3    405
9   North     Sacramento           Product 4    324
10  North     Sacramento           Product 5    118
11  North     Sacramento           Product 6    117
12  North  San Francisco           Product 1  2 924
13  North  San Francisco           Product 2  1 167
14  North  San Francisco           Product 3    111
15  North  San Francisco           Product 4     89
16  North  San Francisco           Product 5     32
17  North  San Francisco           Product 6     32
18  North       San Jose           Product 1  4 039
19  North       San Jose           Product 2  2 019
20  North       San Jose           Product 3    179
21  North       San Jose           Product 4    143
22  North       San Jose           Product 5    102
23  North       San Jose           Product 6    102
24  South    Los Angeles           Product 1  4 453
25  South    Los Angeles           Product 2  2 273
26  South    Los Angeles           Product 3    302
27  South    Los Angeles           Product 4    298
28  South    Los Angeles           Product 5     -2
29  South    Los Angeles           Product 6     76
30  South  Inland Empire           Product 1  4 680
31  South  Inland Empire           Product 2  2 091
32  South  Inland Empire           Product 3    404
33  South  Inland Empire           Product 4    400
34  South  Inland Empire           Product 5     21
35  South  Inland Empire           Product 6     41
36  South  Orange County           Product 1  3 167
37  South  Orange County           Product 2  1 493
38  South  Orange County           Product 3     95
39  South  Orange County           Product 4     94
40  South  Orange County           Product 5     41
41  South  Orange County           Product 6     67
42  South      San Diego           Product 1  5 468
43  South      San Diego           Product 2  2 439
44  South      San Diego           Product 3    282
45  South      San Diego           Product 4    279
46  South      San Diego           Product 5     16
47  South      San Diego           Product 6    188
#for metching by columns Branch - string to lowercase
df1['Branch'] = df1['Branch'].str.lower()
df2['Branch'] = df2['Branch'].str.lower()
df = pd.merge(df1, df2, on=['Region','Master Product Name', 'Branch'])

print df

   Region         Branch Master Product Name  Count
0   North         fresno           Product 1  5 060
1   North         fresno           Product 2  2 429
2   North     sacramento           Product 2  2 344
3   North         fresno           Product 4    540
4   North         fresno           Product 5     10
5   North         fresno           Product 3    675
6   North         fresno           Product 3    675
7   North         fresno           Product 6     10
8   South    los angeles           Product 3    302
9   North  san francisco           Product 1  2 924
10  North  san francisco           Product 2  1 167
11  North  san francisco           Product 4     89
12  North  san francisco           Product 5     32

【讨论】:

  • 我不熟悉堆栈方法,有助于解决我的困境。谢谢!如果有办法在不操纵第二个数据框的情况下拉出数据点,我仍然很好奇。
  • 也许有可能,我不知道。但是 pandas 针对矢量工作进行了优化,并使用了高度优化的函数,例如 stackmerge,这是最佳实践。
  • @jezrael, @Matt_L 合并看起来没有必要。我认为所有相关信息都包含在 df2 中,所以实际上您正在寻找 stack,而不是 merge
猜你喜欢
  • 2018-11-19
  • 1970-01-01
  • 2013-12-20
  • 2015-10-30
  • 1970-01-01
  • 1970-01-01
  • 2021-11-09
  • 2021-09-25
  • 2017-07-31
相关资源
最近更新 更多