【Question title】: How do I compare values in two dataframes in an efficient way
【Posted】: 2018-07-20 08:22:43
【Question】:

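I'm new to Python, pandas, and Stack Overflow, so any help is appreciated. I have two pandas dataframes. The first is sorted in ascending order (values from 0 to 100 in steps of 0.1). The second holds 26,000 values between 2.3 and 38.5 in no particular order, and some of its values are repeated. For each value in the first dataframe, I want to find, efficiently, how many values in the second dataframe are less than or equal to it.

My code below finishes in 45 seconds, but I would like it to finish in about 10 seconds.

Thanks in advance.

Code: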

import numpy as np

def get_CDF2(df1, df2):
    x = np.asarray(df1).ravel()  # The first dataframe is already sorted in ascending order
    y = np.sort(np.asarray(df2).ravel())  # Sort the values of the second dataframe in ascending order
    df_res = []  # Keep the results here
    yi = iter(y)  # Iterator used to move over y
    yindex = 0  # Number of y values consumed so far
    flag = 0  # When set to 1, no further comparison is done
    largest_ind = len(y)  # Count to report once every y value has been passed
    y_val = next(yi)
    for value in x:
        if flag >= 1:
            df_res.append(largest_ind)  # Every y value is <= value
        else:
            # Advance through y until reaching an item bigger than value
            while y_val <= value and yindex < len(y) - 1:
                y_val = next(yi)  # Point at the next value in df2
                yindex += 1  # Keep track of how many y values are <= value
            # If every value in df2 is <= the current value of df1, the same holds
            # for all remaining values of df1 because df1 is in ascending order,
            # so there is no need to iterate again: just set flag to 1.
            if yindex == len(y) - 1 and y_val <= value:
                flag = 1
                df_res.append(largest_ind)
            else:
                df_res.append(yindex)  # Number of y values <= value

    return df_res

df1:

     0. ,   0.1,   0.2,   0.3,   0.4,   0.5,   0.6,   0.7,   0.8,
     0.9,   1. ,   1.1,   1.2,   1.3,   1.4,   1.5,   1.6,   1.7,
     1.8,   1.9,   2. ,   2.1,   2.2,   2.3,   2.4,   2.5,   2.6,
     2.7,   2.8,   2.9,   3. ,   3.1,   3.2,   3.3,   3.4,   3.5,
     3.6,   3.7,   3.8,   3.9,   4. ,   4.1,   4.2,   4.3,   4.4,
     4.5,   4.6,   4.7,   4.8,   4.9,   5. ,   5.1,   5.2,   5.3,
     5.4,   5.5,   5.6,   5.7,   5.8,   5.9,   6. ,   6.1,   6.2,
     6.3,   6.4,   6.5,   6.6,   6.7,   6.8,   6.9,   7. ,   7.1,
     7.2,   7.3,   7.4,   7.5,   7.6,   7.7,   7.8,   7.9,   8. ,
     8.1,   8.2,   8.3,   8.4,   8.5,   8.6,   8.7,   8.8,   8.9,
     9. ,   9.1,   9.2,   9.3,   9.4,   9.5,   9.6,   9.7,   9.8,
     9.9,  10. ,  10.1,  10.2,  10.3,  10.4,  10.5,  10.6,  10.7,
    10.8,  10.9,  11. ,  11.1,  11.2,  11.3,  11.4,  11.5,  11.6,
    11.7,  11.8,  11.9,  12. ,  12.1,  12.2,  12.3,  12.4,  12.5,
    12.6,  12.7,  12.8,  12.9,  13. ,  13.1,  13.2,  13.3,  13.4,
    13.5,  13.6,  13.7,  13.8,  13.9,  14. ,  14.1,  14.2,  14.3,
    14.4,  14.5,  14.6,  14.7,  14.8,  14.9,  15. ,  15.1,  15.2,
    15.3,  15.4,  15.5,  15.6,  15.7,  15.8,  15.9,  16. ,  16.1,
    16.2,  16.3,  16.4,  16.5,  16.6,  16.7,  16.8,  16.9,  17. ,
    17.1,  17.2,  17.3,  17.4,  17.5,  17.6,  17.7,  17.8,  17.9,
    18. ,  18.1,  18.2,  18.3,  18.4,  18.5,  18.6,  18.7,  18.8,
    18.9,  19. ,  19.1,  19.2,  19.3,  19.4,  19.5,  19.6,  19.7,
    19.8,  19.9,  20. ,  20.1,  20.2,  20.3,  20.4,  20.5,  20.6,
    20.7,  20.8,  20.9,  21. ,  21.1,  21.2,  21.3,  21.4,  21.5,
    21.6,  21.7,  21.8,  21.9,  22. ,  22.1,  22.2,  22.3,  22.4,
    22.5,  22.6,  22.7,  22.8,  22.9,  23. ,  23.1,  23.2,  23.3,
    23.4,  23.5,  23.6,  23.7,  23.8,  23.9,  24. ,  24.1,  24.2,
    24.3,  24.4,  24.5,  24.6,  24.7,  24.8,  24.9,  25. ,  25.1,
    25.2,  25.3,  25.4,  25.5,  25.6,  25.7,  25.8,  25.9,  26. ,
    26.1,  26.2,  26.3,  26.4,  26.5,  26.6,  26.7,  26.8,  26.9,
    27. ,  27.1,  27.2,  27.3,  27.4,  27.5,  27.6,  27.7,  27.8,
    27.9,  28. ,  28.1,  28.2,  28.3,  28.4,  28.5,  28.6,  28.7,
    28.8,  28.9,  29. ,  29.1,  29.2,  29.3,  29.4,  29.5,  29.6

df2:

0         12.993
1         12.054
2         21.957
3         10.917
4         33.890
5         10.597
6         22.911
7          7.431
8         10.437
9         19.165
10        12.169
11        14.847
12        10.093
13        10.795
14        14.419
15        27.199
16        15.045
17        12.764
18         7.766
19        18.066
20        10.254
21        16.922
22         7.011
23        10.322
24        11.619
25        25.719
26        18.142
27        14.557
28        26.367
29        13.443
30        17.318
31        10.971
32         6.073
33        20.050
34        11.863
35        25.619
36        18.326
37        30.830
38        13.130
39        11.734
40        14.457
41        22.659
42        16.479
43        17.845
44        23.712
45        16.670
46        10.322
47        16.250
48        20.920
49        17.479
50        15.526
51        15.732
52        19.836
53        10.513
54        24.818
55        10.933
56        14.785
57        25.253
58        15.732
59        14.290
60        23.979
61        24.788
62        12.420
63        21.324
64         9.658
65        24.307
66        17.601
67        12.352
68        18.089
69        23.353
70        12.718
71        18.707
72         9.147
73        17.494
74         8.743
75        22.407
76        16.227
77        15.396
78        16.807
79        26.733
80        14.084
81        19.516
82        15.106
83        21.187
84        13.008
85        13.618
86        16.266
87        19.706
88         6.591
89        14.999
90        16.449
91        18.883
92        15.243
93        15.976
94        18.242
95        16.662
96         6.691
97        16.952
98        25.940
99        23.018
100       29.365
101       14.564
102       15.625
103        9.727
104        7.652
105       12.726
106        7.263
107       19.943
108       17.540
109        7.469
110       10.360
111       17.898
112       20.393
113        7.011
114       15.999
115       12.985
116       16.624
117       18.753
118       12.520
119       13.488
120       17.959
121       16.433
122       14.518
123       12.909
124       19.752
125        9.277
126       25.566
127       19.272
128       10.360
129       22.148
130       20.294
131       18.402
132       17.631
133       17.341
134       13.672
135       19.600
136       20.653
137       15.999
138       15.480
139       30.655
140       15.426
141       16.067
142       29.838
143       13.099
144       12.184
145       15.693
146       26.031
147       16.052
148        8.087
149       16.754
150       17.029
151       16.601
152        9.956
153       20.363
154       11.215
155       15.106
156       13.809
157       23.178
158       21.484
159       13.359
160       31.860
161       14.564
162       19.737
163       19.424
164       29.556
165       15.678
166       22.148
167       28.389
168       21.309
169       22.262
170       11.314
171        8.018
172       24.551
173       14.740
174       15.716
175       24.269
176       20.042
177       15.968
178       11.337
179       27.618
180       22.522
181       19.066
182        9.323
183       20.622
184       13.092
185       15.464
186       21.171
187       11.604
188       19.050
189       15.823
190       33.859
191       15.106
192       13.549
193       17.296
194       13.740
195       12.054
196       10.955
197       21.164
198       14.427
199        9.719
200       12.176
201        9.742
202       21.278
203       20.515
204       18.265
205        9.666
206       13.870
207       15.968
208       13.313
209       16.517
210       18.417
211       15.419
212       20.523
213       15.655
214       26.977
215       13.084
216       31.349
217       29.854
218       13.008
219       11.306
220       22.384
221       20.798
222       17.433
223       12.916
224       11.284
225       20.248
226        9.803
227       10.376
228        9.315
229       14.976
230       16.327
231        9.590
232       16.830
233       23.979
234       11.558
235       13.183
236       18.776
237       20.416
238        9.163
239       10.345
240       28.252
241       22.888
242       20.538
243        6.912
244       24.040
245        8.682
246       31.929
247       14.908
248       19.195
249       17.112
250       18.379
251       15.869
252       13.794
253       14.129
254       12.458
255       10.795
256       25.291
257       26.382
258       20.881

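【Question discussion】:

  • Can you share what your dataframes look like?
  • @jp_data_analysis See the updated post
  • @JohnE OK, take a look at the edit I just made
  • @LeStivi, none of this is reproducible. Can you create a small dataframe from scratch that does what you want, and then ask how to optimize it?
  • @LeStivi Ideally the sample data would be smaller than this, but this is much better

Tags: python-3.x pandas jupyter-notebook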


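【Solution 1】:

Try this. It adds a column named check to df1. For each row, that column will contain the number of values in df2 that are less than or equal to the value in df1.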

df1['check'] = df1[0].apply(lambda x: df2[df2[0] <= x].size)

You may need to replace [0] with the name of the first column in your dataframes.
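Because apply re-scans all of df2 for every row of df1, the one-liner above does roughly len(df1) × len(df2) comparisons. A possible alternative, sketched here under assumptions rather than taken from the thread, is to sort df2 once and let np.searchsorted with side="right" do a binary search per threshold. The helper name count_less_equal, the column name "value", and the randomly generated stand-in data are all hypothetical.

```python
import numpy as np
import pandas as pd


def count_less_equal(thresholds, values):
    """For each threshold, count how many entries of `values` are <= it."""
    sorted_values = np.sort(np.asarray(values, dtype=float).ravel())
    thresholds = np.asarray(thresholds, dtype=float).ravel()
    # side="right" returns the insertion index after any equal entries,
    # which equals the number of sorted elements <= the threshold.
    return np.searchsorted(sorted_values, thresholds, side="right")


# Hypothetical stand-ins for the question's data (column name is assumed).
df1 = pd.DataFrame({"value": np.arange(0, 1001) / 10})  # 0.0, 0.1, ..., 100.0
rng = np.random.default_rng(0)
df2 = pd.DataFrame({"value": rng.uniform(2.3, 38.5, size=26000).round(3)})

df1["check"] = count_less_equal(df1["value"], df2["value"])
print(df1.iloc[::100])
```

The sort costs O(m log m) once and each lookup costs O(log m), so the loop over df1 disappears entirely. If either input can contain NaN, it would need to be dropped or handled before sorting.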

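【Discussion】:

  • Unfortunately, neither of my dataframes has columns
  • That means these are not dataframes. Please post the code where you create df1 and df2.
  • I found a way to add column names and tried your answer, but it doesn't give the correct result, and it takes almost as long as my solution
  • Please post the part of your code that creates the dataframes.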
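
The small, from-scratch example requested in the comments might look like the sketch below. It is only an illustration: the column names "threshold" and "value" are assumptions, and the numbers are a handful of values copied from the df2 sample in the question, with 7.431 duplicated to mimic the repeated values the asker mentions. The brute-force broadcasted comparison gives a reference result that any faster method can be checked against.

```python
import numpy as np
import pandas as pd

# Hypothetical small frames with named columns (names are assumptions).
df1 = pd.DataFrame({"threshold": [0.0, 5.0, 10.0, 15.0, 40.0]})
df2 = pd.DataFrame(
    {"value": [12.993, 12.054, 21.957, 10.917, 33.890, 10.597, 7.431, 7.431]}
)

values = df2["value"].to_numpy()
thresholds = df1["threshold"].to_numpy()

# Brute-force reference: compare every value against every threshold.
# The boolean matrix has one row per threshold and one column per value.
df1["expected"] = (values[None, :] <= thresholds[:, None]).sum(axis=1)

print(df1)
# expected column: 0, 0, 2, 6, 8
```

The comparison matrix has len(df1) × len(df2) entries, so this form is meant for verifying results on small inputs rather than for speed.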