Posted: 2018-07-20 08:22:43
Question:
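I'm new to Python, pandas and Stack Overflow, so any help is appreciated. I have two pandas DataFrames. The first is sorted in ascending order (values from 0 to 100 in steps of 0.1). The second has 26,000 unsorted values between 2.3 and 38.5, and some of them are repeated. For every value in the first DataFrame, I want to find, efficiently, how many values in the second DataFrame are less than or equal to it.
My code below finishes in 45 seconds, but I would like it to run in about 10 seconds.
Thanks in advance.
Code: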
```python
import numpy as np

def get_CDF2(df1, df2):
    # Flatten to 1-D: iterating a DataFrame directly yields column labels, not values
    x = np.ravel(df1)  # The first dataframe is already sorted in ascending order
    y = np.sort(np.ravel(df2))  # Sort the values of the second dataframe in ascending order
    df_res = []  # keep the results here
    yi = iter(y)  # Use of an iterator to move over y
    yindex = 0
    flag = 0  # Flag, when set to 1 no comparison is done
    y_val = next(yi)
    for value in x:
        if flag >= 1:
            df_res.append(largest_ind)  # every y value is <= value
        else:
            # Search through y to find the index of an item bigger than value
            while y_val <= value and yindex < len(y) - 1:
                y_val = next(yi)  # Point at the next value in df2
                yindex += 1  # Keep track of how many y values are <= value
            # If, for some value in df1, we iterate through all of df2 and every element
            # is <= value, the remaining values of df1 give the same count because df1 is
            # in ascending order, so there is no need to iterate again: just set flag to 1.
            if yindex == len(y) - 1 and y_val <= value:
                flag = 1
                largest_ind = yindex + 1
                df_res.append(largest_ind)  # append the number of y values <= value
            else:
                df_res.append(yindex)  # append the number of y values <= value
    return df_res
```
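A common vectorized alternative (not part of the original post) is to sort the second array once and then binary-search every value of the first array with `np.searchsorted`. The sketch below assumes both inputs hold a single numeric column; the helper name `count_le` is hypothetical. Only `y` needs to be sorted; `side="right"` places the insertion point after any values equal to the query, so duplicates in `y` are counted as "less than or equal".

```python
import numpy as np

def count_le(x, y):
    """Hypothetical helper: for each value in x, count the values in y that are <= it.

    Sorting y costs O(m log m); each lookup is a binary search, O(log m).
    side="right" returns the position after any elements equal to the query,
    which is exactly the number of elements <= the query.
    """
    y_sorted = np.sort(np.ravel(y))
    return np.searchsorted(y_sorted, np.ravel(x), side="right")

# Tiny hand-checkable example with a repeated value in y
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.5, 1.0, 1.0, 0.5])
print(count_le(x, y))  # [0 3 3 4]
```

For 1,001 grid points and 26,000 values this replaces the Python-level loop with a single sort plus 1,001 binary searches done inside NumPy.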
df1:
0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
0.9, 1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7,
1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6,
2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4,
4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3,
5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2,
6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1,
7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8. ,
8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9,
9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8,
9.9, 10. , 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7,
10.8, 10.9, 11. , 11.1, 11.2, 11.3, 11.4, 11.5, 11.6,
11.7, 11.8, 11.9, 12. , 12.1, 12.2, 12.3, 12.4, 12.5,
12.6, 12.7, 12.8, 12.9, 13. , 13.1, 13.2, 13.3, 13.4,
13.5, 13.6, 13.7, 13.8, 13.9, 14. , 14.1, 14.2, 14.3,
14.4, 14.5, 14.6, 14.7, 14.8, 14.9, 15. , 15.1, 15.2,
15.3, 15.4, 15.5, 15.6, 15.7, 15.8, 15.9, 16. , 16.1,
16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 17. ,
17.1, 17.2, 17.3, 17.4, 17.5, 17.6, 17.7, 17.8, 17.9,
18. , 18.1, 18.2, 18.3, 18.4, 18.5, 18.6, 18.7, 18.8,
18.9, 19. , 19.1, 19.2, 19.3, 19.4, 19.5, 19.6, 19.7,
19.8, 19.9, 20. , 20.1, 20.2, 20.3, 20.4, 20.5, 20.6,
20.7, 20.8, 20.9, 21. , 21.1, 21.2, 21.3, 21.4, 21.5,
21.6, 21.7, 21.8, 21.9, 22. , 22.1, 22.2, 22.3, 22.4,
22.5, 22.6, 22.7, 22.8, 22.9, 23. , 23.1, 23.2, 23.3,
23.4, 23.5, 23.6, 23.7, 23.8, 23.9, 24. , 24.1, 24.2,
24.3, 24.4, 24.5, 24.6, 24.7, 24.8, 24.9, 25. , 25.1,
25.2, 25.3, 25.4, 25.5, 25.6, 25.7, 25.8, 25.9, 26. ,
26.1, 26.2, 26.3, 26.4, 26.5, 26.6, 26.7, 26.8, 26.9,
27. , 27.1, 27.2, 27.3, 27.4, 27.5, 27.6, 27.7, 27.8,
27.9, 28. , 28.1, 28.2, 28.3, 28.4, 28.5, 28.6, 28.7,
28.8, 28.9, 29. , 29.1, 29.2, 29.3, 29.4, 29.5, 29.6
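The post does not show how df1 was built. If the 0 to 100 grid comes from floating-point steps, some points are not exactly the decimals they display as (for example, `3 * 0.1` evaluates to `0.30000000000000004`). A minimal sketch, assuming a plain NumPy array is acceptable for df1 and using the hypothetical name `grid`, builds the grid from integer steps and rounds it, so each point is the same double that a literal such as `7.0` parses to and ties with df2 values compare as intended under `<=`.

```python
import numpy as np

# Build 0.0, 0.1, ..., 100.0 from integer steps, then round to one decimal.
# np.round(..., 1) returns the double nearest to each one-decimal value,
# removing representation error such as 3 * 0.1 == 0.30000000000000004.
grid = np.round(np.arange(1001) * 0.1, 1)

print(len(grid), grid[0], grid[3], grid[-1])  # 1001 0.0 0.3 100.0
```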
df2:
0 12.993
1 12.054
2 21.957
3 10.917
4 33.890
5 10.597
6 22.911
7 7.431
8 10.437
9 19.165
10 12.169
11 14.847
12 10.093
13 10.795
14 14.419
15 27.199
16 15.045
17 12.764
18 7.766
19 18.066
20 10.254
21 16.922
22 7.011
23 10.322
24 11.619
25 25.719
26 18.142
27 14.557
28 26.367
29 13.443
30 17.318
31 10.971
32 6.073
33 20.050
34 11.863
35 25.619
36 18.326
37 30.830
38 13.130
39 11.734
40 14.457
41 22.659
42 16.479
43 17.845
44 23.712
45 16.670
46 10.322
47 16.250
48 20.920
49 17.479
50 15.526
51 15.732
52 19.836
53 10.513
54 24.818
55 10.933
56 14.785
57 25.253
58 15.732
59 14.290
60 23.979
61 24.788
62 12.420
63 21.324
64 9.658
65 24.307
66 17.601
67 12.352
68 18.089
69 23.353
70 12.718
71 18.707
72 9.147
73 17.494
74 8.743
75 22.407
76 16.227
77 15.396
78 16.807
79 26.733
80 14.084
81 19.516
82 15.106
83 21.187
84 13.008
85 13.618
86 16.266
87 19.706
88 6.591
89 14.999
90 16.449
91 18.883
92 15.243
93 15.976
94 18.242
95 16.662
96 6.691
97 16.952
98 25.940
99 23.018
100 29.365
101 14.564
102 15.625
103 9.727
104 7.652
105 12.726
106 7.263
107 19.943
108 17.540
109 7.469
110 10.360
111 17.898
112 20.393
113 7.011
114 15.999
115 12.985
116 16.624
117 18.753
118 12.520
119 13.488
120 17.959
121 16.433
122 14.518
123 12.909
124 19.752
125 9.277
126 25.566
127 19.272
128 10.360
129 22.148
130 20.294
131 18.402
132 17.631
133 17.341
134 13.672
135 19.600
136 20.653
137 15.999
138 15.480
139 30.655
140 15.426
141 16.067
142 29.838
143 13.099
144 12.184
145 15.693
146 26.031
147 16.052
148 8.087
149 16.754
150 17.029
151 16.601
152 9.956
153 20.363
154 11.215
155 15.106
156 13.809
157 23.178
158 21.484
159 13.359
160 31.860
161 14.564
162 19.737
163 19.424
164 29.556
165 15.678
166 22.148
167 28.389
168 21.309
169 22.262
170 11.314
171 8.018
172 24.551
173 14.740
174 15.716
175 24.269
176 20.042
177 15.968
178 11.337
179 27.618
180 22.522
181 19.066
182 9.323
183 20.622
184 13.092
185 15.464
186 21.171
187 11.604
188 19.050
189 15.823
190 33.859
191 15.106
192 13.549
193 17.296
194 13.740
195 12.054
196 10.955
197 21.164
198 14.427
199 9.719
200 12.176
201 9.742
202 21.278
203 20.515
204 18.265
205 9.666
206 13.870
207 15.968
208 13.313
209 16.517
210 18.417
211 15.419
212 20.523
213 15.655
214 26.977
215 13.084
216 31.349
217 29.854
218 13.008
219 11.306
220 22.384
221 20.798
222 17.433
223 12.916
224 11.284
225 20.248
226 9.803
227 10.376
228 9.315
229 14.976
230 16.327
231 9.590
232 16.830
233 23.979
234 11.558
235 13.183
236 18.776
237 20.416
238 9.163
239 10.345
240 28.252
241 22.888
242 20.538
243 6.912
244 24.040
245 8.682
246 31.929
247 14.908
248 19.195
249 17.112
250 18.379
251 15.869
252 13.794
253 14.129
254 12.458
255 10.795
256 25.291
257 26.382
258 20.881
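As a usage sketch on the data above (an illustration, not the asker's code), the first ten df2 values can be paired with a few grid points. The column names `value` and `x` are assumptions, since the post does not show them. Dividing the counts by the number of df2 values gives the empirical CDF that the name `get_CDF2` suggests.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: the first ten df2 values from the post and a few
# grid points; the column names "value" and "x" are assumptions.
df2 = pd.DataFrame({"value": [12.993, 12.054, 21.957, 10.917, 33.890,
                              10.597, 22.911, 7.431, 10.437, 19.165]})
df1 = pd.DataFrame({"x": [5.0, 10.5, 12.0, 20.0, 40.0]})

y_sorted = np.sort(df2["value"].to_numpy())
counts = np.searchsorted(y_sorted, df1["x"].to_numpy(), side="right")
# counts -> [0, 2, 4, 7, 10]

# Attach the counts and the empirical CDF (fraction of df2 values <= x)
result = df1.assign(count=counts, cdf=counts / len(y_sorted))
print(result)
```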
Comments:
-
Can you share what your DataFrames look like?
-
@jp_data_analysis See the updated post.
-
@JohnE OK, take a look at the edit I just made.
-
@LeStivi, none of this is reproducible. Can you create a small DataFrame from scratch that does what you want, and then ask about ways to optimize it?
-
@LeStivi Ideally the sample data would be smaller than this, but this is much better.
Tags: python-3.x pandas jupyter-notebook