Iterate over the rows, and control count i++ - Answers

【Question title】: Iterate over the rows, and control count i++
【Posted】: 2016-01-22 20:25:47
【Question description】:

I'm looking for a way to iterate over the rows but apply some method only to every 20th or 30th row, something like this:

Updated code:

for index, row in df.iterrows(), index=+20:  # pseudo-syntax: the index should advance by 20 each step
    location = geolocator.reverse("%s, %s" % (row['lat'], row['long']), timeout=None)
    row['location'] = location.address
    time.sleep(3)
return df

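Actually, I'm trying to minimize the number of requests, otherwise I run into timeout problems. That's why I want to iterate over the rows and apply the request function only to every 20th or 60th row (I have 7000 rows). The goal is not to speed up the process; the time.sleep call is only there to space out the requests.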
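A minimal sketch of what the question describes, assuming a DataFrame with `lat` and `long` columns. `fake_reverse` is a hypothetical offline stand-in for `geolocator.reverse(...).address`, and the pause is a parameter (the question uses 3 seconds). Selecting the rows with `df.iloc[::step]` replaces the invalid `index=+20` syntax, and results are written back with `df.at`, since assigning to `row` only changes a copy.

```python
import time

import pandas as pd


def fake_reverse(lat, lon):
    # Hypothetical stand-in for geolocator.reverse("lat, lon").address
    return f"address near {lat:.2f}, {lon:.2f}"


def geocode_every_nth(df, step=20, pause=3.0):
    """Reverse-geocode only every `step`-th row, sleeping `pause` seconds per request."""
    df = df.copy()
    df["location"] = None  # skipped rows keep None
    for label, row in df.iloc[::step].iterrows():
        # iterrows() yields (index label, row Series); write back via df.at,
        # because modifying `row` would not update df.
        df.at[label, "location"] = fake_reverse(row["lat"], row["long"])
        time.sleep(pause)
    return df


coords = pd.DataFrame({"lat": [52.0 + i / 100 for i in range(100)],
                       "long": [13.0 + i / 100 for i in range(100)]})
result = geocode_every_nth(coords, step=20, pause=0)
print(result["location"].notna().sum())  # 5 rows: 0, 20, 40, 60, 80
```

With 7000 rows and `step=20`, this would issue 350 requests instead of 7000.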

【Question discussion】:

Tags: python loops pandas dataframe

【Answer 1】:

Try this:

for index, (label, row) in enumerate(df.iterrows()):
    if index % 20 == 0:
        # do something with every 20th row
        pass
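Iterating over a DataFrame directly yields its column labels, not its rows; row-wise iteration needs `df.iterrows()` (or `itertuples()`). A small sketch contrasting the two on a made-up three-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": range(50), "b": range(50), "c": range(50)})

# Iterating the DataFrame itself walks over the column labels.
print(list(enumerate(df)))  # [(0, 'a'), (1, 'b'), (2, 'c')]

# enumerate() over iterrows() gives (position, (index label, row Series)).
picked = []
for pos, (label, row) in enumerate(df.iterrows()):
    if pos % 20 == 0:
        picked.append(label)
print(picked)  # [0, 20, 40]
```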

【Discussion】:

【Answer 2】:

Just use enumerate and the modulo operator:

for index, row in enumerate(df.iterrows()):
    if not index%20:
        row['C']=some_function()
return df

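I moved the return out of the loop so that the loop doesn't end after the first iteration.

【Discussion】:

Looks good, but it throws TypeError: tuple indices must be integers, not str
That means row is a tuple, not the dict-like structure (indexed by string keys) you expected. I don't know pandas; sorry.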
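The `TypeError` comes from `iterrows()` yielding `(index, row)` tuples, so `enumerate(df.iterrows())` needs the inner tuple unpacked. A sketch of Answer 2 along those lines, with results written back through `df.at` because modifying `row` does not update `df`. `some_function` is a hypothetical placeholder, and columns `A` and `C` are made up for illustration:

```python
import pandas as pd


def some_function(row):
    # Hypothetical placeholder for the real per-row computation.
    return row["A"] + 1


def fill_every_20th(df):
    df = df.copy()
    for pos, (label, row) in enumerate(df.iterrows()):
        if not pos % 20:  # positions 0, 20, 40, ...
            df.at[label, "C"] = some_function(row)
    return df  # return after the loop, not inside it


frame = pd.DataFrame({"A": range(45), "C": 0})
out = fill_every_20th(frame)
print(out.loc[out["C"] != 0, "C"].tolist())  # [1, 21, 41]
```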

【Answer 3】:

Why not just slice the df with iloc and a step parameter:

Example:

In [120]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'c': np.random.randn(30)})
df

Out[120]:
           c
0  -0.737805
1   1.158012
2  -0.348384
3   0.044989
4   0.962584
5   2.041479
6   1.376785
7   0.208565
8  -1.535244
9   0.389831
10  0.049862
11 -0.142717
12 -0.794087
13  1.316492
14  0.182952
15  0.850953
16  0.015589
17  0.062692
18 -1.551303
19  0.937899
20  0.583003
21 -0.612411
22  0.762307
23 -0.682298
24 -0.897314
25 -0.101144
26 -0.617573
27 -2.168498
28  0.631021
29 -1.592888

In [121]:
df.iloc[::5, df.columns.get_loc('c')] = 0
df

Out[121]:
           c
0   0.000000
1   1.158012
2  -0.348384
3   0.044989
4   0.962584
5   0.000000
6   1.376785
7   0.208565
8  -1.535244
9   0.389831
10  0.000000
11 -0.142717
12 -0.794087
13  1.316492
14  0.182952
15  0.000000
16  0.015589
17  0.062692
18 -1.551303
19  0.937899
20  0.000000
21 -0.612411
22  0.762307
23 -0.682298
24 -0.897314
25  0.000000
26 -0.617573
27 -2.168498
28  0.631021
29 -1.592888

This will be much faster than iterating over every row.

So in your case:

df.iloc[::20, df.columns.get_loc('C')] = some_function()

should work.
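A sketch of the same slicing on a frame the size mentioned in the question (7000 rows). The assignment uses a single `df.iloc[rows, col]` call; the chained form `df['C'].iloc[...] = ...` is not guaranteed to write back to `df`, and under pandas' Copy-on-Write behaviour it does not. Note that this assigns one value to every selected row; the constant `1.0` is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"C": np.zeros(7000)})

step = 20
col = df.columns.get_loc("C")
# One iloc call selects rows 0, 20, 40, ... of column "C" and assigns in place.
df.iloc[::step, col] = 1.0

print(len(df.iloc[::step]))  # 350 rows selected out of 7000
print(df["C"].sum())         # 350.0
```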

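【Discussion】:

Looks good! But I'm not trying to make the process faster. I've updated the question; maybe you know how to deal with this problem.
Well, I don't know why you would run into a timeout issue; I'd treat that as a separate problem, since my answer is vectorized and much faster than iterating row by row.
Well, I think the problem is that the server limits the number of requests within a period of time; that's why I put time.sleep at the end of each iteration.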
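If the server limits requests per time window, the pacing can be enforced by a wrapper that guarantees a minimum interval between calls, rather than a fixed `time.sleep` after each one. A minimal sketch; `Throttle` and `echo` are hypothetical names, and `min_interval` is an assumed value, since the real limit depends on the service. (geopy also provides `geopy.extra.rate_limiter.RateLimiter` for this purpose.)

```python
import time


class Throttle:
    """Call `func` at most once every `min_interval` seconds."""

    def __init__(self, func, min_interval):
        self.func = func
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous call

    def __call__(self, *args, **kwargs):
        if self._last is not None:
            wait = self.min_interval - (time.monotonic() - self._last)
            if wait > 0:
                time.sleep(wait)  # only sleep for the remaining part of the interval
        self._last = time.monotonic()
        return self.func(*args, **kwargs)


def echo(query):
    # Hypothetical stand-in for a geocoding request.
    return query.upper()


throttled = Throttle(echo, min_interval=0.05)
start = time.monotonic()
results = [throttled(q) for q in ["a", "b", "c"]]
elapsed = time.monotonic() - start
print(results)  # ['A', 'B', 'C']
print(f"{elapsed:.2f}s")
```

Unlike a fixed sleep, time already spent on the request itself counts toward the interval.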
Following the logic of your example, I applied the method as location = geolocator.reverse("%s, %s" % (df['lat'].iloc[::20], df['long'].iloc[::20]), timeout=None) followed by df['location'].iloc[::20] = location.address, and it throws ValueError: Must be a coordinate pair or Point
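The `ValueError` is consistent with what `"%s, %s" % (series, series)` produces: `df['lat'].iloc[::20]` is a whole Series, so the formatting embeds its printed representation (index, values, name and newlines) instead of one `lat, long` pair. A quick illustration with made-up coordinates:

```python
import pandas as pd

df = pd.DataFrame({"lat": [52.5, 48.1, 40.4], "long": [13.4, 11.6, -3.7]})

# Formatting Series objects yields their multi-line repr, not a coordinate pair.
query = "%s, %s" % (df["lat"].iloc[::2], df["long"].iloc[::2])
print(query)

# Formatting scalars taken from a single row gives the intended string.
first = df.iloc[0]
pair = "%s, %s" % (first["lat"], first["long"])
print(pair)  # 52.5, 13.4
```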
The problem is that your function doesn't understand Series; you have to apply it row by row: df.iloc[::20].apply(lambda x: geolocator.reverse("%s, %s" % (x['lat'], x['long']), timeout=None), axis=1)
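A runnable sketch of the `apply` approach that also writes the results back into `df`. With `axis=1` each row arrives in the lambda as a Series, so `x['lat']` and `x['long']` are scalars; they are combined into a single `"lat, long"` string, the same query form the question's own `geolocator.reverse` call uses. `fake_reverse` is a hypothetical offline stand-in for `geolocator.reverse(query, timeout=None).address`:

```python
import pandas as pd


def fake_reverse(query):
    # Hypothetical stand-in for geolocator.reverse(query, timeout=None).address
    return f"address for ({query})"


df = pd.DataFrame({"lat": [float(i) for i in range(60)],
                   "long": [float(-i) for i in range(60)]})

subset = df.iloc[::20]  # rows 0, 20, 40
# axis=1 passes each row as a Series; the result is aligned on subset.index.
addresses = subset.apply(
    lambda x: fake_reverse("%s, %s" % (x["lat"], x["long"])), axis=1)
df.loc[addresses.index, "location"] = addresses  # all other rows become NaN

print(df.loc[[0, 20, 40], "location"].tolist())
```

Rate limiting still has to be handled separately, for example by sleeping or throttling inside the function passed to `apply`.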