Answers: argument of type "float" is not iterable when trying to use a for loop

【Question title】: argument of type "float" is not iterable when trying to use for loop
【Posted】: 2022-01-25 12:39:42
【Question description】:

I have a countrydf like the one below, where each cell in the Country column contains a list of the countries in which the movie was released.

countrydf

id  Country            release_year
s1  [US]                 2020
s2  [South Africa]       2021
s3  NaN                  2021
s4  NaN                  2021
s5  [India]              2021

I want to build a new df that looks like this:

country_yeardf

Year    US   UK    Japan  India 
1925    NaN  NaN   NaN    NaN
1926    NaN  NaN   NaN    NaN
1927    NaN  NaN   NaN    NaN
1928    NaN  NaN   NaN    NaN

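It has the release year and the number of movies released in each country. My approach: start from an empty df shaped like the second one, run a for loop to count the released movies, and update the value in the corresponding cell.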

countrylist=['Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', ….]
for x in countrylist:
    for j in  list(range(0,8807)):
        if x in countrydf.country[j]:
            t=int (countrydf.release_year[j] )
            country_yeardf.at[t, x] = country_yeardf.at[t, x]+1

This raises the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-25-225281f8759a> in <module>()
      1 for x in countrylist:
      2  for j in li:
----> 3     if x in countrydf.country[j]:
      4         t=int(countrydf.release_year[j])
      5         country_yeardf.at[t, x] = country_yeardf.at[t, x]+1

TypeError: argument of type 'float' is not iterable

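I can't tell which value here is a float. I checked the type of countrydf.country[j] and it returned int. I'm using pandas and have only just started with it. Can anyone explain the error and suggest a way to build the df I want? P.S. My English is not very good, so please bear with me.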

【Question comments】:

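Can you share the output you expect?
Aside: why convert range(0, 8807) to a list?
@codeholic24 It's in the question, right under "I want to build a new df that looks like this:".
If the type of countrydf.country[j] is int, how do you expect to test it with if x in?
How can a country name be an integer?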
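
On where the float comes from: pandas stores missing values as NaN, and NaN is a Python float, so `x in countrydf.country[j]` fails on the rows where the cell is NaN rather than a list. The following is a minimal sketch of that, not the asker's actual data: the frame is rebuilt by hand from the sample in the question, and the lowercase column name `country` is an assumption taken from the loop's attribute access.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample in the question (assumed layout and column names).
countrydf = pd.DataFrame({
    "id": ["s1", "s2", "s3", "s4", "s5"],
    "country": [["US"], ["South Africa"], np.nan, np.nan, ["India"]],
    "release_year": [2020, 2021, 2021, 2021, 2021],
})

# Missing cells hold NaN, which is a plain float rather than a list.
cell_types = [type(v).__name__ for v in countrydf["country"]]
print(cell_types)

# A membership test against a float is what raises the TypeError.
try:
    "US" in countrydf["country"][2]
    raised = False
except TypeError:
    raised = True
print(raised)

# Checking the cell type first lets the loop skip the missing rows.
matches = [
    j for j in range(len(countrydf))
    if isinstance(countrydf["country"][j], list) and "US" in countrydf["country"][j]
]
print(matches)
```

The `isinstance` guard only shows how to avoid the exception inside the original loop; the loop-free approaches in the answers are likely a better fit for 8807 rows.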

Tags: python pandas dataframe

【Solution 1】:

Here is a solution using groupby:

import pandas as pd
df = pd.DataFrame([['US', 2015], ['India', 2015], ['US', 2015], ['Russia', 2016]], columns=['country', 'year'])

  country  year
0      US  2015
1   India  2015
2      US  2015
3  Russia  2016

Now just group by year and country, then unstack the result:

df.groupby(['year', 'country']).size().unstack()
country  India  Russia   US
year
2015       1.0     NaN  2.0
2016       NaN     1.0  NaN
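
The example above uses one country string per row, while the question's Country column holds lists and NaN. One way to bridge that gap, sketched here under the assumption that `Series.explode` is available (pandas 0.25 or later), is to expand each list into one row per country before grouping. The frame is a hand-built copy of the question's sample, and `long_df` is a name introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample in the question (assumed layout).
countrydf = pd.DataFrame({
    "id": ["s1", "s2", "s3", "s4", "s5"],
    "Country": [["US"], ["South Africa"], np.nan, np.nan, ["India"]],
    "release_year": [2020, 2021, 2021, 2021, 2021],
})

# One row per (movie, country); rows whose Country is NaN are dropped.
long_df = countrydf.explode("Country").dropna(subset=["Country"])

# Count movies per year and country; fill_value=0 gives 0 instead of NaN.
country_yeardf = (
    long_df.groupby(["release_year", "Country"]).size().unstack(fill_value=0)
)
print(country_yeardf)
```

Passing `fill_value=0` to `unstack` is a choice made here so the counts stay integers; leaving it out gives the NaN-filled float table shown in the answer.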

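【Comments】:

【Solution 2】:

Here are some alternative ways to do this in pandas without a loop.

If the list in the Country column can hold more than one value per row, you can try the following: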

>>> df['Country'].str.join("|").str.get_dummies().groupby(df['release_year']).sum()

              India  South Africa  US
release_year                         
2020              0             0   1
2021              1             1   0

If Country holds only one value per row, as in the example, you can use crosstab:

>>> pd.crosstab(df['release_year'], df['Country'].str[0])

Country       India  South Africa  US
release_year                         
2020              0             0   1
2021              1             1   0
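
Both one-liners can be tried against the question's sample, and the result can then be stretched to a fixed set of years and countries, which is roughly the shape of the country_yeardf the question asks for. This is only a sketch: the frame is rebuilt by hand, and `years` and `countrylist` below are short hypothetical stand-ins, not the asker's full lists.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample in the question (assumed layout).
df = pd.DataFrame({
    "Country": [["US"], ["South Africa"], np.nan, np.nan, ["India"]],
    "release_year": [2020, 2021, 2021, 2021, 2021],
})

# Lists become "A|B" strings, then one 0/1 column per country, summed per year.
by_dummies = (
    df["Country"].str.join("|").str.get_dummies().groupby(df["release_year"]).sum()
)

# Single-country rows: take the first list element and cross-tabulate.
by_crosstab = pd.crosstab(df["release_year"], df["Country"].str[0])

# Optional: force a fixed set of years and countries, filling gaps with 0.
years = range(2019, 2022)               # hypothetical year range
countrylist = ["India", "Japan", "US"]  # hypothetical subset of countries
fixed = by_dummies.reindex(index=years, columns=countrylist, fill_value=0)
print(fixed)
```

Note that `reindex` keeps only the requested labels, so South Africa disappears from `fixed` because it is not in the hypothetical `countrylist`.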

【Comments】: