【Question title】: Split one dataframe to multiple with maximum n rows for each in Python [duplicate]
【Posted】: 2021-09-10 03:59:45
【Question】:

I have a DataFrame df:

        a              b          c
0   0.897134    -0.356157   -0.396212
1   -2.357861   2.066570    -0.512687
2   -0.080665   0.719328    0.604294
3   -0.639392   -0.912989   -1.029892
4   -0.550007   -0.633733   -0.748733
5   -0.712962   -1.612912   -0.248270
6   -0.571474   1.310807    -0.271137
7   -0.228068   0.675771    0.433016
8   0.005606    -0.154633   0.985484
9   0.691329    -0.837302   -0.607225
10  -0.011909   -0.304162   0.422001
11  0.127570    0.956831    1.837523
12  -1.074771   0.379723    -1.889117
13  -1.449475   -0.799574   -0.878192
14  -1.029757   0.551023    2.519929
15  -1.001400   0.838614    -1.006977
16  0.677216    -0.403859   0.451338
17  0.221596    -0.323259   0.324158
18  -0.241935   -2.251687   -0.088494
19  -0.995426   0.665569    -2.228848
20  1.714709    -0.353391   0.671539
21  0.155050    1.136433    -0.005721
22  -0.502412   -0.610901   1.520165
23  -0.853906   0.648321    1.124464
24  1.149151    -0.187300   -0.412946
25  0.329229    -1.690569   -2.746895

I want to split it into multiple DataFrames with at most 10 rows each, i.e.:

df1:

0   0.897134    -0.356157   -0.396212
1   -2.357861   2.066570    -0.512687
2   -0.080665   0.719328    0.604294
3   -0.639392   -0.912989   -1.029892
4   -0.550007   -0.633733   -0.748733
5   -0.712962   -1.612912   -0.248270
6   -0.571474   1.310807    -0.271137
7   -0.228068   0.675771    0.433016
8   0.005606    -0.154633   0.985484
9   0.691329    -0.837302   -0.607225

df2:

10  -0.011909   -0.304162   0.422001
11  0.127570    0.956831    1.837523
12  -1.074771   0.379723    -1.889117
13  -1.449475   -0.799574   -0.878192
14  -1.029757   0.551023    2.519929
15  -1.001400   0.838614    -1.006977
16  0.677216    -0.403859   0.451338
17  0.221596    -0.323259   0.324158
18  -0.241935   -2.251687   -0.088494
19  -0.995426   0.665569    -2.228848

df3:

20  1.714709    -0.353391   0.671539
21  0.155050    1.136433    -0.005721
22  -0.502412   -0.610901   1.520165
23  -0.853906   0.648321    1.124464
24  1.149151    -0.187300   -0.412946
25  0.329229    -1.690569   -2.746895

How can I do this in Python? Thanks.

    Tags: python python-3.x pandas dataframe numpy


    【Solution 1】:

    One way, using pandas.DataFrame.groupby (this relies on df having the default integer index 0, 1, 2, ...):

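    n = 10  # maximum number of rows per chunk
    # Integer-dividing the index by n labels rows 0-9 as 0, rows 10-19 as 1, and so on
    [d for _, d in df.groupby(df.index // n)]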
    

    Output:

    [          a         b         c
     0  0.897134 -0.356157 -0.396212
     1 -2.357861  2.066570 -0.512687
     2 -0.080665  0.719328  0.604294
     3 -0.639392 -0.912989 -1.029892
     4 -0.550007 -0.633733 -0.748733
     5 -0.712962 -1.612912 -0.248270
     6 -0.571474  1.310807 -0.271137
     7 -0.228068  0.675771  0.433016
     8  0.005606 -0.154633  0.985484
     9  0.691329 -0.837302 -0.607225,
                a         b         c
     10 -0.011909 -0.304162  0.422001
     11  0.127570  0.956831  1.837523
     12 -1.074771  0.379723 -1.889117
     13 -1.449475 -0.799574 -0.878192
     14 -1.029757  0.551023  2.519929
     15 -1.001400  0.838614 -1.006977
     16  0.677216 -0.403859  0.451338
     17  0.221596 -0.323259  0.324158
     18 -0.241935 -2.251687 -0.088494
     19 -0.995426  0.665569 -2.228848,
                a         b         c
     20  1.714709 -0.353391  0.671539
     21  0.155050  1.136433 -0.005721
     22 -0.502412 -0.610901  1.520165
     23 -0.853906  0.648321  1.124464
     24  1.149151 -0.187300 -0.412946
     25  0.329229 -1.690569 -2.746895]
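
If the index is not the default 0, 1, 2, ... (for example after filtering, or with string labels), `df.index // n` will either fail or group the wrong rows. A minimal sketch of a position-based variant follows. The sample data and the name `chunk_ids` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 26 rows with a non-default string index.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((26, 3)),
    columns=list("abc"),
    index=[f"row{i}" for i in range(26)],
)

n = 10  # maximum number of rows per chunk

# Row positions 0..len(df)-1, integer-divided by n, give one label per chunk,
# regardless of what the index labels are.
chunk_ids = np.arange(len(df)) // n
chunks = [chunk for _, chunk in df.groupby(chunk_ids)]

print([len(chunk) for chunk in chunks])  # [10, 10, 6]
```

Grouping by position rather than by label keeps the original index on each chunk, so the pieces can still be concatenated back into the original frame.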
    


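      【Solution 2】:

      Try a list comprehension with iloc, then unpack the result (the three-name unpacking only works when the split yields exactly three DataFrames):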

      >>> df1, df2, df3 = [df.iloc[i:i + 10] for i in range(0, len(df), 10)]
      >>> df1
                a         b         c
      0  0.897134 -0.356157 -0.396212
      1 -2.357861  2.066570 -0.512687
      2 -0.080665  0.719328  0.604294
      3 -0.639392 -0.912989 -1.029892
      4 -0.550007 -0.633733 -0.748733
      5 -0.712962 -1.612912 -0.248270
      6 -0.571474  1.310807 -0.271137
      7 -0.228068  0.675771  0.433016
      8  0.005606 -0.154633  0.985484
      9  0.691329 -0.837302 -0.607225
      >>> df2
                 a         b         c
      10 -0.011909 -0.304162  0.422001
      11  0.127570  0.956831  1.837523
      12 -1.074771  0.379723 -1.889117
      13 -1.449475 -0.799574 -0.878192
      14 -1.029757  0.551023  2.519929
      15 -1.001400  0.838614 -1.006977
      16  0.677216 -0.403859  0.451338
      17  0.221596 -0.323259  0.324158
      18 -0.241935 -2.251687 -0.088494
      19 -0.995426  0.665569 -2.228848
      >>> df3
                 a         b         c
      20  1.714709 -0.353391  0.671539
      21  0.155050  1.136433 -0.005721
      22 -0.502412 -0.610901  1.520165
      23 -0.853906  0.648321  1.124464
      24  1.149151 -0.187300 -0.412946
      25  0.329229 -1.690569 -2.746895
      >>> 
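
When the number of chunks is not known in advance, it may be easier to keep the pieces in a list instead of unpacking them into fixed names. A minimal sketch under that assumption follows. The helper name `split_dataframe` and the sample data are hypothetical, not from the original answer.

```python
from typing import List

import pandas as pd


def split_dataframe(df: pd.DataFrame, n: int) -> List[pd.DataFrame]:
    """Return consecutive slices of df with at most n rows each."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # iloc slices by position, so this works for any index type.
    return [df.iloc[start:start + n] for start in range(0, len(df), n)]


# Hypothetical sample data: 26 rows, like the frame in the question.
df = pd.DataFrame({"a": range(26), "b": range(26, 52)})

chunks = split_dataframe(df, 10)
print(len(chunks))               # 3
print([len(c) for c in chunks])  # [10, 10, 6]
```

An empty DataFrame yields an empty list here, because `range(0, 0, n)` produces no start positions.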
      
