【Question title】: How to stack a number of rows into one row and assign an id
【Posted】: 2019-11-02 06:04:10
【Question】:

I have a dataframe like this:

band    mean    raster
1   894.343482  D:/Python/Copied/selection/20170219_095504.tif
2   1159.282304 D:/Python/Copied/selection/20170219_095504.tif
3   1342.291595 D:/Python/Copied/selection/20170219_095504.tif
4   3056.809463 D:/Python/Copied/selection/20170219_095504.tif
1   516.9624071 D:/Python/Copied/selection/20170325_095551.tif
2   720.1932533 D:/Python/Copied/selection/20170325_095551.tif
3   689.6287879 D:/Python/Copied/selection/20170325_095551.tif
4   4561.576329 D:/Python/Copied/selection/20170325_095551.tif
1   566.2016867 D:/Python/Copied/selection/20170527_095700.tif
2   812.9927101 D:/Python/Copied/selection/20170527_095700.tif
3   760.4621212 D:/Python/Copied/selection/20170527_095700.tif
4   5009.537164 D:/Python/Copied/selection/20170527_095700.tif

I want to reshape it into this format:

band1_mean  band2_mean  band3_mean  band4_mean  raster_name         id
894.343482  1159.282304 1342.291595 3056.809463 20170219_095504.tif 1
516.9624071 720.1932533 689.6287879 4561.576329 20170325_095551.tif 2
566.2016867 812.9927101 760.4621212 5009.537164 20170527_095700.tif 3

All 4 bands belong to one raster, so their values must all go into a single row. I don't know how to stack them without a key id for each raster. Thanks!

【Discussion】:

    Tags: python pandas dataframe stack pandas-groupby


    【Answer 1】:

    This is a case for pivot:

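    # extract the raster name (the "digits_digits.tif" part of the path);
    # a raw string avoids invalid-escape warnings for \d and \.
    df['raster_name'] = df.raster.str.extract(r'(\d+_\d+\.tif)', expand=False)
    
    # pivot: one row per raster, one column per band
    new_df = df.pivot(index='raster_name', columns='band', values='mean')
    
    # rename the columns:
    new_df.columns = [f'band{i}_mean' for i in new_df.columns]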
    

    Output:

                         band1_mean   band2_mean   band3_mean   band4_mean
    raster_name                                                           
    20170219_095504.tif  894.343482  1159.282304  1342.291595  3056.809463
    20170325_095551.tif  516.962407   720.193253   689.628788  4561.576329
    20170527_095700.tif  566.201687   812.992710   760.462121  5009.537164
    

    If you want raster_name to be a regular column, call reset_index on new_df.
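The question also asks for raster_name as an ordinary column after the band columns, plus a numeric id. A minimal sketch of that last step, assuming the sample data from the question. The id values are just row numbers: they follow the sorted order of raster_name that pivot produces, not the original row order.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "band": [1, 2, 3, 4] * 3,
    "mean": [894.343482, 1159.282304, 1342.291595, 3056.809463,
             516.9624071, 720.1932533, 689.6287879, 4561.576329,
             566.2016867, 812.9927101, 760.4621212, 5009.537164],
    "raster": ["D:/Python/Copied/selection/20170219_095504.tif"] * 4
            + ["D:/Python/Copied/selection/20170325_095551.tif"] * 4
            + ["D:/Python/Copied/selection/20170527_095700.tif"] * 4,
})

# Extract the file name and pivot, as in the answer above
df["raster_name"] = df["raster"].str.extract(r"(\d+_\d+\.tif)", expand=False)
new_df = df.pivot(index="raster_name", columns="band", values="mean")
band_cols = [f"band{b}_mean" for b in new_df.columns]
new_df.columns = band_cols

# Turn the raster_name index back into a column and number the rows from 1
new_df = new_df.reset_index()
new_df["id"] = range(1, len(new_df) + 1)

# Put the columns in the order requested in the question
new_df = new_df[band_cols + ["raster_name", "id"]]
print(new_df)
```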

    【Comments】:

    • Thanks, very cool. I get NaN values in the first row. I think something in the str.extract('(\d+_\d+\.tif)') pattern is not right.
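    • That part extracts digits_digits.tif, so if some file names don't follow that pattern, it returns NaN. You can replace that part with something else, e.g. splitting on /.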
    • Solved it with df['raster_name'] = df.raster_name.str.split('/').str[4]. My original path is a bit longer. Thanks a lot for your help! :)
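Taking the last element with .str[-1] instead of a fixed position such as .str[4] makes the split independent of how deep the folder structure is. A small sketch, assuming forward-slash paths as in the question; the second path is hypothetical:

```python
import pandas as pd

# The first path is from the question; the second is a hypothetical deeper path
raster = pd.Series([
    "D:/Python/Copied/selection/20170219_095504.tif",
    "D:/some/longer/path/Copied/selection/20170325_095551.tif",
])

# A fixed position only works for one particular path depth
fixed = raster.str.split("/").str[4]

# The last component is always the file name, whatever the depth
last = raster.str.split("/").str[-1]

print(fixed.tolist())  # ['20170219_095504.tif', 'Copied']
print(last.tolist())   # ['20170219_095504.tif', '20170325_095551.tif']
```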

    【Answer 2】:

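    With df.pivot(index="raster", columns="band", values="mean") you get the following (pandas 2.0 and later require these arguments as keywords). The index below shows only the file names; with the full paths from the question, the index holds the full paths: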

    band                          1            2            3            4
    raster                                                                
    20170219_095504.tif  894.343482  1159.282304  1342.291595  3056.809463
    20170325_095551.tif  516.962407   720.193253   689.628788  4561.576329
    20170527_095700.tif  566.201687   812.992710   760.462121  5009.537164
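pivot does essentially the same thing as set_index followed by unstack. For comparison, here is a sketch of that equivalent route, using add_prefix and add_suffix to build the band1_mean to band4_mean column names. It assumes the raster column already holds only the file names, as in the output above, and uses two of the rasters from the question.

```python
import pandas as pd

# Two rasters, four bands each (values from the question, file names only)
df = pd.DataFrame({
    "band": [1, 2, 3, 4] * 2,
    "mean": [894.343482, 1159.282304, 1342.291595, 3056.809463,
             516.9624071, 720.1932533, 689.6287879, 4561.576329],
    "raster": ["20170219_095504.tif"] * 4 + ["20170325_095551.tif"] * 4,
})

# Same reshape as df.pivot(index="raster", columns="band", values="mean"):
# move (raster, band) into the index, then turn the band level into columns
wide = df.set_index(["raster", "band"])["mean"].unstack("band")

# Turn the integer band labels 1..4 into band1_mean .. band4_mean
wide = wide.add_prefix("band").add_suffix("_mean")
print(wide)
```

Both pivot and unstack raise a ValueError if the same (raster, band) pair occurs more than once. pivot_table(index="raster", columns="band", values="mean", aggfunc="mean") aggregates such duplicates instead.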
    

    【Comments】:
