【Question title】: execute iterative queries over a pandas dataframe
【Posted】: 2022-09-22 21:11:55
【Question description】:

I have a CSV file that looks like this:

Detection,Imagename,Frame_Identifier,TL_x,TL_y,BR_x,BR_y,detection_Confidence,Target_Length,Species,Confidence
0,201503.20150619.181140817.204628.jpg,0,272,142.375,382.5,340,0.475837,0,fish,0.475837
1,201503.20150619.181141498.204632.jpg,3,267.75,6.375,422.875,80.75,0.189145,0,fish,0.189145
2,201503.20150619.181141662.204633.jpg,4,820.25,78.625,973.25,382.5,0.615788,0,fish,0.615788
3,201503.20150619.181141662.204633.jpg,4,1257,75,1280,116,0.307278,0,fish,0.307278
4,201503.20150619.181141834.204634.jpg,5,194,281,233,336,0.586944,0,fish,0.586944

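I load it as a pandas.DataFrame named imageannotation. I want to extract a dictionary whose key is the imagename (note: the same image name can appear in several rows) and whose value is another dictionary with 2 keys, ['bbox', 'species'], where bbox is a list made of the TL_x, TL_y, BR_x, BR_y values.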

I can do this with the following code:


test = {
    i: {
        "bbox": imageannotation[imageannotation["Imagename"] == i][
            ["TL_x", "TL_y", "BR_x", "BR_y"]
        ].values,
        "species": imageannotation[imageannotation["Imagename"] == i][
            ["Species"]
        ].values,
    }
    for i in imageannotation["Imagename"].unique()
}

The result looks like this:

mydict = {'201503.20150619.181140817.204628': {'bbox': array([[272.   , 142.375, 382.5  , 340.   ]]),
  'species': array([['fish']], dtype=object)},
 '201503.20150619.181141498.204632': {'bbox': array([[267.75 ,   6.375, 422.875,  80.75 ]]),
  'species': array([['fish']], dtype=object)},
 '201503.20150619.181141662.204633': {'bbox': array([[ 820.25 ,   78.625,  973.25 ,  382.5  ],
         [1257.   ,   75.   , 1280.   ,  116.   ]]),
  'species': array([['fish'],
         ['fish']], dtype=object)},
 '201503.20150619.181141834.204634': {'bbox': array([[194., 281., 233., 336.],
         [766., 271., 789., 293.]]),
  'species': array([['fish'],
         ['fish']], dtype=object)}}

This is what I want, but it becomes very slow on large files.

Q: Is there a better way to do this?
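A likely reason for the slowdown is that the comprehension builds a fresh boolean mask over every row of imageannotation for each unique image name, so the work grows with rows × unique names. A minimal sketch of a single-pass version, assuming the same column names as the CSV above: `groupby` splits the frame once, and `sort=False` keeps the groups in first-appearance order, like `unique()`. The three-row frame is a subset of the sample CSV; `BBOX_COLS` is a hypothetical helper name.

```python
import pandas as pd

# Subset of the sample CSV (only the columns the dictionary needs)
imageannotation = pd.DataFrame(
    {
        "Imagename": [
            "201503.20150619.181140817.204628.jpg",
            "201503.20150619.181141662.204633.jpg",
            "201503.20150619.181141662.204633.jpg",
        ],
        "TL_x": [272.0, 820.25, 1257.0],
        "TL_y": [142.375, 78.625, 75.0],
        "BR_x": [382.5, 973.25, 1280.0],
        "BR_y": [340.0, 382.5, 116.0],
        "Species": ["fish", "fish", "fish"],
    }
)

BBOX_COLS = ["TL_x", "TL_y", "BR_x", "BR_y"]  # hypothetical helper constant

# groupby splits the frame once instead of masking all rows per image name;
# sort=False keeps first-appearance order, matching Imagename.unique()
mydict = {
    name: {
        "bbox": group[BBOX_COLS].to_numpy(),
        "species": group[["Species"]].to_numpy(),
    }
    for name, group in imageannotation.groupby("Imagename", sort=False)
}

print(mydict["201503.20150619.181141662.204633.jpg"]["bbox"].shape)  # (2, 4)
```

The arrays have the same shapes as the ones produced by the original comprehension (one row per detection), so downstream code should not need to change.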

My final goal is to add a new column to the dataframe imagemetadata, a larger frame whose Imagename field has unique values. My last step is:

for i in mydict:
    imagemetadata.loc[imagemetadata.Imagename == i, "annotation"] = [mydict[i]]
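Instead of looping and masking imagemetadata once per key, the dictionary can be looked up for every Imagename in one call with `Series.map`. A minimal sketch, assuming the dictionary keys match `imagemetadata.Imagename` exactly (in the printed result above the keys lack the `.jpg` suffix, so they may need normalising first). The frames and file names below are hypothetical.

```python
import pandas as pd

# Hypothetical per-image annotation dict and metadata frame
mydict = {
    "a.jpg": {"bbox": [[1.0, 2.0, 3.0, 4.0]], "species": [["fish"]]},
    "b.jpg": {"bbox": [[5.0, 6.0, 7.0, 8.0]], "species": [["fish"]]},
}
imagemetadata = pd.DataFrame({"Imagename": ["a.jpg", "b.jpg", "c.jpg"]})

# Each Imagename is looked up in the dict; names with no entry get NaN
imagemetadata["annotation"] = imagemetadata["Imagename"].map(mydict)
print(imagemetadata)
```

Images without any detection (here `c.jpg`) end up with NaN, which can be tested with `isna()` later.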

    Tags: python arrays pandas multiprocessing


    【Solution 1】:

    (Answer revised somewhat after re-reading the question.)

    This seems to be what you are after: group the annotations by Imagename, build a dict of lists from the groups, and map it onto the other dataframe.

    import io
    
    import pandas as pd
    
    imageannotation = pd.read_csv(
        io.StringIO(
            """
    Detection,Imagename,Frame_Identifier,TL_x,TL_y,BR_x,BR_y,detection_Confidence,Target_Length,Species,Confidence
    0,201503.20150619.181140817.204628.jpg,0,272,142.375,382.5,340,0.475837,0,fish,0.475837
    1,201503.20150619.181141498.204632.jpg,3,267.75,6.375,422.875,80.75,0.189145,0,fish,0.189145
    2,201503.20150619.181141662.204633.jpg,4,820.25,78.625,973.25,382.5,0.615788,0,fish,0.615788
    3,201503.20150619.181141662.204633.jpg,4,1257,75,1280,116,0.307278,0,fish,0.307278
    4,201503.20150619.181141834.204634.jpg,5,194,281,233,336,0.586944,0,fish,0.586944
    """
        )
    )
    
    # (Pretend this comes from a separate file)
    imagemetadata = pd.DataFrame({"Imagename": imageannotation.Imagename.unique()})
    
    
    def make_annotation(r):
        return {
            "bbox": [r.TL_x, r.TL_y, r.BR_x, r.BR_y],
            "species": r.Species,
        }
    
    
    annotations_by_image = (
        imageannotation.groupby("Imagename")
        .apply(lambda r: r.apply(make_annotation, axis=1).to_list())
        .to_dict()
    )
    imagemetadata["annotation"] = imagemetadata.Imagename.map(annotations_by_image)
    
    print(imagemetadata)
    

    The output is:

                                  Imagename                                         annotation
    0  201503.20150619.181140817.204628.jpg  [{'bbox': [272.0, 142.375, 382.5, 340.0], 'spe...
    1  201503.20150619.181141498.204632.jpg  [{'bbox': [267.75, 6.375, 422.875, 80.75], 'sp...
    2  201503.20150619.181141662.204633.jpg  [{'bbox': [820.25, 78.625, 973.25, 382.5], 'sp...
    3  201503.20150619.181141834.204634.jpg  [{'bbox': [194.0, 281.0, 233.0, 336.0], 'speci...
    

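`DataFrame.apply(..., axis=1)` calls `make_annotation` once per row and builds a Series for each row, which tends to be slow on large files. As an alternative to that groupby/apply step (a different technique, not the answer's code), here is a sketch that builds the same dict of lists by zipping the needed columns and appending to a plain dict. The frame is a subset of the sample CSV; only the columns used are included.

```python
import pandas as pd

# Subset of the sample CSV
imageannotation = pd.DataFrame(
    {
        "Imagename": [
            "201503.20150619.181140817.204628.jpg",
            "201503.20150619.181141662.204633.jpg",
            "201503.20150619.181141662.204633.jpg",
        ],
        "TL_x": [272.0, 820.25, 1257.0],
        "TL_y": [142.375, 78.625, 75.0],
        "BR_x": [382.5, 973.25, 1280.0],
        "BR_y": [340.0, 382.5, 116.0],
        "Species": ["fish", "fish", "fish"],
    }
)

# Iterate over column values directly; no per-row Series is created
annotations_by_image = {}
for name, tl_x, tl_y, br_x, br_y, species in zip(
    imageannotation["Imagename"],
    imageannotation["TL_x"],
    imageannotation["TL_y"],
    imageannotation["BR_x"],
    imageannotation["BR_y"],
    imageannotation["Species"],
):
    annotations_by_image.setdefault(name, []).append(
        {"bbox": [tl_x, tl_y, br_x, br_y], "species": species}
    )

imagemetadata = pd.DataFrame({"Imagename": imageannotation["Imagename"].unique()})
imagemetadata["annotation"] = imagemetadata["Imagename"].map(annotations_by_image)
print(imagemetadata)
```

The resulting column holds the same list-of-dicts shape as the groupby/apply version; whether it is faster on a given file is worth measuring rather than assuming.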
    If you want imagemetadata to have one row per entry when annotation has several entries:

    imagemetadata = imagemetadata.explode("annotation").reset_index(drop=True)
    

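To illustrate what `explode` does with this column: each element of a list becomes its own row, the original index label is repeated, and `reset_index(drop=True)` renumbers the rows. A toy sketch with hypothetical image names and boxes:

```python
import pandas as pd

# Hypothetical frame: one image with one annotation, one with two
imagemetadata = pd.DataFrame(
    {
        "Imagename": ["a.jpg", "b.jpg"],
        "annotation": [
            [{"bbox": [1, 2, 3, 4], "species": "fish"}],
            [
                {"bbox": [5, 6, 7, 8], "species": "fish"},
                {"bbox": [9, 10, 11, 12], "species": "fish"},
            ],
        ],
    }
)

# One row per annotation dict; "b.jpg" now appears twice
exploded = imagemetadata.explode("annotation").reset_index(drop=True)
print(exploded)
```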