[Posted]: 2022-09-22 21:11:55
[Question]:
I have a CSV file that looks like this:
Detection,Imagename,Frame_Identifier,TL_x,TL_y,BR_x,BR_y,detection_Confidence,Target_Length,Species,Confidence
0,201503.20150619.181140817.204628.jpg,0,272,142.375,382.5,340,0.475837,0,fish,0.475837
1,201503.20150619.181141498.204632.jpg,3,267.75,6.375,422.875,80.75,0.189145,0,fish,0.189145
2,201503.20150619.181141662.204633.jpg,4,820.25,78.625,973.25,382.5,0.615788,0,fish,0.615788
3,201503.20150619.181141662.204633.jpg,4,1257,75,1280,116,0.307278,0,fish,0.307278
4,201503.20150619.181141834.204634.jpg,5,194,281,233,336,0.586944,0,fish,0.586944
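I load it as a pandas.DataFrame named imageannotation. I want to extract a dictionary whose keys are the Imagename values (note: an image name can appear in several rows) and whose values are another dictionary with two keys, ['bbox', 'species'], where bbox is the list given by the TL_x, TL_y, BR_x, BR_y values.
I can do this with the following code: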
test = {
    i: {
        "bbox": imageannotation[imageannotation["Imagename"] == i][
            ["TL_x", "TL_y", "BR_x", "BR_y"]
        ].values,
        "species": imageannotation[imageannotation["Imagename"] == i][
            ["Species"]
        ].values,
    }
    for i in imageannotation["Imagename"].unique()
}
The result looks like this:
mydict = {'201503.20150619.181140817.204628': {'bbox': array([[272.   , 142.375, 382.5  , 340.   ]]),
                                               'species': array([['fish']], dtype=object)},
          '201503.20150619.181141498.204632': {'bbox': array([[267.75 ,   6.375, 422.875,  80.75 ]]),
                                               'species': array([['fish']], dtype=object)},
          '201503.20150619.181141662.204633': {'bbox': array([[ 820.25 ,   78.625,  973.25 ,  382.5  ],
                                                              [1257.   ,   75.   , 1280.   ,  116.   ]]),
                                               'species': array([['fish'],
                                                                 ['fish']], dtype=object)},
          '201503.20150619.181141834.204634': {'bbox': array([[194., 281., 233., 336.],
                                                              [766., 271., 789., 293.]]),
                                               'species': array([['fish'],
                                                                 ['fish']], dtype=object)}}
This is what I want, but it becomes very slow on large files.
Q: Is there a better way to do this?
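One likely cause of the slowdown is that the comprehension builds a fresh boolean mask over the whole frame twice for every unique image name. A minimal sketch of one possible alternative, assuming the same column names as the CSV above, is to let groupby split the frame once and build the dictionary from the groups. The inline csv_text is only a stand-in for the real file, and sort=False is an assumption made here to keep the names in order of first appearance.

```python
import io

import pandas as pd

# Stand-in for the real file: the five sample rows from the question.
csv_text = """Detection,Imagename,Frame_Identifier,TL_x,TL_y,BR_x,BR_y,detection_Confidence,Target_Length,Species,Confidence
0,201503.20150619.181140817.204628.jpg,0,272,142.375,382.5,340,0.475837,0,fish,0.475837
1,201503.20150619.181141498.204632.jpg,3,267.75,6.375,422.875,80.75,0.189145,0,fish,0.189145
2,201503.20150619.181141662.204633.jpg,4,820.25,78.625,973.25,382.5,0.615788,0,fish,0.615788
3,201503.20150619.181141662.204633.jpg,4,1257,75,1280,116,0.307278,0,fish,0.307278
4,201503.20150619.181141834.204634.jpg,5,194,281,233,336,0.586944,0,fish,0.586944
"""
imageannotation = pd.read_csv(io.StringIO(csv_text))

bbox_cols = ["TL_x", "TL_y", "BR_x", "BR_y"]

# groupby partitions the rows once; each group holds all rows of one image,
# so no per-name boolean mask over the full frame is needed.
test = {
    name: {
        "bbox": group[bbox_cols].to_numpy(),
        "species": group[["Species"]].to_numpy(),
    }
    for name, group in imageannotation.groupby("Imagename", sort=False)
}
```

The values have the same shape as in the original comprehension (one bbox row and one species row per detection), so downstream code should not need to change.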
My end goal is to add a new column to the dataframe imagemetadata, a larger dataframe whose Imagename field holds unique values. My last operation is:
for i in test:
    imagemetadata.loc[imagemetadata.Imagename == i, "annotation"] = [test[i]]
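This loop also rescans imagemetadata once per key. As a hedged sketch of one way to avoid that, the column could be filled with a single dictionary lookup per row through Series.map. The small test dict and the imagemetadata frame below are hypothetical stand-ins, not the real data.

```python
import pandas as pd

# Hypothetical stand-ins for the question's objects.
test = {
    "a.jpg": {"bbox": [[0, 0, 1, 1]], "species": [["fish"]]},
    "b.jpg": {"bbox": [[2, 2, 3, 3], [4, 4, 5, 5]], "species": [["fish"], ["fish"]]},
}
imagemetadata = pd.DataFrame({"Imagename": ["a.jpg", "b.jpg", "c.jpg"]})

# One dict lookup per row; names that are absent from the dict end up
# with a missing value instead of raising.
imagemetadata["annotation"] = imagemetadata["Imagename"].map(test.get)
```

Rows whose image has no annotation are left missing, so they can be checked with pd.isna before use.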
Tags: python arrays pandas multiprocessing