Reorder xmin, xmax, ymin, and ymax for each column in CSV file into new columns

【Question title】: Reorder xmin, xmax, ymin, and ymax for each column in CSV file into new columns
【Posted】: 2021-10-30 10:30:06
【Question description】:

I'm new to Python and am struggling with a calculation. I have a few thousand rows of data in a CSV table, in the following format:

Link to image table

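This data is malformed because several of my xmin/ymin values are greater than the corresponding xmax/ymax values (an example can be seen in the image linked above). I need to create new columns and reorder the data using numpy or pandas so that the values end up in the correct format, for example with code like this: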

import numpy as np

xmin_new = np.min(xmin, xmax)
xmax_new = np.max(xmin, xmax)
ymin_new = np.min(ymin, ymax)
ymax_new = np.max(ymin, ymax)

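The problem is that I can't work out how to define the columns in the CSV and iterate over the rows to do this. Can anyone suggest how I could modify this script to accomplish it?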

import pandas
import numpy as np
import os
import csv

#Set cwd
os.chdir("C:\\Users\\desired_directory")

#Open desired csv file
v = open("train.csv")
r = csv.reader(v)
row0 = r.next()

#print header to look at file
print row0

row0.append('xmin_new')
row0.append('xmax_new')
row0.append('ymin_new')
row0.append('ymax_new')

#Check appends
print row0

xmin_new = np.min(xmin, xmax)
xmax_new = np.max(xmin, xmax)
ymin_new = np.min(ymin, ymax)
ymax_new = np.max(ymin, ymax)

#Errors occur here saying that the "xmin_new" column is undefined.
#Also looking to save the file to the directory, but unsure of how to do this properly.

【Question comments】:

Tags: python pandas numpy csv

【Solution 1】:

If you're after speed, numpy is a good choice. I'm assuming you know how to read the whole dataset into a DataFrame (look up pandas.read_csv()).
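In case the reading step is the sticking point, here is a minimal sketch of it. The column layout and the sample values are assumptions (the real table is only visible in the linked image), and an in-memory string stands in for train.csv so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the real file. With the file on disk this would be
# df = pd.read_csv('train.csv'). The values below are made up.
csv_text = """xmin,xmax,ymin,ymax,foo
5,0,3,3,a
7,9,3,5,d
2,4,7,6,a
"""
df = pd.read_csv(io.StringIO(csv_text))

# Flag rows where a "min" value is larger than its matching "max" value.
swapped = (df['xmin'] > df['xmax']) | (df['ymin'] > df['ymax'])
n_swapped = int(swapped.sum())
print(df[swapped])
```

Once the data is in a DataFrame, columns are addressed by name (df['xmin']), so there is no need to loop over the rows with csv.reader.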

import numpy as np
import pandas as pd

# First, make a reproducible example
# In your case, you would read the df instead

n = 6
np.random.seed(0)
cols = 'xmin xmax ymin ymax'.split()
df = pd.DataFrame(
    np.random.randint(0, 10, (n,4)),
    columns=cols,
).assign(foo=np.random.choice(list('abcd'), n))

>>> df
   xmin  xmax  ymin  ymax foo
0     5     0     3     3   a
1     7     9     3     5   d
2     2     4     7     6   a
3     8     8     1     6   d
4     7     7     8     1   b
5     5     9     8     9   c

Then, the actual work:

# reorder min/max for both x and y
#
# Note: cols must be ['xmin', 'xmax', 'ymin', 'ymax']
# or ['ymin', 'ymax', 'xmin', 'xmax']

z = df[cols].values.reshape(-1, 2)
df[cols] = np.c_[z.min(1), z.max(1)].reshape(-1, 4)

Now:

>>> df
   xmin  xmax  ymin  ymax foo
0     0     5     3     3   a
1     7     9     3     5   d
2     2     4     6     7   a
3     8     8     1     6   d
4     7     7     1     8   b
5     5     9     8     9   c
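To see why the reshape trick works, it may help to walk through it on a tiny array. This is only an illustrative sketch with made-up values. Each row of four numbers is split into two (min, max) pairs, each pair is sorted, and the pairs are glued back into rows of four.

```python
import numpy as np

# One row of [xmin, xmax, ymin, ymax] per box (made-up values).
boxes = np.array([[5, 0, 3, 3],
                  [2, 4, 7, 6]])

# Step 1: view the data as consecutive pairs: (xmin, xmax), (ymin, ymax), ...
pairs = boxes.reshape(-1, 2)

# Step 2: smaller and larger value of every pair.
lo = pairs.min(axis=1)
hi = pairs.max(axis=1)

# Step 3: put (lo, hi) side by side and fold back into rows of four.
fixed = np.c_[lo, hi].reshape(-1, 4)
print(fixed)
```

This is also why the column order matters: the two columns of each pair have to sit next to each other, otherwise the reshape would pair up unrelated values.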

Note: if you want to create new columns instead, as in your question, consider the following:

cols_new = [f'{k}_new' for k in cols]
z = df[cols].values.reshape(-1, 2)
df[cols_new] = np.c_[z.min(1), z.max(1)].reshape(-1, 4)
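A quick sanity check can confirm that the new columns are ordered and that the originals are left alone. The sketch below uses a small made-up DataFrame rather than your data.

```python
import numpy as np
import pandas as pd

cols = ['xmin', 'xmax', 'ymin', 'ymax']
df = pd.DataFrame([[5, 0, 3, 3], [2, 4, 7, 6]], columns=cols)  # made-up values

cols_new = [f'{k}_new' for k in cols]
z = df[cols].values.reshape(-1, 2)
df[cols_new] = np.c_[z.min(1), z.max(1)].reshape(-1, 4)

# Every new min must be <= its matching new max.
ordered = (df['xmin_new'] <= df['xmax_new']) & (df['ymin_new'] <= df['ymax_new'])
all_ordered = bool(ordered.all())

# The original columns keep their (possibly swapped) values.
originals_untouched = df[cols].values.tolist() == [[5, 0, 3, 3], [2, 4, 7, 6]]
print(all_ordered, originals_untouched)
```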

There is also a slightly more verbose, pandas-only way:

df = df.assign(
    xmin=df[['xmin', 'xmax']].min(1),
    xmax=df[['xmin', 'xmax']].max(1),
    ymin=df[['ymin', 'ymax']].min(1),
    ymax=df[['ymin', 'ymax']].max(1),
)

As noted above, if you intend to create new columns instead, use df.assign(xmin_new=...) and so on.
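For completeness, here is a sketch that stays close to the code in the question. np.min(xmin, xmax) does not compare two arrays (its second positional argument is the axis). The element-wise functions are np.minimum and np.maximum. The sample values are made up, and the result is written to an in-memory buffer so the snippet is self-contained. Passing a file path to to_csv instead saves it to disk.

```python
import io

import numpy as np
import pandas as pd

# Made-up data standing in for the contents of the CSV file.
df = pd.DataFrame({
    'xmin': [5, 7, 2],
    'xmax': [0, 9, 4],
    'ymin': [3, 3, 7],
    'ymax': [3, 5, 6],
})

# Element-wise min/max of two columns, stored as new columns.
df['xmin_new'] = np.minimum(df['xmin'], df['xmax'])
df['xmax_new'] = np.maximum(df['xmin'], df['xmax'])
df['ymin_new'] = np.minimum(df['ymin'], df['ymax'])
df['ymax_new'] = np.maximum(df['ymin'], df['ymax'])

# Save the result. index=False keeps the row index out of the output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_out = buffer.getvalue()
print(csv_out)
```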

【Comments】:

Thank you so much! This does exactly what I need.