Row and column operations in CSV: answers

【Question title】: Row column operations in CSV
【Posted】: 2017-10-20 13:57:02
【Question】:

I have a CSV file containing the following data:

Host, Time Up, Time OK
server1.test.com:1717,100.00% ,100.00% 
server2.test.com:1717,100.00% ,100.00%

I am trying to compare the column values in every row:

If col1 <= col2, the value of col1 should be written to a new column, col3.
If col1 > col2, the value of col2 should be written to col3.

Example:

Time Up(col1), Time OK(col2), Total(col3)
100%              100%         100%
100%              95%          95%
95%               100%         95%

I searched the internet and could not find a similar case. Is there a way to do this?

Edit 2: code -

import pandas as pd
df = pd.read_csv('3.csv',skipfooter=1)
df2 = pd.read_csv('4.csv',skipfooter=1)
combined = pd.merge(df[['Host',' Time Up']],df2[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
combined.to_csv('combined.csv',index=False)

df =pd.read_csv('combined.csv', skipfooter=1)
col1 = df[' Time Up']
col2 = df[' Time OK']
df['Total'] = col1.where(col1 <= col2, col2)
df.to_csv('combined.csv',index=False)

【Question comments】:

Possible duplicate of "Using conditional to generate new column in pandas dataframe"
Possible duplicate of "Pandas conditional creation of a series/dataframe column"
I don't see a duplicate here.

Tags: python python-2.7 python-3.x pandas csv

【Solution 1】:

Sure, just read the data with read_csv():

import pandas as pd
df = pd.read_csv('t.csv') # this is your original example input file

Now you have:

                    Host   Time Up   Time OK
0  server1.test.com:1717  100.00%   100.00% 
1  server2.test.com:1717  100.00%   100.00%

The first problem is that your CSV has spurious spaces in the header. Let's clean that up:

df.columns = [col.strip() for col in df.columns] # " Time Up" -> "Time Up"
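
To make the effect of the header cleanup visible, here is a minimal self-contained sketch. It reads the question's sample data from an in-memory string (io.StringIO stands in for 't.csv', which is an assumption made so the snippet runs without a file) and uses df.columns.str.strip(), a vectorized spelling of the list comprehension above.

```python
import io
import pandas as pd

# Sample data copied from the question; io.StringIO stands in for 't.csv'.
raw = (
    "Host, Time Up, Time OK\n"
    "server1.test.com:1717,100.00% ,100.00% \n"
    "server2.test.com:1717,100.00% ,100.00%\n"
)
df = pd.read_csv(io.StringIO(raw))

before = list(df.columns)            # headers still carry the space after each comma
df.columns = df.columns.str.strip()  # strip leading/trailing whitespace from every header
after = list(df.columns)

print(before)  # ['Host', ' Time Up', ' Time OK']
print(after)   # ['Host', 'Time Up', 'Time OK']
```

Until the headers are stripped, df['Time OK'] raises a KeyError, because the actual column name is ' Time OK' with a leading space.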

Next, note that your data consists of strings like "100.00%", not numbers. Clean that up:

df['Time Up'] = df['Time Up'].str.strip('% ').astype(float)
df['Time OK'] = df['Time OK'].str.strip('% ').astype(float)
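
The conversion matters because strings compare character by character, not by numeric value. The sketch below uses hypothetical sample values to show the difference. It also swaps astype(float) for pd.to_numeric(..., errors="coerce"), an alternative that yields NaN for unparseable values instead of raising; whether that behavior is wanted depends on your data.

```python
import pandas as pd

# Hypothetical values for illustration, in the same format as the question's data.
up = pd.Series(["95.00% ", "100.00% "])
ok = pd.Series(["100.00% ", "95.00%"])

# As strings, "9..." sorts after "1...", so the comparison is lexicographic.
as_text = (up <= ok).tolist()

# Strip '%' and spaces from both ends, then convert to floats.
# errors="coerce" turns anything non-numeric into NaN instead of raising.
up_num = pd.to_numeric(up.str.strip("% "), errors="coerce")
ok_num = pd.to_numeric(ok.str.strip("% "), errors="coerce")
as_numbers = (up_num <= ok_num).tolist()

print(as_text)     # [False, True]  (wrong for percentages)
print(as_numbers)  # [True, False]  (95 <= 100, 100 > 95)
```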

Now we have clean data:

                    Host  Time Up  Time OK
0  server1.test.com:1717    100.0    100.0
1  server2.test.com:1717    100.0    100.0

Finally, we can add the new column:

col1 = df['Time Up']
col2 = df['Time OK']
df['Total'] = col1.where(col1 <= col2, col2)

Which gives us:

                    Host  Time Up  Time OK  Total
0  server1.test.com:1717    100.0    100.0  100.0
1  server2.test.com:1717    100.0    100.0  100.0

Another way to get the Total column is:

df['Total'] = df[['Time Up', 'Time OK']].min(axis=1)

That is, take the minimum of each row.
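
Since the sample file has 100% in every cell, it does not really exercise the rule. As a quick check, the sketch below applies both approaches to the three-row example from the question, entered here as already-cleaned floats (the DataFrame is built by hand for illustration).

```python
import pandas as pd

# The three example rows from the question, as cleaned numeric values.
df = pd.DataFrame({
    "Time Up": [100.0, 100.0, 95.0],
    "Time OK": [100.0, 95.0, 100.0],
})

col1 = df["Time Up"]
col2 = df["Time OK"]

# Keep col1 where col1 <= col2, otherwise take col2.
via_where = col1.where(col1 <= col2, col2)

# Row-wise minimum across the two columns.
via_min = df[["Time Up", "Time OK"]].min(axis=1)

df["Total"] = via_min
print(df)
```

Both approaches give 100.0, 95.0, 95.0 here. They can differ when a value is missing: a comparison involving NaN is False, so where() falls back to col2, whereas min(axis=1) skips NaN by default.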

If you want to add the percent sign back:

df['Total'] = df['Total'].astype(str) + '%'
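
Note that astype(str) renders 100.0 as "100.0", so the result is "100.0%" rather than the "100.00%" of the input file. If the two-decimal form matters, one possible variant (a sketch, not part of the original answer) is to format each value explicitly:

```python
import pandas as pd

total = pd.Series([100.0, 95.0])  # hypothetical Total values

# Plain string conversion keeps pandas' default float rendering.
plain = total.astype(str) + "%"

# Explicit format string: always two decimal places, then the percent sign.
fixed = total.map("{:.2f}%".format)

print(plain.tolist())  # ['100.0%', '95.0%']
print(fixed.tolist())  # ['100.00%', '95.00%']
```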

【Comments】:

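Do I need to import the pandas module?
Yes, you need it in order to use the read_csv() function from pandas and to work with df (short for DataFrame).
Thanks for the reply.
It did not run successfully. I have added my code and the error I received to the question. Am I missing something?
Do you have pandas installed? Also, it looks like Time OK is not being recognized as a column header... is it included in your csv file?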