Sorting by specific column data using .csv in Python: answers

【Question Title】: Sorting by Specific Column data using .csv in python
【Posted】: 2013-03-11 16:30:49
【Question】:

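I'm trying to sort a .csv file with just over 300 entries and write it all back out ordered by the numeric values in one specific column, under one specific column header. Here's the code I've written so far, but it just seems to output the data in the same order it went in.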

import csv
import itertools
from itertools import groupby as gb

reader = csv.DictReader(open('Full_List.csv', 'r'))

groups = gb(reader, lambda d: d['red label'])
result = [max(g, key=lambda d: d['red label']) for k, g in groups]



writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)

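Only 50 rows in the whole file contain a value under the "red label" header; every other row leaves it blank. The column is column Z of the .csv (but not the last column), so I assume its index is 25 (with 0 being the first). Any help would be greatly appreciated.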
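Since the question describes the column by its spreadsheet letter, here is a minimal sketch of the positional route. It assumes plain csv.reader rows and that blank cells should sort as 0.0. The helpers column_letter_to_index and sort_by_position are hypothetical names, and the in-memory sample stands in for Full_List.csv. Note that csv.DictReader, as used in the code above, looks columns up by header text, so it never needs the index 25.

```python
import csv
import io


def column_letter_to_index(letters):
    """Convert a spreadsheet column label ('A', 'Z', 'AA', ...) to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index - 1


def sort_by_position(rows, col):
    """Sort data rows numerically by the cell at position col; blank cells count as 0.0."""
    return sorted(rows, key=lambda row: float(row[col]) if row[col].strip() else 0.0)


# Hypothetical in-memory data standing in for Full_List.csv
sample = io.StringIO("name,score\na,10\nb,\nc,9\n")
reader = csv.reader(sample)
header = next(reader)  # keep the header row out of the sort
rows = sort_by_position(reader, column_letter_to_index('B'))
print(column_letter_to_index('Z'))  # 25
print(rows)                         # [['b', ''], ['c', '9'], ['a', '10']]
```

With the real file, column_letter_to_index('Z') would select the "red label" column, assuming it really is the 26th column.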

【Comments】:

groupby isn't for sorting; it's for chunking an iterable. From the itertools.groupby docs: "Generally, the iterable needs to already be sorted on the same key function."
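A small illustration of that documentation note, using made-up values: groupby never reorders its input, it only merges runs of equal neighbouring keys. That is why grouping alone cannot change the order of the rows.

```python
from itertools import groupby

values = ['5', '', '3', '', '5']

# groupby only merges *consecutive* equal keys, so unsorted input yields repeated keys
unsorted_keys = [k for k, _ in groupby(values)]
print(unsorted_keys)  # ['5', '', '3', '', '5']

# Sorting on the same key first makes each key appear exactly once
sorted_keys = [k for k, _ in groupby(sorted(values))]
print(sorted_keys)    # ['', '3', '5']
```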

Tags: python sorting csv

【Solution 1】:

How about using pandas?

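import pandas as pd

# Read the CSV; blank cells in 'red label' become NaN
df = pd.read_csv('Full_List.csv')
# DataFrame.sort() was deprecated and later removed; sort_values() replaces it.
# Rows with NaN in the sort column are placed last by default.
df = df.sort_values('red label')
df.to_csv('Full_List_sorted.csv', index=False)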

You may need to adjust the options passed to read_csv and to_csv to match the format of your CSV file.

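【Comments】:

I tried the pandas approach you suggested, but whenever I run the script I get the error "No module pandas exists", even though I installed it from my Python directory with sudo apt-get install python-pandas.
Which version of Python and which OS are you using?
I'm using Python 3.2 on Ubuntu 12.10.
Edit: I've figured out what was wrong when trying to run pandas. When I installed it, it went into my python2.7 folder, but my script runs from the python3.2 folder, which sits next to the 2.7 version in /usr/local/lib, and I don't know how to change my script to run from that directory.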
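When a package appears to be installed but cannot be imported, the script is usually being run by a different interpreter than the one the package was installed for. On Debian and Ubuntu, apt packages named python-* have traditionally targeted Python 2. A quick diagnostic sketch that prints the running interpreter and tries the import:

```python
import sys

# Which interpreter is actually running this script?
print(sys.executable)
print(sys.version_info[:2])

# Can *this* interpreter import pandas?
try:
    import pandas  # noqa: F401
    pandas_available = True
except ImportError:
    pandas_available = False
print('pandas importable:', pandas_available)
```

If the printed interpreter is Python 3 and pandas is not importable, pandas still needs to be installed for that Python 3 interpreter.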
I finally got past the pandas error, but the output is still the same as with the method Steven gave me above.

【Solution 2】:

groupby isn't for sorting; it's for chunking an iterable. For sorting, use sorted.

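import csv

# newline='' is what the csv module docs recommend for files it reads or writes
with open('Full_List.csv', 'r', newline='') as infile:
    reader = csv.DictReader(infile)
    # float() makes the sort numeric instead of character-by-character
    result = sorted(reader, key=lambda d: float(d['red label']))
    fieldnames = reader.fieldnames

with open('output.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    writer.writerows(result)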

Note: I changed your lambda to convert your character data to floats so that it sorts numerically.
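To see why the conversion matters, compare sorting numeric strings as text and as floats (illustrative values):

```python
values = ['10', '9', '100', '2.5']

# Sorting the raw strings compares them character by character
as_text = sorted(values)
print(as_text)     # ['10', '100', '2.5', '9']

# Converting with float compares the numbers themselves
as_numbers = sorted(values, key=float)
print(as_numbers)  # ['2.5', '9', '10', '100']
```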

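【Comments】:

I tried this and got the following error: ValueError: could not convert string to float: . I then changed the cast from float to str. It runs, but it completely wipes out all the values in the column being sorted.
From the ValueError, it looks like d['red label'] doesn't always return numeric data. Do you have empty fields? As for "it completely wipes out all the values in the column", I don't think that's what is happening: this code doesn't overwrite any values. It would help to see your actual data.
Yes. All but 50 of the entries in that column are blank fields.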
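Both symptoms in this exchange are consistent with how Python treats empty strings, though the thread never confirms the cause. float('') raises ValueError, and when the column is sorted as text, '' compares less than any non-empty string, so every blank row lands at the top of the output. With most of the column blank, that could make the sorted column look empty at a glance even though no values were removed. A sketch with made-up rows:

```python
rows = [{'red label': '7'}, {'red label': ''}, {'red label': '12'}, {'red label': ''}]

# float('') cannot be parsed, which matches the ValueError reported above
conversion_failed = False
try:
    sorted(rows, key=lambda d: float(d['red label']))
except ValueError as exc:
    conversion_failed = True
    print('ValueError:', exc)

# Sorting as text: '' sorts before everything, and '12' sorts before '7'
as_text = sorted(rows, key=lambda d: d['red label'])
print([d['red label'] for d in as_text])  # ['', '', '12', '7']
```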
If those blank fields can be sorted as if they had the value 0.0, change float(d['red label']) to float(d['red label']) if d['red label'] else 0.0.
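Treating blanks as 0.0 mixes them in with real zeros and, in a descending sort, places them among the smallest values. If you would rather keep the populated rows together and always push blank rows to the end, one option is to sort on a tuple whose first element flags blank cells. This is a minimal sketch under that assumption; sort_key_blanks_last is a hypothetical helper name and the data is made up.

```python
import csv
import io


def sort_key_blanks_last(column, descending=False):
    """Build a sort key that orders numeric cells and always puts blank cells last."""
    def key(row):
        cell = row[column].strip()
        if not cell:
            return (1, 0.0)  # blank cells go in the second bucket
        value = float(cell)
        # numeric cells go in the first bucket; negate to get descending order
        return (0, -value if descending else value)
    return key


# Hypothetical in-memory stand-in for Full_List.csv
data = io.StringIO("id,red label\n1,\n2,7\n3,\n4,12\n")
reader = csv.DictReader(data)
result = sorted(reader, key=sort_key_blanks_last('red label', descending=True))
print([r['id'] for r in result])  # ['4', '2', '1', '3']
```

Negating the value inside the key, rather than passing reverse=True to sorted, is what keeps the blank bucket at the end in both directions.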
@AzKai: Post the first ten lines of the file. Something isn't right.

【Solution 3】:

Through testing, I found that the following works for the CSV file I have. Note that every row in that column has a valid entry.

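import csv
from optparse import OptionParser

# Create the options: -i input file, -o output file, -s column to sort by, --high for descending
parser = OptionParser()
parser.add_option('-i', dest='filein', help='input CSV file')
parser.add_option('-o', dest='fileout', help='output CSV file')
parser.add_option('-s', dest='statistic', help='header of the column to sort by')
parser.add_option('--high', dest='high', action='store_true', default=False,
                  help='sort from highest to lowest')
options, args = parser.parse_args()

# Open and set up the input file
with open(options.filein, 'r', newline='') as ifile:
    reader = csv.DictReader(ifile)
    outfields = reader.fieldnames  # write the same columns that were read
    # Create the sorted list
    try:
        print('Try the float version')
        sortedlist = sorted(reader, key=lambda d: float(d[options.statistic]),
                            reverse=options.high)
    except ValueError:
        print('Need to use the text version')
        ifile.seek(0)  # rewind so the header row is read again
        reader = csv.DictReader(ifile)
        sortedlist = sorted(reader, key=lambda d: d[options.statistic],
                            reverse=options.high)
# The input file is now closed, which allows it to be the same as the output file

# Open the output file
with open(options.fileout, 'w', newline='') as ofile:
    writer = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore', restval='')
    # Output the header
    writer.writeheader()
    # Output the sorted list
    writer.writerows(sortedlist)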
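The answer above re-reads the whole file and switches every row to a text sort as soon as one cell fails to parse. An alternative technique, sketched here under the assumption that numbers should come before text, converts each cell individually: numeric cells compare as numbers, and anything else (including blanks) compares as text after them. mixed_key is a hypothetical helper name, and this is a different approach from the answer's.

```python
def mixed_key(value):
    """Sort key: numbers first (numerically), then non-numeric strings (alphabetically)."""
    try:
        return (0, float(value), '')
    except ValueError:
        return (1, 0.0, value)


cells = ['10', 'n/a', '9', '', '2.5']
ordered = sorted(cells, key=mixed_key)
print(ordered)  # ['2.5', '9', '10', '', 'n/a']
```

To use it with the answer's reader, pass key=lambda d: mixed_key(d[options.statistic]). Be aware that strings such as 'nan' and 'inf' also parse as floats, and NaN values do not compare consistently, so filter them out first if your data can contain them.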

【Comments】: