【Question title】:How to remove new-line characters from the middle of a line without removing the new-line at the end of the line in Python?
【Posted】:2015-09-09 13:43:53
【Question】:

My input is a large csv file with lines like the following:

"7807371008","Sat Jan 16 00:07:46 +0000 2010","@bigg_robb welcome to the party life of politics","T 33.417474,-86.705343","al","23845121","1381","502","Wed Mar 11 22:38:27 +0000 2009","2468"

The output I want is a new file containing only the first and third columns, with all special characters removed:

7807371008,  bigg robb welcome to the party life of politics

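But some lines have new-line characters in the middle of the text, even though that is not technically the end of the line. In that case I get this error: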

IndexError: list index out of range

An example of such a line is:

"7807376607","Sat Jan 16 00:07:57 +0000 2010","RT @CBS8News:The commander of Gov. Riley's task
force on illegal gambling resigns after winning $2,300 at a MS casino.
gt;#conflictofinterest","Montgomery, Alabama","al","33358058","84","164","Mon Apr 20 00:48:37 +0000 2009","4509"

My code is:

import csv
import sys
import re

with open('al.csv') as f:
    for line in f:

        j = next(csv.reader([line]))
        id1 = j[0]
        id2 = re.sub('[^A-Za-z0-9\.]+',' ',id1)
        tt1 = j[2]
        tt2 = re.sub('[^A-Za-z0-9\.]+',' ',tt1)
        print id2.strip()+", "+tt2.lower()

How can I solve this? Please help.

【Comments】:

    Tags: python regex string parsing csv


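    【Solution 1】:

    You should specify the comma , as your csv file delimiter (or whichever delimiter is correct for your file). Also, the csv reader object should not be fed the lines you loop over one at a time: a quoted field can span several physical lines, so you need to get the rows by looping over the reader object (spamreader) itself: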

    >>> import csv
    >>> import re
    >>> with open('al.csv', 'rb') as csvfile:
    ...     # Let csv.reader consume the file so quoted newlines stay inside one row
    ...     spamreader = csv.reader(csvfile, delimiter=',')
    ...     for row in spamreader:
    ...         print re.sub('[^A-Za-z0-9\.]+',' ',row[2]) + row[0]
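
For anyone on Python 3, here is a minimal sketch of the same idea, run against the multi-line record from the question. The helper name `clean_rows`, the `NON_ALNUM` pattern name, and the in-memory `sample` string are my own assumptions for illustration, not part of the original code. Stripping the text column and skipping rows with fewer than three fields are also my choices.

```python
import csv
import io
import re

# Anything that is not a letter, digit, or dot collapses to a single space.
NON_ALNUM = re.compile(r'[^A-Za-z0-9.]+')

def clean_rows(stream):
    """Yield 'id, text' strings, one per CSV record in stream (hypothetical helper)."""
    # csv.reader pulls lines from the stream itself, so a quoted field that
    # contains line breaks still comes back as one field of one row.
    for row in csv.reader(stream, delimiter=','):
        if len(row) < 3:
            continue  # skip blank or malformed records instead of raising IndexError
        id_clean = NON_ALNUM.sub(' ', row[0]).strip()
        # The embedded '\n' characters match the pattern too, so they become spaces.
        text_clean = NON_ALNUM.sub(' ', row[2]).strip().lower()
        yield id_clean + ', ' + text_clean

# The problematic record from the question, spread over three physical lines.
sample = (
    '"7807376607","Sat Jan 16 00:07:57 +0000 2010","RT @CBS8News:The commander of Gov. Riley\'s task\n'
    'force on illegal gambling resigns after winning $2,300 at a MS casino.\n'
    'gt;#conflictofinterest","Montgomery, Alabama","al","33358058","84","164","Mon Apr 20 00:48:37 +0000 2009","4509"\n'
)

results = list(clean_rows(io.StringIO(sample)))
for line in results:
    print(line)
```

With a real file, you would pass the file object in place of the `io.StringIO`, opened as `open('al.csv', newline='')`, which is what the csv module documentation recommends so that newlines inside quoted fields are handled correctly. Each record then ends up on a single output line, whose own trailing newline is added by `print`.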
    

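    【Comments】:

    • Did you just say the delimiter should be , ? ... Or did I misunderstand?
    • @KhalilAmmour-خليلعمور Not only that! Actually I'm waiting for the OP's response, because it depends more on the file structure!
    • @user3292085 You're welcome! If it helped, you can let the community know by voting and accepting the answer! ;)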