【Question title】: Join rows in a CSV with different sized sections (Python)
【Posted】: 2015-07-20 10:51:05
【Question】:

I have a CSV file structured as follows:

|     publish_date     |sentence_number|character_count|    sentence       |
----------------------------------------------------------------------------
|          1           |               |               |                   |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |      -1       |       0       | Sentence 1 here.  |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       0       |      14       | Sentence 2 here.  |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       1       |      28       | "Sentence 3 here. |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       2       |      42       | Sentence 4 here." |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       3       |      56       | Sentence 5 here.  |
----------------------------------------------------------------------------
|         end          |               |               |                   |
----------------------------------------------------------------------------
|          2           |               |               |                   |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |      -1       |       0       | Sentence 1 here.  |
----------------------------------------------------------------------------
| 02/01/2012  00:12:00 |       0       |      14       | Sentence 2 here.  |
----------------------------------------------------------------------------
|         end          |               |               |                   |
----------------------------------------------------------------------------
|         end          |               |               |                   |
----------------------------------------------------------------------------

What I want to do is combine each block of sentences into a paragraph, so that each block outputs a single paragraph:

["Sentence 1 here.", "Sentence 2 here.", ""Sentence 3 here.", "Sentence 4 here."", "Sentence 5 here."]

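Some sentences are quotations that carry over into the next sentence, while others are fully contained within a single sentence.

So far I have this: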

def read_file():

    file = open('test.csv', "rU")
    reader = csv.reader(file)
    included_cols = [3]

    for row in reader:
        content = list(row[i] for i in included_cols)

        print content    
    return content

read_file()

But this just prints each sentence as its own list, like this:

['Sentence 1 here.']
['Sentence 2 here.']

Any suggestions are appreciated.

【Comments】:

    Tags: python regex csv


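    【Solution 1】:

    Just take the fourth element of each row and collect them all in one list. Your loop builds a separate one-element list for every row, so the function only returns the last one: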

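    import csv

    def read_file():
        # Python 3: newline='' lets the csv module handle line endings itself
        with open('test.csv', newline='') as f:
            reader = csv.reader(f)
            # keep the fourth column, skipping short rows and empty cells
            return [row[3] for row in reader if len(row) > 3 and row[3]]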
    

    This should output:

    ['sentence', 'Sentence 1 here.', 'Sentence 2 here.', ' "Sentence 3 here.', ' Sentence 4 here."', ' Sentence 5 here.', 'Sentence 1 here.', 'Sentence 2 here.']
    

    If you want the sentences split into one list per paragraph:

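    import csv
    from itertools import groupby

    def read_file():
        with open('test.csv', newline='') as f:
            reader = csv.reader(f)
            # fourth column of every row; marker rows yield ""
            paras = (row[3] for row in reader if len(row) > 3)
            # each run of consecutive non-empty cells is one paragraph
            return [list(v) for k, v in groupby(paras, key=lambda x: x != "") if k]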
    

    groupby should output something like the following:

    [['sentence', 'Sentence 1 here.', 'Sentence 2 here.', 
     ' "Sentence 3 here.', ' Sentence 4 here."', ' Sentence 5 here.'],
     ['Sentence 1 here.', 'Sentence 2 here.']]
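
For anyone who wants to try the grouping without the original file, here is a minimal, self-contained sketch. It assumes Python 3, a comma-separated layout like the table in the question, and marker rows (`1`, `2`, `end`) whose fourth column is empty. The `SAMPLE` text and the function name `group_paragraphs` are made up for illustration. Unlike the snippet above, it also skips the header row and strips surrounding whitespace.

```python
import csv
import io
from itertools import groupby

# Made-up sample mirroring the layout in the question (an assumption,
# not the asker's real file).
SAMPLE = """publish_date,sentence_number,character_count,sentence
1,,,
02/01/2012 00:12:00,-1,0,Sentence 1 here.
02/01/2012 00:12:00,0,14,Sentence 2 here.
end,,,
2,,,
02/01/2012 00:12:00,-1,0,Sentence 3 here.
end,,,
end,,,
"""

def group_paragraphs(lines):
    """Return one list of sentences per block of consecutive non-empty cells."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    # Fourth column of each row; marker rows contribute an empty string.
    cells = (row[3].strip() for row in reader if len(row) > 3)
    # groupby splits the stream wherever it switches between empty and non-empty.
    return [list(group) for has_text, group in groupby(cells, key=bool) if has_text]

paragraphs = group_paragraphs(io.StringIO(SAMPLE))
print(paragraphs)
```

Passing an open file object instead of `io.StringIO(SAMPLE)` should work the same way, since `csv.reader` accepts any iterable of lines.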
    

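    【Comments】:

    • It looks like it's almost there, but now it throws a list index out of range error.
    • @sammy88888888, try the edit. Your CSV is comma-separated, right?
    • Yes, it's comma-separated. That's almost it, thanks! The only oddity is that some sentences are split like this: ', ', while others are split like this: ', ". Any idea why, or whether it matters?
    • Actually, that's because some of your strings contain quotes. I wouldn't worry about it; if you need to, you can remove the whitespace with str.strip.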
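
The `str.strip` suggestion in the last comment could look roughly like the sketch below. It assumes Python 3 and a made-up two-row sample in which a space follows the last comma, which is what leaves the leading space in the parsed field. The name `clean_sentences` is hypothetical. Joining the cleaned sentences with a space is one possible way to get a single paragraph string.

```python
import csv
import io

# Made-up rows with a space after the last comma (an assumption about the
# asker's file, based on the output shown in the answer).
SAMPLE = (
    '02/01/2012 00:12:00,1,28, "Sentence 3 here.\n'
    '02/01/2012 00:12:00,2,42, Sentence 4 here."\n'
)

def clean_sentences(lines):
    """Return the fourth column of each row with surrounding whitespace removed."""
    return [
        row[3].strip()
        for row in csv.reader(lines)
        if len(row) > 3 and row[3].strip()
    ]

sentences = clean_sentences(io.StringIO(SAMPLE))
# Join the cleaned sentences into one paragraph string.
paragraph = " ".join(sentences)
print(paragraph)
```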