[Posted]: 2015-07-20 10:51:05
[Question]:
I have a CSV file structured like this:
| publish_date |sentence_number|character_count| sentence |
----------------------------------------------------------------------------
| 1 | | | |
----------------------------------------------------------------------------
| 02/01/2012 00:12:00 | -1 | 0 | Sentence 1 here. |
----------------------------------------------------------------------------
| 02/01/2012 00:12:00 | 0 | 14 | Sentence 2 here. |
----------------------------------------------------------------------------
| 02/01/2012 00:12:00 | 1 | 28 | "Sentence 3 here. |
----------------------------------------------------------------------------
| 02/01/2012 00:12:00 | 2 | 42 | Sentence 4 here." |
----------------------------------------------------------------------------
| 02/01/2012 00:12:00 | 3 | 56 | Sentence 5 here. |
----------------------------------------------------------------------------
| end | | | |
----------------------------------------------------------------------------
| 2 | | | |
----------------------------------------------------------------------------
| 02/01/2012 00:12:00 | -1 | 0 | Sentence 1 here. |
----------------------------------------------------------------------------
| 02/01/2012 00:12:00 | 0 | 14 | Sentence 2 here. |
----------------------------------------------------------------------------
| end | | | |
----------------------------------------------------------------------------
| end | | | |
----------------------------------------------------------------------------
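What I want to do is combine each block of sentences into a paragraph, so that each block outputs a single paragraph like this: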
["Sentence 1 here.", "Sentence 2 here.", ""Sentence 3 here.", "Sentence 4 here."", "Sentence 5 here."]
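Some sentences are quotations that continue into the next sentence, while others are fully contained within a single sentence.
So far, I have this: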
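import csv

def read_file():
    # Open the CSV and keep only the sentence column (index 3).
    with open('test.csv', newline='') as f:
        reader = csv.reader(f)
        included_cols = [3]
        for row in reader:
            content = [row[i] for i in included_cols]
            print(content)
    # Note: only the last row's content is returned.
    return content

read_file()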
But this just prints each sentence as its own list, like this:
['Sentence 1 here.']
['Sentence 2 here.']
Any suggestions are appreciated.
[Comments]:
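One possible approach, as a minimal sketch rather than a definitive answer: walk the rows, collect the sentence column into a running list, and close that list whenever a row whose first column is `end` appears. This assumes the file is comma-separated, has a header row, and uses the block layout shown in the question. The names `group_paragraphs` and `sentence_col` are made up for illustration, and the sample data is a shortened stand-in for `test.csv`.

```python
import csv
import io


def group_paragraphs(rows, sentence_col=3):
    """Group sentence rows into one list per block.

    Assumes a block is closed by a row whose first column is 'end';
    block-start rows (e.g. '1,,,') have an empty sentence column.
    """
    paragraphs = []
    current = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        if row[0].strip() == "end":
            # An 'end' marker closes the current block, if it holds anything.
            if current:
                paragraphs.append(current)
                current = []
            continue
        sentence = row[sentence_col].strip() if len(row) > sentence_col else ""
        if sentence:
            current.append(sentence)
    if current:
        # Keep a trailing block that was never closed by 'end'.
        paragraphs.append(current)
    return paragraphs


# Shortened stand-in for test.csv (assumed comma-separated with a header row).
sample = """publish_date,sentence_number,character_count,sentence
1,,,
02/01/2012 00:12:00,-1,0,Sentence 1 here.
02/01/2012 00:12:00,0,14,Sentence 2 here.
end,,,
2,,,
02/01/2012 00:12:00,-1,0,Sentence 1 here.
end,,,
end,,,
"""

reader = csv.reader(io.StringIO(sample))
next(reader, None)  # skip the header row
paragraphs = group_paragraphs(reader)
print(paragraphs)
```

To read the real file, the same function could be fed `csv.reader(open('test.csv', newline=''))` after skipping the header. If a single string per paragraph is wanted, `" ".join(block)` on each inner list would do it. One caveat: if the quotation marks in the sentence column are stored unescaped, `csv.reader` may treat them as field quoting and merge rows, in which case passing `quoting=csv.QUOTE_NONE` to the reader may be worth trying.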