【Posted】: 2014-03-13 19:24:55
【Question】:
How to separate tokens in line using Unix? shows that a file can be tokenized (one token per line) with sed or xargs.
Is there a way to do the reverse?
[In:]
some
sentences
are
like
this.

some
sentences
foo
bar
that
[Out:]
some sentences are like this.
some sentences foo bar that
The only delimiter between sentences is \n\n (an empty line). I could do it in Python as below, but is there a Unix way?
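import io

def per_section(it):
    """ Read a file and yield sections using empty line as delimiter """
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        elif section:
            # only yield non-empty sections, so runs of blank lines
            # do not produce empty strings
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

# each section is already a string, so just turn its newlines into spaces
with io.open('outfile.txt', 'r', encoding='utf8') as f:
    print([s.replace("\n", " ") for s in per_section(f)])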
[Output:]
[u'some sentences are like this. ', u'some sentences foo bar that ']
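For comparison, a shorter Python sketch (an assumption, not the asker's code) reads the whole text at once, splits it on blank lines with `re.split`, and collapses each block's internal whitespace into single spaces. The function name `join_paragraphs` is hypothetical, and it assumes the input fits in memory.

```python
import re

def join_paragraphs(text):
    """Hypothetical helper: collapse each blank-line-separated block into one line.

    Assumes blocks are separated by one or more empty lines; lines holding
    only whitespace also count as separators here.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    # str.split() with no argument splits on any run of whitespace,
    # so joining with " " turns the internal newlines into single spaces.
    return [" ".join(block.split()) for block in blocks if block.strip()]

sample = "some\nsentences\nare\nlike\nthis.\n\nsome\nsentences\nfoo\nbar\nthat\n"
for line in join_paragraphs(sample):
    print(line)
```

Unlike the generator version, this leaves no trailing space on each sentence. To use it on the original file, pass it the contents of `io.open('outfile.txt', encoding='utf8').read()`.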
【Discussion】:
-
Is it always 5 words? Ending with a dot `.`? What pattern tells you where a new line starts?
-
No, it's not always 5 words; the 5 words are a coincidence.