【Question title】:Formatting text for version control
【Posted】:2016-05-19 15:35:29
【Question】:

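Many of my documents are written in LaTeX, which works well with distributed workflows and version control as long as the source is formatted properly. Specifically, I like to format the text with one sentence per line.

My problem is that I have some old files that do not follow this formatting policy, and I would like to convert them automatically. I feel this should be simple with some combination of sed and/or awk, but I am having some trouble.

I am trying to convert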

This is some unformatted
text that does not have a sentence on one line.

This is a new unformatted paragraph
that does not follow the rule either.

This line \\ has a break in it.

into

This is some unformatted text that does not have a sentence on one line.

This is a new unformatted paragraph that does not follow the rule either.

This line \\
has a break in it.

The sed/awk I currently have is:

awk ' /^$/ { print "\n"; } /./ { printf("%s", $0); } END { print; } ' <filename> | sed -e $'s/\. /\.\\\n/g'

This gets me most of the way there, but I cannot get the \\ followed by a newline to work correctly.

Any help is greatly appreciated.

【Comments】:

  • The solutions work and solve the problem, but could either of you explain what they are doing? @sjsam
  • @Ed Morton, please see above

Tags: regex awk sed


【Solution 1】:

Input

$ cat text
This is some unformatted
text that does not have a sentence on one line.

This is a new unformatted paragraph
that does not follow the rule either.

This line \\ has a break in it.

This line too \\ contains break.
This is a normal line.

Script

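$ awk 'BEGIN { RS = "." }   # split the input into records at every period
{
    # a newline that follows a printable character becomes a space (joins wrapped lines)
    $0 = gensub(/([[:print:]?])\n/, "\\1 ", "g")
    # a space that follows \\ becomes a newline (LaTeX line break)
    $0 = gensub(/(\\\\) /, "\\1\n", "g")
    printf "%s.", $0        # put back the period that RS consumed
}
END { printf "\n" }' text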

Output

This is some unformatted text that does not have a sentence on one line.

This is a new unformatted paragraph that does not follow the rule either.

This line \\
has a break in it.

This line too \\
contains break.
This is a normal line .

Note: this assumes you have GNU awk (gawk), because gensub() is a gawk extension.
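
For systems without gawk, the same idea can be sketched with gsub() only. This is a rough sketch rather than the answer's method: the function name reflow is hypothetical, it reads whole paragraphs instead of splitting on periods, and it assumes that every period followed by a space ends a sentence, so abbreviations such as "e.g. " would be split as well.

```shell
#!/bin/sh
# reflow (hypothetical name): rewrite text as one sentence per line.
# Reads the files given as arguments, or standard input if there are none.
reflow() {
    awk -v RS= -v ORS='\n\n' '{
        gsub(/[ \t]*\n[ \t]*/, " ")   # join the wrapped lines of the paragraph
        gsub(/\\\\/, "&\n")           # start a new line after every LaTeX \\
        gsub(/\. +/, ".\n")           # start a new line after a period followed by spaces
        gsub(/\n +/, "\n")            # drop spaces left at the start of the new lines
        print
    }' "$@"
}
```

Usage would be along the lines of `reflow old.tex > new.tex`, where both file names are placeholders. Using & in the replacement avoids writing backslashes in the replacement string, whose handling differs between awk implementations.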

【Comments】:

    【Solution 2】:
    $ awk -v RS= -v ORS='\n\n' -F'\\\\\\\\[[:space:]]*' -v OFS='\n' '{gsub(/\n/," "); $1=$1}1' file
    This is some unformatted text that does not have a sentence on one line.
    
    This is a new unformatted paragraph that does not follow the rule either.
    
    This line
    has a break in it.
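
    In the command above the field separator matches the \\ itself, so the marker is consumed and the sample output reads "This line" with nothing after it. If the \\ should be kept, as in the output requested in the question, one possible adjustment is to put it back through OFS. This is an assumption layered on the answer, not part of it, and keep_breaks is a hypothetical name.

```shell
#!/bin/sh
# keep_breaks (hypothetical name): the same one-liner, but OFS re-inserts the \\.
keep_breaks() {
    # RS=   : paragraph mode, one record per block of non-blank lines
    # ORS   : print a blank line after every paragraph
    # -F    : split fields at \\ plus any whitespace that follows it
    # OFS   : join the fields again with \\ and a newline
    awk -v RS= -v ORS='\n\n' -F'\\\\\\\\[[:space:]]*' -v OFS='\\\\\n' \
        '{ gsub(/\n/, " "); $1 = $1 } 1' "$@"
}
```

    Here gsub() joins the wrapped lines of the paragraph, the assignment $1 = $1 forces awk to rebuild the record with OFS between the fields, and the trailing 1 prints the result.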
    

    【Comments】:
