How to find a line with a particular pattern and remove the newline character from it in Unix: answers

【Question title】: How to find a line with a particular pattern and remove the newline character from it in Unix
【Posted】: 2018-02-03 19:26:16
【Question】:

How can I find a line with a particular pattern and remove the newline character from it in Unix? Suppose I have a comma-separated file:

100,"John","Clerk",,,,  
101,"Dannis","Manager",,,,  
102,"Michael","Senior  

Manager",,,,  

103,"Donald","President of 

united states",,,,

The output I want is:

100,"John","Clerk",,,,  
101,"Dannis","Manager",,,,  
102,"Michael","Senior Manager",,,,  
103,"Donald","President of united states",,,,

【Comments on the question】:

What is your particular pattern?

Tags: shell unix awk sed scripting

【Solution 1】:

A short sed solution (the -z option requires GNU sed):

sed -z 's/\n*//g; s/,,,,/&\n/g' file

Output:

100,"John","Clerk",,,,
101,"Dannis","Manager",,,,
102,"Michael","Senior Manager",,,,
103,"Donald","President of united states",,,,

Or with awk:

awk 'BEGIN{ RS=ORS="" }{ gsub(/\n+/," ",$0); gsub(/,,,, */,"&\n",$0); print }' file

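【Discussion】:

@YOGI, I tested my solution before posting it. Yes, it works. Also, you have not explained what your "particular pattern" is.
@RomanPerekrest, thanks, the solution you provided works, but I have edited the post to include another problem I am having with the file. Could you take a look at the edited post? I am trying to use sed '/[A-Za-z]//p' file to match lines that start with a letter, and then I want to remove the \n from those lines.
@k-5, it should not be called "missing", because it is trailing whitespace. Besides, that trailing whitespace is preserved only if it is present in the initial file (after each line).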

【Solution 2】:

You could also try the following awk.

awk '/^$/{next} {val=$0 ~ /^[0-9]/?(val?val ORS $0:$0):(val?val OFS $0:$0)} END{print val}' Input_file

EDIT: Adding a non-one-liner form of the solution along with an explanation.

awk '
/^$/{   ## If the line is empty, do the following.
   next ## Skip all further actions for this line.
}
{
## Build up val: a line starting with a digit begins a new record, so it is appended after a newline (ORS);
## any other line continues the previous record, so it is appended after a space (OFS).
val=$0 ~ /^[0-9]/?(val?val ORS $0:$0):(val?val OFS $0:$0)
}
END{         ## END block: runs after all input has been read.
   print val ## Print the accumulated value of val.
}
' Input_file ## Input_file is the file to process.

【Discussion】:

【Solution 3】:

awk '{printf("%s", $0)}/,,,,/{print "\n"}' ORS="" file

100,"John","Clerk",,,,  
101,"Dannis","Manager",,,,  
102,"Michael","Senior Manager",,,,  
103,"Donald","President of united states",,,,

【Discussion】:

Thanks for the info!

【Solution 4】:

This might work for you (GNU sed):

sed -r ':a;N;/^([^\n,]*,){6}/!s/\n//;ta;P;D' file

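Append another line to the pattern space (PS); if the line does not contain 6 commas, remove one newline and repeat; otherwise print and delete the first line, then repeat.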
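The same idea of counting commas can be sketched in portable awk for readers without GNU sed. This is not the sed command above, only an illustration of the technique: the function name join_by_commas is made up, and it assumes every record has exactly 7 fields (6 commas) and that no quoted field contains a comma.

```shell
#!/bin/sh
# Sketch: keep appending physical lines until the record holds 6 commas.
# Assumes 7 fields per record and no commas inside quoted fields.
join_by_commas() {
    awk '
        /^[ \t]*$/ { next }               # skip blank lines left by the split
        {
            buf = buf $0                  # append this physical line
            n = gsub(/,/, ",", buf)       # count commas (buf is left unchanged)
            if (n >= 6) {                 # the record is complete
                print buf
                buf = ""
            }
        }
        END { if (buf != "") print buf }  # flush an incomplete last record
    '
}

printf '%s\n' '100,"John","Clerk",,,,' '102,"Michael","Senior ' '' 'Manager",,,,' |
    join_by_commas
# prints:
# 100,"John","Clerk",,,,
# 102,"Michael","Senior Manager",,,,
```

Like the sed version, it joins lines without inserting anything, so the space between "Senior" and "Manager" comes from the trailing space in the input.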

【Discussion】:

【Solution 5】:

If you don't mind using Perl:

First, remove the extra newlines:

perl -pe 's/^\n//;' file

Output:

100,"John","Clerk",,,,
101,"Dannis","Manager",,,,
102,"Michael","Senior
Manager",,,,
103,"Donald","President of
united states",,,,

Then you can add a second substitution that removes the newline after the last word of each line. For that you can use:

s/(\w+)\s+\n$/$1 /;

Here \w+ matches Senior and of and captures them in $1, which is then used in the replacement part /$1 /. The notable detail is the single space after $1.

Finally we have:

perl -pe 's/^\n//;s/(\w+)\s+\n$/==>$1<== /;' file

Output:

100,"John","Clerk",,,,
101,"Dannis","Manager",,,,
102,"Michael","==>Senior<== Manager",,,,
103,"Donald","President ==>of<== united states",,,,

Note:

Remove ==> and <== and add -i.bak to edit the file in place while keeping a backup.

Or even in a single substitution:

perl -lpe '$/=undef; s/(\w+)\s+\n\n^([^\n]+)\n/$1 $2/gm;'  file

【Discussion】:

【Solution 6】:

Copy the code from https://stackoverflow.com/a/45420607/1745001 and change:

{
    printf "Record %d:\n", ++recNr
    for (i=1;i<=NF;i++) {
        printf "    $%d=<%s>\n", i, $i
    }
    print "----"
}

to this:

/your regexp/ {
    printf "Record %d:\n", ++recNr
    for (i=1;i<=NF;i++) {
        gsub(/\n/," ",$i)
        printf "    $%d=<%s>\n", i, $i
    }
    print "----"
}

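where your regexp is whatever regular expression you are trying to find in your data (the "particular pattern" you mention in your question).

Unlike most (all?) of your current answers, the above does not rely on input lines ending in ,,,, and does not read the whole file into memory. It also does not rely on the part of a field that follows a newline starting with any particular value, nor on there being at most 1 blank line within a field, nor on any specific version of a tool, etc.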
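For illustration only, here is a much smaller sketch in the same spirit. It is not the code from the linked answer: it simply treats a record as complete once the number of double quotes seen so far is even. The function name join_by_quotes is made up, and the sketch assumes quotes are balanced, that embedded quotes are doubled as in ordinary CSV, and that the pieces of a broken field should be joined with a single space.

```shell
#!/bin/sh
# Sketch: join physical lines while a quoted field is still open.
# Reads line by line, so the whole file is never held in memory.
join_by_quotes() {
    awk '
        {
            line = $0
            sub(/[ \t]+$/, "", line)                    # drop trailing whitespace
            if (line == "") next                        # ignore blank lines
            buf = (buf == "") ? line : (buf " " line)   # join pieces with one space
            tmp = buf
            if (gsub(/"/, "", tmp) % 2 == 0) {          # even quote count: record complete
                print buf
                buf = ""
            }
        }
        END { if (buf != "") print buf }                # flush an unterminated record
    '
}

printf '%s\n' '100,"John","Clerk",,,,  ' '103,"Donald","President of ' '' \
    'united states",,,,' | join_by_quotes
# prints:
# 100,"John","Clerk",,,,
# 103,"Donald","President of united states",,,,
```

It does not depend on the ,,,, ending, on how many blank lines fall inside a field, or on GNU-only features, but it also does not filter by your regexp; that test would still have to be added for the case in the question.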

【Discussion】: