Remove newline from end of string in bash - line continuations (answers)

【Question title】: remove newline from end of string in bash - line continuations
【Posted】: 2015-05-05 13:00:13
【Question description】:

I know there are several similar questions, both open and answered, but mine is a bit different. I am trying to do this in bash.

I have this file:

Line1 asd asd asd \
    asd asd asd \

Line2 asd asd asd \
    asd asd asd \

Line3 asd asd asd \
    asd asd asd \

Line4 asd asd asd \
    asd asd asd \

The output I want is:

Line1 asd asd asd asd asd asd
Line2 asd asd asd asd asd asd
Line3 asd asd asd asd asd asd
Line4 asd asd asd asd asd asd

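so that it is easier to read in a bash loop. What command would let me do this?

Thanks in advance.

【Question comments】:

It's a bit unclear what your input file looks like. Are these exact, or do they represent something else? Also, what is the underlying logic here? Text `XXX XXX XXX XXX XXX` up to an empty line?
The bash builtin read supports backslash line continuation when you don't use -r. It should read those lines from a file/etc. just fine.

Tags: bash perl awk sed newline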

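【Solution 1】:

The bash builtin read supports backslash line continuation when you don't use -r (unless you need this support, you should always use -r).

So it should read those lines from a file/etc. just fine. (This assumes they don't contain any other backslash escape sequences that need to be preserved.)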

$ while IFS= read line; do
    echo "[$line]"
done < <(printf 'Line1 asd asd asd \
    asd asd asd \

Line2 asd asd asd \
    asd asd asd \

Line3 asd asd asd \
    asd asd asd \

Line4 asd asd asd \
    asd asd asd \
')
[Line1 asd asd asd     asd asd asd ]
[Line2 asd asd asd     asd asd asd ]
[Line3 asd asd asd     asd asd asd ]
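
To make the caveat about other backslash sequences concrete, here is a minimal sketch. The sample text and the helper name join_with_read are made up for illustration and are not part of the original answer. Without -r, read joins the continued line, but it also drops the backslash in front of the t.

```shell
# Hypothetical sample: one continued line that also contains an interior backslash.
sample='path C:\temp \
  continued'

# Hypothetical helper: reads stdin WITHOUT -r, so backslashes are processed.
join_with_read() {
  # The "|| [ -n ... ]" part still handles a last line that lacks a final newline.
  while IFS= read line || [ -n "$line" ]; do
    printf '[%s]\n' "$line"
  done
}

printf '%s\n' "$sample" | join_with_read
```

This prints [path C:temp   continued]: the two physical lines are joined, but the backslash in C:\temp is lost, which is why read is only suitable when the input has no other backslashes worth keeping.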

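【Discussion】:

This is probably a good illustration of why using read -r is important for most other uses of read. Also, the OP can get the desired output by using the form while read; do echo $REPLY; done
@kojiro Indeed. That is why -r is almost always what you want. And yes, the non-whitespace-stripping behavior of $REPLY is something I only learned about recently; I tend to prefer the explicit IFS= setting over the implicit behavior of $REPLY, though.
@mklement0 I took the point of the $REPLY answer to be avoiding the need to set IFS since (as I believe you taught me) using $REPLY doesn't do that. If whitespace stripping is wanted, then IFS= should be left off. But yes, I agree that read isn't the right solution here (mostly because of the other backslash escape problem).
Indeed. I wasn't focusing on the unquoted detail but on the other details, and yes, that does solve this specific case, but it is not a generic solution.
I just noticed that a literal here-document with an unquoted EOF delimiter also performs line-continuation processing, as read does, but, unlike with read, line-interior \ instances are preserved, except before $, \, and `, which leads to a big caveat: the document is expanded with respect to variable references as well as command and arithmetic substitutions, just like a double-quoted string.

【Solution 2】:

Perl solution: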

perl -pe 's/\\$// and chomp' < input > output

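s/// is a substitution. \\ matches a backslash, $ matches the end of the line.
chomp removes the trailing newline (if present).

To also remove the leading spaces, use the following script instead:

 's/^ +//; s/\\$// and chomp'

^ matches the beginning of the line. " +" (a space followed by +) matches one or more spaces.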

【Discussion】:

【Solution 3】:

$ awk -v RS= '{gsub(/\s*\\\s*/," ")}1' file
Line1 asd asd asd asd asd asd
Line2 asd asd asd asd asd asd
Line3 asd asd asd asd asd asd
Line4 asd asd asd asd asd asd

If you don't have GNU awk, use [[:space:]] instead of \s.
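
A portable sketch of the same paragraph-mode idea, using [[:space:]] as suggested above. The function name join_paragraphs is hypothetical. As an assumption based on the desired output in the question, each continuation is replaced with a single space, and the space left behind by a paragraph's final backslash is trimmed.

```shell
# Hypothetical helper: reads the files named as arguments, or stdin if none.
join_paragraphs() {
  # RS= selects paragraph mode: each run of non-empty lines is one record.
  awk -v RS= '{
    gsub(/[[:space:]]*\\[[:space:]]*/, " ")  # continuation -> single space
    sub(/ $/, "")                            # drop the space left by a final backslash
  } 1' "$@"
}

printf 'Line1 a \\\n    b \\\n\nLine2 c \\\n    d \\\n\n' | join_paragraphs
```

Because records are separated by blank lines, the blank lines themselves never reach the output.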

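Note, though, that whenever you write a loop in shell just to manipulate text, you are taking the wrong approach, so doing the above in preparation for simplifying a bash read loop is probably a bad idea overall.

【Discussion】:

++ for simplicity, and it works with the sample input, but two things are worth pointing out (for the benefit of those looking for generic line-continuation processing): if the input file has no empty lines, it will be read into memory as a whole (because, due to -v RS=, any run of consecutive non-empty lines forms a single input record); conversely, if empty lines are present, they are removed unconditionally, whether or not they are part of a line continuation.

【Solution 4】: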

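Note:

The first solution below reflects the OP's specific whitespace-handling requirements; for generic line-continuation processing, see the bottom.
The solutions here are POSIX-compliant, so they should work on most Unix-like platforms (verified on OSX and Linux).
OP's own solution suggests that the input has Windows-style line endings (\r\n). However, given that the question does not state this, the solutions here match only Unix-style line endings (\n). To match \r\n line endings, replace \n with '"$(printf '\r')"'\n (sic) or, in bash, with '$'\r''\n in the sed commands below. (With GNU sed you could simply use \r\n, but POSIX sed does not recognize \r as an escape sequence.)

A corrected version of OP's own solution, which also correctly handles lines that end in \ and precede an empty line: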

sed -e ':a' -e '$!{N;ba' -e '}; s/ \\\n[[:blank:]]*/ /g' filename

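-e ':a' -e '$!{N;ba' -e '}' is a common sed idiom: a loop that reads all input lines into the pattern space (input buffer) at once. BSD sed requires multiple -e options to make this work (or, alternatively, a multi-line script).

The text-substitution command s/ \\\n[[:blank:]]*/ /g then operates on all input lines at once and globally (g) matches every run of a single space, followed by \ (\\), followed by a newline (\n), followed by any number of spaces and/or tabs ([[:blank:]]*), and replaces each such run with a single space ( ).
In short: <space>\ at the end of a line causes that line to be joined with the next line, after the trailing \ is removed and leading whitespace is stripped from the next line.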
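
As a hedged sketch, the command can be wrapped in small functions; the names join_lf and join_crlf are made up here. The second function shows one way to apply the \r\n advice from the notes, by splicing a literal CR into the sed script.

```shell
# Hypothetical wrapper around the sed command discussed above (Unix line endings).
join_lf() {
  sed -e ':a' -e '$!{N;ba' -e '}; s/ \\\n[[:blank:]]*/ /g' "$@"
}

# CRLF variant: a literal CR is spliced into the script,
# because POSIX sed does not understand \r.
cr=$(printf '\r')
join_crlf() {
  sed -e ':a' -e '$!{N;ba' -e '}; s/ \\'"$cr"'\n[[:blank:]]*/ /g' "$@"
}

printf 'Line1 asd \\\n    asd \\\n\nLine2 asd\n' | join_lf
```

With this input the first output line keeps one trailing space, because the final <space>\ before the empty line is also replaced by a single space.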

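Note:

The following solutions come in two flavors, awk and sed.
Generally, the awk solutions are preferable, because they do not read all of the input at once, which can be a problem with large files. (Arguably, they are also easier to understand.)
Note that the here-documents used as sample input below use a quoted EOF delimiter (<<'EOF') so as to preserve the string unmodified; without quoting EOF, the shell's own string-literal processing would parse the embedded line continuations and join the lines before the command ever sees the string.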
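
The difference between the two delimiter styles can be seen with a small sketch; the function names show_quoted and show_unquoted are hypothetical.

```shell
# Quoted delimiter: the body is passed through verbatim.
show_quoted() {
  cat <<'EOF'
first \
second
EOF
}

# Unquoted delimiter: the shell removes backslash-newline before cat sees the text.
show_unquoted() {
  cat <<EOF
first \
second
EOF
}

show_quoted     # two lines, the first still ends in a backslash
show_unquoted   # one line: first second
```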

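Generic line-continuation processing without whitespace handling:

These solutions simply remove \<newline> sequences and therefore join the lines as they are, with no separator; this is what read does by default, for instance.

However, these solutions have two advantages over read:

Line-interior \ instances are left alone.
sed and awk are much faster with more than just a few lines of input.

`awk` solution: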

awk '/\\$/ { printf "%s", substr($0, 1, length($0)-1); next } 1' <<'EOF'
Line1 starts here\
 and ends here.

Line2 starts here, \
 continues here,\
  and ends here.
EOF
Line1 starts here and ends here.

Line2 starts here,  continues here,  and ends here.

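/\\$/ matches a \ at the end ($) of a line, which signals a line continuation.
substr($0, 1, length($0)-1) removes that trailing \ from the input line, $0.
By using printf "%s", the (modified) current line is printed without a trailing newline, which means that whatever print command comes next appends directly to it, effectively joining the current and the next line.
next finishes processing of the current line.
1 is a common awk idiom that is shorthand for { print }, i.e., it simply prints the input line (with a trailing \n).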
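
A short sketch of the same awk program wrapped in a function (the name join_continuations and the sample text are made up) shows that backslashes inside a line survive:

```shell
# Hypothetical wrapper: reads the files named as arguments, or stdin if none.
join_continuations() {
  awk '/\\$/ { printf "%s", substr($0, 1, length($0)-1); next } 1' "$@"
}

# Only the trailing backslash of the first line is treated as a continuation.
printf '%s\n' 'copy C:\temp\a.txt \' '  D:\backup' | join_continuations
```

The result is the single line copy C:\temp\a.txt   D:\backup, with the original spacing on both sides of the join preserved.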

`sed` solution:

$ sed -e ':a' -e '$!{N;ba' -e '}; s/\\\n//g' <<'EOF'
Line1 starts here\
 and ends here.

Line2 starts here, \
 continues here,\
  and ends here.
EOF
Line1 starts here and ends here.

Line2 starts here,  continues here,  and ends here.

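Note the two double spaces in the last line, which result from all whitespace being preserved.

[Not recommended] Pure shell (e.g., `bash`) solution:

The following solution is alluringly simple, but it is not fully robust and it is a security risk, because it can result in the execution of arbitrary commands: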

# Store input filename, passed as the 1st argument,
# in variable $file.
file=$1

# Construct a string that results in a valid shell command containing a
# *literal* here-document with *unquoted* EOF delimiter 0x3 - chosen so
# that it doesn't conflict with the input.
#
# When the resulting command is evaluated by `eval`, the *shell itself* 
# performs the desired line-continuation processing, BUT:
# '$'-prefixed tokens in the input, including command substitutions
# ('$(...)' and '`...`'), ARE EXPANDED, therefore:
# CAUTION: Maliciously constructed input can result in
#          execution of arbitrary commands.
eval "cat <<$(printf '\3')
$(cat "$file")"
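
To illustrate the risk with harmless input, here is a hedged sketch; the wrapper name unsafe_join and the temporary sample file are assumptions made for the demonstration only.

```shell
# Hypothetical wrapper around the eval-based command above; $1 is the input file.
unsafe_join() {
  eval "cat <<$(printf '\3')
$(cat "$1")"
}

# Harmless stand-in for hostile input: the command substitution is executed.
tmp=$(mktemp)
printf '%s\n' 'joined \' 'line' 'user: $(echo INJECTED)' > "$tmp"
unsafe_join "$tmp" 2>/dev/null   # bash may warn about the unterminated here-document
rm -f "$tmp"
```

The first two lines come out joined as expected, but the last line shows INJECTED instead of the literal $(echo INJECTED) text, so anything placed inside $(...) in the input file is run by the shell.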

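Generic line-continuation processing with whitespace normalization:

These solutions normalize whitespace as follows: any trailing whitespace before \<newline> is removed, as is the leading whitespace of the next line; the resulting lines are then joined with a single space.
Whitespace in lines not involved in a line continuation is left as is. (The latter distinguishes these solutions from choroba's Perl solution.)

`awk` solution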

awk '
  contd { contd=0; sub(/^[[:blank:]]+/, "") } 
  /\\$/ { contd=1; sub(/[[:blank:]]*\\$/, ""); printf "%s ", $0; next } 
  1' <<'EOF'
Line1 starts here   \
      and ends here.
  I am a loner. 
Line3 starts here,   \
      continues here,    \
and ends here.
EOF
Line1 starts here and ends here.
  I am a loner.
Line3 starts here, continues here, and ends here.

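The variable contd (which defaults to 0 / false in a Boolean context) serves as a flag that indicates whether the previous line signaled a line continuation with a trailing \.
If the flag is set (pattern contd), it is reset right away (although it may be set again below, if the continued line is itself continued on the next line), and leading whitespace is trimmed from the current line (sub(/^[[:blank:]]+/, "")); note that not specifying a target variable as the third argument implicitly targets the whole input line, $0.
/\\$/ matches a \ at the end ($) of a line, which signals a line continuation.
- Therefore, the flag is set (contd=1),
- the trailing whitespace before the line-ending \ is removed, along with the \ itself (sub(/[[:blank:]]*\\$/, "")),
- and the result is printed with a trailing space, but without a newline, courtesy of printf "%s ".
- next then proceeds to the next input line, without processing any further commands for the current line.
1 is a common awk idiom that is shorthand for { print }, i.e., it simply prints the input line (with a trailing \n); note that this print command is reached in two cases:
- Any line not involved in a line continuation, which is printed unmodified.
- Any line that ends a line continuation (it is part of a continuation, but is not itself continued on the next line), which is printed with its leading whitespace removed, due to the modification performed by the first action.

`sed` solution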

$ sed -e ':a' -e '$!{N;ba' -e '}; s/[[:blank:]]*\\\n[[:blank:]]*/ /g' <<'EOF'
Line1 starts here   \
      and ends here.
  I am a loner.
Line3 starts here,   \
      continues here,    \
and ends here.
EOF
Line1 starts here and ends here.
  I am a loner.
Line3 starts here, continues here, and ends here.

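Line-ending and line-beginning whitespace is normalized to a single space for the lines involved in a continuation. Note how the line without a trailing \ is printed unmodified.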
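
As a quick cross-check, both whitespace-normalizing variants can be wrapped in functions and compared on the same input. The function names and the sample text below are hypothetical.

```shell
# Hypothetical wrappers around the awk and sed commands shown above.
join_awk() {
  awk '
    contd { contd=0; sub(/^[[:blank:]]+/, "") }
    /\\$/ { contd=1; sub(/[[:blank:]]*\\$/, ""); printf "%s ", $0; next }
    1' "$@"
}

join_sed() {
  sed -e ':a' -e '$!{N;ba' -e '}; s/[[:blank:]]*\\\n[[:blank:]]*/ /g' "$@"
}

# Made-up sample: a three-line continuation followed by an ordinary line.
sample='first   \
      second,    \
third
  untouched line'

if [ "$(printf '%s\n' "$sample" | join_awk)" = "$(printf '%s\n' "$sample" | join_sed)" ]; then
  echo "awk and sed agree"
fi
```

The comparison only covers this sample. The two variants can still differ on edge cases, for example when the very last input line ends in \.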

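【Discussion】:

【Solution 5】:

EDIT

This command replaces the space, the backslash, the line break, and the tab at the start of the next line with a single space.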

sed ':a;N;$!ba;s/ \\\x0D\x0A\x09/ /g' filename

line1 asd asd asd \
     asd asd asd

to

line1 asd asd asd asd asd asd

I can then use:

sed '/^[[:space:]]*$/d' filename

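to remove the unwanted blank lines between these lines of the file.

【Discussion】:

Can you elaborate a bit, to make this more of an answer? How does this solve your problem?
@gung: It uses the common sed idiom :a;N;$!ba to read all lines into the pattern space (input buffer) at once, and then globally replaces each <space>\ followed by \r\n\t with a single <space>. The effect is that lines that follow a line ending in <space>\ have their leading \t char. stripped and are then joined directly to the previous line, with a single space as the separator. Note that the command's syntax implies GNU sed (Linux). Also, Austin, it would have been good to mention that your input has Windows-style line endings and that the leading whitespace is a \t.
For better readability, \\\x0D\x0A\x09 can be written as \\\r\n\t.
This will only partially work with the sample input from the question, because the trailing \ is not removed from lines that precede an empty line.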
