Ruby's string: Escape and unescape a custom character

【Question title】: Ruby's string: Escape and unescape a custom character
【Posted】: 2011-10-28 23:14:46
【Question】:

Suppose I declare the £ character dangerous, and I want to be able to protect any string and to unprotect it again.

Example 1:

"Foobar £ foobar foobar foobar."  # => dangerous string
"Foobar \£ foobar foobar foobar." # => protected string

Example 2:

"Foobar £ foobar £££££££foobar foobar."         # => dangerous string
"Foobar \£ foobar \£\£\£\£\£\£\£foobar foobar." # => protected string

Example 3:

"Foobar \£ foobar \\£££££££foobar foobar."        # => dangerous string
"Foobar \£ foobar \\\£\£\£\£\£\£\£foobar foobar." # => protected string

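Is there a simple way in Ruby to escape (and unescape) a given character (such as £ in my examples) in a string?

Edit: here is an explanation of the behavior behind this question.

First of all, thank you for your answers. I have a Rails application with a Tweet model that has a content field. Example tweet: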

tweet = Tweet.create(content: "Hello @bob")

Inside the model, a serialization step converts the string like this:

dump('Hello @bob') # => '["Hello £", 42]'
                   # ... where 42 is the id of bob username

Then I can deserialize it and display the tweet like this:

load('["Hello £", 42]') # => 'Hello @bob'

Likewise, several usernames can be used:

dump('Hello @bob and @joe!')        # => '["Hello £ and £!", 42, 185]'
load('["Hello £ and £!", 42, 185]') # => 'Hello @bob and @joe!'

That is the goal :)

But this find-and-replace becomes hard to perform with something like:

tweet = Tweet.create(content: "£ Hello @bob")

Because here we also have to escape the £ char. I think your solutions are useful for this. So the result becomes:

dump('£ Hello @bob')       # => '["\£ Hello £", 42]'
load('["\£ Hello £", 42]') # => '£ Hello @bob'

Perfect.

Now, given this:

tweet = Tweet.create(content: "\£ Hello @bob")

I think we should first escape every \, then escape every £, like this:

dump('\£ Hello @bob')       # => '["\\£ Hello £", 42]'
load('["\\£ Hello £", 42]') # => '£ Hello @bob'

But... what should we do in this case:

tweet = Tweet.create(content: "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\£ Hello @bob")

...tweet.content.gsub(/(?<!\\)(?=(?:\\\\)*£)/, "\\") does not seem to work.
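The two-step idea described above (escape every \, then every £) can be sketched in a single pass. This is only a minimal sketch: the helper names `protect` and `unprotect` are hypothetical and do not come from the question. Because every backslash is doubled, the transformation should be reversible however many backslashes the input contains.

```ruby
# Hypothetical helpers; the names are illustrative, not from the question.

# Escape: put a backslash in front of every backslash and every £.
# The block form avoids backslash handling in replacement strings.
def protect(str)
  str.gsub(/[\\£]/) { |ch| "\\#{ch}" }
end

# Unescape: for every "backslash + any character" pair, keep only the character.
def unprotect(str)
  str.gsub(/\\(.)/m) { $1 }
end

backslash = '\\' # a single backslash
samples = ['£ Hello @bob', backslash + '£ Hello @bob', backslash * 18 + '£ Hello @bob']
samples.each do |original|
  escaped = protect(original)
  puts "#{original} -> #{escaped} -> #{unprotect(escaped)}"
end
```

After `protect`, every literal £ is preceded by an odd number of backslashes, so a bare £ added afterwards could still act as the username placeholder.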

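【Comments】:

What exactly are you trying to do with this?
Strings in Ruby 1.9.2 work very differently from strings in 1.8.7, so you should probably specify which version you are using.
@ben-alpert: I just updated my question with the reason behind this behavior.
Why don't you serialize it as ["Hello ", 42, "!"] for "Hello @bob!"?
@ben-alpert: ...oops, you are totally right! Thanks a lot for the idea. A super simple and good approach!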
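
The segment-based serialization suggested in the comments needs no escaping at all, because text and user ids never share a string. A minimal sketch follows, assuming hypothetical names throughout: `USER_IDS` stands in for a real user lookup, and `dump_tweet` / `load_tweet` stand in for the model's dump and load.

```ruby
require 'json'

# Hypothetical lookup tables standing in for a real User model.
USER_IDS   = { 'bob' => 42, 'joe' => 185 }
USER_NAMES = USER_IDS.invert

# Split the content on @mentions: plain text stays a String,
# a known mention becomes its Integer id.
def dump_tweet(content)
  parts = content.split(/(@\w+)/).reject(&:empty?).map do |part|
    name = part[/\A@(\w+)\z/, 1]
    name && USER_IDS.key?(name) ? USER_IDS[name] : part
  end
  JSON.generate(parts)
end

# Rebuild the content: Integers turn back into @mentions, Strings are kept.
def load_tweet(json)
  JSON.parse(json).map { |part| part.is_a?(Integer) ? "@#{USER_NAMES.fetch(part)}" : part }.join
end

puts dump_tweet('£ Hello @bob and @joe!')
puts load_tweet(dump_tweet('£ Hello @bob and @joe!'))
```

In this sketch a literal £ or backslash in the tweet is ordinary text inside a string segment, so neither needs special treatment.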

Tags: ruby regex string escaping

【Solution 1】:

Hopefully your Ruby version supports lookbehinds. If not, my solution will not work for you.

To escape the character:

str = str.gsub(/(?<!\\)(?=(?:\\\\)*£)/, "\\")

To unescape the character:

str = str.gsub(/(?<!\\)((?:\\\\)*)\\£/, '\1£') # single quotes: "\1" in double quotes is an octal escape, not a backreference

Both regexes work regardless of how many backslashes precede the £. They complement each other.

Explanation of the escape:

"
(?<!        # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   \\          # Match the character “\” literally
)
(?=         # Assert that the regex below can be matched, starting at this position (positive lookahead)
   (?:         # Match the regular expression below
      \\          # Match the character “\” literally
      \\          # Match the character “\” literally
   )*          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   £           # Match the character “£” literally
)
"

Note that I match a position here; no text is consumed at all. Once I have found the position I want, I insert a \ there.

Explanation of the unescape:

"
(?<!        # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   \\          # Match the character “\” literally
)
(           # Match the regular expression below and capture its match into backreference number 1
   (?:         # Match the regular expression below
      \\          # Match the character “\” literally
      \\          # Match the character “\” literally
   )*          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\\          # Match the character “\” literally
£           # Match the character “£” literally
"

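Here I capture all of the backslashes except the last one, and I replace the whole match with the captured backslashes followed by the special character. Tricky stuff :)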
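
To see how the two regexes of this answer behave, here is a small runnable sketch. The wrapper names `escape_pound` and `unescape_pound` are hypothetical; the block form of `gsub` is used only so that no backslash handling in replacement strings gets in the way.

```ruby
ESCAPE_RE   = /(?<!\\)(?=(?:\\\\)*£)/
UNESCAPE_RE = /(?<!\\)((?:\\\\)*)\\£/

# Hypothetical wrappers around the two regexes.
def escape_pound(str)
  str.gsub(ESCAPE_RE) { '\\' } # insert one backslash at the matched position
end

def unescape_pound(str)
  str.gsub(UNESCAPE_RE) { "#{$1}£" } # keep the captured pairs, drop one backslash
end

backslash = '\\' # a single backslash
['Foobar £ foobar £££foobar', backslash * 2 + '£', backslash + '£'].each do |s|
  escaped = escape_pound(s)
  puts "#{s} -> #{escaped} -> #{unescape_pound(escaped)}"
end
```

One thing this sketch makes visible: an input that already contains \£ is treated as escaped and left unchanged by the escape step, so unescaping it afterwards yields a plain £ rather than the original text.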

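【Comments】:

Good answer for Ruby 1.9, which has lookbehind. +1

【Solution 2】:

If you are using Ruby 1.9, which has lookbehind, then FailedDev's answer should work well. If you are using Ruby 1.8, which (I think) lacks lookbehind, a different approach is needed. Try this: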

text.gsub!(/(\\.)|£/m) do
    if ($1 != nil)  # If an escaped character matched,
        $1          # replace it with itself.
    else            # Otherwise escape the
        "\\£"       # unescaped £.
    end
end

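Note that I am not a Ruby programmer and this snippet is untested (in particular, I am not sure whether the if ($1 != nil) test is correct; it may need to be if ($1 != "") or if ($1)), but I do know that the general technique (using code in place of a simple replacement string) works. I recently used the same technique in my JavaScript solution to a similar question, which looked for unescaped asterisks.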
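
The technique of this answer (consume each backslash pair first so that only an unescaped £ reaches the second alternative) can be wrapped in non-mutating helpers. This is a sketch under assumptions: the method names are hypothetical, and the unescape counterpart is an addition in the same style, not part of the original answer. `gsub` is used instead of `gsub!` because `gsub!` returns nil when nothing was replaced.

```ruby
# Hypothetical wrappers; the names are illustrative only.

# A "backslash + any character" pair is matched first and kept as-is,
# so only a £ that is not already escaped gets a backslash.
def escape_unescaped_pounds(text)
  text.gsub(/(\\.)|£/m) { $1 || '\\£' }
end

# Counterpart without lookbehind (an assumption, not from the answer):
# walk over the pairs and drop the backslash only when it escapes a £.
def unescape_pounds(text)
  text.gsub(/\\(.)/m) { $1 == '£' ? '£' : $& }
end

backslash = '\\' # a single backslash
['a £ b', backslash * 2 + '£'].each do |s|
  escaped = escape_unescaped_pounds(s)
  puts "#{s} -> #{escaped} -> #{unescape_pounds(escaped)}"
end
```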

【Comments】:

【Solution 3】:

I'm not sure this is what you want, but I think you can do a simple find and replace:

str = str.gsub("£", "\\£") # to escape
str = str.gsub("\\£", "£") # to unescape

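Note that I changed \ to \\ because you have to escape backslashes in a double-quoted string.

Edit: I think what you want is a regex that matches an odd number of backslashes: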

str = str.gsub(/(^|[^\\])((?:\\\\)*)\\£/, "\\1\\2£")

It performs the following transformations:

"£"       #=> "£"
"\\£"     #=> "£"
"\\\\£"   #=> "\\\\£"
"\\\\\\£" #=> "\\\\£"

【Comments】:

Yes, but if the string we want to unescape is \\\\£, the result is \\£ (with find and replace). I'm afraid find and replace cannot cover every case.
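
The odd-backslash regex from the edit to Solution 3 can be checked with a short sketch. The wrapper name `unescape_odd` is hypothetical, and the strings are built from a single-backslash variable to keep the literals readable.

```ruby
ODD_BACKSLASH_RE = /(^|[^\\])((?:\\\\)*)\\£/

# Hypothetical wrapper: put back the preceding character and the
# backslash pairs, and drop the one backslash that escaped the £.
def unescape_odd(str)
  str.gsub(ODD_BACKSLASH_RE) { "#{$1}#{$2}£" }
end

backslash = '\\' # a single backslash
inputs = ['£', backslash + '£', backslash * 2 + '£', backslash * 3 + '£', (backslash + '£') * 2]
inputs.each { |s| puts "#{s} -> #{unescape_odd(s)}" }
```

The last input shows a limit of this form: because `(^|[^\\])` consumes the preceding character, two adjacent escaped characters (\£\£) come out as £\£ in a single pass. A lookbehind, which consumes nothing, does not have this problem.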