用于查找评论的 Ruby 正则表达式？答案

【问题标题】：Ruby regex for finding comments?用于查找评论的 Ruby 正则表达式？
【发布时间】：2011-05-03 05:21:43
【问题描述】：

我整天都在做这个，我想不通。我在下面的字符串中有一些 Ruby 代码，并且只想将行与上面的代码以及代码的第一个注释（如果存在）进行匹配。

# Some ignored comment.
1 + 1 # Simple math (this comment would be collected) # ignored 
# ignored

user = User.new
user.name = "Ryan" # Setting an attribute # Another ignored comment

这将捕获：

1. "1 + 1"
2. "Simple math"
1. "user = User.new"
2. nil
1. "user.name = "Ryan"
2. "Setting an attribute"

我使用/^\x20*(.+)\x20*(#\x20*.+\x20*){1}$/ 匹配每一行，但它似乎不适用于所有代码。

【问题讨论】：

我不了解 Ruby，但我很确定 # Simple math (this comment would be collected) # ignored 只是一条评论，而不是两条。第二个# 被逐字处理，因为它已经被第一个# 注释掉了。
我知道，这就是我需要的功能。
这可能是极难做到的。考虑像 user = "We're #1 \" #_# " # comment 这样的行，更不用说内联 cmets（如 i = 1 + /*comment */ 1，如果 ruby 有这些）。如果您需要强大的功能，请使用解析器（尽管可能会忽略换行）。
我有一个解决方案。我正在使用一个可以输出代码块源代码的库。我可以在没有 cmets 和 cmets 的情况下输出源代码，因此我可以删除所有不是注释的内容。

标签： ruby regex

【解决方案1】：

Kobi 的回答部分有效，但与末尾缺少注释的代码行不匹配。

遇到字符串插值也会失败，例如：

str = "My name is #{first_name} #{last_name}" # first comment

...会被错误匹配为：str = "My name is #{first_name}

您需要一个更全面的正则表达式。这是一个想法：

/^[\t ]*([^#"'\r\n]("(\\"|[^"])*"|'(\\'|[^'])*'|[^#\n\r])*)(#([^#\r\n]*))?/

^[\t ]* - 前导空格。
([^#"'\r\n]("(\\"|[^"])*"|'(\\'|[^'])*'|[^#\n\r])*) - 匹配一行代码。
细分：
- [^#"'\r\n] - 一行代码中的第一个字符，以及...
- "(\\"|[^"])*" - 双引号字符串，或者...
- '(\\'|[^'])*' - 单引号字符串，或者...
- [^#\n\r] - 带引号的字符串之外的任何其他字符，不是# 或行尾。
(#([^#\r\n]*))? - 匹配代码行末尾的第一个注释（如果有）。

由于更复杂的逻辑，这将为每个匹配捕获 6 个子模式。子模式1是代码，子模式6是注释，其他的可以忽略。

给定以下代码块：

# Some ignored comment.
1 + 1 # Simple math (this comment would be collected) # ignored 
# ignored

user = User.new
user.name = "Ryan #{last_name}" # Setting an attribute # Another ignored comment

上述正则表达式将产生以下结果（为简洁起见，我排除了子模式 2、3、4、5）：

1。 1 + 1
6. Simple math (this comment would be collected)
1。 user = User.new
6.
1。 user.name = "Ryan #{last_name}"
6. Setting an attribute

演示：http://rubular.com/r/yKxEazjNPC

【讨论】：

【解决方案2】：

虽然根本问题相当困难，但您可以使用该模式在此处找到您需要的内容：

^[\t ]*[^\s#][^#\n\r]*#([^#\n\r]*)

内容如下：

[\t ]* - 前导空格。
[^\s#] - 一个实际角色。这应该与代码匹配。
[^#\n\r]* - # 号之前的字符。除了哈希或换行符之外的任何内容。
#([^#\n\r]*) - “第一个”评论，在第 1 组中捕获。

工作示例：http://rubular.com/r/wNJTMDV9Bw

【讨论】：