Stripping C/C++ comments from a source file using Python (answers)

【Question title】: Stripping off C/C++ comments from a source file using python
【Posted】: 2015-10-28 16:10:12
【Question】:

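I have a regex that ignores multi-line comments of the form /* ... */, but it does not work for lines that start with //.

Can someone suggest what to add to this regex so that it ignores those as well?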

pattern = r"""
                        ##  --------- COMMENT ---------
       /\*              ##  Start of /* ... */ comment
       [^*]*\*+         ##  Non-* followed by 1-or-more *'s
       (                ##
         [^/*][^*]*\*+  ##
       )*               ##  0-or-more things which don't start with /
                        ##    but do end with '*'
       /                ##  End of /* ... */ comment
     |                  ##  -OR-  various things which aren't comments:
       (                ## 
                        ##  ------ " ... " STRING ------
         "              ##  Start of " ... " string
         (              ##
           \\.          ##  Escaped char
         |              ##  -OR-
           [^"\\]       ##  Non "\ characters
         )*             ##
         "              ##  End of " ... " string
       |                ##  -OR-
                        ##
                        ##  ------ ' ... ' STRING ------
         '              ##  Start of ' ... ' string
         (              ##
           \\.          ##  Escaped char
         |              ##  -OR-
           [^'\\]       ##  Non '\ characters
         )*             ##
         '              ##  End of ' ... ' string
       |                ##  -OR-
                        ##
                        ##  ------ ANYTHING ELSE -------
         .              ##  Any other char
         [^/"'\\]*      ##  Chars which don't start a comment, string
       )                ##    or escape
    """

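【Comments】:

What are you using this for? Do you really need a regex?
At this point you may want to stop using regular expressions (in any case, multi-line comments are not a context-free grammar). I did use a custom parser to find raw strings in C/C++ source files: github.com/lucasg/MSVCUnicodeUpdater/blob/master/sed.py
It looks like a parsing framework such as pyparsing might be more manageable in this case.
Hi @MichaelSPriz. I am writing a Python tool that strips the comments (which contain Perforce header information and date/time modification information) from two C/C++ files so that I can compare them and see whether the code has changed.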

Tags: python regex

【Solution 1】:

If you intend to keep using your current regex, you can do the following to match // ... comments:

Below this line:

 /                ##  End of /* ... */ comment

Add this:

 |                  ## OR it is a line comment with //
  \s*//.*           ## Single line comment

See demo
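
For illustration, here is a minimal sketch of how the extended pattern might be applied with re.sub. It is not the answerer's exact code. The names COMMENT_RE and strip_comments are hypothetical, and the flags re.VERBOSE | re.DOTALL are an assumption, since the question does not show how the pattern is compiled. Because DOTALL lets . cross newlines, the line-comment alternative is written here as //[^\n]* instead of \s*//.* so that it stops at the end of the line.

```python
import re

# Hypothetical names; the flags are an assumption (not shown in the question).
COMMENT_RE = re.compile(
    r"""
        /\*[^*]*\*+(?:[^/*][^*]*\*+)*/   # /* ... */ block comment
      | //[^\n]*                         # // line comment (the added alternative)
      | (                                # group 1: text that must be kept
            "(?:\\.|[^"\\])*"            #   double-quoted string literal
          | '(?:\\.|[^'\\])*'            #   single-quoted character literal
          | .[^/"'\\]*                   #   anything else
        )
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_comments(source: str) -> str:
    """Return source with block and line comments removed."""
    # A comment match leaves group 1 unset, so it is replaced by nothing;
    # every other match is copied through unchanged.
    return COMMENT_RE.sub(lambda m: m.group(1) or "", source)


if __name__ == "__main__":
    sample = 'int a = 1; // counter\nchar *s = "// kept"; /* gone */\n'
    print(strip_comments(sample))
```

Comment markers inside string literals are preserved because the string alternatives consume the whole literal before a comment alternative can match. This sketch does not handle C++11 raw string literals or // comments continued with a trailing backslash.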

【Comments】: