【Question title】: Regex to capture class and methods
【Posted】: 2015-02-07 01:50:12
【Question】:

How can I capture the classes and methods from a Python file?

I don't care about attrs or args.

class MyClass_1(...):
    ...
    def method1_of_first_class(self):
        ...

    def method2_of_first_class(self):
        ...

    def method3_of_first_class(self):
        ...

class MyClass_2(...):
    ...
    def method1_of_second_class(self):
        ...

    def method2_of_second_class(self):
        ...

    def method3_of_second_class(self):
        ...

What I have tried so far:

class ([\w_]+?)\(.*?\):.*?(?:def ([\w_]+?)\(self.*?\):.*?)+?

Options: dot matches newline

Capturing the class:

Match the characters “class ” literally «class »
Match the regular expression below and capture its match into backreference number 1 «([\w_]+?)»
   Match a single character present in the list below «[\w_]+?»
      Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
      A word character (letters, digits, etc.) «\w»
      The character “_” «_»
Match the character “(” literally «\(»
Match any single character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “)” literally «\)»
Match the character “:” literally «:»
Match any single character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»

Capturing the methods:

Match the regular expression below «(?:def ([\w_]+?)\(self.*?\):.*?)+?»
   Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
   Match the characters “def ” literally «def »
   Match the regular expression below and capture its match into backreference number 2 «([\w_]+?)»
      Match a single character present in the list below «[\w_]+?»
         Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
         A word character (letters, digits, etc.) «\w»
         The character “_” «_»
   Match the character “(” literally «\(»
   Match the characters “self” literally «self»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “)” literally «\)»
   Match the character “:” literally «:»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»

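But it only captures the class name and the first method. I think this is because backreference number 2 cannot hold more than one capture, even when it is inside (?:myregex)+?

Current output: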

'MyClass_1':'method1_of_first_class',
'MyClass_2':'method1_of_second_class'

Desired output:

'MyClass_1':['method1_of_first_class','method2_of_first_class',...],
'MyClass_2':['method1_of_second_class','method2_of_second_class',...]
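
The suspicion about backreference 2 can be checked in isolation. The sketch below assumes Python's `re` module (the question does not say which engine is used) and a made-up input string: a capture group inside a repeated group keeps only the capture from its last iteration, whereas asking for one match per name returns all of them. In the pattern above, the lazy `+?` additionally stops after a single iteration, which is why the first method is the one reported.

```python
import re

# Hypothetical input, chosen only to illustrate the behaviour
names = "method1 method2 method3"

# A single match in which group 1 is repeated three times:
# each iteration overwrites the previous capture, so only the last survives.
repeated = re.match(r"(?:(\w+)\s*)+", names)
last_only = repeated.group(1)

# Letting the engine produce one match per name keeps every capture.
all_names = re.findall(r"\w+", names)

print(last_only)
print(all_names)
```

This is why the usual approach is to collect the names with repeated matches (`findall` or `finditer`) and do the grouping in code, rather than with a single pattern.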

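【Comments】:

  • What is your expected output?
  • [MyClass_1, [method1_of_first_class,method2_of_first_class,...]] [MyClass_2, [method1_of_second_class,method2_of_second_class,...]]
  • Parsing code with regular expressions is hard. See 1, 2, 3. I suggest using a dedicated parser. Also, when asking a regex question, please state the language/tool you are using.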

Tags: regex


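【Solution 1】:

Since a class can contain another class or another function, and a function can contain another function or another class, simply grabbing class and function declarations with a regular expression loses the hierarchy information.

In particular, pydoc.py in your Python installation (available since version 2.1) is a prime example of such a case.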
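
The loss of hierarchy described above can be seen with a small sketch. The sample source and the pattern are assumptions made up for illustration: the regex finds every declaration, but the result is a flat list that no longer says which `def` belongs to which class or function.

```python
import re

# Hypothetical source with a nested function and a nested class
source = '''class Outer:
    def method(self):
        def helper():
            pass

    class Inner:
        def inner_method(self):
            pass
'''

# Match "class Name" or "def name" at the start of a line, ignoring indentation
flat = re.findall(r"^[ \t]*(class|def)\s+(\w+)", source, re.MULTILINE)

for kind, name in flat:
    print(kind, name)
```

Nothing in `flat` records that `helper` lives inside `method`, or that `inner_method` belongs to `Inner` rather than `Outer`; recovering that would require tracking indentation by hand.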

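Parsing Python code in Python is simple, because Python ships a built-in parser in the parser module and (since version 2.6) in the ast module. (The parser module was later deprecated and removed in Python 3.10, so ast is the one to use today.)

Here is sample code that uses the ast module (version 2.6 and later) to parse Python code in Python: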

from ast import AST, NodeVisitor, iter_fields, parse
import sys

# Read the source file named on the command line
with open(sys.argv[1]) as fi:
    source = fi.read()

parse_tree = parse(source)

class Node:
    def __init__(self, node, children):
        self.node = node
        self.children = children

    def __repr__(self):
        return "{{{}: {}}}".format(self.node, self.children)

class ClassVisitor(NodeVisitor):
    def visit_ClassDef(self, node):
        # print(node, node.name)

        r = self.generic_visit(node)
        return Node(("class", node.name), r)

    def visit_FunctionDef(self, node):
        # print(node, node.name)

        r = self.generic_visit(node)
        return Node(("function", node.name), r)


    def generic_visit(self, node):
        """Called if no explicit visitor function exists for a node."""
        node_list = []

        def add_child(nl, children):
            if children is None:
                pass
            # Disable the 2 lines below if you need more scoping information
            elif type(children) is list:
                nl += children
            else:
                nl.append(children)

        for field, value in iter_fields(node):
            if isinstance(value, list):
                for item in value:
                    if isinstance(item, AST):
                        add_child(node_list, self.visit(item))
            elif isinstance(value, AST):
                add_child(node_list, self.visit(value))

        return node_list if node_list else None

print(ClassVisitor().visit(parse_tree))

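The code has been tested on Python 2.7 and Python 3.2.

Because the default implementation of generic_visit does not return anything, I copied the source of generic_visit and modified it to pass the return values back to the caller.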
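
If only the flat mapping from the question is needed (class name to list of method names), a shorter sketch is possible. It assumes Python 3.5 or later (for `ast.AsyncFunctionDef`), looks only at top-level classes and their direct methods, and the function name `methods_by_class` is made up for illustration.

```python
import ast


def methods_by_class(source):
    """Map each top-level class name to the names of its direct methods."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            result[node.name] = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return result


# Hypothetical sample input modelled on the question
sample = '''
class MyClass_1(object):
    def method1_of_first_class(self):
        pass

    def method2_of_first_class(self):
        pass

class MyClass_2(object):
    def method1_of_second_class(self):
        pass
'''

print(methods_by_class(sample))
```

Nested classes and functions are deliberately ignored here; the visitor above is the better fit when the full hierarchy matters.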

【Comments】:

    【Solution 2】:

    You can start with this regex:

    /class\s(\w+)|def\s(\w+)/gm
    

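    This will match all class and method names. To get them into the structure you mentioned in the comments, you will probably need to use the implementation language.

    Edit: here's a PHP implementation example: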

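    // Assumption: $source holds the Python source text, e.g. from file_get_contents()
    preg_match_all('/class\s(\w+)|def\s(\w+)/m', $source, $match_array);

    $output = array();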
    
    foreach ($match_array[0] as $key => $value) {
        if (substr($value, 0, 5) === 'class') {
            $output[$value] = array();
            $parent_key = $value;
            continue;
        }
        $output[$parent_key][] = $value;
    }
    
    // print_r($output);
    
    foreach ($output as $parent => $values) {
        echo '[' . $parent . ', [' . implode(',', $values) . ']]' . PHP_EOL;
    }
    

    Sample output:

    [class MyClass_1, [def method1_of_first_class,def method2_of_first_class,def method3_of_first_class]]
    [class MyClass_2, [def method1_of_second_class,def method2_of_second_class,def method3_of_second_class]]
    

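    【Comments】:

    • This is just an example. It is up to you and depends on which language you use to implement it.
    • @f.rodrigues, note that this solution does not work for input that contains strings with the words class or def. For example, """ a class that does something """ will find a class named that. A more robust solution is nhahtdh's suggestion.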
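
The false positive mentioned in the last comment can be reproduced with a short sketch, here in Python with the regex from this answer and the docstring from the comment as the only input. The `ast`-based check beside it reports no classes, since the text is just a string literal.

```python
import ast
import re

# A module consisting only of the docstring from the comment above
source = '""" a class that does something """\n'

# The regex sees "class that" and reports a class named "that"
regex_hits = re.findall(r"class\s(\w+)|def\s(\w+)", source)

# The parser sees a string expression and no class definitions
ast_hits = [
    node.name
    for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.ClassDef)
]

print(regex_hits)
print(ast_hits)
```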