【Question title】: How to find a special word inside a file using Python?
【Posted】: 2016-06-03 00:05:59
【Question】:

I have a bunch of .java files in a directory, and I want to compile all of them to .class files from Python code.

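As you know, the javac command-line tool, which is what I have to use, requires the name of a .java file to match the name of its (public) class. Unfortunately, that is not true of my .java files: they have various random names that differ from their class names.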
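As a rough sketch of the end goal, assuming the class name is already known and `javac` is on the PATH: copy each randomly named source into a staging directory as `<ClassName>.java`, then build the `javac` command for it. The helper name `stage_and_build_command` and the directory names are hypothetical.

```python
import os
import shutil


def stage_and_build_command(src_path, class_name, staging_dir, out_dir):
    """Copy src_path to staging_dir/<class_name>.java; return a javac argv list.

    Hypothetical helper. javac expects a public class to be declared in a
    file named after it, so the copy is given the class name and the
    randomly named original is left untouched.
    """
    if not os.path.isdir(staging_dir):
        os.makedirs(staging_dir)
    staged = os.path.join(staging_dir, class_name + ".java")
    shutil.copyfile(src_path, staged)
    # -d sets the directory javac writes the .class files into.
    return ["javac", "-d", out_dir, staged]
```

The returned list can be passed to `subprocess.check_call`. For Java Card sources like the one below, the Java Card API jar would also need to be supplied with `-classpath` so that the `javacard.framework` imports resolve.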

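So I need to extract the class name from the contents of each .java file. That would be easy if the class definition were always on a known line, but it isn't. The top of a .java file may contain some comments, and those comments may also contain the words class or package.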
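One way to keep comments that mention `class` or `package` from confusing a search is to strip them first. A minimal regex-based sketch; it assumes no `//` or `/*` sequences appear inside string or char literals, which a real Java tokenizer would handle correctly. The function name `strip_java_comments` is hypothetical.

```python
import re

# Matches a /* ... */ block (non-greedy, may span lines) or // up to end of line.
# Naive: it does not understand string or char literals, so a "//" inside a
# string would be treated as a comment start.
_COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def strip_java_comments(source):
    """Return source with // and /* */ comments replaced by a single space."""
    return _COMMENT_RE.sub(" ", source)
```

Replacing each comment with a space (rather than an empty string) keeps tokens on either side of a comment from being glued together.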

The question is: how do I extract the package name and the class name from each file?

For example, here are the contents of one of them:

//This is a sample package that its class name is HelloWorldApplet. in this package we blah blah blah and this class blah blah blah.
package helloWorldPackage;
//This is another comment that may or may not have the word "package" and "class" inside.
import javacard.framework.APDU;
import javacard.framework.Applet;
import javacard.framework.ISO7816;
import javacard.framework.ISOException;
import javacard.framework.Util;
/* this is also a multi line comment. blah blah blah package, blah blah blah package ... */
public class HelloWorldApplet extends Applet 
{
    private static final byte[] helloWorld = {(byte)'H',(byte)'e',(byte)'l',(byte)'l',(byte)'o',(byte)' ',(byte)'W',(byte)'o',(byte)'r',(byte)'l',(byte)'d',};
    private static final byte HW_CLA = (byte)0x80;
    private static final byte HW_INS = (byte)0x00;

    public static void install(byte[] bArray, short bOffset, byte bLength) 
        {
        new HelloWorldApplet().register(bArray, (short) (bOffset + 1), bArray[bOffset]);
        }

    public void process(APDU apdu) 
        {
        if (selectingApplet()) 
            {
            return;
            }

        byte[] buffer = apdu.getBuffer();
        byte CLA = (byte) (buffer[ISO7816.OFFSET_CLA] & 0xFF);
        byte INS = (byte) (buffer[ISO7816.OFFSET_INS] & 0xFF);

        if (CLA != HW_CLA)
            {
            ISOException.throwIt(ISO7816.SW_CLA_NOT_SUPPORTED);
            }

        switch ( INS ) 
            {
            case HW_INS:
                getHelloWorld( apdu );
                break;
            default:
                ISOException.throwIt(ISO7816.SW_INS_NOT_SUPPORTED);
            }
        }

    private void getHelloWorld( APDU apdu)
        {
        byte[] buffer = apdu.getBuffer();
        short length = (short) helloWorld.length;
        Util.arrayCopyNonAtomic(helloWorld, (short)0, buffer, (short)0, (short) length);
        apdu.setOutgoingAndSend((short)0, length);
        }
}

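How can I extract the package name (i.e. helloWorldPackage) and the class name (i.e. HelloWorldApplet) from each file?

Note that a .java file may contain several classes, but I only need the name of the class that extends Applet.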
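Since a file can declare several classes, the search can key on the `extends Applet` clause itself rather than on the first `class` keyword. A sketch, assuming comments have already been removed and that the superclass is written either as `Applet` or fully qualified as `javacard.framework.Applet`; the function name `find_package_and_applet` is hypothetical.

```python
import re

# "package a.b.c;" at the start of a line (leading whitespace allowed).
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)
# "class <Name> extends Applet" or "... extends some.qualified.Applet";
# the trailing \b rejects superclasses such as AppletBase.
APPLET_CLASS_RE = re.compile(r"\bclass\s+(\w+)\s+extends\s+(?:[\w.]+\.)?Applet\b")


def find_package_and_applet(source):
    """Return (package, class_name) for the class extending Applet.

    Either element is None if it is not found. Assumes comments are already
    stripped, so words inside comments cannot produce false matches.
    """
    pkg = PACKAGE_RE.search(source)
    cls = APPLET_CLASS_RE.search(source)
    return (pkg.group(1) if pkg else None, cls.group(1) if cls else None)
```

Classes without an `extends` clause, or extending something other than `Applet`, are skipped, so helper classes declared in the same file do not interfere.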

Update:

I tried the following, but it didn't work (Python 2.7.10):

import re

prgFile = open(r"yourFile\New Text Document.txt","r")
contents = prgFile.read()

x = re.match(r"(?<=class)\b.*\b(?=extends Applet)",contents)
print x
x = re.match(r"^(public)+",contents)
print x
x = re.match(r"^package ([^;\n]+)",contents)
print x
x = re.match(r"(?<=^public class )\b.*\b(?= extends Applet)",contents)
print x

Output:

>>> ================================ RESTART ================================
>>> 
None
None
None
None
>>> 
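The `None` results are consistent with how Python's `re` module works: `re.match` only tries to match at the very beginning of the string, and without `re.MULTILINE` the `^` anchor also matches only there. Because the file starts with a `//` comment, every pattern above fails. A small sketch using `re.search` with `re.MULTILINE` on an abbreviated copy of the sample file:

```python
import re

# Abbreviated version of the sample file from the question.
contents = """//This is a sample package that its class name is HelloWorldApplet.
package helloWorldPackage;
/* blah blah blah package, blah blah blah package ... */
public class HelloWorldApplet extends Applet
{
}
"""

# re.match anchors at position 0, which is a comment here, so it fails.
print(re.match(r"^package ([^;\n]+)", contents))  # None

# re.search scans the whole string; re.MULTILINE lets ^ match after each newline.
pkg = re.search(r"^package ([^;\n]+)", contents, re.MULTILINE)
cls = re.search(r"^public class (\w+) extends Applet", contents, re.MULTILINE)
print(pkg.group(1))  # helloWorldPackage
print(cls.group(1))  # HelloWorldApplet
```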

Tags: python regex python-2.7 python-3.x


【Answer 1】:

In many cases, a simple regular expression will work.

If you want to be 100% sure, I suggest using a full-fledged Java parser (such as javalang) to parse each file, and then extracting the class name from the AST.

Something like:

import glob
import javalang

# look at all .java files in the working directory
for fname in glob.glob("*.java"):
    # load the sourcecode
    with open(fname) as inf:
        sourcecode = inf.read()

    try:
        # parse it to an Abstract Syntax Tree
        tree = javalang.parse.parse(sourcecode)
        # get the package name (None if the file has no package declaration)
        pkg = tree.package.name if tree.package else None

        # look at all class declarations
        for path, node in tree.filter(javalang.tree.ClassDeclaration):
            # if the class extends Applet (node.extends is None when there is no extends clause)
            if node.extends is not None and node.extends.name == 'Applet':
                # print the class name
                print("{}: package {}, main class is {}".format(fname, pkg, node.name))

    except javalang.parser.JavaSyntaxError as je:
        # report any files which don't parse properly
        print("Error parsing {}: {}".format(fname, je))

which gives:

sample.java: package helloWorldPackage, main class is HelloWorldApplet


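【Answer 2】:

This regex works for me: (?<=^public class )\b.*\b(?= extends Applet)

The correct way to use it (with re.MULTILINE, so that ^ matches at the start of every line, and with search rather than match):

pattern = re.compile(r'(?<=^public class )\b.*\b(?= extends Applet)', re.MULTILINE)
class_name = pattern.search(contents).group()  # 'HelloWorldApplet' for the sample file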


【Answer 3】:

You could come up with the following regex:

import re

# hypothetical file name; here it holds the sample source from the question
with open('sample.java') as inf:
    string = inf.read()
classes = [x.strip() for x in re.findall(r'^(?:public class|package) ([^;]+?)(?=extends|;)', string, re.MULTILINE)]
# look for "public class" or "package" at the start of a line,
# then anything but a semicolon,
# and make sure the match is immediately followed by "extends" or a semicolon
print(classes)
# ['helloWorldPackage', 'HelloWorldApplet']
