[Posted]: 2014-09-10 18:43:47
[Question]:
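I want to use a regular expression to extract fully qualified class names (FQNs) from Java/JSP source code.
There are already some threads on this, in particular: Regular expression matching fully qualified class names
Although I am very close to solving the problem, I cannot get rid of the false positives.
Here are some examples. At the end of each line I have added the expected value.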
Logger l = LoggerFactory.getLogger("test"); // not a FQN, because it starts with an uppercase letter ("LoggerFactory")
if(!com.db.TFSec.isPermitted("test")) return; // should return "com.db.TFSec"
new java.util.concurrent.BrokenException(); // java.util.concurrent.BrokenException
java.util.Set<Log> ls = new java.util.HashSet<>(); // java.util.Set, java.util.HashSet
java.awt.Component c1d2 = new java.awt.List(); // java.awt.Component, java.awt.List
com.de.tfsecurity.TFUser u; // com.de.tfsecurity.TFUser
I have tried these 3 regular expressions:
// Own try. Only one false positive in line 1: [oggerFactory.getLogger("test")]
([a-z]\\w*\\.\\w+(\\.\\w+)*)[<\\( ;]

// The following two regexes are the "correct" answers from the thread mentioned above. But I get false positives.
([a-zA-Z_$][a-zA-Z\\d_$]*\\.)*[a-zA-Z_$][a-zA-Z\\d_$]*
// false positives: [Logger, LoggerFactory.getLogger, test, if, return, new, c1d2] etc.
([a-z][a-z_0-9]*\\.)*[A-Z_]($[A-Z_]|[\\w_])*
// false positives: the same as in the previous example
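The false positives of the first thread regex appear to have two causes: the package prefix group is quantified with `*`, so it is optional and any bare identifier matches, and the pattern was presumably written to validate a whole string with `matches()` rather than to scan a line with `find()`. A minimal sketch that reproduces this (the class name `OptionalPrefixDemo` and its helper methods are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalPrefixDemo {
    // First regex from the linked thread, unchanged.
    static final Pattern THREAD_REGEX =
            Pattern.compile("([a-zA-Z_$][a-zA-Z\\d_$]*\\.)*[a-zA-Z_$][a-zA-Z\\d_$]*");

    // Scans a line with find(), the way the scanner in this question does.
    static List<String> findAll(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = THREAD_REGEX.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Validates one complete candidate string with matches().
    static boolean matchesWhole(String candidate) {
        return THREAD_REGEX.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // Every identifier is reported, because the dotted prefix may occur zero times.
        System.out.println(findAll("Logger l = LoggerFactory.getLogger(\"test\");"));
        // prints [Logger, l, LoggerFactory.getLogger, test]

        // Used as a whole-string check, the same regex behaves as intended.
        System.out.println(matchesWhole("java.util.Set")); // prints true
    }
}
```

So the regex is fine for checking whether a single token is a syntactically valid qualified name, but it cannot by itself tell a class name from a variable, keyword, or method call inside a line of source.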
Here is my source code:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileUsageScanner {
    // This is my own try. Works most of the time, but we have false positives with LoggerFactory.getLogger, which is not a FQN
    private final Pattern fqnPatternOwnTry = Pattern.compile("([a-z]\\w*\\.\\w+(\\.\\w+)*)[<\\( ;]");

    // Solutions from https://stackoverflow.com/questions/5205339/regular-expression-matching-fully-qualified-java-classes
    // Lots of false positives like: [Logger, LoggerFactory.getLogger, test, if, return, new, c1d2] etc.
    private final Pattern fqnPatternThr = Pattern.compile("([a-zA-Z_$][a-zA-Z\\d_$]*\\.)*[a-zA-Z_$][a-zA-Z\\d_$]*");
    private final Pattern fqnPatternThr2 = Pattern.compile("([a-z][a-z_0-9]*\\.)*[A-Z_]($[A-Z_]|[\\w_])*");

    public static void main(String[] args) throws IOException {
        FileUsageScanner scan = new FileUsageScanner();
        scan.getFQClassname("Logger logger = LoggerFactory.getLogger(\"test\");"); // not a FQN
        scan.getFQClassname("if(!com.db.TFSec.isPermitted(\"test\")) return;"); // com.db.TFSec
        scan.getFQClassname("new java.util.concurrent.BrokenException();"); // java.util.concurrent.BrokenException
        scan.getFQClassname("java.util.Set<Log> loggers = new java.util.HashSet<>();"); // java.util.Set, java.util.HashSet
        scan.getFQClassname("java.awt.Component c1d2 = new java.awt.List();"); // java.awt.Component, java.awt.List
        scan.getFQClassname("com.de.tfsecurity.TFUser u;"); // com.de.tfsecurity.TFUser
    }

    // Returns all matches of the regex under test in the given line, or null if there are none.
    private List<String> getFQClassname(String line) {
        if (line != null && !line.isEmpty() && line.contains(".")) {
            Matcher matcher = fqnPatternThr2.matcher(line);
            List<String> l = null;
            while (matcher.find()) {
                if (l == null) {
                    l = new ArrayList<String>();
                }
                l.add(matcher.group());
            }
            if (l != null)
                System.out.println("Found FQN in " + line + " -> " + l);
            return l;
        }
        return null;
    }
}
How can I get rid of the false positives?
Thanks for any comments,
Bernhard
[Comments]:
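-
The following FQNs are legal (convention is one thing, what Java accepts as valid is another):
My.Packages.With.UpperCase, a.package.aClass. You seem to be assuming that there will be two or more lowercase package levels followed by a camel-case class name. If so, please state that explicitly in the question.
-
Thanks.
My.Packages.With.UpperCase: OK, I did not know this was possible, because I have never seen an uppercase package name (I just browsed through a bunch of jar libraries). Yes, I am assuming names like low.lower.UppercaseClassname. Maybe this is only a convention, but I can live with it, even if the Java specification says otherwise.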
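Under the convention accepted in the last comment (one or more lowercase package segments followed by a class name that starts with an uppercase letter), one possible sketch is to make the package prefix mandatory and to forbid a match from starting in the middle of an identifier or directly after a dot. The class name `FqnSketch`, the method `extract`, and the exact character classes are assumptions made for illustration, not a solution taken from the thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FqnSketch {
    // Assumed convention: lowercase package segments, then an uppercase class name.
    //   (?<![\w$.])            the match may not start inside an identifier or after a dot
    //   (?:[a-z][a-z0-9_]*\.)+ at least one lowercase package segment, each followed by a dot
    //   [A-Z][A-Za-z0-9_$]*    class name starting with an uppercase letter
    static final Pattern FQN = Pattern.compile(
            "(?<![\\w$.])(?:[a-z][a-z0-9_]*\\.)+[A-Z][A-Za-z0-9_$]*");

    // Returns every match in the line; an empty list if there is none.
    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        if (line == null) {
            return found;
        }
        Matcher m = FQN.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Logger logger = LoggerFactory.getLogger(\"test\");",          // []
            "if(!com.db.TFSec.isPermitted(\"test\")) return;",             // [com.db.TFSec]
            "new java.util.concurrent.BrokenException();",                 // [java.util.concurrent.BrokenException]
            "java.util.Set<Log> loggers = new java.util.HashSet<>();",     // [java.util.Set, java.util.HashSet]
            "java.awt.Component c1d2 = new java.awt.List();",              // [java.awt.Component, java.awt.List]
            "com.de.tfsecurity.TFUser u;"                                  // [com.de.tfsecurity.TFUser]
        };
        for (String line : lines) {
            System.out.println(extract(line));
        }
    }
}
```

This is still a heuristic. It also matches text inside string literals and comments, it stops at the first uppercase segment (so a nested class such as java.util.Map.Entry is reported as java.util.Map), and it misses legal names that do not follow the convention, such as My.Packages.With.UpperCase.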