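I added a counter so that the timeout is only checked once every n `charAt` reads, to reduce overhead.

Notes:

Some people have said that `charAt` may not be called often enough. I added the `foo` variable only to show how often `charAt` gets called, and it is called often enough: `foo` is incremented once per timeout check, i.e. roughly once every `checkInterval` calls to `charAt`. If you use this in production, remove that counter: it costs performance and, in a server that runs for a long time, it will eventually overflow. In this example, `charAt` is called about 30 million times every 0.8 seconds or so (not measured under proper microbenchmark conditions; this is just a proof of concept). If you want more precision, you can set a lower `checkInterval` at the cost of performance (in the long run, `System.currentTimeMillis() > timeoutTime` is more expensive than the counter's `if` check).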
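To make the `checkInterval` trade-off described in these notes concrete, here is a minimal, regex-independent sketch of the same "only read the clock every n calls" idea. `IntervalDeadline`, `tick()` and `clockReads()` are hypothetical names, not part of the answer's code, and the clock is injected as a `LongSupplier` only so the behaviour can be observed deterministically. As in the answer's `charAt`, the clock is read on every `(checkInterval + 1)`-th call, so a deadline that has already passed can go unnoticed for up to that many calls.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper illustrating the "check every n calls" counter.
 * The clock is only consulted once every (checkInterval + 1) calls to tick().
 */
final class IntervalDeadline {
    private final long deadlineMillis;
    private final int checkInterval;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in real use
    private int sinceLastCheck = 0;
    private long clockReads = 0;

    IntervalDeadline(long timeoutMillis, int checkInterval, LongSupplier clock) {
        this.clock = clock;
        this.checkInterval = checkInterval;
        this.deadlineMillis = clock.getAsLong() + timeoutMillis;
    }

    /** Call once per unit of work (e.g. per charAt); returns true once an expired deadline is noticed. */
    boolean tick() {
        if (sinceLastCheck == checkInterval) {
            sinceLastCheck = 0;
            clockReads++;
            return clock.getAsLong() > deadlineMillis; // the comparatively expensive part
        }
        sinceLastCheck++; // the cheap part: an int compare and increment
        return false;
    }

    /** Number of times tick() actually read the clock. */
    long clockReads() {
        return clockReads;
    }
}
```

With `checkInterval = 999` and a clock that never advances, one million `tick()` calls read the clock exactly 1,000 times; if the deadline has already passed before the first call, it is first reported on call 1,000. Lowering `checkInterval` shrinks that detection lag at the cost of more clock reads.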
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Adjust this import to the package where RegexpTimeoutException (below) lives in your project.
import com.goikosoft.test.RegexpTimeoutException;

/**
 * Allows creating regular expression matchers that time out.
 *
 * Limitations: can only throw a RuntimeException. Decreases performance.
 *
 * Posted by Kris on Stack Overflow.
 *
 * Modified by dgoiko to execute the timeout check only every n chars.
 * Now timeout < 0 means no timeout.
 *
 * @author Kris https://stackoverflow.com/a/910798/9465588
 *
 */
public class RegularExpressionUtils {

    // Demo-only counter of timeout checks; remove it in production (see the notes above).
    public static long foo = 0;

    // Demonstrates the behavior for a regular expression running into catastrophic backtracking for the given input.
    public static void main(String[] args) {
        long millis = System.currentTimeMillis();
        // This checkInterval produces a < 500 ms delay. Higher checkInterval will produce higher delays on timeout.
        Matcher matcher = createMatcherWithTimeout(
                "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "(x+x+)+y", 10000, 30000000);
        try {
            System.out.println(matcher.matches());
        } catch (RegexpTimeoutException e) {
            // Catch only the timeout, so other regex errors are not swallowed.
            System.out.println("Operation timed out after " + (System.currentTimeMillis() - millis) + " milliseconds");
        }
        System.out.println(foo);
    }

    /** Compiles the expression and returns a matcher that throws RegexpTimeoutException once the timeout expires. */
    public static Matcher createMatcherWithTimeout(String stringToMatch, String regularExpression, long timeoutMillis,
            int checkInterval) {
        Pattern pattern = Pattern.compile(regularExpression);
        return createMatcherWithTimeout(stringToMatch, pattern, timeoutMillis, checkInterval);
    }

    /** A negative timeoutMillis returns a plain matcher without any timeout. */
    public static Matcher createMatcherWithTimeout(String stringToMatch, Pattern regularExpressionPattern,
            long timeoutMillis, int checkInterval) {
        if (timeoutMillis < 0) {
            return regularExpressionPattern.matcher(stringToMatch);
        }
        CharSequence charSequence = new TimeoutRegexCharSequence(stringToMatch, timeoutMillis, stringToMatch,
                regularExpressionPattern.pattern(), checkInterval);
        return regularExpressionPattern.matcher(charSequence);
    }

    private static class TimeoutRegexCharSequence implements CharSequence {

        private final CharSequence inner;
        private final long timeoutMillis;
        private final long timeoutTime; // absolute deadline, in epoch milliseconds
        private final String stringToMatch;
        private final String regularExpression;
        private final int checkInterval;
        private int attempts; // charAt calls since the last clock check

        TimeoutRegexCharSequence(CharSequence inner, long timeoutMillis, String stringToMatch,
                String regularExpression, int checkInterval) {
            this(inner, timeoutMillis, System.currentTimeMillis() + timeoutMillis, stringToMatch,
                    regularExpression, checkInterval);
        }

        // Used by subSequence() so that sub-sequences keep the original deadline instead of restarting it.
        private TimeoutRegexCharSequence(CharSequence inner, long timeoutMillis, long timeoutTime,
                String stringToMatch, String regularExpression, int checkInterval) {
            this.inner = inner;
            this.timeoutMillis = timeoutMillis;
            this.timeoutTime = timeoutTime;
            this.stringToMatch = stringToMatch;
            this.regularExpression = regularExpression;
            this.checkInterval = checkInterval;
            this.attempts = 0;
        }

        @Override
        public char charAt(int index) {
            // Read the clock only once every checkInterval + 1 calls; the counter itself is cheap.
            if (this.attempts == this.checkInterval) {
                foo++; // demo-only statistics
                if (System.currentTimeMillis() > timeoutTime) {
                    throw new RegexpTimeoutException(regularExpression, stringToMatch, timeoutMillis);
                }
                this.attempts = 0;
            } else {
                this.attempts++;
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new TimeoutRegexCharSequence(inner.subSequence(start, end), timeoutMillis, timeoutTime,
                    stringToMatch, regularExpression, checkInterval);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }
}
```
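And here is the custom exception, so you can catch only that exception and avoid swallowing other exceptions that the regex `Pattern`/`Matcher` might throw.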
```java
public class RegexpTimeoutException extends RuntimeException {

    private static final long serialVersionUID = 6437153127902393756L;

    private final String regularExpression;
    private final String stringToMatch;
    private final long timeoutMillis;

    public RegexpTimeoutException() {
        super();
        regularExpression = null;
        stringToMatch = null;
        timeoutMillis = 0;
    }

    public RegexpTimeoutException(String message, Throwable cause) {
        super(message, cause);
        regularExpression = null;
        stringToMatch = null;
        timeoutMillis = 0;
    }

    public RegexpTimeoutException(String message) {
        super(message);
        regularExpression = null;
        stringToMatch = null;
        timeoutMillis = 0;
    }

    public RegexpTimeoutException(Throwable cause) {
        super(cause);
        regularExpression = null;
        stringToMatch = null;
        timeoutMillis = 0;
    }

    public RegexpTimeoutException(String regularExpression, String stringToMatch, long timeoutMillis) {
        super("Timeout occurred after " + timeoutMillis + "ms while processing regular expression '"
                + regularExpression + "' on input '" + stringToMatch + "'!");
        this.regularExpression = regularExpression;
        this.stringToMatch = stringToMatch;
        this.timeoutMillis = timeoutMillis;
    }

    public String getRegularExpression() {
        return regularExpression;
    }

    public String getStringToMatch() {
        return stringToMatch;
    }

    public long getTimeoutMillis() {
        return timeoutMillis;
    }
}
```
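Why a dedicated exception type helps can be shown with a small, self-contained sketch. `DemoTimeoutException`, `DeadlineChars` and `TimeoutCatchDemo.tryMatch` are hypothetical stand-ins (simplified versions of `RegexpTimeoutException` and `TimeoutRegexCharSequence` that check the clock on every `charAt`). The `catch` clause names only the timeout type, so a `PatternSyntaxException` from a malformed pattern still reaches the caller instead of being reported as a timeout.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical stand-in for RegexpTimeoutException. */
class DemoTimeoutException extends RuntimeException {
    DemoTimeoutException(String message) {
        super(message);
    }
}

/** Hypothetical, simplified deadline-checking CharSequence (checks the clock on every charAt). */
final class DeadlineChars implements CharSequence {
    private final CharSequence inner;
    private final long deadlineMillis;

    DeadlineChars(CharSequence inner, long deadlineMillis) {
        this.inner = inner;
        this.deadlineMillis = deadlineMillis;
    }

    @Override
    public char charAt(int index) {
        if (System.currentTimeMillis() > deadlineMillis) {
            throw new DemoTimeoutException("regex match exceeded its deadline");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Sub-sequences share the same absolute deadline.
        return new DeadlineChars(inner.subSequence(start, end), deadlineMillis);
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

final class TimeoutCatchDemo {
    /**
     * Returns "match", "no match" or "timeout".
     *
     * @throws PatternSyntaxException if regex is malformed (deliberately not caught)
     */
    static String tryMatch(String regex, String input, long timeoutMillis) {
        try {
            // Compiling inside the try is deliberate: only the timeout type is caught below,
            // so a PatternSyntaxException still propagates to the caller.
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(new DeadlineChars(input, System.currentTimeMillis() + timeoutMillis));
            return matcher.matches() ? "match" : "no match";
        } catch (DemoTimeoutException e) {
            return "timeout";
        }
    }
}
```

Replacing `DemoTimeoutException` with `RuntimeException` in the `catch` clause would make `tryMatch("(", ...)` return `"timeout"` instead of surfacing the syntax error, because `PatternSyntaxException` is itself a `RuntimeException`. That is the kind of swallowing the custom exception avoids.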
Based on Andreas' answer. The main credit should go to him and his source.