【Question title】: Java equivalent to PHP's preg_replace_callback
【Posted】: 2010-09-27 09:17:30
【Question description】:

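I am migrating an application from PHP to Java, and the code makes heavy use of regular expressions. I have run into something in PHP that does not seem to have a Java equivalent: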

preg_replace_callback()

For every match of the regular expression, it calls a function and passes the matched text to it as a parameter. An example of its use:

$articleText = preg_replace_callback("/\[thumb(\d+)\]/",'thumbReplace', $articleText);
# ...
function thumbReplace($matches) {
   global $photos;
   return "<img src=\"thumbs/" . $photos[$matches[1]] . "\">";
}

What would be the ideal way to do this in Java?

【Question discussion】:

    Tags: java php regex preg-replace


    【Solution 1】:

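    Important: as Kip points out in the comments, this class has an infinite-loop bug if the replacement string itself matches the regular expression. I'll leave fixing it as an exercise for the reader, if necessary.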


    I don't know of anything similar built into Java. With the Matcher class, you can roll your own without much trouble:

    import java.util.regex.*;
    
    public class CallbackMatcher
    {
        public static interface Callback
        {
            public String foundMatch(MatchResult matchResult);
        }
    
        private final Pattern pattern;
    
        public CallbackMatcher(String regex)
        {
            this.pattern = Pattern.compile(regex);
        }
    
        public String replaceMatches(String string, Callback callback)
        {
            final Matcher matcher = this.pattern.matcher(string);
            while(matcher.find())
            {
                final MatchResult matchResult = matcher.toMatchResult();
                final String replacement = callback.foundMatch(matchResult);
                string = string.substring(0, matchResult.start()) +
                         replacement + string.substring(matchResult.end());
                matcher.reset(string);
            }
            // Return the rewritten string; the method is declared to return String.
            return string;
        }
    }
    

    Then call it:

    final CallbackMatcher.Callback callback = new CallbackMatcher.Callback() {
        public String foundMatch(MatchResult matchResult)
        {
            return "<img src=\"thumbs/" + matchResult.group(1) + "\"/>";
        }
    };
    
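    // Java regexes take no /.../ delimiters, and backslashes must be doubled in string literals.
    final CallbackMatcher callbackMatcher = new CallbackMatcher("\\[thumb(\\d+)\\]");
    articleText = callbackMatcher.replaceMatches(articleText, callback);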
    

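    Note that you can get the entire matched string by calling matchResult.group() or matchResult.group(0), so there is no need to pass the current string state to the callback.

    Edit: made it behave more like the PHP function.

    Here is the original version, since the asker liked it: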

    public class CallbackMatcher
    {
        public static interface Callback
        {
            public void foundMatch(MatchResult matchResult);
        }
    
        private final Pattern pattern;
    
        public CallbackMatcher(String regex)
        {
            this.pattern = Pattern.compile(regex);
        }
    
        public void findMatches(String string, Callback callback)
        {
            final Matcher matcher = this.pattern.matcher(string);
            while(matcher.find())
            {
                callback.foundMatch(matcher.toMatchResult());
            }
        }
    }
    

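    For this particular use case, it may be better to simply queue each match in the callback and then walk through them backwards. That avoids having to remap indexes as the string is modified.

    【Discussion】:

    • I actually preferred your original answer, which queued the returned strings and indexes and then applied them in reverse. This way is simpler, but it seems to do more work, since it has to rescan the whole string for every match. Thanks for the suggestion!
    • I re-added the original suggestion. The expected input size would determine whether rescanning or queue-then-replace is more efficient. I suppose one could also have the replace method queue the matches along with their replacement strings...
    • Err... wrong. Obviously queuing is always more efficient in terms of CPU time. The difference is whether it is a big enough problem to worry about.
    • This has a bug in that you call matcher.reset() at the end of every loop iteration. If the replacement string matches the pattern, you get an infinite loop. It would be safer to use appendReplacement() and appendTail() with a StringBuffer.
    • Good catch, Kip. I think the only way to implement this correctly with these interfaces is to queue the matches and replace them once all matching is complete. I'm confused as to why you think using a StringBuffer would help, though, unless you just mean that it performs better than the + operator. The real crux is that you cannot replace a match at a lower index without corrupting the indexes of matches at higher positions. Hence the need to either queue them and process them backwards, or reset the matcher after each replacement.
    【Solution 2】: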

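    Here is the end result of what I did with your suggestions. I thought it would be nice to have it here in case anyone runs into the same problem. The resulting calling code looks like this: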

    content = ReplaceCallback.find(content, regex, new ReplaceCallback.Callback() {
        public String matches(MatchResult match) {
            // Do something special not normally allowed in regex's...
            return "newstring";
        }
    });
    

    The full class listing follows:

    import java.util.regex.MatchResult;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    import java.util.Stack;
    
    /**
     * <p>
     * Class that provides a method for doing regular expression string replacement by passing the matched string to
     * a function that operates on the string.  The result of the operation is then used to replace the original match.
     * </p>
     * <p>Example:</p>
     * <pre>
     * ReplaceCallback.find("string to search on", "regular (expression)", new ReplaceCallback.Callback() {
     *      public String matches(MatchResult match) {
     *          // query db or whatever...
     *          return match.group().replaceAll("2nd level replacement", "blah blah");
     *      }
     * });
     * </pre>
     * <p>
     * This, in effect, allows for a second level of string regex processing.
     * </p>
     *
     */
    public class ReplaceCallback {
        public static interface Callback {
            public String matches(MatchResult match);
        }
    
        private final Pattern pattern;
        private Callback callback;
    
        private class Result {
            int start;
            int end;
            String replace;
        }
    
        /**
         * You probably don't need this; see {@link #find(String, String, Callback)}.
         * @param regex     The string regex to use
         * @param callback  An instance of Callback to execute on matches
         */
        public ReplaceCallback(String regex, final Callback callback) {
            this.pattern = Pattern.compile(regex);
            this.callback = callback;
        }
    
        public String execute(String string) {
            final Matcher matcher = this.pattern.matcher(string);
            Stack<Result> results = new Stack<Result>();
            while(matcher.find()) {
                final MatchResult matchResult = matcher.toMatchResult();
                Result r = new Result();
                r.replace = callback.matches(matchResult);
                if(r.replace == null)
                    continue;
                r.start = matchResult.start();
                r.end = matchResult.end();
                results.push(r);
            }
            // Improve this with a stringbuilder...
            while(!results.empty()) {
                Result r = results.pop();
                string = string.substring(0, r.start) + r.replace + string.substring(r.end);
            }
            return string;
        }
    
        /**
         * If you wish to reuse the regex multiple times with different callbacks or search strings, you can create a
         * ReplaceCallback directly and use this method to perform the search and replace.
         *
         * @param string    The string we are searching through
         * @param callback  A callback instance that will be applied to the regex match results.
         * @return  The modified search string.
         */
        public String execute(String string, final Callback callback) {
            this.callback = callback;
            return execute(string);
        }
    
        /**
         * Use this static method to perform your regex search.
         * @param search    The string we are searching through
         * @param regex     The regex to apply to the string
         * @param callback  A callback instance that will be applied to the regex match results.
         * @return  The modified search string.
         */
        public static String find(String search, String regex, Callback callback) {
            ReplaceCallback rc = new ReplaceCallback(regex, callback);
            return rc.execute(search);
        }
    }
    

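    【Discussion】:

    • I wouldn't store the callback in an instance variable; I would pass it as a parameter instead. Storing it as an instance variable makes your class behave unexpectedly when it is called from separate threads at the same time. (The second callback would receive matches from both the first and the second call.)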
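
To illustrate the point in the comment above, here is a minimal sketch of a variant that keeps no mutable state: the callback is passed as a parameter and the result is built left to right in a StringBuilder. The class name StatelessReplaceCallback is hypothetical, and the "null means leave the match untouched" rule is carried over from the answer's execute() method as an assumption about the intended behaviour.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical name; a sketch, not the answerer's class.
class StatelessReplaceCallback {
    interface Callback {
        String matches(MatchResult match);
    }

    // No instance fields: every call works only on its own arguments,
    // so concurrent callers cannot see each other's callback.
    static String replace(Pattern pattern, String input, Callback callback) {
        Matcher matcher = pattern.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the previous replaced match
        while (matcher.find()) {
            String replacement = callback.matches(matcher.toMatchResult());
            if (replacement == null) {
                continue; // assumption: null leaves this match as it was
            }
            // Copy the untouched text before the match, then the replacement.
            out.append(input, last, matcher.start()).append(replacement);
            last = matcher.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String result = replace(Pattern.compile("\\d+"), "a1b22c",
                m -> m.group().length() == 1 ? null : "X");
        System.out.println(result);
    }
}
```

Because the output is appended in order, no indexes need to be queued or remapped, and the replacement text is never rescanned.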
    【Solution 3】:

    Trying to emulate PHP's callback feature seems like an awful lot of work when you could just use appendReplacement() and appendTail() in a loop:

    StringBuffer resultString = new StringBuffer();
    Pattern regex = Pattern.compile("regex");
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
      // You can vary the replacement text for each match on-the-fly
      regexMatcher.appendReplacement(resultString, "replacement");
    }
    regexMatcher.appendTail(resultString);
    

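    【Discussion】:

    • I think some JDK classes do have powerful features, but they are sometimes hidden behind strange class names or strange method names... Although the appendReplacement/appendTail strategy used here needs less code, the callback strategy (the answer the OP chose) is clearer and more obvious!
    • What if I need the matched string to pick the right replacement? Say subjectString may contain "foo bar", but I need to replace "foo" with "Jan" and "bar" with "Goyvaerts"?
    • Use foo|bar as your regex and query regexMatcher.group() inside the loop to see which replacement you need to append.
    • This is the correct answer. The accepted answer fails on certain input because it calls .reset().
    • This doesn't quite match PHP's function: here the replacement string has to take care not to contain special characters and back-references. Use Matcher.quoteReplacement.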
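
The two suggestions in the comments above (choose the replacement from regexMatcher.group(), and pass it through Matcher.quoteReplacement) can be combined roughly as follows. This is only a sketch: the class name, the method name replaceFromTable, and the lookup table are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the comments above.
class PerMatchReplacementDemo {
    static String replaceFromTable(String regex, String subject, Map<String, String> table) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Pick the replacement based on what was actually matched;
            // fall back to the matched text itself if the table has no entry.
            String replacement = table.getOrDefault(m.group(), m.group());
            // quoteReplacement makes '$' and '\' literal, so the text is not
            // interpreted as a group reference by appendReplacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> names = new HashMap<>();
        names.put("foo", "Jan");
        names.put("bar", "Goyvaerts");
        System.out.println(replaceFromTable("foo|bar", "foo bar", names));
    }
}
```

Without the quoteReplacement call, a replacement such as "$1" would be read as a reference to capture group 1, which does not exist for the pattern foo|bar, and appendReplacement would throw.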
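    【Solution 4】:

    I found that jdmichal's answer loops forever if the string you return can be matched again; below is a modification that prevents infinite loops from such matches.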

    public String replaceMatches(String string, Callback callback) {
        String result = "";
        final Matcher matcher = this.pattern.matcher(string);
        int lastMatch = 0;
        while(matcher.find())
        {
            final MatchResult matchResult = matcher.toMatchResult();
            final String replacement = callback.foundMatch(matchResult);
            result += string.substring(lastMatch, matchResult.start()) +
                replacement;
            lastMatch = matchResult.end();
        }
        if (lastMatch < string.length())
            result += string.substring(lastMatch);
        return result;
    }
    

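    【Discussion】:

      【Solution 5】:

      I wasn't quite satisfied with any of the solutions here. I wanted a stateless solution, and I didn't want to end up in an infinite loop if my replacement string happened to match the pattern. While I was at it, I added support for a limit parameter and a returned count parameter. (I used an AtomicInteger to simulate passing an integer by reference.) I moved the callback parameter to the end of the parameter list to make it easier to define an anonymous class.

      Here is a usage example: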

      final Map<String,String> props = new HashMap<String,String>();
      props.put("MY_NAME", "Kip");
      props.put("DEPT", "R&D");
      props.put("BOSS", "Dave");
      
      String subjectString = "Hi my name is ${MY_NAME} and I work in ${DEPT} for ${BOSS}";
      String sRegex = "\\$\\{([A-Za-z0-9_]+)\\}";
      
      String replacement = ReplaceCallback.replace(sRegex, subjectString, new ReplaceCallback.Callback() {
        public String matchFound(MatchResult match) {
          String group1 = match.group(1);
          if(group1 != null && props.containsKey(group1))
            return props.get(group1);
          return match.group();
        }
      });
      
      System.out.println("replacement: " + replacement);
      

      Here is my version of the ReplaceCallback class:

      import java.util.concurrent.atomic.AtomicInteger;
      import java.util.regex.*;
      
      public class ReplaceCallback
      {
        public static interface Callback {
          /**
           * This function is called when a match is made. The string which was matched
           * can be obtained via match.group(), and the individual groupings via
           * match.group(n).
           */
          public String matchFound(MatchResult match);
        }
      
        /**
         * Replaces with callback, with no limit to the number of replacements.
         * Probably what you want most of the time.
         */
        public static String replace(String pattern, String subject, Callback callback)
        {
          return replace(pattern, subject, -1, null, callback);
        }
      
        public static String replace(String pattern, String subject, int limit, Callback callback)
        {
          return replace(pattern, subject, limit, null, callback);
        }
      
        /**
         * @param regex    The regular expression pattern to search on.
         * @param subject  The string to be replaced.
         * @param limit    The maximum number of replacements to make. A negative value
         *                 indicates replace all.
         * @param count    If this is not null, it will be set to the number of
         *                 replacements made.
         * @param callback Callback function
         */
        public static String replace(String regex, String subject, int limit,
                AtomicInteger count, Callback callback)
        {
          StringBuffer sb = new StringBuffer();
          Matcher matcher = Pattern.compile(regex).matcher(subject);
          int i;
          for(i = 0; (limit < 0 || i < limit) && matcher.find(); i++)
          {
            String replacement = callback.matchFound(matcher.toMatchResult());
            replacement = Matcher.quoteReplacement(replacement); //probably what you want...
            matcher.appendReplacement(sb, replacement);
          }
          matcher.appendTail(sb);
      
          if(count != null)
            count.set(i);
          return sb.toString();
        }
      }
      

      【Discussion】:

        【Solution 6】:
        public static String replace(Pattern pattern, Function<MatchResult, String> callback, CharSequence subject) {
            Matcher m = pattern.matcher(subject);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                m.appendReplacement(sb, callback.apply(m.toMatchResult()));
            }
            m.appendTail(sb);
            return sb.toString();
        }
        

        Usage example:

        replace(Pattern.compile("cat"), mr -> "dog", "one cat two cats in the yard")
        

        This produces the return value:

        one dog two dogs in the yard

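        【Discussion】:

        • StringBuilder would perform slightly better: journaldev.com/137/stringbuffer-vs-stringbuilder
        • I edited this to change it to StringBuilder, then realized that doesn't work because appendReplacement requires a StringBuffer. I reverted it; sorry about that.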
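
A follow-up on the last comment: since Java 9, Matcher also has appendReplacement and appendTail overloads that take a StringBuilder, so on a Java 9+ runtime the answer's method could be written roughly like this. The class name is hypothetical, and the Matcher.quoteReplacement call is an addition that the answer above does not make; it assumes the callback returns literal text.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical name; requires Java 9 or later for the StringBuilder overloads.
class StringBuilderReplaceDemo {
    static String replace(Pattern pattern, Function<MatchResult, String> callback, CharSequence subject) {
        Matcher m = pattern.matcher(subject);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Assumption: the callback result is literal text, so quote it to
            // keep '$' and '\' from being treated as group references.
            String literal = Matcher.quoteReplacement(callback.apply(m.toMatchResult()));
            m.appendReplacement(sb, literal);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace(Pattern.compile("cat"), mr -> "dog", "one cat two cats in the yard"));
    }
}
```

On Java 8 and earlier only the StringBuffer overloads exist, which is why the edit described in the comment could not compile at the time.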
        【Solution 7】:

        Matcher#replaceAll is what you are looking for.

        Pattern.compile("random number")
            .matcher("this is a random number")
            .replaceAll(r -> "" + ThreadLocalRandom.current().nextInt()) 
        

        Output:

        this is a -107541873
        

        【Discussion】:

          【Solution 8】:

          Java 9 introduced a Matcher#replaceAll overload that accepts a Function<MatchResult,String> and returns the replacement for each specific match, which is quite elegant.

          Pattern.compile("regex").matcher("some string")
               .replaceAll(matchResult -> "something" + matchResult.group());
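
Applied to the [thumbN] example from the question, this approach might look like the sketch below. The class name, the replaceThumbs method, and the use of a List for the photos are assumptions made for illustration; the question's $photos array could just as well map to a Map. The string returned by the function is still treated as a replacement string, so it is passed through Matcher.quoteReplacement in case a file name contains '$' or '\'.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; requires Java 9 or later.
class ThumbReplaceDemo {
    // No /.../ delimiters in Java; backslashes are doubled in the string literal.
    private static final Pattern THUMB = Pattern.compile("\\[thumb(\\d+)\\]");

    static String replaceThumbs(String articleText, List<String> photos) {
        return THUMB.matcher(articleText).replaceAll(match -> {
            // group(1) is the number captured inside [thumbN].
            int index = Integer.parseInt(match.group(1));
            // Assumption: every referenced index exists in the list.
            String tag = "<img src=\"thumbs/" + photos.get(index) + "\">";
            return Matcher.quoteReplacement(tag);
        });
    }

    public static void main(String[] args) {
        List<String> photos = List.of("a.jpg", "b.jpg");
        System.out.println(replaceThumbs("Intro [thumb1] outro", photos));
    }
}
```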
          

          【Discussion】:
