【Question title】: Splitting a paragraph based on special characters, parentheses, and brackets in Java
【Posted】: 2015-02-24 11:31:33
【Question】:

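I am trying to split a short paragraph into sentences based on periods, commas, and brackets such as (), {} and []. How can I do this with a Java regular expression?

For example, if I have a paragraph like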

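So far I like this car and it is fun to drive. It works very well as a daily driver and has some good kick, although it is still far from a sports car. I have one major issue with this car. Volvo has built heating elements into the windshield (small wires every few millimeters). At night all lights reflect off of these wires and makes the lights blurry. It is a huge safety issue and is extremely annoying. Due to this issue alone, I am not sure if I will keep this car very long. Although it is nice to have a heated steering wheel, skip the Climate Package if you buy this car.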

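The result of splitting the paragraph should be

So far I like this car and it is fun to drive

It works very well as a daily driver and has some good kick

although it is still far from a sports car

I have one major issue with this car

Volvo has built heating elements into the windshield

small wires every few millimeters

At night all lights reflect off of these wires and makes the lights blurry

It is a huge safety issue and is extremely annoying

Due to this issue alone

I am not sure if I will keep this car very long

Although it is nice to have a heated steering wheel

skip the Climate Package if you buy this car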

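【Comments】:

  • Please explain which part of this simple programming problem is giving you trouble. If the answer is "all of it", then I suggest you read vogella.com/tutorials/JavaRegularExpressions/article.html and/or all of docs.oracle.com/javase/tutorial/essential/regex
  • You can't do this with a regular expression, because your grammar is clearly a context-free grammar. Imagine the sentence "hello (beautiful (indeed) world)". What is the parse tree? Is it ['hello', 'beautiful', 'indeed', 'world'] or ['hello', 'beautiful world', 'indeed']? Or just ['hello', 'beautiful (indeed) world']? A regular expression does not build an automaton that can track parse depth, so it cannot pair up tokens such as (), [] and so on.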
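
To make the nesting concern in the comment above concrete, here is a minimal sketch of a hand-written scanner that tracks bracket depth with a counter. It assumes the last of the three readings listed in the comment (split only at the top level and keep nested brackets inside their outer segment). The class name `NestedSplitter` and the method `splitTopLevel` are hypothetical, and the sketch does not check that an opening bracket is closed by the same kind of bracket.

```java
import java.util.ArrayList;
import java.util.List;

class NestedSplitter {

    // Splits on '.' and ',' at nesting depth 0 and on the outermost bracket
    // pair; everything inside an outermost pair is kept as a single piece.
    static List<String> splitTopLevel(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : text.toCharArray()) {
            boolean open = c == '(' || c == '[' || c == '{';
            boolean close = c == ')' || c == ']' || c == '}';
            if (open) {
                if (depth == 0) {
                    flush(current, parts);   // outermost opener ends the segment
                } else {
                    current.append(c);       // nested opener stays in the text
                }
                depth++;
            } else if (close && depth > 0) {
                depth--;
                if (depth == 0) {
                    flush(current, parts);   // outermost closer ends the segment
                } else {
                    current.append(c);       // nested closer stays in the text
                }
            } else if (depth == 0 && (c == '.' || c == ',' || close)) {
                flush(current, parts);       // top-level delimiter or stray closer
            } else {
                current.append(c);
            }
        }
        flush(current, parts);
        return parts;
    }

    // Adds the trimmed buffer to the list if it is non-empty, then clears it.
    private static void flush(StringBuilder buffer, List<String> parts) {
        String piece = buffer.toString().trim();
        if (!piece.isEmpty()) {
            parts.add(piece);
        }
        buffer.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(splitTopLevel("hello (beautiful (indeed) world)"));
        // prints [hello, beautiful (indeed) world]
    }
}
```

For the flat, non-nested paragraph in the question, this kind of scanner and a plain `split` regex should give the same pieces; the difference only shows up once brackets are nested.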

Tags: java regex pattern-matching paragraph


【Solution 1】:

You could try something like this:

    String str = "...";
    str = str.replaceAll(" ?[,.()]+ ?", System.getProperty("line.separator"));

If you want an array, use this:

    String[] strArr = str.split(" ?[,.()]+ ?");
    for(String strr : strArr)
    {
        System.out.println(strr);
    }

Output:

So far I like this car and it is fun to drive
It works very well as a daily driver and has some good kick
although it is still far from a sports car
I have one major issue with this car
Volvo has built heating elements into the windshield
small wires every few millimeters
At night all lights reflect off of these wires and makes the lights blurry
It is a huge safety issue and is extremely annoying
Due to this issue alone
I am not sure if I will keep this car very long
Although it is nice to have a heated steering wheel
skip the Climate Package if you buy this car
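
One edge case worth knowing about: `split` drops trailing empty strings but keeps a leading one, so a text that starts with a delimiter (for example an opening parenthesis) yields an empty first element. The sketch below, using a made-up input and the hypothetical names `SplitNonEmpty` and `splitNonEmpty`, shows one way to filter such pieces out while reusing the same pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SplitNonEmpty {

    // Same delimiter pattern as in the answer above, compiled once for reuse.
    private static final Pattern DELIMS = Pattern.compile(" ?[,.()]+ ?");

    // Splits the text and discards any empty pieces.
    static List<String> splitNonEmpty(String text) {
        List<String> parts = new ArrayList<>();
        for (String piece : DELIMS.split(text)) {
            if (!piece.isEmpty()) {
                parts.add(piece);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "(note) hello, world.";   // hypothetical input

        // Plain split keeps the empty string before the leading "(".
        System.out.println(Arrays.toString(text.split(" ?[,.()]+ ?")));
        // prints [, note, hello, world]

        System.out.println(splitNonEmpty(text));
        // prints [note, hello, world]
    }
}
```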

【Comments】:

    【Solution 2】:

    Try this regex:

     \s*[.,()\[\]{}]+\s*
    

    Sample code

    public class Main {
        public static void main(String[] args) {
            String str = "So far I like this car and it is fun to drive. It works very well as a daily driver and has some good kick, although it is still far from a sports car. I have one major issue with this car.Volvo has built heating elements into the windshield (small wires every few millimeters). At night all lights reflect off of these wires and makes the lights blurry. It is a huge safety issue and is extremely annoying. Due to this issue alone, I am not sure if I will keep this car very long. Although it is nice to have a heated steering wheel, skip the Climate Package if you buy this car.";
    
            String[] output = str.split("\\s*[.,()\\[\\]{}]+\\s*");
    
            for (String s : output) {
                System.out.println(s + System.getProperty("line.separator"));
            }
        }
    }
    

    Output

    So far I like this car and it is fun to drive
    
    It works very well as a daily driver and has some good kick
    
    although it is still far from a sports car
    
    I have one major issue with this car
    
    Volvo has built heating elements into the windshield
    
    small wires every few millimeters
    
    At night all lights reflect off of these wires and makes the lights blurry
    
    It is a huge safety issue and is extremely annoying
    
    Due to this issue alone
    
    I am not sure if I will keep this car very long
    
    Although it is nice to have a heated steering wheel
    
    skip the Climate Package if you buy this car
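
For the sample paragraph, this regex and the ` ?`-based one from Solution 1 produce the same pieces. They can differ when delimiters are surrounded by more than one space, or by tabs or newlines. The following sketch compares the two on a made-up input; the class name `DelimiterWhitespaceDemo` and its constants are hypothetical.

```java
import java.util.Arrays;

class DelimiterWhitespaceDemo {

    // Solution 1 style: at most one plain space on each side of the delimiters.
    static final String SINGLE_SPACE = " ?[,.()]+ ?";

    // Solution 2 style: any run of whitespace (spaces, tabs, newlines) on each side.
    static final String ANY_WHITESPACE = "\\s*[.,()\\[\\]{}]+\\s*";

    public static void main(String[] args) {
        // Hypothetical input: two spaces after the comma, a newline after the period.
        String text = "one,  two.\nthree";

        System.out.println(Arrays.toString(text.split(SINGLE_SPACE)));
        System.out.println(Arrays.toString(text.split(ANY_WHITESPACE)));
    }
}
```

With this input, the first call leaves stray whitespace at the start of the second and third pieces (`" two"` and `"\nthree"`), while the second call returns `one`, `two`, `three` with no surrounding whitespace.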
    

    【Comments】:

      【Solution 3】:

      This should work:

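      String str = "...";  // the paragraph to split
      // Optional space, one or more delimiter characters, optional space.
      String reg = " ?[.,()\\[\\]{}]+ ?";
      String[] res = str.split(reg);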
      

      【Comments】:
