【Question Title】: Special text processing in Java
【Posted】: 2020-10-20 14:40:21
【Question】:

I have a document containing text in a raw format, for example:

(11) test(1/2/3) for 11 (15) test(1/2/3) for 15 (21) test(1/2/3) for 21
(22) test(1/2/3) 
for 22
(30) test(1/2/3) for 30 (43) test(1/2/3) for 43
(45) test(1/2/3) 
for 45
(51) test(1/2/3) for 51 (54) test(1/2/3) for 54
(57) test(1/2/3) for 57
(62) test(1/2/3) for 62 (67) test(1/2/3) for 67
(71) test(1/2/3) for 71
(72) test(1/2/3) for 72 (73) test(1/2/3) for 73
(74) test(1/2/3) for 74
(75) test(1/2/3) for 75 (76) test(1/2/3) for 76
(85) test(1/2/3) for 85 (86) test(1/2/3) for 86
(87) test(1/2/3) for 87

I want to extract it into objects like these:

String s11 = test(1/2/3) for 11;
String s15 = test(1/2/3) for 15;
String s21 = test(1/2/3) for 21;
String s22 = test(1/2/3) for 22;
String s30 = test(1/2/3) for 30;
String s43 = test(1/2/3) for 43;
String s45 = test(1/2/3) for 45;
String s51 = test(1/2/3) for 51;
String s54 = test(1/2/3) for 54;
String s57 = test(1/2/3) for 57;
String s62 = test(1/2/3) for 62;
String s67 = test(1/2/3) for 67;
String s71 = test(1/2/3) for 71;
String s72 = test(1/2/3) for 72;
String s73 = test(1/2/3) for 73;
String s74 = test(1/2/3) for 74;
String s75 = test(1/2/3) for 75;
String s76 = test(1/2/3) for 76;
String s85 = test(1/2/3) for 85;
String s86 = test(1/2/3) for 86;
String s87 = test(1/2/3) for 87;

Can anyone give me a hint on how to do this in Java?

【Comments on the Question】:

  • Sorry, let me give another example with more detail; the example above may be misleading.

Tags: java text-processing


【Solution 1】:

Assuming you can afford to read the whole file into a single Java string, Java's regex engine offers a clean way to handle this:

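import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "(11) test(1/2/3) for 11 (15) test(1/2/3) for 15 (21) test(1/2/3) for 21";
// The lazy .*? keeps each match from swallowing the following entries on the same line
String pattern = "\\(\\d+\\) test\\(.*?\\) for \\d+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
List<String> lines = new ArrayList<>();
while (m.find()) {
    lines.add(m.group(0));
    System.out.println(m.group(0));
}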

This prints:

(11) test(1/2/3) for 11
(15) test(1/2/3) for 15
(21) test(1/2/3) for 21

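Note that you usually don't want a separate String variable for each match. Instead, add all the matches to a collection, or process them one at a time as you find them.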
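
Following that advice, here is a minimal sketch (not part of the original answer) that stores the matches in a `Map` keyed by the number, so `s22` becomes `entries.get(22)`. It assumes the whole file is already in one string and keeps the `test` label from the pattern above, but uses `\s+` instead of literal spaces so that an entry broken across lines, such as `(22) test(1/2/3)` / `for 22` in the question, still matches. The class name `EntryCollector` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntryCollector {
    // Same idea as the pattern above, but \s+ replaces the literal spaces so that
    // "(22) test(1/2/3) \nfor 22" (entry split over two lines) still matches.
    // Group 1 = the number in parentheses, group 2 = the text to keep.
    private static final Pattern ENTRY =
            Pattern.compile("\\((\\d+)\\)\\s+(test\\(.*?\\)\\s+for\\s+\\d+)");

    // Returns number -> text, in the order the entries appear in the input.
    public static Map<Integer, String> collect(String input) {
        Map<Integer, String> entries = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            // Collapse any run of whitespace (including the line break) to one space.
            String text = m.group(2).replaceAll("\\s+", " ");
            entries.put(Integer.parseInt(m.group(1)), text);
        }
        return entries;
    }

    public static void main(String[] args) {
        String input = "(11) test(1/2/3) for 11 (15) test(1/2/3) for 15 (21) test(1/2/3) for 21\n"
                + "(22) test(1/2/3) \nfor 22\n";
        Map<Integer, String> entries = collect(input);
        // Look an entry up by its number instead of through a variable named s22.
        System.out.println(entries.get(22));
        entries.forEach((k, v) -> System.out.println("s" + k + " = " + v));
    }
}
```

A `LinkedHashMap` keeps the entries in document order; a `TreeMap` would sort them by number instead.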

【Comments】:

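  • I updated the question with more details. Regarding your answer: the label is not fixed.
  • I have rolled your question back to the original version, because your edit completely invalidated my answer.

【Solution 2】: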
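import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "(11) test(1/2/3) for 11 (15) test(1/2/3) for 15 (21) test(1/2/3) for 21";
String pattern = "\\(\\d+\\) test\\(.*?\\) for \\d+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
List<String> lines = new ArrayList<>();
while (m.find()) {
    lines.add(m.group(0));
    // e.g. "(11) test(1/2/3) for 11" -> ["(11)", "test(1/2/3)", "for", "11"]
    String[] split = m.group(0).split(" ");
    // \p{P} matches punctuation, so this strips the parentheses: "(11)" -> "11"
    split[0] = split[0].replaceAll("\\p{P}", "");
    System.out.println("String s" + split[0] + " = " + split[1] + " " + split[2] + " " + split[3]);
}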

Output:

String s11 = test(1/2/3) for 11
String s15 = test(1/2/3) for 15
String s21 = test(1/2/3) for 21
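
Splitting on single spaces relies on the label being exactly `test(1/2/3)` with no spaces in it, and on each entry sitting on one line. If the label can vary, one option is to capture the parts with regex groups instead of `split`. The sketch below is an assumed variant, not the answer's code: it takes everything between `(N)` and `for N` as the label and uses the backreference `\1` so that the trailing number must equal the leading one. The class and method names and the second sample label are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeclarationFormatter {
    // Assumption: the label before "for" is arbitrary text, not necessarily "test(...)".
    // Group 1 = number, group 2 = label (lazy, so it stops at the first "for <same number>").
    // \1 is a backreference: the number after "for" must equal the one in parentheses.
    private static final Pattern ENTRY =
            Pattern.compile("\\((\\d+)\\)\\s*(.+?)\\s+for\\s+\\1\\b");

    public static List<String> format(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            // Collapse extra spaces inside the label, then rebuild the declaration text.
            String label = m.group(2).replaceAll("\\s+", " ");
            result.add("String s" + m.group(1) + " = " + label + " for " + m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // "other label(4/5)" is a made-up label to show that it need not be "test".
        String input = "(11) test(1/2/3) for 11 (15) other label(4/5) for 15\n(22) test(1/2/3) \nfor 22";
        format(input).forEach(System.out::println);
    }
}
```

The `\b` after `\1` prevents a number such as `1` from matching only the first digit of `for 11`. Because `.` does not match a line break while `\s+` does, an entry like `(22) test(1/2/3)` / `for 22` is still found.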

【Comments】:

【Solution 3】:

Assuming your text file looks like this, with one entry per line:

(11) test(1/2/3) for 11
(15) test(1/2/3) for 15
(21) test(1/2/3) for 21
(22) test(1/2/3) for 22
(30) test(1/2/3) for 30
(43) test(1/2/3) for 43
(45) test(1/2/3) for 45
(51) test(1/2/3) for 51
(54) test(1/2/3) for 54
(57) test(1/2/3) for 57
(62) test(1/2/3) for 62
(67) test(1/2/3) for 67
(71) test(1/2/3) for 71
(72) test(1/2/3) for 72
(73) test(1/2/3) for 73
(74) test(1/2/3) for 74
(75) test(1/2/3) for 75
(76) test(1/2/3) for 76
(85) test(1/2/3) for 85
(86) test(1/2/3) for 86
(87) test(1/2/3) for 87

then you can read it line by line with a Scanner:

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String filepath = "file.txt";
// Compile the pattern once, outside the loop
Pattern r = Pattern.compile("\\(\\d+\\) test\\(.*?\\) for \\d+");
List<String> lines = new ArrayList<>();
// try-with-resources closes the Scanner; new Scanner(File) throws FileNotFoundException,
// so the enclosing method must declare or catch it
try (Scanner sc = new Scanner(new File(filepath))) {
    // Read each line and match every entry on it (no null check needed, unlike findInLine)
    while (sc.hasNextLine()) {
        Matcher m = r.matcher(sc.nextLine());
        while (m.find()) {
            lines.add(m.group(0));
            String[] split = m.group(0).split(" ");
            split[0] = split[0].replaceAll("\\p{P}", "");
            System.out.println("String s" + split[0] + " = " + split[1] + " " + split[2] + " " + split[3]);
        }
    }
}

Output:

String s11 = test(1/2/3) for 11
String s15 = test(1/2/3) for 15
String s21 = test(1/2/3) for 21
String s22 = test(1/2/3) for 22
String s30 = test(1/2/3) for 30
String s43 = test(1/2/3) for 43
String s45 = test(1/2/3) for 45
String s51 = test(1/2/3) for 51
String s54 = test(1/2/3) for 54
String s57 = test(1/2/3) for 57
String s62 = test(1/2/3) for 62
String s67 = test(1/2/3) for 67
String s71 = test(1/2/3) for 71
String s72 = test(1/2/3) for 72
String s73 = test(1/2/3) for 73
String s74 = test(1/2/3) for 74
String s75 = test(1/2/3) for 75
String s76 = test(1/2/3) for 76
String s85 = test(1/2/3) for 85
String s86 = test(1/2/3) for 86
String s87 = test(1/2/3) for 87
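
Reading line by line cannot match an entry that is split over two lines, such as `(22) test(1/2/3)` / `for 22` in the original question. A sketch that stays with `Scanner` but searches across line boundaries can use `Scanner.findWithinHorizon(Pattern, 0)` (a horizon of 0 means no limit) and read the groups back through `Scanner.match()`. The class name `ScannerEntries` and the inline fallback sample are hypothetical; pass a file path as the first argument to read a real file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class ScannerEntries {
    // \s+ between the parts lets one match run across a line break.
    // Group 1 = leading number, group 2 = "test(...)", group 3 = trailing number.
    private static final Pattern ENTRY =
            Pattern.compile("\\((\\d+)\\)\\s+(test\\(.*?\\))\\s+for\\s+(\\d+)");

    // Takes any Scanner, so it works the same on a File or on an in-memory String.
    public static List<String> read(Scanner sc) {
        List<String> result = new ArrayList<>();
        // Horizon 0: keep searching the remaining input without a bound, ignoring lines.
        while (sc.findWithinHorizon(ENTRY, 0) != null) {
            MatchResult m = sc.match(); // groups of the match just found
            result.add("String s" + m.group(1) + " = " + m.group(2) + " for " + m.group(3));
        }
        return result;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // With a path argument (e.g. file.txt) read that file; otherwise use a small sample.
        try (Scanner sc = args.length > 0
                ? new Scanner(new File(args[0]))
                : new Scanner("(21) test(1/2/3) for 21\n(22) test(1/2/3) \nfor 22\n")) {
            read(sc).forEach(System.out::println);
        }
    }
}
```

Unlike `findInLine`, `findWithinHorizon` is not limited to the current line, so this also handles several entries per line and entries that wrap.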
    

【Comments】:
