【Question title】: How to split a string by quotation marks, site operator and non quotation mark?
【Posted】: 2020-03-31 16:16:18
【Question description】:

I receive user requests like this:

site:www.example.com \"hello world\" \"hi abc\" where are you

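I want to extract and save the URL from this string and then remove it from the string, so that it looks like this: "hello world" "hi abc" where are you. Then I want to split the remaining string into two string arrays: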

String[] str1 = {"hello world", "hi abc"};
String[] str2 = {"where", "are", "you"};

How can I do this in Java? The user query can be in any order. Various examples:

 "hi" excitement site:www.example.com \"hello world\" \"hi abc\" where are you "amazing"   
OR
    Hello World friends
OR
 Greeting is an "act of communication" human beings "intentionally"  

【Question discussion】:

Tags: java string parsing


【Solution 1】:

I think this code can help you:

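import java.util.regex.Matcher;
import java.util.regex.Pattern;

static class ExtractResponse {
    String newStr; // the input with every site:... token removed
    String site;   // the value after "site:", or null if none was found
}

public static ExtractResponse extractSite(String origin) {
    // Capture the value after "site:" and consume any trailing whitespace,
    // so the token is also matched when it is the last thing in the string.
    Pattern pattern = Pattern.compile("site:(\\S*)\\s*");
    Matcher matcher = pattern.matcher(origin);

    ExtractResponse response = new ExtractResponse();
    StringBuffer buffer = new StringBuffer();
    while (matcher.find()) {
        response.site = matcher.group(1); // group 1 excludes the "site:" prefix and the whitespace
        matcher.appendReplacement(buffer, ""); // drop the whole token from the output
    }
    matcher.appendTail(buffer);

    response.newStr = buffer.toString();
    return response;
}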

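It returns a response containing the new string, with the site:* token removed, together with the site URL. As examples, I used the cases from the answer and the comments: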

public static void main(String[] args) {
    String str1 = "site:www.example.com \"hello world\" \"hi abc\" where are you";
    String str2 = "\"hello world\" \"hi abc\" site:www.example.com where are you";

    ExtractResponse response1 = extractSite(str1);
    System.out.println(response1.newStr);
    System.out.println(response1.site);

    ExtractResponse response2 = extractSite(str2);
    System.out.println(response2.newStr);
    System.out.println(response2.site);
}

Output:

"hello world" "hi abc" where are you

www.example.com

"hello world" "hi abc" where are you

www.example.com
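
The method above only removes the site: token. For the second half of the question, splitting what remains into quoted phrases and bare words, one possible approach is a single regex with two alternatives. The following is a minimal sketch rather than part of the original answer: the class name `QuerySplitter` and the method `split` are hypothetical, and it assumes the quotes are balanced and never escaped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuerySplitter {
    // Alternative 1 captures the text inside double quotes; alternative 2 captures a bare word.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    /** Returns two lists: index 0 holds the quoted phrases, index 1 the unquoted words. */
    public static List<List<String>> split(String query) {
        List<String> quoted = new ArrayList<>();
        List<String> words = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                quoted.add(m.group(1)); // matched "..." (quotes themselves are not captured)
            } else {
                words.add(m.group(2));  // matched a run of non-whitespace characters
            }
        }
        return Arrays.asList(quoted, words);
    }

    public static void main(String[] args) {
        List<List<String>> parts = split("\"hello world\" \"hi abc\" where are you");
        System.out.println(parts.get(0)); // [hello world, hi abc]
        System.out.println(parts.get(1)); // [where, are, you]
    }
}
```

Passing `response.newStr` from `extractSite` to `split` would chain the two steps. A stray unmatched quote ends up attached to a bare word, so real input may need extra validation.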

【Discussion】:

    【Solution 2】:

    This is a very specific problem, and the logic below may help you. I suggest refining it as you test with real data.

    import java.util.Arrays;

    public static void main(String[] args) {
        String test1 = "site:www.example.com \"hello world\" \"hi abc\" where are you";
        String regex = "\\b(https?|ftp|file|site):[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
        String[] info = test1.split("\"");
    
        //read url
        String url;
        if (info.length > 0 && info[0].trim().matches(regex))
            url = info[0].trim();
        else
            throw new RuntimeException("Not a valid input");
    
        // read str1
        String[] info1 = Arrays.copyOfRange(info, 1, info.length - 1);
        String str1 = mkString(info1, ",");
    
        //read str2
        String[] info2 = info[info.length - 1].trim().split("\\s");
        String str2 = mkString(info2, ",");
    
    
        System.out.println("URL: " + url + " STR1: " + str1 + " STR2: " + str2);
    
    }
    
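    // Returns the non-blank elements joined by the delimiter and enclosed in curly braces {}.
    // An empty input yields "{}" instead of throwing ArrayIndexOutOfBoundsException.
    public static String mkString(String[] input, String delimiter) {
        StringBuilder result = new StringBuilder("{");
        for (String part : input) {
            if (part.trim().isEmpty()) {
                continue; // skip the blank pieces left between adjacent quoted phrases
            }
            if (result.length() > 1) {
                result.append(delimiter);
            }
            result.append(part);
        }
        return result.append("}").toString();
    }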
    

    【Discussion】:

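    • The user query can be in any order: "hi" excitement site:www.example.com \"hello world\" \"hi abc\" where are you "amazing"
    • Could you please update the question? It is unclear as written. Also, what you are asking about is a very specific use case. I suggest using regular expressions, as I did above for the URL, for the other strings as well. For example, for quoted strings, the regex would be - (?:(?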
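
One way to act on the regex suggestion in the comments when tokens can appear in any order is to scan the query once with an alternation that tries the site: operator first, then a quoted phrase, then a bare word. This is a sketch under stated assumptions, not the commenter's (truncated) regex: the class `SearchQuery`, its fields, and `parse` are hypothetical names, quotes are assumed to be balanced and unescaped, and only the last site: token is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchQuery {
    // Order matters: site:... is tried first, then "quoted phrase", then any bare word.
    private static final Pattern TOKEN =
            Pattern.compile("site:(\\S+)|\"([^\"]*)\"|(\\S+)");

    public String site;                                     // null when no site: operator is present
    public final List<String> phrases = new ArrayList<>();  // text found inside double quotes
    public final List<String> words = new ArrayList<>();    // remaining unquoted words

    public static SearchQuery parse(String input) {
        SearchQuery q = new SearchQuery();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                q.site = m.group(1);        // value after "site:"
            } else if (m.group(2) != null) {
                q.phrases.add(m.group(2));  // quoted phrase without its quotes
            } else {
                q.words.add(m.group(3));    // plain word
            }
        }
        return q;
    }

    public static void main(String[] args) {
        SearchQuery q = parse(
                "\"hi\" excitement site:www.example.com \"hello world\" \"hi abc\" where are you \"amazing\"");
        System.out.println(q.site);    // www.example.com
        System.out.println(q.phrases); // [hi, hello world, hi abc, amazing]
        System.out.println(q.words);   // [excitement, where, are, you]
    }
}
```

Because every token is classified in a single pass, the position of the site: operator no longer matters, and a query without it simply leaves `site` as null.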