【Question title】: Separate list elements with commas while scraping from the web
【Posted】: 2020-10-11 03:55:17
【Question description】:

I am scraping data from a web page. One div contains li elements, and on the page they are rendered like this:

 Job Description:
• Developing application programming interfaces (APIs) to support mobile functionality
• Keeping up to date with the terminology, concepts and best practices for coding mobile apps
• Using and adapting existing web applications for apps
• Working closely with colleagues to constantly innovate app functionality and design

This is the part of my scraping code that handles this section (job and jobTitle are JSON arrays):

Elements ele3=doc.select("div.job-sections div[itemprop=description] section#st-jobDescription");
for (Element element3 : ele3.select("div[itemprop=responsibilities] ul")) {
     String job_description=element3.select("li").text();
     job.put(jobTitle.put(new JSONObject().put("description",job_description)));
}

The output looks like this:

{"description" : "Developing application programming interfaces (APIs) to support mobile functionality Keeping up to date with the terminology, concepts and best practices for coding mobile apps Using and adapting existing web applications for apps Working closely with colleagues to constantly innovate app functionality and design"}

But I want to separate each li element with commas, so the output should look like this:

{"description" : ["Developing application programming interfaces (APIs) to support mobile functionality", "Keeping up to date with the terminology, concepts and best practices for coding mobile apps", "Using and adapting existing web applications for apps", "Working closely with colleagues to constantly innovate app functionality and design"]}

How can I solve this? Can anyone help? Thanks.

【Comments】:

    Tags: java json web-scraping jsoup org.json


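    【Solution 1】:

    You need to change how you store the job responsibilities. You are creating a JSON object holding a single string, whereas the type you want is a JSON array with one entry per li element.

    // As a JSON array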

    Elements responsibilityElements = ele3.select("div[itemprop=responsibilities] ul li");
    
    JSONArray responsibilities = new JSONArray();
    
    for (Element responsibilityElement : responsibilityElements) {
         String description = responsibilityElement.text();
    
         responsibilities.put(description);
    }
    
    job.put("description", responsibilities);
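The same idea can be written as a self-contained sketch. It assumes jsoup and org.json are on the classpath; the class name `ResponsibilitiesExample`, the method `describe`, and the sample HTML are hypothetical and only serve the illustration. `Elements.text()` concatenates the text of every matched element into one string, which is why the original output was a single line, while `Elements.eachText()` returns one string per element.

```java
import java.util.List;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ResponsibilitiesExample {

    // Hypothetical helper: builds {"description": [...]} with one entry per <li>.
    static JSONObject describe(String html) {
        Document doc = Jsoup.parse(html);

        // eachText() yields the text of each matched <li> as a separate list entry.
        List<String> items = doc.select("div[itemprop=responsibilities] ul li").eachText();

        return new JSONObject().put("description", new JSONArray(items));
    }

    public static void main(String[] args) {
        // Made-up markup that mimics the structure described in the question.
        String html = "<div itemprop=\"responsibilities\"><ul>"
                + "<li>Using and adapting existing web applications for apps</li>"
                + "<li>Keeping up to date with the terminology, concepts and best practices</li>"
                + "</ul></div>";

        System.out.println(describe(html).toString(2));
    }
}
```

Because each entry is stored as its own array element, commas inside an individual responsibility stay part of that entry.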
    
    

    // As a single string

    Elements responsibilityElements = doc.select("ul li");
    //        Elements responsibilityElements = ele3.select("div[itemprop=responsibilities] ul li");
    
    List<String> lines = new ArrayList<>();
    
    for (Element responsibilityElement : responsibilityElements) {
        lines.add(responsibilityElement.text());
    }
    
    String description = String.join(", ", lines);
    job.put("description", description);
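One caveat with the single-string form is worth illustrating: some of the list items in the question already contain commas, so a comma-joined string cannot be split back into the original items reliably. The sketch below uses only the Java standard library (Java 9+ for `List.of`); the class name `JoinExample` and the method `joinWithCommas` are hypothetical.

```java
import java.util.List;

public class JoinExample {

    // Joins the items into one string, separated by ", ".
    static String joinWithCommas(List<String> lines) {
        return String.join(", ", lines);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Keeping up to date with the terminology, concepts and best practices",
                "Using and adapting existing web applications for apps");

        String joined = joinWithCommas(lines);
        System.out.println(joined);

        // Two items went in, but splitting on ", " gives three pieces,
        // because the first item contains a comma of its own.
        System.out.println(joined.split(", ").length); // prints 3
    }
}
```

If the consumer of the JSON needs the individual items back, the JSON array form is likely the safer choice.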
    

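    【Discussion】:

    • Hmm, I see what you mean, but the problem is that I can't separate the <li> elements with commas. When I store them, the output shows all the <li> elements on one line. I want them separated by commas; how do I do that?
    • @aarons Your expected output format is unclear. In the question you indicated that you want the contents of the lis in a JSON array. Please provide sample output so that I can help.
    • Thanks for your help. The current output of my code is shown in the question after the sentence about the output, and my expected output is shown after the sentence "But I want to separate each li element with commas, so the output should look like this". I think that is clear.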