【Question title】: Java library for HTML to Java (POJO) conversion
【Posted】: 2013-01-26 12:29:09
【Question】:

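With the Apache Velocity API, we can combine Java objects (Lists, POJOs, etc.) with an (HTML) template and produce the (HTML) output.

Is there any Java API that can help reverse-engineer this? The input to this API would be the HTML output and the template that was used; the output should be the data (in Java/XML form) that was used to generate that HTML.

I am aware of the HttpUnit API, but it only lets me extract HTML elements (such as tables). I am looking for something that extracts data based on a template.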
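
To make the request concrete, here is a minimal sketch of what "reversing" a template could look like, using only `java.util.regex`. It is not an existing library: the class `TemplateReverser` and its `extract` method are hypothetical names, and it only handles flat `$name` / `${name}` references, with no `#foreach`, conditionals, or nested properties.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns a flat template into a regex and reads the values back.
public class TemplateReverser {

    // Matches simple Velocity-style references such as $name or ${name}.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{?([A-Za-z][A-Za-z0-9]*)\\}?");

    /** Returns placeholder name -> value, or an empty map if the output does not fit the template. */
    public static Map<String, String> extract(String template, String output) {
        StringBuilder regex = new StringBuilder();
        List<String> names = new ArrayList<>();
        Matcher placeholder = PLACEHOLDER.matcher(template);
        int last = 0;
        while (placeholder.find()) {
            // Literal template text must appear unchanged in the output.
            regex.append(Pattern.quote(template.substring(last, placeholder.start())));
            // Each placeholder becomes a reluctant capturing group.
            regex.append("(.*?)");
            names.add(placeholder.group(1));
            last = placeholder.end();
        }
        regex.append(Pattern.quote(template.substring(last)));

        Map<String, String> values = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(output);
        if (matcher.matches()) {
            for (int i = 0; i < names.size(); i++) {
                values.put(names.get(i), matcher.group(i + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String template = "<h1>$name</h1><p>in <span>${city}</span></p>";
        String output = "<h1>A la bonne Franquette</h1><p>in <span>London</span></p>";
        System.out.println(extract(template, output));
    }
}
```

This approach breaks down as soon as the template contains loops or two adjacent placeholders with no literal text between them, which is why a selector-based mapping is usually more robust than literal template inversion.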

【Comments】:

    Tags: java html reverse-engineering templating


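    【Answer 1】:

    You can use Google protobuf to convert between different kinds of messages, and templates are easy to define. I use JSON.parse() to create JavaScript objects; on the Java side, you can use protobuf to convert the JSON into Java objects.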

    1. http://code.google.com/p/protobuf/
    2. http://code.google.com/p/protobuf-java-format/

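    【Comments】:

      【Answer 2】:

      My answer will probably not be useful to the author of this question (I am 5 years late, so the timing is not right, I guess), but since this was the first result I found on Google when typing "HTML to POJO", I think it may be useful to the many other developers who come across this answer.

      Today, I just released (on behalf of my company) a complete HTML to POJO framework that you can use to map HTML to any POJO class with just a few annotations. The library itself is quite handy and comes with many other features while remaining very pluggable. You can have a look at it here: https://github.com/whimtrip/jwht-htmltopojo

      How to use: basics

      Imagine we need to parse the following HTML page: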

      <html>
          <head>
              <title>A Simple HTML Document</title>
          </head>
          <body>
              <div class="restaurant">
                  <h1>A la bonne Franquette</h1>
                  <p>French cuisine restaurant for gourmet of fellow french people</p>
                  <div class="location">
                      <p>in <span>London</span></p>
                  </div>
                  <p>Restaurant n*18,190. Ranked 113 out of 1,550 restaurants</p>  
                  <div class="meals">
                      <div class="meal">
                          <p>Veal Cutlet</p>
                          <p rating-color="green">4.5/5 stars</p>
                          <p>Chef Mr. Frenchie</p>
                      </div>
      
                      <div class="meal">
                          <p>Ratatouille</p>
                          <p rating-color="orange">3.6/5 stars</p>
                          <p>Chef Mr. Frenchie and Mme. French-Cuisine</p>
                      </div>
      
                  </div> 
              </div>    
          </body>
      </html>
      

      Let's create the POJOs we want to map it to:

      public class Restaurant {
      
          @Selector( value = "div.restaurant > h1")
          private String name;
      
          @Selector( value = "div.restaurant > p:nth-child(2)")
          private String description;
      
          @Selector( value = "div.restaurant > div:nth-child(3) > p > span")    
          private String location;    
      
          @Selector( 
              value = "div.restaurant > p:nth-child(4)",
              format = "^Restaurant n\\*([0-9,]+). Ranked ([0-9,]+) out of ([0-9,]+) restaurants$",
              indexForRegexPattern = 1,
              useDeserializer = true,
              deserializer = ReplacerDeserializer.class,
              preConvert = true,
              postConvert = false
          )
          // so that the number becomes a valid number as they are shown in this format : 18,190
          @ReplaceWith(value = ",", with = "")
          private Long id;
      
          @Selector( 
              value = "div.restaurant > p:nth-child(4)",
              format = "^Restaurant n\\*([0-9,]+). Ranked ([0-9,]+) out of ([0-9,]+) restaurants$",
              // This time, we want the second regex group and not the first one anymore
              indexForRegexPattern = 2,
              useDeserializer = true,
              deserializer = ReplacerDeserializer.class,
              preConvert = true,
              postConvert = false
          )
          // so that the number becomes a valid number as they are shown in this format : 18,190
          @ReplaceWith(value = ",", with = "")
          private Integer rank;
      
          @Selector(value = ".meal")    
          private List<Meal> meals;
      
          // getters and setters
      
      }
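
To see what the `format`, `indexForRegexPattern`, and `@ReplaceWith` settings amount to for `id` and `rank`, the following sketch reproduces the same steps with plain `java.util.regex`. It is an illustration of the described behaviour, not the library's internal code; the class `RegexFieldDemo` and its `extractGroup` method are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone demo; it does not use jwht-htmltopojo.
public class RegexFieldDemo {

    // Same pattern as the "format" attribute above ("\\*" is a literal asterisk).
    private static final Pattern FORMAT = Pattern.compile(
            "^Restaurant n\\*([0-9,]+). Ranked ([0-9,]+) out of ([0-9,]+) restaurants$");

    /** Picks one capture group and strips the thousands separators from it. */
    public static String extractGroup(String text, int groupIndex) {
        Matcher matcher = FORMAT.matcher(text);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + text);
        }
        // Equivalent in spirit to @ReplaceWith(value = ",", with = "").
        return matcher.group(groupIndex).replace(",", "");
    }

    public static void main(String[] args) {
        String text = "Restaurant n*18,190. Ranked 113 out of 1,550 restaurants";
        long id = Long.parseLong(extractGroup(text, 1));    // first group, like indexForRegexPattern = 1
        int rank = Integer.parseInt(extractGroup(text, 2)); // second group, like indexForRegexPattern = 2
        System.out.println(id + " " + rank); // prints "18190 113"
    }
}
```

Removing the comma before conversion matters because `Long.parseLong("18,190")` would throw a `NumberFormatException`.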
      

      And now the Meal class:

      public class Meal {
      
          @Selector(value = "p:nth-child(1)")
          private String name;
      
          @Selector(
              value = "p:nth-child(2)",
              format = "^([0-9.]+)/5 stars$",
              indexForRegexPattern = 1
          )
          private Float stars;
      
          @Selector(
              value = "p:nth-child(2)",
              // rating-color custom attribute can be used as well
              attr = "rating-color"
          )
          private String ratingColor;
      
          @Selector(
              value = "p:nth-child(3)"
          )
          private String chefs;
      
          // getters and setters.
      }
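
The `stars` field relies on the rating text always looking like `4.5/5 stars`. A small sketch of that conversion, written defensively so that non-matching text yields an empty result instead of an exception, could look like this. How the library itself reacts to a non-matching value is not covered here; `StarsParser` is a hypothetical name used only for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone demo; it does not use jwht-htmltopojo.
public class StarsParser {

    // Same pattern as the "format" attribute on the stars field.
    private static final Pattern STARS = Pattern.compile("^([0-9.]+)/5 stars$");

    /** Returns the rating if the text matches the pattern and is a valid float. */
    public static Optional<Float> parse(String text) {
        Matcher matcher = STARS.matcher(text.trim());
        if (!matcher.matches()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Float.parseFloat(matcher.group(1)));
        } catch (NumberFormatException e) {
            // [0-9.]+ also accepts strings such as "1.2.3", which are not numbers.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("4.5/5 stars"));   // prints "Optional[4.5]"
        System.out.println(parse("not rated yet")); // prints "Optional.empty"
    }
}
```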
      

      We provide more explanations of the code above on our GitHub page.

      For now, let's see how to scrape this page.

      private static final String MY_HTML_FILE = "my-html-file.html";
      
      public static void main(String[] args) throws IOException {
      
      
          HtmlToPojoEngine htmlToPojoEngine = HtmlToPojoEngine.create();
      
          HtmlAdapter<Restaurant> adapter = htmlToPojoEngine.adapter(Restaurant.class);
      
          // If there were several restaurants on the same page,
          // you would need to create a parent POJO containing
          // a list of Restaurants as shown with the meals here
          Restaurant restaurant = adapter.fromHtml(getHtmlBody());
      
          // That's it, do some magic now!
      
      }
      
      
      private static String getHtmlBody() throws IOException {
          byte[] encoded = Files.readAllBytes(Paths.get(MY_HTML_FILE));
          return new String(encoded, Charset.forName("UTF-8"));
      
      }
      

      Another short example can be found here

      Hope this helps someone out there!

      【Comments】:
