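【Posted】:2019-09-12 08:10:54
【Question】:
I'm using the Fuzi Swift library to parse this Hacker News page.
I only need to extract the description at the top of the post, which contains the main post details (i.e. "Maybe HN can help solve this little mystery......low.com/a/55711457/2251982").
Screenshot attached:
Here is my XPath code: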
print("Description: \(String(describing: document.xpath("//*[@id=\"hnmain\"]//tr[2]/td/table[1]//tr[4]/td").first?.rawXML))")
But my output keeps including both tables, i.e. the post at the top as well as the comments table:
Description: Optional("<td>Maybe HN can help solve this little mystery. The default font sizes in HTML have, since at least 1998 [1], been .83em and .67em for h5 and h6, respectively, making them smaller than normal text by default (1em). This leads to the bizarre situation that without any styling, the h5 and h6 headings are smaller than the text they head!<p>Does anyone know why headings were made smaller than normal text? I bet the answer is buried in some mailing list from the mid 90s, but so far my searches have not been fruitful. Perhaps someone here was around at the time of, or was even involved in, this decision.<p>[1] https://stackoverflow.com/a/55711457/2251982</p></p>\n <tr style=\"height:10px\"/><tr><td colspan=\"2\"/><td>\n <form method=\"post\" action=\"comment\"><input type=\"hidden\" name=\"parent\" value=\"19722704\"><input type=\"hidden\" name=\"goto\" value=\"item?id=19722704\"><input type=\"hidden\" name=\"hmac\" value=\"78883e7dccb14e8eed04ba1f3b825085ecd4c545\"><textarea name=\"text\" rows=\"6\" cols=\"60\"/>\n <br><br><input type=\"submit\" value=\"add comment\"/>\n </br></br>\n </input><br><br>\n <table border=\"0\" class=\"comment-tree\">\n <tr class=\"athing comtr \" id=\"19725000\"><td>\n <table border=\"0\"> <tr> <td class=\"ind\"><img src=\"s.gif\" height=\"1\" width=
Why is the second table being selected as well?
【Comments】:
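One possible reading of the output in the question: the XPath seems to match a single `td`, but in the parsed tree the comment form and the comment table sit inside that `td`. This is probably because the parser repaired the page's unclosed tags by nesting the rows that follow. If that is what is happening, `rawXML` is serializing one element that simply contains more than expected. Below is a crude sketch of a workaround that cuts the serialized string at the first nested `<tr`. It uses only Foundation. `leadingFragment` is a hypothetical helper, not part of Fuzi, and the sample string is a shortened stand-in for the real output.

```swift
import Foundation

// Hypothetical helper (not part of Fuzi): keep only the markup that precedes
// the first nested row in a serialized <td>.
func leadingFragment(of rawXML: String, before marker: String = "<tr") -> String {
    // If no nested row is present, return the fragment unchanged.
    guard let cut = rawXML.range(of: marker) else { return rawXML }
    return String(rawXML[..<cut.lowerBound])
        .trimmingCharacters(in: .whitespacesAndNewlines)
}

// Shortened stand-in for the output shown in the question (assumed shape:
// description first, then rows nested inside the same <td>).
let raw = "<td>Maybe HN can help.<p>Does anyone know?</p>\n <tr style=\"height:10px\"/><tr><td>comments</td></tr></td>"
print(leadingFragment(of: raw))
```

The result still begins with the opening `<td>` and has no closing tag, so it is only suitable for display or further text clean-up. Walking the child nodes of the matched element and stopping at the first `tr` would be more robust, if the library version in use exposes them.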
Tags: html parsing xpath web-scraping xml-parsing