【Posted】: 2019-12-26 07:22:38
【Question】:
<html>
<head>
</head>
<body>
<div style="width: 100%;"> This question already
</div>
<div id="player"> hi crawler4j </div>
<script>
player = new Clappr.Player({source: "http://123.30.215.65/hls/4545780bfa790819/5/3/d836ad614748cdab11c9df291254cf836f21144da20bf08142455a8735b328ca/dnR2MQ==_m.m3u8",
parentId: '#player',
width: '100%', height: "100%",
hideMediaControl: true,
autoPlay: true
});
</script>
</body>
</html>
For the example page above, I fetch the HTML like this:
HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
String body = htmlParseData.getHtml();
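crawler4j treats the lines between the <script> </script> tags as text.
I want to remove everything between the <script> </script> tags from the body variable and then call getText().
Can you help me?
The output I want is: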
This question already
hi crawler4j
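One possible way to get there, as a rough sketch rather than a definitive solution: strip the <script> elements from the string returned by getHtml() yourself, then reduce what is left to text. The class name ScriptStripper and its two methods are hypothetical helpers, not part of crawler4j, and the stream URL in main is a shortened placeholder. The sketch uses only java.util.regex, which is fragile on malformed HTML and does not decode entities such as &amp;amp;; a real HTML parser (for example jsoup) would likely be more robust.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of crawler4j.
final class ScriptStripper {
    // A whole <script ...>...</script> element, case-insensitive, spanning lines.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // HTML comments, which may also span lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the HTML with every <script> element (tags and content) removed.
    static String stripScripts(String html) {
        return SCRIPT.matcher(html).replaceAll("");
    }

    // Returns the text left after removing scripts, comments, and tags,
    // one non-empty trimmed line per text fragment.
    static String visibleText(String html) {
        String noScripts = stripScripts(html);
        String noComments = COMMENT.matcher(noScripts).replaceAll(" ");
        String noTags = TAG.matcher(noComments).replaceAll("\n");
        StringBuilder out = new StringBuilder();
        for (String line : noTags.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(trimmed);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In a crawler4j visit() method this string would come from
        // htmlParseData.getHtml(); here it is a shortened copy of the example page.
        String body = "<html><head></head><body>\n"
                + "<div style=\"width: 100%;\"> This question already\n</div>\n"
                + "<div id=\"player\"> hi crawler4j </div>\n"
                + "<script>\n"
                + "player = new Clappr.Player({source: \"http://example.com/stream.m3u8\",\n"
                + "parentId: '#player'});\n"
                + "</script>\n"
                + "</body></html>";
        System.out.println(visibleText(body));
    }
}
```

Run against the example page, this should print the two lines listed above. If you only need the cleaned markup and want to extract the text some other way, stripScripts(body) alone returns the HTML without the <script> block.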
【Discussion】:
Tags: web-crawler html-parsing crawler4j