How to use CasperJS to download the same page I see in the browser (answers)

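[Question title]: How to use casperjs to download a page same as the one I see in the browser
[Posted]: 2013-11-07 09:20:09
[Question description]:

I am not familiar with JS or CoffeeScript. I plan to download pages with casperjs and parse them with Python. However, the page I download differs from the one I see in the browser: some parts of it have not loaded yet when the page is saved. My guess is that the onload callbacks have not been executed. What should I do to download a page that matches what I see in the browser? Many thanks!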

My code (CoffeeScript):

urls =
  'jd' : 'http://list.jd.com/652-654-831-0-0-0-0-0-0-0-1-1-1-1-1-72-4137-33.html'

casper = require("casper").create()

process = (urls) ->
  casper.start "", ->
    @echo "begin to work"
  for name, url of urls
    casper.thenOpen url, ->
      @echo @download url, "#{name}.html"

process(urls)

casper.run()

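[Question comments]:

Maybe you can skip the intermediate step and do the screen scraping directly in Python: stackoverflow.com/questions/5272338/…
Thanks. It looks like I found the cause: neither the browser nor casperjs knows when the page has finished loading. The download() method saves the page once all the HTML text has been downloaded, regardless of whether the page's JS has executed.

Tags: javascript download coffeescript phantomjs casperjs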

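[Solution 1]:

As you have found out, casper.download() actually downloads the file from the URL again, so the saved copy does not reflect what scripts have done to the page. Since you want the current page source, you can use casper.getHTML(). To write the page content string to a file, you can use the file system module that PhantomJS provides. It has an fs.write() function.

Putting it together, it should look like this in JavaScript: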

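var fs = require("fs");
var casper = require("casper").create();

// `urls` is the name -> URL map from the question.
casper.start();
Object.keys(urls).forEach(function (name) {
    // forEach gives each step its own `name`; with a plain for-in loop
    // every queued callback would see the last key.
    casper.thenOpen(urls[name], function () {
        this.echo("download " + name);
        // getHTML() returns the current DOM rather than the raw response.
        fs.write(name + ".html", this.getHTML(), "w");
    });
});
casper.run();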

Or like this in CoffeeScript:

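casper = require("casper").create()
fs = require("fs")

# `urls` is the name -> URL map from the question.
casper.start "", ->
    @echo "begin to work"
for name, url of urls
    # `do` captures name and url per iteration; without it every queued
    # step would see the last pair.
    do (name, url) ->
        casper.thenOpen url, ->
            @echo "download " + name
            fs.write "#{name}.html", @getHTML(), "w"

casper.run()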
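
If parts of the page are still filled in by scripts after it opens, as the asker's comment suggests, getHTML() may run too early. One option is to wait for an element that only appears once the content has loaded. The following is a minimal sketch and not part of the original answer: the helper name queueSave, the selector, and the timeout are all assumptions chosen for illustration. casper and fs are passed in as arguments so the helper can be called from an existing CasperJS script.

```javascript
// Sketch: queue a step that waits for late-loaded content, then saves the DOM.
// `queueSave`, `selector`, and `timeoutMs` are hypothetical names for illustration.
function queueSave(casper, fs, name, url, selector, timeoutMs) {
    casper.thenOpen(url, function () {
        // waitForSelector polls until the element exists or the timeout expires.
        casper.waitForSelector(selector, function () {
            fs.write(name + ".html", casper.getHTML(), "w");
        }, function () {
            // On timeout, report it and save whatever has loaded so far.
            casper.echo("timed out waiting for " + selector + " on " + url);
            fs.write(name + ".html", casper.getHTML(), "w");
        }, timeoutMs);
    });
}
```

Inside a CasperJS script this would be called once per entry between casper.start() and casper.run(), for example queueSave(casper, fs, name, urls[name], "#some-list", 10000), where "#some-list" stands in for an element that exists only after the page's scripts have run.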

[Comments]: