How to take a webpage screenshot? (Answers)

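【Question title】: How to take a webpage screenshot?
【Posted】: 2011-10-25 08:14:03
【Question description】:

I am using the code below, but the generated image is corrupted. I suspect the rendering options are to blame. Does anyone know what is going on?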

package webpageprinter;

import java.net.URL;
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeEvent;
import javax.swing.text.html.*;
import java.awt.*;
import javax.swing.*;
import java.io.*;

public class WebPagePrinter {
    private BufferedImage image = null;

    public BufferedImage Download(String webpageurl) {
        try {
            URL url = new URL(webpageurl);
            final JEditorPane jep = new JEditorPane();
            jep.setContentType("text/html");
            ((HTMLDocument) jep.getDocument()).setBase(url);
            jep.setEditable(false);
            jep.setBounds(0, 0, 1024, 768);
            jep.addPropertyChangeListener("page", new PropertyChangeListener() {
                @Override
                public void propertyChange(PropertyChangeEvent e) {
                    try {
                        image = new BufferedImage(1024, 768, BufferedImage.TYPE_INT_RGB);
                        Graphics g = image.getGraphics();
                        Graphics2D graphics = (Graphics2D) g;
                        graphics.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
                        jep.paint(graphics);
                        ImageIO.write(image, "png", new File("C:/webpage.png"));
                    } catch (Exception re) {
                        re.printStackTrace();
                    }
                }
            });
            jep.setPage(url);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return image;
    }

    public static void main(String[] args) {
        new WebPagePrinter().Download("http://www.google.com");
    }
}

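【Question comments】:

"The generated image is somehow corrupted": a picture paints a thousand words. Rather than writing those thousand words, how about uploading an image for us to see? Even better if you can get it to break at 400x300px (rather than 1024x768px).
[See my answer using Selenium with a real browser][1] [1]: stackoverflow.com/questions/1504034/…

Tags: screenshot java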

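【Solution 1】:

I think the code has 3 problems and one fragility:

Problems

1. JEditorPane was never intended to be a browser.
2. setPage(URL) loads asynchronously. You need to add a listener to determine when the page has loaded.
3. You may find that some sites automatically refuse connections from Java clients.

Fragility

The call to setBounds() is fragile by nature. Use layouts.

Image at 400x600

Looking at this image, though, point 3 does not seem to apply here and point 2 is not the problem. It comes down to point 1: JEditorPane was never intended to be used as a browsing component. Those random characters at the bottom are JavaScript, which JEP not only does not execute but also displays incorrectly in the page.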
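
The asynchronous-loading issue (setPage() returns before the page has loaded, so a caller that returns right away gets no image) can be handled by blocking until the "page" property has fired. The following is only a rough sketch of that idea: the class name PageSnapshot, the render helper, and the temporary test file are invented for this example, and it does nothing about JEditorPane's limited renderer.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import javax.swing.JEditorPane;

public class PageSnapshot {

    /**
     * Loads the URL into a JEditorPane and paints it into an image.
     * Returns null if no image was produced within the timeout.
     * (Hypothetical helper; like the question's code, it creates the
     * pane off the event dispatch thread for brevity.)
     */
    public static BufferedImage render(URL url, int width, int height, long timeoutSeconds)
            throws IOException, InterruptedException {
        JEditorPane pane = new JEditorPane();
        pane.setContentType("text/html");
        pane.setEditable(false);
        pane.setSize(width, height);

        AtomicReference<BufferedImage> result = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);

        // The "page" property changes once setPage() has finished loading the document.
        pane.addPropertyChangeListener("page", event -> {
            try {
                BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
                Graphics2D g = image.createGraphics();
                try {
                    pane.paint(g);
                } finally {
                    g.dispose();
                }
                result.set(image);
            } finally {
                done.countDown(); // release the caller even if painting failed
            }
        });

        pane.setPage(url); // returns immediately; loading continues in the background

        // Wait for the listener instead of returning a still-null image.
        return done.await(timeoutSeconds, TimeUnit.SECONDS) ? result.get() : null;
    }

    public static void main(String[] args) throws Exception {
        // A local file keeps the demo independent of the network.
        Path html = Files.createTempFile("snapshot", ".html");
        Files.write(html, "<html><body><h1>Hello</h1></body></html>".getBytes(StandardCharsets.UTF_8));
        BufferedImage image = render(html.toUri().toURL(), 400, 300, 10);
        System.out.println(image == null ? "no image" : image.getWidth() + "x" + image.getHeight());
    }
}
```

Images referenced by the page may still be loading when "page" fires, so this only guarantees that the HTML itself has been read.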

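【Comments】:

Thanks for uploading the image.
My goal with this code is to capture, from time to time, a web page that is not open on screen. I want to show the image on a big screen in my office, and I already have the HTML for that. I still need to implement the timer in the Java code, but do you have any suggestion for what I should use instead of JEditorPane?
Sorry, I have not looked into it closely.

【Solution 2】:

You can use the Java Robot class (API here) to take a screenshot of the entire screen.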

import java.awt.AWTException;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

public class RobotExp {

    public static void main(String[] args) {

        try {

            Robot robot = new Robot();
            // Capture the screen shot of the area of the screen defined by the rectangle
            BufferedImage bi=robot.createScreenCapture(new Rectangle(Toolkit.getDefaultToolkit().getScreenSize()));
            ImageIO.write(bi, "jpg", new File("C:/imageTest.jpg"));

        } catch (AWTException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

This example was found here; I made a few modifications.
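
If the capture has to repeat on a timer, one option is a ScheduledExecutorService from the standard library. The sketch below is only an illustration: the class name PeriodicCapture, the helper every, and the placeholder task are made up for this example. A real task would call robot.createScreenCapture(...) and ImageIO.write(...) as in the class above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PeriodicCapture {

    /**
     * Runs the task once immediately and then every periodMillis,
     * until the returned scheduler is shut down. (Hypothetical helper.)
     */
    public static ScheduledExecutorService every(long periodMillis, Runnable task) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all later runs.
                e.printStackTrace();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger shots = new AtomicInteger();
        // Placeholder task standing in for the actual screen capture.
        ScheduledExecutorService scheduler =
                every(200, () -> System.out.println("capture #" + shots.incrementAndGet()));
        Thread.sleep(1000);
        scheduler.shutdownNow();
    }
}
```

Catching exceptions inside the scheduled task matters here, because scheduleAtFixedRate suppresses subsequent executions once a run throws.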

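【Comments】:

Yes, I have already tried that, but my aim is to capture a web page that is not open on screen. Later I want to improve the code so it takes the screenshot automatically from time to time. Thanks anyway!

【Solution 3】:

Your problem is that you are rendering the web page with Java's JEditorPane, whose HTML rendering engine is very limited. It simply cannot display more complex web pages as well as a modern browser does.

If you need to produce properly rendered screenshots of complex web pages from Java, the best approach is probably to use Selenium to control a real browser, such as Firefox.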

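【Comments】:

【Solution 4】:

The javadoc states:

HTML text. The kit used in this case is the class javax.swing.text.html.HTMLEditorKit which provides HTML 3.2 support.

That probably explains why the page looks somewhat broken, since most pages today use HTML 4, HTML5, or XHTML.

Here is an SO post about Java browser components: Best Java/Swing browser component?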

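【Comments】:

【Solution 5】:

Take a look at flying-saucer. It works very well for generating images and PDFs from HTML pages.

【Comments】:

Their page specifies XHTML; have you tried it with HTML 4 or 5?
When the HTML is invalid, I combine it with JSoup (jsoup.org) to get XHTML - Jsoup.parse(loadedHTML)
Caveat: it has no JS support at all.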

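【Solution 6】:

I got the best results using Selenium WebDriver with a virtual framebuffer (Xvfb) and the Firefox binary. This was tested under Ubuntu. You need xvfb and Firefox installed. The advantage: you are running a real browser, so the screenshot looks exactly like one taken in a real browser.

First install Firefox and the virtual framebuffer:

aptitude install xvfb firefox

Compile and run this class, then open /tmp/screenshot.png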

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxBinary;
import org.openqa.selenium.firefox.FirefoxDriver;

public class CaptureScreenshotTest
{
    private static int DISPLAY_NUMBER=99;
    private static String XVFB="/usr/bin/Xvfb";
    private static String XVFB_COMMAND= XVFB + " :" + DISPLAY_NUMBER;
    private static String URL="http://www.google.com/";
    private static String RESULT_FILENAME="/tmp/screenshot.png";

    public static void main ( String[] args ) throws IOException
    {
        Process p = Runtime.getRuntime().exec(XVFB_COMMAND);
        FirefoxBinary firefox = new FirefoxBinary();
        firefox.setEnvironmentProperty("DISPLAY", ":" + DISPLAY_NUMBER);
        WebDriver driver = new FirefoxDriver(firefox, null);
        driver.get(URL);
        File scrFile = ( (TakesScreenshot) driver ).getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(scrFile, new File(RESULT_FILENAME));
        driver.quit(); // quit() ends the whole browser session, not just the current window
        p.destroy();
    }
}
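
One thing to watch in a setup like this: Runtime.exec returns as soon as the Xvfb process has been spawned, so Firefox may be started before the display is ready. Below is a minimal sketch of waiting for it, assuming Xvfb creates its Unix socket at the usual /tmp/.X11-unix/X<display> path (worth verifying on your own system). XvfbWait and waitForPath are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

public class XvfbWait {

    /** Polls until the path exists or the timeout elapses; returns whether it exists. */
    public static boolean waitForPath(Path path, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (Files.exists(path)) {
                return true;
            }
            Thread.sleep(50);
        }
        return Files.exists(path);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        int display = 99;
        Process xvfb = new ProcessBuilder("/usr/bin/Xvfb", ":" + display).inheritIO().start();
        try {
            // Assumed socket location for display :99.
            Path socket = Paths.get("/tmp/.X11-unix/X" + display);
            if (!waitForPath(socket, 5000)) {
                throw new IllegalStateException("Xvfb did not come up on display :" + display);
            }
            System.out.println("Xvfb is up; start the browser with DISPLAY=:" + display);
        } finally {
            xvfb.destroy();
        }
    }
}
```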

【Comments】: