Unable to take a screenshot using HtmlUnitDriver [Selenium WebDriver java]: answers

【Question title】: Not able to take screenShot using HtmlUnitDriver [Selenium WebDriver java]
【Posted】: 2016-03-28 01:32:10
【Question description】:

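I want to take a screenshot of a page using HtmlUnitDriver. I came across this link, in which the author built a custom HtmlUnit driver that can take screenshots. Unfortunately, when I implemented it I got an exception:

"Exception in thread "main" java.lang.ClassCastException: [B cannot be cast to java.io.File at Test.main(Test.java:39)"

My code is as follows: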

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.WebDriver;
import com.gargoylesoftware.htmlunit.BrowserVersion;

public class Test extends ScreenCaptureHtmlUnitDriver {

    public static void main(String[] args) throws InterruptedException, IOException {

        WebDriver driver = new ScreenCaptureHtmlUnitDriver(BrowserVersion.FIREFOX_38);
        driver.get("https://www.google.com/?gws_rd=ssl");
        try{
        File scrFile = ((ScreenCaptureHtmlUnitDriver) driver).getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(scrFile, new File("D:\\TEMP.PNG"));
        }catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The HtmlUnit driver I am using (the one from the link) is this:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.apache.commons.io.FilenameUtils;
import org.apache.commons.io.IOUtils;
import org.openqa.selenium.Capabilities;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriverException;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
import org.openqa.selenium.internal.Base64Encoder;
import org.openqa.selenium.remote.CapabilityType;
import org.openqa.selenium.remote.DesiredCapabilities;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebWindow;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ScreenCaptureHtmlUnitDriver extends HtmlUnitDriver implements TakesScreenshot {

private static Map<String, byte[]> imagesCache = Collections.synchronizedMap(new HashMap<String, byte[]>());

private static Map<String, String> cssjsCache = Collections.synchronizedMap(new HashMap<String, String>());

// http://stackoverflow.com/questions/4652777/java-regex-to-get-the-urls-from-css
private final static Pattern cssUrlPattern = Pattern.compile("background(-image)?[\\s]*:[^url]*url[\\s]*\\([\\s]*([^\\)]*)[\\s]*\\)[\\s]*");// ?<url>

public ScreenCaptureHtmlUnitDriver() {
    super();
}

public ScreenCaptureHtmlUnitDriver(boolean enableJavascript) {
    super(enableJavascript);
}

public ScreenCaptureHtmlUnitDriver(Capabilities capabilities) {
    super(capabilities);
}

public ScreenCaptureHtmlUnitDriver(BrowserVersion version) {
    super(version);
    DesiredCapabilities var = ((DesiredCapabilities) getCapabilities());
    var.setCapability(CapabilityType.TAKES_SCREENSHOT, true);
}

//@Override
@SuppressWarnings("unchecked")
public <X> X getScreenshotAs(OutputType<X> target) throws WebDriverException {
    byte[] archive = new byte[0];
    try {
        archive = downloadCssAndImages(getWebClient(), (HtmlPage) getCurrentWindow().getEnclosedPage());
    } catch (Exception e) {
        // Exception is ignored, so a failed download leaves the archive empty.
    }
    if(target.equals(OutputType.BASE64)){
        return target.convertFromBase64Png(new Base64Encoder().encode(archive));
    }
    if(target.equals(OutputType.BYTES)){
        return (X) archive;
    }
    return (X) archive;
}

// http://stackoverflow.com/questions/2244272/how-can-i-tell-htmlunits-webclient-to-download-images-and-css
protected byte[] downloadCssAndImages(WebClient webClient, HtmlPage page) throws Exception {
    WebWindow currentWindow = webClient.getCurrentWindow();
    Map<String, String> urlMapping = new HashMap<String, String>();
    Map<String, byte[]> files = new HashMap<String, byte[]>();
    WebWindow window = null;
    try {
        window = webClient.getWebWindowByName(page.getUrl().toString()+"_screenshot");
        webClient.getPage(window, new WebRequest(page.getUrl()));
    } catch (Exception e) {
        window = webClient.openWindow(page.getUrl(), page.getUrl().toString()+"_screenshot");
    }

    String xPathExpression = "//*[name() = 'img' or name() = 'link' and (@type = 'text/css' or @type = 'image/x-icon') or  @type = 'text/javascript']";
    List<?> resultList = page.getByXPath(xPathExpression);

    Iterator<?> i = resultList.iterator();
    while (i.hasNext()) {
        try {
            HtmlElement el = (HtmlElement) i.next();
            String resourceSourcePath = el.getAttribute("src").equals("") ? el.getAttribute("href") : el
                    .getAttribute("src");
            if (resourceSourcePath == null || resourceSourcePath.equals(""))
                continue;
            URL resourceRemoteLink = page.getFullyQualifiedUrl(resourceSourcePath);
            String resourceLocalPath = mapLocalUrl(page, resourceRemoteLink, resourceSourcePath, urlMapping);
            urlMapping.put(resourceSourcePath, resourceLocalPath);
            if (!resourceRemoteLink.toString().endsWith(".css")) {
                byte[] image = downloadImage(webClient, window,  resourceRemoteLink);
                files.put(resourceLocalPath, image);
            } else {
                String css = downloadCss(webClient, window, resourceRemoteLink);
                for (String cssImagePath : getLinksFromCss(css)) {
                    URL cssImagelink = page.getFullyQualifiedUrl(cssImagePath.replace("\"", "").replace("\'", "")
                            .replace(" ", ""));
                    String cssImageLocalPath = mapLocalUrl(page, cssImagelink, cssImagePath, urlMapping);
                    files.put(cssImageLocalPath, downloadImage(webClient, window, cssImagelink));
                }
                files.put(resourceLocalPath, replaceRemoteUrlsWithLocal(css, urlMapping)
                        .replace("resources/", "./").getBytes());
            }
        } catch (Exception e) {
            // Exception is ignored, so the loop moves on to the next element.
        }
    }
    String pagesrc =  replaceRemoteUrlsWithLocal(page.getWebResponse().getContentAsString(), urlMapping);
    files.put("page.html", pagesrc.getBytes());
    webClient.setCurrentWindow(currentWindow);
    return createZip(files);
}

String downloadCss(WebClient webClient, WebWindow window, URL resourceUrl) throws Exception {
    if (cssjsCache.get(resourceUrl.toString()) == null) {
        cssjsCache.put(resourceUrl.toString(), webClient.getPage(window, new  WebRequest(resourceUrl))
                .getWebResponse().getContentAsString());

    }
    return cssjsCache.get(resourceUrl.toString());
}

byte[] downloadImage(WebClient webClient, WebWindow window, URL resourceUrl)  throws Exception {
    if (imagesCache.get(resourceUrl.toString()) == null) {
        imagesCache.put(
                resourceUrl.toString(),
                IOUtils.toByteArray(webClient.getPage(window, new  WebRequest(resourceUrl)).getWebResponse()
                        .getContentAsStream()));
    }
    return imagesCache.get(resourceUrl.toString());
}

public static byte[] createZip(Map<String, byte[]> files) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ZipOutputStream zipfile = new ZipOutputStream(bos);
    Iterator<String> i = files.keySet().iterator();
    String fileName = null;
    ZipEntry zipentry = null;
    while (i.hasNext()) {
        fileName = i.next();
        zipentry = new ZipEntry(fileName);
        zipfile.putNextEntry(zipentry);
        zipfile.write(files.get(fileName));
    }
    zipfile.close();
    return bos.toByteArray();
}

List<String> getLinksFromCss(String css) {
    List<String> result = new LinkedList<String>();
    Matcher m = cssUrlPattern.matcher(css);
    while (m.find()) { // find next match
        result.add( m.group(2));
    }
    return result;
}

String replaceRemoteUrlsWithLocal(String source, Map<String, String> replacement) {
    for (String object : replacement.keySet()) {
        // background:url(http://org.com/images/image.gif)
        source = source.replace(object, replacement.get(object));
    }
    return source;
}

String mapLocalUrl(HtmlPage page, URL link, String path, Map<String, String>  replacementToAdd) throws Exception {
    String resultingFileName = "resources/" +    FilenameUtils.getName(link.getFile());
    replacementToAdd.put(path, resultingFileName);
    return resultingFileName;
}

}

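Update

The code Andrew provided works, but I would like to know whether there is a way to download only selected resources. For example, on this website I only want to download the captcha image matched by "//*[@id='cimage']", because downloading all resources takes a long time. Is there a way to download only specific resources? With the existing code shown below, every resource gets downloaded.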

byte[] zipFileBytes = ((ScreenCaptureHtmlUnitDriver) driver).getScreenshotAs(OutputType.BYTES);
FileUtils.writeByteArrayToFile(new File("D:\\TEMP.PNG"), zipFileBytes);

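【Question comments】:

Can you add the full exception stack trace and tell us the type of "B"?
Hi Florent, I edited the code and added a try/catch with printStackTrace, but the stack trace I get is still "java.lang.ClassCastException: [B cannot be cast to java.io.File at Test.main(Test.java:19)".
Hi, if you do not strictly need HtmlUnitDriver, PhantomJS is a better choice when it comes to screenshots.
I am not familiar with PhantomJS! Can we use PhantomJS together with Selenium WebDriver? The code above is only part of a larger program in which I am trying to take screenshots of web pages through a headless browser.
The reason I want to use the HtmlUnit driver is its speed. PhantomJS is also faster than Chrome and Firefox, but not as fast as the HtmlUnit driver!

Tags: java selenium selenium-webdriver htmlunit-driver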

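【Solution 1】:

The error says the code is trying to cast a byte[] to a File ("[B" is the JVM's internal name for byte[]). It is easy to see why if you strip the unused paths out of getScreenshotAs: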

public <X> X getScreenshotAs(OutputType<X> target) throws WebDriverException {
    byte[] archive = new byte[0];
    try {
        archive = downloadCssAndImages(getWebClient(), (HtmlPage) getCurrentWindow().getEnclosedPage());
    } catch (Exception e) {
    }
    return (X) archive;
}
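As a side note on why the stack trace points at Test.main rather than at getScreenshotAs, here is a minimal, standard-library-only sketch. The class UncheckedCastDemo and its methods capture and fileRequestFails are hypothetical stand-ins invented for illustration and are not part of Selenium or of the driver above. It assumes only what the stripped method shows: a byte[] returned through an unchecked cast to a generic type.

```java
import java.io.File;

class UncheckedCastDemo {

    // Hypothetical stand-in for getScreenshotAs: whatever type the caller asks
    // for, it hands back the byte[] through an unchecked generic cast.
    @SuppressWarnings("unchecked")
    static <X> X capture(Class<X> requested, byte[] archive) {
        // X is erased at runtime, so this cast checks nothing and never throws here.
        return (X) archive;
    }

    // Returns true if asking for a File ends in a ClassCastException at the call site.
    static boolean fileRequestFails(byte[] archive) {
        try {
            // The compiler inserts the real cast to File on this assignment.
            File file = capture(File.class, archive);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] archive = {1, 2, 3};
        // Asking for byte[] works, because the returned object really is a byte[].
        byte[] sameBytes = capture(byte[].class, archive);
        System.out.println("bytes returned: " + sameBytes.length);
        System.out.println("File request fails: " + fileRequestFails(archive));
    }
}
```

The generic method itself never fails; the exception is raised where the caller assigns the result to a File variable, which matches the Test.main frame in the reported stack trace.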

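This getScreenshotAs can never give you a File. OutputType.FILE is not supported, so you have to handle the file output yourself. Fortunately, that is easy. You can change your code to: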

byte[] zipFileBytes = ((ScreenCaptureHtmlUnitDriver) driver).getScreenshotAs(OutputType.BYTES);
FileUtils.writeByteArrayToFile(new File("D:\\TEMP.PNG"), zipFileBytes);

See FileUtils.writeByteArrayToFile() for more information.
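One thing worth keeping in mind: the driver in the question builds its result with createZip, so the bytes are a ZIP archive (page.html plus downloaded resources) rather than PNG image data, and a .zip file name describes them more accurately. The following sketch uses only the Java standard library to write such bytes to disk and list the entries. ArchiveInspector, entryNames and sampleArchive are hypothetical names, and the sample archive merely imitates the shape of the driver's output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiveInspector {

    // Lists the entry names of a ZIP archive held in memory, in archive order.
    static List<String> entryNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Builds a tiny archive that imitates the driver's output (assumed layout).
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bos)) {
            zip.putNextEntry(new ZipEntry("page.html"));
            zip.write("<html></html>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("resources/logo.png"));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // In the real program these bytes would come from getScreenshotAs(OutputType.BYTES).
        byte[] zipBytes = sampleArchive();
        Path out = Files.createTempFile("screenshot", ".zip");
        Files.write(out, zipBytes);
        System.out.println(out + " contains " + entryNames(Files.readAllBytes(out)));
    }
}
```

Opening the written file with any ZIP tool, or with entryNames as above, is a quick way to confirm what the custom driver actually captured.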

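【Comments】:

Thank you very much, Andrew, that works well :) I have added more details to my question, could you take a look? Thanks a lot.

【Solution 2】:

Take a look at this, it may help you: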

File scrFile = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
FileUtils.copyFile(scrFile, new File("C:/Users/home/Desktop/screenshot.png"));// copy it somewhere

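【Comments】:

This does not explain or address the actual error message, misunderstands the question and what the OP is trying to achieve, and will just replace one error with another.
This is the simplest way to take a screenshot of the current web page.
That is not what the question is about. The OP wants to know how to get past the "ClassCastException: [B cannot be cast to java.io.File" in a custom version of HtmlUnitDriver, a driver that normally cannot take screenshots at all.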