【Question title】: Download an image through URL
【Posted】: 2013-01-29 18:13:38
【Question】:

I am trying to download the high-resolution product image from this page:

http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.show-product/American-furniture/3005-75310/spindle-back-side-chair---ebony.cfm

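When I click "Download High-Resolution Photo" on the page, the download works without any trouble. But when I copy the image URL and try to download it from another tab, I get "3005_75310.jpg does not exist".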

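So I looked at the request headers of the first request and set them on my Java URL object, but the file that gets created is empty. Does anyone have an idea?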

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public static void saveImage(String imageUrl, String destinationFile) {
    URL url;
    try {
        url = new URL(imageUrl);
        URLConnection uc = url.openConnection();

        // Headers copied from the browser request that succeeded.
        uc.setRequestProperty("Accept",
                "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        uc.setRequestProperty("Accept-Charset",
                "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
        uc.setRequestProperty("Accept-Encoding", "gzip,deflate,sdch");
        uc.setRequestProperty("Accept-Language", "en-US,en;q=0.8");
        uc.setRequestProperty("Connection", "keep-alive");

        uc.setRequestProperty(
                "Referer",
                "http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.show-product/American-furniture/3005-75310/spindle-back-side-chair---ebony.cfm");

        // Note: url.openStream() opens a new connection of its own;
        // it does not use uc or any of the headers set on it above.
        InputStream is = url.openStream();
        OutputStream os = new FileOutputStream(destinationFile);

        // Copy the response body to the destination file in 2 KB chunks.
        byte[] b = new byte[2048];
        int length;

        while ((length = is.read(b)) != -1) {
            os.write(b, 0, length);
        }

        is.close();
        os.close();
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

}

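【Comments】:

  • What HTTP status code does your request return?
  • Try a more accurate referer, such as the link you already posted.
  • When I copy the URL (hookerfurniture.com/index.cfm/furniture/…) into a new tab I get 200 OK, but the page says "3005_75310.jpg does not exist. Please visit Hookerfurniture.com for product information." So what is happening here? When I click the link from inside their website, the download starts fine.
  • You go to a lot of trouble setting up your URLConnection, and then you use url.openStream() instead of uc.getInputStream()...

Tags: java url network-programming download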


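【Solution 1】:

The referer your request ends up sending is not the one the site's coders expect; checking it is their way of preventing the kind of scraping you are doing. A sample working request: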

$ wget \
  --referer=http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.show-product/American-furniture/3005-75310/spindle-back-side-chair---ebony.cfm \
  http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.photo-download/photo/3005_75310.jpg


Length: unspecified [image/jpeg]
Saving to: `3005_75310.jpg'

    [  <=>                                                                                ] 346,125      949K/s   in 0.4s

2013-01-29 13:24:02 (949 KB/s) - `3005_75310.jpg' saved [346125]
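
For comparison, here is a minimal Java sketch of the same request. The class name `RefererDownload` and the method `download` are made up for illustration, and the Content-Type check is an assumption based on the `[image/jpeg]` shown in the wget output above plus the comment that the error page also comes back as 200 OK. The important part is that the body is read from the same connection the `Referer` header was set on.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RefererDownload {

    /**
     * Downloads imageUrl to dest, sending referer as the Referer header.
     * Returns the number of bytes written.
     */
    public static long download(String imageUrl, String referer, Path dest) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(imageUrl).toURL().openConnection();
        conn.setRequestProperty("Referer", referer);
        try {
            int status = conn.getResponseCode();
            String type = conn.getContentType();
            // Assumption: a successful download is served as image/*; an HTML
            // "does not exist" page may still arrive with status 200.
            if (status != HttpURLConnection.HTTP_OK || type == null || !type.startsWith("image/")) {
                throw new IOException(
                        "Expected an image, got HTTP " + status + " with Content-Type " + type);
            }
            // Read from conn itself, so the Referer header above is the one sent.
            try (InputStream in = conn.getInputStream()) {
                return Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

It would be called with the photo-download URL as `imageUrl` and the product page URL as `referer`, the same pair passed to wget above.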

【Comments】:

    【Solution 2】:

    For what it's worth, it looks like the only header that matters is the "Referer" header:

    This fails:

    curl "http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.photo-download/photo/3005_75310.jpg" > /test/3005_75310.jpg
    

    This works:

    curl -H "Referer: http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.show-product/American-furniture/3005-75310/spindle-back-side-chair---ebony.cfm" "http://www.hookerfurniture.com/index.cfm/furniture/furniture-catalog.photo-download/photo/3005_75310.jpg" > /test/3005_75310.jpg
    

    For pulling image data in Java, I have had the most success using the readFully() method of DataInputStream.
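
A rough sketch of what that could look like follows. The class name `ReadFullyExample` and the helper `readBody` are hypothetical. `readFully()` fills a buffer of a known size, so it only applies when the length is known up front; the wget output in the other answer reports "Length: unspecified" for this image, so the sketch falls back to a plain read loop when the length is negative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadFullyExample {

    /**
     * Reads the whole body from in. contentLength is the value reported by
     * the server, or a negative number if the server did not send one.
     */
    public static byte[] readBody(InputStream in, int contentLength) throws IOException {
        if (contentLength >= 0) {
            // Known length: readFully blocks until the buffer is full and
            // throws EOFException if the stream ends early.
            byte[] data = new byte[contentLength];
            new DataInputStream(in).readFully(data);
            return data;
        }
        // Unknown length: read chunks until end of stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

With a URLConnection `uc`, this would be called as `readBody(uc.getInputStream(), uc.getContentLength())`, since `getContentLength()` returns -1 when the length is not known.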

    【Comments】:
