【Question title】: Using Java, is there a way to download an image from a website when its URL has no typical file extension?
【Posted】: 2020-04-16 04:23:01
【Question description】:

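With the whole COVID-19 crisis going on around the world, I decided to start a small nerdy project.

I'm trying to mass-produce digital copies of cards to play Magic: The Gathering in a game called Tabletop Simulator... I'm also a bit rusty but want to get back into programming, so why not?

Where I am now: I wrote a program (source below) that should currently pull all images with any of the common extensions from a website.

Edit: When I get the image URL, it has no suggestive file name. I don't understand how to extract the image from the way it is presented to me. ImageIO.read(imgURL) returns null for some reason.

The page source looks like this: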

<a href="../Card/Details.aspx?multiverseid=482864" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardImageLink" onclick="return CardLinkAction(event, this, 'SameWindow');">

<img src="../../Handlers/Image.ashx?multiverseid=482864&type=card" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardImage" style="border-radius:6px;-webkit-border-radius:6px;-moz-border-radius:6px;" width="95" height="132" alt="Abandoned Sarcophagus" border="0">

</a>

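This link is the one that pulls up the card image... A format that is new to me is ".jfif", which I assume is a newer version of ".jpeg". I got this format by downloading directly from the browser. How can I extract it from the page?

The code is not my own idea; I got it from an old post.

Edited code: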

        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        htmlKit.read(br, htmlDoc, 0);

        for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.IMG); iterator.isValid(); iterator.next()) {
            AttributeSet attributes = iterator.getAttributes();
            String imgSrc = (String) attributes.getAttribute(HTML.Attribute.SRC);

            System.out.println(imgSrc);
            if (imgSrc != null && (imgSrc.toLowerCase().endsWith(".jpg") || (imgSrc.toLowerCase().endsWith("type=card") || (imgSrc.endsWith(".jfif")) || (imgSrc.endsWith(".png")) || (imgSrc.endsWith(".jpeg")) || (imgSrc.endsWith(".bmp")) || (imgSrc.endsWith(".ico"))))) {
                System.out.println(imgSrc);
                try {
                    downloadImage(webUrl, imgSrc);
                } catch (IOException ex) {
                    System.out.println(ex.getMessage());
                }
            }

        }
    private static void downloadImage(String url, String imgSrc) throws IOException {
        BufferedImage image = null;
        try {
            if (!(imgSrc.startsWith("http"))) {
                url = url + imgSrc;
            } else {
                url = imgSrc;
            }
            imgSrc = imgSrc.substring(imgSrc.lastIndexOf("/") + 1);
            String imageFormat = null;
            imageFormat = imgSrc.substring(imgSrc.lastIndexOf(".") + 1);
            String imgPath = null;
            imgPath = "/img depository" + imgSrc + "";
            URL imageUrl = new URL(url);
            image = ImageIO.read(imageUrl); // null is returned here!!
            if (image != null) {
                File file = new File(imgPath);
                ImageIO.write(image, imageFormat, file);
                System.out.println("Success!");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

Console output:

../../Handlers/Image.ashx?multiverseid=482864&type=card
../../Handlers/Image.ashx?multiverseid=482826&type=card
../../Handlers/Image.ashx?multiverseid=482827&type=card
../../Handlers/Image.ashx?multiverseid=482793&type=card
../../Handlers/Image.ashx?multiverseid=482828&type=card
../../Handlers/Image.ashx?multiverseid=482700&type=card
../../Handlers/Image.ashx?multiverseid=484896&type=card
../../Handlers/Image.ashx?multiverseid=482829&type=card
../../Handlers/Image.ashx?multiverseid=484713&type=card
../../Handlers/Image.ashx?multiverseid=482701&type=card
../../Handlers/Image.ashx?multiverseid=482702&type=card
../../Handlers/Image.ashx?multiverseid=482771&type=card
../../Handlers/Image.ashx?multiverseid=482757&type=card
../../Handlers/Image.ashx?multiverseid=482703&type=card
../../Handlers/Image.ashx?multiverseid=482794&type=card
../../Handlers/Image.ashx?multiverseid=482865&type=card
../../Handlers/Image.ashx?multiverseid=482830&type=card
../../Handlers/Image.ashx?multiverseid=482831&type=card
../../Handlers/Image.ashx?multiverseid=482883&type=card
../../Handlers/Image.ashx?multiverseid=482704&type=card
../../Handlers/Image.ashx?multiverseid=484869&type=card
../../Handlers/Image.ashx?multiverseid=482884&type=card
../../Handlers/Image.ashx?multiverseid=482866&type=card
../../Handlers/Image.ashx?multiverseid=482705&type=card
../../Handlers/Image.ashx?multiverseid=482885&type=card
../../Handlers/Image.ashx?multiverseid=482795&type=card
../../Handlers/Image.ashx?multiverseid=482796&type=card
../../Handlers/Image.ashx?multiverseid=482886&type=card
../../Handlers/Image.ashx?multiverseid=482887&type=card
../../Handlers/Image.ashx?multiverseid=484914&type=card
../../Handlers/Image.ashx?multiverseid=484887&type=card
../../Handlers/Image.ashx?multiverseid=482888&type=card
../../Handlers/Image.ashx?multiverseid=482867&type=card
../../Handlers/Image.ashx?multiverseid=482706&type=card
../../Handlers/Image.ashx?multiverseid=484711&type=card
../../Handlers/Image.ashx?multiverseid=482758&type=card
../../Handlers/Image.ashx?multiverseid=484870&type=card
../../Handlers/Image.ashx?multiverseid=482889&type=card
../../Handlers/Image.ashx?multiverseid=484905&type=card
../../Handlers/Image.ashx?multiverseid=482772&type=card
../../Handlers/Image.ashx?multiverseid=484871&type=card
../../Handlers/Image.ashx?multiverseid=482707&type=card
../../Handlers/Image.ashx?multiverseid=482708&type=card
../../Handlers/Image.ashx?multiverseid=482709&type=card
../../Handlers/Image.ashx?multiverseid=482890&type=card
../../Handlers/Image.ashx?multiverseid=484712&type=card
../../Handlers/Image.ashx?multiverseid=482773&type=card
../../Handlers/Image.ashx?multiverseid=482774&type=card
../../Handlers/Image.ashx?multiverseid=482775&type=card
../../Handlers/Image.ashx?multiverseid=482736&type=card
../../Handlers/Image.ashx?multiverseid=482891&type=card
../../Handlers/Image.ashx?multiverseid=482710&type=card
../../Handlers/Image.ashx?multiverseid=482711&type=card
../../Handlers/Image.ashx?multiverseid=482832&type=card
../../Handlers/Image.ashx?multiverseid=482776&type=card
../../Handlers/Image.ashx?multiverseid=482892&type=card
../../Handlers/Image.ashx?multiverseid=482868&type=card
../../Handlers/Image.ashx?multiverseid=482777&type=card
../../Handlers/Image.ashx?multiverseid=482833&type=card
../../Handlers/Image.ashx?multiverseid=482834&type=card
../../Handlers/Image.ashx?multiverseid=482797&type=card
../../Handlers/Image.ashx?multiverseid=484868&type=card
../../Handlers/Image.ashx?multiverseid=484878&type=card
../../Handlers/Image.ashx?multiverseid=482798&type=card
../../Handlers/Image.ashx?multiverseid=482737&type=card
../../Handlers/Image.ashx?multiverseid=484906&type=card
../../Handlers/Image.ashx?multiverseid=484888&type=card
../../Handlers/Image.ashx?multiverseid=482893&type=card
../../Handlers/Image.ashx?multiverseid=482835&type=card
../../Handlers/Image.ashx?multiverseid=484889&type=card
../../Handlers/Image.ashx?multiverseid=482759&type=card
../../Handlers/Image.ashx?multiverseid=482712&type=card
../../Handlers/Image.ashx?multiverseid=482836&type=card
../../Handlers/Image.ashx?multiverseid=484879&type=card
../../Handlers/Image.ashx?multiverseid=482713&type=card
../../Handlers/Image.ashx?multiverseid=484897&type=card
../../Handlers/Image.ashx?multiverseid=482714&type=card
../../Handlers/Image.ashx?multiverseid=482894&type=card
../../Handlers/Image.ashx?multiverseid=482895&type=card
../../Handlers/Image.ashx?multiverseid=482896&type=card
../../Handlers/Image.ashx?multiverseid=482897&type=card
../../Handlers/Image.ashx?multiverseid=482837&type=card
../../Handlers/Image.ashx?multiverseid=482715&type=card
../../Handlers/Image.ashx?multiverseid=482898&type=card
../../Handlers/Image.ashx?multiverseid=482760&type=card
../../Handlers/Image.ashx?multiverseid=484872&type=card
../../Handlers/Image.ashx?multiverseid=482838&type=card
../../Handlers/Image.ashx?multiverseid=482738&type=card
../../Handlers/Image.ashx?multiverseid=484890&type=card
../../Handlers/Image.ashx?multiverseid=482899&type=card
../../Handlers/Image.ashx?multiverseid=482778&type=card
../../Handlers/Image.ashx?multiverseid=482839&type=card
../../Handlers/Image.ashx?multiverseid=482900&type=card
../../Handlers/Image.ashx?multiverseid=484880&type=card
../../Handlers/Image.ashx?multiverseid=482779&type=card
../../Handlers/Image.ashx?multiverseid=482716&type=card
../../Handlers/Image.ashx?multiverseid=484881&type=card
../../Handlers/Image.ashx?multiverseid=482761&type=card
../../Handlers/Image.ashx?multiverseid=482799&type=card
../../Handlers/Image.ashx?multiverseid=482901&type=card
/images/Redesign/Shadow.png
//media.wizards.com/2018/images/magic/gatherer/footerbanner.jpg
/images/Redesign/hasbro_logo.png
/images/Redesign/wizards_logo.png
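
The src values printed above are relative references (`../../Handlers/...`, `/images/...`, `//media.wizards.com/...`), so appending them to the page URL as plain strings is unlikely to produce a valid image address. That may be one reason `ImageIO.read` returns null. A minimal sketch of resolving them with `java.net.URI.resolve` follows; the page address used here is an assumption for illustration, not something stated in the question.

```java
import java.net.URI;

final class RelativeImageUrls {

    // Resolves an <img src> value against the URL of the page it was found on.
    static String resolve(String pageUrl, String imgSrc) {
        return URI.create(pageUrl).resolve(imgSrc).toString();
    }

    public static void main(String[] args) {
        // Assumed page address; substitute the URL that was actually fetched.
        String page = "https://gatherer.wizards.com/Pages/Search/Default.aspx";

        // Path-relative reference with ".." segments.
        System.out.println(resolve(page, "../../Handlers/Image.ashx?multiverseid=482864&type=card"));
        // Host-relative reference.
        System.out.println(resolve(page, "/images/Redesign/Shadow.png"));
        // Protocol-relative reference; it inherits the scheme of the page.
        System.out.println(resolve(page, "//media.wizards.com/2018/images/magic/gatherer/footerbanner.jpg"));
    }
}
```

The resolved string can then be passed to `new URL(...)` in place of `url + imgSrc`.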

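【Comments】:

  • This is not a precise enough error description for us to be able to help you. What doesn't work? How doesn't it work? What trouble do you have with your code? Do you get an error message? What is the error message? Is the result you are getting not the result you are expecting? What result do you expect and why, what is the result you are getting, and how do the two differ? Is the behavior you are observing not the desired behavior? What is the desired behavior and why, what is the observed behavior, and in what way do they differ?
  • Also, please make sure to construct a minimal reproducible example. Note that all three words are important: it should be an example only, so you should not post your entire actual code but rather create a simplified example that demonstrates your problem. It should also be minimal, i.e. it should not contain anything that is not absolutely required to demonstrate the problem. (Most beginner problems can be demonstrated in fewer than 5 short lines of code.) And it should be reproducible, which means that if I copy, paste, and run the code, I should see exactly the same problem you see.
  • The result I want is to download all images at the given URL that end in .jpeg, .png, and so on. To my surprise, it doesn't download anything, yet it produces no error. All it does is spit out the file names. I'll edit this down to the part of the code I think is most relevant.
  • I don't really know the conventions for using Stack Overflow, so I apologize.

Tags: java html css bufferedimage javax.imageio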


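【Solution 1】:

I don't know where you are seeing .jfif in connection with that link, because I don't see it anywhere.

What I see is a link with this URL:
https://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=482864&type=card

When I open it in a web browser (Firefox for me), I see that the server response has the following HTTP headers: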

Cache-Control: public
Content-Type: image/jpeg
Expires: Fri, 16 Apr 2021 04:30:35 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Thu, 16 Apr 2020 04:30:35 GMT
Content-Length: 170170

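The important part is Content-Type, with the value image/jpeg, which tells you that the content is a JPEG image.

Unfortunately, the server doesn't supply a suggested file name, which would be a header like this: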

Content-Disposition: attachment; filename="filename.jpg"

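Without a suggestion from the server, and since you know and understand the URL, you could, for example, write code that names the file from the URL and the Content-Type header, naming the file card482864.jpeg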
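
One way this naming idea might look in code is sketched below. The class and method names (`CardFileNames`, `queryParam`, `extensionFor`, `fileNameFor`) are made up for illustration, and the Content-Type mapping only covers a handful of image types; it is a sketch under those assumptions, not a complete solution.

```java
import java.net.URI;
import java.util.Locale;

final class CardFileNames {

    // Returns the value of a query parameter such as "multiverseid", or null if absent.
    static String queryParam(String url, String name) {
        String query = URI.create(url).getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    // Maps a Content-Type header value to a file extension (a few common image types only).
    static String extensionFor(String contentType) {
        if (contentType == null) {
            return "bin";
        }
        // Drop any parameters such as "; charset=..." before comparing.
        String mime = contentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
        switch (mime) {
            case "image/jpeg": return "jpeg";
            case "image/png":  return "png";
            case "image/gif":  return "gif";
            case "image/bmp":  return "bmp";
            default:           return "bin";
        }
    }

    // Builds a name like "card482864.jpeg" from the "type" and "multiverseid" parameters.
    static String fileNameFor(String url, String contentType) {
        String type = queryParam(url, "type");
        String id = queryParam(url, "multiverseid");
        if (type == null || id == null) {
            throw new IllegalArgumentException("URL lacks type or multiverseid: " + url);
        }
        return type + id + "." + extensionFor(contentType);
    }

    public static void main(String[] args) {
        String url = "https://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=482864&type=card";
        System.out.println(fileNameFor(url, "image/jpeg")); // prints card482864.jpeg
    }
}
```

In a real download, the Content-Type string could be taken from `URLConnection.getContentType()` after calling `openConnection()` on the image URL.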

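【Comments】:

  • I got the format by downloading the image from the browser. Could you give an example of how to name the file from the URL? Do I just add .jpeg somewhere in the string?
  • @HarleyFioretti Look at the URL. Do you see the values card and 482864 somewhere in it? As I said, since you know the URL, you can write code that extracts those two values from the URL to build the file name, and set the file extension to .jpeg because the Content-Type header is image/jpeg.
  • I think my problem is also with using ImageIO... the .png files aren't downloading either.
  • @HarleyFioretti Actually, my web browser says the file is a PNG, even though the Content-Type is image/jpeg. Web browsers know that web servers are unreliable, so they auto-detect a lot of things. Click the link to view the image, then right-click and choose "Image Info". My browser shows Type: PNG Image, Dimensions: 265px x 370px
  • Those are the correct dimensions. Microsoft Edge only wants to save the image as .jfif, so I assumed that was the format, but maybe the browser does some invisible conversion that I can't replicate with my current code...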
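
Since the comments above suggest that the Content-Type header and the actual bytes can disagree, one option is to skip `ImageIO` decoding altogether, save the response body unchanged, and pick the extension from the file signature. `ImageIO.read` is documented to return null when no registered reader can decode the input, so copying the raw bytes avoids that step. The sketch below uses hypothetical names (`ImageSniffer`, `detectFormat`, `saveRaw`) and recognizes only four well-known signatures.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ImageSniffer {

    // Guesses the image format from the leading bytes, or returns null if unrecognized.
    static String detectFormat(byte[] head) {
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (startsWith(head, 'G', 'I', 'F', '8')) return "gif";
        if (startsWith(head, 'B', 'M')) return "bmp";
        return null;
    }

    // Compares the start of the data with a signature given as unsigned byte values.
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Saves the response body as-is, so no image decoding or re-encoding takes place.
    static void saveRaw(URL imageUrl, Path target) throws IOException {
        try (InputStream in = imageUrl.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) {
        // The eight-byte PNG signature, used here instead of a real download.
        byte[] sample = {(byte) 0x89, (byte) 'P', (byte) 'N', (byte) 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(detectFormat(sample)); // prints png
    }
}
```

A caller could download to a temporary file with `saveRaw`, read its first few bytes, and rename the file using the detected extension.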