How to save pictures from URL to disk (answers)

【Question title】: How to save pictures from URL to disk
【Posted】: 2014-11-13 16:19:35
【Question】:

I want to download the pictures from a URL such as http://trinity.e-stile.ru/ and save them to a directory such as "C:\pickaxe\pictures". It is important that the solution uses Nokogiri.

I have read similar questions on this site, but I didn't figure out how it works, and I don't understand the algorithm.

I wrote code that parses the URL and puts the parts of the page source with "img" tags into a links object:

require 'nokogiri'
require 'open-uri'

PAGE_URL="http://trinity.e-stile.ru/"
page=Nokogiri::HTML(open(PAGE_URL))   # parse the page into a Nokogiri document
links=page.css("img") # NodeSet of all <img> elements
puts links.length # there are 24 images on this page
puts
links.each{|i| puts i } # each looks like: <img border="0" alt="" src="/images/kroliku.jpg">
puts
puts
links.each{|link| puts link['src'] } #/images/kroliku.jpg
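The `src` values printed by the last loop are relative paths such as `/images/kroliku.jpg`, so they have to be resolved against the page URL before they can be downloaded. A minimal sketch using only the standard `uri` library; `absolute_image_urls` is a hypothetical helper name, and apart from `/images/kroliku.jpg` the sample `src` values are made up for illustration.

```ruby
require 'uri'

PAGE_URL = "http://trinity.e-stile.ru/"

# Hypothetical helper: turn raw src attribute values (for example
# links.map { |link| link['src'] } from the code above) into absolute URLs.
# Missing (nil) or blank src values are skipped; duplicates are removed.
def absolute_image_urls(page_url, srcs)
  srcs.compact
      .map(&:strip)
      .reject(&:empty?)
      .map { |src| URI.join(page_url, src).to_s } # relative -> absolute
      .uniq
end

srcs = ["/images/kroliku.jpg", nil, "http://example.com/logo.png"]
puts absolute_image_urls(PAGE_URL, srcs)
# http://trinity.e-stile.ru/images/kroliku.jpg
# http://example.com/logo.png
```

`URI.join` leaves an already absolute `src` untouched, so the same call works whether a page uses relative or absolute image paths.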

Which method should I use to save the images once I have the HTML?

I changed the code, but there is still an error:

/home/action/.parts/packages/ruby2.1/2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `initialize': getaddrinfo: Name or service not known (SocketError)

Here is the current code:

require 'nokogiri'
require 'open-uri'
require 'net/http'

LOCATION = 'pics'
if !File.exist? LOCATION         # create the folder if it does not exist
    require 'fileutils'
    FileUtils.mkpath LOCATION
end

#PAGE_URL = "http://ruby.bastardsbook.com/files/hello-webpage.html"
#PAGE_URL="http://trinity.e-stile.ru/"
PAGE_URL="http://www.youtube.com/"
page=Nokogiri::HTML(open(PAGE_URL))   
links=page.css("img")

links.each{|link| 
    Net::HTTP.start(PAGE_URL) do |http|
      localname = link.gsub /.*\//, '' # keep only the file name
      resp = http.get link['src']
      open("#{LOCATION}/#{localname}", "wb") do |file|
        file.write resp.body
      end
    end
 }

【Question discussion】:

Tags: ruby-on-rails ruby nokogiri

【Solution 1】:

You're almost done. The only thing left is to save the files. Let's do it.

LOCATION = 'C:\pickaxe\pictures'
if !File.exist? LOCATION         # create the folder if it does not exist
    require 'fileutils'
    FileUtils.mkpath LOCATION
end

require 'net/http'
.... # your code with nokogiri etc.
links.each{|link| 
    Net::HTTP.start(PAGE_URL) do |http|
      localname = link.gsub /.*\//, '' # keep only the file name
      resp = http.get link['src']
      open("#{LOCATION}/#{localname}", "wb") do |file|
        file.write resp.body
      end
    end
}

That's it.

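【Discussion】:

I changed the code, thanks, but there is an error: `initialize': getaddrinfo: Name or service not known (SocketError)
I tried that... now it throws a different error: /home/action/.parts/packages/ruby2.1/2.1.1/lib/ruby/2.1.0/open-uri.rb:36:in `initialize': No such file or directory @ rb_sysopen - www.youtube.com (Errno::ENOENT)
I'm a bit lost in your explanation. Are you on Windows (because of "save the images to a directory like C:\pickaxe\pictures") or on *nix (because of "/home/action/.parts...")? It looks like the host you use to run the code can't access the internet. The error you're getting is actually a "DNS failure".
Sorry, I forgot to mention that. I work on a Nitrous web box (nitrous.io) because Nokogiri doesn't work properly on my PC (Windows 7). The box is a Linux-based development environment. But I think the internet works, because I parsed the web page successfully...
Yes, I probably mis-phrased "DNS problem" as "can't access the internet". :) Anyway, the problem is address resolution; as a next step, try something like host www.youtube.com from the command line on your Nitrous box to see what's going on there.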
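Both errors are also consistent with how the code calls the libraries, independent of DNS on the box. `Net::HTTP.start` expects a bare host name (plus an optional port), so `Net::HTTP.start(PAGE_URL)` asks the resolver to look up the whole string `"http://www.youtube.com/"`, which fails with the same `getaddrinfo` SocketError. Once the scheme is dropped, open-uri no longer recognizes `"www.youtube.com"` as a URL, so `open` treats it as a local file path and raises `Errno::ENOENT`. A minimal sketch of calling Net::HTTP with the host, port and path split out; `connection_parts` and `fetch_with_net_http` are hypothetical helper names, and the HTTPS handling is an addition not present in the original code.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: split an absolute http(s) URL into the pieces
# Net::HTTP needs. Net::HTTP.start takes a host name and port, not a URL.
def connection_parts(url)
  uri = URI.parse(url)
  [uri.host, uri.port, uri.request_uri]
end

# Hypothetical helper: download one image with Net::HTTP and write it to
# dest_path in binary mode, but only on a 2xx response.
# Returns the Net::HTTPResponse.
def fetch_with_net_http(url, dest_path)
  host, port, path = connection_parts(url)
  use_ssl = URI.parse(url).scheme == 'https'
  Net::HTTP.start(host, port, use_ssl: use_ssl) do |http|
    resp = http.get(path)
    File.binwrite(dest_path, resp.body) if resp.is_a?(Net::HTTPSuccess)
    resp
  end
end

p connection_parts("http://trinity.e-stile.ru/images/kroliku.jpg")
# => ["trinity.e-stile.ru", 80, "/images/kroliku.jpg"]
```

For example, `fetch_with_net_http("http://trinity.e-stile.ru/images/kroliku.jpg", "pics/kroliku.jpg")` would save that one image, assuming the `pics` folder already exists. Unlike OpenURI, Net::HTTP does not follow redirects, which is one practical reason to prefer OpenURI for this task.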

【解决方案2】：

The corrected version:

require 'nokogiri'
require 'open-uri'


LOCATION = 'pics'
if !File.exist? LOCATION         # create the folder if it does not exist
    require 'fileutils'
    FileUtils.mkpath LOCATION
end

#PAGE_URL="http://trinity.e-stile.ru/"
PAGE_URL="http://www.youtube.com/"

page=Nokogiri::HTML(open(PAGE_URL)) 
links=page.css("img")

links.each do |link|
  uri = URI.join(PAGE_URL, link['src']).to_s # make the URI absolute
  localname = File.basename(link['src'])
  File.open("#{LOCATION}/#{localname}", 'wb') { |f| f.write(open(uri).read) }
end
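Solution 2 targets Ruby 2.1. A minimal sketch of the same loop for current Ruby, under a few assumptions: on Ruby 3.0 and later `Kernel#open` no longer fetches URLs, so `URI.open` (available since Ruby 2.5) is used instead; `img` tags without a `src` are skipped, since `URI.join` would raise on `nil`; and the query string is dropped from the local file name. `local_name` and `save_images` are hypothetical helper names, and the `src` list is expected to come from Nokogiri as in the question's code.

```ruby
require 'open-uri'   # provides URI.open
require 'fileutils'
require 'uri'

# Hypothetical helper: local file name for an image URL. Taking the
# basename of the parsed URI path drops any "?query" part.
def local_name(image_url)
  File.basename(URI.parse(image_url).path)
end

# Hypothetical helper: download every image in srcs (raw src attribute
# values) into location. Returns the list of file paths written.
def save_images(page_url, srcs, location)
  FileUtils.mkdir_p(location)   # creates the folder; no-op if it exists
  srcs.compact.map(&:strip).reject(&:empty?).uniq.map do |src|
    uri  = URI.join(page_url, src).to_s       # make the URL absolute
    path = File.join(location, local_name(uri))
    URI.open(uri) { |remote| File.binwrite(path, remote.read) }
    path
  end
end
```

With the question's code this would be called as `save_images(PAGE_URL, page.css("img").map { |img| img['src'] }, 'pics')`. As in Solution 2, two images with the same file name in different folders would overwrite each other.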

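【Discussion】:

Don't do link.gsub /.*\//, ''. Use File.basename to extract just the file name: File.basename('http://a.com/path/baz.jpg') # => "baz.jpg". Also, OpenURI is smarter than Net::HTTP, so use it instead.
I changed the code based on your suggestions and now it works! Thanks!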
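The `File.basename` advice can be checked directly. A small sketch, assuming an image URL that carries a query string (the `?w=100` part is made up for illustration): on a URL like this the regex and `File.basename` give the same result and neither removes the query, whereas taking the basename of the parsed URI path does.

```ruby
require 'uri'

url = 'http://a.com/path/baz.jpg?w=100'

# Both keep everything after the last "/", including the query string.
by_regex    = url.gsub(/.*\//, '')
by_basename = File.basename(url)

# Parsing first and taking the basename of the path drops the query.
by_uri_path = File.basename(URI.parse(url).path)

p by_regex      # => "baz.jpg?w=100"
p by_basename   # => "baz.jpg?w=100"
p by_uri_path   # => "baz.jpg"
```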