Using your example, and focusing on Net::HTTP for simplicity, the first example doesn't work as written:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
# => TypeError - no implicit conversion of URI::HTTPS into String
Instead, I used this as a starting point:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007fc3ea8e7320>
puts result.body.size
# => 167,394
With that working, we can try the second URL. Interestingly, I get different results depending on whether I reuse the initial connection or open a new one:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
reddit_url_two = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007f931a143390>
puts result.body.size
# => 174,615
http_two = Net::HTTP.new(reddit_url_two.host, reddit_url_two.port)
http_two.use_ssl = true
result_two = http_two.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_two
# => #<Net::HTTPMovedPermanently:0x00007f931a148818>
puts result_two.body.size
# => 0
result_reusing_connection = http.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_reusing_connection
# => #<Net::HTTPOK:0x00007f931a0fb3b0>
puts result_reusing_connection.body.size
# => 141,575
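One likely reason for the difference, offered as a guess rather than something confirmed against Reddit's behavior: the two URLs have different hosts, and the reused `http` object always talks to the host it was created with. A quick check using only URI parsing (no network needed):

```ruby
require 'uri'

reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
reddit_url_two = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')

# The reused `http` object was built from reddit_url.host, so every request
# sent through it goes to that host, no matter which URL the path came from.
puts reddit_url.host      # prints www.reddit.com
puts reddit_url_two.host  # prints reddit.com

# Only the path of the second URL is sent when the first connection is reused.
same_host = reddit_url.host == reddit_url_two.host
puts same_host            # prints false
```

So the "reused connection" request is really sent to www.reddit.com, while the new connection goes to reddit.com, which appears to answer with the 301.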
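So I suspect you are sometimes getting a 301 redirect, which is causing the confusion. There is also another question and answer here that covers how to follow redirects.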
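For reference, here is a minimal sketch of following redirects with Net::HTTP. The names `get_following_redirects` and `DEFAULT_FETCHER` are hypothetical helpers made up for this answer, not part of Net::HTTP, and the cap of 5 requests is an arbitrary assumption. The fetcher is passed in as a lambda only so the redirect logic can be exercised without touching the network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical default fetcher: a single GET over a fresh connection
# to whatever host the given URI points at.
DEFAULT_FETCHER = lambda do |uri, headers|
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, headers)
  end
end

# Hypothetical helper: issues GET requests, following Location headers,
# and gives up after `limit` requests in total.
def get_following_redirects(url, headers: {}, limit: 5, fetcher: DEFAULT_FETCHER)
  uri = URI.parse(url.to_s)

  limit.times do
    response = fetcher.call(uri, headers)

    # Anything that is not a redirect with a Location header is the final answer
    # (this also covers 304 Not Modified, which carries no Location).
    return response unless response.is_a?(Net::HTTPRedirection) && response['location']

    # Location may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, response['location'])
  end

  raise "Too many redirects (gave up after #{limit} requests)"
end

# Example call (needs network access, so it is left commented out):
# response = get_following_redirects(
#   'https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json',
#   headers: { 'User-Agent' => 'My agent' }
# )
# puts response.code
```

Because each hop opens a connection to the host named in the redirect target, a request to reddit.com that gets redirected to www.reddit.com should end up at the right host instead of returning an empty 301 body.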