【Question title】: Mechanize: dealing with errors
【Posted】: 2018-08-05 05:46:53
【Question】:

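As you can see from my earlier questions, I have built a small Mechanize task that visits a page, finds the links to cafes, and saves each cafe's details to a CSV file.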

task :estimateone => :environment do
  require 'mechanize'
  require 'csv'

  mechanize = Mechanize.new
  mechanize.history_added = Proc.new { sleep 30.0 }
  mechanize.ignore_bad_chunking = true
  mechanize.follow_meta_refresh = true
  page = mechanize.get('http://www.siteexamplea.com/city/list/50-city-cafes-you-should-have-eaten-breakfast-at')
  results = []
  results << ['name', 'streetAddress', 'addressLocality', 'postalCode', 'addressRegion', 'addressCountry', 'telephone', 'url']
  page.css('ol li a').each do |link|
    mechanize.click(link)

    name = mechanize.page.css('article h1[itemprop="name"]').text.strip
    streetAddress = mechanize.page.css('address span span[itemprop="streetAddress"]').text.strip
    addressLocality = mechanize.page.css('address span span[itemprop="addressLocality"]').text.strip
    postalCode = mechanize.page.css('address span span[itemprop="postalCode"]').text.strip
    addressRegion = mechanize.page.css('address span span[itemprop="addressRegion"]').text.strip
    addressCountry = mechanize.page.css('address span meta[itemprop="addressCountry"]').text.strip
    telephone = mechanize.page.css('address span[itemprop="telephone"]').text.strip
    url = mechanize.page.css('article p a[itemprop="url"]').text.strip
    tags = mechanize.page.css('article h1[itemprop="name"]').text.strip

    results << [name, streetAddress, addressLocality, postalCode, addressRegion, addressCountry, telephone, url]
  end

  CSV.open("filename.csv", "w+") do |csv_file|
    results.each do |row|
      csv_file << row
    end
  end
end

When I get to the tenth link, I hit a 503 error.

Mechanize::ResponseCodeError: 503 => Net::HTTPServiceUnavailable for https://www.city.com/city/directory/morning-after -- unhandled response

I have tried a few things to stop this from happening, or to rescue from this state, but I can't work it out. Any suggestions?

【Comments】:

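  • Try rescuing Mechanize::ResponseCodeError and, if it is a 50x, wait for some amount of time and retry. Or you could try adding a delay before each URL you visit.
  • @SebastianPalma Isn't that what mechanize.history_added = Proc.new { sleep 30.0 } is doing?
  • 503 is a server error. The server doesn't like your request. Try making that request in a browser and see what happens. Also, look into a debugging proxy such as Fiddler or Charles.
  • @pguardiario Thanks mate, the URL does work in the browser, so I'll take a look at Fiddler or Charles.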
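
The wait-and-retry idea from the first comment could look roughly like the sketch below. It is not code from the thread: `with_retries`, its keyword arguments, and the injectable `sleeper` are hypothetical names, and it assumes that `Mechanize::ResponseCodeError#response_code` returns the status as a string such as "503". Errors without a 5xx code, and the final failed attempt, are re-raised unchanged.

```ruby
# Hypothetical helper: retry a block when it raises an error carrying a
# 5xx response code, waiting a little longer before each new attempt.
def with_retries(max_attempts: 3, base_delay: 30, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError => e
    # Assumption: the error exposes the HTTP status via #response_code,
    # as Mechanize::ResponseCodeError does. Anything else is re-raised.
    code = e.respond_to?(:response_code) ? e.response_code.to_s : ''
    raise unless code.start_with?('5') && attempts < max_attempts

    sleeper.call(base_delay * attempts) # 30s, then 60s, ... with the defaults
    retry
  end
end

# Possible use inside the question's loop (requires the mechanize gem):
#   with_retries { mechanize.click(link) }
```

The delay grows linearly with the attempt number; whether that is enough depends on how the server throttles, which the thread does not establish.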

Tags: ruby-on-rails mechanize


【Answer 1】:

You want to rescue when the request fails, like this:

task :estimateone => :environment do
  require 'mechanize'
  require 'csv'

  begin
    # ...
    page = mechanize.get('http://www.theurbanlist.com/brisbane/a-list/50-brisbane-cafes-you-should-have-eaten-breakfast-at')
  rescue Mechanize::ResponseCodeError => e
    # Do something with the result: log it, write it, mark it as failed,
    # wait a bit, and then continue the job.
    # Note: inside the per-link loop `next` skips to the next link;
    # at this level it simply ends the task block.
    next
  end
end

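My guess is that you are hitting a rate limit. Rescuing won't fix the underlying problem, because that is on the server's side rather than yours, but it gives you room to work: you can now mark the links that failed and carry on from there.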
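
One way to fold the rescue into the question's per-link loop is sketched below. This is an illustration, not the answerer's code: `collect_rows`, `rescue_from`, and `pause` are hypothetical names, and the error class is passed in so the helper itself does not depend on the mechanize gem.

```ruby
# Hypothetical helper: run the block once per link, collecting one row per
# success and a [link, message] pair per failure instead of aborting.
def collect_rows(links, rescue_from:, pause: 0)
  rows = []
  failed = []
  links.each do |link|
    begin
      rows << yield(link)
    rescue rescue_from => e
      failed << [link, e.message]      # mark the link as failed
      sleep(pause) if pause.positive?  # wait a bit before the next link
    end
  end
  [rows, failed]
end

# Possible use in the rake task (requires the mechanize gem):
#   rows, failed = collect_rows(page.css('ol li a'),
#                               rescue_from: Mechanize::ResponseCodeError,
#                               pause: 30) do |link|
#     mechanize.click(link)
#     [mechanize.page.css('article h1[itemprop="name"]').text.strip]
#   end
```

The `failed` list can then be logged or written to a second CSV so those links can be retried later.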

【Comments】:

  • Thank you!
  • Can you show me how to add it to the code above?