Catching timeout errors with Ruby Mechanize

【Question title】: Catching timeout errors with ruby mechanize
【Posted】: 2011-09-15 04:47:09
【Question】:

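I have a Mechanize function that logs me out of a site, but on very rare occasions it times out. The function goes to a specific page and then clicks a logout button. Occasionally Mechanize times out when loading the logout page or clicking the logout button, and the code crashes. So I added a small rescue, which seems to work; it is shown below, after the original version.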

def logmeout(agent)
  page = agent.get('http://www.example.com/')
  agent.click(page.link_with(:text => /Log Out/i))
end

Logmeout with rescue:

def logmeout(agent)
  begin
    page = agent.get('http://www.example.com/')
    agent.click(page.link_with(:text => /Log Out/i))
  rescue Timeout::Error
    puts "Timeout!"
    retry
  end
end

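Assuming I understand rescue correctly, it will redo both actions even if only the click timed out. So, for efficiency, I was wondering whether I could use a proc here and pass it a block of code. Would something like this work: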

def trythreetimes
  tries = 0
  begin
    yield
  rescue
    tries += 1
    puts "Trying again!"
    retry if tries <= 3
  end
end

def logmeout(agent)
  trythreetimes {page = agent.get('http://www.example.com/')}
  trythreetimes {agent.click(page.link_with(:text => /Log Out/i))}
end

Note that in my trythreetimes function I left it as a generic rescue so the function is more reusable.
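One caveat with the two-block version of logmeout above: `page` is first assigned inside the first block, so it is local to that block and is not visible in the second. Here is a minimal sketch of one way around this. It assumes a hypothetical helper named `with_retries` (not part of Mechanize) that returns the block's value and re-raises the error once the attempts run out:

```ruby
require 'timeout'

# Hypothetical helper: runs the block, retrying on the listed error classes.
# Returns the block's value so the caller can keep the result.
def with_retries(attempts: 3, on: [Timeout::Error])
  tries = 0
  begin
    yield
  rescue *on => e
    tries += 1
    raise if tries >= attempts   # out of attempts: re-raise the last error
    puts "Attempt #{tries} failed (#{e.class}), trying again"
    retry
  end
end

# Usage with the question's agent (assumes `agent` is a Mechanize instance):
#   page = with_retries { agent.get('http://www.example.com/') }
#   with_retries { agent.click(page.link_with(:text => /Log Out/i)) }
```

Assigning the helper's return value keeps `page` in the method's scope. Re-raising after the last attempt means a persistent failure is not silently swallowed.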

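Any help anyone can offer is greatly appreciated. I realize there are a few different questions here, but they are all things I am trying to learn!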

【Comments】:

Tags: ruby mechanize

【Answer 1】:

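Instead of retrying on timeouts for some Mechanize requests, I think you would be better off setting the Mechanize::HTTP::Agent::read_timeout attribute to a reasonable number of seconds, such as 2 or 5, or whatever value prevents timeout errors for this request.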
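A minimal sketch of setting the timeouts, assuming `agent` is a `Mechanize` instance. `configure_timeouts` is a hypothetical helper name and the default values are placeholders. Mechanize accepts `open_timeout=` and `read_timeout=` on the agent, both in seconds:

```ruby
# Hypothetical helper: applies connection and read timeouts (in seconds).
# Expects an object responding to open_timeout= and read_timeout=,
# such as a Mechanize instance.
def configure_timeouts(agent, open_timeout: 5, read_timeout: 5)
  agent.open_timeout = open_timeout   # time allowed to establish the connection
  agent.read_timeout = read_timeout   # time allowed to wait for data from the server
  agent
end

# Usage (assumes the mechanize gem is installed):
#   require 'mechanize'
#   agent = configure_timeouts(Mechanize.new, open_timeout: 5, read_timeout: 2)
```

A shorter read timeout makes a stalled request fail sooner, while a longer one makes a timeout error less likely, so the right value depends on how slow the site normally is.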

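Next, it seems your logout process only needs a simple HTTP GET request. I mean there is no form to fill in, so no HTTP POST request. If I were you, I would inspect the page source (Ctrl+U in Firefox or Chrome) to identify the link that your agent.click(page.link_with(:text => /Log Out/i)) call follows, and request it directly. That should be faster, because these kinds of pages are usually blank and Mechanize does not have to load a full HTML page into memory.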
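Instead of reading the page source by hand, the logout URL can also be read off the link object once and then hard-coded. A sketch, assuming the link text matches /Log Out/i as in the question; `logout_href` is a hypothetical helper name:

```ruby
# Hypothetical helper: returns the href of the first link whose text
# matches "Log Out", or nil when the page has no such link.
# Expects a page object responding to link_with, such as Mechanize::Page.
def logout_href(page)
  link = page.link_with(:text => /Log Out/i)
  link && link.href
end

# Usage (assumes `agent` is a Mechanize instance):
#   puts logout_href(agent.get('http://www.example.com/'))
```

The returned href may be relative to the page it was found on, so it is worth checking before pasting it into a direct `agent.get` call.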

Here is the code I would prefer to use:

def logmeout(agent)
  begin
    agent.read_timeout = 2   # set the agent's read timeout, in seconds
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop        # delete this request from the history
  rescue Timeout::Error
    puts "Timeout!"
    puts "read_timeout attribute is set to #{agent.read_timeout}s" unless agent.read_timeout.nil?
    # retry is no longer needed
  end
end

But you can also use a retry function:

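def trythreetimes
  tries = 0
  begin
    yield
  # StandardError instead of Exception, so Interrupt and SystemExit are not trapped;
  # Timeout::Error is a StandardError on Ruby 1.9 and later.
  rescue StandardError => e
    tries += 1
    puts "Error: #{e.message}"
    if tries <= 3
      puts "Trying again!"
      retry
    end
    puts "No more attempts!"
  end
end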

def logmeout(agent)
  trythreetimes do
    agent.read_timeout = 2   # set the agent's read timeout, in seconds
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop        # delete this request from the history
  end
end

Hope this helps! ;-)

【Comments】:

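Thanks for your answer! Does your preferred code assume you have found the right link in the source?
Well, it is not very hard to find the link in the HTML source. I prefer this solution because of the time and memory it saves. But you could use your own solution with read_timeout set. That is a good idea if you use it for several domains. Just edit my second piece of code and change it to visit the home page and click the link as needed.
Oh sorry, I did not actually answer your question. Yes, it assumes you have found the right link in the source...
Thanks very much for your help with this, and for letting me know about read_timeout!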

【Answer 2】:

Using mechanize 1.0.0, I ran into this problem from a different source of error.

In my case I was blocked by the proxy, and then by SSL. This worked for me:

ag = Mechanize.new
ag.set_proxy('yourproxy', yourport)
ag.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
ag.get( url )
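Disabling certificate verification removes protection against man-in-the-middle attacks, so it may be worth trying a CA bundle first. A sketch, assuming a newer Mechanize release where `ca_file=` and `verify_mode=` are available directly on the agent. `configure_proxy_and_tls` is a hypothetical helper name, and the proxy host, port, and bundle path in the usage comment are placeholders:

```ruby
require 'openssl'

# Hypothetical helper: routes requests through a proxy and keeps
# certificate verification on, trusting the given CA bundle.
# Expects an object responding to set_proxy, ca_file= and verify_mode=,
# such as a Mechanize instance (assumption: newer Mechanize releases).
def configure_proxy_and_tls(agent, proxy_host, proxy_port, ca_bundle_path)
  agent.set_proxy(proxy_host, proxy_port)
  agent.ca_file = ca_bundle_path
  agent.verify_mode = OpenSSL::SSL::VERIFY_PEER
  agent
end

# Usage (placeholder values, assumes the mechanize gem is installed):
#   ag = configure_proxy_and_tls(Mechanize.new, 'yourproxy', 8080, '/path/to/ca-bundle.crt')
```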

【Comments】: