【Question title】:How to Process Items in an Array in Parallel using Ruby (and open-uri)
【Posted】:2011-11-25 13:45:16
【Question】:

How can I open multiple concurrent connections with open-uri? I think I need to use threads or fibers somehow, but I'm not sure.

Example code:

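require 'open-uri'
require 'nokogiri'

def get_doc(url)
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
  Nokogiri::HTML(URI.open(url).read)
rescue StandardError => ex
  # Returns nil on failure, so callers should check for a missing document
  puts "Failed at #{Time.now}"
  puts "Error: #{ex}"
end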

array_of_urls_to_process = [......]

# How can I iterate over items in the array in parallel (instead of one at a time?)
array_of_urls_to_process.each do |url|
  x = get_doc(url)
  do_something(x)
end

【Comments on the question】:

    Tags: ruby multithreading open-uri fibers


    【Solution 1】:

    There is also a gem called Parallel, which is similar to Peach but is actively maintained.

    【Comments】:

    • This gem is dope AF. If you need the index, make sure to use each_with_index rather than the start/finish callbacks; it performs 10 to 50 times better.
    【Solution 2】:

    I hope this gives you an idea:

    def do_something(url, secs)
        sleep secs #just to see a difference
        puts "Done with: #{url}"
    end
    
    threads = []
    urls_ary = ['url1', 'url2', 'url3']
    
    urls_ary.each_with_index do |url, i|
        threads << Thread.new{ do_something(url, i+1) }
        puts "Out of loop #{i+1}"
    end
    threads.each{|t| t.join}
    

    You could also add a method to Array, for example:

    class Array
        def thread_each(&block)
            inject([]){|threads,e| threads << Thread.new{yield(e)}}.each{|t| t.join}
        end
    end
    
    [1, 2, 3].thread_each do |i|
        sleep 4-i #so first one ends later
        puts "Done with #{i}"
    end
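Both versions above start one thread per element, which can open a lot of connections at once when the URL list is long. A different technique, not part of this answer, is a fixed-size worker pool fed from a `Queue`. The sketch below is a minimal version of that idea using only the standard library; `each_in_pool` and `pool_size` are hypothetical names, and the `sleep` stands in for a real fetch.

```ruby
# Run the block for every item using at most `pool_size` threads.
# Assumption: items are never nil or false, since a falsy value
# popped from the queue is what tells a worker to stop.
def each_in_pool(items, pool_size: 4)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once closed and drained, pop returns nil instead of blocking

  workers = Array.new(pool_size) do
    Thread.new do
      while (item = queue.pop)
        yield item
      end
    end
  end

  workers.each(&:join)
  items
end

# Queue is thread-safe, so it is used here to collect what was processed.
processed = Queue.new
each_in_pool(%w[url1 url2 url3 url4 url5], pool_size: 2) do |url|
  sleep 0.05 # placeholder for the real work
  processed << url
end
puts "Processed #{processed.size} items"
```

With `pool_size: 2` no more than two blocks run at the same time, however many URLs are in the array.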
    

    【Comments】:

    【Solution 3】:
    module MultithreadedEach
      def multithreaded_each
        each_with_object([]) do |item, threads|
          threads << Thread.new { yield item }
        end.each { |thread| thread.join }
        self
      end
    end
    

    Usage:

    arr = [1,2,3]
    
    arr.extend(MultithreadedEach)
    
    arr.multithreaded_each do |n|
      puts n # Each block runs in its own thread
    end
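`multithreaded_each` discards whatever the block returns, while the question needs the fetched document back. One possible companion, sketched here under the hypothetical name `multithreaded_map`, relies on `Thread#value`, which joins the thread and returns the result of its block. Results come back in input order because the threads are read in the order they were created.

```ruby
module MultithreadedMap
  # Hypothetical counterpart to multithreaded_each that keeps the results.
  def multithreaded_map
    # Start one thread per item, then wait for each and collect its value.
    map { |item| Thread.new { yield item } }.map(&:value)
  end
end

arr = [1, 2, 3]
arr.extend(MultithreadedMap)

squares = arr.multithreaded_map { |n| n * n }
p squares # => [1, 4, 9]
```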
    

    【Comments】:

      【Solution 4】:

      A simple way to do it with threads:

      threads = []
      
      [1, 2, 3].each do |i|
        threads << Thread.new { puts i }
      end
      
      threads.each(&:join)
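One thing to watch with this pattern: if a block raises, `join` re-raises that exception in the calling thread, so a single bad URL can abort the whole run. A possible way around it is to rescue inside each thread and hand back either the result or the error. In this sketch `fetch` is a hypothetical stand-in for the question's `get_doc`, and it fails on purpose for any URL containing "bad".

```ruby
# Hypothetical stand-in for get_doc; raises for URLs containing "bad".
def fetch(url)
  raise ArgumentError, "bad url: #{url}" if url.include?("bad")
  "body of #{url}"
end

urls = %w[url1 bad-url url3]

threads = urls.map do |url|
  Thread.new do
    begin
      [url, fetch(url), nil] # success: url, body, no error
    rescue StandardError => e
      [url, nil, e]          # failure: url, no body, the error
    end
  end
end

# Thread#value waits for the thread and returns the array built above.
results = threads.map(&:value)

results.each do |url, body, error|
  if error
    puts "#{url} failed: #{error.message}"
  else
    puts "#{url} ok: #{body}"
  end
end
```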
      

      【Comments】:

        【Solution 5】:

        There is a gem called peach (https://rubygems.org/gems/peach) that lets you do this:

        require "peach"
        
        array_of_urls_to_process.peach do |url|
          do_something(get_doc(url))
        end
        

        【Comments】:

        • This gem is JRuby-only.