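【Question title】: Scraping with Nokogiri in Ruby
【Posted】: 2017-03-11 14:18:51
【Question】:

How can I put the name and description attributes together and create one island object for each island? I have done my best to combine the two attributes into a single object, but I can only get them separately. I need help because I have to submit this within two days. This is what I have: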

class MostBeautifulIslands::Islands
  attr_accessor :name, :description

  @@all = []

  def initialize(name)
     @name = name
     @description = description
     @@all << self
  end

  def self.scrape_world_best_islands
    doc = Nokogiri::HTML(open("http://www.planetware.com/world/most-beautiful-islands-in-the-world-sey-1-2.htm"))
    islands_names = doc.search("div h2.sitename")
    names = islands_names.collect{|island_name| new(island_name.text.strip)} 
    island_description = doc.search("div.site_desc > p")
    descriptions = island_description.collect{|d| d.first.text.strip}

    new_island = self.new(names)
    new_island

    binding.pry
    #end
  end
end

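【Comments】:

  • Please read "minimal reproducible example". We need the minimal (stripped-down) HTML in the question itself. Having the code load the page does not help; it actually slows down our ability to help you, because we have to dig through that HTML to find what you are talking about.

Tags: ruby nokogiri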


【Answer 1】:

First, in initialize you use a description parameter that is never passed in. It should be:

def initialize(name, description)
  @name = name
  @description = description
  @@all << self
end
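
As a side note on why the original constructor fails silently rather than raising: inside initialize(name), the bare word description resolves to the reader generated by attr_accessor, which returns nil before anything has been assigned. A minimal sketch of that behaviour, using hypothetical class names (BrokenIsland, FixedIsland) and placeholder strings that are not part of the question:

```ruby
# Hypothetical classes for illustration only; they are not the question's code.
class BrokenIsland
  attr_accessor :name, :description

  def initialize(name)
    @name = name
    # There is no `description` argument here, so this calls the getter,
    # which returns nil because @description has not been set yet.
    @description = description
  end
end

class FixedIsland
  attr_accessor :name, :description

  def initialize(name, description)
    @name = name
    @description = description
  end
end

broken = BrokenIsland.new("Island A")
fixed  = FixedIsland.new("Island A", "Placeholder description.")

p broken.description
p fixed.description
```

The first call prints nil and the second prints the string that was passed in.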

Second, collect the names and the descriptions, then zip the two lists together and use each pair to create a new instance:

islands_names = doc.search("div h2.sitename").map(&:text)
islands_descs = doc.search("div.site_desc > p").map(&:text)

islands_names.zip(islands_descs).map { |(name, desc)| new(name, desc) }
#⇒ Array of 15 newly created objects
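
One thing to keep in mind with zip is that it pairs strictly by position: if the second list is shorter the missing entries become nil, and if it is longer the extras are dropped. A minimal sketch of the same pairing step with a length check, assuming plain string arrays in place of the scraped text (the Island class and build_islands helper are hypothetical names, not from the question):

```ruby
# Hypothetical stand-in for the question's class, with no network or Nokogiri.
class Island
  attr_reader :name, :description

  def initialize(name, description)
    @name = name
    @description = description
  end
end

# Pair names and descriptions by position and build one object per pair.
# Raises if the lists differ in length, because zip would otherwise pad
# with nil or silently drop the extra entries.
def build_islands(names, descriptions)
  unless names.size == descriptions.size
    raise ArgumentError, "got #{names.size} names but #{descriptions.size} descriptions"
  end

  names.zip(descriptions).map { |name, desc| Island.new(name.strip, desc.strip) }
end

# Sample strings standing in for the scraped text.
islands = build_islands(["Island A ", "Island B"],
                        ["First description.", " Second description."])
islands.each { |island| puts "#{island.name}: #{island.description}" }
```

If the two selectors ever return different counts on the real page, the error points to a selector mismatch instead of producing misaligned objects.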

【Comments】:

  • Thanks Mudasobwa, this helped me a lot.
【Answer 2】:

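I would split this into two separate classes: one that handles the Nokogiri parsing and one that handles the MostBeutifulIslands::Islands objects. This gives you more flexibility in how you process the data.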

require 'open-uri'
require 'nokogiri' 

module MostBeutifulIslands
  class Islands
    attr_reader :name, :description

    def initialize(name, description)
      @name = name
      @description = description
    end

    def valid?
      !name.nil? && !description.nil? 
    end

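    def save
      # If using Rails, this could persist to an Island model.
      # Island is assumed to be an ActiveRecord model; it is not defined here.
      island = Island.new(name: name, description: description)

      # Call save once; the original called it a second time inside puts.
      if island.save
        puts "Saved #{name}"
      else
        puts island.errors.full_messages
      end
    end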
  end
end

module MostBeutifulIslands
  class ParseIslands
    attr_reader :url, :islands

    def initialize(url)
      @url = url
    end

    def html
      Nokogiri::HTML(URI.open(url)) # URI.open: Kernel#open no longer opens URLs on Ruby 3.0+
    end

    def scrap_world_best_islands
      # Build one Islands object per ".site" node and keep the list in @islands.
      # A plain map is enough here; each_with_object is not needed.
      @islands = html.css("div .article_block").css('.site').map do |node|
        name = node.css('.sitename').text.strip
        description = node.css('.site_desc').text.strip
        MostBeutifulIslands::Islands.new(name, description)
      end
    end

    # just an example 
    def save_islands
      @islands.each do |island|
        if island.valid?
          island.save 
        end
      end
    end

  end
end

# Usage, moved out of the class body
islands = MostBeutifulIslands::ParseIslands.new("http://www.planetware.com/world/most-beautiful-islands-in-the-world-sey-1-2.htm")
islands.scrap_world_best_islands
islands.save_islands
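
A caveat about the valid? check above: in Nokogiri, calling text on a selection that matched nothing returns an empty string rather than nil, so after .text.strip a missing name or description is "" and a nil-only check lets it through. A minimal sketch of a stricter check, using a hypothetical IslandRecord class and placeholder data instead of the scraped page:

```ruby
# Hypothetical value object; mirrors the answer's Islands class but treats
# blank strings as missing, since `.text.strip` yields "" rather than nil.
class IslandRecord
  attr_reader :name, :description

  def initialize(name, description)
    @name = name
    @description = description
  end

  # Valid only when both fields contain non-whitespace text.
  def valid?
    [name, description].none? { |value| value.to_s.strip.empty? }
  end
end

# Placeholder records standing in for parsed nodes.
records = [
  IslandRecord.new("Island A", "Some description."),
  IslandRecord.new("Island B", ""),          # blank description
  IslandRecord.new(nil, "Orphaned text")     # missing name
]

valid = records.select(&:valid?)
puts valid.map(&:name).inspect
```

Only the first record passes, so the script prints ["Island A"].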

【Comments】:
