Searching through HTML in Scrapy? Answers

【Question title】: Searching through html in scrapy?
【Posted】: 2018-07-21 11:23:30
【Question description】:

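Is it possible to use a for loop to search for the tag whose text matches a certain phrase? I have been trying to write this loop, but it has not been working. Any help is appreciated! Here is my code: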

    def parse_page(self, response):
        titles2 = response.xpath('//div[@id = "mainColumn"]/h1/text()').extract_first()
        year =  response.xpath('//div[@id = "mainColumn"]/h1/span/text()').extract()[0].strip()
        aud = response.xpath('//div[@id="scorePanel"]/div[2]')
        a_score = aud.xpath('./div[1]/a/div/div[2]/div[1]/span/text()').extract()
        a_count = aud.xpath('./div[2]/div[2]/text()').extract()
        c_score = response.xpath('//a[@id = "tomato_meter_link"]/span/span[1]/text()').extract()[0].strip()
        c_count = response.xpath('//div[@id = "scoreStats"]/div[3]/span[2]/text()').extract()[0].strip()
        info = response.xpath('//div[@class="panel-body content_body"]/ul')
        mp_rating = info.xpath('./li[1]/div[2]/text()').extract()[0].strip()
        genre = info.xpath('./li[2]/div[2]/a/text()').extract_first()
        date = info.xpath('./li[5]/div[2]/time/text()').extract_first()
        box = response.xpath('//section[@class = "panel panel-rt panel-box "]/div')
        actor1 = box.xpath('./div/div[1]/div/a/span/text()').extract()
        actor2 = box.xpath('./div/div[2]/div/a/span/text()').extract()
        actor3 = box.xpath('./div/div[3]/div/a/span/text()').extract_first()

        for x in info.xpath('//li'):
            if info.xpath("./li[x]/div[1][contains(text(), 'Box Office: ')/text()]]
                box_office = info.xpath('./li[x]/div[2]/text()')
            else if info.xpath('./li[x]/div[1]/text()').extract[0] == "Runtime: "):
                runtime = info.xpath('./li[x]/div[2]/time/text()')

【Question comments】:

Yes. But what is your actual problem? What have you tried? What is your input, and what result do you expect?

Tags: python html xpath scrapy tags

【Solution 1】:

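Your for loop is wrong in two ways.

1. You call xpath on info, but an expression starting with // searches from the document root, not from info. Start the expression with a dot to make it relative:

    for x in info.xpath('.//li'):

2. x is an HTML node (a Selector), not a number, so query relative to it like this:

    if x.xpath("./div[1][contains(., 'Box Office: ')]"):
        box_office = x.xpath('./div[2]/text()').extract_first()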

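【Comments】:

When I try to use the if statement, I get the error: "name 'x' is not defined"
@user9343592 You need to use it inside your loop.
@user9343592 I don't know, but if you show me your HTML, I can help.
I got it working using x.xpath (..).extract_first() == "Runtime:". Thanks!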

【Solution 2】:

I think you may need re() or re_first() to match a certain phrase.

For example:

            elif x.xpath('./div[1]/text()').re_first('Runtime:'):
                runtime = x.xpath('./div[2]/time/text()').extract_first()

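You also need to change your for loop: the variable x in it is actually a Selector, not a number, so using it as li[x] is wrong.

gangabas made a good point about this in the previous answer.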
