Should I separate XPaths from scraping code? Answers

【Question title】: Should I separate XPaths from scraping code?
【Posted】: 2020-11-15 23:50:56
【Question description】:

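I am using the Scrapy framework for a scraping project, and I would like to separate the response.xpath('....') expressions from the code. Is there a way to decouple the code from its configuration or data resources? The XPaths could be kept in a config file apart from the code, which would make future changes easier, because the XPaths have to change whenever the website or web application is updated.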

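from scrapy import Request

# Both methods below belong to the spider class.
def parse(self, response):
    # XPath for the links in the main navigation bar.
    nav_xp = "//div[@id='mainNav']//li/a/@href"
    #res = response.xpath(nav_xp).extract()
    #req = [Request(self.start_urls[0]+url) for url in res[1:-1]]
    # Skip the first and last navigation entries and request every page in between.
    return (Request(self.start_urls[0] + url, callback=self.parse_articles, headers=response.headers)
            for url in response.xpath(nav_xp).extract()[1:-1])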


def parse_articles(self, response):
    # Article is the project's scrapy.Item subclass, defined elsewhere.
    for article_section in response.xpath('//h2[@class="section_title"]/a'):
        title = article_section.xpath('text()').extract_first()
        href = article_section.xpath('@href').extract_first()
        # Drop the empty segment before the leading slash.
        href_splitted = href.split('/')[1:]
        category = href_splitted[0]
        # Keep only the digits of the second path segment as the article id.
        article_id = int(''.join([char for char in href_splitted[1] if char.isdigit()]))
        article = Article()
        article['title'] = title
        article['category'] = category
        article['article_id'] = article_id
        # Hand the item to Scrapy; without this the callback produces nothing.
        yield article

【Question discussion】:

Tags: python html xml xpath scrapy

【Solution 1】:

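The argument to the xpath() function is a string, so you can certainly factor those strings out of the code into a configuration file.

This is only worth doing if the code is stable and the XPaths change independently of it.

Most scraping scripts are lightweight tools that themselves have to evolve over time. Something like a test harness might have stable code and varying XPaths that express the tests, but scraping code usually co-evolves with its XPaths, so there is little to gain by separating the two.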
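If the XPaths do get factored out as described above, a minimal sketch could look like the following. It is not part of the original answer. The key names, the `load_xpaths` helper, and the idea of an `xpaths.json` file are all assumptions made for illustration. The JSON text is inlined here so the snippet runs on its own; in a real project it would be read from a file.

```python
import json

# Hypothetical config contents. In practice this text would live in a file
# such as xpaths.json next to the spider (file name and keys are assumptions).
XPATHS_JSON = """
{
  "nav_links": "//div[@id='mainNav']//li/a/@href",
  "article_links": "//h2[@class='section_title']/a",
  "article_title": "text()",
  "article_href": "@href"
}
"""

# Keys the spider code expects to find in the config.
REQUIRED_KEYS = {"nav_links", "article_links", "article_title", "article_href"}


def load_xpaths(raw):
    """Parse the JSON text and fail early if an expected XPath is missing."""
    xpaths = json.loads(raw)
    missing = REQUIRED_KEYS - set(xpaths)
    if missing:
        raise KeyError(f"missing XPath entries: {sorted(missing)}")
    return xpaths


XPATHS = load_xpaths(XPATHS_JSON)

# Inside the spider, the string literals are then replaced by lookups, e.g.:
#     response.xpath(XPATHS["nav_links"]).extract()
#     article_section.xpath(XPATHS["article_title"]).extract_first()
```

A plain Python module holding the same strings as constants would work just as well and skips the parsing step. Either way, the caveat above still applies: if the parsing logic changes together with the XPaths, the extra file mostly adds indirection.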

【Discussion】:

Thanks for your reply. Should I create a simple Python file to store these strings, or use a dedicated configuration file for this job?
Thanks for your answer :)