How to read text and get specific line values using Scrapy: answers

【Question title】: How to read text and get some specific line values using Scrapy
【Posted】: 2023-04-06 09:23:01
【Question description】:

I need to request the URL http://something.com/requirements.txt. Its content (response.text) looks like this:

    From the 8th to the 12th century, Old English gradually transformed through language contact into Middle English. Middle English is often arbitrarily defined as beginning with the conquest of England by William the Conqueror in 1066, but it developed further in the period from 1200–1450.
    
    Year: 2020

    First, the waves of Norse colonisation of northern parts of the British Isles in the 8th and 9th centuries put Old English into intense contact with Old Norse, a North Germanic language. Norse influence was strongest in the north-eastern varieties of Old English spoken in the Danelaw area around York, which was the centre of Norse colonisation; today these features are still particularly present in Scots and Northern English. However the centre of norsified English seems to have been in the Midlands around Lindsey, and after 920 CE when Lindsey was reincorporated into the Anglo-Saxon polity, Norse features spread from there into English varieties that had not been in direct contact with Norse speakers. An element of Norse influence that persists in all English varieties today is the group of pronouns beginning with th- (they, them, their) which replaced the Anglo-Saxon pronouns with h- (hie, him, hera).[43]

I want to use Scrapy to scrape only the "Year:" value from the text response and map it to an ItemLoader. Is there a way to do this with Scrapy?

【Question discussion】:

Tags: web-scraping scrapy scrapy-shell

【Solution 1】:

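You can use a regular expression with Python's re module. re.findall returns a list with every captured value (for the sample above, ['2020']), so take the first element if you expect a single year: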

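    import re

    # Capture the rest of the line after "Year:". re.MULTILINE lets $ match at
    # every line end, so no literal "\n" is required (this also covers a Year
    # line that is the last line of the file, and "\r\n" line endings).
    years = re.findall(r'Year:[ \t]*(.*?)\s*$', response.text, re.MULTILINE)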

【Discussion】: