【Question title】: Search a list based on key word to append specific list contents
【Posted】: 2021-01-18 09:45:13
【Question description】:

Context

I have a list of links scraped from this website: https://www.ons.gov.uk/economy/economicoutputandproductivity/output/datasets/economicactivityfasterindicatorsuk

The list of links looks like this:

['https://twitter.com/ONS',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx',
 'https://www.facebook.com/ONS',
 'https://www.ons.gov.uk/peoplepopulationandcommunity/leisureandtourism',
 'https://www.ons.gov.uk/businessindustryandtrade/manufacturingandproductionindustry',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjuly2019/economicactivityfasterindicatorsukjuly2019dataset.xlsx',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx'...

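I now want to use Helium/Selenium to find these links and print them out. However, the list mixes links I don't need with the Excel documents I need to download. I want to append only the links that contain xlsx.

I tried this solution, but it did not work. I also tried the .remove function, but that is more time-consuming. I also tried tidying up the list by slicing, but that is time-consuming as well.

Question

Is there a simpler way to search the list of links for a string, so that I can append the matching links to a list and loop through them with Selenium? (I can do the latter; I only need help with the appending.)

【Comments】:

  • Thanks, but this has already been solved by the answers below :)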

Tags: python list


【Solution 1】:

Use a list comprehension.

linklist = ['https://twitter.com/ONS',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx',
 'https://www.facebook.com/ONS',
 'https://www.ons.gov.uk/peoplepopulationandcommunity/leisureandtourism',
 'https://www.ons.gov.uk/businessindustryandtrade/manufacturingandproductionindustry',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjuly2019/economicactivityfasterindicatorsukjuly2019dataset.xlsx',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx']

relevant_links = [link for link in linklist if ".xlsx" in link]

relevant_links then contains:

['https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx', 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx', 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjuly2019/economicactivityfasterindicatorsukjuly2019dataset.xlsx', 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx']
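Since the question asks about appending, the same filter can also be written as an explicit loop. The following is only a sketch: the function name `keep_xlsx` and the shortened `sample` list are made up for illustration, and the lower-casing is an added assumption so that an uppercase extension would also match.

```python
def keep_xlsx(links):
    """Return only the links that mention .xlsx, ignoring case."""
    kept = []
    for link in links:
        # Lower-case a copy of the link so ".XLSX" matches as well.
        if ".xlsx" in link.lower():
            kept.append(link)
    return kept


# Shortened sample taken from the list in the question.
sample = [
    "https://twitter.com/ONS",
    "https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx",
    "https://www.facebook.com/ONS",
]

print(keep_xlsx(sample))
```

The loop and the list comprehension build the same list; the comprehension is simply the more compact spelling.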


【Solution 2】:

Check how the string ends:

new_list = [link for link in original_list if link.endswith(".xlsx")]

You can then open each link in new_list.
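A possible follow-up step, sketched under assumptions: `local_name` and `visit_all` are hypothetical helpers, not part of either answer. Two of the links in the question end in the same file name (roadstables.xlsx), so the sketch derives a name from the last two components of the `uri` query parameter. The `open_link` argument stands in for whatever opens a page; with Selenium that could be a driver's `get` method.

```python
from urllib.parse import parse_qs, urlparse


def local_name(link):
    """Build a file name from the last two path components of the link."""
    parsed = urlparse(link)
    # parse_qs percent-decodes the value, so %2f becomes "/".
    # Fall back to the URL path when there is no "uri" parameter.
    path = parse_qs(parsed.query).get("uri", [parsed.path])[0]
    parts = [part for part in path.split("/") if part]
    return "_".join(parts[-2:])


def visit_all(links, open_link):
    """Call open_link once per link, in order."""
    for link in links:
        open_link(link)


# Two links from the question that share the file name roadstables.xlsx.
new_list = [
    "https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx",
    "https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx",
]

names = [local_name(link) for link in new_list]
print(names)

# Stand-in for a browser: record each link instead of opening it.
visited = []
visit_all(new_list, visited.append)
```

Passing the opener as an argument keeps the loop independent of the browser library, so it can be tried out without starting a browser.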
