How to crawl a list of url without for loop? (Answers)

【Question title】: How to crawl a list of url without for loop?
【Posted】: 2018-12-07 07:44:27
【Question】:

I have a list of URLs, and I want to scrape some information from each of them.

import requests
from bs4 import BeautifulSoup as soup  # assumed: the alias behind the `soup(...)` call below

daa = ['https://old.reddit.com/r/Games/comments/a2p1ew/', 'https://old.reddit.com/r/Games/comments/9zzo0e/', 'https://old.reddit.com/r/Games/comments/a31a6q/', ]

for y in daa:
    uClient = requests.get(y, headers = {'User-agent': 'your bot 0.1'})
    page_soup = soup(uClient.content, "html.parser")
    time = page_soup.findAll("p", {"class":"tagline"})[0].time.get('datetime').replace('-', '')

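This gives me every time value I want. But I need to do it without a for loop. More precisely, in the next step I need to open a file and write to it, and when I do that inside the same loop the output is weird. How can I get time without a for loop?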

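【Comments】:

Wouldn't open(file, 'a') (append to the end of the file) be enough?
What do you mean by the output being weird? It would help to show the output you are getting and the output you are trying to achieve.

Tags: python-3.x list for-loop beautifulsoup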

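【Solution 1】:

You can use open(file, 'a') for this. Or, as I like to do, append everything to a table inside the loop and then write the whole table to a file once the loop has finished.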

import requests
import bs4
import pandas as pd

daa = ['https://old.reddit.com/r/Games/comments/a2p1ew/', 'https://old.reddit.com/r/Games/comments/9zzo0e/', 'https://old.reddit.com/r/Games/comments/a31a6q/', ]

rows = []  # one [url, time] pair per page

for y in daa:
    uClient = requests.get(y, headers={'User-agent': 'your bot 0.1'})
    page_soup = bs4.BeautifulSoup(uClient.content, "html.parser")
    # Take the <time> tag of the first tagline and strip the dashes from its datetime attribute
    time = page_soup.find_all("p", {"class": "tagline"})[0].time.get('datetime').replace('-', '')
    rows.append([y, time])

# Build the table once, after the loop (DataFrame.append no longer exists in pandas 2.x)
results = pd.DataFrame(rows, columns=['url', 'time'])
results.to_csv('path/to_file.csv', index=False)
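
For the other option, open(file, 'a'), here is a minimal sketch of how it might look. The names append_times, fetch_time and path are hypothetical and do not come from the question. fetch_time is assumed to be any function that takes a URL and returns the time string, for example one that wraps the requests.get and BeautifulSoup lines from the loop. Opening the file once, outside the loop, is one way to avoid output that gets overwritten or interleaved when the file is reopened on every iteration.

```python
import csv
from pathlib import Path


def append_times(urls, fetch_time, path):
    """Append one (url, time) row per URL to the CSV file at `path`."""
    path = Path(path)
    # Write the header only when the file is new or still empty.
    write_header = not path.exists() or path.stat().st_size == 0
    # Mode "a" appends, so rows from earlier runs are preserved.
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["url", "time"])
        for url in urls:
            # fetch_time is supplied by the caller (hypothetical helper).
            writer.writerow([url, fetch_time(url)])
```

You would call it as append_times(daa, fetch_time, 'path/to_file.csv'). Calling it again later with more URLs adds rows below the existing ones instead of replacing them.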

【Comments】: