Send a crawl request to a website every second: answers

【Question title】: Send request to website for crawl every second
【Posted】: 2021-02-13 05:17:52
【Question description】:

I want to crawl a website once per second for 4 hours. How can I do that? My code is below.

import requests
from bs4 import BeautifulSoup

site = requests.get("http://example.com")
soup = BeautifulSoup(site.text, 'html.parser')
r = str(soup).split(",")
update_time = r[0]
price1 = r[2]
price2 = r[3]
print(update_time, price1, price2)

【Question comments】:

Tags: python time beautifulsoup request web-crawler

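【Solution 1】:

You can use the time and threading modules. Starting each request in its own thread keeps a slow response from delaying the next request.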

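import requests
from threading import Thread
from time import sleep
from bs4 import BeautifulSoup


def scrape():
    # The timeout keeps a stalled request from blocking its thread forever.
    site = requests.get("http://example.com", timeout=10)
    soup = BeautifulSoup(site.text, 'html.parser')
    # The page is treated as comma-separated text; fields 0, 2 and 3 are used.
    r = str(soup).split(",")
    update_time = r[0]
    price1 = r[2]
    price2 = r[3]
    print(update_time, price1, price2)


# 14400 iterations = 4 hours * 60 minutes * 60 seconds, one request per second.
for i in range(14400):
    # Each request runs in its own thread so a slow response does not delay the next one.
    t = Thread(target=scrape)
    t.start()
    sleep(1)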
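
One caveat with the loop above: sleep(1) is added on top of the time spent in each iteration, so the ticks drift slightly and 14400 iterations take a little longer than four hours. The following is a minimal sketch of a fixed-rate loop, not part of the original answer. The helper name run_every is hypothetical, and the clock and sleep parameters are assumptions added only so the loop can be exercised without real waiting.

```python
import time


def run_every(interval, duration, task, clock=time.monotonic, sleep=time.sleep):
    """Call task() every `interval` seconds until `duration` seconds have elapsed.

    Hypothetical helper for illustration. Each tick is computed from the start
    time, so the time spent inside task() does not accumulate as drift.
    Returns the number of times task() was called.
    """
    start = clock()
    calls = 0
    while True:
        next_tick = start + calls * interval
        if next_tick - start >= duration:
            break
        # Sleep only for the time remaining until the next tick; if task()
        # overran the interval, the next call starts immediately.
        delay = next_tick - clock()
        if delay > 0:
            sleep(delay)
        task()
        calls += 1
    return calls
```

With the scrape function above, run_every(1, 4 * 60 * 60, scrape) would call it synchronously once per second; passing lambda: Thread(target=scrape).start() as the task would keep the one-thread-per-request behaviour.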

【Comments】:

【Solution 2】:

You can use the schedule module for this.

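import schedule
import time
import requests
from bs4 import BeautifulSoup


def crawl():
    # The timeout keeps a stalled request from blocking the scheduler loop forever.
    site = requests.get("http://example.com", timeout=10)
    soup = BeautifulSoup(site.text, 'html.parser')
    # The page is treated as comma-separated text; fields 0, 2 and 3 are used.
    r = str(soup).split(",")
    update_time = r[0]
    price1 = r[2]
    price2 = r[3]
    print(update_time, price1, price2)


# Register crawl() to run once every second.
schedule.every(1).seconds.do(crawl)

# Run any jobs that are due, then wait before checking again.
while True:
    schedule.run_pending()
    time.sleep(1)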

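The four-hour window can be enforced either externally with crontab or by replacing the while True loop with a bounded for loop.

You must install the schedule module to run the script above: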

sudo pip install schedule
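
The while True loop in the script above never exits on its own. As an alternative to crontab, the loop can check a deadline instead. The following is a minimal sketch under stated assumptions: the helper name run_for is hypothetical, its run_pending argument is meant to be schedule.run_pending, and the clock and sleep parameters exist only so the loop can be exercised without real waiting.

```python
import time


def run_for(duration, run_pending, poll=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call run_pending() about every `poll` seconds until `duration` seconds pass.

    Hypothetical helper for illustration; returns the number of polls made.
    """
    deadline = clock() + duration
    polls = 0
    while clock() < deadline:
        run_pending()
        polls += 1
        sleep(poll)
    return polls


# Intended use with the script above (requires the schedule package):
#   schedule.every(1).seconds.do(crawl)
#   run_for(4 * 60 * 60, schedule.run_pending)
```

Using a monotonic clock for the deadline means the window is not affected if the system time is adjusted while the script runs.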

【Comments】: