美丽的汤保存会话结帐产品答案

【问题标题】：Beautiful Soup saving sessions to checkout products美丽的汤保存会话结帐产品
【发布时间】：2017-12-13 18:30:34
【问题描述】：

我正在尝试编写一个机器人来将商品添加到我的购物车然后为我购买，因为我需要进行非常定期的购买，而自己购买它们变得乏味。

from bs4 import BeautifulSoup
import requests
import numpy as np


page = requests.get("http://www.onlinestore.com/shop")
soup = BeautifulSoup(page.content, 'html.parser')
try:
    for i in soup.find_all('a'):

        if "shop" in i['href']:
            shop_page = requests.get("http://www.onlinestore.com" + i['href'])
            item_page = BeautifulSoup(shop_page.content, 'html.parser')
            for h in item_page.find_all('form', class_="add"):
                print(h['action'])
                try:
                    shop_page = requests.get("http://www.online.com" + h['action'])
                except:
                    print("None left")
            for h in item_page.find_all('h1', class_="protect"):
                print(h.getText())
except:
    print("either ended or error occured")
checkout_page = requests.get("http://www.onlinestore.com/checkout")

checkout = BeautifulSoup(checkout_page.content, 'html.parser')
for j in checkout.find_all('strong', id_="total"):
    print(j)

我在检查产品时遇到了一些麻烦，因为这些物品没有结转。有没有一种方法可以实现 cookie，以便跟踪我添加到购物车的商品？

谢谢

【问题讨论】：

标签： python-3.x session beautifulsoup screen-scraping

【解决方案1】：

即使requests 不是浏览器，它仍然可以跨多个请求保留标头和cookie，但是如果您使用Session：

with requests.Session() as session:
    page = session.get("http://www.onlinestore.com/shop")

    # use "session" instead of "requests" later on

请注意，由于持久连接，您还可以免费获得性能提升：

如果您向同一主机发出多个请求，底层 TCP 连接将被重用，这可能会导致显着的 性能提升

根据此特定在线商店上的购物车和结帐的实施方式，这可能会或可能不会起作用。但是，至少，这是向前迈出的一步。

【讨论】：