【Question title】: Matching Input Text in HTML parse with Python
【Posted】: 2018-02-07 22:40:23
【Question】:

I thought that just setting "text=size" would give me the value I need. Maybe I'm misunderstanding something? What am I doing wrong?

import requests, re, json, time, sys, os,webbrowser
import subprocess as s
from bs4 import BeautifulSoup as bs
global size

size = "Medium"

'''html  = <option selected="selected" data-sku="51728-003" value="660654030868">Medium - $138.00 USD</option>'''

url = "https://us.octobersveryown.com/collections/shop-all/products/varsity-ovo-polartec-half-zip-pullover-black"

def getStuff():
    print ('')
session = requests.session()
response = session.get(url)
soup = bs(response.text, 'html.parser')
prod_name = soup.find('h1',{'itemprop':'name'}).text
price = soup.find('span',{'id':'ProductPrice'}).text
variant = soup.find(text=size).findPrevious('value').text
#variant ="notworking"
print("\nProd Name: "+prod_name)
print("\nPrice: "+price)
print("\nMatching Variant Value: "+variant)

getStuff()

My error is:

Traceback (most recent call last):
  File "trythis.py", line 20, in <module>
    variant = soup.find(text=size).findPrevious('value').text
AttributeError: 'NoneType' object has no attribute 'text'
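The traceback complains that `NoneType` has no attribute `text`, which points at `findPrevious('value')` returning `None`. That method searches backwards for a *tag* named `value`, and there is no such tag because `value` is an attribute of `<option>`. Separately, `text=size` (spelled `string=` in current BeautifulSoup) only matches a text node exactly equal to "Medium", while the option's text is "Medium - $138.00 USD". A minimal offline sketch, assuming the sample markup from the question (the Large option's value is taken from the answer's printed output):

```python
import re

from bs4 import BeautifulSoup

# Sample markup: the Medium option is from the question; the Large value is
# taken from the answer's printed output (other attributes omitted).
html = """
<select>
  <option selected="selected" data-sku="51728-003" value="660654030868">Medium - $138.00 USD</option>
  <option value="660654063636">Large - $138.00 USD</option>
</select>
"""
soup = BeautifulSoup(html, "html.parser")
size = "Medium"

# Exact string match: no text node here is exactly "Medium", so this is None.
exact = soup.find(string=size)

# Regex match: a text node that starts with the size as a whole word.
label = soup.find(string=re.compile(r"^\s*" + re.escape(size) + r"\b"))

# find_previous("value") looks for a <value> *tag*, which does not exist.
missing = label.find_previous("value")

# The value is an attribute of the <option> tag that contains the text node.
value = label.parent["value"]

print(exact, missing, value)  # None None 660654030868
```

Reading the attribute through `.parent` works because the matched text node sits directly inside the `<option>` tag.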

【Question discussion】:

    Tags: python json python-3.x parsing


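    【Solution 1】:

    This is an interesting problem. Since you're trying to match a word in the text of an option tag and then read an attribute from that same tag, I would do the following: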

    import requests
    from bs4 import BeautifulSoup as bs

    size = "Medium"

    # Sample of the markup being matched:
    # <option selected="selected" data-sku="51728-003" value="660654030868">Medium - $138.00 USD</option>

    url = "https://us.octobersveryown.com/collections/shop-all/products/varsity-ovo-polartec-half-zip-pullover-black"

    def getStuff():
        print('')
        session = requests.session()
        response = session.get(url)
        soup = bs(response.text, 'html.parser')
        prod_name = soup.find('h1', {'itemprop': 'name'}).text
        price = soup.find('span', {'id': 'ProductPrice'}).text

        def get_option(size):
            # Return the value attribute of the first <option> whose text contains `size`.
            options = soup.find_all('option')
            target_option = [o for o in options if size in o.text][0]
            return target_option['value']

        def get_options_and_values():
            # Map each option's label (the text before the first '-') to its numeric value.
            option_dic = {}
            for o in soup.find_all('option'):
                try:
                    option_dic[o.text.split('-')[0].strip()] = int(o['value'])
                except (KeyError, ValueError):
                    # Skip options with no value attribute or a non-numeric value.
                    pass
            return option_dic

        print(get_options_and_values())
        print("\nProd Name: " + prod_name)
        print("\nPrice: " + price.strip())
        variant = get_option(size)
        print("\nMatching Variant Value: " + variant)

    getStuff()
    

    This gives me:

    {'Medium': 660654030868, 'Large': 660654063636}
    
    Prod Name: VARSITY OVO POLARTEC® HALF-ZIP PULLOVER - BLACK
    
    Price: $138.00
    
    Matching Variant Value: 660654030868
    

    This makes it easy to change which size you want to look up. Does that make sense?

    【Discussion】:
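One caveat with `get_option`: `size in o.text` is a substring test, so looking up "Large" would also match an "X-Large - ..." option if one came first, and `split('-')` in `get_options_and_values` would cut "X-Large" down to "X". A minimal sketch of an exact-label lookup, assuming every option's text has the form `Label - price`; the X-Large option and its value are hypothetical, added only to show the pitfall, and `option_values` is an illustrative helper name:

```python
from typing import Dict, Optional

from bs4 import BeautifulSoup

# Medium/Large values come from the answer's output; the X-Large option
# (placed first to expose the substring pitfall) and its value are hypothetical.
html = """
<select>
  <option value="0">X-Large - $138.00 USD</option>
  <option value="660654030868">Medium - $138.00 USD</option>
  <option value="660654063636">Large - $138.00 USD</option>
</select>
"""


def option_values(soup: BeautifulSoup) -> Dict[str, str]:
    """Map each option label (the text before " - ") to its value attribute."""
    values = {}
    for o in soup.find_all("option"):
        if not o.has_attr("value"):
            continue  # skip placeholder options without a value
        # Split on " - " (with spaces) so hyphenated labels like "X-Large" stay whole.
        label = o.get_text().split(" - ")[0].strip()
        values[label] = o["value"]
    return values


def get_option(soup: BeautifulSoup, size: str) -> Optional[str]:
    # Exact label lookup: "Large" no longer matches "X-Large"; unknown sizes give None.
    return option_values(soup).get(size)


soup = BeautifulSoup(html, "html.parser")

# The substring approach picks the first option containing "Large": the X-Large one.
substring_hit = [o for o in soup.find_all("option") if "Large" in o.text][0]["value"]

print(substring_hit)              # 0
print(get_option(soup, "Large"))  # 660654063636
print(get_option(soup, "Small"))  # None
```

Returning `None` for a missing size, instead of raising `IndexError` from `[0]`, also makes it easier to check whether a size is currently listed.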

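    • Yes! Thank you so much! One last follow-up question: how would we turn this into a looping function? Say the other options aren't available yet, and we want the function to sleep and then try again. Would we use a while or a for loop around get_option(size)?
    • Cool. Sure, see the edit I just made. You may want to handle the "sold out" options differently. You just need to make sure the content you're scraping has a consistent structure.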
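For the follow-up about retrying until a size shows up, a loop with `time.sleep` is enough. This is only a sketch: `wait_for_variant` and `fetch_options` are hypothetical names, and `fetch_options` is assumed to re-download the page and return a `{label: value}` dict like the one `get_options_and_values()` builds. A bounded `for` loop avoids polling forever:

```python
import time
from typing import Callable, Dict, Optional


def wait_for_variant(
    fetch_options: Callable[[], Dict[str, int]],
    size: str,
    delay: float = 30.0,
    max_attempts: int = 10,
) -> Optional[int]:
    """Re-fetch the options until `size` appears; return its value or None."""
    for attempt in range(1, max_attempts + 1):
        options = fetch_options()  # e.g. re-download the page and parse the <option> tags
        if size in options:
            return options[size]
        if attempt < max_attempts:
            time.sleep(delay)  # not listed yet (or sold out): wait, then try again
    return None  # gave up after max_attempts fetches


# Simulated page states: "Large" only shows up on the third fetch.
pages = iter([
    {"Medium": 660654030868},
    {"Medium": 660654030868},
    {"Medium": 660654030868, "Large": 660654063636},
])
result = wait_for_variant(lambda: next(pages), "Large", delay=0)
print(result)  # 660654063636
```

Passing the fetch step in as a function keeps the retry logic separate from the scraping, so it can be tested without hitting the site; a generous `delay` avoids hammering the server while polling.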