【Question title】: Python: Parsing JavaScript with BeautifulSoup
【Posted】: 2018-11-20 20:04:04
【Question description】:

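I'm trying to parse content that is embedded in JavaScript. I have an idea of how to do this, but I'm not entirely sure. I've read through some examples, and I think the re library may be the way to go.

Here is my code so far: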

import requests
import json
import re
from bs4 import BeautifulSoup

url = r'https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=13&rver=6.7.6643.0&wp=MBI_SSL&wreply=https:%2f%2faccount.xbox.com%2fen-us%2faccountcreation%3freturnUrl%3dhttps:%252f%252fwww.xbox.com:443%252fen-US%252f%26pcexp%3dtrue%26uictx%3dme%26rtc%3d1&lc=1033&id=292543&aadredir=1'


s = requests.Session()


soup = BeautifulSoup(s.get(url).content, 'html.parser')


print(soup.find_all("script", type="text/javascript")[5].prettify())

This is just a snippet of the parsed content. I'm trying to access this data, specifically the "value" attribute:

<input type="hidden" name="PPFT" id="i0327" value="Dd**Lkp2L3EKDvGi3u6PEweEQUhvW*1jPrA3FgGSdeYoY8FERluiTqDef6QF3V5NkN*4yPg7vvxI3jo5oKPRelhfU3rYGFkxbxyvSBssiwFA!8LwocAbVDtrDq11Wk3F4LzRBQck3H4ca5r3Qhv8b0h4CxcEZgAnGAkcWE7fExGn1dBwGoY8sZVL2!ZBMjnJEanidLF!Yi975frkQ6Cys2oUb863xoLxdvZGuLQRxRLjjKubaCHlWQbD0b*Wzq49EA$$"/>

Thanks in advance for any replies. Thank you!

【Discussion】:

    Tags: javascript python beautifulsoup python-requests html-parsing


    【Solution 1】:
    from bs4 import BeautifulSoup as bs
    import requests
    import re

    url = 'https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=13&rver=6.7.6643.0&wp=MBI_SSL&wreply=https:%2f%2faccount.xbox.com%2fen-us%2faccountcreation%3freturnUrl%3dhttps:%252f%252fwww.xbox.com:443%252fen-US%252f%26pcexp%3dtrue%26uictx%3dme%26rtc%3d1&lc=1033&id=292543&aadredir=1'
    page = requests.get(url)
    html = bs(page.text, 'lxml')
    # The hidden input sits inside the sixth inline script (index 5);
    # that index depends on the page layout and may change.
    script_text = html.find_all('script', type="text/javascript")[5].prettify()
    # Capture only the characters between value=" and the closing "/
    match = re.search(r'value="([^"]+)"/', script_text)
    value = match.group(1) if match else None
    print(value)
    Output:
    DVSXQahhtomXS2Y4k2itS5MPP52mJgUkC7LH!W*1DmjHiWk*npajBfgXK5yp3*!bu3Wuvvs7xavleUV3nIbjLZHckj73QMe8wipwXhCqpXuUZQ2wnJvNYAVNCg9XxKPuIovp7!sLbumrufuYefyzM6UQLkMb5c7MuImDofVhLlKxpI7Pohe8sO2x8r63TtFCTDphWzqXKJE3B8DRK*AhMbFsmdP0sj2CXMZ7dyTfLJSr1zWBlaHTqJPLvhgzLSiaEg$$
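As an alternative to pulling the value out with a regular expression alone, the embedded `<input>` fragment can be isolated first and then handed back to BeautifulSoup, which reads the attribute directly. The following is only a sketch: `sample_script` and `extract_ppft` are hypothetical names, and the sample text is a made-up stand-in for the real script content, which is dynamic and may be laid out differently.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the text of the <script> element.
sample_script = """
var data = {tag: '<input type="hidden" name="PPFT" id="i0327" value="Dd**Lkp2$$"/>', n: 1};
"""

def extract_ppft(script_text):
    """Return the value of the hidden PPFT input found in script_text, or None."""
    # Isolate the <input ...> fragment that carries name="PPFT".
    match = re.search(r'<input[^>]*name="PPFT"[^>]*>', script_text)
    if match is None:
        return None
    # Let BeautifulSoup parse the attributes instead of slicing strings by hand.
    tag = BeautifulSoup(match.group(0), 'html.parser').find('input')
    return tag.get('value') if tag else None

print(extract_ppft(sample_script))
```

With the real page, the text of the selected script element would be passed in place of `sample_script`. Returning `None` when nothing matches keeps the caller from crashing if the page layout changes.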
    

    【Comments】:

    • When I run this, the output is ['DVSXQahhtomXS2Y4k2itS5MPP52mJgUkC7LH!W*1DmjHiWknpajBfgXK5yp3!bu3Wuvvs7xavleUV3nIbjLZHckj73QMe8wipwXhCqpXuUZQ2wnJvNYAVNCg9XxKPuIovp7!sLbumrufuYefyzM6UQLkMb5c7MuImDofVhLlKxpI7Pohe8sO2x8r63TtFCTDphWzqXKJE3B8DRK*AhMbFsmdP0sj2CXMZ7dyTfLJSr1zWBlaHTqJPLvhgzLSiaEg$$'] How can I remove the [' '] ?
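    • It's dynamic content, so it keeps changing. Also, just replace the value line with the new one from the code I just edited.
    • I understand, but I only want the string, without the leading [' and the trailing '].
    • Works perfectly :) Thank you!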
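The `['` and `']` discussed in the comments appear because `re.findall` returns a list, and converting a list with `str()` (or printing it) shows the list's brackets and quotes. A small sketch of the difference, using a made-up `text` string as a stand-in for the script content:

```python
import re

# Hypothetical stand-in for the prettified script text.
text = 'tag: <input name="PPFT" value="abc!123$$"/>'

# findall returns a list of captured groups, even when there is one match.
matches = re.findall(r'value="([^"]+)"/', text)
print(matches)

# Indexing the list yields the bare string, with no brackets or quotes.
first = matches[0] if matches else None
print(first)
```

Indexing (or using `re.search` with `.group(1)`) avoids having to strip the list punctuation afterwards with chained `replace` calls.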