[Question title]: BeautifulSoup not picking up meta tag
[Posted]: 2018-04-20 21:35:58
[Question description]:

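I have a simple script that fetches an HTML page and tries to output the content of the keywords meta tag. Somehow it doesn't pick up the content of the keywords meta tag, even though the HTML contains that tag. Any help is appreciated.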

    # Python 2 (urllib2 and the print statement)
    import urllib2
    from bs4 import BeautifulSoup

    url = "https://www.mediapost.com/publications/article/316086/google-facebook-others-pitch-in-app-ads-brand-s.html"
    req = urllib2.Request(url=url)
    f = urllib2.urlopen(req)
    mycontent = f.read()
    soup = BeautifulSoup(mycontent, 'html.parser')
    keywords = soup.find("meta", property="keywords")
    print keywords
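
The HTML contains the tag, yet the lookup comes back empty. One way to check which attribute the keywords tag actually carries is to print the attributes of every meta tag. This is a minimal sketch in Python 3, assuming a small hard-coded HTML string (SAMPLE_HTML, hypothetical) in place of the downloaded page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real page is much larger.
SAMPLE_HTML = """
<html><head>
  <meta property="og:title" content="Example title">
  <meta name="keywords" content="mobile games, in-app ads">
</head><body></body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Each Tag exposes its attributes as a plain dict via .attrs,
# so dumping them shows whether a tag uses name=... or property=...
meta_attrs = [tag.attrs for tag in soup.find_all("meta")]
for attrs in meta_attrs:
    print(attrs)
```

On this sample, the keywords tag shows up as {'name': 'keywords', ...} and has no property attribute at all, so a filter on property cannot match it.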

[Question discussion]:

Tags: python beautifulsoup meta


[Solution 1]:

I strongly recommend using requests instead of urllib2.

Code:

    from bs4 import BeautifulSoup
    import requests

    url = "https://www.mediapost.com/publications/article/316086/google-facebook-others-pitch-in-app-ads-brand-s.html"
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    # the keywords tag uses name="keywords", not property="keywords"
    keywords = soup.select_one('meta[name="keywords"]')['content']

    >>> keywords
    'Many more major brands are pumping big ad dollars into mobile games, pushing Google, Facebook and others into the in-app gaming ad space. Some believe this is in response to brands searching for a secure, safe place to run video ads and engage with consumers. 03/16/2018'
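
One caveat with indexing select_one(...)['content'] directly: `select_one()` returns `None` when no tag matches, so a page without a keywords tag raises `TypeError`. Below is a minimal sketch of a guarded lookup in Python 3, using a hypothetical helper `meta_content` and an inline HTML string instead of the live page. It assumes the meta name is a plain word with no quote characters, since it is interpolated into the CSS selector.

```python
from bs4 import BeautifulSoup


def meta_content(soup, name):
    """Return the content of <meta name=...>, or None if the tag is absent.

    Hypothetical helper: select_one() returns None when nothing matches,
    so indexing its result directly would raise TypeError.
    """
    tag = soup.select_one(f'meta[name="{name}"]')
    return tag.get("content") if tag is not None else None


# Stand-in HTML (hypothetical), not the real MediaPost page.
html = '<head><meta name="keywords" content="ads, games"></head>'
soup = BeautifulSoup(html, "html.parser")

print(meta_content(soup, "keywords"))     # ads, games
print(meta_content(soup, "description"))  # None
```

Using tag.get("content") rather than tag["content"] also covers a meta tag that exists but has no content attribute.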
    

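[Discussion]:

[Solution 2]:

If you inspect the page carefully, the meta tag you are looking for has a name attribute, not a property attribute, so change your code to: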

      keywords = soup.find("meta", attrs={'name':'keywords'})
      

Then, to display its content, write:

      print keywords['content']
      

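Output:

      Many more major brands are pumping big ad dollars into mobile games, pushing Google, Facebook and others into the in-app gaming ad space. Some believe this is in response to brands searching for a secure, safe place to run video ads and engage with consumers. 03/16/2018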
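
The attrs dictionary is needed here rather than a name="keywords" keyword argument: in BeautifulSoup, `find()`'s first parameter is itself called `name` (the tag name), so `name` cannot double as an attribute filter. The sketch below (Python 3, on an inline HTML string that stands in for the real page) contrasts the two lookups:

```python
from bs4 import BeautifulSoup

# Stand-in HTML (hypothetical), not the real MediaPost page.
html = '<head><meta name="keywords" content="ads, games"></head>'
soup = BeautifulSoup(html, "html.parser")

# property= is an ordinary attribute filter; this tag has no property attribute.
by_property = soup.find("meta", property="keywords")

# name= would clash with find()'s own "name" parameter (the tag name),
# so the name attribute is passed through the attrs dict instead.
by_name = soup.find("meta", attrs={"name": "keywords"})

print(by_property)         # None
print(by_name["content"])  # ads, games
```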

[Discussion]:

[Solution 3]:

Use 'lxml' instead of 'html.parser', and use soup.find_all:

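    from bs4 import BeautifulSoup
    import requests

    doc = requests.get(url).text  # url as defined in the question
    soup = BeautifulSoup(doc, 'lxml')  # the 'lxml' parser needs the lxml package installed
    keywords = soup.find_all('meta', attrs={"name": 'keywords'})
    for x in keywords:
        print(x['content'])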
        

Output:

        Many more major brands are pumping big ad dollars into mobile games, pushing Google, Facebook and others into the in-app gaming ad space. Some believe this is in response to brands searching for a secure, safe place to run video ads and engage with consumers. 03/16/2018
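
find_all() always returns a list, even when only one tag matches, which is why the loop is needed. When only the single name="keywords" tag is wanted, `find()` returns that tag directly, or `None` if it is missing. This is a small comparison on an inline HTML string in Python 3. It uses `html.parser` instead of `lxml` only so it runs without extra packages; `find()` and `find_all()` return the same kinds of objects with either parser.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (hypothetical), not the real MediaPost page.
html = '<head><meta name="keywords" content="ads, games"></head>'
soup = BeautifulSoup(html, "html.parser")

all_tags = soup.find_all("meta", attrs={"name": "keywords"})  # list of Tags
first_tag = soup.find("meta", attrs={"name": "keywords"})     # one Tag or None

print(len(all_tags))         # 1
print(first_tag["content"])  # ads, games

# No match: find_all() gives an empty list, find() gives None.
print(soup.find_all("meta", attrs={"name": "author"}))  # []
print(soup.find("meta", attrs={"name": "author"}))      # None
```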
        

[Discussion]:

• But I only want to extract the content of one meta tag, the one with name="keywords"?