Trying to extract a value from an HTML page using Beautiful Soup: answers

【Question title】: Trying to extract value from html page using beautiful soup
【Posted】: 2017-08-02 08:29:33
【Question】:

I am new to Python and Beautiful Soup, and I have a page like this:

<div class='pid-details'><p>
  <span>Drug:</span> <a href='/search.php?searchterm=amantadine&amp;referer=pillid'>Amantadine Hydrochloride</a><br />
  <span>Strength:</span> 100 mg<br/>
  <span>Pill Imprint:</span> <a href='/imprints/c-122-6021.html'>C-122</a><br /><span>Color:</span> Yellow<br /><span>Shape:</span> Capsule-shape</p>
  <a class='input-button small' href='/imprints/c-122-6021.html'>View Images &amp; Details</a>
  <a class='input-button input-button-outline-grey small' href='/imprints/c-122-6021.html?printable=1' rel='nofollow' target='_blank'><i class='icon icon-print'></i>Print</a>
</div>

My goal is to extract the text inside this tag:

<a href='/search.php?searchterm=amantadine&amp;referer=pillid'>Amantadine Hydrochloride</a>

So the result should be

"Amantadine Hydrochloride"

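Please point me in the right direction so that I can get started with scraping. Thanks in advance.

【Comments】:

I am looking at a few sites, such as guru99, to learn how to do this.
Have you tried writing any code? SO is the place to ask a specific question when you hit a wall, not a place where other people write the code for you.
Are you trying this on this URL: drugs.com/amantadine-images.html? If so, include that in your example.
I wrote it myself and it works.
Thanks, I think I am going to close this question.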

Tags: python html beautifulsoup tags

【Solution 1】:

I think this is what you want. This code returns a list (found) containing the text inside the tags.

        import re

        from bs4 import BeautifulSoup

        # Sample markup from the question.
        page = """<div class='pid-details'><p>
          <span>Drug:</span> <a href='/search.php?searchterm=amantadine&amp;referer=pillid'>Amantadine Hydrochloride</a><br />
          <span>Strength:</span> 100 mg<br/>
          <span>Pill Imprint:</span> <a href='/imprints/c-122-6021.html'>C-122</a><br /><span>Color:</span> Yellow<br /><span>Shape:</span> Capsule-shape</p>
          <a class='input-button small' href='/imprints/c-122-6021.html'>View Images &amp; Details</a>
          <a class='input-button input-button-outline-grey small' href='/imprints/c-122-6021.html?printable=1' rel='nofollow' target='_blank'><i class='icon icon-print'></i>Print</a>
        </div>"""

        soup = BeautifulSoup(page, 'html.parser')

        found = []

        # Capture the text of anchors whose serialized form starts with "<a href";
        # anchors that begin with another attribute (e.g. class) do not match.
        p = re.compile('<a href.*>(.*)</a>', re.IGNORECASE)
        for h in soup.find_all('a'):
            m = p.search(str(h))
            if m:
                found.append(m.group(1))

        print(found)
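
As an alternative that avoids the regular expression, Beautiful Soup can return the text of a tag directly. The following is only a minimal sketch against a trimmed copy of the markup from the question; the names `all_texts`, `label` and `drug_name` are chosen for illustration, and it assumes the real page keeps the `<span>Drug:</span>` label immediately before the link.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
page = """<div class='pid-details'><p>
  <span>Drug:</span> <a href='/search.php?searchterm=amantadine&amp;referer=pillid'>Amantadine Hydrochloride</a><br />
  <span>Strength:</span> 100 mg<br/>
  <span>Pill Imprint:</span> <a href='/imprints/c-122-6021.html'>C-122</a><br /></p>
  <a class='input-button small' href='/imprints/c-122-6021.html'>View Images &amp; Details</a>
</div>"""

soup = BeautifulSoup(page, "html.parser")

# Text of every anchor, with surrounding whitespace stripped.
all_texts = [a.get_text(strip=True) for a in soup.find_all("a")]

# Only the drug name: find the <span> labelled "Drug:" and take
# the next <a> element that is a sibling of that span.
label = soup.find("span", string="Drug:")
drug_name = label.find_next_sibling("a").get_text(strip=True) if label else None

print(drug_name)
```

Looking up the label first ties the result to the "Drug:" field rather than to the position of the link, which may be more robust if the order of the links changes.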

【Comments】:

Thanks for your effort.
No problem! If it was worth the effort, you could also upvote my answer :)