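【Posted】: 2022-02-06 18:35:56
【Question】:
I'm struggling to retrieve the link to an image from an RSS feed. I'm basically trying to get the URL out of 'src=', but nothing I've tried so far manages to pull it out.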
<content:encoded>&lt;h4&gt;Using sklearn’s GridSearchCV on random forest model&lt;/h4&gt;&lt;figure&gt;&lt;img alt="" src="https://cdn-images-1.medium.com/max/1024/1*M-LcJEuYvBjUFh1DhSOicA.jpeg" /&gt;&lt;figcaption&gt;Image by Annie Spratt via Unsplash&lt;/figcaption&gt;&lt;/figure&gt;&lt;p&gt;Finding the optimal tuning parameters for a machine learning problem can often be very difficult. We may encounter &lt;strong&gt;overfitting,&lt;/strong&gt; which means our machine learning model trains too specifically on our training dataset and causes higher levels of error when applied to our test/holdout datasets. Or, we may run into &lt;strong&gt;underfitting,&lt;/strong&gt; which means our model doesn’t train specifically enough to our training dataset. </content:encoded>
Below is the code I've been trying so far.
from bs4 import BeautifulSoup
import requests
resp = requests.get("https://towardsdatascience.com/feed")
soup = BeautifulSoup(resp.content, features='xml')
items = soup.findAll('item')
content_item = {}
content_item['title'] = items[0].title.text
content_item['link'] = items[0].link.text
content_item['Twitter'] = '@TDataScience'
content_item['Media'] = items[0].encoded['src']
As always, any help you can offer would be greatly appreciated.
Thanks in advance.
【Discussion】:
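One likely reason the last line of the code fails: `encoded['src']` looks for a `src` attribute on the `<content:encoded>` element itself, but that element has no attributes. The `<img>` tag sits inside its text as escaped HTML, so the text needs to be parsed a second time as HTML. Below is a minimal sketch of that idea. The helper name `first_image_src` is hypothetical, and the sample string is a shortened copy of the feed excerpt from the question, written as it would look once the XML parser has unescaped `&lt;` and `&gt;`.

```python
from bs4 import BeautifulSoup

# Shortened sample of the <content:encoded> payload from the question,
# as it looks after the XML parser has unescaped the entities.
encoded_html = (
    '<h4>Using sklearn’s GridSearchCV on random forest model</h4>'
    '<figure><img alt="" '
    'src="https://cdn-images-1.medium.com/max/1024/1*M-LcJEuYvBjUFh1DhSOicA.jpeg" />'
    '<figcaption>Image by Annie Spratt via Unsplash</figcaption></figure>'
)


def first_image_src(html_fragment):
    """Return the src of the first <img> in an HTML fragment, or None.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    # Second parse: treat the element's text as HTML rather than XML.
    inner = BeautifulSoup(html_fragment, 'html.parser')
    img = inner.find('img')
    # .get() avoids a KeyError when the tag has no src attribute.
    return img.get('src') if img is not None else None


media_url = first_image_src(encoded_html)
print(media_url)
```

In the question's code this would presumably be used as `content_item['Media'] = first_image_src(items[0].encoded.text)`, assuming `items[0].encoded` does resolve to the `<content:encoded>` tag for this feed. Returning `None` when no image is found is a design choice for the sketch; items without a lead image would otherwise raise an error.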
Tags: python html web-scraping beautifulsoup