【Question title】: I am trying to get image links (.jpg) using requests_html. Can anyone help me?
【Posted】: 2020-09-06 09:13:05
【Question description】:
- I tried requests_html.
- For the links, I tried both item.links and item.absolute_links.
from requests_html import HTMLSession

session = HTMLSession()
url = 'https://www.shutterstock.com/featured-collections/autumn-208192415'
r = session.get(url)
r.html.render(sleep=1, keep_page=True, scrolldown=1)
images = r.html.find('img')
for item in images:
    links = {
        'link': item.links
    }
    print(links)
【Discussion】:
Tags:
web-scraping
python-requests-html
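【Solution 1】:
To get all the images, you can extract the links from the <a> tags that wrap them and rebuild the image URLs. For example: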
import requests
from bs4 import BeautifulSoup

url = 'https://www.shutterstock.com/featured-collections/autumn-208192415'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Select every <a> that has an <img> as a direct child.
for a in soup.select('a:has(> img)'):
    # Split the trailing image ID off the href, then insert the thumbnail size tag.
    href, num = ('https://image.shutterstock.com' + a['href']).rsplit('-', maxsplit=1)
    print(href + '-260nw-' + num + '.jpg')
Prints:
...
https://image.shutterstock.com/image-photo/autumn-nature-hiker-girl-walking-national-260nw-718962793.jpg
https://image.shutterstock.com/image-photo/autumn-composition-wreath-made-leaves-pine-260nw-700154173.jpg
https://image.shutterstock.com/image-photo/autumn-mountain-forest-lake-house-reflection-260nw-1068990077.jpg
https://image.shutterstock.com/image-photo/popular-photographers-attraction-braies-lake-colorful-260nw-705417145.jpg
https://image.shutterstock.com/image-photo/floral-autumn-background-mug-coffee-womans-260nw-717249397.jpg
https://image.shutterstock.com/image-photo/autumn-forest-nature-vivid-morning-colorful-260nw-766886038.jpg
https://image.shutterstock.com/image-photo/autumn-season-hipster-style-shoes-260nw-310459334.jpg
https://image.shutterstock.com/image-photo/autumn-forest-lake-reflection-landscape-260nw-1128352901.jpg
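The URL rewrite in the loop above can be pulled out into a small helper so it can be checked without hitting the site. This is only a sketch: the function name thumbnail_url is hypothetical, and the example href is an assumption inferred from the printed output (a path ending in "-<numeric id>"). The site's markup may differ or change.

```python
BASE = 'https://image.shutterstock.com'
SIZE_TAG = '260nw'  # the only size tag that appears in the answer's output


def thumbnail_url(href):
    """Rebuild a thumbnail URL from an <a> href.

    Assumes href looks like '/image-photo/<slug>-<numeric id>'
    (inferred from the output above, not confirmed against the page).
    """
    # Split on the last '-' to separate the slug from the image ID.
    path, sep, image_id = href.rpartition('-')
    if not sep or not image_id.isdigit():
        raise ValueError(f'unexpected href format: {href!r}')
    return f'{BASE}{path}-{SIZE_TAG}-{image_id}.jpg'


if __name__ == '__main__':
    print(thumbnail_url(
        '/image-photo/autumn-nature-hiker-girl-walking-national-718962793'
    ))
```

Raising ValueError on an href that does not end in a numeric ID makes it easier to notice links that wrap an image but do not point to a photo page, which the original one-liner would silently turn into a malformed URL.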