【Posted】: 2021-01-19 09:35:53
【Problem description】:
I have the following code, which tries to download a web page's HTML and print the second song in its chart to the shell window.
from urllib.request import urlopen
#-----
url1 = 'http://www.itunescharts.net/aus/charts/songs/2020/10/03'
#-----
# Get a link to the web page from the server, using one
# of the URLs above
itunes_page = urlopen(url1)
#-----
# Extract the web page's content as a Unicode string
html_code = itunes_page.read().decode('UTF-8')
#----
# close the connection to the web server
itunes_page.close()
#-----
#finding second song on the chart
start_marker = '<span class="no">2</span> <span class="artist">'
end_marker = '</span>'
start_position = html_code.find(start_marker)
end_position = html_code.find(end_marker)
if start_position == -1 or end_position == -1:
    print('Error: Unable to Second Artist')
else:
    print('\n' + html_code[start_position + len(start_marker) : end_position].upper())
The HTML around the start and end markers:
<li id="chart_aus_songs_2" class="no-move">
<span class="no">2</span>
<span class="artist">Jawsh 685, Jason Derulo & BTS</span> - <span class="entry">
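I'd like to know how to change my markers so that the result in the shell window is "Jawsh 685, Jason Derulo & BTS". When I run the code, I get a blank response. Any help is much appreciated!
【Discussion】: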
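A likely cause of the blank output: `html_code.find(end_marker)` searches from the beginning of the page, so it returns the position of the first `</span>` anywhere in the document. That position comes before `start_position`, which makes the slice empty. Searching for the end marker from where the start marker ends should avoid this. Below is a minimal sketch of that idea, not a definitive fix. It assumes the live page matches the snippet in the question; the helper name `find_artist` is hypothetical, and the sample HTML is pasted in as a string so the example runs without a network connection. It uses a regular expression with `\s*` for the start marker because the snippet shows a line break between the two spans, which a literal single-space marker would not match.

```python
import re
from html import unescape

# Sample HTML copied from the question (assumption: the real page looks the same).
html_code = '''<li id="chart_aus_songs_2" class="no-move">
<span class="no">2</span>
<span class="artist">Jawsh 685, Jason Derulo & BTS</span> - <span class="entry">'''


def find_artist(html_code, position):
    """Return the artist text for the given chart position, or None if not found."""
    # \s* tolerates any spaces or line breaks between the two <span> elements.
    start_pattern = re.compile(
        r'<span class="no">' + str(position) + r'</span>\s*<span class="artist">'
    )
    match = start_pattern.search(html_code)
    if match is None:
        return None
    start = match.end()
    # Key change: look for the end marker only AFTER the start marker.
    end = html_code.find('</span>', start)
    if end == -1:
        return None
    # unescape turns entities such as &amp; back into plain characters.
    return unescape(html_code[start:end])


second_artist = find_artist(html_code, 2)
if second_artist is None:
    print('Error: unable to find second artist')
else:
    print(second_artist)
```

The original code also calls `.upper()` on the result; it is left out here because the desired output keeps the original capitalisation. Since the question is tagged beautifulsoup, parsing the page with an HTML parser and selecting the `span` with class `artist` is another option, and tends to be less fragile than string markers.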
Tags: python beautifulsoup html-parsing