Google Search Results / Beginner Python: Answers

【Question Title】: Google Search Results / Beginner Python
【Posted】: 2014-03-08 10:20:02
【Question Description】:

A question about Python 3.

def AllMusic():
    myList1 = ["bob"]
    myList2 = ["dylan"]
    x = myList1[0]
    z = myList2[0]
    y = "-->Then 10 Numbers?"
    print("AllMusic")
    print("http://www.allmusic.com/artist/"+x+"-"+z+"-mn"+y)

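This is my code so far.

I want to write a program that prints out the variable y.

On AllMusic.com, each artist has a unique 10-digit number.

For example, www.allmusic.com/artist/the-beatles-mn0000754032 and www.allmusic.com/artist/arcade-fire-mn0000185591.

x is the first word of the artist's name and z is the second word. Everything works, but I can't figure out how to find that 10-digit number and have it returned for each artist I type into my Python program.

I found that when you go to Google and type in, for example, "Arcade Fire AllMusic", the first result shows the site's URL just below the title: www.allmusic.com/artist/arcade-fire-mn0000185591

How can I get the 10-digit code, 0000185591, into my Python program and print it out so I can see it?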

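【Comments】:

Do you already have the URL with the number, or are you thinking of going online and getting the URL from there?
Exactly, I want to type in the artist's name and have my program find the unique code.
It seems you may need to scrape the number from the website. I can imagine BeautifulSoup being useful here.
You should look for an API that lets you query the site for the number belonging to a given name.
BeautifulSoup and an API? Sorry, I'm a bit new to Python and programming. And yes, I just want to scrape the number after the mn part.

Tags: python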

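【Solution 1】:

I wouldn't use Google at all; you can use the search on the site itself. There are many useful tools for web scraping in Python, and I'd recommend installing BeautifulSoup. Here is a small script you can experiment with: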

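# Python 3: urlopen lives in urllib.request, quote in urllib.parse
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

def get_artist_link(artist):
    base = 'http://www.allmusic.com/search/all/'
    # percent-encode spaces and other special characters
    query = urllib.parse.quote(artist)
    url = base + query
    page = urllib.request.urlopen(url)
    # name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(page.read(), 'html.parser')
    # collect the <li class="artist"> elements from the search results
    results = soup.find_all("li", class_="artist")
    for result in results:
        print(result.a.attrs['href'])

if __name__ == '__main__':
    get_artist_link('the beatles')
    get_artist_link('arcade fire')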

For me this printed:

/artist/the-beatles-mn0000754032
/artist/arcade-fire-mn0000185591
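
To get from these links to the 10-digit code the question asks for, one option is a regular expression. The following is a minimal sketch, assuming every link ends in "mn" followed by exactly ten digits, as in the two examples above. The name extract_artist_id is a hypothetical helper and is not part of the script above.

```python
import re

# Assumption: the ID is "-mn" followed by exactly ten digits at the end of the link.
ARTIST_ID_PATTERN = re.compile(r"-mn(\d{10})$")

def extract_artist_id(href):
    """Return the 10-digit ID from an AllMusic artist link, or None if absent."""
    match = ARTIST_ID_PATTERN.search(href)
    return match.group(1) if match else None

if __name__ == "__main__":
    for href in ("/artist/the-beatles-mn0000754032",
                 "/artist/arcade-fire-mn0000185591"):
        print(extract_artist_id(href))
```

The ID is returned as a string rather than an int so that the leading zeros survive.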

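【Comments】:

Thanks. What is the %20 for?
URLs can't contain spaces, so this is a way of encoding them. For that and other special characters, see Percent-encoding for more. urllib has a built-in function, urllib.quote_plus, that handles this (in Python 3 it is urllib.parse.quote_plus).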
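
To illustrate the percent-encoding described in this comment, here is a small sketch using the Python 3 locations of these functions, urllib.parse.quote and urllib.parse.quote_plus. The helper name build_search_url is hypothetical; the base URL is the one used in the answer's script.

```python
from urllib.parse import quote, quote_plus

# Base URL taken from the answer's script.
BASE = "http://www.allmusic.com/search/all/"

def build_search_url(artist):
    """Percent-encode the artist name and append it to the search URL."""
    return BASE + quote(artist)

if __name__ == "__main__":
    print(quote("arcade fire"))       # space is encoded as %20
    print(quote_plus("arcade fire"))  # space is encoded as +
    print(build_search_url("the beatles"))
```

quote is used for the URL here because it reproduces the %20 encoding of the original script; quote_plus writes spaces as +, which is the convention for query strings and form data.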