【Question title】: Is it possible to call a function inside another function in Python? (Web-Scraping problem)
【Posted】: 2021-09-05 18:43:55
【Question】:

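I'm working on a web-scraping task, and I can already collect the data in a very basic way.

Basically, I need a function that collects a list of songs and artists from Allmusic.com and then adds the data to a df. In this example, I'm using this link: https://www.allmusic.com/mood/tender-xa0000001119/songs

So far I've managed to achieve most of the goal; however, I have to run two different functions (def get_song() and def get_performer()).

If possible, I'd like an alternative that combines these two functions into one.

Here is the code I'm using: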

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
    link    = "https://www.allmusic.com/mood/tender-xa0000001119/songs"


    # Function to collect songs (title)
    songs = []

    def get_song():
        url = link
        source_code = requests.get(url, headers=headers)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for td in soup.findAll('td', {'class': 'title'}):
            for a in td.findAll('a')[0]:
                song = a.string
                songs.append(song)

    # Function to collect performers
    performers = []

    def get_performer():
        url = link
        source_code = requests.get(url, headers=headers)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for td in soup.findAll('td', {'class': 'performer'}):
            for a in td.findAll('a'):
                performer = a.string
                performers.append(performer)

    get_song(), get_performer() # Here, I call the two functions, but the goal, if possible, is to use one function.

    df = pd.DataFrame(list(zip(songs,performers)), columns=['song', 'performer']) # df creation
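
A side note on the `zip(songs, performers)` step in the code above: `zip` stops at the shorter list and raises no error, so if the two loops ever collect a different number of items, the extra items are dropped and every later pair shifts. One way this could happen with this code is a performer cell holding more than one `<a>` (a duet), because the inner loop appends every link separately. The lists below are made up purely to illustrate the effect:

```python
from itertools import zip_longest

# Hypothetical lists (not scraped data): suppose "Song B" is a duet whose
# performer cell holds two links, and each <a> was appended separately.
songs = ["Song A", "Song B", "Song C"]
performers = ["Artist 1", "Artist 2", "Artist 3", "Artist 4"]

# zip stops at the shorter list: "Artist 4" is dropped without any error,
# and "Song C" ends up paired with "Artist 3".
pairs = list(zip(songs, performers))
print(pairs)  # [('Song A', 'Artist 1'), ('Song B', 'Artist 2'), ('Song C', 'Artist 3')]

# zip_longest pads the shorter list with None, so a mismatch stays visible.
padded = list(zip_longest(songs, performers))
print(padded[-1])  # (None, 'Artist 4')

# A cheap guard before building the DataFrame.
if len(songs) != len(performers):
    print(f"Length mismatch: {len(songs)} songs vs {len(performers)} performers")
```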

【Question comments】:

Tags: python pandas web-scraping beautifulsoup


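【Answer 1】:

You can move the performers' soup.findAll loop into the first function, so a single function (and a single request) collects both columns: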

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
    link    = "https://www.allmusic.com/mood/tender-xa0000001119/songs"
    
    
    # Lists filled by get_song_and_performer()
    songs = []
    performers = []

    def get_song_and_performer():
        url = link
        source_code = requests.get(url, headers=headers)  # one request for both columns
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for td in soup.findAll('td', {'class': 'title'}):
            for a in td.findAll('a')[0]:
                song = a.string
                songs.append(song)
        for td in soup.findAll('td', {'class': 'performer'}):
            for a in td.findAll('a'):
                performer = a.string
                performers.append(performer)


    get_song_and_performer()  # A single call now fills both lists.

    df = pd.DataFrame(list(zip(songs, performers)), columns=['song', 'performer'])  # df creation
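
A variation on this answer's function, sketched under the assumption that the page uses the same `td.title` / `td.performer` cells: returning the lists instead of appending to module-level `songs` / `performers` means a second call doesn't pile duplicates onto the earlier results. The function takes the HTML text as an argument so it can be tried on a small hand-written snippet (hypothetical markup, not copied from Allmusic); with the real page you would pass `requests.get(link, headers=headers).text`.

```python
from bs4 import BeautifulSoup


def get_song_and_performer(html):
    """Parse one page and return (songs, performers) as new lists."""
    soup = BeautifulSoup(html, "html.parser")
    songs, performers = [], []  # local lists: every call starts empty
    for td in soup.find_all("td", class_="title"):
        a = td.find("a")  # first link in the title cell, as in the question
        songs.append(a.get_text(strip=True) if a else td.get_text(strip=True))
    for td in soup.find_all("td", class_="performer"):
        # One entry per cell, even if the cell contains several links.
        performers.append(td.get_text(strip=True))
    return songs, performers


# Hypothetical markup that only mimics the class names used in the question.
sample_html = """
<table>
  <tr><td class="title"><a href="#">Knock You Down</a></td>
      <td class="performer"><a href="#">Keri Hilson</a></td></tr>
  <tr><td class="title"><a href="#">River</a></td>
      <td class="performer"><a href="#">Herbie Hancock</a>/<a href="#">Corinne Bailey Rae</a></td></tr>
</table>
"""

songs, performers = get_song_and_performer(sample_html)
print(songs)       # ['Knock You Down', 'River']
print(performers)  # ['Keri Hilson', 'Herbie Hancock/Corinne Bailey Rae']

# Calling it again returns fresh lists rather than growing the old ones.
again = get_song_and_performer(sample_html)
print(again == (songs, performers))  # True
```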

【Comments】:

【Answer 2】:

To get the title and performer of each song, you can use the following example:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    
    url = "https://www.allmusic.com/mood/tender-xa0000001119/songs"
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    }
    
    soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
    
    all_data = []
    for td in soup.select("td.title"):
        title = td.get_text(strip=True)
        performer = td.find_next("td").get_text(strip=True)
        all_data.append((title, performer))
    
    df = pd.DataFrame(all_data, columns=["title", "performer"])
    print(df)
    df.to_csv("data.csv", index=False)
    

Prints:

                                  title                          performer
    0                    Knock You Down                        Keri Hilson
    1   Down Among the Wine and Spirits                     Elvis Costello
    2                  I Felt The Chill                     Elvis Costello
    3            She Handed Me A Mirror                     Elvis Costello
    4         I Dreamed Of My Old Lover                     Elvis Costello
    5                   She Was No Good                     Elvis Costello
    6                  The Crooked Line                     Elvis Costello
    7                 Changing Partners                     Elvis Costello
    8           Small Town Southern Man                       Alan Jackson
    9                    Find Your Love                              Drake
    10            Today Was a Fairytale                       Taylor Swift
    11                     Need You Now                             Lady A
    12                   American Honey                             Lady A
    13                      Peace Dream                        Ringo Starr
    14                  If I Died Today                         Tim McGraw
    15                            Still                         Tim McGraw
    16                      I Need Love                             Ledisi
    17                          Uhh Ahh                        Boyz II Men
    18                  Shattered Heart                             Brandy
    19            Right Here (Departed)                             Brandy
    20           Warm It Up (With Love)                             Brandy
    21                  If I Were a Boy                            Beyoncé
    22                Why Does She Stay                              Ne-Yo
    23              Daddy Needs a Drink                  Drive-By Truckers
    24                  Think About You                        Ringo Starr
    25                      Liverpool 8                        Ringo Starr
    26                        Nefertiti                     Herbie Hancock
    27                            River  Herbie Hancock/Corinne Bailey Rae
    28                   Both Sides Now                     Herbie Hancock
    29                  Court and Spark         Herbie Hancock/Norah Jones
    30  I Taught Myself How to Grow Old                         Ryan Adams
    31                           Ghetto           Kelly Rowland/Snoop Dogg
    32                      Little Girl                   Enrique Iglesias
    33          The Magdalene Laundries                     Emmylou Harris
    34                   Because of You                              Ne-Yo
    35               We Belong Together                       Mariah Carey
    36          Thank You for Loving Me                           Bon Jovi
    37        He's Younger Than You Are                      Sonny Rollins
    

And saves data.csv:

[Screenshot: data.csv opened in LibreOffice]
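
This answer relies on `find_next("td")`, which assumes the performer cell is the very next `<td>` after each title cell. If you would rather tie each pair to its table row explicitly, one option is to walk the rows and skip any row that lacks either cell. This is a sketch that assumes every song is a single `<tr>` containing both a `td.title` and a `td.performer` cell; the helper name `rows_to_dataframe` and the sample markup are hypothetical:

```python
import pandas as pd
from bs4 import BeautifulSoup


def rows_to_dataframe(html):
    """Build a (title, performer) DataFrame, one pair per <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tr"):
        title = tr.select_one("td.title")
        performer = tr.select_one("td.performer")
        if title is None or performer is None:
            continue  # header rows or rows missing one of the cells
        # The " " separator keeps linked names from running together.
        rows.append((title.get_text(strip=True), performer.get_text(" ", strip=True)))
    return pd.DataFrame(rows, columns=["title", "performer"])


# Hypothetical markup: a header row plus two song rows.
sample_html = """
<table>
  <tr><th>Title</th><th>Performer</th></tr>
  <tr><td class="title"><a href="#">Ghetto</a></td>
      <td class="performer"><a href="#">Kelly Rowland</a> / <a href="#">Snoop Dogg</a></td></tr>
  <tr><td class="title"><a href="#">Little Girl</a></td>
      <td class="performer"><a href="#">Enrique Iglesias</a></td></tr>
</table>
"""

df = rows_to_dataframe(sample_html)
print(df)
```

Because each pair is built from one row, a row with an unusual performer cell cannot shift the titles and performers of the rows after it.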

【Comments】:

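【Answer 3】:

You can create a separate function that gets the song info by calling the other two; if you want to keep the functions separate, this is the most organized way to do it.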

      import requests
      import pandas as pd
      from bs4 import BeautifulSoup
      
      headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
      link    = "https://www.allmusic.com/mood/tender-xa0000001119/songs"
      
      
      # Function to collect songs (title)
      songs = []
      
      def get_song():
          url = link
          source_code = requests.get(url, headers=headers)
          plain_text = source_code.text
          soup = BeautifulSoup(plain_text, "html.parser")
          for td in soup.findAll('td', {'class': 'title'}):
              for a in td.findAll('a')[0]:
                  song = a.string
                  songs.append(song)
      
      # Function to collect performers
      performers = []
      
      def get_performer():
          url = link
          source_code = requests.get(url, headers=headers)
          plain_text = source_code.text
          soup = BeautifulSoup(plain_text, "html.parser")
          for td in soup.findAll('td', {'class': 'performer'}):
              for a in td.findAll('a'):
                  performer = a.string
                  performers.append(performer)
      
      # Function for getting song and performer
      def get_song_info():
          get_song()
          get_performer()
      
      get_song_info() # Call just one function!
      
      df = pd.DataFrame(list(zip(songs,performers)), columns=['song', 'performer']) # df creation
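
Since the title asks about calling functions from inside another function: yes, `get_song_info()` above does exactly that. One thing to be aware of is that `get_song()` and `get_performer()` each send their own request, so the page is downloaded twice. A sketch of the same structure that parses once, hands the parsed soup to each helper, and returns the result instead of filling global lists (the helper names and the sample markup are hypothetical):

```python
import pandas as pd
from bs4 import BeautifulSoup


def get_songs(soup):
    """Text of the first link in each title cell."""
    songs = []
    for td in soup.find_all("td", class_="title"):
        a = td.find("a")
        songs.append(a.get_text(strip=True) if a else td.get_text(strip=True))
    return songs


def get_performers(soup):
    """All linked names in each performer cell, joined so each cell gives one entry."""
    return [" / ".join(a.get_text(strip=True) for a in td.find_all("a"))
            for td in soup.find_all("td", class_="performer")]


def get_song_info(html):
    """Parse once, call both helpers on the same soup, return one DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    return pd.DataFrame({"song": get_songs(soup), "performer": get_performers(soup)})


# With the real page this would be, e.g.:
#   get_song_info(requests.get(link, headers=headers).text)
sample_html = """
<table>
  <tr><td class="title"><a href="#">Court and Spark</a></td>
      <td class="performer"><a href="#">Herbie Hancock</a><a href="#">Norah Jones</a></td></tr>
  <tr><td class="title"><a href="#">Both Sides Now</a></td>
      <td class="performer"><a href="#">Herbie Hancock</a></td></tr>
</table>
"""
df = get_song_info(sample_html)
print(df)
```

Building the DataFrame from a dict of lists also means pandas raises a `ValueError` if the two lists come out with different lengths, instead of silently truncating the way `zip` does.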
      
      

【Comments】:

【Answer 4】:

For your URL, you can use pd.read_html (the snippet reuses headers and link from the question's code):

        from io import StringIO
        source_code = requests.get(link, headers=headers)
        df = pd.read_html(StringIO(source_code.text))[0]  # <- Only one table in the page
        

Output:

        >>> df
                             Title/Composer                            Performer   Stream
        0                    Knock You Down                          Keri Hilson  Spotify
        1   Down Among the Wine and Spirits                       Elvis Costello      NaN
        2                  I Felt The Chill                       Elvis Costello      NaN
        3            She Handed Me A Mirror                       Elvis Costello      NaN
        4         I Dreamed Of My Old Lover                       Elvis Costello      NaN
        5                   She Was No Good                       Elvis Costello      NaN
        6                  The Crooked Line                       Elvis Costello      NaN
        7                 Changing Partners                       Elvis Costello      NaN
        8           Small Town Southern Man                         Alan Jackson  Spotify
        9                    Find Your Love                                Drake  Spotify
        10            Today Was a Fairytale                         Taylor Swift  Spotify
        11                     Need You Now                               Lady A  Spotify
        12                   American Honey                               Lady A      NaN
        13                      Peace Dream                          Ringo Starr  Spotify
        14                  If I Died Today                           Tim McGraw  Spotify
        15                            Still                           Tim McGraw  Spotify
        16                      I Need Love                               Ledisi  Spotify
        17                          Uhh Ahh                          Boyz II Men  Spotify
        18                  Shattered Heart                               Brandy  Spotify
        19            Right Here (Departed)                               Brandy  Spotify
        20           Warm It Up (With Love)                               Brandy  Spotify
        21                  If I Were a Boy                              Beyoncé      NaN
        22                Why Does She Stay                                Ne-Yo  Spotify
        23              Daddy Needs a Drink                    Drive-By Truckers  Spotify
        24                  Think About You                          Ringo Starr      NaN
        25                      Liverpool 8                          Ringo Starr      NaN
        26                        Nefertiti                       Herbie Hancock  Spotify
        27                            River  Herbie Hancock / Corinne Bailey Rae  Spotify
        28                   Both Sides Now                       Herbie Hancock  Spotify
        29                  Court and Spark         Herbie Hancock / Norah Jones  Spotify
        30  I Taught Myself How to Grow Old                           Ryan Adams  Spotify
        31                           Ghetto           Kelly Rowland / Snoop Dogg      NaN
        32                      Little Girl                     Enrique Iglesias  Spotify
        33          The Magdalene Laundries                       Emmylou Harris  Spotify
        34                   Because of You                                Ne-Yo  Spotify
        35               We Belong Together                         Mariah Carey  Spotify
        36          Thank You for Loving Me                             Bon Jovi  Spotify
        37        He's Younger Than You Are                        Sonny Rollins  Spotify
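
If you want the same two columns as the question's `df`, the table returned by `read_html` can be renamed and trimmed. The column names used here (`Title/Composer`, `Performer`, `Stream`) are taken from the output above and may change if the site changes its table; the three rows are copied from that output so the snippet runs without a network request:

```python
import pandas as pd

# Stand-in for the read_html result above (first three rows of that output).
df = pd.DataFrame({
    "Title/Composer": ["Knock You Down", "Down Among the Wine and Spirits", "I Felt The Chill"],
    "Performer": ["Keri Hilson", "Elvis Costello", "Elvis Costello"],
    "Stream": ["Spotify", None, None],
})

# Rename to the question's column names and drop the Stream column.
songs_df = df.rename(columns={"Title/Composer": "song", "Performer": "performer"})[["song", "performer"]]
print(songs_df)
```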
        

【Comments】:
