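【Question title】: How to limit number of rows that fill a dataframe in a for loop
【Posted】: 2019-05-22 19:11:31
【Question】:

I wrote the following function to scrape multiple pages from a website. I only want to get the first 20 pages or so. How can I limit the number of rows I fill in my dataframe?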

# Python 3 imports for the names used below
import re
import sys
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

def scrape_page(poi, page_name):
    base_url = "https://www.fake_website.org/"
    report_url = base_url + poi
    page = urlopen(report_url)
    experiences = BeautifulSoup(page, "html.parser")
    empty_list = []
    # Follow every link on the landing page that ends in "<page_name>.shtml"
    for link in experiences.findAll('a', attrs={'href': re.compile(page_name + ".shtml$")}):
        url = urljoin(base_url, link.get("href"))
        subpage = urlopen(url)
        expages = BeautifulSoup(subpage, "html.parser")
        # Visit each report linked from that subpage
        for report in expages.findAll('a', attrs={'href': re.compile("^/experiences/exp")}):
            url = urljoin(base_url, report.get("href"))
            reporturlopen = urlopen(url)
            reporturl = BeautifulSoup(reporturlopen, "html.parser")
            book_title = reporturl.findAll("div", attrs={'class': 'title'})
            for i in book_title:
                title = i.get_text()
            book_genre = reporturl.findAll("div", attrs={'class': 'genre'})
            for i in book_genre:
                genre = i.get_text()
            book_author = reporturl.findAll("div", attrs={'class': 'author'})
            for i in book_author:
                author = i.get_text()
                author = re.sub("by", "", author)
            # One row per report
            empty_list.append({'title': title, 'genre': genre, 'author': author})
    # Expose the rows as a module-level variable named "<poi>_<page_name>_df"
    setattr(sys.modules[__name__], '{}_df'.format(poi + "_" + page_name), empty_list)

【Comments】:

  • Add a counter inside the loop?
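
One way to read the counter suggestion, as a minimal sketch: check how many rows have been collected at the top of the inner loop and return as soon as the limit is reached, so that both loops stop together. The function `collect_rows`, the `max_rows` parameter, and the `fake_pages` data are hypothetical stand-ins for the scraper in the question; no network access is involved.

```python
def collect_rows(pages, max_rows=20):
    """Collect at most max_rows rows from nested pages/reports."""
    rows = []
    for page in pages:              # outer loop: one iteration per listing page
        for report in page:         # inner loop: one iteration per report
            if len(rows) >= max_rows:
                return rows         # returning leaves both loops at once
            # In the real scraper, fetching and parsing the report would go here.
            rows.append({'title': report})
    return rows

# Hypothetical stand-in data: 5 pages with 10 reports each.
fake_pages = [[f"report {p}-{r}" for r in range(10)] for p in range(5)]
rows = collect_rows(fake_pages, max_rows=20)
print(len(rows))
```

Because the check happens before any work is done for a report, no extra page is fetched once the limit has been reached.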

Tags: python dataframe web-scraping


【Solution 1】:

For example, you could add a while loop:

i = 0
while i < 20:
    # insert the code that scrapes one report and appends one row here
    i += 1
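
If the nested `for` loops are kept as they are, an alternative to a `while` loop is to turn the scraping into a generator and take only the first 20 items with `itertools.islice`. This is a sketch under assumptions, not the answerer's method: `iter_rows`, `processed`, and `fake_pages` are hypothetical names standing in for the real scraper and its pages.

```python
from itertools import islice

processed = []  # records which reports were actually processed

def iter_rows(pages):
    """Yield one row dict per report, lazily."""
    for page in pages:
        for report in page:
            # In the real scraper, urlopen/BeautifulSoup work for one report would go here.
            processed.append(report)
            yield {'title': report}

# Hypothetical stand-in data: 5 pages with 10 reports each.
fake_pages = [[f"report {p}-{r}" for r in range(10)] for p in range(5)]

# islice stops pulling from the generator after 20 items.
first_rows = list(islice(iter_rows(fake_pages), 20))
print(len(first_rows), len(processed))
```

Since the generator is lazy, only the reports that end up in the result are processed. The resulting list of dicts can then be passed to `pandas.DataFrame(first_rows)` if an actual dataframe is wanted.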
