【Question title】: Webscraping search results in R
【Posted】: 2020-08-19 20:14:33
【Question】:

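I'm new to web scraping, and I'm trying to scrape some data that comes from a search function within a website. I'm using rvest to extract the information, but I'm not getting any results. Here is the website: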

https://www.encompassinsurance.com/agency-locator.aspx#PostalCode=30350&City=&StateProvCd=&Latitude=&Longitude=

This is what I'm running:

library(rvest)

URL <- 'https://www.encompassinsurance.com/agency-locator.aspx#PostalCode=21403&City=&StateProvCd=&Latitude=&Longitude='

webpage <- read_html(URL)

name_html <- html_nodes(webpage,'.locator_result_name')

name_data <- html_text(name_html)

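When I run this code, I get the following response: character(0)

I would like the response to be the name of each company returned by the postal code search (e.g. "Townley-Kenton Insurance Agency", "Bradford Turner Insurance Group LLC").

I know there is some JavaScript on this page and I am probably missing an important piece, but given my limited knowledge of HTML, CSS, and JavaScript, I'm not sure how to apply V8 or PhantomJS to make this work.

Any help is appreciated.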

【Question comments】:

    Tags: javascript r web-scraping phantomjs rvest


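    【Solution 1】:

    The data is indeed fetched dynamically with JavaScript (via an XHR GET request). However, you can send this request directly from R using the httr package. It returns a JSON string, which is easy to parse with jsonlite.

    Almost all of the information you want to scrape will be in the data frame info$OfficeInfo: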

    library(httr)
    library(jsonlite)
    
    res <- content(GET(paste0("https://alr.encompassinsurance.com/",
                              "?PostalCode=30350&City=&StateProvCd=",
                              "&Latitude=&Longitude=")), "text")
    info <- fromJSON(res)
    
    info$OfficeInfo$Name
    #>  [1] "Townley-Kenton Insurance Agency"                          
    #>  [2] "Bradford Turner Insurance Group LLC"                      
    #>  [3] "Arthur J Gallagher Risk Management Services, Inc."        
    #>  [4] "Lanigan Insurance Group Inc"                              
    #>  [5] "Haven Insurance Group"                                    
    #>  [6] "The Leavitt Insurance Group of Atlanta, Incorporated"     
    #>  [7] "Findley Insurance Agency Inc"                             
    #>  [8] "Grimes Insurance Agency Inc"                              
    #>  [9] "Larry L Talbert Ins Agency DBA Talbert Insurance Services"
    #> [10] "The Alliance Group, Inc."                                 
    #> [11] "Concierge Insurance Group LLC"                            
    #> [12] "Sutter McLellan & Gilbreath Inc"                          
    #> [13] "The Wichalonis Insurance Agency"                          
    #> [14] "The Beck Agency"                                          
    #> [15] "USI Insurance Services LLC"                               
    #> [16] "The Insurance Store"                                      
    #> [17] "Southern Insurance Associates of Dunwoody"                
    #> [18] "D.C.J.D. Corporation DBA The Markey Insurance Group"      
    #> [19] "DM Services, Incorporated"                                
    #> [20] "Southern Insurance Advisors"                              
    #> [21] "Metro Brokers Insurance Services"                         
    #> [22] "1 Source Insurance, LLC"                                  
    #> [23] "The Bates Agency II, LLC"                                 
    #> [24] "Risk & Insurance Consultants Inc"                         
    #> [25] "Integrity Insurance & Financial Services Inc"             
    #> [26] "HN Insurance Services Inc"                                
    #> [27] "Norton Metro LLC"                                         
    #> [28] "The Nsure Network LLC"                                    
    #> [29] "Henssler Norton Insurance LLC"                            
    #> [30] "Brown & Brown Insurance of Georgia"                       
    #> [31] "America Insurance Brokers, Inc. DBA AIB"                  
    #> [32] "Clear View Insurance Agency"                              
    #> [33] "Relation Insurance Services"                              
    #> [34] "Partners Risk Services LLC"                               
    #> [35] "PointeNorth Insurance Group LLC"                          
    #> [36] "Advanced Insurors Inc"                                    
    #> [37] "Mcever & Tribble, Inc."                                   
    #> [38] "The Bethea Insurance Group, LLC"                          
    #> [39] "Watchko - Young Ins Agcy Inc"                             
    #> [40] "Sterling Seacrest Partners Inc"                           
    #> [41] "Little & Smith, Incorporated"                             
    #> [42] "LMG Insurance Services Inc"                               
    #> [43] "Granite Risk Advisors LLC"                                
    #> [44] "Mountain Lakes Insurance, LLC"                            
    #> [45] "Hutchinson Traylor Insurance"                             
    #> [46] "Edgewood Partners Insurance Center"                       
    #> [47] "ADC Agency"                                               
    #> [48] "MLG Insurance & Financial Services"                       
    #> [49] "Burnette Insurance Agency"                                
    #> [50] "Campbell and Company Enterprise, Incorporated"
    

    Created on 2020-08-19 by the reprex package (v0.3.0)
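
If you need results for more than one postal code, the request above can be wrapped in a small function. The following is only a sketch under a few assumptions: the helper names `locator_url()`, `parse_offices()` and `get_offices()` are hypothetical (not part of httr or jsonlite), and it assumes the endpoint keeps returning JSON with an `OfficeInfo` element as it did in the example above.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: build the same request URL as above for any postal code.
locator_url <- function(postal_code) {
  paste0("https://alr.encompassinsurance.com/",
         "?PostalCode=", utils::URLencode(as.character(postal_code), reserved = TRUE),
         "&City=&StateProvCd=",
         "&Latitude=&Longitude=")
}

# Hypothetical helper: parse the JSON text and return the OfficeInfo data frame.
# Assumes the response has a top-level "OfficeInfo" element, as in the example above.
parse_offices <- function(json_text) {
  info <- fromJSON(json_text)
  info$OfficeInfo
}

# Hypothetical helper: fetch and parse the offices for one postal code.
get_offices <- function(postal_code) {
  resp <- GET(locator_url(postal_code))
  stop_for_status(resp)  # raise an R error on HTTP error status codes
  parse_offices(content(resp, "text", encoding = "UTF-8"))
}

# Usage (requires network access):
# offices <- get_offices("30350")
# offices$Name
```

Keeping the parsing step separate from the HTTP call makes it possible to check the parsing logic against a saved response without hitting the site again.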

    【Comments】:
