【Question title】: Error despite purrr's 'otherwise' - Why is purrr/possibly's 'otherwise' not triggered?
【Posted】: 2021-03-29 01:11:46
【Question】:

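I am scraping content from a website. To do so, I iterate over links. If an error occurs, purrr's possibly adverb should keep the process going and put "missing" (or NA_character_) in the result.

The code below works as expected when the linked site does not exist, i.e. it outputs "missing". However, if the linked site exists but the element I am trying to extract from it does not, the function throws an error even though a value has been defined for 'otherwise'.

This surprises me, since the documentation states

'possibly: wrapped function uses a default value (otherwise) whenever an error occurs.'

Any idea why this happens? I know I could modify the function accordingly (e.g. check the length of the returned object), but I do not understand why the 'otherwise' value is not used.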

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
library(rvest)
#> Warning: package 'rvest' was built under R version 4.0.4
#> 
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#> 
#>     guess_encoding

# possibly with wrong links when scraping site ----------------------------
#see https://github.com/tidyverse/purrr/issues/409

sample_data <- tibble::tibble(
  link = c(
    #link ok, selected item exists
    "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll",
    #link not ok
    "https://www.wrong-url.foobar",
    #link ok, selected item does not exist on site
    "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
    
           )
)


fn_get_link_to_records <- function(link_to_overview_sessions) {
  # Show which link is currently being processed
  print(link_to_overview_sessions)

  link_to_overview_sessions %>%
    rvest::read_html() %>%
    rvest::html_elements("a") %>%
    rvest::html_attr("href") %>%
    enframe(name = NULL,
            value = "link_to_text") %>%
    # Keep only hrefs that match the record-page pattern
    filter(str_detect(link_to_text, regex("\\/NRSITZ_\\d+\\/fnameorig_\\d+\\.html$"))) %>%
    mutate(link_to_text = glue::glue("https://www.parlament.gv.at/{link_to_text}")) %>%
    pull()
}


sample_data %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise=NA_character_)))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = NA_character_))`.

sample_data %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.

Created on 2021-03-28 by the reprex package (v1.0.0)

Update: I added the output below to make the unexpected result (the last block) clearer.

sample_data[1:2,] %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> # A tibble: 2 x 2
#>   link                                  link_to_text                            
#>   <chr>                                 <chr>                                   
#> 1 https://www.parlament.gv.at/PAKT/VHG~ https://www.parlament.gv.at//PAKT/VHG/X~
#> 2 https://www.wrong-url.foobar          missing
sample_data[3, ] %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 1 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.

Created on 2021-03-29 by the reprex package (v1.0.0)

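【Comments】:

For me it works as expected: all 3 values are "missing", without any error. What is your packageVersion('purrr')? I am on [1] ‘0.3.4’.
It is easy to test, run this: possibly( function(){stop("FOO!")}, otherwise=NA, quiet=TRUE )().
@RonakShah I am on 0.3.4 too. I updated the question above to show that the first 2 links produce the expected output; the last one, however, surprises me.
I just noticed that the fn_get_link_to_records function uses html_elements, which does not exist in rvest, and that read_html comes from xml2 rather than rvest. I am not sure how it works for you, but because of these errors I get missing even for the first 2 values.
html_elements was introduced in rvest 1.0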

Tags: r purrr

【Solution 1】:

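The error comes from map_chr, but possibly is wrapped around the fn_get_link_to_records function. If you run fn_get_link_to_records(sample_data$link[3]), you will see that the URL gets printed, an empty (zero-length) result is returned, and no error is generated. Since the wrapped function raises no error, possibly never substitutes the 'otherwise' value. map_chr, however, cannot convert this empty output into a single character value, so it is map_chr that raises the error. If you use map instead of map_chr, you will see that it works.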

sample_data[3,] %>% 
  mutate(link_to_text= map(link, fn_get_link_to_records))

#[1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
# A tibble: 1 x 2
#  link                                                                                     link_to_text
#  <chr>                                                                                    <list>      
#1 https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Pro… <glue [0]>
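
The same behaviour can be reproduced without any network access. The following is only a minimal sketch: fake_scrape is a hypothetical stand-in for fn_get_link_to_records (it is not part of the original code) that either throws an error, returns a zero-length vector, or returns one string.

```r
library(purrr)

# Hypothetical stand-in for the scraping function (an assumption for
# illustration): errors for "broken", returns an empty vector for "empty".
fake_scrape <- function(x) {
  if (x == "broken") stop("cannot open URL")
  if (x == "empty") return(character(0))
  paste0("result-", x)
}

safe_scrape <- possibly(fake_scrape, otherwise = NA_character_)

# A real error is replaced by the `otherwise` value.
res_broken <- safe_scrape("broken")

# A zero-length return value is not an error, so possibly() passes it
# through unchanged.
res_empty <- safe_scrape("empty")

# map() accepts results of any length; map_chr() would fail on the third
# element because it needs exactly one string per input.
res_list <- map(c("ok", "broken", "empty"), safe_scrape)
lengths(res_list)
```

In this sketch, possibly only ever sees what happens inside fake_scrape; the length check that fails belongs to map_chr, which runs outside the wrapper.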

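With map, however, link_to_text is empty. The solution, as you already know, is to check the length of the output value and either return NA or raise an error inside the fn_get_link_to_records function; an error raised there will be handled by possibly.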
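
One way this could look is sketched below. Both stop_if_empty and fake_scrape are hypothetical names introduced for illustration only; in the real function the same length check would sit after the pull() step.

```r
library(purrr)

# Hypothetical helper (an assumption, not from the original code): turn an
# empty result into an error so that possibly() has something to catch.
stop_if_empty <- function(x) {
  if (length(x) == 0) stop("no matching element found")
  x
}

# Hypothetical stand-in for the scraping function; it returns an empty
# vector for "empty" before the check is applied.
fake_scrape <- function(x) {
  found <- if (x == "empty") character(0) else paste0("result-", x)
  stop_if_empty(found)
}

# The error now happens inside the wrapped function, so `otherwise` is
# used and map_chr() receives exactly one string per input.
res <- map_chr(c("ok", "empty"),
               possibly(fake_scrape, otherwise = NA_character_))
res
```

Raising an error keeps the "missing" handling in one place (the otherwise argument); returning NA_character_ directly from the function when the length is 0 would work as well.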
