How to switch smoothly from qdap::mgsub() to textclean::mgsub()? Answers

【Question title】: How to switch smoothly from qdap::mgsub() to textclean::mgsub()?
【Posted】: 2018-10-10 10:06:43
【Question】:

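Because of an R version issue, I need to switch between qdap::mgsub() and textclean::mgsub(). Apart from the order of the arguments, the two functions are almost identical: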

qdap::mgsub(pattern,replacement,x)
textclean::mgsub(x,pattern,replacement)

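I have a lot of code that uses qdap::mgsub(). Unfortunately, I did not name the arguments when I passed them to the function, so I would have to reorder all of them to be able to use textclean::mgsub().

Is there an elegant (programmatic) way to switch between these two functions without changing the order of the arguments?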

【Question comments】:

Tags: r qdap

【Solution 1】:

Thinking about @duckmayr's answer, I came up with another solution to my problem:

First, define this function:

reorder_mgsub <- function(pattern,replacement,x){
  output <- textclean::mgsub(x,pattern,replacement)
  return(output)
}

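Second: find every occurrence of qdap::mgsub and replace it with reorder_mgsub.

This solution may be less elegant, because I have to do step 2 by hand, but it works fine for me.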
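
A minimal sketch of the same wrapper idea that also forwards extra arguments through `...`. To keep it runnable without either package installed, `new_style_mgsub` below is a hypothetical base-R stand-in that only mimics the argument order of textclean::mgsub(x, pattern, replacement); in real code the wrapper body would call textclean::mgsub() instead.

```r
# Hypothetical stand-in for textclean::mgsub(): same argument order
# (x, pattern, replacement), implemented with base gsub() for illustration.
new_style_mgsub <- function(x, pattern, replacement, ...) {
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, fixed = TRUE, ...)
  }
  x
}

# Wrapper that keeps the qdap-style order (pattern, replacement, x)
# and passes any further arguments along unchanged.
reorder_mgsub <- function(pattern, replacement, x, ...) {
  new_style_mgsub(x, pattern, replacement, ...)
}

result <- reorder_mgsub(c("cat", "dog"), c("feline", "canine"), "cat chases dog")
print(result)
#> [1] "feline chases canine"
```

Existing positional calls then only need the function name changed, not the argument order.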

【Comments】:

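【Solution 2】:

You can use a regular expression to rewrite every call to the old function in the text of each file where it occurs, using a function like this: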

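replace_mgsub <- function(path) {
    file_text <- readr::read_file(path)
    # Capture the three comma-separated arguments and emit them as 3, 1, 2.
    # Each group needs "+" so that arguments longer than one character match.
    file_text <- gsub("qdap::mgsub\\(([^, ]+) *, *([^, ]+) *, *([^,) ]+) *\\)",
                      "textclean::mgsub(\\3, \\1, \\2)", file_text)
    readr::write_file(file_text, path)
}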

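You would then call it on each relevant path (I assume you know the list of files you need to call the function on; if not, comment below and I can add something for that above). Here is a demonstration of the gsub() part of the function: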

file_text <- "qdap::mgsub(pattern,replacement,x)"
cat(gsub("qdap::mgsub\\(([^, ]+) *, *([^, ]+) *, *([^,) ]+) *\\)",
         "textclean::mgsub(\\3, \\1, \\2)", file_text))
#> textclean::mgsub(x, pattern, replacement)
file_text <- "# I'll have in this part some irrelevant code
# to show it won't interfere with that
y = rnorm(1000)
qdap::mgsub(pattern,replacement,x)
z = rnorm(10)
# And also demonstrate multiple occurrences of the function
# as well as illustrate that it doesn't matter if you have spaces
# between comma separated arguments
qdap::mgsub(pattern, replacement, x)"
cat(gsub("qdap::mgsub\\(([^, ]+) *, *([^, ]+) *, *([^,) ]+) *\\)",
         "textclean::mgsub(\\3, \\1, \\2)", file_text))
#> # I'll have in this part some irrelevant code
#> # to show it won't interfere with that
#> y = rnorm(1000)
#> textclean::mgsub(x, pattern, replacement)
#> z = rnorm(10)
#> # And also demonstrate multiple occurrences of the function
#> # as well as illustrate that it doesn't matter if you have spaces
#> # between comma separated arguments
#> textclean::mgsub(x, pattern, replacement)
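
If the list of files is not known in advance, one option is to collect every .R file under a directory and apply the rewrite to each. The sketch below is an assumption about how that could look, not part of the original answer: `replace_mgsub_file` and `demo_dir` are hypothetical names, it uses base readLines()/writeLines() instead of readr, and it works on a throwaway directory. It uses a pattern of the same kind as above, so it only handles simple arguments (no nested commas or parentheses, and no calls split across lines).

```r
# Rewrite qdap::mgsub(a, b, c) calls as textclean::mgsub(c, a, b) in one file.
replace_mgsub_file <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- gsub("qdap::mgsub\\(([^, ]+) *, *([^, ]+) *, *([^,) ]+) *\\)",
                "textclean::mgsub(\\3, \\1, \\2)", lines)
  writeLines(lines, path)
  invisible(path)
}

# Demo in a temporary directory so no real files are touched
demo_dir <- file.path(tempdir(), "mgsub_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("out <- qdap::mgsub(pats, reps, txt)", file.path(demo_dir, "a.R"))
writeLines("y <- rnorm(10)", file.path(demo_dir, "b.R"))

# Find every .R file below the directory and rewrite each one in place
r_files <- list.files(demo_dir, pattern = "\\.[Rr]$",
                      full.names = TRUE, recursive = TRUE)
invisible(lapply(r_files, replace_mgsub_file))

converted <- readLines(file.path(demo_dir, "a.R"))
print(converted)
#> [1] "out <- textclean::mgsub(txt, pats, reps)"
```

Since the files are overwritten in place, it is safer to run this on code that is under version control or backed up.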

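【Comments】:

A nice piece of code! I don't know the list of files, but I think I can figure that out myself. I was hoping, though, for a solution that does something like this: new_mgsub <- reorder(mgsub(),3,2,1)
@rdatasculptor (1) Yes, I figured as much, but I think this may actually be the cleaner solution: even with something like that, you would still have to put the definition at the top of every file you are talking about (unless we are dealing with a package here), and also replace every call to library(qdap) with library(textclean), or qdap:: with (nothing). (2) Is all this code in a package you are building, or just code on your machine? (3) Which operating system are you using (it changes the advice on how to identify the files you need to run the function on)?
I think your code answers my question, so I will accept it! Thinking about your solution, I came up with another simple piece of code. I will add it as an answer as well.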
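
The reorder(mgsub(),3,2,1) idea mentioned in the comments can be approximated with a small function factory. This is only a sketch: `reorder_args` is a hypothetical helper (no such function is assumed to exist in base R, qdap, or textclean), `new_style` is a base-R stand-in for a function with the textclean argument order, and the wrapper assumes the leading arguments are passed positionally and unnamed.

```r
# Hypothetical helper: returns a wrapper around f whose leading positional
# arguments are handed to f in a different order.
# new_order[i] is the position, in the wrapper's call, of f's i-th argument.
reorder_args <- function(f, new_order) {
  function(...) {
    args <- list(...)
    n <- length(new_order)
    stopifnot(length(args) >= n)
    # Reordered leading arguments first, any remaining arguments unchanged
    do.call(f, c(args[new_order], args[-seq_len(n)]))
  }
}

# Stand-in with the textclean-style order (x, pattern, replacement)
new_style <- function(x, pattern, replacement) {
  gsub(pattern, replacement, x, fixed = TRUE)
}

# The wrapper accepts (pattern, replacement, x); f receives call args 3, 1, 2
old_style <- reorder_args(new_style, c(3, 1, 2))
result <- old_style("world", "R", "hello world")
print(result)
#> [1] "hello R"
```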

【Solution 3】:

Well, you could also reassign the original function in the package to suit your code.

That is, take the source code of mgsub,

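# Source of textclean::mgsub with the first three arguments reordered
# to the qdap order (pattern, replacement, x)
reorder_mgsub <- function(pattern, replacement, x, leadspace = FALSE, trailspace = FALSE, 
    fixed = TRUE, trim = FALSE, order.pattern = fixed, safe = FALSE, 
    ...){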
    if (!is.null(list(...)$ignore.case) & fixed) {
        warning(paste0("`ignore.case = TRUE` can't be used with `fixed = TRUE`.\n", 
            "Do you want to set `fixed = FALSE`?"), call. = FALSE)
    }
    if (safe) {
        return(mgsub_regex_safe(x = x, pattern = pattern, replacement = replacement, 
            ...))
    }
    if (leadspace | trailspace) {
        replacement <- spaste(replacement, trailing = trailspace, 
            leading = leadspace)
    }
    if (fixed && order.pattern) {
        ord <- rev(order(nchar(pattern)))
        pattern <- pattern[ord]
        if (length(replacement) != 1) 
            replacement <- replacement[ord]
    }
    if (length(replacement) == 1) {
        replacement <- rep(replacement, length(pattern))
    }
    if (any(!nzchar(pattern))) {
        good_apples <- which(nzchar(pattern))
        pattern <- pattern[good_apples]
        replacement <- replacement[good_apples]
        warning(paste0("Empty pattern found (i.e., `pattern = \"\"`).\n", 
            "This pattern and replacement have been removed."), 
            call. = FALSE)
    }
    for (i in seq_along(pattern)) {
        x <- gsub(pattern[i], replacement[i], x, fixed = fixed, 
            ...)
    }
    if (trim) {
        x <- gsub("\\s+", " ", gsub("^\\s+|\\s+$", "", x, perl = TRUE), 
            perl = TRUE)
    }
    x
}

followed by

assignInNamespace('mgsub', reorder_mgsub, 'textclean')

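This should assign your updated function into the namespace of the textclean package, so any code that calls textclean::mgsub will now use your updated function. That way you do not need to modify all of your code.

【Comments】:

An elegant solution! (I am not sure why yet, but I am a bit reluctant to change the package code itself.)
@rdatasculptor Yes, this can be a useful trick when working with old, unmaintained packages or ones that are still in development. In a way it is just a fork of the original package, and the function is only reassigned for the current session, so it is not permanent. But yes, I am often reluctant to use function overrides like this too, and tend to use them only in less invasive cases (like yours). If you are going to add extra code to the function, you can also wrap it in a tryCatch with a very obvious error, i.e. error = function(e){print("My custom edit crashed the code!")} :)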
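
The tryCatch suggestion from the last comment might look roughly like this. It is a sketch under assumptions: `custom_edit` is a hypothetical stand-in for whatever extra code is added to the overridden function, and the handler re-raises the error with an obvious message rather than only printing it, so that a failure cannot pass silently.

```r
# Hypothetical custom edit that can fail: a stand-in for extra code
# added to an overridden function.
custom_edit <- function(x) {
  if (!is.character(x)) stop("x must be character")
  toupper(x)
}

guarded_edit <- function(x) {
  tryCatch(
    custom_edit(x),
    error = function(e) {
      # Re-raise with a message that points clearly at the custom code
      stop("My custom edit crashed the code! Original error: ",
           conditionMessage(e), call. = FALSE)
    }
  )
}

ok <- guarded_edit("abc")
failed <- tryCatch(guarded_edit(42), error = function(e) conditionMessage(e))
print(ok)
#> [1] "ABC"
print(failed)
#> [1] "My custom edit crashed the code! Original error: x must be character"
```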