【Question title】: Error when running R function with rpy2
【Posted】: 2016-04-11 15:12:36
【Question】:

I am trying to use rpy2 to run the multi.split function from the questionr package.

Here is my code:

from rpy2 import robjects
from rpy2.robjects.packages import importr

questionr = importr(str('questionr'))

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

After the last line, I get the following error:

RRuntimeError: Error in `colnames<-`(`*tmp*`, value = c("c(\"red/blue\",_\"green\",_\"red/green\",_\"blue/red\",_\"red/blue\",_\"green\",_.blue",  : 
 'names' attribute [4] must be the same length as the vector [3]

I think this is related to the size of the vector I am sending, because if I remove the last item:

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue"]

and then run

data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

I do not get any error message. If I change the split_char argument, for example:

data_table = multi_split(data_vector, split_char='.')

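I do not get an error message either, regardless of the size of the vector I send.

I tried running the equivalent code directly in R (using RStudio), and it runs without problems. Any ideas on how to solve this?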

【Discussion】:

    Tags: python r rpy2


    【Solution 1】:

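    This appears to happen because the function multi_split (multi.split in the R package) tries to use the string representation of the expression associated with its first argument (here "data_vector").

    The signature of the R function is: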

    multi.split(var, split.char = "/", mnames = NULL)
    

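    The documentation for mnames is:

    names to give to the produced variables. If NULL, the names are computed from the original variable name and the answers.

    In the call multi_split(data_vector, split_char='/'), the embedded R cannot see a variable name: the call is made from Python, and data_vector reaches R as an anonymous object (content only, no variable name).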

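    I thought you could work around this by specifying mnames, but you checked and it does not work (see the comments below). The source code shows why: the line vname <- deparse(substitute(var)) is evaluated whether or not mnames is specified: https://github.com/juba/questionr/blob/9cf09f3ffcd6c8df24182380f12d52b061c221ef/R/table.multi.R#L161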
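To make the length mismatch concrete, here is a small pure-Python sketch (no R required). It assumes, as the error message suggests, that the column names are built by pasting the deparsed expression onto each answer level. The function r_paste and the placeholder strings are hypothetical stand-ins, not part of questionr or rpy2. In R, deparse() can return several strings when the expression is long, and paste() recycles to the longer input, so a multi-line deparse can yield more names than there are columns.

```python
from itertools import cycle, islice


def r_paste(left, right, sep="."):
    """Mimic R's paste() for two vectors: recycle the shorter one."""
    n = max(len(left), len(right))
    return [
        f"{a}{sep}{b}"
        for a, b in zip(islice(cycle(left), n), islice(cycle(right), n))
    ]


# Hypothetical answer levels found after splitting on "/".
levels = ["blue", "green", "red"]

# Called from R with a named variable: the deparsed expression is one string.
named = r_paste(["data_vector"], levels)

# Called through rpy2 with an anonymous vector: the whole vector is deparsed,
# and a long one may span several lines (four placeholder lines here).
anonymous = r_paste(["line1", "line2", "line3", "line4"], levels)

print(len(named), named)
print(len(anonymous), anonymous)
```

With one deparsed line there are three names for three columns. With four deparsed lines there are four names for three columns, which would match the "[4] ... [3]" in the reported error.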

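    Another approach is to build and evaluate an R expression. An earlier post should provide the necessary information: What object to pass to R from rpy2?

    A third possibility is to get creative and mix in Python strings as R code: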

    data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
    data_vector = robjects.StrVector(data)
    # binding the R vector to a symbol in R's "GlobalEnv"
    robjects.globalenv['mydata'] = data_vector
    # the call is now in a Python string that is evaluated as R code
    # (R code must use the symbol bound above and R's own argument name, split.char)
    data_table = robjects.r("multi.split(mydata, split.char='/')")
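If you need this pattern more than once, the bind-then-evaluate steps can be wrapped in a helper. The following is only a sketch: r_literal, build_r_call and call_on_named are hypothetical names, not part of rpy2. It assumes keyword names follow the rpy2 convention of "_" standing for ".", and that argument values are plain ASCII strings, booleans or numbers. rpy2 is imported lazily so the string-building part can be checked without R installed.

```python
import json


def r_literal(value):
    """Render a simple Python scalar as R source text."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):
        return json.dumps(value)  # double-quoted with backslash escapes
    if isinstance(value, (int, float)):
        return repr(value)
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def build_r_call(func, symbol, **kwargs):
    """Build R code that calls func on the R variable named symbol."""
    args = [symbol]
    for name, value in kwargs.items():
        # Assumption: "_" in the Python keyword stands for "." in R.
        args.append(f"{name.replace('_', '.')} = {r_literal(value)}")
    return f"{func}({', '.join(args)})"


def call_on_named(func, symbol, value, **kwargs):
    """Bind value to symbol in R's global environment, then evaluate the call."""
    from rpy2 import robjects  # lazy import: only needed when R is available

    robjects.globalenv[symbol] = value
    return robjects.r(build_r_call(func, symbol, **kwargs))


print(build_r_call("questionr::multi.split", "mydata", split_char="/"))
```

Usage would then look like call_on_named("questionr::multi.split", "mydata", robjects.StrVector(data), split_char="/"). Because R now sees the symbol mydata rather than an anonymous vector, the generated column names should be derived from "mydata".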
    

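    【Comments】:

    • I tried adding the mnames argument, as you can see here: data_table = multi_split(data_vector, split_char='/', mnames=robjects.StrVector(['a', 'b'])) but I still get the same error message.
    • OK. I updated the answer. Hopefully one of the options works for you. Anonymous objects cause trouble whenever R code uses an unevaluated expression as a string to create labels or variable names.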