How to apply R function Assocstats of library VCD in Rpy python

【Question title】: How to apply R function Assocstats of library VCD in Rpy python
【Posted】: 2014-04-30 08:15:08
【Question】:

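I am trying to compute the phi coefficient, Cramer's V, and the contingency coefficient using Python's Rpy module. I can do this in R, but I am stuck trying to replicate the same thing in Python.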

library(vcd)
data <- read.csv("test.csv")
assocstats(table(data$var_4, data$target))

Output     
                X^2 df P(> X^2)
Likelihood Ratio 113.28  1        0
Pearson          112.51  1        0

Phi-Coefficient   : 0.15 
Contingency Coeff.: 0.148 
Cramer's V        : 0.15

Implementation in Python:

import rpy  # the calls below use rpy.r, so import the module itself
# Already connected with mysql
q="Select var_4 , target from test"
cur.execute(q)
data=cur.fetchall()
ls1=[]
ls2=[]
for i in range(len(data)):
  ls1.append(data[i][0])
  ls2.append(data[i][1])
rpy.r.library("vcd")
rpy.r.assocstats(rpy.r.table(ls1,ls2))

Error:

Traceback (most recent call last):
File "<pyshell#14>", line 1, in <module>
rpy.r.assocstats(rpy.r.table(ls1,ls2))
RPy_RException: Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

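Another approach I tried was to compute phi squared with the scipy module and then derive Cramer's V and the rest from the mathematical formulas. But I intend to use Rpy heavily in my project, so I would really appreciate it if you could point out the problem in the approach above. I suspect I am not passing the input in the right format. Thanks in advance.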
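For reference, the formula-based route mentioned above can be sketched in plain Python. This is a minimal sketch, not the vcd implementation: the function name `association_stats` and the sample counts are made up for illustration, the Pearson chi-square is computed without a continuity correction, and every row and column total is assumed to be nonzero.

```python
import math


def association_stats(table):
    """Return (chi2, phi, contingency, cramers_v) for a two-way table of counts.

    Hypothetical helper for illustration; assumes all marginal totals are nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = float(sum(row_totals))

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    phi = math.sqrt(chi2 / n)                    # phi coefficient
    contingency = math.sqrt(chi2 / (chi2 + n))   # contingency coefficient
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))  # Cramer's V
    return chi2, phi, contingency, cramers_v


counts = [[10, 20], [30, 40]]  # made-up 2x2 counts, not the data from the question
chi2, phi, contingency, cramers_v = association_stats(counts)
print(chi2, phi, contingency, cramers_v)
```

For a 2x2 table, phi and Cramer's V coincide, which matches the R output shown in the question (both 0.15).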

【Comments】:

Tags: python r rpy2

【Answer 1】:

From the error we can see that the sort function has a problem with list input. Test this case with a sample list:

> templist <- list(c(3,2,1))
> sort(templist)
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 
  'x' must be atomic

> newlist <- unlist(templist)
> is.atomic(newlist)
[1] TRUE

> sort(newlist)
[1] 1 2 3

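The key here is unlist. You can use rpy.r.is.list to confirm that your inputs ls1 and ls2 are lists. To unlist them, rpy.r.unlist needs to be called on both ls1 and ls2.

To be able to use functions with a . in their name, such as is.list(), see (Accessing functions with a dot in their name (eg. "as.vector") using rpy2).

I don't have rpy, so I can't confirm this, but I think it should work. Please let us know.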

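【Comments】:

Thank you very much. Now I am facing a new error, which I am trying to resolve. I will let you know how it goes. Error: Traceback (most recent call last): File "", line 1, in r.assocstats(r.table(newls1,newls2)) RPy_RException: Error in function (x) : Function only defined for 2-way tables.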

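【Answer 2】:

Are you really using rpy2, as the tag says? It looks like rpy to me. In any case, if you haven't done so already, I strongly recommend migrating to rpy2.

It looks like your ls1 and ls2 are just lists of numbers, so the problem should be straightforward: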

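In [60]:
# setting up
import numpy as np
import rpy2.robjects as ro
mydata = ro.r['data.frame']
table = ro.r['table']
summary = ro.r['summary']
ro.r['library']('vcd')  # load vcd before looking up assocstats
assocstats = ro.r['assocstats']
# FloatVector turns the NumPy arrays into atomic R numeric vectors
ls1 = ro.FloatVector(np.random.random(50))
ls2 = ro.FloatVector(np.random.random(50))
result = assocstats(table(ls1, ls2))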

In [61]:
# what is in the result
print(result.names)
[1] "table"       "chisq_tests" "phi"         "contingency" "cramer"     

In [62]:
# access the chi-square table
print(result.rx('chisq_tests'))
$chisq_tests
                       X^2   df  P(> X^2)
Likelihood Ratio  391.2023 2401 1.0000000
Pearson          2450.0000 2401 0.2382456

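【Comments】:

This is exactly what I was looking for. Let me install rpy2 and test it. I will let you know if I run into any trouble.
Thanks a lot, it works. The only thing I had to change was to unlist my lists in Python.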
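The "unlist" step mentioned in the last comment could look roughly like this in rpy2. This is a sketch under assumptions, not the asker's actual code: the helper names `split_columns` and `assocstats_from_rows` are hypothetical, values are converted to strings so that `table()` receives atomic character vectors whatever types the database returns, and the second function only works where R, the vcd package, and rpy2 are installed.

```python
def split_columns(rows):
    """Turn [(a1, b1), (a2, b2), ...] into two lists of strings.

    Hypothetical helper; `rows` is whatever cur.fetchall() returned.
    """
    col_a = [str(row[0]) for row in rows]
    col_b = [str(row[1]) for row in rows]
    return col_a, col_b


def assocstats_from_rows(rows):
    """Run vcd's assocstats on two columns of query results.

    Hypothetical helper; requires R, the vcd package, and rpy2.
    """
    # Imported here so split_columns stays usable without rpy2 installed.
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr

    vcd = importr("vcd")
    table = ro.r["table"]
    col_a, col_b = split_columns(rows)
    # StrVector builds an atomic R character vector rather than an R list,
    # which avoids the "'x' must be atomic" error from the question.
    result = vcd.assocstats(table(ro.StrVector(col_a), ro.StrVector(col_b)))
    # Element names as printed by result.names in Answer 2.
    return {
        "phi": result.rx2("phi")[0],
        "contingency": result.rx2("contingency")[0],
        "cramer": result.rx2("cramer")[0],
    }
```

With a live cursor, `assocstats_from_rows(cur.fetchall())` would then return the three coefficients as a Python dict.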