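【Question title】: Named list equivalent in rpy2/dataframe access
【Posted】: 2015-07-02 17:20:45
【Question】:

I am trying to replicate an example from the MNP package in R through rpy2, in two different ways. First, I simply use robjects.r with a string that is an exact copy-and-paste of the R code: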

import rpy2.robjects as robjects
import rpy2.robjects.numpy2ri
import rpy2.robjects.pandas2ri
import rpy2.robjects.packages as rpackages

robjects.pandas2ri.activate()
mnp = rpackages.importr('MNP')
base = rpackages.importr('base')

r = robjects.r
r.data('detergent')
rcmd = '''\
mnp(choice ~ 1, choiceX = list(Surf=SurfPrice, Tide=TidePrice,
Wisk=WiskPrice, EraPlus=EraPlusPrice,
Solo=SoloPrice, All=AllPrice),
cXnames = "price", data = detergent, n.draws = 500, burnin = 100,
thin = 3, verbose = TRUE)'''

res = r(rcmd)

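This works fine and reproduces what I can do directly in R. I would also like to run this code using Python-accessible objects, passing the data in from a dataframe: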

import rpy2.rlike.container as rlc
df = robjects.pandas2ri.ri2py(r['detergent'])

choiceX = rlc.TaggedList(['SurfPrice', 'TidePrice', 'WiskPrice', 'EraPlusPrice', 'SoloPrice', 'AllPrice'], 
                         tags=('Surf', 'Tide', 'Wisk', 'EraPlus', 'Solo', 'All'))

res = mnp.mnp('choice ~ 1', 
              choiceX=['SurfPrice', 'TidePrice', 'WiskPrice', 'EraPlusPrice', 'SoloPrice', 'AllPrice'],
              cXnames='price', 
              data=df, n_draws=500, burnin=100,
              thin=3, verbose=True)

This fails with the error:

Error in xmatrix.mnp(formula, data = eval.parent(data), choiceX = call$choiceX,  : 
  Error: Invalid input for `choiceX.'
 You must specify the choice-specific varaibles at least for all non-base categories.

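Another SO response suggests replacing an R named list with an rpy2 TaggedList. If I remove the choiceX and cXnames arguments to MNP (they are optional), the code runs, so it looks like the pandas dataframe is being passed correctly.

I am not sure whether the TaggedList is not being interpreted correctly as a named list once it gets into R, or whether MNP has some issue associating the contents of choiceX with the pandas dataframe.

Does anyone know what might be going on here?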

Update

Following @lgautier's suggestion, I modified the code to:

choiceX = rlc.TaggedList([base.as_symbol('SurfPrice'), base.as_symbol('TidePrice'), 
                          base.as_symbol('WiskPrice'), base.as_symbol('EraPlusPrice'), 
                          base.as_symbol('SoloPrice'), base.as_symbol('AllPrice')], 
                         tags=('Surf', 'Tide', 'Wisk', 'EraPlus', 'Solo', 'All'))

res = mnp.mnp(robjects.Formula('choice ~ 1'), 
              choiceX=choiceX,
              cXnames='price', 
              data=df, n_draws=500, burnin=100,
              thin=3, verbose=True)

However, I get the same error as posted before.

Update 2

Following the workaround suggested by @lgautier, the following code:

choiceX = rlc.TaggedList([base.as_symbol('SurfPrice'),
                          base.as_symbol('TidePrice'), 
                          base.as_symbol('WiskPrice'),
                          base.as_symbol('EraPlusPrice'), 
                          base.as_symbol('SoloPrice'),
                          base.as_symbol('AllPrice')], 
                         tags=('Surf', 'Tide', 'Wisk',
                               'EraPlus', 'Solo', 'All'))

choiceX = robjects.conversion.py2ro(choiceX)
# add the names
choiceX.names = robjects.vectors.StrVector(('Surf', 'Tide',
                                            'Wisk', 'EraPlus',
                                            'Solo', 'All'))

res = mnp.mnp(robjects.Formula('choice ~ 1'), 
              choiceX=choiceX,
              cXnames='price', 
              data=df, n_draws=500, burnin=100,
              thin=3, verbose=True)

still produces an error (albeit a different one):

Error in as.vector(x, mode) : 
  cannot coerce type 'symbol' to vector of type 'any'
---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-21-7de5ad805801> in <module>()
      3               cXnames='price',
      4               data=df, n_draws=500, burnin=100,
----> 5               thin=3, verbose=True)

/Users/lev/anaconda/envs/rmnptest/lib/python2.7/site-packages/rpy2-2.5.6-py2.7-macosx-10.5-x86_64.egg/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
    168                 v = kwargs.pop(k)
    169                 kwargs[r_k] = v
--> 170         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    171 
    172 pattern_link = re.compile(r'\\link\{(.+?)\}')

/Users/lev/anaconda/envs/rmnptest/lib/python2.7/site-packages/rpy2-2.5.6-py2.7-macosx-10.5-x86_64.egg/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
     98         for k, v in kwargs.items():
     99             new_kwargs[k] = conversion.py2ri(v)
--> 100         res = super(Function, self).__call__(*new_args, **new_kwargs)
    101         res = conversion.ri2ro(res)
    102         return res

RRuntimeError: Error in as.vector(x, mode) : 
  cannot coerce type 'symbol' to vector of type 'any'

【Discussion】:

    Tags: python r pandas rpy2


    【Solution 1】:

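    The Python code does not correspond to your R code. You found this out after posting, so I am giving the details below. The summary is that R symbols and Python strings are not equivalent (although R confuses its own users by allowing both in some places; for example, both library("MNP") and library(MNP) will work).

    This is not very different from this question: pandas and rpy2: Why does ezANOVA work via robjects.r but not robjects.packages.importr?

    ...except that choiceX will be an unevaluated R expression rather than just a symbol.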

    The R code is:

    data(detergent)
    mnp(choice ~ 1,
        # ^- this is a "formula", which is an expression in R
        choiceX = list(Surf=SurfPrice, Tide=TidePrice,
                       Wisk=WiskPrice, EraPlus=EraPlusPrice,
                       Solo=SoloPrice, All=AllPrice),
        # ^- this is a list of objects, but with the cautionary note
        #    that R evaluates expressions in argument lazily. Therefore
        #    the safest is to have it as an R expression (it may or may
        #    not work if evaluated, but this depends on the code in
        #    `mnp`)
        cXnames = "price",
        # ^- this is a string
        data = detergent,
        n.draws = 500, burnin = 100,
        thin = 3, verbose = TRUE)
    

    The Python you have is (with comments about the differences):

    choiceX = rlc.TaggedList(['SurfPrice', 'TidePrice', 'WiskPrice',
                              'EraPlusPrice', 'SoloPrice', 'AllPrice'], 
                             tags=('Surf', 'Tide', 'Wisk',
                                   'EraPlus', 'Solo', 'All'))
    # ^- this is a "tagged list", and the R equivalent would be
    #    list(Surf="SurfPrice", Tide="TidePrice", Wisk="WiskPrice",
    #         EraPlus="EraPlusPrice", Solo="SoloPrice", All="AllPrice")
    #    Something closer to your R code above would be:
    #    rlc.TaggedList([as_symbol('SurfPrice'), as_symbol('TidePrice'),
    #                   ...
    #                   tags=('Surf', 'Tide', ...))
    
    res = mnp.mnp('choice ~ 1', 
                  # ^- this is a string. To make it an R formula, do
                  # robjects.Formula('choice ~ 1')
                  choiceX=['SurfPrice', 'TidePrice', 'WiskPrice',
                           'EraPlusPrice', 'SoloPrice', 'AllPrice'],
                  # ^- this should be choiceX defined above, I guess
                  cXnames='price',
                  # ^- this is a string, like in R 
                  data=df,
                  n_draws=500, burnin=100,
                  thin=3, verbose=True)
    

    Edit:

    This now means that the following should work:

    choiceX = robjects.rinterface.parse("""
        list(Surf=SurfPrice, Tide=TidePrice,
             Wisk=WiskPrice, EraPlus=EraPlusPrice,
             Solo=SoloPrice, All=AllPrice)""")
    

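    Currently rpy2 does not provide many utilities for building R expressions. If the variable names are parameters at the Python level, you could consider something like this: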

    # Join the tag=symbol pairs with ', ' so that the result is valid R code
    rcode = 'list(' + ', '.join('%s=%s' % (k, v)
                                for k, v in
                                (('Surf', 'SurfPrice'),
                                 ('Tide', 'TidePrice'),
                                 ('Wisk', 'WiskPrice'),
                                 ('EraPlus', 'EraPlusPrice'),
                                 ('Solo', 'SoloPrice'),
                                 ('All', 'AllPrice'))) + ')'
    choiceX = robjects.rinterface.parse(rcode)
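If the tag/variable pairs come from Python-level parameters, the string assembly can be wrapped in a small helper. This is a minimal sketch and not part of rpy2: `named_list_code` and `_R_NAME` are hypothetical names, and the name check is an assumption that is deliberately stricter than R's own rules (ASCII only, and R reserved words are not rejected). It only builds the R source text, so it runs without R installed.

```python
import re

# Hypothetical check for "plain" R names: letters, digits, '.' and '_',
# starting with a letter, or with a dot that is not followed by a digit.
_R_NAME = re.compile(r'(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*')


def named_list_code(pairs):
    """Return R source for list(tag=symbol, ...) from (tag, symbol) pairs."""
    for tag, symbol in pairs:
        for name in (tag, symbol):
            # Refuse anything that could change the structure of the R code.
            if not _R_NAME.fullmatch(name):
                raise ValueError('not a plain R name: %r' % (name,))
    # The pairs must be comma-separated for the result to parse as R.
    return 'list(' + ', '.join('%s=%s' % pair for pair in pairs) + ')'


pairs = [('Surf', 'SurfPrice'), ('Tide', 'TidePrice'),
         ('Wisk', 'WiskPrice'), ('EraPlus', 'EraPlusPrice'),
         ('Solo', 'SoloPrice'), ('All', 'AllPrice')]
rcode = named_list_code(pairs)
print(rcode)
```

The resulting string is the kind of input the answer hands to `robjects.rinterface.parse`. Validating the names first is a design choice to keep arbitrary text from being spliced into R code.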
    

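    【Comments】:

    • Thanks for the suggestion. As shown in the update above, I believe I replicated your suggestion but got the same error. Is there something else I am missing?
    • Thanks for looking into this. The first workaround still gives an error. See Update 2 in my original post.
    • @JoshAdel. Ah, yes... choiceX should be an unevaluated R expression. The first workaround does not work.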