How to convert this Python code to R? Answers

【Question title】: How to convert this Python code to R?
【Posted】: 2016-10-18 11:48:21
【Question description】:

I need help converting Python code to R code.

I have a data frame df that contains a column IndicatorOfDefault, and I want to generate a second column named indvalues.

Example:

row number   IndicatorOfDefault   indvalues
823602                        P           0
823603                        P           0
823604                  N1,N13,           8
823605                      N1,           1
823606                        P           0
823607         N1,N2,N3,N9,N10,          13
823608                        P           0

The code I want to convert is as follows:

df['indicators'] = df['IndicatorOfDefault'].str.split(',')

Nvalues = {'' : -1, 'P' : 0, 'N1' : 1, 'N2' : 2, 'N11' : 3, 'N12' : 4, 'N3' : 5, 'N4' : 6, 
           'N6' : 7, 'N10' : 8, 'N13' : 9, 'N5' : 10, 'N7' : 11, 'N8': 12, 'N9' : 13}

df['indvalues'] = df['indicators'].apply(lambda x: max([Nvalues.get(y,y) for y in x ]))

I want to run the same code in R, but I don't know how to write it in R.

Can anyone help me?

Thanks in advance.
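
For readers less familiar with pandas, the logic of the snippet above can be sketched as a plain Python function: split each cell on commas, look every piece up in the mapping, and keep the largest value. The function name `max_indicator_value` and its `default` parameter are assumptions made for this sketch and do not appear in the question. The fallback of -1 for unknown codes is also an assumption; the posted code uses `Nvalues.get(y, y)`, which returns the unknown string itself, and in Python 3 taking `max` over a mix of strings and integers raises a `TypeError`.

```python
# Mapping copied from the question; '' covers the empty piece left by a trailing comma.
NVALUES = {'': -1, 'P': 0, 'N1': 1, 'N2': 2, 'N11': 3, 'N12': 4, 'N3': 5, 'N4': 6,
           'N6': 7, 'N10': 8, 'N13': 9, 'N5': 10, 'N7': 11, 'N8': 12, 'N9': 13}


def max_indicator_value(cell, default=-1):
    """Return the largest mapped value among the comma-separated codes in `cell`.

    `default` is an assumption of this sketch: unknown codes map to -1
    instead of being returned unchanged as in the posted `.get(y, y)`.
    """
    return max(NVALUES.get(code, default) for code in cell.split(','))


# A few cells taken from the example table in the question.
cells = ['P', 'N1,N13,', 'N1,', 'N1,N2,N3,N9,N10,']
values = [max_indicator_value(c) for c in cells]
print(values)
```

Any R translation has to reproduce these same three steps (split, look up, take the maximum) for each row.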

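Why is this question off topic? I don't understand what is wrong with it. I am new to this site, so I would appreciate it if someone could explain why this particular question does not belong here. I have read the help center, but I still don't know what went wrong.

I managed to solve my problem in a different way. I got the result I wanted: the most important indicator (it does not have to be a number).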

df$ind <- "P"
# Codes later in this order overwrite earlier ones, so the last match wins
for (i in c(1, 2, 11, 12, 3, 4, 6, 10, 13, 5, 7, 8, 9)) {
  df <- transform(df, ind = ifelse(grepl(paste0("N", i, ","), IndicatorOfDefault), paste0("N", i), ind))
}

Example:

row number   IndicatorOfDefault         ind
823602                        P           P
823603                        P           P
823604                  N1,N13,         N13
823605                      N1,          N1
823606                        P           P
823607         N1,N2,N3,N9,N10,          N9
823608                        P           P
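
The same idea, keeping the label of the most important indicator rather than a number, can be sketched in Python for comparison. This is only an illustration under stated assumptions: the names `PRIORITY`, `RANK`, and `top_indicator` are made up for the sketch, and it splits each cell on commas instead of searching for the pattern "N<i>," as the R loop does, so it would also accept a final code with no trailing comma.

```python
# Priority order copied from the loop in the post: later entries are more important.
PRIORITY = [1, 2, 11, 12, 3, 4, 6, 10, 13, 5, 7, 8, 9]
RANK = {f'N{n}': pos for pos, n in enumerate(PRIORITY)}


def top_indicator(cell, fallback='P'):
    """Return the highest-priority N-code in `cell`, or `fallback` if none is present."""
    codes = [c for c in cell.split(',') if c in RANK]
    return max(codes, key=RANK.get) if codes else fallback


# The IndicatorOfDefault values from the example table.
cells = ['P', 'P', 'N1,N13,', 'N1,', 'P', 'N1,N2,N3,N9,N10,', 'P']
labels = [top_indicator(c) for c in cells]
print(labels)
```

Ranking by position in the priority list avoids rescanning the whole column once per code, which may matter on a frame of roughly 800,000 rows.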

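【Question discussion】:

Remove the python tag, dput your R data frame and paste it into the question, describe what you want to do, and add your attempt.
@rawr: What do you mean by dput your R data frame?
The data frame I have is very large (61 columns and 823610 rows), which is why I posted a small example showing rows 823602 to 823608 of the IndicatorOfDefault and indvalues columns (the latter is the one I want to generate).
dput(df[823602:823608, c('IndicatorOfDefault', 'indvalues')])
This question is on topic and should not have been put on hold. @rawr: Jasmina has a Python data frame, not an R one (so there is no dput()). She wants to convert this Python code to the equivalent R.

Tags: python r dataframe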

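【Solution 1】:

Since R has no dictionary object, the closest translation is a named list. Note that R does not allow zero-length names, so N0 is used in place of the empty-string key. From there you can use R's strsplit() to split the strings in the column and vapply() to find the maximum of the corresponding values. Specifically, vapply() is used instead of the standard lapply() because it lets you declare the output type, here a numeric vector suitable for a data frame column.

Two scripts are included below to show the resulting result column alongside the posted data.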

Python

from io import StringIO
import pandas as pd

data = '''row number;IndicatorOfDefault;indvalues
823602;P;0
823603;P;0
823604;N1,N13,;8
823605;N1,;1
823606;P;0
823607;N1,N2,N3,N9,N10;13
823608;P;0'''

df = pd.read_table(StringIO(data), sep=";")

df['indicators'] = df['IndicatorOfDefault'].str.split(',')

Nvalues = {'' : -1, 'P' : 0, 'N1' : 1, 'N2' : 2, 'N11' : 3, 'N12' : 4, 'N3' : 5, 'N4' : 6, 
           'N6' : 7, 'N10' : 8, 'N13' : 9, 'N5' : 10, 'N7' : 11, 'N8': 12, 'N9' : 13}

df['result'] = df['indicators'].apply(lambda x: max([Nvalues.get(y,y) for y in x ]))

print(df)
#    row number IndicatorOfDefault  indvalues             indicators  result
# 0      823602                  P          0                    [P]       0
# 1      823603                  P          0                    [P]       0
# 2      823604            N1,N13,          8            [N1, N13, ]       9
# 3      823605                N1,          1                 [N1, ]       1
# 4      823606                  P          0                    [P]       0
# 5      823607    N1,N2,N3,N9,N10         13  [N1, N2, N3, N9, N10]      13
# 6      823608                  P          0                    [P]       0

R

data = 'row number;IndicatorOfDefault;indvalues
823602;P;0
823603;P;0
823604;N1,N13,;8
823605;N1,;1
823606;P;0
823607;N1,N2,N3,N9,N10;13
823608;P;0'

df = read.table(text=data, sep=";", header=TRUE, stringsAsFactors = FALSE)

Nvalues = list(N0=-1, P=0, N1=1, N2=2, N11=3, N12=4,
               N3=5, N4=6, N6=7, N10=8, N13=9, N5=10,
               N7=11, N8=12, N9=13)

df$indicators <- lapply(df$IndicatorOfDefault, function(i) {
     as.character(strsplit(i, ",")[[1]])
})

df$result <- vapply(df$indicators, function(i) {  
     max(as.numeric(Nvalues[i]))  
}, numeric(1))

df

#   row.number IndicatorOfDefault indvalues          indicators result
# 1     823602                  P         0                   P      0
# 2     823603                  P         0                   P      0
# 3     823604            N1,N13,         8             N1, N13      9
# 4     823605                N1,         1                  N1      1
# 5     823606                  P         0                   P      0
# 6     823607    N1,N2,N3,N9,N10        13 N1, N2, N3, N9, N10     13
# 7     823608                  P         0                   P      0
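
One difference between the two outputs above is worth spelling out. In the Python output the indicators column for 'N1,N13,' is shown as [N1, N13, ], with an empty third piece, while the R output shows only N1, N13. Judging from that output, R's strsplit() does not return the empty piece after a trailing comma, which would explain why the N0 entry is never actually looked up. The short Python sketch below shows the Python side of this; the variable names are chosen for illustration only.

```python
cell = 'N1,N13,'

# str.split keeps the empty piece that follows the trailing comma,
# which is why the Python mapping needs the '' key.
pieces = cell.split(',')
print(pieces)

# Dropping empty pieces gives the shorter list that the R output column shows.
non_empty = [p for p in pieces if p]
print(non_empty)
```

If the mapping had no entry for the empty string, filtering the pieces this way before the lookup would be one way to keep the Python version from failing on trailing commas.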

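【Discussion】:

Thank you very much for your effort, but the code does not work. When I run
df$indicators
I get the following error:
[7] Error: object of type 'closure' is not subsettable
I tried running the code again and the first error disappeared, but now I get: [12] Error: replacement has 0 rows, data has 823479. My understanding is that the lapply function does not return a vector.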