【Question Title】:Updating a data frame with a function and sapply
【Posted】:2012-10-05 15:14:05
【Question】:

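I'm trying to set a column in a data frame to either "US" or "Foreign", depending on the country. I believe the right way to do this is to write a function and then use sapply to actually update the data frame. This is the first time I've tried something like this in R; in SQL, I would just write an UPDATE query.

Here is my data frame: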

str(clients)
'data.frame':   252774 obs. of  4 variables:
 $ ClientID     : Factor w/ 252774 levels "58187855","59210128",..: 19 20 21 22 23 24 25 26 27 28 ...
 $ Country          : Factor w/ 207 levels "Afghanistan",..: 196 60 139 196 196 40 40 196 196 196 ...
 $ CountryType     : chr  "" "" "" "" ...
 $ OrderSize        : num  12.95 21.99 5.00 7.50 44.5 ...


head(clients)
       ClientID  Country       CountryType  OrderSize
1      58187855  United States              12.95
2      59210128  France                     21.99
3      65729284  Pakistan                   5.00
4      25819711  United States              7.50
5      62837458  United States              44.55
6      88379852  China                      99.28

The function I'm trying to write is this:

updateCountry <- function(x) {
  if (clients$Country == "United States") {
    clients$CountryType <- "US"
  } else {
    clients$CountryType <- "Foreign"
  }
}

Then I would apply it like this:

sapply(clients, updateCountry)

When I run sapply against the head of the data frame, I get this:

"US" "US" "US" "US" "US" "US" 
Warning messages:
1: In if (clients$Country == "United States") { :
  the condition has length > 1 and only the first element will be used
2: In if (clients$Country == "United States") { :
  the condition has length > 1 and only the first element will be used
3: In if (clients$Country == "United States") { :
  the condition has length > 1 and only the first element will be used
4: In if (clients$Country == "United States") { :
  the condition has length > 1 and only the first element will be used
5: In if (clients$Country == "United States") { :
  the condition has length > 1 and only the first element will be used
6: In if (clients$Country == "United States") { :
  the condition has length > 1 and only the first element will be used

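The function seems to classify Country correctly, but it doesn't update the clients$CountryType column. What am I doing wrong? Also, is this the best way to go about updating a data frame?

【Comments】:

    Tags: r sapply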


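    【Solution 1】:

    ifelse seems to be what you actually want. It is the vectorized version of the if/else construct: it tests every element of a vector and returns a vector of the same length.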

     clients$CountryType <- ifelse(clients$Country == "United States", "US", "Foreign")
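
    As a rough illustration, here is a minimal, self-contained sketch on a toy data frame. The four rows are copied from the head(clients) output in the question, and the comparison string "United States" is an assumption based on that output. The CountryType2 column is a hypothetical name, added only to show an equivalent update with logical indexing. For context on why the original attempt fails: a plain if uses a single TRUE/FALSE condition, an assignment to clients inside a function changes only a local copy, and sapply over a data frame iterates over its columns rather than its rows.

```r
# Toy stand-in for the `clients` data frame (rows taken from head(clients) in the question).
clients <- data.frame(
  ClientID  = c("58187855", "59210128", "65729284", "25819711"),
  Country   = factor(c("United States", "France", "Pakistan", "United States")),
  OrderSize = c(12.95, 21.99, 5.00, 7.50)
)

# ifelse() evaluates the test for every element and returns a vector of the same length,
# so the whole column is filled in one assignment.
clients$CountryType <- ifelse(clients$Country == "United States", "US", "Foreign")

# Equivalent update with logical indexing, closer in spirit to SQL's UPDATE ... WHERE.
# CountryType2 is a hypothetical column used only for comparison.
clients$CountryType2 <- "Foreign"
clients$CountryType2[clients$Country == "United States"] <- "US"

print(clients)
```

    Both columns should end up identical. The logical-indexing form may read more naturally when coming from SQL, while ifelse keeps the update to a single expression.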
    

    【Discussion】:

    • A much simpler approach. I appreciate it.
    • The simplest is the best (Occam's razor, the principle of parsimony on SO). +1