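【Question title】: Variable value occurring on 2 dates in R
【Posted】: 2013-07-19 20:05:53
【Question】:

I want to find out who has eaten an apple or an orange on at least 2 different (unique) dates. I would like to create a new column containing a binary indicator (1 = yes, 0 = no) showing whether the individual had an orange or an apple on at least two dates.

The closest I have come is this plyr code.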

df1<- ddply(df, .(names, fruit), mutate, acne = ifelse(fruit=="apple" | fruit=="orange" & length(unique(dates))>=2,1,0))

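But this is not the solution. anne had an apple twice on the same day, so she should not get a 1 here. Likewise, ted gets a 1 even though he only had an apple once.

This is closer, but still not correct. It gives a 1 to any fruit that has occurred twice. The fruit needs to occur on two separate dates for the same person.

df2<- ddply(df, .(fruit), mutate, acne = ifelse(length(unique(dates))>=2, 1, 0))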
##this one gives a 1 to any fruit that has occurred twice. Need the fruit to occur twice per person on two individual dates per person.
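
One likely contributing factor in the first attempt is operator precedence: in R, `&` binds more tightly than `|`, so the condition is read as "apple, or (orange and at least two dates)". The small sketch below uses made-up values (`fruit`, `n_dates`) purely to illustrate this; it is not a fix for the full problem, since grouping by `.(names, fruit)` also counts dates per person and fruit rather than per person.

```r
fruit <- c("apple", "orange", "kiwi")
n_dates <- 1  # assumption for illustration: only one distinct date in this group

# As written in the first attempt: parsed as apple | (orange & n_dates >= 2)
as_written <- fruit == "apple" | fruit == "orange" & n_dates >= 2

# With explicit parentheses: (apple | orange) & n_dates >= 2
grouped <- (fruit == "apple" | fruit == "orange") & n_dates >= 2

as_written  # TRUE FALSE FALSE: every apple row passes, whatever the dates
grouped     # FALSE FALSE FALSE
```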

I would be grateful if anyone could point me in the right direction here.

Thank you in advance

Sample DF

names<-as.character(c("john", "john", "philip", "ted", "john", "john", "anne", "john", "mary","anne", "mary","mary","philip","mary", "su","mary", "jim", "sylvia", "mary", "ted","ted","mary", "sylvia", "jim", "ted", "john", "ted"))
dates<-as.Date(c("2010-07-01", "2010-07-13", "2010-05-12","2010-02-14","2010-06-30","2010-08-15", "2010-03-21","2010-04-04","2010-09-01", "2010-03-21", "2010-12-01", "2011-01-01", "2010-08-12",  "2010-11-11", "2010-05-12",  "2010-12-03", "2010-07-12",  "2010-12-21", "2010-02-18",  "2010-10-29", "2010-08-13",  "2010-11-11", "2010-05-12",  "2010-04-01", "2010-05-06",  "2010-09-28", "2010-11-28" ))
fruit<-as.character(c("kiwi","apple","mango", "banana","strawberry","orange","apple","raspberry", "orange","apple","orange", "apple", "strawberry", "apple", "pineapple", "peach", "orange", "nectarine", "grape","banana", "melon", "apricot", "plum", "lychee", "mango", "watermelon", "apple" ))
df<-data.frame(names,dates,fruit)
df

Desired output

    names      dates      fruit v1
7    anne 2010-03-21      apple  0
10   anne 2010-03-21      apple  0
17    jim 2010-07-12     orange  0
24    jim 2010-04-01     lychee  0
1    john 2010-07-01       kiwi  1
2    john 2010-07-13      apple  1
5    john 2010-06-30 strawberry  1
6    john 2010-08-15     orange  1
8    john 2010-04-04  raspberry  1
26   john 2010-09-28 watermelon  1
9    mary 2010-09-01     orange  1
11   mary 2010-12-01     orange  1
12   mary 2011-01-01      apple  1
14   mary 2010-11-11      apple  1
16   mary 2010-12-03      peach  1
19   mary 2010-02-18      grape  1
22   mary 2010-11-11    apricot  1
3  philip 2010-05-12      mango  0
13 philip 2010-08-12 strawberry  0
15     su 2010-05-12  pineapple  0
18 sylvia 2010-12-21  nectarine  0
23 sylvia 2010-05-12       plum  0
4     ted 2010-02-14     banana  0
20    ted 2010-10-29     banana  0
21    ted 2010-08-13      melon  0
25    ted 2010-05-06      mango  0
27    ted 2010-11-28      apple  0

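【Question comments】:

  • +1 for providing a reproducible example, a clear statement of what the code should do, and an attempt at solving the problem.

Tags: r unique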


【Solution 1】:

This should do the trick:

 # For each person, count the distinct dates on which an apple or orange was eaten
 v1 = ave(1:nrow(df), df$names, FUN = function(x)
   length(unique(df$dates[x[df$fruit[x] %in% c("orange", "apple")]])) > 1)
 df$v1 = v1
 df = df[order(df$names),]
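
To see what the one-liner does, it may help to unpack the function body for a single person. The sketch below uses a trimmed copy of the question's sample data and mimics the row indices that `ave()` would hand to `FUN` for john; the intermediate names (`x`, `ao_rows`, `n`) are only for illustration.

```r
# A few rows taken from the question's sample data
df <- data.frame(
  names = c("anne", "anne", "john", "john", "john"),
  dates = as.Date(c("2010-03-21", "2010-03-21",
                    "2010-07-01", "2010-07-13", "2010-08-15")),
  fruit = c("apple", "apple", "kiwi", "apple", "orange"),
  stringsAsFactors = FALSE
)

x <- which(df$names == "john")                        # row indices for john
ao_rows <- x[df$fruit[x] %in% c("orange", "apple")]   # john's apple/orange rows
n <- length(unique(df$dates[ao_rows]))                # distinct dates among those rows
n > 1                                                 # the value ave() spreads over john's rows
```

For anne the same steps give a single distinct date, so her rows end up with 0.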

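【Comments】:

  • Thanks Amit! Nice concise code, but it seems to take forever on my data, to the point of crashing (101974 rows and 27 columns). Is that typical?
【Solution 2】:

If I understand correctly, for the purposes of your question apple == orange. So the plan is to (a) create a small data frame in which the fruit is only orange or apple, since you don't care about other fruits, (b) keep only the unique date/name rows, (c) aggregate by name, and (d) merge back into your original data.frame to get your result: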

ndf <- subset(df, fruit %in% c("apple", "orange"))
ndf <- ndf[!duplicated(ndf[, c("names", "dates")]), ]

table could be used here, but I prefer aggregate

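v <- aggregate(rep(1, nrow(ndf)), by = ndf[, "names", drop = FALSE], sum)
v$x <- ifelse(v$x > 1, 1, 0)
# all.x = TRUE keeps people who never had an apple or orange; their NA becomes 0
rv <- merge(df, v, all.x = TRUE)
rv$x[is.na(rv$x)] <- 0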

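Code-wise it is a bit longer than the other answer, but it is clear and it certainly does the job. You could use aggregate on its own, without the first two steps, but if you have a huge data.frame, aggregating every row for a large number of names could be very expensive.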
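
The `table` alternative mentioned above could look roughly like the following. This is a minimal sketch on a trimmed copy of the question's sample data, not the answer's own code; the names `ao` and `n_dates` are introduced here for illustration.

```r
# A few rows taken from the question's sample data
df <- data.frame(
  names = c("anne", "anne", "ted", "ted", "john", "john", "john", "su"),
  dates = as.Date(c("2010-03-21", "2010-03-21", "2010-02-14", "2010-11-28",
                    "2010-07-01", "2010-07-13", "2010-08-15", "2010-05-12")),
  fruit = c("apple", "apple", "banana", "apple",
            "kiwi", "apple", "orange", "pineapple"),
  stringsAsFactors = FALSE
)

# Unique (name, date) pairs on which an apple or orange was eaten
ao <- unique(df[df$fruit %in% c("apple", "orange"), c("names", "dates")])

# Number of distinct apple/orange dates per person
n_dates <- table(ao$names)

# People with at least two such dates get 1, everyone else 0
df$v1 <- as.integer(df$names %in% names(n_dates)[n_dates >= 2])
df
```

Because the indicator is built with `%in%` rather than a merge, people with no apple or orange rows (su here) stay in the result with a 0.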

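【Comments】:

  • Hi George, I like this idea. But there seems to be a misplaced comma somewhere??? v
  • Yes, it was not a comma but a ')'. It has been fixed. Sorry for the mistake.
【Solution 3】:

I did something similar to @amit's solution, using by. The row names get mangled during do.call, but you can work around that.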

result <- by(df, INDICES = df$names, FUN = function(x) {
  # distinct dates on which this person had an apple or an orange
  ao.dates <- unique(x$dates[x$fruit %in% c("apple", "orange")])
  x$index <- if (length(ao.dates) < 2) 0 else 1
  x
})

do.call("rbind", result)

           names      dates      fruit index
anne.7      anne 2010-03-21      apple     0
anne.10     anne 2010-03-21      apple     0
jim.17       jim 2010-07-12     orange     0
jim.24       jim 2010-04-01     lychee     0
john.1      john 2010-07-01       kiwi     1
john.2      john 2010-07-13      apple     1
john.5      john 2010-06-30 strawberry     1
john.6      john 2010-08-15     orange     1
john.8      john 2010-04-04  raspberry     1
john.26     john 2010-09-28 watermelon     1
mary.9      mary 2010-09-01     orange     1
mary.11     mary 2010-12-01     orange     1
mary.12     mary 2011-01-01      apple     1
mary.14     mary 2010-11-11      apple     1
mary.16     mary 2010-12-03      peach     1
mary.19     mary 2010-02-18      grape     1
mary.22     mary 2010-11-11    apricot     1
philip.3  philip 2010-05-12      mango     0
philip.13 philip 2010-08-12 strawberry     0
su            su 2010-05-12  pineapple     0
sylvia.18 sylvia 2010-12-21  nectarine     0
sylvia.23 sylvia 2010-05-12       plum     0
ted.4        ted 2010-02-14     banana     0
ted.20       ted 2010-10-29     banana     0
ted.21       ted 2010-08-13      melon     0
ted.25       ted 2010-05-06      mango     0
ted.27       ted 2010-11-28      apple     0
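
Two possible ways to deal with the mangled row names are sketched below on a tiny subset of the sample data. The `row_id` column is a hypothetical helper added here, and the function passed to `by` simply returns its input so that only the row-name behaviour is shown.

```r
# A few rows taken from the question's sample data
df <- data.frame(
  names = c("ted", "anne", "anne", "su"),
  dates = as.Date(c("2010-11-28", "2010-03-21", "2010-03-21", "2010-05-12")),
  fruit = c("apple", "apple", "apple", "pineapple"),
  stringsAsFactors = FALSE
)
df$row_id <- seq_len(nrow(df))   # hypothetical helper: original row position

result <- by(df, INDICES = df$names, FUN = function(x) x)
out <- do.call("rbind", result)  # row names now carry the group name as a prefix

# Option 1: drop the mangled row names and fall back to 1..n
rownames(out) <- NULL

# Option 2: use the helper column to return to the original row order
out_orig <- out[order(out$row_id), ]
```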

【Comments】:

  • Thanks Roman. The ave code in amit's answer is nice and short (and easy to understand), but as I told him, the ave function takes a very long time to process my data.