【Question title】: Count combinations of two variables in a data.frame
【Posted】: 2018-01-23 13:50:42
【Question】:

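I have two data.frames: g contains all possible (here: 8) combinations of two variables, and h contains 62 observations, each of which is one of those 8 combinations (dput() output is at the bottom).

I added a third column to g, which should hold the number of observations in h for each combination: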

> g
  where what days
1    sg free    0
2    in free    0
3    hk free    0
4    de free    0
5    sg work    0
6    in work    0
7    hk work    0
8    de work    0

I want to count how often each combination in g occurs in h. I currently do this with an old-fashioned nested loop, which works fine:

for( i in seq( nrow( g ) ) )
    for( j in seq( nrow( h ) ) )
        if( all( g[ i, 1:2 ] == h[ j, ] ) ) g[ i, 3 ] <- g[ i, 3 ] + 1

This gives me what I want:

> g
  where what days
1    sg free   10
2    in free    0
3    hk free    4
4    de free    4
5    sg work   18
6    in work   10
7    hk work    6
8    de work   10

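But I wonder whether there is a less cryptic, more concise way to do this; I am especially curious whether base R offers a tool I have not discovered yet.

Data: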

g <- structure(list(where = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L), .Label = c("sg", "in", "hk", "de"), class = "factor"), what = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("free", "work"), class = "factor"), 
days = c(0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("where", "what", 
"days"), out.attrs = structure(list(dim = c(4L, 2L), dimnames = structure(list(
Var1 = c("Var1=sg", "Var1=in", "Var1=hk", "Var1=de"), Var2 = c("Var2=free", 
"Var2=work")), .Names = c("Var1", "Var2"))), .Names = c("dim", "dimnames")), 
row.names = c(NA, -8L), class = "data.frame")

h <- structure(list(values = c("sg", "sg", "sg", "sg", "sg", "sg", 
"sg", "sg", "sg", "sg", "sg", "sg", "sg", "sg", "in", "in", "in", 
"in", "in", "hk", "hk", "hk", "hk", "hk", "de", "de", "de", "de", 
"de", "de", "de", "sg", "sg", "sg", "sg", "sg", "sg", "sg", "sg", 
"sg", "sg", "sg", "sg", "sg", "sg", "in", "in", "in", "in", "in", 
"hk", "hk", "hk", "hk", "hk", "de", "de", "de", "de", "de", "de", 
"de"), values.1 = c("free", "work", "work", "work", "work", "free", 
"free", "work", "work", "work", "work", "work", "free", "free", 
"work", "work", "work", "work", "work", "free", "free", "work", 
"work", "work", "work", "work", "free", "free", "work", "work", 
"work", "free", "work", "work", "work", "work", "free", "free", 
"work", "work", "work", "work", "work", "free", "free", "work", 
"work", "work", "work", "work", "free", "free", "work", "work", 
"work", "work", "work", "free", "free", "work", "work", "work"
)), .Names = c("values", "values.1"), row.names = c(NA, -62L), class = "data.frame")

【Comments】:

  • An 's' is missing in h ..
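  • Are you looking for something like as.data.frame(table(h))?
  • That is exactly what I want. How silly of me not to even think of such a basic function. Actually, plain table( h ) is even better! It is short and sweet, but why don't you turn it into an answer?
  • @nicola - please post it as an answer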
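
The table() suggestion from the comments can be sketched as follows. This is a minimal illustration, not the commenter's own code: to keep it short, h is rebuilt with rep() from the counts reported in the question, so the row order differs from the dput() data, and the names combos and n are introduced here only for the example.

```r
# Rebuild h from the counts reported in the question (assumption: row order
# is irrelevant for counting, so it is not reproduced here).
combos <- expand.grid(where = c("sg", "in", "hk", "de"),
                      what  = c("free", "work"),
                      stringsAsFactors = FALSE)
n <- c(10, 0, 4, 4, 18, 10, 6, 10)  # counts per combination, from the question

h <- data.frame(values   = rep(combos$where, n),
                values.1 = rep(combos$what,  n),
                stringsAsFactors = FALSE)

# Two-way contingency table: one cell per (values, values.1) pair
tab <- table(h)

# Long format with columns values, values.1 and Freq
counts <- as.data.frame(tab)
```

Here tab is indexed by value, so tab["sg", "work"] returns the count for that pair, while counts has one row per cell of the table.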

Tags: r combinations


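【Solution 1】:

There is a simple tidy solution. I changed the column names in h to match those in g (where and what). Group by the two values and summarise; this gives the count for each combination. Then left_join the result back onto g and you are done.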

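library(dplyr)

# Rename h's columns to match g (they are values / values.1 in the question's data)
names(h) <- c("where", "what")

# Count the observations for each (where, what) combination in h
h_s = h %>% 
  group_by(where,what) %>% 
  summarise(days=n())

# Join the counts onto g; combinations absent from h come back as NA and are set to 0
g %>% 
  left_join(h_s,by=c("where","what")) %>% 
  select(where,what,days=days.y) %>%
  mutate(days = ifelse(is.na(days),0,days))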

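Edit

The left join is there to make sure that combinations not found in h are still kept in the result. I added a mutate to convert the resulting missing values to 0.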
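
Since the question asks about base R, roughly the same left-join-and-fill logic can be sketched with aggregate() and merge(). This is an illustration under assumptions, not part of the original answer: the data are rebuilt from the counts in the question, g is created as character columns without a days column (to avoid the days.x / days.y suffixes), and the names n, counts and res are made up for the example.

```r
# All eight combinations, as character columns (assumption: no factors here)
g <- expand.grid(where = c("sg", "in", "hk", "de"),
                 what  = c("free", "work"),
                 stringsAsFactors = FALSE)
n <- c(10, 0, 4, 4, 18, 10, 6, 10)  # counts per combination, from the question

# h with columns already renamed to match g
h <- data.frame(where = rep(g$where, n),
                what  = rep(g$what,  n),
                stringsAsFactors = FALSE)

# Count observations per combination that actually occurs in h
h$days <- 1
counts <- aggregate(days ~ where + what, data = h, FUN = sum)

# Left join: keep every row of g, even if h has no matching observation
res <- merge(g, counts, by = c("where", "what"), all.x = TRUE)

# Combinations absent from h come back as NA; treat them as zero
res$days[is.na(res$days)] <- 0
```

Note that merge() sorts the result by the key columns, so the rows of res are not in the same order as in g.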

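【Comments】:

  • Thanks a lot, I will learn from your use of dplyr(). However, I find the table() option simpler and more elegant.
  • There are many ways to skin the proverbial cat. One caveat on the table option: it does not guarantee a count for every combination in g. You would still need to merge or validate.
  • Yes, that's the beauty of R... I have checked, and in this case it gives an answer for all combinations, including in / free with a count of 0. Presumably, if both in / free and in / work were zero, they would not show up at all. Anyway, I very much appreciate the food for thought!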
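
The caveat discussed in these comments can be illustrated with a small sketch. The h below is hypothetical (four made-up rows in which "in" never occurs), and the names plain and full are introduced only for this example. One possible workaround, assuming g lists every valid level, is to convert the columns of h to factors with the levels taken from g before calling table().

```r
# g as in the question: expand.grid() returns factors by default, with
# levels in the order given
g <- expand.grid(where = c("sg", "in", "hk", "de"),
                 what  = c("free", "work"))

# Hypothetical h in which "in" never occurs
h <- data.frame(values   = c("sg", "sg", "hk", "de"),
                values.1 = c("free", "work", "work", "work"),
                stringsAsFactors = FALSE)

# Plain table() only knows the values present in h, so "in" is dropped
plain <- as.data.frame(table(h))

# Fixing the levels from g makes table() report every combination
h$values   <- factor(h$values,   levels = levels(g$where))
h$values.1 <- factor(h$values.1, levels = levels(g$what))
full <- as.data.frame(table(h))
```

With the levels fixed, full has one row per combination in g, in the same order as g, and combinations that never occur get a Freq of 0.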