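【Posted】: 2015-02-09 22:33:46
【Question】:
I'm struggling with what should be a fairly simple task. I have the data below and want to find, for each visit_high, the count of items in event_list. The data looks something like this: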
Visit_high visit event_list
101 1 3
101 2 5
102 1 2
103 1 6
103 2 8
103 3 5
...
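Visit high is the user ID, visit is the visit number for that user, and event list is the number of actions they took during that visit. So user 101 visited the site twice, performing 3 actions on the first visit and 5 on the second.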
> dput(tail(mydf[1:50,c(5,10)], 10))
structure(list(event_list = structure(c(2L, 2L, 2L, 2L, 76L,
36L, 64L, 37L, 14L, 25L), .Label = c("", "100,101,102,115,116",
"100,101,102,115,116,146", "100,101,102,116", "100,101,102,116,146",
"100,101,115,116", "100,101,117,118", "100,102,115,116", "100,102,115,116,146",
"100,102,116", "100,102,116,146", "100,107,115,116", "100,107,116",
"100,115,116", "100,115,116,146", "100,116", "100,116,146", "100,117",
"102,115,116", "102,115,116,146", "102,116", "102,116,146", "107,115,116",
"108,117,118", "115,116", "115,116,146", "116", "116,146", "202",
"202,120", "205,100,101,109,117,118", "206,115,116", "206,115,116,146",
"206,116", "206,116,146", "206,214,115,116", "206,214,115,116,146",
"206,214,116", "206,214,116,146", "206,215,115,116", "206,215,115,116,146",
"207,102,115,116", "207,102,115,116,146", "207,102,116", "207,102,116,146",
"207,115,116", "208,100,101,102,115,116", "208,100,101,102,116",
"208,100,102,115,116", "208,100,115,116", "208,102,109,115,116",
"208,102,109,116", "208,102,115,116", "208,102,116", "208,109,115,116",
"208,109,115,116,146", "208,109,116", "208,115,116", "208,116",
"210,102,108,115,116", "210,102,108,116", "212,102,109,115,116",
"212,102,109,116", "212,109,115,116", "212,109,116", "212,115,116",
"214,100,101,102,115,116", "214,100,101,102,115,116,146", "214,100,115,116",
"214,100,115,116,146", "214,100,116", "214,100,116,146", "214,102,115,116",
"214,102,115,116,146", "214,102,116", "214,115,116", "214,115,116,146",
"214,116", "214,116,146", "214,207,102,115,116", "214,221,102,115,116",
"214,221,102,115,116,146", "215,100,101,102,115,116", "215,100,101,102,115,116,146",
"215,100,101,102,116", "215,100,101,115,116", "215,100,102,115,116",
"215,100,102,116", "215,100,115,116", "215,100,115,116,146",
"215,100,116", "215,102,115,116", "215,102,115,116,146", "215,102,116",
"215,115,116", "215,115,116,146", "215,116", "215,207,102,115,116",
"215,207,102,116", "215,221,100,102,115,116", "215,221,100,102,116",
"215,221,102,115,116", "215,221,102,116", "220,102,115,116",
"221,100,102,115,116", "221,100,102,115,116,146", "221,100,102,116",
"221,102,115,116", "221,102,115,116,146", "221,102,116", "226,100,117,119,120",
"227,102,115,116", "227,102,116", "228,102,115,116", "232,102,115,116",
"234,102,115,116", "235"), class = "factor"), visid_high = c(2710815361820866560,
2710815518587167232, 2710815707565725184, 2710815726893081600,
2710815857889578496, 2710815857889578496, 2710815857889578496,
2710815883659387904, 2710815902986739712, 2710815950231374336
)), .Names = c("event_list", "visid_high"), row.names = 41:50, class = "data.frame")
I have the number of visits for each visitor ID, but I'm a bit lost on how to tell the individual visit_high instances apart.
event_sum = cbind(mmf$visid_high, mmf$event_list, sapply(strsplit(mmf$event_list, ","), length))
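A minimal sketch of one way this could be approached, assuming the goal is to count the comma-separated codes in each row and then total those counts per visid_high. The data frame `toy` and its values are made up for illustration and are not the asker's data. Note that in the dput() output event_list is a factor, and strsplit() only accepts character vectors, so the sketch converts it with as.character() first.

```r
# Hypothetical toy data mirroring the structure of the dput() output:
# event_list is a factor of comma-separated event codes.
toy <- data.frame(
  visid_high = c("A", "B", "B", "B", "C"),
  event_list = factor(c("100,101,102,115,116", "214,116", "", "116", "202,120")),
  stringsAsFactors = FALSE
)

# strsplit() needs character input, so convert the factor first.
# Counting the pieces gives the number of events per row;
# an empty string splits into zero pieces.
pieces <- strsplit(as.character(toy$event_list), ",", fixed = TRUE)
toy$n_events <- sapply(pieces, length)

# Sum the per-visit counts for each visitor ID.
event_sum <- aggregate(n_events ~ visid_high, data = toy, FUN = sum)
print(event_sum)
```

Using aggregate() rather than cbind() keeps the IDs and counts in a data frame, so the columns retain their own types instead of being coerced into a single matrix type.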
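【Comments】:
- What exactly are you trying to achieve? The sum of event_list for each Visit_high? For example, 8 actions for visitor 101?
- It would help if your example used consistent spelling, e.g. visid versus visit. But what we really need is the sample output you expect for the very short sample input you provided.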
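If the first comment's reading is right and the goal is the total number of actions per user, the short sample table could be summarised as in the sketch below. This is only an illustration under that assumption: the data frame name `visits` is hypothetical, and the values are the six rows shown in the question, typed in by hand.

```r
# The short sample table from the question, typed in by hand.
visits <- data.frame(
  Visit_high = c(101, 101, 102, 103, 103, 103),
  visit      = c(1, 2, 1, 1, 2, 3),
  event_list = c(3, 5, 2, 6, 8, 5)
)

# Total actions per user:
# 101 -> 3 + 5 = 8, 102 -> 2, 103 -> 6 + 8 + 5 = 19.
totals <- aggregate(event_list ~ Visit_high, data = visits, FUN = sum)
print(totals)
```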