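[Posted]: 2015-04-15 03:49:53
[Question]:
I am using R to fetch some data from the Google Analytics API. In this particular scenario, I get information on users' interests broken down by gender and age group. The data I get is structured like this: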
gender ageGroup interest sessions
male 18-24 Autos 4
male 18-24 Autos/Luxury 1
male 18-24 Autos/Vans 1
male 25-34 Autos 8
male 25-34 Autos/Luxury 2
male 25-34 Autos/Vans 2
male 25-34 Autos/Compacts 1
...
female 65+ Fashion 20
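The problem with this structure, however, is that the main interest Autos also includes the sessions of its subcategories, so if I use this data in a pivot table I get wrong information.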
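To make the double counting concrete, here is a minimal sketch in base R using only the three male / 18-24 rows from the sample above. The data frame name `raw` and the helper column `main` are made up for illustration; they are not part of the original script.

```r
# Three sample rows from the question (male, 18-24).
raw <- data.frame(
  ageGroup = c("18-24", "18-24", "18-24"),
  interest = c("Autos", "Autos/Luxury", "Autos/Vans"),
  sessions = c(4, 1, 1),
  stringsAsFactors = FALSE
)

# Main category = everything before the first "/" (hypothetical helper column).
raw$main <- sub("/.*$", "", raw$interest)

# A naive pivot by main category adds the subcategory sessions on top of
# the "Autos" row, which already contains them.
naive <- aggregate(sessions ~ ageGroup + main, data = raw, FUN = sum)
naive$sessions  # 6, although the "Autos" row alone already reports 4
```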
Therefore, I added a subcategory "Generalists" to each main category as its own subcategory and split this column in two:
library(splitstackshape)  # provides cSplit()

for (i2 in 1:nrow(ga.genderAgeAffinityTable)) {
  # Main categories contain no "/", so grep() returns integer(0) for them
  chrFound <- grep("[/]", ga.genderAgeAffinityTable$interest[i2])
  if (length(chrFound) < 1) {
    ga.genderAgeAffinityTable$interest[i2] <-
      sprintf("%s/Generalists", ga.genderAgeAffinityTable$interest[i2])
  }
}
# Split once, after the loop: "interest" becomes interest_1 and interest_2
ga.genderAgeAffinityTable <- as.data.frame(
  cSplit(ga.genderAgeAffinityTable, "interest", sep = "/"))
View(ga.genderAgeAffinityTable)
gender ageGroup interest subcategory sessions
male 18-24 Autos Generalists 4
male 18-24 Autos Luxury 1
male 18-24 Autos Vans 1
male 25-34 Autos Generalists 8
male 25-34 Autos Luxury 2
male 25-34 Autos Vans 2
male 25-34 Autos Compacts 1
...
female 65+ Fashion Generalists 20
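The row-by-row loop above could probably be replaced by a vectorized version. The following is only a sketch in base R on a few rows copied from the sample; the names `df`, `isMain` and `parts` are made up for illustration, and it assumes every interest value contains at most one "/" (deeper category paths would need extra handling).

```r
# Sample rows copied from the question (male groups only).
df <- data.frame(
  gender   = "male",
  ageGroup = c("18-24", "18-24", "18-24", "25-34", "25-34", "25-34", "25-34"),
  interest = c("Autos", "Autos/Luxury", "Autos/Vans",
               "Autos", "Autos/Luxury", "Autos/Vans", "Autos/Compacts"),
  sessions = c(4, 1, 1, 8, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Rows without a "/" are main categories; give them an explicit subcategory.
isMain <- !grepl("/", df$interest, fixed = TRUE)
df$interest[isMain] <- paste0(df$interest[isMain], "/Generalists")

# Split "Main/Sub" into two columns (assumption: exactly one "/" per value).
parts <- strsplit(df$interest, "/", fixed = TRUE)
df$interest_1 <- vapply(parts, `[`, character(1), 1)
df$interest_2 <- vapply(parts, `[`, character(1), 2)
df$interest <- NULL
```

The column names `interest_1` and `interest_2` are chosen to match the ones the later code in the question refers to.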
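I still have to get rid of the wrong session counts: for the first group (male, 18-24, Autos), Generalists should have only 2 sessions (sessions - sum(other subcategories)). I do this with an auxId (genderAgeInterestSubcategory): I sum all sessions by that auxId, merge the aggregated sessions into my data frame as a new column, and recalculate the sessions for the subcategory "Generalists":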
# auxId identifies one gender / age group / main interest combination
ga.genderAgeAffinityTable$auxId <- sprintf("%s%s%s",
  ga.genderAgeAffinityTable$gender, ga.genderAgeAffinityTable$ageGroup,
  ga.genderAgeAffinityTable$interest_1)
# Total sessions per auxId ("Generalists" row plus all other subcategories)
ga.interestAggregated <- aggregate(ga.genderAgeAffinityTable[, c("sessions")],
  by = list(ga.genderAgeAffinityTable$auxId), FUN = sum)
colnames(ga.interestAggregated) <- c("auxId", "aggregated")
ga.genderAgeAffinityTable <- merge(ga.genderAgeAffinityTable,
  ga.interestAggregated, by = "auxId")
for (i3 in 1:nrow(ga.genderAgeAffinityTable)) {
  if (ga.genderAgeAffinityTable$interest_2[i3] == "Generalists") {
    # Do not recalculate sessions for interests with only Generalists as subcategory
    if (ga.genderAgeAffinityTable$aggregated[i3] -
        ga.genderAgeAffinityTable$sessions[i3] != 0) {
      # aggregated - sessions is the sum of the other subcategories, so
      # sessions - (aggregated - sessions) = 2 * sessions - aggregated
      ga.genderAgeAffinityTable$sessions[i3] <-
        2 * ga.genderAgeAffinityTable$sessions[i3] -
        ga.genderAgeAffinityTable$aggregated[i3]
    }
  }
}
Do you know a more direct way to do this without using the auxId?
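One possible way to avoid the auxId, sketched here in base R, is to let `ave()` compute the per-group sum of the non-Generalists rows directly from the grouping columns. This is only an illustration on the male sample rows, not a tested drop-in replacement: the names `df`, `isGen` and `subTotal` are made up, and it assumes the column has already been split into `interest_1` / `interest_2` and that each group has at most one "Generalists" row.

```r
# Sample rows from the question, after the interest column has been split.
df <- data.frame(
  gender     = "male",
  ageGroup   = c("18-24", "18-24", "18-24", "25-34", "25-34", "25-34", "25-34"),
  interest_1 = "Autos",
  interest_2 = c("Generalists", "Luxury", "Vans",
                 "Generalists", "Luxury", "Vans", "Compacts"),
  sessions   = c(4, 1, 1, 8, 2, 2, 1),
  stringsAsFactors = FALSE
)

isGen <- df$interest_2 == "Generalists"

# Per gender / ageGroup / interest_1 group: sum of the sessions of all
# subcategories other than "Generalists" (those rows contribute 0 here).
subTotal <- ave(ifelse(isGen, 0, df$sessions),
                df$gender, df$ageGroup, df$interest_1, FUN = sum)

# Generalists' own sessions = reported total - sum(other subcategories).
df$sessions[isGen] <- df$sessions[isGen] - subTotal[isGen]

df$sessions  # 2 1 1 3 2 2 1
```

Groups that only have a "Generalists" row are left unchanged, because their `subTotal` is 0, so no separate check for that case is needed in this sketch.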
[Discussion]:
- I think a lot of people here would be happy to help you, but your example is not quite a minimal reproducible example: stackoverflow.com/questions/5963269/…
Tags: r