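【Posted】:2020-08-01 04:41:44
【Question】:
Forgive me if the answer to this is obvious; I'm very new to R.
I'm trying to aggregate this data set, but one of the columns keeps coming back as NA.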
> dput(head(DrivingDistance,50))
structure(list(player_name = c("Brian Stuard", "Billy Hurley III",
"Greg Chalmers", "William McGirt", "Russell Knox", "Cody Gribble",
"Tony Finau", "Dustin Johnson", "Justin Thomas", "Vaughn Taylor",
"Jason Day", "Brendan Steele", "Si Woo Kim", "Brandt Snedeker",
"Jason Dufner", "Ryan Moore", "Rod Pampling", "Fabián Gómez",
"Jimmy Walker", "Jim Herman", "Pat Perez", "Daniel Berger", "Patrick Reed",
"James Hahn", "Mackenzie Hughes", "Branden Grace", "Jordan Spieth",
"Hideki Matsuyama", "Charley Hoffman", "Jhonattan Vegas", "Aaron Baddeley",
"Bubba Watson", "J.T. Poston", "Shawn Stefani", "Stewart Cink",
"William McGirt", "Fabián Gómez", "David Lingmerth", "Henrik Norlander",
"Tim Wilkinson", "Gonzalo Fernandez-Castaño", "Daniel Summerhays",
"Webb Simpson", "Peter Malnati", "Jason Bohn", "Vaughn Taylor",
"Daniel Berger", "Zac Blair", "Ryan Brehm", "Chez Reavie"), date = structure(c(17174,
17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174,
17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174,
17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174,
17174, 17174, 17174, 17174, 17181, 17181, 17181, 17181, 17181,
17181, 17181, 17181, 17181, 17181, 17181, 17181, 17181, 17181,
17181, 17181, 17181, 17181), class = "Date"), DrDis = c("263.1",
"265.4", "266.5", "267.9", "269.3", "270.8", "304.8", "319.6",
"301.6", "269.6", "300.4", "288.5", "271.6", "271.9", "272.0",
"272.6", "275.1", "275.4", "275.6", "276.6", "278.4", "278.5",
"279.3", "279.8", "280.4", "283.3", "283.4", "283.6", "286.0",
"286.3", "287.9", "300.3", "304.3", "304.1", "304.0", "303.9",
"303.5", "303.3", "304.5", "303.0", "301.6", "301.6", "299.6",
"298.9", "297.6", "296.3", "302.6", "295.1", "305.3", "305.5"
)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
))
This is what is returned after attempting the aggregation.
player_name date DrDis
<chr> <date> <dbl>
1 A.J. McInerney 2018-02-21 NA
2 Aaron Baddeley 2018-08-01 NA
3 Aaron Rai 2019-06-06 NA
4 Aaron Wise 2018-10-28 NA
5 Abraham Ancer 2019-02-13 NA
6 Adam Bland 2018-03-04 NA
7 Adam Hadwin 2018-08-11 NA
8 Adam Long 2019-09-22 NA
9 Adam Schenk 2019-03-03 NA
10 Adam Scott 2018-08-12 NA
# ... with 551 more rows
There were 50 or more warnings (use warnings() to see the first 50)
This is the code I used to create DrivingDistance and then aggregate it.
DrivingDistance <- CurrentData[CurrentData$statistic == 'Driving Distance' & CurrentData$variable == 'AVG.', ] %>%
  select(player_name, date, value) %>%
  dplyr::rename(DrDis = value)
DrivingDistance %>%
  group_by(player_name) %>%
  summarize_all(mean, na.rm = TRUE)
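One thing that stands out in the dput() output above is that DrDis is stored as character strings ("263.1", "265.4", ...), not as numbers. mean() on a character vector returns NA with the warning "argument is not numeric or logical: returning NA", which would explain both the NA column and the 50+ warnings. The sketch below is only an illustration under that assumption: the small CurrentData tibble is a hypothetical stand-in that reuses the column names from the code above and a few values from the dput() output, and filter() is used in place of the `[ , ]` subsetting.

```r
library(dplyr)

# Hypothetical stand-in for CurrentData: column names come from the question's
# code, values from the dput() output. Note that `value` is character.
CurrentData <- tibble(
  player_name = c("William McGirt", "William McGirt", "Daniel Berger", "Daniel Berger"),
  date        = as.Date(c(17174, 17181, 17174, 17181), origin = "1970-01-01"),
  statistic   = "Driving Distance",
  variable    = "AVG.",
  value       = c("267.9", "303.9", "278.5", "302.6")
)

DrivingDistance <- CurrentData %>%
  filter(statistic == "Driving Distance", variable == "AVG.") %>%
  select(player_name, date, value) %>%
  rename(DrDis = value) %>%
  mutate(DrDis = as.numeric(DrDis))  # convert text to numbers before averaging

# Average only the numeric column, one row per player
result <- DrivingDistance %>%
  group_by(player_name) %>%
  summarize(DrDis = mean(DrDis, na.rm = TRUE))

print(result)
```

as.numeric() turns any string it cannot parse into NA (with a warning), so on the real data it may be worth checking sum(is.na(DrivingDistance$DrDis)) after the conversion. This version also drops date from the summary instead of averaging it; add it back in summarize() if a per-player date is wanted.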
【Comments】:
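-
It's easier to help you if you include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. Do you have NA values in your data? It sounds like you do. Most likely a duplicate of: stackoverflow.com/questions/14261619/… Also, since you appear to be using dplyr, use CurrentData %>% filter(statistic == 'Driving Distance' & variable == 'AVG.') rather than [,]
-
You can use dput(head(CurrentData)) to help generate a workable subset of your data...
-
Do you have a column named value in your data frame? Your example output doesn't match your example commands (it has other fields), so it would be useful to see CurrentData rather than DrivingDistance. I would also avoid using date as a variable name, since it has other meanings.
-
@beroe Sorry, I should have been more specific. I just updated the original post with the head of CurrentData. value is in CurrentData, but I renamed it to DrDis in DrivingDistance.
-
@MrFlick I just went back through every value in the data set, and there are no NAs. I also edited the original post so you can see the head of CurrentData.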
Tags: r