Question title: Aggregate function returning NA for an entire column [duplicate]
Posted: 2020-08-01 04:41:44
Question:

Forgive me if the answer to this is obvious; I'm very new to R.

I'm trying to aggregate this data set, but one of the columns keeps returning NA.

> dput(head(DrivingDistance,50))
structure(list(player_name = c("Brian Stuard", "Billy Hurley III", 
"Greg Chalmers", "William McGirt", "Russell Knox", "Cody Gribble", 
"Tony Finau", "Dustin Johnson", "Justin Thomas", "Vaughn Taylor", 
"Jason Day", "Brendan Steele", "Si Woo Kim", "Brandt Snedeker", 
"Jason Dufner", "Ryan Moore", "Rod Pampling", "Fabián Gómez", 
"Jimmy Walker", "Jim Herman", "Pat Perez", "Daniel Berger", "Patrick Reed", 
"James Hahn", "Mackenzie Hughes", "Branden Grace", "Jordan Spieth", 
"Hideki Matsuyama", "Charley Hoffman", "Jhonattan Vegas", "Aaron Baddeley", 
"Bubba Watson", "J.T. Poston", "Shawn Stefani", "Stewart Cink", 
"William McGirt", "Fabián Gómez", "David Lingmerth", "Henrik Norlander", 
"Tim Wilkinson", "Gonzalo Fernandez-Castaño", "Daniel Summerhays", 
"Webb Simpson", "Peter Malnati", "Jason Bohn", "Vaughn Taylor", 
"Daniel Berger", "Zac Blair", "Ryan Brehm", "Chez Reavie"), date = structure(c(17174, 
17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 
17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 
17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 17174, 
17174, 17174, 17174, 17174, 17181, 17181, 17181, 17181, 17181, 
17181, 17181, 17181, 17181, 17181, 17181, 17181, 17181, 17181, 
17181, 17181, 17181, 17181), class = "Date"), DrDis = c("263.1", 
"265.4", "266.5", "267.9", "269.3", "270.8", "304.8", "319.6", 
"301.6", "269.6", "300.4", "288.5", "271.6", "271.9", "272.0", 
"272.6", "275.1", "275.4", "275.6", "276.6", "278.4", "278.5", 
"279.3", "279.8", "280.4", "283.3", "283.4", "283.6", "286.0", 
"286.3", "287.9", "300.3", "304.3", "304.1", "304.0", "303.9", 
"303.5", "303.3", "304.5", "303.0", "301.6", "301.6", "299.6", 
"298.9", "297.6", "296.3", "302.6", "295.1", "305.3", "305.5"
)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
))

This is what I get back after trying to aggregate:

   player_name    date       DrDis
   <chr>          <date>     <dbl>
 1 A.J. McInerney 2018-02-21    NA
 2 Aaron Baddeley 2018-08-01    NA
 3 Aaron Rai      2019-06-06    NA
 4 Aaron Wise     2018-10-28    NA
 5 Abraham Ancer  2019-02-13    NA
 6 Adam Bland     2018-03-04    NA
 7 Adam Hadwin    2018-08-11    NA
 8 Adam Long      2019-09-22    NA
 9 Adam Schenk    2019-03-03    NA
10 Adam Scott     2018-08-12    NA
# ... with 551 more rows
There were 50 or more warnings (use warnings() to see the first 50)

This is the code I used to create DrivingDistance and then aggregate the data:

library(dplyr)

DrivingDistance <- CurrentData[CurrentData$statistic == 'Driving Distance' & CurrentData$variable == 'AVG.',] %>% 
  select(player_name, date, value) %>% 
  dplyr::rename(DrDis = value) 


DrivingDistance %>%
  group_by(player_name) %>%
  summarize_all(mean, na.rm = TRUE)
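A quick way to see why one column comes back as NA is to check the class of every column before summarising. The base-R sketch below uses a hypothetical three-row data frame `dd` built from values in the dput output above (it is not the full data); it shows that DrDis is character and that mean() on it returns NA.

```r
# Hypothetical three-row toy copied from the dput output (not the full data)
dd <- data.frame(
  player_name = c("Brian Stuard", "William McGirt", "William McGirt"),
  date = as.Date(c("2017-01-08", "2017-01-08", "2017-01-15")),
  DrDis = c("263.1", "267.9", "303.9"),
  stringsAsFactors = FALSE
)

# Class of each column; DrDis turns out to be "character", not "numeric"
col_classes <- sapply(dd, function(col) class(col)[1])
print(col_classes)

# mean() on a character vector warns
# ("argument is not numeric or logical: returning NA") and returns NA
res <- suppressWarnings(mean(dd$DrDis))
print(res)
```

With summarize_all(), the same warning is raised once per player group, which would explain the "50 or more warnings" message.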

Comments:

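  • It's easier to help you if you include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. Do you have NA values in your data? It sounds like you do. This is most likely a duplicate of stackoverflow.com/questions/14261619/… Since you're using dplyr, use CurrentData %>% filter(statistic == 'Driving Distance' & variable == 'AVG.') instead of [,].
  • You can use dput(head(CurrentData)) to help produce a workable subset of your data...
  • Do you have a column named value in your data frame? Your sample output doesn't match your sample command (it has other fields), so it would be useful to see CurrentData rather than DrivingDistance. I would also avoid using date as a variable name, since it has another meaning in R.
  • @beroe Sorry, I should have been more specific. I just updated the original post with the head of CurrentData. value is in CurrentData, but I renamed it to DrDis in DrivingDistance.
  • @MrFlick I just went through every value in the data set, and there are no NAs. I also edited the original post so you can see the head of CurrentData.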
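The filter() suggestion in the first comment has a base-R counterpart, subset(), and the value column can be converted to numeric at the moment DrivingDistance is created, so every later summary sees numbers. This is a minimal sketch, assuming CurrentData has the columns statistic, variable, player_name, date and value, as the question's code implies. The three-row CurrentData below is hypothetical, including the "HYPOTHETICAL" variable label and its value.

```r
# Hypothetical three-row stand-in for CurrentData; real contents are assumed
CurrentData <- data.frame(
  player_name = c("Brian Stuard", "Brian Stuard", "Tony Finau"),
  date = as.Date(c("2017-01-08", "2017-01-08", "2017-01-08")),
  statistic = rep("Driving Distance", 3),
  variable = c("AVG.", "HYPOTHETICAL", "AVG."),
  value = c("263.1", "1000", "304.8"),
  stringsAsFactors = FALSE
)

# Keep only the average driving-distance rows and the three columns of interest
DrivingDistance <- subset(
  CurrentData,
  statistic == "Driving Distance" & variable == "AVG.",
  select = c(player_name, date, value)
)

# Rename value to DrDis and store it as a number instead of a string
names(DrivingDistance)[names(DrivingDistance) == "value"] <- "DrDis"
DrivingDistance$DrDis <- as.numeric(DrivingDistance$DrDis)

str(DrivingDistance)
```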

Tags: r


Answer 1:

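DrDis is stored as character: in the dput output its values are quoted strings (e.g. "263.1"). mean() on a character vector returns NA with the warning "argument is not numeric or logical: returning NA", which is where both the NA column and the 50+ warnings come from. Convert the column to numeric before summarising: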

library(dplyr)

# DrDis was stored as character; convert it to numeric before averaging
DrivingDistance %>% mutate(DrDis = as.numeric(DrDis)) %>%
  group_by(player_name) %>%
  summarize_all(mean, na.rm = TRUE)

# A tibble: 46 x 3
   player_name      date       DrDis
   <chr>            <date>     <dbl>
 1 Aaron Baddeley   2017-01-08  288.
 2 Billy Hurley III 2017-01-08  265.
 3 Branden Grace    2017-01-08  283.
 4 Brandt Snedeker  2017-01-08  272.
 5 Brendan Steele   2017-01-08  288.
 6 Brian Stuard     2017-01-08  263.
 7 Bubba Watson     2017-01-08  300.
 8 Charley Hoffman  2017-01-08  286 
 9 Chez Reavie      2017-01-15  306.
10 Cody Gribble     2017-01-08  271.
# ... with 36 more rows
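Since the title asks about aggregating, here is a sketch of the same fix using base R's aggregate() in place of dplyr. It runs on a hypothetical three-row toy `dd` (values taken from the dput output, where William McGirt appears twice) and averages only DrDis per player.

```r
# Hypothetical toy data: William McGirt appears twice, as in the dput output
dd <- data.frame(
  player_name = c("Brian Stuard", "William McGirt", "William McGirt"),
  DrDis = c("263.1", "267.9", "303.9"),
  stringsAsFactors = FALSE
)

# Convert first; averaging the raw strings would give NA for every group
dd$DrDis <- as.numeric(dd$DrDis)

# Group by player_name and average DrDis (groups come back sorted by name)
avg <- aggregate(DrDis ~ player_name, data = dd, FUN = mean)
print(avg)
```

McGirt's two rows average to (267.9 + 303.9) / 2 = 285.9. Note that the formula interface of aggregate() defaults to na.action = na.omit, so any strings that as.numeric() could not parse (and therefore turned into NA) are silently dropped rather than propagated.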
