R: Calculating median value from time data series returns an error

【Question title】: R: Calculating median value from time data series returns an error
【Posted】: 2018-07-21 13:20:51
【Question】:

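I'm running into the following R problem when calculating the median of a time data series. Can anyone explain why R behaves so strangely when asked to compute something as simple as a median?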

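Task: calculate the median finishing time from a running race data set.
Problem: when taking the median of the time values, R returns the warning "argument is not numeric or logical: returning NA".
The data is read from the file "NEJ_21_km_results.csv", with strings kept as character values rather than factors. Trying to convert the time values from character to numeric returns the message "NAs introduced by coercion" (even though there are no NA values in the data frame).
In some other cases (with other files), the warning appears only when the data is filtered by gender, and sometimes only for one gender.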

1) Read the data into the "all_runners" data frame

all_runners <- read.csv("NEJ_21_km_results.csv", stringsAsFactors=FALSE, strip.white = TRUE)

The "RESULT" field is of type "chr"

str(all_runners)

'data.frame':   100 obs. of  10 variables:
 $ POS     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ BIB     : int  3 2 1 9 5 10 8 33 34 67 ...
 $ NAME    : chr  "DOMINIC KIPTARUS" "TIIDREK NURME" "ROMAN FOSTI" "RAIDO MITT"...
 $ YOB     : int  1996 1985 1983 1991 1984 1982 1993 1992 1984 1996 ...
 $ NAT     : chr  "KEN" "EST" "EST" "EST" ...
 $ CITY    : chr  "" "" "" "" ...
 $ RESULT  : chr  "01:03:55" "01:03:57" "01:06:18" "01:09:33" ...
 $ BEHIND  : chr  "" "00:00:02" "00:02:23" "00:05:38" ...
 $ NET.TIME: chr  "01:03:55" "01:03:57" "01:06:18" "01:09:31"...
 $ CAT     : chr  "MN" "M" "M" "M" ...

2) Calculate the median of all runners' results

> all_runners_median = median(all_runners$RESULT, na.rm = TRUE)

Warning message: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) : argument is not numeric or logical: returning NA

3) Convert the time values from character to numeric

> results_to_numeric <- as.numeric(all_runners$RESULT)

Warning message: NAs introduced by coercion
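One way to see why step 3 fails: as.numeric() only parses plain numbers, so a string such as "01:03:55" cannot be coerced and becomes NA. The following is a minimal base-R sketch, assuming every RESULT value has the form HH:MM:SS. The helper name to_seconds is hypothetical and not part of the original post.

```r
# Hypothetical helper (not from the original post):
# convert "HH:MM:SS" strings to a number of seconds.
to_seconds <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

result <- c("01:03:55", "01:03:57", "01:06:18", "01:09:33")

# Direct coercion fails: the colons make the strings non-numeric.
direct <- suppressWarnings(as.numeric(result))  # all NA

secs <- to_seconds(result)  # 3835 3837 3978 4173
med  <- median(secs)        # 3907.5

# Format the median back as HH:MM:SS (fractional seconds are dropped).
med_int <- as.integer(floor(med))
med_hms <- sprintf("%02d:%02d:%02d",
                   med_int %/% 3600L,
                   (med_int %% 3600L) %/% 60L,
                   med_int %% 60L)
med_hms  # "01:05:07"
```

Working in seconds keeps the median purely numeric; only the final formatting step turns it back into a clock-style string.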

4) Calculate the median of all women's results ('N' => women, 'M' => men)

library(dplyr)    # provides %>%, filter() and select()
library(stringr)  # provides str_sub()

all_womens <- all_runners %>%
  filter(str_sub(CAT, 1, 1) == "N") %>%
  select(RESULT)

The "RESULT" field is of type "chr"

> str(all_womens)

'data.frame':   8 obs. of  1 variable:
 $ RESULT: chr  "01:18:36" "01:20:07" "01:22:52" "01:25:11" ...

Warning message: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) : argument is not numeric or logical: returning NA

> all_womens
    RESULT
1 01:18:36
2 01:20:07
3 01:22:52
4 01:25:11
5 01:26:04
6 01:26:09
7 01:26:42
8 01:26:55

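【Comments】:

You have two problems: you can't calculate a median/mean from a data frame, nor from a character column. First change RESULT to a date-time class and you'll be fine. And use pull instead of select. Sorry, I'm working from my phone, otherwise I'd be more helpful.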
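The two points in this comment can be sketched in base R. This is an illustration only, assuming a small hypothetical data frame named df with made-up rows taken from the women's times. Here `[` plays the role of dplyr's select (it returns a data frame) and `[[` plays the role of pull (it returns a plain vector).

```r
# Hypothetical sample data, not the asker's full file.
df <- data.frame(RESULT = c("01:18:36", "01:20:07", "01:22:52"),
                 stringsAsFactors = FALSE)

one_col_df <- df["RESULT"]    # still a data frame, like select(RESULT)
as_vector  <- df[["RESULT"]]  # character vector, like pull(RESULT)

# median() refuses a data frame outright; capture the error message.
err <- tryCatch(median(one_col_df), error = function(e) conditionMessage(e))

# Convert the vector to a date-time class before taking the median.
# tz = "UTC" is an assumption made here to keep the result reproducible.
times  <- as.POSIXct(as_vector, format = "%H:%M:%S", tz = "UTC")
med_df <- format(median(times), "%H:%M:%S")
med_df  # "01:20:07"
```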

Tags: r

【Solution 1】:

Here is how to apply median to times:

# Get sample of 'Date/Time Type'
x <- c("01:03:55", "01:03:57", "01:06:18", "01:09:33")

# Convert to proper format 
y <- as.POSIXct(x, format = "%H:%M:%S")

# Find the median
y <- median(y)

#  Updated, no need to use strsplit and sapply, directly use format
#  ys <- strsplit(as.character(y), split = " ")
#  sapply(ys, function(x) x[2])

# Get the time
format(y,"%H:%M:%S" )
[1] "01:05:07"

When you apply as.POSIXct to a time-only string, it attaches a date (the current date) to each value.
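If the attached date is unwanted, one possible alternative is to treat the finishing times as durations with base R's as.difftime. This is a sketch of a different technique from the one in the answer, and the variable names are chosen for illustration only.

```r
x <- c("01:03:55", "01:03:57", "01:06:18", "01:09:33")

# Parse the strings as elapsed time rather than as clock times on a date.
d <- as.difftime(x, format = "%H:%M:%S", units = "secs")

# Take the median on the underlying number of seconds.
med_secs <- median(as.numeric(d, units = "secs"))  # 3907.5

# Format the result as HH:MM:SS; the epoch origin and UTC are used
# only as a neutral reference for formatting.
med_hms <- format(as.POSIXct(med_secs, origin = "1970-01-01", tz = "UTC"),
                  "%H:%M:%S")
med_hms  # "01:05:07"
```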

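Edit: as suggested by Rich Scriven, we can use format directly, so there is no need for the split and the loop.

If you want to analyze by group, e.g. by gender, you can simply use: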

x <- c("01:03:55", "01:03:57", "01:06:18", "01:09:33")
df <- data.frame(Gender = rep(c("M", "F"), each = 4), time = x)
# > df
#   Gender     time
# 1      M 01:03:55
# 2      M 01:03:57
# 3      M 01:06:18
# 4      M 01:09:33
# 5      F 01:03:55
# 6      F 01:03:57
# 7      F 01:06:18
# 8      F 01:09:33

df$time <- as.POSIXct(as.character(df$time), format = "%H:%M:%S")  # convert the column itself (all 8 rows)
time_group_by_gender <- split(df$time, df$Gender )
# > time_group_by_gender
# $F
# [1] "2018-07-21 01:03:55 +03" "2018-07-21 01:03:57 +03" "2018-07-21 01:06:18 +03"
# [4] "2018-07-21 01:09:33 +03"
# 
# $M
# [1] "2018-07-21 01:03:55 +03" "2018-07-21 01:03:57 +03" "2018-07-21 01:06:18 +03"
# [4] "2018-07-21 01:09:33 +03"

time_median <- lapply(time_group_by_gender, median)
time_median <- lapply(time_median, format, "%H:%M:%S")

# > time_median
# $F
# [1] "01:05:07"
# 
# $M
# [1] "01:05:07"
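The same grouping idea can be applied to the layout in the question, where the first letter of CAT encodes gender ('N' for women, 'M' for men). The following is a sketch under assumptions: the runners data frame is a hypothetical eight-row sample built from times shown in the question, and the "N" category codes for the women are assumed, since the question does not list them.

```r
# Hypothetical sample mimicking the question's CAT and RESULT columns.
runners <- data.frame(
  CAT    = c("MN", "M", "M", "M", "N", "N", "N", "N"),
  RESULT = c("01:03:55", "01:03:57", "01:06:18", "01:09:33",
             "01:18:36", "01:20:07", "01:22:52", "01:25:11"),
  stringsAsFactors = FALSE
)

# The first character of CAT identifies the group.
runners$sex <- substr(runners$CAT, 1, 1)

# tz = "UTC" is an assumption to keep the output independent of locale.
runners$time <- as.POSIXct(runners$RESULT, format = "%H:%M:%S", tz = "UTC")

# Median per group, formatted back to HH:MM:SS.
median_by_sex <- vapply(
  split(runners$time, runners$sex),
  function(tm) format(median(tm), "%H:%M:%S"),
  character(1)
)
median_by_sex[["M"]]  # "01:05:07"
median_by_sex[["N"]]  # "01:21:29"
```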

【Comments】:

Thanks, yes indeed, it removes the need to use split and then sapply. I will update the answer.