【Question title】: How to optimize code that generates a graph in R
【Posted】: 2021-09-10 12:29:39
【Question】:

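Can you help me optimize the code below? As you can see, I use the same date twice: once to generate the graph and once to generate the subset y. The results are correct, but I would like help optimizing the code so that the date is specified only once, along with any other optimization you think is necessary. Any help is welcome.

Thank you very much!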

library(dplyr)
library(lubridate)
library(tidyverse)

#dataset
df <- structure(
  list(date1 = c("2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28",
                 "2021-06-28","2021-06-28","2021-06-28"),
       date2 = c("2021-04-02","2021-04-03","2021-04-08","2021-04-09","2021-04-10","2021-07-01","2021-07-02","2021-07-03"),
       Week= c("Friday","Saturday","Thursday","Friday","Saturday","Thursday","Friday","Monday"),
       DR01 = c(4,1,4,3,3,4,3,6), DR02= c(4,2,6,7,3,2,7,4),DR03= c(9,5,4,3,3,2,1,5),
       DR04 = c(5,4,3,3,6,2,1,9),DR05 = c(5,4,5,3,6,2,1,9),
       DR06 = c(2,4,3,3,5,6,7,8),DR07 = c(2,5,4,4,9,4,7,8)),
  class = "data.frame", row.names = c(NA, -8L))

#Generate graph

dmda<-"2021-07-01"
dta<-df

datas<-dta %>%
  filter(date2 == ymd(dmda)) %>%
  summarize(across(starts_with("DR"), sum)) %>%
  pivot_longer(everything(), names_pattern = "DR(.+)", values_to = "val") %>%
  mutate(name = as.numeric(name))
colnames(datas)<-c("Days","Numbers")
attach(datas)
plot(Numbers ~ Days, ylim=c(0,20))

model <- nls(Numbers ~ b1*Days^2+b2,start = list(b1 = 47,b2 = 0))

new.data <- data.frame(Days = seq(min(Days),max(Days),len = 45))
lines(new.data$Days,predict(model,newdata = new.data))


#Add the y points to the graph

df[, 1:2] = lapply(df[, 1:2], FUN = as_date)

get_cutoff = function(date) {
  date2 = as_date(date)
  date1 = df[1,1]
  as.numeric(date2 - date1 + 1)
}

subset_data = function(date, start_index) {
  date = as_date(date)
  if (date > df[1,1]) {
    end_index = start_index + get_cutoff(date) - 1
    df[, -c(start_index:end_index)] %>%
      filter(date2 == date)
  } else {
    return(df)
  }
} 

y<-subset_data("2021-07-01", 4)
y

pivot_longer(y, 
             cols=c(starts_with("DR"))) %>% 
  mutate(day = parse_number(name)) -> new_y
new_y

lines(x=new_y$day, y=new_y$value, col="red")
points(x=new_y$day, y=new_y$value, col="red")

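【Comments】:

  • What do you want to optimize? Speed? Readability? Is this really the bottleneck of your code?
  • My idea of optimization is not having to use the same date twice, that is, I only enter it once. If you see a way to keep the code better organized, that is welcome too.

Tags: r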


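【Solution 1】:

Make these changes:

  • only load the packages that are actually used
  • lubridate can be eliminated
  • dta is not needed
  • in filter we do not need to convert dmda to Date class
  • pivot_longer can transform the names
  • do not use attach
  • the model is linear in its parameters, so use lm, not nls
  • replace new.data/lines with curve
  • do not overwrite df
  • simplify the cutoff calculation
  • reduce points/lines to just lines by using type = "o"
  • use subset in lines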
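
The lm-versus-nls point can be checked directly. The sketch below is an illustration, not part of the original answer: it hard-codes the DR01 to DR07 values that the question's df holds for 2021-07-01, fits the same quadratic with both functions, and compares the coefficients. The variable names `fit_lm`, `fit_nls`, and `same_fit` are made up for this example.

```r
# Values of DR01..DR07 for date2 == "2021-07-01" in the question's df
datas <- data.frame(Days = 1:7, Numbers = c(4, 2, 2, 2, 2, 6, 4))

# Linear least squares: Numbers = b2 + b1 * Days^2
fit_lm <- lm(Numbers ~ I(Days^2), data = datas)

# The same model fitted iteratively, as in the question
fit_nls <- nls(Numbers ~ b1 * Days^2 + b2, data = datas,
               start = list(b1 = 47, b2 = 0))

# lm reports (Intercept, slope); nls reports (b1, b2), so reorder before comparing
lm_coefs  <- unname(coef(fit_lm)[c(2, 1)])
nls_coefs <- unname(coef(fit_nls))
same_fit  <- isTRUE(all.equal(lm_coefs, nls_coefs, tolerance = 1e-6))
same_fit
```

Because the model is linear in b1 and b2, lm needs no starting values and cannot fail to converge, which is the reason for preferring it here.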

Now, assuming that df and dmda have been defined as in the question, we have this.

library(dplyr)
library(tidyr)

datas <- df %>%
  filter(date2 == dmda) %>%
  summarize(across(starts_with("DR"), sum)) %>%
  pivot_longer(everything(), names_pattern = "DR(.+)", 
    names_to = "Days", values_to = "Numbers", 
    names_transform = list(Days = as.numeric))

plot(Numbers ~ Days, datas, ylim=c(0,20))

model <- lm(Numbers ~ I(Days^2), datas)
rng <- range(datas$Days)
curve(predict(model, list(Days = x)), rng[1], rng[2], add = TRUE)

# assume this for cutoff.  You may or may not need to change this line.
cutoff <- as.numeric(as.Date(dmda) - first(as.Date(df$date1))) + 1
lines(Numbers ~ Days, datas, subset = seq_len(nrow(datas)) > cutoff,
  type = "o" , col = "red")
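
Since the goal in the question is to supply the date only once, the steps above could also be wrapped in a function. The sketch below uses a hypothetical helper, `summarise_for_date()`, which is not part of the original answer. It is written in base R instead of dplyr/tidyr so that it runs without extra packages, and it assumes the same cutoff rule as above (days from the first date1 to dmda, plus one).

```r
# Data from the question (the Week column is omitted because it is not used)
df <- data.frame(
  date1 = rep("2021-06-28", 8),
  date2 = c("2021-04-02", "2021-04-03", "2021-04-08", "2021-04-09",
            "2021-04-10", "2021-07-01", "2021-07-02", "2021-07-03"),
  DR01 = c(4,1,4,3,3,4,3,6), DR02 = c(4,2,6,7,3,2,7,4),
  DR03 = c(9,5,4,3,3,2,1,5), DR04 = c(5,4,3,3,6,2,1,9),
  DR05 = c(5,4,5,3,6,2,1,9), DR06 = c(2,4,3,3,5,6,7,8),
  DR07 = c(2,5,4,4,9,4,7,8)
)

# Hypothetical helper: the date is passed once and drives both the
# filtering and the cutoff used for highlighting.
summarise_for_date <- function(df, dmda) {
  rows <- df[as.character(df$date2) == dmda, , drop = FALSE]
  dr_cols <- grep("^DR", names(df), value = TRUE)
  datas <- data.frame(
    Days = as.numeric(sub("^DR", "", dr_cols)),
    Numbers = unname(colSums(rows[dr_cols]))
  )
  cutoff <- as.numeric(as.Date(dmda) - as.Date(df$date1[1])) + 1
  datas$highlight <- seq_len(nrow(datas)) > cutoff
  datas
}

res <- summarise_for_date(df, "2021-07-01")
res

# Plot only when run interactively
if (interactive()) {
  plot(Numbers ~ Days, res, ylim = c(0, 20))
  lines(Numbers ~ Days, res, subset = highlight, type = "o", col = "red")
}
```

With this layout, changing the date means changing a single argument; the fitted curve from the answer can be added to the plot in the same way as before.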

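【Comments】:

  • Your flowchart is what I should have known or been taught when I started learning and using R.
  • Thank you very much, Grothendieck, but when I run your code it does not work. Could you test it again and include dmda with the date in your code?
  • Fixed. Try it now. First copy df and dmda into a new R session, then copy the code from the answer.
【Solution 2】: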

I used ggplot instead of the base R plotting functions, since you are already working in the tidyverse. The approach below plots everything on a single graph.

library(tidyverse)
library(lubridate)

dmda <- "2021-07-01"
dta <- df

## Rather than rely on column position, explicitly set the number
## of days desired for highlighting on plot
num_days <- 3

datas <- dta %>%
    filter(date2 == ymd(dmda)) %>%
    summarize(across(starts_with("DR"), sum)) %>%
    ## Can convert names to numeric and create column names inside pivot_longer
    pivot_longer(everything(), names_pattern = "DR(.+)",
                 values_to = "Numbers", names_to = "Days",
                 names_transform = list(Days = as.numeric)) %>%
    ## Create flag for whether the values are in the final number of days
    mutate(subs = 1:n() > (n() - num_days))

## Fit the model on datas explicitly (datas must exist first; no attach() needed)
model <- nls(Numbers ~ b1*Days^2+b2, data = datas, start = list(b1 = 47,b2 = 0))
new.data <- data.frame(Days = seq(min(datas$Days), max(datas$Days), len = 45)) %>%
    mutate(Numbers = predict(model, newdata = .))

plt <- ggplot(datas, aes(x = Days, y = Numbers)) +
    geom_point(aes(color = subs)) +
    geom_line(data = filter(datas, subs == TRUE), color = "red") +
    geom_line(data = new.data, color = "black") +
    scale_y_continuous(limits = c(0, 20)) +
    scale_color_manual(values = c("black", "red"))
plt
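
The `subs` flag above marks the last `num_days` rows. As a small sketch (an assumption, not something stated in this answer), `num_days` could also be derived from the dates, so that the date is still entered only once. It assumes the highlighted days are those after the cutoff used in the question, that is, the number of DR columns minus the days from date1 to dmda inclusive.

```r
n <- 7                       # number of DR columns (DR01..DR07)
num_days <- 3                # value set by hand in the answer

# Flag for the final num_days positions, as in mutate(subs = ...)
subs <- seq_len(n) > (n - num_days)
subs

# Assumed alternative: derive num_days from the two dates
dmda  <- "2021-07-01"
date1 <- "2021-06-28"
cutoff <- as.numeric(as.Date(dmda) - as.Date(date1)) + 1
num_days_from_dates <- n - cutoff
num_days_from_dates
```

For the dates in the question both routes flag days 5, 6, and 7, matching the columns that the question's subset_data() keeps.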

【Comments】:
