Identify Stocks With Increasing Trends: Answers

【Question title】: Identify Stocks With Increasing Trends
【Posted】: 2020-11-12 10:32:45
【Question description】:

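I have a data frame in "long" format. The first column contains the date, the second the stock name, and the last the closing price. This format makes plotting very simple: you can use the stock name column to draw differently colored lines or to facet onto separate plots. So far, so good.

Here is the sample data: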

dat <- structure(list(Date = structure(c(1592611200, 1592611200, 1592611200,
                                         1592611200, 1592697600, 1592697600,
                                         1592697600, 1592697600, 1592784000,
                                         1592784000, 1592784000, 1592784000,
                                         1592870400, 1592870400, 1592870400,
                                         1592870400, 1592956800, 1592956800,
                                         1592956800, 1592956800, 1593043200,
                                         1593043200, 1593043200, 1593043200,
                                         1593129600, 1593129600, 1593129600,
                                         1593129600, 1593216000, 1593216000,
                                         1593216000, 1593216000, 1593302400,
                                         1593302400, 1593302400, 1593302400,
                                         1593388800, 1593388800, 1593388800,
                                         1593388800), 
                                       tzone = "UTC", class = c("POSIXct", "POSIXt")), 
                      stock_name = c("AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
                                     "AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
                                     "HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
                                     "NFLX", "AAPL", "AMZN", "HTZ", "NFLX",
                                     "AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
                                     "AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
                                     "HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
                                     "NFLX", "AAPL", "AMZN", "HTZ", "NFLX"), 
                      closing_price = c(200, 1900, 100, 150, 210, 
                                        1950, 90, 160, 211, 1975, 75, 150,
                                        213, 1980, 60, 140, 211, 1990, 50,
                                        150, 213, 1991, 45, 160, 214, 1990,
                                        40, 150, 215, 1998, 38, 140, 217,
                                        2010, 30, 150, 216, 2020, 20, 150)),
                 row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"))

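The goal, however, is to determine which of these stocks have an increasing trend. My idea is to fit a linear model to each stock, extract the slope, and filter on which slopes are positive. The problem I am running into is how to do this with a data frame in "long" form.

In reality, the data frame has additional columns that do not convert well to a "wide" format, so as I see it, it needs to stay in "long" form.

How would you determine which of these stocks have an increasing trend?

Target data frame: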

dat <- structure(list(Date = structure(c(1592611200, 1592611200, 1592611200,
                                         1592611200, 1592697600, 1592697600,
                                         1592697600, 1592697600, 1592784000,
                                         1592784000, 1592784000, 1592784000,
                                         1592870400, 1592870400, 1592870400,
                                         1592870400, 1592956800, 1592956800,
                                         1592956800, 1592956800, 1593043200,
                                         1593043200, 1593043200, 1593043200,
                                         1593129600, 1593129600, 1593129600,
                                         1593129600, 1593216000, 1593216000,
                                         1593216000, 1593216000, 1593302400,
                                         1593302400, 1593302400, 1593302400,
                                         1593388800, 1593388800, 1593388800,
                                         1593388800), 
                                       tzone = "UTC", class = c("POSIXct", "POSIXt")), 
                      stock_name = c("AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
                                     "AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
                                     "HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
                                     "NFLX", "AAPL", "AMZN", "HTZ", "NFLX",
                                     "AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
                                     "AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
                                     "HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
                                     "NFLX", "AAPL", "AMZN", "HTZ", "NFLX"), 
                      closing_price = c(200, 1900, 100, 150, 210, 
                                        1950, 90, 160, 211, 1975, 75, 150,
                                        213, 1980, 60, 140, 211, 1990, 50,
                                        150, 213, 1991, 45, 160, 214, 1990,
                                        40, 150, 215, 1998, 38, 140, 217,
                                        2010, 30, 150, 216, 2020, 20, 150),
                      # one label per row (40 rows): AAPL and AMZN are increasing
                      trend = rep(c("increasing", "increasing", "", ""),
                                  times = 10)),
                 row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"))

Here is what I have so far:

library(dplyr)  # filter(), group_by(), mutate(), %>%
library(purrr)  # map()

#function to label a trend as increasing
label_increasing <- function(stck_df){
  mdl <- lm(closing_price ~ Date, data = stck_df)
  #create a model using the date as a predictor
  if(mdl$coefficients["Date"] > 0){
    return("increasing")
    #if the trend is increasing with date, return "increasing"
  }#end if
}#end function

apple_dat <- dat %>%
  filter(stock_name == "AAPL")
#filter just the apple stock

apple_label <- label_increasing(apple_dat)
apple_label
#works for a single stock

labeled_dat <- dat %>%
  group_by(stock_name) %>%
  mutate(trend = label_increasing(.))
labeled_dat
#does not work for the full data frame

labeled_dat <- dat %>%
  group_by(stock_name) %>%
  mutate(trend = map(., label_increasing))
labeled_dat
#I have a feeling I need to do some mapping but this isn't quite right

Finally, this was inspired by the NYT Covid-19 dashboard, specifically its sections on states where cases are increasing and decreasing.

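【Question comments】:

Over what period do you want the trend?
For this example, it would be from the minimum to the maximum date in the data frame. In other words, the entire time period.

Tags: r dplyr lm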

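【Solution 1】:

If you want the entire period, start every stock from the same baseline and track its growth or percentage growth. You can then use a simple filter statement to keep only the stocks that grew from start to end, no matter how small the gain.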


library(dplyr)
library(ggplot2)
dat %>% group_by(stock_name) %>%
  arrange(Date) %>%
  mutate(growth = closing_price - first(closing_price), 
         growth_percent = (closing_price - first(closing_price))/first(closing_price)*100) %>%
  filter(last(growth) >= 0) %>%
  ggplot(aes(x = Date, y = growth, group = stock_name, color = stock_name)) +
  geom_line()

Your original data

dat <- structure(list(Date = structure(c(1592611200, 1592611200, 1592611200,
                                         1592611200, 1592697600, 1592697600,
                                         1592697600, 1592697600, 1592784000,
                                         1592784000, 1592784000, 1592784000,
                                         1592870400, 1592870400, 1592870400,
                                         1592870400, 1592956800, 1592956800,
                                         1592956800, 1592956800, 1593043200,
                                         1593043200, 1593043200, 1593043200,
                                         1593129600, 1593129600, 1593129600,
                                         1593129600, 1593216000, 1593216000,
                                         1593216000, 1593216000, 1593302400,
                                         1593302400, 1593302400, 1593302400,
                                         1593388800, 1593388800, 1593388800,
                                         1593388800), 
                                       tzone = "UTC", class = c("POSIXct", "POSIXt")), 
                      stock_name = c("AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
                                     "AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
                                     "HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
                                     "NFLX", "AAPL", "AMZN", "HTZ", "NFLX",
                                     "AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
                                     "AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
                                     "HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
                                     "NFLX", "AAPL", "AMZN", "HTZ", "NFLX"), 
                      closing_price = c(200, 1900, 100, 150, 210, 
                                        1950, 90, 160, 211, 1975, 75, 150,
                                        213, 1980, 60, 140, 211, 1990, 50,
                                        150, 213, 1991, 45, 160, 214, 1990,
                                        40, 150, 215, 1998, 38, 140, 217,
                                        2010, 30, 150, 216, 2020, 20, 150)),
                 row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"))

# dat

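【Comments】:

Clearly I was overthinking the solution. This works for the question I asked and is simpler. I lean toward the linear-model answer only because it seems to me less sensitive to individual closing prices. That goes well beyond the question I asked, so blame my question-asking skills.
Percentage growth might also be a better metric, but that is a simple change to the code.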
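
To illustrate the percentage-growth idea from the comment above, here is a minimal sketch. It rebuilds the question's sample data in a compact form (same values, same row order) so that it runs on its own; the name `growth_summary` is made up for this example. Note that it filters on strictly positive growth, so a stock that ends exactly where it started (NFLX here) is dropped, whereas the `>= 0` filter in the answer keeps it.

```r
library(dplyr)

# Compact rebuild of the question's sample data (assumed equivalent to `dat`).
dat <- tibble(
  Date = rep(as.POSIXct("2020-06-20", tz = "UTC") + 86400 * 0:9, each = 4),
  stock_name = rep(c("AAPL", "AMZN", "HTZ", "NFLX"), times = 10),
  closing_price = c(200, 1900, 100, 150, 210, 1950, 90, 160, 211, 1975, 75, 150,
                    213, 1980, 60, 140, 211, 1990, 50, 150, 213, 1991, 45, 160,
                    214, 1990, 40, 150, 215, 1998, 38, 140, 217, 2010, 30, 150,
                    216, 2020, 20, 150)
)

# One row per stock: percentage change from the first to the last closing price.
growth_summary <- dat %>%
  group_by(stock_name) %>%
  arrange(Date, .by_group = TRUE) %>%
  summarise(
    growth_percent = (last(closing_price) - first(closing_price)) /
      first(closing_price) * 100,
    .groups = "drop"
  ) %>%
  filter(growth_percent > 0)  # keep only stocks that ended higher than they started

growth_summary
```

Because this compares only the first and last prices, it stays as sensitive to those two individual closes as the answer's version.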

【Solution 2】:

You can nest the data for each stock_name and map your function over each group.

Edit: I had to modify label_increasing() so that the response variable is named closing_price.

library(tidyverse)

label_increasing <- function(stck_df){
  mdl <- lm(closing_price ~ Date, data = stck_df)
  #create a model using the date as a predictor
  if(mdl$coefficients["Date"] > 0){
    return("increasing")
    #if the trend is increasing with date, return "increasing"
  } #end if
}#end function

dat %>%
  group_by(stock_name) %>%
  nest() %>%
  mutate(trend = map(data, label_increasing)) %>%
  unnest(trend)


#-----

# A tibble: 2 x 3
# Groups:   stock_name [2]
  stock_name data              trend     
  <chr>      <list>            <chr>     
1 AAPL       <tibble [10 x 3]> increasing
2 AMZN       <tibble [10 x 3]> increasing

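【Comments】:

Great! This is exactly what I wanted! I made one small change: I used unnest(data) instead of unnest(trend). That gets back to the original data frame with the labeled trend column added. Oh, and thanks for catching my typo. I updated my code to include your edit (closing_price).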
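
The unnest(data) variant described in the comment above could look roughly like the sketch below. It is not the asker's exact code: `label_trend` is a hypothetical variant of label_increasing() that returns an empty string instead of NULL for non-increasing stocks, which lets `map_chr()` build a plain character column. The sample data is rebuilt compactly so the example is self-contained.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Compact rebuild of the question's sample data (assumed equivalent to `dat`).
dat <- tibble(
  Date = rep(as.POSIXct("2020-06-20", tz = "UTC") + 86400 * 0:9, each = 4),
  stock_name = rep(c("AAPL", "AMZN", "HTZ", "NFLX"), times = 10),
  closing_price = c(200, 1900, 100, 150, 210, 1950, 90, 160, 211, 1975, 75, 150,
                    213, 1980, 60, 140, 211, 1990, 50, 150, 213, 1991, 45, 160,
                    214, 1990, 40, 150, 215, 1998, 38, 140, 217, 2010, 30, 150,
                    216, 2020, 20, 150)
)

# Hypothetical variant of label_increasing(): always returns a string.
label_trend <- function(stck_df) {
  slope <- coef(lm(closing_price ~ Date, data = stck_df))[["Date"]]
  if (slope > 0) "increasing" else ""
}

labeled_dat <- dat %>%
  group_by(stock_name) %>%
  nest() %>%                                    # one row per stock, data in a list column
  mutate(trend = map_chr(data, label_trend)) %>% # one label per stock
  unnest(data) %>%                              # back to one row per stock and date
  ungroup()

labeled_dat
```

The result has the original 40 rows plus a `trend` column, sorted by stock rather than by date.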

【Solution 3】:

Visualize the data with ggplot and take a look!

# install.packages("ggplot2")  # run once if ggplot2 is not installed
library(ggplot2)

ggplot(data = dat) + 
    geom_line(mapping = aes(x = Date, y = closing_price)) + 
    facet_wrap(~stock_name, scales = "free_y")

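This is an answer to your question "How would you determine which of these stocks have an increasing trend?" (If you want financial analysis, I suggest you hire someone.)

【Comments】:

Hahaha, I am certainly not looking for financial analysis. My portfolio looks like Hertz. What I am after is the ability to label trends as increasing or decreasing without having to visualize them. Imagine a dashboard that highlights all the stocks in your profile that are trending up and does not show the ones that are flat or falling. I added some code above that better illustrates what I am after. Nice visualization code, though!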
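
For the dashboard use case described in this comment, one possible sketch labels every row directly in the long data frame, with no nesting and no plot. It relies on the fact that the slope of a simple least-squares line equals cov(x, y) / var(x), so only the sign of that ratio is needed. The helper `slope_label` and the result names are invented for this example, and the sample data is rebuilt compactly so the block runs on its own.

```r
library(dplyr)

# Compact rebuild of the question's sample data (assumed equivalent to `dat`).
dat <- tibble(
  Date = rep(as.POSIXct("2020-06-20", tz = "UTC") + 86400 * 0:9, each = 4),
  stock_name = rep(c("AAPL", "AMZN", "HTZ", "NFLX"), times = 10),
  closing_price = c(200, 1900, 100, 150, 210, 1950, 90, 160, 211, 1975, 75, 150,
                    213, 1980, 60, 140, 211, 1990, 50, 150, 213, 1991, 45, 160,
                    214, 1990, 40, 150, 215, 1998, 38, 140, 217, 2010, 30, 150,
                    216, 2020, 20, 150)
)

# Hypothetical helper: label the sign of the least-squares slope of price on date.
slope_label <- function(price, date) {
  x <- as.numeric(date)
  slope <- cov(x, price) / var(x)  # simple-regression slope
  case_when(
    slope > 0 ~ "increasing",
    slope < 0 ~ "decreasing",
    TRUE      ~ "flat"
  )
}

# Grouped mutate: the single label per stock is recycled across that stock's rows.
labeled_dat <- dat %>%
  group_by(stock_name) %>%
  mutate(trend = slope_label(closing_price, Date)) %>%
  ungroup()

# Stocks a dashboard would highlight.
rising <- labeled_dat %>%
  filter(trend == "increasing") %>%
  distinct(stock_name)

rising
```

Keeping the data in long form this way leaves any additional columns untouched. An exactly zero slope is rare with real prices, so in practice the "flat" label would probably need a tolerance rather than a strict comparison.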