Posted: 2020-11-12 10:32:45
Question:
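I have a data frame in "long" format. The first column holds the date, the second the stock name, and the last the closing price. For plotting, this format is very convenient: you can use the stock-name column to draw differently colored lines, or to facet the stocks into separate plots. Great.
Here is sample data: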
dat <- structure(list(Date = structure(c(1592611200, 1592611200, 1592611200,
1592611200, 1592697600, 1592697600,
1592697600, 1592697600, 1592784000,
1592784000, 1592784000, 1592784000,
1592870400, 1592870400, 1592870400,
1592870400, 1592956800, 1592956800,
1592956800, 1592956800, 1593043200,
1593043200, 1593043200, 1593043200,
1593129600, 1593129600, 1593129600,
1593129600, 1593216000, 1593216000,
1593216000, 1593216000, 1593302400,
1593302400, 1593302400, 1593302400,
1593388800, 1593388800, 1593388800,
1593388800),
tzone = "UTC", class = c("POSIXct", "POSIXt")),
stock_name = c("AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
"AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
"HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
"NFLX", "AAPL", "AMZN", "HTZ", "NFLX",
"AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
"AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
"HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
"NFLX", "AAPL", "AMZN", "HTZ", "NFLX"),
closing_price = c(200, 1900, 100, 150, 210,
1950, 90, 160, 211, 1975, 75, 150,
213, 1980, 60, 140, 211, 1990, 50,
150, 213, 1991, 45, 160, 214, 1990,
40, 150, 215, 1998, 38, 140, 217,
2010, 30, 150, 216, 2020, 20, 150)),
row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"))
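The goal, however, is to determine which of these stocks have an upward trend. My idea was to fit a linear model to each stock, extract the slope, and keep the stocks whose slope is positive. The problem I'm running into is how to do this with a data frame in "long" form.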
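The "one lm() per stock, keep the positive slopes" idea can be sketched in base R without reshaping: split() the long data by stock_name, fit a model on each piece, and pull out the Date coefficient. This is a minimal sketch, assuming the sample data above (rebuilt compactly here so the block runs on its own); the slope is in price units per second because Date is POSIXct.

```r
# Rebuild the sample data in long format (same values as the dput above)
dates <- seq(as.POSIXct("2020-06-20", tz = "UTC"), by = "day", length.out = 10)
dat <- data.frame(
  Date = rep(dates, each = 4),
  stock_name = rep(c("AAPL", "AMZN", "HTZ", "NFLX"), times = 10),
  closing_price = c(200, 1900, 100, 150, 210, 1950, 90, 160, 211, 1975, 75, 150,
                    213, 1980, 60, 140, 211, 1990, 50, 150, 213, 1991, 45, 160,
                    214, 1990, 40, 150, 215, 1998, 38, 140, 217, 2010, 30, 150,
                    216, 2020, 20, 150)
)

# Fit one lm() per stock and keep only the Date slope
slopes <- sapply(split(dat, dat$stock_name), function(d) {
  coef(lm(closing_price ~ Date, data = d))[["Date"]]
})

# Names of the stocks whose fitted slope is positive
increasing <- names(slopes)[slopes > 0]
increasing
```

split() names the pieces after the stock, so sapply() returns a named slope vector and no wide table is ever built.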
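In reality, the data frame has additional columns that don't translate well into a "wide" data frame, so as far as I can see it needs to stay in "long" form.
How would you determine which of these stocks have an upward trend?
Target data frame: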
dat <- structure(list(Date = structure(c(1592611200, 1592611200, 1592611200,
1592611200, 1592697600, 1592697600,
1592697600, 1592697600, 1592784000,
1592784000, 1592784000, 1592784000,
1592870400, 1592870400, 1592870400,
1592870400, 1592956800, 1592956800,
1592956800, 1592956800, 1593043200,
1593043200, 1593043200, 1593043200,
1593129600, 1593129600, 1593129600,
1593129600, 1593216000, 1593216000,
1593216000, 1593216000, 1593302400,
1593302400, 1593302400, 1593302400,
1593388800, 1593388800, 1593388800,
1593388800),
tzone = "UTC", class = c("POSIXct", "POSIXt")),
stock_name = c("AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
"AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
"HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
"NFLX", "AAPL", "AMZN", "HTZ", "NFLX",
"AAPL", "AMZN", "HTZ", "NFLX", "AAPL",
"AMZN", "HTZ", "NFLX", "AAPL", "AMZN",
"HTZ", "NFLX", "AAPL", "AMZN", "HTZ",
"NFLX", "AAPL", "AMZN", "HTZ", "NFLX"),
closing_price = c(200, 1900, 100, 150, 210,
1950, 90, 160, 211, 1975, 75, 150,
213, 1980, 60, 140, 211, 1990, 50,
150, 213, 1991, 45, 160, 214, 1990,
40, 150, 215, 1998, 38, 140, 217,
2010, 30, 150, 216, 2020, 20, 150),
trend = c("increasing", "increasing", "", "",
"increasing", "increasing", "", "",
"increasing", "increasing", "", "",
"increasing", "increasing", "", "",
"increasing", "increasing", "", "",
"increasing", "increasing", "", "",
"increasing", "increasing", "", "",
"increasing", "increasing", "", "",
"increasing", "increasing", "", "",
"increasing", "increasing", "", "")),
row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"))
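One way to reach this target while the data stays long is to compute one slope per stock and look each row's stock up in that named vector. This base-R sketch uses the fact that the ordinary least-squares slope of y on x equals cov(x, y) / var(x), the same slope lm() reports, so no model objects are kept. The helper name `slope_of` is hypothetical; any extra columns in `dat` would simply be carried along.

```r
# Sample data in long format (same values as the dput in the question)
dates <- seq(as.POSIXct("2020-06-20", tz = "UTC"), by = "day", length.out = 10)
dat <- data.frame(
  Date = rep(dates, each = 4),
  stock_name = rep(c("AAPL", "AMZN", "HTZ", "NFLX"), times = 10),
  closing_price = c(200, 1900, 100, 150, 210, 1950, 90, 160, 211, 1975, 75, 150,
                    213, 1980, 60, 140, 211, 1990, 50, 150, 213, 1991, 45, 160,
                    214, 1990, 40, 150, 215, 1998, 38, 140, 217, 2010, 30, 150,
                    216, 2020, 20, 150)
)

# Hypothetical helper: OLS slope of closing_price on time (price units per second)
slope_of <- function(d) {
  x <- as.numeric(d$Date)
  cov(x, d$closing_price) / var(x)
}

# One slope per stock, named by stock_name
slopes <- sapply(split(dat, dat$stock_name), slope_of)

# Index the named slopes by each row's stock; the rows stay in their original order
dat$trend <- ifelse(unname(slopes[dat$stock_name]) > 0, "increasing", "")
head(dat, 4)
```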
Here is what I have so far:
library(dplyr)
library(purrr)

# function to label a trend as increasing
label_increasing <- function(stck_df){
  # create a model using the date as a predictor
  mdl <- lm(closing_price ~ Date, data = stck_df)
  # if the trend is increasing with date, return "increasing"
  if (mdl$coefficients["Date"] > 0) {
    return("increasing")
  } # end if
  # otherwise return an empty string, as in the target data frame
  ""
} # end function

# filter just the apple stock
apple_dat <- dat %>%
  filter(stock_name == "AAPL")
apple_label <- label_increasing(apple_dat)
apple_label
# works for a single stock

# does not work for the full data frame
labeled_dat <- dat %>%
  group_by(stock_name) %>%
  mutate(trend = label_increasing(.))
labeled_dat

# I have a feeling I need to do some mapping, but this isn't quite right
labeled_dat <- dat %>%
  group_by(stock_name) %>%
  mutate(trend = map(., label_increasing))
labeled_dat
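Inside a grouped mutate(), `.` is the whole piped-in data frame, not the current group, and map(., ...) iterates over its columns. A minimal sketch of one way to keep the helper and still work per group, assuming dplyr's group_modify(), which passes each group's rows to the function as a data frame (`.x`) and then re-attaches the grouping column. The helper is repeated here in a form that returns "" rather than NULL so the block runs on its own.

```r
library(dplyr)

# Sample data in long format (same values as the dput in the question)
dates <- seq(as.POSIXct("2020-06-20", tz = "UTC"), by = "day", length.out = 10)
dat <- data.frame(
  Date = rep(dates, each = 4),
  stock_name = rep(c("AAPL", "AMZN", "HTZ", "NFLX"), times = 10),
  closing_price = c(200, 1900, 100, 150, 210, 1950, 90, 160, 211, 1975, 75, 150,
                    213, 1980, 60, 140, 211, 1990, 50, 150, 213, 1991, 45, 160,
                    214, 1990, 40, 150, 215, 1998, 38, 140, 217, 2010, 30, 150,
                    216, 2020, 20, 150)
)

# The question's helper: "increasing" for a positive Date slope, "" otherwise
label_increasing <- function(stck_df) {
  mdl <- lm(closing_price ~ Date, data = stck_df)
  if (coef(mdl)[["Date"]] > 0) "increasing" else ""
}

# group_modify() calls the lambda once per stock with that stock's rows as .x,
# so label_increasing() only ever sees a single stock
labeled_dat <- dat %>%
  group_by(stock_name) %>%
  group_modify(~ mutate(.x, trend = label_increasing(.x))) %>%
  ungroup()
labeled_dat
```

The result is still long (one row per date and stock), ordered by stock_name rather than by date.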
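Finally, this was inspired by the NYT Covid-19 dashboard, specifically the section showing which states are increasing and which are decreasing. It can be found here.
Comments: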
-
Over what time period do you want the trend?
-
For this example, it would be from the minimum to the maximum date in the data frame. In other words, the whole time period.