R: Add a column of shorter length that subtracts each row in a single column, 1st - 2nd, 2nd - 3rd

【Question title】: R: Add a column of shorter length that subtracts each row in a single column, 1st - 2nd, 2nd - 3rd
【Posted】: 2016-10-03 09:50:38
【Question】:

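I have a data frame that looks like this sx16 data frame:

In case the link dies:

The data frame is called sx16.

It has the column names: Date, Open, High, Low, Settle

I want to add a column called up_period that holds 1 if the calculation below is positive and 0 if it is negative: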

sx16$Settle[ 1: nrow(sx16)] - sx16$Settle[ 2: nrow(sx16)]

Of course, this produces an error because the new vector is shorter than the original sx16.

I tried wrapping it in rbind.fill like this:

sx16$up_period <- rbind.fill(sx16$Settle[ 1: nrow(sx16)] - sx16$Settle[ 2: nrow(sx16)])

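But that produces the following error:

Warning message: In sx16$Settle[1:nrow(sx16)] - sx16$Settle[2:nrow(sx16)] : longer object length is not a multiple of shorter object length

Of course, that is exactly the problem I thought rbind.fill would solve. This is where I am stuck. Once I have this, I can add a simple if-else to produce the 1s and 0s, but I can't figure out how to add this shorter column to my data frame.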

【Comments】:

Welcome to SO. Please read how to ask a question and how to make a reproducible example.
Using example data: iris$Sepal.Length[1:(nrow(iris)-1)]-iris$Sepal.Length[2:nrow(iris)] will handle every value except the last one.
@OliPaul How would they bind that to the data frame? It is one row short. And all the signs are reversed (try iris$Sepal.Length - c(NA, iris$Sepal.Length[1:nrow(iris) - 1])).
Don't you mean iris$Sepal.Length - c(iris$Sepal.Length[2:nrow(iris)], NA)?

Tags: r

【Solution 1】:

Try this (the last up_period is undefined):

sx16$up_period <- sx16$Settle - c(sx16$Settle[-1],NA)

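【Comments】:

This works really well. The ", NA" part is what I didn't understand. Thanks a lot!
The last element has no counterpart in the shifted series, so the NA is needed to keep the series the same length.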
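To make the role of the NA concrete, here is a minimal base-R sketch. The vector `settle` is an illustrative stand-in for sx16$Settle (its name and values are assumptions for the demo), and the final `as.numeric` step is just one possible way to get the 1/0 flag the question asks for.

```r
# Illustrative stand-in for sx16$Settle (an assumption for this demo).
settle <- c(954, 950.25, 945.5, 952.5, 945.25, 955)

# settle[-1] drops the first element, so it is one element shorter.
shifted <- settle[-1]

# Padding with NA restores the original length, so the subtraction
# lines up row by row without any recycling warning.
change <- settle - c(shifted, NA)

# Convert the differences to 1/0; the NA in the last row is preserved.
up_period <- as.numeric(change > 0)
print(up_period)
```

Because `change` has the same length as the original column, it can be assigned directly to a new column of the data frame.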

【Solution 2】:

You can use lead from the dplyr package:

library(dplyr)
result <- sx16 %>% mutate(up_period=as.numeric((Settle-lead(Settle,default=NA)) > 0))
##        Date   Open   High    Low Settle up_period
##1 2016-09-30 950.00 958.50 943.00 954.00         1
##2 2016-09-29 947.00 957.25 946.00 950.25         1
##3 2016-09-28 951.75 955.75 944.50 945.50         0
##4 2016-09-27 946.75 953.50 934.00 952.50         1
##5 2016-09-26 951.50 960.25 943.75 945.25         0
##6 2016-09-23 975.00 976.25 952.50 955.00        NA

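Here we explicitly set the default argument of lead to NA to fill the last value, to show that it could be set to a different value for the last row if we wanted. Note also that there is no need for if-else, because as.numeric converts the logical values to 1 and 0.

The dput of your data is: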

sx16 <- structure(list(Date = structure(c(17074, 17073, 17072, 17071, 
17070, 17067), class = "Date"), Open = c(950, 947, 951.75, 946.75, 
951.5, 975), High = c(958.5, 957.25, 955.75, 953.5, 960.25, 976.25
), Low = c(943, 946, 944.5, 934, 943.75, 952.5), Settle = c(954, 
950.25, 945.5, 952.5, 945.25, 955)), .Names = c("Date", "Open", 
"High", "Low", "Settle"), row.names = c(NA, -6L), class = "data.frame")
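The effect of the fill value can be sketched without dplyr. The helper `shift_up` below is hypothetical and written only for illustration; it mimics the one-step shift with a fill value and is not dplyr's implementation of lead. The vector `settle` simply repeats the Settle values from the dput.

```r
# Hypothetical helper for illustration only (not dplyr's lead):
# shift a vector up by one position and fill the last slot.
shift_up <- function(x, default = NA) c(x[-1], default)

# Same values as the Settle column in the dput.
settle <- c(954, 950.25, 945.5, 952.5, 945.25, 955)

# With an NA fill the last flag stays undefined.
with_na <- as.numeric(settle - shift_up(settle) > 0)

# With a 0 fill the last row is compared against 0, so its flag becomes 1.
with_zero <- as.numeric(settle - shift_up(settle, default = 0) > 0)

print(with_na)
print(with_zero)
```

An NA fill is usually the safer choice here, since any numeric fill value produces a flag for the last row that does not reflect a real next-day price.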

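【Comments】:

This is a nice solution. I thought dplyr might be the answer, but I am not very familiar with it. I will have to remedy that. as.numeric is an elegant alternative to if-else. Thanks.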

【Solution 3】:

I am surprised no one has mentioned diff yet. diff(sx16$Settle) is equivalent to sx16$Settle[2:nrow(sx16)] - sx16$Settle[1:(nrow(sx16)-1)]. So the following will work for you:

sx16$up_period <- c(ifelse(diff(sx16$Settle)<0, 1, 0), NA)

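【Comments】:

I tried using diff but ran into some problems. Mainly, the change it calculated was wrong, in that it showed the change from the first row to the second as +7 rather than the other way around. Your solution clearly works perfectly, so I am not sure what I did wrong. I will have to go back and look. Thanks.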
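The sign convention of diff is easy to get backwards, so here is a small sketch of it. The vector `settle` is an illustrative stand-in for sx16$Settle (same values as the dput in this thread); the variable names are assumptions for the demo.

```r
# Illustrative stand-in for sx16$Settle.
settle <- c(954, 950.25, 945.5, 952.5, 945.25, 955)

# diff() returns "next element minus current element", one element shorter.
d <- diff(settle)

# The same thing written out by hand.
manual <- settle[-1] - settle[-length(settle)]

# Negating gives "current row minus next row", the quantity in the question.
this_minus_next <- -d

# A negative diff therefore means the current row is higher than the next one.
up_period <- c(ifelse(d < 0, 1, 0), NA)
print(up_period)
```

Since diff subtracts in the opposite direction from the expression in the question, the test is `< 0` rather than `> 0`, and the trailing NA again restores the original length.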

【Solution 4】:

I will use the iris data set:

x <- iris
dummy <- x$Sepal.Length              # copy the column under the name dummy
dummy[length(dummy) + 1] <- 0        # append a 0 for the day that has not happened yet
dummy <- dummy[2:length(dummy)]      # drop the first value so each row lines up with the next one
x <- cbind(x, dummy)                 # add the shifted column to the data
x$up <- x$Sepal.Length - x$dummy     # new calculated column (last row is compared against the padded 0)
x$dummy <- NULL                      # remove the helper column

So essentially, I added your column a second time, shifted it by one position, and then used that dummy column for the calculation.
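A more compact variant of the same idea is sketched below. It pads with NA instead of 0, which is a deliberate change from the code above so that the last row stays undefined rather than being compared against 0; the name `next_value` is an assumption for the demo.

```r
x <- iris

# Shift the column up by one position; the last row has no successor, so pad with NA.
next_value <- c(x$Sepal.Length[-1], NA)

# Current row minus next row; the last entry is NA.
x$up <- x$Sepal.Length - next_value

print(head(x$up))
print(tail(x$up, 1))
```

This avoids the temporary column in the data frame, since the shifted vector never needs to be bound to `x`.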

【Comments】: