How to check if a vector contains n consecutive numbers: answers

【Question title】: How to check if a vector contains n consecutive numbers
【Posted】: 2013-04-20 07:52:15
【Question】:

Suppose my vector numbers contains c(1,2,3,5,7,8), and I want to find out whether it contains 3 consecutive numbers, which in this case are 1,2,3.

numbers = c(1,2,3,5,7,8)
difference = diff(numbers) # The difference output would be 1,1,2,2,1

To check whether my numbers vector contains 3 consecutive integers, I tried the following, with little success.

rep(1,2)%in%difference

The code above works in this case, but if my difference vector were (1,2,2,2,1), it would still return TRUE even though the 1s are not adjacent.
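
The underlying reason is that %in% tests each element of its left-hand side for membership on its own; it does not search for the left-hand side as a contiguous sub-sequence. A minimal sketch of this (the names `good`, `bad`, `in_good` and `in_bad` are chosen here for illustration only):

```r
# %in% checks membership element by element, not adjacency.
good <- diff(c(1, 2, 3, 5, 7, 8))   # 1 1 2 2 1: two adjacent 1s
bad  <- c(1, 2, 2, 2, 1)            # two 1s, but not adjacent

in_good <- rep(1, 2) %in% good
in_bad  <- rep(1, 2) %in% bad

print(in_good)  # TRUE TRUE
print(in_bad)   # TRUE TRUE as well, so this test cannot tell the two cases apart
```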

【Comments】:

Tags: r

【Solution 1】:

Using diff and rle, something like this should work:

result <- rle(diff(numbers))
any(result$lengths>=2 & result$values==1)
# [1] TRUE

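In response to the comments below: my previous answer tested only for runs of exactly 3 consecutive numbers and missed longer runs. Changing == to >= fixes that. It also works for runs involving negative numbers: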

> numbers4 <- c(-2, -1, 0, 5, 7, 8)
> result <- rle(diff(numbers4))
> any(result$lengths>=2 & result$values==1)
[1] TRUE
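
To make the mechanism a little more explicit, here is a small sketch of what rle returns for the example vector. The names `runs`, `run_table` and `has_three` are illustrative and not part of the answer.

```r
numbers <- c(1, 2, 3, 5, 7, 8)
runs <- rle(diff(numbers))   # diff(numbers) is 1 1 2 2 1

# runs$lengths: how long each run of equal differences is
# runs$values:  the difference that each run repeats
run_table <- data.frame(lengths = runs$lengths, values = runs$values)
print(run_table)

# A run of k differences equal to 1 spans k + 1 consecutive numbers,
# so 3 consecutive numbers need a run of length at least 2.
has_three <- any(runs$lengths >= 2 & runs$values == 1)
print(has_three)
```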

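【Discussion】:

This does not seem to work for the case numbers = c(-2,2,3,5,6,7,8)
OK, but a (-1) without giving a chance to correct is generally bad form.
Sorry about that. Apparently, pressing downvote after an upvote results in a downvote rather than (as I expected) a retraction of the upvote. Nice correction. Upvoted again.
To avoid growing a potentially large vector (one entry per distinct run of diff values), you can apply rle directly to diff(numbers) == 1.
I think you could also check that the length of result$values is 1, to make sure that a vector of any size is consecutive throughout, e.g. any(r$lengths>=2 & length(r$values)==1 & r$values==1)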

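【Solution 2】:

Benchmarks!

I have included a couple of my own functions. Feel free to add yours. To qualify, you need to write a general function that tells whether a vector x contains n or more consecutive numbers. I provide a unit-test function below.

The contenders: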

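# Rolling sum of the "is consecutive" flags over a window of n - 1 differences
flodel.filter <- function(x, n, incr = 1L) {
  if (n > length(x)) return(FALSE)
  x <- as.integer(x)
  is.cons <- tail(x, -1L) == head(x, -1L) + incr
  # stats::filter is named explicitly so an attached package such as dplyr cannot mask it
  any(stats::filter(is.cons, rep(1L, n-1L), sides = 1, method = "convolution") == n-1L,
      na.rm = TRUE)
}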

flodel.which <- function(x, n, incr = 1L) {
  is.cons <- tail(x, -1L) == head(x, -1L) + incr
  any(diff(c(0L, which(!is.cons), length(x))) >= n)
}

thelatemail.rle <- function(x, n, incr = 1L) {
  result <- rle(diff(x))
  any(result$lengths >= n-1L  & result$values == incr)
}

improved.rle <- function(x, n, incr = 1L) {
  result <- rle(diff(as.integer(x)) == incr)
  any(result$lengths >= n-1L  & result$values)
}

carl.seqle <- function(x, n, incr = 1) {
  if(!is.numeric(x)) x <- as.numeric(x) 
  z <- length(x)  
  y <- x[-1L] != x[-z] + incr 
  i <- c(which(y | is.na(y)), z) 
  any(diff(c(0L, i)) >= n)
}

Unit tests:

check.fun <- function(fun)
  stopifnot(
    fun(c(1,2,3),   3),
   !fun(c(1,2),     3),
   !fun(c(1),       3),
   !fun(c(1,1,1,1), 3),
   !fun(c(1,1,2,2), 3),
    fun(c(1,1,2,3), 3)
  )

check.fun(flodel.filter)
check.fun(flodel.which)
check.fun(thelatemail.rle)
check.fun(improved.rle)
check.fun(carl.seqle)

Benchmark:

x <- sample(1:10, 1000000, replace = TRUE)

library(microbenchmark)
microbenchmark(
  flodel.filter(x, 6),
  flodel.which(x, 6),
  thelatemail.rle(x, 6),
  improved.rle(x, 6),
  carl.seqle(x, 6),
  times = 10)

# Unit: milliseconds
#                   expr       min       lq   median       uq      max neval
#    flodel.filter(x, 6)  96.03966 102.1383 144.9404 160.9698 177.7937    10
#     flodel.which(x, 6) 131.69193 137.7081 140.5211 185.3061 189.1644    10
#  thelatemail.rle(x, 6) 347.79586 353.1015 361.5744 378.3878 469.5869    10
#     improved.rle(x, 6) 199.35402 200.7455 205.2737 246.9670 252.4958    10
#       carl.seqle(x, 6) 213.72756 240.6023 245.2652 254.1725 259.2275    10

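【Discussion】:

My brain is slow this morning. How many of these functions can handle increments other than 1? (Shameless plug for seqle's flexibility :-)
@Carl, they all can; I have modified the functions to take an optional increment input like yours does. I also added a version of your function. Feel free to modify it if you think it can be improved. For this problem, for example, you could use as.integer instead of as.numeric.
@flodel flodel.filter and flodel.which produce different output. For example, flodel.filter(cbind(1,2,5), 3) (correctly) evaluates to FALSE, while flodel.which(cbind(1,2,5), 3) (incorrectly) evaluates to TRUE. This means that flodel.which does not respect the incr argument at all. I have not tested the other functions.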
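
One possible reading of the last comment, offered as a sketch rather than a confirmed diagnosis: cbind(1,2,5) is a 1 x 3 matrix, and head() and tail() drop rows of a matrix rather than elements, so is.cons ends up empty. flodel.filter avoids this because as.integer() strips the dimensions first. On a plain vector with the same values, flodel.which returns FALSE. The names `m`, `n_compared`, `on_matrix` and `on_vector` are illustrative.

```r
# flodel.which, copied from the answer above
flodel.which <- function(x, n, incr = 1L) {
  is.cons <- tail(x, -1L) == head(x, -1L) + incr
  any(diff(c(0L, which(!is.cons), length(x))) >= n)
}

m <- cbind(1, 2, 5)                  # a 1 x 3 matrix, not a vector
n_compared <- length(tail(m, -1L))   # tail() removed the only row, nothing is left

on_matrix <- flodel.which(m, 3)              # the surprising result from the comment
on_vector <- flodel.which(as.vector(m), 3)   # same values as a plain vector

print(n_compared)
print(on_matrix)
print(on_vector)
```

If matrix input has to be supported, calling as.vector(x) at the top of the function would be one way to get the vector behaviour.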

【Solution 3】:

After diff, you can check for any consecutive 1s:

numbers = c(1,2,3,5,7,8)

difference = diff(numbers) == 1
## [1]  TRUE  TRUE FALSE FALSE  TRUE

## find at least one pair of adjacent TRUEs
any(tail(difference, -1) &
    head(difference, -1))

## [1] TRUE
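
To unpack why this works: head(difference, -1) holds difference[i] and tail(difference, -1) holds difference[i + 1], so their elementwise & is TRUE exactly where two adjacent differences are both 1, that is, where three numbers in a row are consecutive. The sketch below lines the pieces up. `n_consecutive` is a hypothetical helper (not from the answer) showing one way the same shifting idea might be extended to n numbers.

```r
numbers <- c(1, 2, 3, 5, 7, 8)
difference <- diff(numbers) == 1       # TRUE TRUE FALSE FALSE TRUE

left  <- head(difference, -1)          # difference[i]
right <- tail(difference, -1)          # difference[i + 1]
print(rbind(left, right, both = left & right))

# Hypothetical generalisation: AND together n - 1 shifted copies of the flags.
n_consecutive <- function(x, n) {
  d <- diff(x) == 1
  k <- n - 1L                          # number of adjacent differences required
  if (k < 1L) return(length(x) >= 1L)
  if (length(d) < k) return(FALSE)
  windows <- length(d) - k + 1L        # number of possible start positions
  hit <- rep(TRUE, windows)
  for (j in seq_len(k)) {
    hit <- hit & d[j:(j + windows - 1L)]
  }
  any(hit)
}

print(n_consecutive(numbers, 3))       # finds 1, 2, 3
print(n_consecutive(numbers, 4))       # no run of four
```

The loop costs n - 1 vectorised passes over the data, so for large n the rle-based answers are likely the better fit.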

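【Discussion】:

+1 Very clever. An explanation would be nice, though, because the principle here is far from obvious. It had me puzzled for a while.
Unfortunately, this does not generalize well to larger numbers of consecutive values. I hope you don't mind my edit, which I think is cleaner.
@flodel - Thanks for the edit. I actually wanted to use head and tail, but the -1 index did not occur to me!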

【Solution 4】:

Nice to see the home-grown solutions here.

Fellow Stack Overflow user Carl Witthoft posted a function he calls seqle() and shared it here.

The function looks like this:

seqle <- function(x,incr=1) { 
  if(!is.numeric(x)) x <- as.numeric(x) 
  n <- length(x)  
  y <- x[-1L] != x[-n] + incr 
  i <- c(which(y|is.na(y)),n) 
  list(lengths = diff(c(0L,i)),
       values = x[head(c(0L,i)+1L,-1L)]) 
}

Let's see it in action. First, some data:

numbers1 <- c(1, 2, 3, 5, 7, 8)
numbers2 <- c(-2, 2, 3, 5, 6, 7, 8)
numbers3 <- c(1, 2, 2, 2, 1, 2, 3)

Now, the output:

seqle(numbers1)
# $lengths
# [1] 3 1 2
# 
# $values
# [1] 1 5 7
# 
seqle(numbers2)
# $lengths
# [1] 1 2 4
# 
# $values
# [1] -2  2  5
# 
seqle(numbers3)
# $lengths
# [1] 2 1 1 3
# 
# $values
# [1] 1 2 2 1
#

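The part you are specifically interested in is the "lengths" element of the result.

Another interesting point is the incr argument. Here we can set the increment to, say, 2 and look for sequences where the difference between successive numbers is 2. So, for the first vector, we would expect it to detect the sequence 3, 5, 7.

Let's try it: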
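
To turn those lengths into a yes/no answer for the original question, one option is to compare them with n. `has_run` below is a hypothetical wrapper, not part of Carl Witthoft's code; seqle is copied from above so that the snippet runs on its own.

```r
# seqle, copied from the answer above
seqle <- function(x, incr = 1) {
  if (!is.numeric(x)) x <- as.numeric(x)
  n <- length(x)
  y <- x[-1L] != x[-n] + incr
  i <- c(which(y | is.na(y)), n)
  list(lengths = diff(c(0L, i)),
       values = x[head(c(0L, i) + 1L, -1L)])
}

# Hypothetical helper: is there a run of at least n numbers spaced incr apart?
has_run <- function(x, n, incr = 1) {
  any(seqle(x, incr)$lengths >= n)
}

numbers1 <- c(1, 2, 3, 5, 7, 8)
print(has_run(numbers1, 3))            # the run 1, 2, 3
print(has_run(numbers1, 4))            # no run that long
print(has_run(numbers1, 3, incr = 2))  # the run 3, 5, 7
```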

> seqle(numbers1, incr = 2)
$lengths
[1] 1 1 3 1

$values
[1] 1 2 3 8

So, with incr = 2, we get runs of length 1 (1), 1 (2), 3 (3, 5, 7) and 1 (8).

How does it fare with ECII's second challenge? Seems fine!

> numbers4 <- c(-2, -1, 0, 5, 7, 8)
> seqle(numbers4)
$lengths
[1] 3 1 2

$values
[1] -2  5  7

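【Discussion】:

If seqle is meant to work on numerics (doubles), y should be replaced with y <- abs(x[-1L] - x[-n] - incr) > .Machine$double.eps ^ 0.5. Otherwise, see what happens with, e.g., seqle(seq(0, 1, 1/17), incr = 1/17).
@flodel, that is evil! :) May I ask how you detect such cases? In other words, what made you decide to test 1/17?
My thinking went like this: why is carl.seqle slower than my flodel.which even though they are essentially the same? I noticed that I use integers while he deliberately uses numerics. Why would anyone want to check whether numerics are equally spaced? We also know that this is bound to run into floating-point issues. Did Carl fall into the trap? It looks like he did: x[-1L] != x[-n] + incr. Let's double-check with an example. I used 1/17 (an arbitrary pick), but 0.1 would have worked too. See, no magic!
Well, FWIW I certainly never intended this to be used on non-integers. I was working on communication theory (duh) at the time. Extending it to numeric sequences is indeed interesting. I will have to play with it for a while to see whether I can "break" @flodel's nice enhancement.
In my own sorry defense, I will point out that the packaged rle, whose code I clearly copied, does not check doubles either :-(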
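
The floating-point issue raised in these comments can be sketched as follows. `seqle_tol` is a hypothetical variant that applies the tolerance test suggested in the first comment; the `tol` argument name and its default are assumptions made here for illustration.

```r
# Exact comparison of doubles is fragile:
print(0.1 + 0.2 == 0.3)                    # FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))   # TRUE

# Hypothetical tolerant variant of seqle
seqle_tol <- function(x, incr = 1, tol = .Machine$double.eps^0.5) {
  if (!is.numeric(x)) x <- as.numeric(x)
  n <- length(x)
  y <- abs(x[-1L] - x[-n] - incr) > tol    # "not equal" up to the tolerance
  i <- c(which(y | is.na(y)), n)
  list(lengths = diff(c(0L, i)),
       values = x[head(c(0L, i) + 1L, -1L)])
}

x <- seq(0, 1, 1/17)                       # 18 equally spaced doubles
res <- seqle_tol(x, incr = 1/17)
print(res$lengths)                         # a single run covering all of x
```

For integer data the exact comparison in the original seqle is fine; the tolerance only matters once the values or the increment are non-integer doubles.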

【Solution 5】:

Simple, but it works

numbers = c(-2,2,3,4,5,10,6,7,8)
x1<-c(diff(numbers),0)
x2<-c(0,diff(numbers[-1]),0)
x3<-c(0,diff(numbers[c(-1,-2)]),0,0)

rbind(x1,x2,x3)
colSums(rbind(x1,x2,x3) )==3 #Returns TRUE or FALSE where in the vector the consecutive intervals triplet takes place
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

sum(colSums(rbind(x1,x2,x3) )==3) #How many triplets of consecutive intervals occur in the vector
[1] 3

which(colSums(rbind(x1,x2,x3) )==3) #Returns the location of the triplets consecutive integers
[1] 2 3 7

Note that, because of the way diff() works, this will not work for the consecutive negative interval c(-2,-1,0)

【Discussion】: