【Question title】: Determine value based on criteria
【Posted】: 2021-04-20 05:25:12
【Question】:

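I have measured the gas emissions of jars over a long period of time. My data set consists of three columns: date, time, jar. The jars were measured in time series, first according to "a", then "b", then "c", but my data set does not contain this information. I therefore want to create a new column in my data set stating whether the jar was measured according to "a", "b" or "c".

What I have tried so far has not produced the expected result. Any ideas?

The data looks like this: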

df <- structure(list(date = c("2021-03-14", "2021-03-14", "2021-03-14", 
"2021-03-14", "2021-03-14", "2021-03-14", "2021-03-14", "2021-03-14", 
"2021-03-14", "2021-03-14", "2021-03-14", "2021-03-14", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", 
"2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15", "2021-03-15"
), time = c("23:55:00", "23:56:00", "23:57:00", "23:58:00", "23:59:00", 
"00:01:00", "00:02:00", "00:03:00", "00:04:00", "00:05:00", "00:06:00", 
"00:07:00", "00:08:00", "00:09:00", "00:10:00", "00:11:00", "00:12:00", 
"00:13:00", "00:16:00", "00:17:00", "00:18:00", "00:19:00", "00:20:00", 
"00:21:00", "00:22:00", "00:23:00", "00:24:00", "00:25:00", "00:26:00", 
"00:27:00", "00:28:00", "00:29:00", "00:30:00", "00:31:00", "00:32:00", 
"00:33:00", "00:34:00", "00:35:00", "00:36:00", "00:37:00", "00:38:00", 
"00:39:00", "00:40:00", "00:41:00", "00:42:00", "00:43:00", "00:44:00", 
"00:46:00", "00:47:00", "00:48:00", "00:49:00", "00:50:00", "00:51:00", 
"00:52:00", "00:53:00", "00:54:00", "00:55:00", "00:56:00", "00:57:00", 
"00:58:00", "00:59:00", "01:00:00", "01:01:00", "01:02:00", "01:03:00", 
"01:04:00", "01:05:00", "01:06:00"), jar = c(1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 
3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L
), expected.outcome = c("a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", 
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "c", "c", 
"c", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c", 
"c", "c", "c", "c", "c", "c", "c", "c")), class = "data.frame", row.names = c(NA, 
-68L))

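【Comments】:

  • What is the logic by which the jar values get a, b, c?
  • Well, it is a bit technical, but let's just say that in the first and second time series of jar 1 and in the first time series of jars 2 and 3, "gas a" is added to the jars. In the third and fourth time series of jar 1 and in the second time series of jars 2 and 3, "gas b" is added to the jars, and so on. A measurement is taken every five seconds or so. If a jar has not been measured for at least 40 minutes, the time series has ended.

Tags: r tidyverse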


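【Solution 1】:

The goal seems to be to add a new column based on changes in the "jar" column.

  • If every row contained all the information needed to compute the new column's value for that row, this would be easier: you could simply define a new data.table column, probably with two nested "ifelse" calls. For example:
dt <- data.table::data.table(df)[, Gas:= ifelse(CONDITION1, "a", ifelse(CONDITION2, "b", "c"))]
  • Here, however, the value also seems to depend on other rows, so I do not think a one-liner will solve it.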
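To illustrate the row-wise case from the first bullet, here is a minimal sketch in base R (so it runs without extra packages). The column `series_id` is hypothetical and is not part of the question's data; it stands in for whatever per-row information CONDITION1 and CONDITION2 would test.

```r
# Hypothetical toy data: `series_id` is an assumed column that already
# tells us, row by row, which measurement round the row belongs to.
toy <- data.frame(jar = c(1L, 2L, 1L, 3L),
                  series_id = c(1L, 1L, 2L, 3L))

# Nested ifelse: each row is classified using only its own values.
toy$Gas <- ifelse(toy$series_id == 1L, "a",
                  ifelse(toy$series_id == 2L, "b", "c"))

print(toy)
```

This pattern only applies when no neighbouring rows are needed, which is not the case for the data in the question.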

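For example, in your data it looks as if expected.outcome moves on to the next letter every time jar jumps from 3 to 1 from one row to the next. (I am not sure this is the exact logic you are looking for, since you mention time series that end after 40 minutes; if it is not, you will need to change the criterion.) Based on that criterion, you can write a loop that walks through the data frame and builds the new column item by item.

The following loop implements the 3-to-1 criterion and reproduces the expected outcome: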
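If the rule from the comments is closer to what you need, one possible sketch counts how often each jar has been visited. It assumes that a "time series" is simply a run of consecutive rows with the same jar value, which holds for the sample data but ignores the 40-minute rule, and that jar 1 is visited twice per gas while jars 2 and 3 are visited once. The `jar` vector below is rebuilt with `rep()` to match the run pattern of the question's jar column.

```r
# Same run pattern as the question's jar column (68 rows).
jar <- rep(c(1L, 2L, 1L, 3L, 1L, 2L, 1L, 3L, 1L, 2L, 1L, 3L),
           times = c(5, 5, 7, 6, 5, 5, 7, 5, 5, 5, 7, 6))
toy <- data.frame(jar = jar)
gases <- c("a", "b", "c")

# Assumption: one run of identical jar values = one time series.
runs <- rle(toy$jar)

# For each run, which visit of that jar it is (1st, 2nd, ...).
visit <- ave(seq_along(runs$values), runs$values, FUN = seq_along)

# Jar 1: visits 1-2 -> gas 1, visits 3-4 -> gas 2, ...
# Jars 2 and 3: visit k -> gas k.
gas_idx <- ifelse(runs$values == 1L, ceiling(visit / 2), visit)

# Expand the per-run gas back to one value per row.
toy$Gas <- rep(gases[gas_idx], times = runs$lengths)

table(toy$Gas)
```

With real data, the run detection would have to be replaced by a check on the time gap between measurements of the same jar.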

addGasVector <- function(df)
{
  gases <- c("a", "b", "c")
  
  #initial values
  Gas <- vector() #will become a new column
  previousJar <- 0
  currentGas <- "a"
  
  #loops through every row to create a new column        
  for (row in seq_len(nrow(df))) #iterate over the function argument df, not a global dt
  {
    currentJar <- df[row, "jar"] 
    
    #criteria you identify for a change of gas, change accordingly
    if (previousJar == 3 && currentJar == 1)
      currentGas <- gases[match(currentGas, gases) + 1] #change of gas to next letter

    Gas <- c(Gas, currentGas) #adds the new column item
  
    previousJar <- currentJar #for the next iteration
  }
  
  df <- cbind(df, Gas) #adds the new column
  
  return(df)
}

View(addGasVector(df))
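
The loop above can also be written without iteration. The following is a minimal sketch of the same 3-to-1 criterion using `cumsum()`; the `jar` vector is rebuilt with `rep()` to match the run pattern of the question's jar column, and the names `toy`, `prev_jar` and `gas_change` are only illustrative.

```r
# Same run pattern as the question's jar column (68 rows).
jar <- rep(c(1L, 2L, 1L, 3L, 1L, 2L, 1L, 3L, 1L, 2L, 1L, 3L),
           times = c(5, 5, 7, 6, 5, 5, 7, 5, 5, 5, 7, 6))
toy <- data.frame(jar = jar)
gases <- c("a", "b", "c")

# Jar value of the previous row (0 for the first row, as in the loop).
prev_jar <- c(0L, head(toy$jar, -1))

# TRUE on every row where the jar jumps from 3 back to 1.
gas_change <- prev_jar == 3L & toy$jar == 1L

# Each jump advances the gas by one letter.
toy$Gas <- gases[cumsum(gas_change) + 1L]

table(toy$Gas)
```

Like the loop, this yields NA once there are more 3-to-1 jumps than entries in `gases`, so the vector of gas names has to cover every change in the full data set.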
