【Question title】: Filling NA values in a time series
【Posted】: 2021-04-21 14:23:35
【Question】:

The data I am working with:

> dput(head(data1))
structure(list(datetime_utc = c("2010-01-04 00:00:00", "2010-01-04 01:00:00", 
"2010-01-04 02:00:00", "2010-01-04 03:00:00", "2010-01-04 04:00:00", 
"2010-01-04 05:00:00"), Generation_BE = c(13143.7, 13143.7, 13143.7, 
13143.7, 13143.7, 13143.7), Generation_FR = c(63599, 62212, 62918, 
62613, 62432, 63411), Prices.BE = c(37.15, 33.47, 28, 21.29, 
16.92, 28), holidaysBE = c(0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 
6L), class = "data.frame")

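I checked my data and found missing values (NA). I then replaced the NA values with the median. My ultimate goal is to study Belgian prices, so I built a time series of the Belgian prices.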

My code is as follows:

library(dplyr)

# Check for NA values
sum(is.na(data1$Prices.BE))

# We stored the columns name with the missing values in the list called list_na
list_na <- colnames(data1)[ apply(data1, 1, anyNA) ]
list_na

# View rows where the Prices of Belgium is NA 
data1[is.na(data1$Prices.BE),]

# Replace the missing observations with the median 
median_missing <- apply(data1[,colnames(data1) %in% list_na],
                        1,
                        median,
                        na.rm =  TRUE)
newdata1 <- data1 %>%
  mutate(replace_median_Prices.BE  = ifelse(is.na(Prices.BE), median_missing[1], Prices.BE))
head(newdata1)

# Extract Belgium prices time series from data 
belgiumptimeseries <-ts(newdata1$Prices.BE, start =as.Date("2001-01-01"), frequency = 365*24)
belgiumptimeseries

# Plotting Time Series
plot(belgiumptimeseries)

library(tsfeatures)
tsfeatures(belgiumptimeseries)

 # Decomposing to estimate the trend, seasonal and random components of this time series
> belgiumptimeseries_componets <-decompose(belgiumptimeseries, type="additive")  
Error in na.omit.ts(x) : time series contains internal NAs
> plot(belgiumptimeseries_componets)
Error in plot(belgiumptimeseries_componets) : 
  object 'belgiumptimeseries_componets' not found

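The last lines of my code fail with an error indicating that the series contains NA values. What am I doing wrong, and which part of my code is not working properly? Any suggestion would be welcome; I can't figure out what is wrong with my code!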

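【Comments】:

  • With the dput data, the error I get at the decompose step is: time series has no or less than 2 periods
  • Also, list_na is character(0), so median_missing is all NA
  • What do you suggest in this case? What approach do I have to follow in order to do the decomposition? @akrun
  • Does the error come from the sample data or from the whole dataset? Your example dput has only 6 rows, which is why I got that error. Your error may be a different one
  • I mean that you are subsetting the columns based on the row index. It should be names(data1)[!apply(data1, 2, anyNA)]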
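
The point about row versus column indexing can be illustrated with a minimal sketch. The data frame `df` and its values are hypothetical, made up for illustration only; the sketch shows what `apply()` returns for each margin and one way to list the columns that actually contain an NA.

```r
# Toy data frame (hypothetical values) with NAs only in column "b"
df <- data.frame(a = c(1, 2, 3), b = c(NA, 5, NA), c = c(7, 8, 9))

# MARGIN = 1 checks each ROW: one logical per row
by_row <- apply(df, 1, anyNA)

# MARGIN = 2 checks each COLUMN: one logical per column
by_col <- apply(df, 2, anyNA)

# Names of the columns that contain at least one NA
cols_with_na <- names(df)[by_col]
cols_with_na

# sapply() gives the same per-column answer without converting to a matrix
cols_with_na_alt <- names(df)[sapply(df, anyNA)]
```

Indexing `colnames()` with the per-row result, as in the question, pairs a row-length logical vector with the column names, so it does not select the intended columns.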

Tags: r time-series na


【Solution 1】:

We can replace the NA elements with the median value and plot

library(dplyr)
library(zoo)
library(tsfeatures)

# // read the data
data1 <- read.csv(file.choose())
# // check for NAs column wise
colSums(is.na(data1))
# datetime_utc Generation_BE Generation_FR     Prices.BE    holidaysBE 
#            0             0             0            29             0 

# // replace the NA with the median of that column and 
# // only done for numeric and if there is any NA in the column
newdata1 <- data1 %>% 
       mutate(across(where(~ is.numeric(.x) && anyNA(.x)), 
                       ~ na.aggregate(.x, FUN = median))) 
# // check for NAs again column wise
colSums(is.na(newdata1))
# datetime_utc Generation_BE Generation_FR     Prices.BE    holidaysBE 
#    0             0             0             0            0 
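
One detail of the question's code is worth noting here: its `mutate()` call writes the imputed values into a new column, `replace_median_Prices.BE`, while `ts()` is still built from `Prices.BE`, so the NAs stay in the series. The base R sketch below illustrates the difference on a toy data frame; `toy` and its values are hypothetical, not taken from the real dataset.

```r
# Toy prices (hypothetical values) with two missing entries
toy <- data.frame(Prices.BE = c(10, NA, 30, NA, 50))

# Column median, ignoring NAs
med <- median(toy$Prices.BE, na.rm = TRUE)

# Imputing into a NEW column leaves the original column untouched
toy$replace_median_Prices.BE <- ifelse(is.na(toy$Prices.BE), med, toy$Prices.BE)
original_has_na <- anyNA(toy$Prices.BE)                 # original still has NAs
imputed_has_na  <- anyNA(toy$replace_median_Prices.BE)  # imputed column does not

# Overwriting the original column removes the NAs from the column ts() reads
toy$Prices.BE[is.na(toy$Prices.BE)] <- med
anyNA(toy$Prices.BE)
```

The `across()` call above overwrites the column in place, which is why the later `ts(newdata1$Prices.BE, ...)` no longer sees any NAs.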

Construct the time series

# // Extract Belgium prices time series from data 
belgiumptimeseries <- ts(newdata1$Prices.BE, 
         start = as.Date("2001-01-01"), frequency = 365*24)

Check the features

tsfeatures(belgiumptimeseries)
# A tibble: 1 x 20
#  frequency nperiods seasonal_period  trend   spike linearity curvature e_acf1 e_acf10 seasonal_streng…  peak trough
#     <dbl>    <dbl>           <dbl>  <dbl>   <dbl>     <dbl>     <dbl>  <dbl>   <dbl>            <dbl> <dbl>  <dbl>
#1      8760        1            8760 0.0552 5.93e-7     -35.3     -18.6  0.277   0.418            0.316  2000   3942
# … with 8 more variables: entropy <dbl>, x_acf1 <dbl>, x_acf10 <dbl>, diff1_acf1 <dbl>, diff1_acf10 <dbl>,
#   diff2_acf1 <dbl>, diff2_acf10 <dbl>, seas_acf1 <dbl>

Decompose the time series

belgiumptimeseries_componets <- decompose(belgiumptimeseries, type="additive")  

plot(belgiumptimeseries_componets)
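
The error quoted in the comments (time series has no or less than 2 periods) arises because `decompose()` needs at least two full seasonal periods. With `frequency = 365*24` that means at least 2 * 8760 = 17520 hourly observations. The following is a small sketch on synthetic data, assuming a daily cycle (`frequency = 24`) purely to keep the example short; all values are made up.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic hourly series: 10 days with a daily cycle plus noise (hypothetical values)
n_days <- 10
hours  <- seq_len(24 * n_days)
values <- 30 + 5 * sin(2 * pi * hours / 24) + rnorm(length(hours))

# frequency = 24 treats one day as one seasonal period
hourly_ts <- ts(values, frequency = 24)
parts <- decompose(hourly_ts, type = "additive")
names(parts)

# Fewer than two full periods (30 < 48 observations) makes decompose() stop
short_ts <- ts(values[1:30], frequency = 24)
res <- tryCatch(decompose(short_ts), error = function(e) conditionMessage(e))
res
```

If the full dataset covers less than two years of hourly data, a shorter seasonal period (for example daily or weekly) may be a more workable choice for the decomposition.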

【Discussion】:
