Package 'eNchange'

Title: Ensemble Methods for Multiple Change-Point Detection
Description: Implements a segmentation algorithm for multiple change-point detection in univariate time series using the Ensemble Binary Segmentation of Korkas (2020) <arXiv:2003.03649>.
Authors: Karolos K. Korkas
Maintainer: Karolos K. Korkas <[email protected]>
License: GPL (>= 2)
Version: 1.0
Built: 2025-03-07 04:03:51 UTC
Source: https://github.com/cran/eNchange


Ensemble Methods for Multiple Change-Point Detection

Description

Implements a segmentation algorithm for multiple change-point detection in univariate time series using the Ensemble Binary Segmentation of Korkas (2020) <arXiv:2003.03649>.

Details

We propose a new technique for consistent estimation of the number and locations of the change-points in the structure of an irregularly spaced time series. The core of the segmentation procedure is the Ensemble Binary Segmentation (EBS) method: the Binary Segmentation (BS) method is applied to a large number of sub-samples of the data of differing lengths, and the results are then combined into an overall answer. The methodology is applied to models for irregularly spaced time series, such as the time-varying Autoregressive Conditional Duration (ACD) model and the time-varying Hawkes process.
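The ensemble idea can be sketched in base R on a toy problem. The block below is only an illustration under loud assumptions: it uses Gaussian data with a single mean shift and a plain CUSUM statistic rather than the ACD or Hawkes residual process, the helpers `cusum_max` and `bin_seg` are hypothetical names, and the threshold and number of sub-samples are ad hoc choices. It is not the package's implementation.

```r
# Toy illustration only: mean-shift data and a plain CUSUM statistic,
# not the ACD/Hawkes residual process used by the package.

# Hypothetical helper: position and size of the largest CUSUM value.
cusum_max <- function(x) {
  n <- length(x)
  b <- seq_len(n - 1)
  left <- cumsum(x)[b]
  right <- sum(x) - left
  stat <- abs(sqrt((n - b) / (n * b)) * left -
              sqrt(b / (n * (n - b))) * right)
  list(pos = which.max(stat), value = max(stat))
}

# Hypothetical helper: recursive Binary Segmentation on one sub-sample.
bin_seg <- function(x, offset, thresh, min_len = 10) {
  n <- length(x)
  if (n < 2 * min_len) return(integer(0))
  cm <- cusum_max(x)
  if (cm$value <= thresh) return(integer(0))
  b <- cm$pos
  c(bin_seg(x[1:b], offset, thresh, min_len),
    offset + b,
    bin_seg(x[(b + 1):n], offset + b, thresh, min_len))
}

set.seed(1)
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 3))  # one change at t = 200
n <- length(x)
M <- 100                              # number of sub-samples (assumed)
thresh <- 1.5 * sqrt(2 * log(n))      # ad hoc threshold (assumed)
votes <- integer(n)

for (m in seq_len(M)) {
  s <- sample.int(n - 50, 1)                 # random start point
  e <- s + 49 + sample.int(n - s - 49, 1)    # random end point, e <= n
  cps <- bin_seg(x[s:e], offset = s - 1, thresh = thresh)
  votes[cps] <- votes[cps] + 1L              # tally detections per location
}

which.max(votes)  # most frequently detected location
```

Locations detected in many sub-samples accumulate votes, while spurious detections tend to be scattered, which is the intuition behind combining the runs.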

Author(s)

Karolos K. Korkas <[email protected]>.

Maintainer: Karolos K. Korkas <[email protected]>

References

Korkas, Karolos. "Ensemble Binary Segmentation for irregularly spaced data with change-points." arXiv preprint <arXiv:2003.03649> (2020).

Examples

## Not run: 
 pw.acd.obj <- new("simACD")
 pw.acd.obj@cp.loc <- seq(0.1,0.95,by=0.025)
 pw.acd.obj@lambda_0 <- rep(c(0.5,2),1+length(pw.acd.obj@cp.loc)/2)
 pw.acd.obj@alpha <- rep(0.2,1+length(pw.acd.obj@cp.loc))
 pw.acd.obj@beta <- rep(0.4,1+length(pw.acd.obj@cp.loc))
 pw.acd.obj@N <- 5000
 pw.acd.obj <- pc_acdsim(pw.acd.obj)
 ts.plot(pw.acd.obj@x, main = "Ensemble BS")
 # change-points detected by EBS in red
 abline(v = EnBinSeg(pw.acd.obj@x)[[1]], col = "red")
 # real change-points in grey
 abline(v = floor(pw.acd.obj@cp.loc * pw.acd.obj@N), col = "grey", lty = 2)
 ts.plot(pw.acd.obj@x, main = "Standard BS")
 # change-points detected by BS in blue
 abline(v = BinSeg(pw.acd.obj@x)[[1]], col = "blue")
 # real change-points in grey
 abline(v = floor(pw.acd.obj@cp.loc * pw.acd.obj@N), col = "grey", lty = 2)
## End(Not run)

An S4 method to detect the change-points in an irregularly spaced time series using Binary Segmentation.

Description

An S4 method to detect the change-points in an irregularly spaced time series using the Binary Segmentation methodology described in Korkas (2020).

Usage

BinSeg(
  H,
  thresh = "universal",
  q = 0.99,
  p = 1,
  z = NULL,
  start.values = c(0.9, 0.6),
  dampen.factor = "auto",
  epsilon = 1e-05,
  LOG = TRUE,
  process = "acd",
  acd_p = 0,
  acd_q = 1,
  do.parallel = 2
)

## S4 method for signature 'ANY'
BinSeg(
  H,
  thresh = "universal",
  q = 0.99,
  p = 1,
  z = NULL,
  start.values = c(0.9, 0.6),
  dampen.factor = "auto",
  epsilon = 1e-05,
  LOG = TRUE,
  process = "acd",
  acd_p = 0,
  acd_q = 1,
  do.parallel = 2
)

Arguments

H

The input irregular time series.

thresh

The threshold parameter which acts as a stopping rule to detect further change-points and has the form C log(sample). If "universal" then C is data-independent and preselected using the approach described in Korkas (2020). If "boot" it uses the data-dependent method boot_thresh. Default is "universal".

q

The universal threshold simulation quantile or the bootstrap distribution quantile. Default is 0.99.

p

The support of the CUSUM statistic. Default is 1.

z

The transformed time series to use for post-processing. If NULL, the transformation is computed automatically. Default is NULL.

start.values

Warm starts for the optimizers of the likelihood functions.

dampen.factor

The dampen factor in the denominator of the residual process. Default is "auto".

epsilon

A parameter added to ensure the boundedness of the residual process. Default is 1e-5.

LOG

Take the log of the residual process. Default is TRUE.

process

Choose between "acd" and "hawkes". Default is "acd".

acd_p

The p order of the ACD model. Default is 0.

acd_q

The q order of the ACD model. Default is 1.

do.parallel

Choose the number of cores for parallel computation. If 0 no parallelism is done. Default is 2. (Only applies if thresh = "boot").

Value

Returns a list with the detected change-points and the transformed series.
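As a small illustration of the stopping rule described under thresh, the sketch below compares a CUSUM maximum with a threshold of the form C log(n). It assumes that "sample" refers to the sample size n; the helper `keeps_splitting` and the value of C are hypothetical and are not taken from the package.

```r
# Hypothetical helper: BS keeps splitting only while the largest CUSUM
# value exceeds C * log(n). Reading "sample" as the sample size n is an
# assumption.
keeps_splitting <- function(max_cusum, n, C) {
  max_cusum > C * log(n)
}

C <- 0.5                      # illustrative constant, not the package's value
n <- c(500, 1000, 5000)
data.frame(n = n, threshold = C * log(n))

keeps_splitting(max_cusum = 4.2, n = 1000, C = C)
```

Because the threshold grows only logarithmically in n, longer series need only a slightly larger CUSUM value to justify a further split.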

References

Korkas, Karolos. "Ensemble Binary Segmentation for irregularly spaced data with change-points." arXiv preprint <arXiv:2003.03649> (2020).

Examples

pw.acd.obj <- new("simACD")
pw.acd.obj@cp.loc <- seq(0.1,0.95,by=0.025)
pw.acd.obj@lambda_0 <- rep(c(0.5,2),1+length(pw.acd.obj@cp.loc)/2)
pw.acd.obj@alpha <- rep(0.2,1+length(pw.acd.obj@cp.loc))
pw.acd.obj@beta <- rep(0.4,1+length(pw.acd.obj@cp.loc))
pw.acd.obj@N <- 5000
pw.acd.obj <- pc_acdsim(pw.acd.obj)
ts.plot(pw.acd.obj@x, main = "Standard BS")
# change-points detected by BS in blue
abline(v = BinSeg(pw.acd.obj@x)[[1]], col = "blue")
# real change-points in grey
abline(v = floor(pw.acd.obj@cp.loc * pw.acd.obj@N), col = "grey", lty = 2)

A bootstrap method to calculate the threshold (stopping rule) in the BS or EBS segmentation.

Description

A bootstrap method to calculate the threshold (stopping rule) in the BS or EBS segmentation, described in Cho and Korkas (2018) and adapted for irregularly spaced time series in Korkas (2020).

Usage

boot_thresh(
  H,
  q = 0.75,
  r = 100,
  p = 1,
  start.values = c(0.9, 0.6),
  process = "acd",
  do.parallel = 2,
  dampen.factor = "auto",
  epsilon = 1e-05,
  LOG = TRUE,
  acd_p = 0,
  acd_q = 1
)

## S4 method for signature 'ANY'
boot_thresh(
  H,
  q = 0.75,
  r = 100,
  p = 1,
  start.values = c(0.9, 0.6),
  process = "acd",
  do.parallel = 2,
  dampen.factor = "auto",
  epsilon = 1e-05,
  LOG = TRUE,
  acd_p = 0,
  acd_q = 1
)

Arguments

H

The input irregular time series.

q

The bootstrap distribution quantile. Default is 0.75.

r

The number of bootstrap simulations. Default is 100.

p

The support of the CUSUM statistic. Default is 1.

start.values

Warm starts for the optimizers of the likelihood functions.

process

Choose between "acd" and "hawkes". Default is "acd".

do.parallel

Choose the number of cores for parallel computation. If 0 no parallelism is done. Default is 2.

dampen.factor

The dampen factor in the denominator of the residual process. Default is "auto".

epsilon

A parameter added to ensure the boundedness of the residual process. Default is 1e-5.

LOG

Take the log of the residual process. Default is TRUE.

acd_p

The p order of the ACD model. Default is 0.

acd_q

The q order of the ACD model. Default is 1.

Value

Returns the threshold C.
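The general idea of a quantile-based threshold can be sketched as follows. This is a deliberately simplified stand-in, not the resampling scheme of Cho and Korkas (2018) or of the package: it assumes the residuals are i.i.d. Exp(1) when there is no change, simulates r such series, and takes the q-quantile of the maximum CUSUM value divided by log(n). The helpers `max_cusum` and `sketch_boot_C` are hypothetical names.

```r
# Hypothetical helper: largest absolute CUSUM value of a series.
max_cusum <- function(u) {
  n <- length(u)
  b <- seq_len(n - 1)
  left <- cumsum(u)[b]
  right <- sum(u) - left
  max(abs(sqrt((n - b) / (n * b)) * left -
          sqrt(b / (n * (n - b))) * right))
}

# Hypothetical helper: q-quantile of max CUSUM / log(n) over r simulated
# no-change series. Drawing i.i.d. Exp(1) residuals is a simplification.
sketch_boot_C <- function(n, q = 0.75, r = 100) {
  stats <- replicate(r, max_cusum(rexp(n, rate = 1)) / log(n))
  unname(quantile(stats, probs = q))
}

set.seed(42)
C_hat <- sketch_boot_C(n = 1000, q = 0.75, r = 100)
C_hat
```

A larger q gives a more conservative constant, so fewer change-points are declared.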

References

Cho, Haeran, and Karolos Korkas. "High-dimensional GARCH process segmentation with an application to Value-at-Risk." arXiv preprint <arXiv:1706.01155> (2018).

Examples

pw.acd.obj <- new("simACD")
pw.acd.obj@cp.loc <- c(0.25,0.75)
pw.acd.obj@lambda_0 <- c(1,2,1)
pw.acd.obj@alpha <- rep(0.2,3)
pw.acd.obj@beta <- rep(0.7,3)
pw.acd.obj@N <- 3000
pw.acd.obj <- pc_acdsim(pw.acd.obj)
boot_thresh(pw.acd.obj@x,r=20)

An S4 method to detect the change-points in an irregularly spaced time series using Ensemble Binary Segmentation.

Description

An S4 method to detect the change-points in an irregularly spaced time series using the Ensemble Binary Segmentation methodology described in Korkas (2020).

Usage

EnBinSeg(
  H,
  thresh = "universal",
  q = 0.99,
  p = 1,
  start.values = c(0.9, 0.6),
  dampen.factor = "auto",
  epsilon = 1e-05,
  LOG = TRUE,
  process = "acd",
  thresh2 = 0.05,
  num_ens = 500,
  min_dist = 0.005,
  pp = 1,
  do.parallel = 2,
  b = NULL,
  acd_p = 0,
  acd_q = 1
)

## S4 method for signature 'ANY'
EnBinSeg(
  H,
  thresh = "universal",
  q = 0.99,
  p = 1,
  start.values = c(0.9, 0.6),
  dampen.factor = "auto",
  epsilon = 1e-05,
  LOG = TRUE,
  process = "acd",
  thresh2 = 0.05,
  num_ens = 500,
  min_dist = 0.005,
  pp = 1,
  do.parallel = 2,
  b = NULL,
  acd_p = 0,
  acd_q = 1
)

Arguments

H

The input irregular time series.

thresh

The threshold parameter which acts as a stopping rule to detect further change-points and has the form C log(sample). If "universal" then C is data-independent and preselected using the approach described in Korkas (2020). If "boot" it uses the data-dependent method boot_thresh. Default is "universal".

q

The universal threshold simulation quantile or the bootstrap distribution quantile. Default is 0.99.

p

The support of the CUSUM statistic. Default is 1.

start.values

Warm starts for the optimizers of the likelihood functions.

dampen.factor

The dampen factor in the denominator of the residual process. Default is "auto".

epsilon

A parameter added to ensure the boundedness of the residual process. Default is 1e-5.

LOG

Take the log of the residual process. Default is TRUE.

process

Choose between "acd", "hawkes" and "additive" (signal + i.i.d. noise). Default is "acd".

thresh2

Keep only the change-points that appear more than thresh2 * M times, where M is the number of ensembles (num_ens). Default is 0.05.

num_ens

Number of ensembles denoted by M in the paper. Default is 500.

min_dist

The minimum distance as percentage of sample size to use in the post-processing. Default is 0.005.

pp

Post-process the change-points based on the distance from the highest ranked change-points. Default is 1.

do.parallel

Choose the number of cores for parallel computation. If 0 no parallelism is done. Default is 2.

b

A parameter to control how close the random end points are to the start points. A large value will on average return shorter random intervals. If NULL all points have an equal chance to be selected (uniformly distributed). Default is NULL.

acd_p

The p order of the ACD model. Default is 0.

acd_q

The q order of the ACD model. Default is 1.

Value

Returns a list with the detected change-points and the frequency table of the ensembles across M applications.
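The roles of thresh2 and min_dist can be illustrated with a short base R sketch. It assumes that the change-points found in each of the M runs are already available as a list, that a location is kept when its count exceeds thresh2 * M, and that nearby candidates are pruned in order of decreasing frequency. The helper `aggregate_votes` is a hypothetical name and the pruning rule is a guess at the spirit of the post-processing, not the package's code.

```r
# Hypothetical helper: combine change-points from M runs by voting.
aggregate_votes <- function(cp_list, n, thresh2 = 0.05, min_dist = 0.005) {
  M <- length(cp_list)
  all_cps <- unlist(cp_list)
  if (length(all_cps) == 0) return(integer(0))

  # Frequency table of detected locations across the M runs.
  freq <- table(all_cps)
  locs <- as.integer(names(freq))
  counts <- as.vector(freq)

  # Keep locations that appear more than thresh2 * M times.
  keep <- counts > thresh2 * M
  locs <- locs[keep]
  counts <- counts[keep]

  # Visit candidates from most to least frequent; drop any candidate
  # closer than min_dist * n to one already selected (assumed rule).
  selected <- integer(0)
  for (cp in locs[order(counts, decreasing = TRUE)]) {
    if (all(abs(cp - selected) > min_dist * n)) selected <- c(selected, cp)
  }
  sort(selected)
}

# Made-up detections from M = 10 runs on a series of length n = 1000.
cp_list <- list(c(100, 300), c(101, 300), 100, c(100, 299), integer(0),
                c(100, 300), 450, c(101, 300), 100, 300)
result <- aggregate_votes(cp_list, n = 1000, thresh2 = 0.15, min_dist = 0.005)
result
```

In this made-up input, locations 299 and 450 fall below the vote threshold, and 101 is dropped because it lies within min_dist * n of the more frequent location 100.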

References

Korkas, Karolos. "Ensemble Binary Segmentation for irregularly spaced data with change-points." arXiv preprint <arXiv:2003.03649> (2020).

Examples

pw.acd.obj <- new("simACD")
pw.acd.obj@cp.loc <- seq(0.1,0.95,by=0.025)
pw.acd.obj@lambda_0 <- rep(c(0.5,2),1+length(pw.acd.obj@cp.loc)/2)
pw.acd.obj@alpha <- rep(0.2,1+length(pw.acd.obj@cp.loc))
pw.acd.obj@beta <- rep(0.4,1+length(pw.acd.obj@cp.loc))
pw.acd.obj@N <- 5000
pw.acd.obj <- pc_acdsim(pw.acd.obj)
ts.plot(pw.acd.obj@x, main = "Ensemble BS")
# change-points detected by EBS in red
abline(v = EnBinSeg(pw.acd.obj@x)[[1]], col = "red")
# real change-points in grey
abline(v = floor(pw.acd.obj@cp.loc * pw.acd.obj@N), col = "grey", lty = 2)
ts.plot(pw.acd.obj@x, main = "Standard BS")
# change-points detected by BS in blue
abline(v = BinSeg(pw.acd.obj@x)[[1]], col = "blue")
# real change-points in grey
abline(v = floor(pw.acd.obj@cp.loc * pw.acd.obj@N), col = "grey", lty = 2)

A method to simulate nonstationary ACD models.

Description

An S4 method that takes a simACD object as input and outputs a simulated nonstationary ACD(1,1) model. The formulation of the piecewise constant ACD model is given in the simACD class.

Usage

pc_acdsim(object)

## S4 method for signature 'simACD'
pc_acdsim(object)

Arguments

object

a simACD object

Value

Returns an object of simACD class containing a simulated piecewise constant ACD time series.

References

Korkas, Karolos. "Ensemble Binary Segmentation for irregularly spaced data with change-points." arXiv preprint <arXiv:2003.03649> (2020).

Examples

pw.acd.obj <- new("simACD")
pw.acd.obj@cp.loc <- c(0.25,0.75)
pw.acd.obj@lambda_0 <- c(1,2,1)
pw.acd.obj@alpha <- rep(0.2,3)
pw.acd.obj@beta <- rep(0.7,3)
pw.acd.obj@N <- 3000
pw.acd.obj <- pc_acdsim(pw.acd.obj)
ts.plot(pw.acd.obj@x)
ts.plot(pw.acd.obj@psi)

A method to simulate nonstationary Hawkes models.

Description

An S4 method that takes a simHawkes object as input and outputs a simulated nonstationary Hawkes model. The formulation of the piecewise constant Hawkes model is given in the simHawkes class.

Usage

pc_hawkessim(object)

## S4 method for signature 'simHawkes'
pc_hawkessim(object)

Arguments

object

a simHawkes object

Value

Returns an object of simHawkes class containing a simulated piecewise constant Hawkes series.

References

Korkas, Karolos. "Ensemble Binary Segmentation for irregularly spaced data with change-points." arXiv preprint <arXiv:2003.03649> (2020).

Examples

pw.hawk.obj <- new("simHawkes")
pw.hawk.obj@cp.loc <- c(0.5)
pw.hawk.obj@lambda_0 <- c(1,2)
pw.hawk.obj@alpha <- c(0.2,0.2)
pw.hawk.obj@beta <- c(0.7,0.7)
pw.hawk.obj@horizon <- 1000
pw.hawk.obj <- pc_hawkessim(pw.hawk.obj)
ts.plot(pw.hawk.obj@H)
ts.plot(pw.hawk.obj@cH)

An S4 class for a nonstationary ACD model.

Description

A specification class to create an object of a simulated piecewise constant conditional duration model of order (1,1). The model is

$$x_t / \psi_t = \varepsilon_t \sim \mathcal{G}(\theta_2),$$

$$\psi_t = \omega(t) + \sum_{j=1}^p \alpha_{j}(t) x_{t-j} + \sum_{k=1}^q \beta_{k}(t) \psi_{t-k},$$

where $\psi_t = \mathcal{E}[x_t \mid x_{t-1}, \ldots, x_1; \theta_1]$ is the conditional mean duration of the $t$-th event with parameter vector $\theta_1$, and $\mathcal{G}(\cdot)$ is a general distribution over $(0, +\infty)$ with mean equal to 1 and parameter vector $\theta_2$. In this work we assume that $\varepsilon_t \sim \exp(1)$.
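The recursion above can be sketched in base R for the (1,1) case with Exp(1) innovations. The function `sim_pc_acd` is a hypothetical stand-in and not the package's pc_acdsim; its argument `omega` follows the symbol in the formula (mapping it to the lambda_0 slot is an assumption), and starting the recursion at the unconditional mean of the first segment is also an assumption.

```r
# Hypothetical helper, not the package's pc_acdsim(): simulate a piecewise
# constant ACD(1,1) series with Exp(1) innovations.
sim_pc_acd <- function(N, cp.loc, omega, alpha, beta, burn_in = 500) {
  stopifnot(length(omega) == length(cp.loc) + 1,
            length(alpha) == length(omega),
            length(beta) == length(omega))

  # Map each time index to its segment.
  breaks <- c(0, floor(cp.loc * N), N)
  seg <- rep(seq_along(omega), times = diff(breaks))

  # Start from the unconditional mean of the first segment (an assumption).
  psi_prev <- x_prev <- omega[1] / (1 - alpha[1] - beta[1])

  # Burn-in uses the first segment's parameters only.
  for (i in seq_len(burn_in)) {
    psi_prev <- omega[1] + alpha[1] * x_prev + beta[1] * psi_prev
    x_prev <- psi_prev * rexp(1)
  }

  x <- psi <- numeric(N)
  for (t in seq_len(N)) {
    k <- seg[t]
    # Conditional mean duration, then the duration itself.
    psi[t] <- omega[k] + alpha[k] * x_prev + beta[k] * psi_prev
    x[t] <- psi[t] * rexp(1)
    x_prev <- x[t]
    psi_prev <- psi[t]
  }
  list(x = x, psi = psi)
}

set.seed(123)
sim <- sim_pc_acd(N = 3000, cp.loc = c(0.25, 0.75),
                  omega = c(1, 2, 1), alpha = rep(0.2, 3), beta = rep(0.7, 3))
```

The sketch needs alpha + beta < 1 in the first segment for the starting value to be well defined; the parameter values mirror those used in the Examples.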

Value

Returns an object of simACD class.

Slots

x

The durational time series.

psi

The psi time series.

N

Sample size of the time series.

cp.loc

The vector with the locations of the change-points. Takes values from 0 to 1, or NULL if there are none. Default is NULL.

lambda_0

The vector of the parameters lambda_0 in the ACD series as in the above formula.

alpha

The vector of the parameters alpha in the ACD series as in the above formula.

beta

The vector of the parameters beta in the ACD series as in the above formula.

BurnIn

The size of the burn-in sample. Note that this only applies to the first simulated segment. Default is 500.

References

Korkas, Karolos. "Ensemble Binary Segmentation for irregularly spaced data with change-points." arXiv preprint <arXiv:2003.03649> (2020).

Examples

pw.acd.obj <- new("simACD")
pw.acd.obj@cp.loc <- c(0.25,0.75)
pw.acd.obj@lambda_0 <- c(1,2,1)
pw.acd.obj@alpha <- rep(0.2,3)
pw.acd.obj@beta <- rep(0.7,3)
pw.acd.obj@N <- 3000
pw.acd.obj <- pc_acdsim(pw.acd.obj)
ts.plot(pw.acd.obj@x)
ts.plot(pw.acd.obj@psi)

An S4 class for a nonstationary Hawkes model.

Description

A specification class to create an object of a simulated piecewise constant Hawkes model of order (1,1). We consider the following time-varying piecewise constant Hawkes process (which we term tvHawkes):

$$\lambda(\upsilon) = \lambda_0(\upsilon) + \sum_{\upsilon_t < \upsilon} \alpha(\upsilon) e^{-\beta(\upsilon)(\upsilon - \upsilon_t)}, \quad \text{for } \upsilon = 1, \ldots, T.$$
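The intensity formula can be evaluated directly once the event times are known. The sketch below is an illustration only: `tv_hawkes_intensity` is a hypothetical helper, the event times are made up, the summation is read as running over events that occurred before the evaluation time, and change-point locations are assumed to be fractions of the horizon as in the cp.loc slot.

```r
# Hypothetical helper: evaluate the piecewise constant Hawkes intensity at
# time u, using the parameters of the segment that contains u.
tv_hawkes_intensity <- function(u, events, horizon, cp.loc,
                                lambda_0, alpha, beta) {
  # Segment index of u; cp.loc is assumed to be a fraction of the horizon.
  k <- findInterval(u, cp.loc * horizon) + 1
  # Only events strictly before u contribute (assumed reading of the sum).
  past <- events[events < u]
  lambda_0[k] + sum(alpha[k] * exp(-beta[k] * (u - past)))
}

events <- c(100, 480, 520, 690)   # made-up event times
before <- tv_hawkes_intensity(50, events, horizon = 1000, cp.loc = 0.5,
                              lambda_0 = c(1, 2), alpha = c(0.2, 0.2),
                              beta = c(0.7, 0.7))
after <- tv_hawkes_intensity(700, events, horizon = 1000, cp.loc = 0.5,
                             lambda_0 = c(1, 2), alpha = c(0.2, 0.2),
                             beta = c(0.7, 0.7))
c(before = before, after = after)
```

With no earlier events the intensity reduces to the baseline of the current segment, and each past event adds an exponentially decaying contribution.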

Value

Returns an object of simHawkes class.

Slots

H

The durational time series.

cH

The psi time series.

horizon

The time horizon of a Hawkes process typically expressed in seconds. Effective sample size will differ depending on the size of the parameters.

N

Effective sample size which differs depending on the size of the parameters.

cp.loc

The vector with the locations of the change-points. Takes values from 0 to 1, or NULL if there are none. Default is NULL.

lambda_0

The vector of the parameters lambda_0 in the Hawkes model as in the above formula.

alpha

The vector of the parameters alpha in the Hawkes model as in the above formula.

beta

The vector of the parameters beta in the Hawkes model as in the above formula.

References

Korkas, Karolos. "Ensemble Binary Segmentation for irregularly spaced data with change-points." arXiv preprint <arXiv:2003.03649> (2020).

Examples

pw.hawk.obj <- new("simHawkes")
pw.hawk.obj@cp.loc <- c(0.5)
pw.hawk.obj@lambda_0 <- c(1,2)
pw.hawk.obj@alpha <- c(0.2,0.2)
pw.hawk.obj@beta <- c(0.7,0.7)
pw.hawk.obj@horizon <- 1000
pw.hawk.obj <- pc_hawkessim(pw.hawk.obj)
ts.plot(pw.hawk.obj@H)
ts.plot(pw.hawk.obj@cH)

Transformation of an irregularly spaced time series.

Description

Transformation of an irregularly spaced time series. For the tvACD model, we calculate $U_t = g_0(x_t, \psi_t) = \frac{x_t}{\psi_t}$, where

$$\psi_t = C_0 + \sum_{j=1}^p C_j x_{t-j} + \sum_{k=1}^q C_{p+k} \psi_{t-k} + \epsilon x_t,$$

and the last term $\epsilon x_t$ is added to ensure the boundedness of $U_t$.
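A minimal base R sketch of this transformation for p = q = 1 is given below. It assumes the coefficients C are already known (the package estimates them by likelihood optimisation, which is not reproduced here), and the initial value of psi is an arbitrary choice. The helper `acd_residuals` is a hypothetical name, not the package's Z_trans.

```r
# Hypothetical helper: residual process U_t = x_t / psi_t for p = q = 1,
# with coefficients C = c(C0, C1, C2) supplied by the caller.
acd_residuals <- function(x, C, epsilon = 1e-5, LOG = TRUE) {
  n <- length(x)
  psi <- numeric(n)
  psi[1] <- mean(x) + epsilon * x[1]   # initialisation: an assumption
  for (t in 2:n) {
    # The epsilon * x[t] term keeps psi[t] above epsilon * x[t],
    # so U_t stays below 1 / epsilon.
    psi[t] <- C[1] + C[2] * x[t - 1] + C[3] * psi[t - 1] + epsilon * x[t]
  }
  U <- x / psi
  if (LOG) log(U) else U
}

set.seed(7)
x <- rexp(500)                                   # made-up positive durations
U <- acd_residuals(x, C = c(0.1, 0.2, 0.6), LOG = FALSE)
Z <- acd_residuals(x, C = c(0.1, 0.2, 0.6), LOG = TRUE)
range(U)
```

For positive durations and non-negative coefficients, the ratio is bounded above by 1 / epsilon, which is what the added term is meant to guarantee.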

Usage

Z_trans(
  H,
  start.values = c(0.9, 0.6),
  dampen.factor = "auto",
  epsilon = 1e-05,
  LOG = TRUE,
  process = "acd",
  acd_p = 0,
  acd_q = 1
)

## S4 method for signature 'ANY'
Z_trans(
  H,
  start.values = c(0.9, 0.6),
  dampen.factor = "auto",
  epsilon = 1e-05,
  LOG = TRUE,
  process = "acd",
  acd_p = 0,
  acd_q = 1
)

Arguments

H

The input irregular time series.

start.values

Warm starts for the optimizers of the likelihood functions.

dampen.factor

The dampen factor in the denominator of the residual process. Default is "auto".

epsilon

A parameter added to ensure the boundedness of the residual process. Default is 1e-5.

LOG

Take the log of the residual process. Default is TRUE.

process

Choose between "acd" and "hawkes". Default is "acd".

acd_p

The p order of the ACD model. Default is 0.

acd_q

The q order of the ACD model. Default is 1.

Value

Returns the transformed residual series.

References

Korkas, Karolos. "Ensemble Binary Segmentation for irregularly spaced data with change-points." arXiv preprint <arXiv:2003.03649> (2020).

Examples

pw.acd.obj <- new("simACD")
pw.acd.obj@cp.loc <- c(0.25,0.75)
pw.acd.obj@lambda_0 <- c(1,2,1)
pw.acd.obj@alpha <- rep(0.2,3)
pw.acd.obj@beta <- rep(0.7,3)
pw.acd.obj@N <- 1000
pw.acd.obj <- pc_acdsim(pw.acd.obj)
ts.plot(Z_trans(pw.acd.obj@x))