Title: | Multiple Change-Point Detection for Nonstationary Time Series |
---|---|
Description: | Implements detection for the number and locations of the change-points in a time series using the Wild Binary Segmentation and the Locally Stationary Wavelet model of Korkas and Fryzlewicz (2017) <doi:10.5705/ss.202015.0262>. |
Authors: | Karolos Korkas and Piotr Fryzlewicz |
Maintainer: | Karolos Korkas <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1 |
Built: | 2025-03-07 03:48:23 UTC |
Source: | https://github.com/cran/wbsts |
Implements the Wild Binary Segmentation method of Fryzlewicz (2014) for nostationary time series as described in Korkas and Fryzlewicz (2017). Its purpose is the estimation of the number and locations of the change-points in a time series utilising the wavelet periodogram.
K. Korkas and P. Fryzlewicz
P. Fryzlewicz (2014), Wild Binary Segmentation for multiple change-point detection. Annals of Statistics, 42, 2243-2281. (http://stats.lse.ac.uk/fryzlewicz/wbs/wbs.pdf)
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
#### Generate a highly persistent time series with changing variance and of length 5,000 ###Location of the change-points #cps=seq(from=1000,to=2800,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,11),b.slope2 = rep(0.,11), mac = rep(0.,11),br.loc = cps)[[2]] ###Estimate the change points via Binary Segmentation #wbs.lsw(y,M=1)$cp.aft ###Estimate the change points via Wild Binary Segmentation #wbs.lsw(y,M=0)$cp.aft
#### Generate a highly persistent time series with changing variance and of length 5,000 ###Location of the change-points #cps=seq(from=1000,to=2800,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,11),b.slope2 = rep(0.,11), mac = rep(0.,11),br.loc = cps)[[2]] ###Estimate the change points via Binary Segmentation #wbs.lsw(y,M=1)$cp.aft ###Estimate the change points via Wild Binary Segmentation #wbs.lsw(y,M=0)$cp.aft
The function finds the value which yields the maximum inner product with the input time series (CUSUM) located between and
of their support across all the wavelet periodogram scales.
cr.rand.max.inner.prod(XX,Ts,C_i,epp,M = 0,Plot = FALSE,cstar=0.95)
cr.rand.max.inner.prod(XX,Ts,C_i,epp,M = 0,Plot = FALSE,cstar=0.95)
XX |
The wavelet periodogram. |
Ts |
The sample size of the series. |
C_i |
The CUSUM threshold. |
epp |
A minimum adjustment for the bias present in |
M |
Number of random CUSUM to be generated. |
Plot |
Plot the threhsold CUSUM statistics across the wavelet scales. |
cstar |
A scalar in (0.67,1] |
1 |
Candidate change point |
2 |
The maximum CUSUM value |
3 |
The starting point |
4 |
The ending point |
K. Korkas and P. Fryzlewicz
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
#cps=seq(from=1000,to=2000,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,7),b.slope2 = rep(0.,7), mac = rep(0.,7),br.loc = cps)[[2]] #z=ews.trans(y,scales=c(11,9,8,7,6)) #out=cr.rand.max.inner.prod(z, Ts = length(y),C_i = tau.fun(y), #epp = rep(32,5), M = 2000, cstar = 0.75, Plot = 1) #abline(v=cps,col="red")
#cps=seq(from=1000,to=2000,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,7),b.slope2 = rep(0.,7), mac = rep(0.,7),br.loc = cps)[[2]] #z=ews.trans(y,scales=c(11,9,8,7,6)) #out=cr.rand.max.inner.prod(z, Ts = length(y),C_i = tau.fun(y), #epp = rep(32,5), M = 2000, cstar = 0.75, Plot = 1) #abline(v=cps,col="red")
This function is an internal C++ function wrapped by finner.prod.iter.
cusum(x)
cusum(x)
x |
A time series |
K. Korkas and P. Fryzlewicz
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
cps=seq(from=1000,to=2000,by=200) y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1), b.slope=rep(0.99,7),b.slope2 = rep(0.,7), mac = rep(0.,7),br.loc = cps)[[2]] z=ews.trans(y,scales=c(11,9,8,7,6)) ts.plot(abs(wbsts::cusum(z[10:2990,2])))
cps=seq(from=1000,to=2000,by=200) y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1), b.slope=rep(0.99,7),b.slope2 = rep(0.,7), mac = rep(0.,7),br.loc = cps)[[2]] z=ews.trans(y,scales=c(11,9,8,7,6)) ts.plot(abs(wbsts::cusum(z[10:2990,2])))
The function computes the EWS from a time series of any (non-dyadic) size by utilising the maximal overlap discrete wavelet transform; see also W. Constantine and D. Percival (2015).
ews.trans(x,scales=NULL)
ews.trans(x,scales=NULL)
x |
The time series. |
scales |
The wavelet periodogram scales to compute starting from the finest. |
The evolutionary wavelet spectral estimate of y.
Eric Aldrich (2020), wavelets: Functions for Computing Wavelet Filters, Wavelet Transforms and Multiresolution Analyses.
ews=ews.trans(rnorm(1000),c(9,8,7)) barplot(ews[,1])
ews=ews.trans(rnorm(1000),c(9,8,7)) barplot(ews[,1])
The function returns universal thresholds and the method is described in Korkas and Fryzlewicz (2017) and Cho and Fryzlewicz (2012). See also the supplementary material for the former work. The function works for any sample size.
get.thres(n, q=.95, r=100, scales=NULL)
get.thres(n, q=.95, r=100, scales=NULL)
n |
The length of the time series. |
q |
The quantile of the r simulations. |
r |
Number of simulations. |
scales |
The wavelet periodogram scales to be used. If NULL (DEFAULT) then this is selected as described in the main text. |
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
K. Korkas and P. Fryzlewicz (2017), Supplementary material: Multiple change-point detection for non-stationary time series using Wild Binary Segmentation.
Cho, H. and Fryzlewicz, P. (2012). Multiscale and multilevel technique for consistent segmentation of nonstationary time series. Statistica Sinica, 22(1), 207-229.
The function returns data-driven thresholds and it is described in Korkas and Fryzlewicz (2015) where it is referred as Bsp1. See also the supplementary material for this work.
get.thres.ar(y, q=.95, r=100, scales=NULL)
get.thres.ar(y, q=.95, r=100, scales=NULL)
y |
The time series. |
q |
The quantile of the r simulations. |
r |
Number of simulations. |
scales |
The wavelet periodogram scales to be used. If NULL (DEFAULT) then this is selected as described in the main text. |
K. Korkas and P. Fryzlewicz
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
K. Korkas and P. Fryzlewicz (2017), Supplementary material: Multiple change-point detection for non-stationary time series using Wild Binary Segmentation.
#cps=seq(from=100,to=1200,by=350) #y=sim.pw.arma(N =1200,sd_u = c(1,1.5,1,1.5,1), #b.slope=rep(0.99,5),b.slope2 = rep(0.,5), mac = rep(0.,5),br.loc = cps)[[2]] #C_i=get.thres.ar(y=y, q=.95, r=100, scales=NULL) #wbs.lsw(y,M=1, C_i = C_i)$cp.aft
#cps=seq(from=100,to=1200,by=350) #y=sim.pw.arma(N =1200,sd_u = c(1,1.5,1,1.5,1), #b.slope=rep(0.99,5),b.slope2 = rep(0.,5), mac = rep(0.,5),br.loc = cps)[[2]] #C_i=get.thres.ar(y=y, q=.95, r=100, scales=NULL) #wbs.lsw(y,M=1, C_i = C_i)$cp.aft
This function is an internal C++ function wrapped by cr.rand.max.inner.prod.
multi_across_fip(X,M,min_draw,tau,p,epp,Ts)
multi_across_fip(X,M,min_draw,tau,p,epp,Ts)
X |
The wavelet periodogram. |
Ts |
The sample size of the series. |
tau |
The CUSUM threshold at each scale. |
min_draw |
Minimal size of a single draw. |
epp |
A minimum adjustment for the bias present in |
M |
Number of random CUSUM to be generated. |
p |
A scalar in (0.67,1] |
1 |
Candidate change point |
2 |
The maximum CUSUM value |
3 |
The starting point |
4 |
The ending point |
K. Korkas and P. Fryzlewicz
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
#cps=seq(from=1000,to=2000,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,7),b.slope2 = rep(0.,7), mac = rep(0.,7),br.loc = cps)[[2]] #z=ews.trans(y,scales=c(11,9,8,7,6)) #out=multi_across_fip(X=z, M=1000, min_draw=100, #tau=tau.fun(y), p=c(.95,.95),epp=rep(32,5),Ts= length(y))
#cps=seq(from=1000,to=2000,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,7),b.slope2 = rep(0.,7), mac = rep(0.,7),br.loc = cps)[[2]] #z=ews.trans(y,scales=c(11,9,8,7,6)) #out=multi_across_fip(X=z, M=1000, min_draw=100, #tau=tau.fun(y), p=c(.95,.95),epp=rep(32,5),Ts= length(y))
A function to control the number of change-points estimated from the WBS algorithm and to reduce the risk of over-segmentation.
post.processing(z,br,del=-1,epp=-1,C_i=NULL,scales=NULL)
post.processing(z,br,del=-1,epp=-1,C_i=NULL,scales=NULL)
z |
The wavelet periodogram matrix. |
br |
The change-points to be post-processed. |
del |
The minimum allowed size of a segment. |
epp |
A minimum adjustment for the bias present in |
C_i |
The CUSUM threshold. |
scales |
Which wavelet periodogram scales to be used. |
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
The function simulates a piecewise constant AR(1) model with multiple change-points
sim.pw.ar(N, sd_u, b.slope, br.loc)
sim.pw.ar(N, sd_u, b.slope, br.loc)
N |
Length of the series. |
sd_u |
A vector of the innovation standard deviation for every segment. |
b.slope |
A vector of the AR(1) coefficients. |
br.loc |
A vector with the location of the change-points. |
A simulated series
cps=c(400,612) y=sim.pw.ar(N =1024,sd_u = 1,b.slope=c(0.4,-0.6,0.5),br.loc=cps)[[2]] ts.plot(y) abline(v=cps,col="red")
cps=c(400,612) y=sim.pw.ar(N =1024,sd_u = 1,b.slope=c(0.4,-0.6,0.5),br.loc=cps)[[2]] ts.plot(y) abline(v=cps,col="red")
The function simulates a piecewise constant AR(2) model with multiple change-points
sim.pw.ar2(N, sd_u, b.slope, b.slope2, br.loc)
sim.pw.ar2(N, sd_u, b.slope, b.slope2, br.loc)
N |
Length of the series |
sd_u |
A vector of the innovation standard deviation for every segment |
b.slope |
A vector of the AR(1) coefficients |
b.slope2 |
A vector of the AR(2) coefficients |
br.loc |
A vector with the location of the change-points |
A simulated series
cps=c(512,754) y=sim.pw.ar2(N =1024,sd_u = 1,b.slope=c(0.9,1.68,1.32), b.slope2 = c(0.0,-0.81,-0.81),br.loc = cps)[[2]] ts.plot(y) abline(v=cps,col="red")
cps=c(512,754) y=sim.pw.ar2(N =1024,sd_u = 1,b.slope=c(0.9,1.68,1.32), b.slope2 = c(0.0,-0.81,-0.81),br.loc = cps)[[2]] ts.plot(y) abline(v=cps,col="red")
The function simulates a piecewise constant ARMA model with multiple change-points
sim.pw.arma(N, sd_u, b.slope, b.slope2, mac, br.loc)
sim.pw.arma(N, sd_u, b.slope, b.slope2, mac, br.loc)
N |
Length of the series |
sd_u |
A vector of the innovation standard deviation for every segment |
b.slope |
A vector of the AR(1) coefficients |
b.slope2 |
A vector of the AR(2) coefficients |
mac |
A vector of the MA(1) coefficients |
br.loc |
A vector with the location of the change-points |
A simulated series
cps=c(125,532,704) y=sim.pw.arma(N = 1024,sd_u = 1,b.slope=c(0.7,0.3,0.9,0.1), b.slope2 = c(0,0,0,0), mac = c(0.6,0.3,0,-0.5),br.loc = cps)[[2]] ts.plot(y) abline(v=cps,col="red")
cps=c(125,532,704) y=sim.pw.arma(N = 1024,sd_u = 1,b.slope=c(0.7,0.3,0.9,0.1), b.slope2 = c(0,0,0,0), mac = c(0.6,0.3,0,-0.5),br.loc = cps)[[2]] ts.plot(y) abline(v=cps,col="red")
The function returns .
tends to increase as we move to coarser scales due to
the increasing dependence in the wavelet periodogram sequences. Since the method applies to non-dyadic structures it is reasonable to propose a general rule that will apply in most cases. To accomplish this the
are obtained for
. Then, for each scale
the following regression is fitted
The adjusted was above 90% for all the scales. Having estimated the values for
the values can be retrieved for any sample size
.
tau.fun(y)
tau.fun(y)
y |
A time series |
Thresholds for every wavelet scale
K. Korkas and P. Fryzlewicz
P. Fryzlewicz (2014), Wild Binary Segmentation for multiple change-point detection. Annals of Statistics, 42, 2243-2281. (http://stats.lse.ac.uk/fryzlewicz/wbs/wbs.pdf)
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
##not run## #cps=c(400,470) #set.seed(101) #y=sim.pw.ar(N =2000,sd_u = 1,b.slope=c(0.4,-0.6,0.5),br.loc=cps)[[2]] #tau.fun(y) is the default value for C_i ##Binary segmentation #wbs.lsw(y,M=1)$cp.aft ##Wild binary segmentation #wbs.lsw(y,M=3500)$cp.aft
##not run## #cps=c(400,470) #set.seed(101) #y=sim.pw.ar(N =2000,sd_u = 1,b.slope=c(0.4,-0.6,0.5),br.loc=cps)[[2]] #tau.fun(y) is the default value for C_i ##Binary segmentation #wbs.lsw(y,M=1)$cp.aft ##Wild binary segmentation #wbs.lsw(y,M=3500)$cp.aft
The function implements the Wild Binary Segmentation method and aggregates the change-points across the wavelet periodogram. Currently only the Method 2 of aggregation is implemented.
uh.wbs(z,C_i, del=-1, epp, scale,M=0,cstar=0.75)
uh.wbs(z,C_i, del=-1, epp, scale,M=0,cstar=0.75)
z |
The wavelet periodogram matrix. |
C_i |
The CUSUM threshold. |
del |
The minimum allowed size of a segment. |
epp |
A minimum adjustment for the bias present in |
scale |
Which wavelet periodogram scales to be used. |
M |
The maximum number of random intervals drawn. If M=0 (DEFAULT) this is selected to be a linear function of the sample size of y. If M=1 then the segmentation is conducted via the Binary segmentation method. |
cstar |
This refers to the unbalanceness parameter |
cp.bef |
Returns the estimated change-points before post-processing |
cp.aft |
Returns the estimated change-points after post-processing |
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
#### Generate a highly persistent time series with changing variance and of length 5,000 ###Location of the change-points #cps=seq(from=1000,to=2800,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,11),b.slope2 = rep(0.,11), mac = rep(0.,11),br.loc = cps)[[2]] ###Estimate the change points via Binary Segmentation #wbs.lsw(y,M=1)$cp.aft ###Estimate the change points via Wild Binary Segmentation #wbs.lsw(y,M=0)$cp.aft
#### Generate a highly persistent time series with changing variance and of length 5,000 ###Location of the change-points #cps=seq(from=1000,to=2800,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,11),b.slope2 = rep(0.,11), mac = rep(0.,11),br.loc = cps)[[2]] ###Estimate the change points via Binary Segmentation #wbs.lsw(y,M=1)$cp.aft ###Estimate the change points via Wild Binary Segmentation #wbs.lsw(y,M=0)$cp.aft
The function returns the estimated locations of the change-points in a nonstationary time series. Currently only the Method 2 of aggregation is implemented.
wbs.lsw(y, C_i = tau.fun(y), scales = NULL, M = 0, cstar = 0.75, lambda = 0.75)
wbs.lsw(y, C_i = tau.fun(y), scales = NULL, M = 0, cstar = 0.75, lambda = 0.75)
y |
The time series. |
C_i |
A vector of threshold parameters for different scales. |
scales |
The wavelet periodogram scales to be used. If NULL (DEFAULT) then this is selected as described in the main text. |
M |
The maximum number of random intervals drawn. If M=0 (DEFAULT) this is selected to be a linear function of the sample size of y. If M=1 then the segmentation is conducted via the Binary segmentation method. |
cstar |
This refers to the unbalanceness parameter |
lambda |
This parameter defines the maximum number of the wavelet periodogam scales. This is used if scales = NULL. |
cp.bef |
Returns the estimated change-points before post-processing |
cp.aft |
Returns the estimated change-points after post-processing |
K. Korkas and P. Fryzlewicz
K. Korkas and P. Fryzlewicz (2017), Multiple change-point detection for non-stationary time series using Wild Binary Segmentation. Statistica Sinica, 27, 287-311. (http://stats.lse.ac.uk/fryzlewicz/WBS_LSW/WBS_LSW.pdf)
#### Generate a highly persistent time series with changing variance and of length 5,000 ###Location of the change-points #cps=seq(from=1000,to=2800,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,11),b.slope2 = rep(0.,11), mac = rep(0.,11),br.loc = cps)[[2]] ###Estimate the change points via Binary Segmentation #wbs.lsw(y,M=1)$cp.aft ###Estimate the change points via Wild Binary Segmentation #wbs.lsw(y,M=0)$cp.aft
#### Generate a highly persistent time series with changing variance and of length 5,000 ###Location of the change-points #cps=seq(from=1000,to=2800,by=200) #y=sim.pw.arma(N =3000,sd_u = c(1,1.5,1,1.5,1,1.5,1,1.5,1,1.5,1), #b.slope=rep(0.99,11),b.slope2 = rep(0.,11), mac = rep(0.,11),br.loc = cps)[[2]] ###Estimate the change points via Binary Segmentation #wbs.lsw(y,M=1)$cp.aft ###Estimate the change points via Wild Binary Segmentation #wbs.lsw(y,M=0)$cp.aft