Skip to content

postCP change point detection

Toby Dylan Hocking edited this page Dec 7, 2015 · 17 revisions

Background

Change-point detection algorithms are useful for analyzing time series data that exhibit abrupt changes in distribution. There are many R packages for detecting K change-points in N data points, but only a few are able to compute error bands (confidence intervals) for change point locations. The only R package that can compute error bands with provably linear O(N) time complexity is the postCP package. However, the last time I checked (25 Nov 2015), CRAN has put the postCP package in the Archive: “Archived on 2014-09-26 as misused .C(DUP = FALSE) and failed its checks on several platforms, including Solaris and Linux under valgrind.”

Related work

Some other R packages that can compute multiple change-point models:

paper package time error bands?
Luong et al postCP O(N) yes
Rigaill et al EBS O(KN^2) yes
Frick et al stepR near-linear yes
Rigaill et al cghseg O(N) no
Killick et al changepoint O(N) no

Coding project ideas

postCP back on CRAN

Start from the most recent version of postCP on CRAN, and make the necessary changes to get it passing checks and back on CRAN. Writing a vignette and tests would also be nice.

Separating model-specific and generic parts of the code

postCP is based on a simple idea: a segmentation of n points into k segments can be seen as a Markov chain constrained to start in segment 1 and to end up in segment k. With this idea, any breakpoint model can be easily rewritten as a constrained Hidden Markov Model (HMM). Therefore postCP has two main steps:

  1. from a matrix of evidence (in logscale) one has to compute the posterior distribution of segmentation using forward-backward. One can easily derive from this quantity the marginal posterior distribution of any breakpoint and/or particular position.
  2. from a posterior distribution and the data, one can update the parameter estimate (in EM context)

It is hence clear that postCP should have a technical core part dedicated to constrained forward-backward for any model (typically coded in Rcpp for speed), and a model specific part (Gaussian regression, Poisson, survival, etc.) for the computation of the log-evidence and for the parameter updates.

Any good implementation of postCP should take this into account in the most R-standard compliant way.

Mentors

Gregory Nuel <[email protected]> would be the primary mentor since he knows the theory and C++ code implemented in the postCP package. Guillem Rigaill <[email protected]> could be a co-mentor, since he has developed several other R packages for change-point detection.

Tests

Do one or several — doing more hard tests makes you more likely to be selected.

Solutions of tests

Students, please post a link to your test results here.

Clone this wiki locally