postCP change point detection
Change-point detection algorithms are useful for analyzing time series data that exhibit abrupt changes in distribution. There are many R packages for detecting K change-points in N data points, but only a few can compute error bands (confidence intervals) for the change-point locations. The only R package that can compute error bands with provably linear O(N) time complexity is the postCP package. However, when I last checked (25 Nov 2015), CRAN had moved the postCP package to the Archive: “Archived on 2014-09-26 as misused .C(DUP = FALSE) and failed its checks on several platforms, including Solaris and Linux under valgrind.”
Some other R packages that can compute multiple change-point models:
paper | package | time | error bands? |
---|---|---|---|
Luong et al | postCP | O(N) | yes |
Rigaill et al | EBS | O(KN^2) | yes |
Frick et al | stepR | near-linear | yes |
Rigaill et al | cghseg | O(N) | no |
Killick et al | changepoint | O(N) | no |
Start from the most recent version of postCP on CRAN, and make the necessary changes to get it passing checks and back on CRAN. Writing a vignette and tests would also be nice.
postCP is based on a simple idea: a segmentation of n points into k segments can be seen as a Markov chain constrained to start in segment 1 and to end up in segment k. With this idea, any breakpoint model can be easily rewritten as a constrained Hidden Markov Model (HMM). Therefore postCP has two main steps:
- From a matrix of evidence (on the log scale), compute the posterior distribution of the segmentation using the forward-backward algorithm. The marginal posterior distribution of any breakpoint and/or any particular position can easily be derived from this quantity.
- From the posterior distribution and the data, update the parameter estimates (in an EM context).
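The first step above can be sketched as follows. This is a minimal illustration, not the code in postCP: the function names `log_sum_exp` and `breakpoint_posterior` are hypothetical, the log-evidence matrix is assumed to have one row per data point and one column per segment, and all valid segmentations are assumed to be equally likely a priori (every allowed transition has weight 1).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// log(exp(a) + exp(b)), safe when either argument is -infinity.
double log_sum_exp(double a, double b) {
    const double neg_inf = -std::numeric_limits<double>::infinity();
    if (a == neg_inf) return b;
    if (b == neg_inf) return a;
    const double m = std::max(a, b);
    return m + std::log(std::exp(a - m) + std::exp(b - m));
}

// Hypothetical helper. log_ev[i][j] is the log-evidence of point i under
// segment j (n rows, k columns, k <= n). Returns an (n-1) x (k-1) matrix whose
// entry [i][j] is the posterior probability that change-point j falls between
// positions i and i+1 (0-based). Assumes a uniform prior over segmentations.
Matrix breakpoint_posterior(const Matrix& log_ev) {
    const std::size_t n = log_ev.size();
    assert(n >= 1);
    const std::size_t k = log_ev[0].size();
    assert(k >= 1 && k <= n);
    const double neg_inf = -std::numeric_limits<double>::infinity();

    Matrix fwd(n, std::vector<double>(k, neg_inf));
    Matrix bwd(n, std::vector<double>(k, neg_inf));

    // Forward pass: the chain is constrained to start in the first segment
    // and may only stay in its segment or move to the next one.
    fwd[0][0] = log_ev[0][0];
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t j = 0; j < k; ++j) {
            const double from_prev = (j > 0) ? fwd[i - 1][j - 1] : neg_inf;
            fwd[i][j] = log_ev[i][j] + log_sum_exp(fwd[i - 1][j], from_prev);
        }
    }

    // Backward pass: the chain is constrained to end in the last segment.
    bwd[n - 1][k - 1] = 0.0;
    for (std::size_t i = n - 1; i-- > 0;) {
        for (std::size_t j = 0; j < k; ++j) {
            const double stay = log_ev[i + 1][j] + bwd[i + 1][j];
            const double move = (j + 1 < k)
                ? log_ev[i + 1][j + 1] + bwd[i + 1][j + 1]
                : neg_inf;
            bwd[i][j] = log_sum_exp(stay, move);
        }
    }

    // Log of the total evidence summed over all valid segmentations.
    const double log_z = fwd[n - 1][k - 1];

    Matrix post(n - 1, std::vector<double>(k - 1, 0.0));
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = 0; j + 1 < k; ++j) {
            post[i][j] = std::exp(fwd[i][j] + log_ev[i + 1][j + 1]
                                  + bwd[i + 1][j + 1] - log_z);
        }
    }
    return post;
}
```

Both passes visit each of the n × k cells once, so for a fixed number of segments the cost grows linearly with the number of data points. Each column of the returned matrix sums to 1, because each change-point occurs exactly once in every valid segmentation.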
It follows that postCP should have a technical core dedicated to the constrained forward-backward algorithm for any model (typically coded in Rcpp for speed), and a model-specific part (Gaussian regression, Poisson, survival, etc.) for computing the log-evidence and for the parameter updates.
Any good implementation of postCP should reflect this separation in a way that complies with standard R conventions.
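To make the split concrete, here is a minimal sketch of what a model-specific part could look like for the simplest case, a Gaussian model with one mean per segment and a known, shared standard deviation. The names `gaussian_log_evidence` and `update_means` are hypothetical and the model choice is an assumption for illustration; postCP's actual models and interfaces may differ. The weights passed to `update_means` stand for the posterior probability of each point belonging to each segment, as a forward-backward core would produce.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical model-specific step 1: Gaussian log-density of each point
// under each segment mean (rows are points, columns are segments).
Matrix gaussian_log_evidence(const std::vector<double>& x,
                             const std::vector<double>& mu,
                             double sigma) {
    assert(sigma > 0.0);
    const double two_pi = 2.0 * std::acos(-1.0);
    const double log_norm = -0.5 * std::log(two_pi * sigma * sigma);
    Matrix log_ev(x.size(), std::vector<double>(mu.size()));
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = 0; j < mu.size(); ++j) {
            const double z = (x[i] - mu[j]) / sigma;
            log_ev[i][j] = log_norm - 0.5 * z * z;
        }
    }
    return log_ev;
}

// Hypothetical model-specific step 2 (EM-style update): each segment mean
// becomes the average of the data weighted by w[i][j], the posterior
// probability that point i belongs to segment j.
std::vector<double> update_means(const std::vector<double>& x,
                                 const Matrix& w) {
    assert(!x.empty() && w.size() == x.size());
    const std::size_t k = w[0].size();
    std::vector<double> mu(k);
    for (std::size_t j = 0; j < k; ++j) {
        double num = 0.0;
        double den = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            num += w[i][j] * x[i];
            den += w[i][j];
        }
        assert(den > 0.0);  // assumes every segment has positive total weight
        mu[j] = num / den;
    }
    return mu;
}
```

With this split, supporting another model (Poisson, for example) would only require a different pair of functions with the same shapes, while the forward-backward core stays unchanged.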
Gregory Nuel <[email protected]> would be the primary mentor since he knows the theory and C++ code implemented in the postCP package. Guillem Rigaill <[email protected]> could be a co-mentor, since he has developed several other R packages for change-point detection.
Do one or several of the tests below; completing more of the harder tests makes you more likely to be selected.
- Easy: download and install the most recent version of postCP on CRAN, then run `example(postCP)` and explain why the output is problematic.
- Medium: demonstrate that you know how to write vignettes and package tests. What tests would you suggest to verify that the postCP function is working?
- Hard: demonstrate that you know how to interface C++ code with R using `.C` and/or `.Call`. Have you already written an R package that uses C code? If not, you may want to read “When and how to write low-level (C/C++) instead of high-level (R/Python) code?” Then write a solution to this exercise to prove that you can write an interface using `.C`.
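For orientation, the calling convention that `.C` expects can be illustrated with a minimal sketch. The function `cumsum_c` and the file name `cumsum.cpp` are hypothetical and unrelated to the exercise or to postCP. The points it shows are that every argument arrives as a pointer, the function returns `void`, results are written into an output array supplied by the caller, and C++ code needs `extern "C"` so that the symbol name is not mangled.

```cpp
// Hypothetical example function (not part of postCP): cumulative sum of x.
// .C passes every argument as a pointer and ignores the return value,
// so the result is written into a caller-allocated output array.
extern "C" void cumsum_c(double* x, int* n, double* out) {
    double total = 0.0;
    for (int i = 0; i < *n; ++i) {
        total += x[i];
        out[i] = total;
    }
}

// Assumed build and call from R, with the code saved as cumsum.cpp:
//   R CMD SHLIB cumsum.cpp
//   dyn.load(paste0("cumsum", .Platform$dynlib.ext))
//   x <- c(1, 2, 3.5)
//   .C("cumsum_c", as.double(x), as.integer(length(x)),
//      out = double(length(x)))$out
```

On the R side, the explicit `as.double` and `as.integer` coercions matter because `.C` does not convert types: the storage mode of each R vector must match the pointer type in the C signature.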
Students, please post a link to your test results here.