postCP change point detection

Background

Change-point detection algorithms are useful for analyzing time series data that exhibit abrupt changes in distribution. There are many R packages for detecting K change-points in N data points, but only a few are able to compute error bands (confidence intervals) for change point locations. The only R package that can compute error bands with provably linear O(N) time complexity is the postCP package. However, the last time I checked (25 Nov 2015), CRAN has put the postCP package in the Archive: “Archived on 2014-09-26 as misused .C(DUP = FALSE) and failed its checks on several platforms, including Solaris and Linux under valgrind.”

Related work

Some other R packages that can compute multiple change-point models:

paper	package	time	error bands?
Luong et al	postCP	O(N)	yes
Rigaill et al	EBS	O(KN^2)	yes
Frick et al	stepR	near-linear	yes
Rigaill et al	cghseg	O(N)	no
Killick et al	changepoint	O(N)	no

Coding project ideas

postCP back on CRAN

Start from the most recent version of postCP on CRAN, and make the necessary changes to get it passing checks and back on CRAN. Writing a vignette and tests would also be nice.

Separating model-specific and generic parts of the code

postCP is based on a simple idea: a segmentation of n points into k segments can be seen as a Markov chain constrained to start in segment 1 and to end up in segment k. With this idea, any breakpoint model can be easily rewritten as a constrained Hidden Markov Model (HMM). Therefore postCP has two main steps:

from a matrix of evidence (in logscale) one has to compute the posterior distribution of segmentation using forward-backward. One can easily derive from this quantity the marginal posterior distribution of any breakpoint and/or particular position.
from a posterior distribution and the data, one can update the parameter estimate (in EM context)

It is hence clear that postCP should have a technical core part dedicated to constrained forward-backward for any model (typically coded in Rcpp for speed), and a model specific part (Gaussian regression, Poisson, survival, etc.) for the computation of the log-evidence and for the parameter updates.

Any good implementation of postCP should take this into account in the most R-standard compliant way.

Mentors

Gregory Nuel <[email protected]> would be the primary mentor since he knows the theory and C++ code implemented in the postCP package. Guillem Rigaill <[email protected]> could be a co-mentor, since he has developed several other R packages for change-point detection.

Tests

Do one or several — doing more hard tests makes you more likely to be selected.

Easy: download and install the most recent version of postCP on CRAN, then run example(postCP) and explain why the output is problematic.
Medium: demonstrate that you know how to write vignettes and package tests. What tests would you suggest to verify that the postCP function is working?
Hard: demonstrate that you know how to interface C++ code with R using .C and/or .Call. Have you already written an R package that uses C code? If not, you may want to read ”When and how to write low-level (C/C++) instead of high-level (R/Python) code?” Then write a solution to this exercise to prove that you can write an interface using .C.

Solutions of tests

Students, please post a link to your test results here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly