Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Computing a "band" of a correlation matrix #5

Open
PeteHaitch opened this issue May 19, 2016 · 5 comments
Open

Computing a "band" of a correlation matrix #5

PeteHaitch opened this issue May 19, 2016 · 5 comments

Comments

@PeteHaitch
Copy link
Contributor

Great package, Drew. I'm already benefiting from it, thank you.

I have a question/feature request that I hope might be of interest to you. Suppose I have a "wide" matrix, X, with r rows and c columns where c > r. Computing cor(X) or coop::pcor(X) gives me a square matrix, C, with c rows and columns. C is typically large since it contains c/r-times more elements than X. But I'm only interested in a "band" of C around the diagonal, bC, say, the main diagonal and k off-diagonals. My intuition could well be wrong, but it feels like there ought to be shortcuts to computing bC instead of C (and there will certainly be storage savings by only computing/returning bC). Do you think such banded computations are:

  • likely to be more efficient than computing the full matrix and subsetting the relevant band?
  • feasible?
  • in scope for coop?

Thanks

@wrathematics
Copy link
Owner

Hi Peter, thanks for the kind words!

The short version is that in some cases it would be faster, particularly for the pairwise-complete NA handling. Some cases would be slower (even assuming k < min(r, c)). But I think it's a great idea, and something I have a bit of interest in myself (dimension reduction with c>r matrices, for which this might be useful).

The computation is fairly straightforward. I'm not aware of any packages on CRAN that offer banded storage though (Matrix seems to use sparse storage, which is not ideal). I can easily write one, but if there's already a standard that I just don't know about, I'd rather "plug in" to it. Are you aware of one?

@PeteHaitch
Copy link
Contributor Author

PeteHaitch commented May 19, 2016

Yes, the returned object isn't a true banded matrix (which has zeros outside the band) but rather has "not computed" (NA?) outside the band. So I'm unsure the best way to store/represent this.

My current code is something like this:

C <- pcor(X)
lapply(0:k, function(k) subDiag(C, k))

where subDiag() extracts the k-th subdiagonal (k = 0 corresponding to diag()). So I'm returning a list of length k + 1 rather than a matrix. That works well for my case but may not generalise well to others.

@wrathematics
Copy link
Owner

I've started work on the storage and utilities needed for this here https://github.com/wrathematics/band There are a few more utilities I need to create before I can start the work on the coop side of things, but I don't think it'll take too much effort.

Unfortunately I probably won't get a chance to look at this much over the next few weeks, as I'm supposed to be studying for phd qualifying exams. But this is high on my priority list for this summer.

@PeteHaitch
Copy link
Contributor Author

Wow, this looks great. I had no idea there was LAPACK support for these types of matrices (well, I know rather little of LAPACK so perhaps this isn't so surprising).

No worries re timelines for this stuff and best wishes for your quals! I really appreciate your interest and help with this.

@wrathematics
Copy link
Owner

Thanks!

LAPACK and the BLAS have some limited support for symmetric ("packed") and band storages. But they mostly never bothered finishing, I think because it was so much slower for most applications. But particularly for the banded symmetric case with very narrow bands (i.e., a sparse matrix), I think it may win out on time, and will definitely be advantageous on space. It'll be a fun experiment either way!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants