Official Implementation of Nonlinear Sparse Generalized Canonical Correlation Analysis (NSGCCA)

Rows21/NSGCCA

Official Python implementation of NSGCCA, from the following paper:

Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data.
Rong Wu, Ziqi Chen, Gen Li and Hai Shu.
New York University
[arXiv]


We propose three nonlinear, sparse, generalized CCA methods, HSIC-SGCCA, SA-KGCCA, and TS-KGCCA, for variable selection in multi-view high-dimensional data. These methods extend existing SCCA-HSIC, SA-KCCA, and TS-KCCA from two-view to multi-view settings. While SA-KGCCA and TS-KGCCA yield multi-convex optimization problems solved via block coordinate descent, HSIC-SGCCA introduces a necessary unit-variance constraint previously ignored in SCCA-HSIC, resulting in a nonconvex, non-multiconvex problem. We efficiently address this challenge by integrating the block prox-linear method with the linearized alternating direction method of multipliers. Simulations and TCGA-BRCA data analysis demonstrate that HSIC-SGCCA outperforms competing methods in variable selection.
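To make the HSIC-based dependence measure and the unit-variance constraint mentioned above more concrete, here is a minimal NumPy sketch. It is not taken from this repository: the function names (`gaussian_gram`, `empirical_hsic`, `unit_variance_projection`, `pairwise_hsic_sum`), the Gaussian kernel with a median-heuristic bandwidth, the `1/(n-1)^2` normalization, and the idea of summing HSIC over all view pairs are illustrative assumptions, and the sketch omits the sparsity penalty and the optimization procedure entirely.

```python
import numpy as np


def gaussian_gram(z, sigma=None):
    """Gaussian-kernel Gram matrix for a 1-D sample z of length n."""
    z = np.asarray(z, dtype=float).reshape(-1, 1)
    sq_dists = (z - z.T) ** 2
    if sigma is None:
        # Median-heuristic bandwidth: an assumption, not the repository's choice.
        positive = sq_dists[sq_dists > 0]
        sigma = np.sqrt(0.5 * np.median(positive)) if positive.size else 1.0
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def empirical_hsic(a, b):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2, for two 1-D samples."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K = gaussian_gram(a)
    L = gaussian_gram(b)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def unit_variance_projection(X, u):
    """Project X (n x p) onto u, then center and rescale to unit sample variance."""
    z = X @ u
    return (z - z.mean()) / z.std()


def pairwise_hsic_sum(views, weights):
    """Sum of HSIC over all pairs of projected views (hypothetical objective)."""
    z = [unit_variance_projection(X, u) for X, u in zip(views, weights)]
    total = 0.0
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            total += empirical_hsic(z[i], z[j])
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    t = rng.standard_normal(n)  # shared latent signal
    # Three toy views: the first column of each carries a nonlinear
    # function of t, the remaining columns are pure noise.
    X1 = np.column_stack([t, rng.standard_normal((n, 4))])
    X2 = np.column_stack([t ** 2 + 0.1 * rng.standard_normal(n),
                          rng.standard_normal((n, 4))])
    X3 = np.column_stack([np.sin(t) + 0.1 * rng.standard_normal(n),
                          rng.standard_normal((n, 4))])
    signal = np.eye(5)[0]  # weight only the informative variable
    noise = np.eye(5)[1]   # weight only a noise variable
    print("signal weights:", pairwise_hsic_sum([X1, X2, X3], [signal] * 3))
    print("noise weights: ", pairwise_hsic_sum([X1, X2, X3], [noise] * 3))
```

In this toy setup, sparse weight vectors that pick out the informative variable should give a larger pairwise HSIC sum than weights placed on noise variables, which is the intuition behind using HSIC for variable selection. The rescaling in `unit_variance_projection` only illustrates what a unit-variance constraint on each projection means; the paper enforces it inside the optimization.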

Installation

Clone this repository and install other required packages:

git clone git@github.com:Rows21/NSGCCA

Datasets

(Feel free to open an issue to suggest recently proposed CCA methods for comparison. The baselines folder currently holds the comparison models.)

Citation

If you find this repository helpful, please consider citing:

@article{wu2025nonlinear,
  title={Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data},
  author={Wu, Rong and Chen, Ziqi and Li, Gen and Shu, Hai},
  journal={arXiv preprint arXiv:2502.18756},
  year={2025}
}

Results

Simulation Studies

Figure 2: Simulation performance on the synthetic datasets.

Real-World Studies -- TCGA breast cancer database

Data_download_preprocess: R script for downloading and preprocessing the TCGA-BRCA data.
Venn Diagram: The clustering results for TCGA-BRCA.

Acknowledgement

This repository is built using the timm library.

License

This project is released under the MIT license. Please see the LICENSE file for more information.
