Title: Network analysis step: empty or incomplete network files despite successful pipeline completion #50

Open
bioarpit1 opened this issue Jan 17, 2025 · 0 comments

Hi Taiji team,

First, thank you for developing this excellent tool for multi-omics data integration. Taiji's approach to integrating ATAC-seq and RNA-seq data is very helpful for our research.

Issue Description

I am running the Taiji pipeline to integrate bulk ATAC-seq and RNA-seq data. Most outputs are generated, but the network analysis results are incomplete:

  • Network/*/edges_binding.csv and Network/*/edges_combined.csv are empty (header only)
  • GeneRanks.tsv and GeneRanks_PValues.tsv contain only column headers
  • Network/*/nodes.csv is generated with content

Directory Structure

output/
├── ATACSeq/
├── GeneRanks_PValues.tsv
├── GeneRanks.tsv
├── GENOME/
├── Network/
│   └── condition1/
│       ├── edges_binding.csv    # Only header
│       ├── edges_combined.csv   # Only header
│       └── nodes.csv            # Has content
├── RNASeq/
└── SCATACSeq/

input.yaml:

RNA-seq:
  - id: condition1_RNA
    group: condition1
    replicates:
      - rep: 1
        files:
          - path: /path/to/rna_counts/condition1_rep1.counts.tsv
            tags: ['GeneQuant']
      - rep: 2
        files:
          - path: /path/to/rna_counts/condition1_rep2.counts.tsv
            tags: ['GeneQuant']
      - rep: 3
        files:
          - path: /path/to/rna_counts/condition1_rep3.counts.tsv
            tags: ['GeneQuant']

  - id: condition2_RNA
    group: condition2
    replicates:
      - rep: 1
        files:
          - path: /path/to/rna_counts/condition2_rep1.counts.tsv
            tags: ['GeneQuant']
      - rep: 2
        files:
          - path: /path/to/rna_counts/condition2_rep2.counts.tsv
            tags: ['GeneQuant']
      - rep: 3
        files:
          - path: /path/to/rna_counts/condition2_rep3.counts.tsv
            tags: ['GeneQuant']

ATAC-seq:
  - id: condition1_ATAC
    group: condition1
    replicates:
      - rep: 1
        files:
          - path: /path/to/atac/condition1_rep1.mLb.clN.sorted.bam
            tags: ['Filtered']
      - rep: 2
        files:
          - path: /path/to/atac/condition1_rep2.mLb.clN.sorted.bam
            tags: ['Filtered']
      - rep: 3
        files:
          - path: /path/to/atac/condition1_rep3.mLb.clN.sorted.bam
            tags: ['Filtered']

  - id: condition2_ATAC
    group: condition2
    replicates:
      - rep: 1
        files:
          - path: /path/to/atac/condition2_rep1.mLb.clN.sorted.bam
            tags: ['Filtered']
      - rep: 2
        files:
          - path: /path/to/atac/condition2_rep2.mLb.clN.sorted.bam
            tags: ['Filtered']
      - rep: 3
        files:
          - path: /path/to/atac/condition2_rep3.mLb.clN.sorted.bam
            tags: ['Filtered']

config.yaml:

input: "input.yaml"
assembly: "GRCh38"
genome: "/path/to/genome/genome.fa"
annotation: "/path/to/annotation/GRCh38/genes.gtf"
motif_file: "/path/to/motif/cisBP_human.meme"
output_dir: "output/"

File contents

GeneRanks.tsv and GeneRanks_PValues.tsv (header only):

condition1    condition2

Network/condition1/nodes.csv (populated):

geneName:ID,expression,expressionZScore
geneA,1,0.1
geneB,1,0.1
geneC,1,0.1
geneD,1.107,2.425
geneE,0.848,1.566

Network/condition1/edges_binding.csv (header only):

:START_ID,:END_ID,chr,start:int,end:int,annotation,affinity,:TYPE

Network/condition1/edges_combined.csv (header only):

:START_ID,:END_ID,weight,:TYPE
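
For anyone trying to reproduce this, the header-only files can be listed with a small script along the following lines. This is only a sketch and not part of Taiji: the function name `header_only_files` is hypothetical, and it assumes a file counts as "header only" when it has fewer than two non-blank lines.

```python
from itertools import islice
from pathlib import Path


def header_only_files(root, patterns=("*.csv", "*.tsv")):
    """Return files under `root` that have fewer than two non-blank lines.

    Hypothetical helper (not a Taiji function): a file with only a header
    row, or with no content at all, is reported.
    """
    hits = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            with path.open() as handle:
                non_blank = (line for line in handle if line.strip())
                # Read at most two non-blank lines so large files stay cheap.
                if len(list(islice(non_blank, 2))) < 2:
                    hits.append(path)
    return hits
```

Calling `header_only_files("output")` from the directory that holds the pipeline output walks the whole tree, so it also flags any other intermediate CSV or TSV file that was written without data rows.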

Environment

Taiji version: 1.3.1.2
samtools: 1.3.1
Using preprocessed BAM files with 'Filtered' tag
Pipeline completes without errors
All intermediate files are generated

Questions

Why are edge files empty when nodes.csv contains data?
Is there a minimum threshold for edge creation? If so, how can I lower it?
How can I debug the network construction step?
Are there specific parameters needed in config.yaml for network generation?
Do I need additional settings since I'm using preprocessed BAM files?
One thing I noticed: compared with your example output, my output has no Promoters folder containing promoters.bed. Do I need to provide it?

Additional Notes

Tried with both single and multiple replicates
Both the BAM-based and peak-based approaches produce the same result
All ATACSeq and RNASeq intermediate files are generated successfully

Thank you for your time in looking into this issue. We really appreciate your work on Taiji and your help with troubleshooting.

Best regards,
Arpit