
hicDetectLoops normalisation method #433

Closed
ChristopherBarrington opened this issue Sep 16, 2019 · 5 comments


@ChristopherBarrington

Which normalisation method is used by hicDetectLoops? I have a cool file that I converted from a Juicer .hic file:

> hicInfo -m 5kb.cool
# Matrix information file. Created with HiCExplorer's hicInfo version 3.2
File:   5kb.cool
Date:   2019-09-16T14:07:19.256017
Genome assembly:        genome.chrom.sizes
Size:   2,528
Bin_length:     5000
Number of chromosomes:  6
Non-zero elements:      3,037,127
The following columns are available: ['chrom' 'start' 'end' 'KR' 'VC' 'VC_SQRT']


Generated by:   hic2cool-0.7.1

>

I guess it uses the first one in the list, so KR? Is there any way to change the normalisation that is used? I can't see anything about this in the docs.

@joachimwolff
Collaborator

Hi,

In HiCExplorer we apply the correction factors in the default way of the cooler file format. Unfortunately, hic2cool and the hic format do not create this kind of matrix, so you have to apply an additional step: run hicConvertFormat on the matrix from hic2cool and set the parameter --correction_name KR to move the KR correction factors into the correct format. The output matrix will contain the correction factors, and these are applied by all HiCExplorer tools. For more information have a look at our documentation: https://hicexplorer.readthedocs.io/en/latest/content/tools/hicConvertFormat.html#cool-to-cool
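A minimal sketch of that extra step for several resolutions at once. It assumes the hic2cool outputs are named output_<resolution>.cool (the naming that appears later in this thread) and that KR is the column you want; the helper name convert_to_weights and the DRY_RUN/CORRECTION variables are hypothetical, and only the hicConvertFormat flags come from this thread. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HiCExplorer): run the cool -> cool conversion
# that moves a hic2cool normalisation column (e.g. KR, VC, VC_SQRT) into the
# correction factors that HiCExplorer tools apply.
set -euo pipefail

# Assumes input files are named output_<resolution>.cool.
convert_to_weights() {
  local res="$1" correction="$2"
  local -a cmd
  cmd=(hicConvertFormat
    -m "output_${res}.cool"
    -o "output_${res}.${correction}.cool"
    --inputFormat cool --outputFormat cool
    --correction_name "$correction")
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"   # dry run (default): only show the command
  else
    "${cmd[@]}"                 # DRY_RUN=0: actually run hicConvertFormat
  fi
}

for res in 5000 1000000; do
  convert_to_weights "$res" "${CORRECTION:-KR}"
done
```

Running it with DRY_RUN=0 executes the conversions; CORRECTION=VC_SQRT selects another column listed by hicInfo.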

Best,

Joachim

@ChristopherBarrington
Author

Thank you for the extra information. I think I understand now: I can add the KR correction using hicConvertFormat, as described in the docs, and create the cool file.

Using the snippet below, I have run into a few issues:

ml Anaconda3/2019.07
source activate hicexplorer-3.2

# get resolution cool
hicConvertFormat -m inter_30.hic -o output.cool --inputFormat hic --outputFormat cool -r 5000
hicConvertFormat -m inter_30.hic -o output.cool --inputFormat hic --outputFormat cool -r 1000000

# add weights using KR
hicConvertFormat -m output_5000.cool -o output_5000.KR.cool --inputFormat cool --outputFormat cool --correction_name KR 
hicConvertFormat -m output_1000000.cool -o output_1000000.KR.cool --inputFormat cool --outputFormat cool --correction_name KR 

## WARNING:hicmatrix.lib.cool:Writing non-standard cooler matrix. Datatype of matrix['count'] is: float64
## loses the genome chrom.sizes attribute

# add weights using VC_SQRT
hicConvertFormat -m output_5000.cool -o output_5000.VC_SQRT.cool --inputFormat cool --outputFormat cool --correction_name VC_SQRT
hicConvertFormat -m output_1000000.cool -o output_1000000.VC_SQRT.cool --inputFormat cool --outputFormat cool --correction_name VC_SQRT

## OverflowError: cannot convert float infinity to integer

The first is that when I do the cool-to-cool step, I lose the 'Genome assembly' attribute that held the path to the chrom.sizes file after hic2cool (run via hicConvertFormat).

Does that warning message indicate a problem with the files? The --enforce_integer option does not produce the same warning, but is the warning indicative of another issue?

I am unable to apply the VC_SQRT correction at the smaller resolutions, I assume due to sparseness. Is that expected?

Thanks for your help.

@joachimwolff
Collaborator

Hi Christopher,

> The first is that when I do the cool-to-cool step, I lose the 'Genome assembly' attribute that held the path to the chrom.sizes file after hic2cool (run via hicConvertFormat).

This is only metadata that you don't need for analysis with HiCExplorer; if it is needed, it is easy to retrieve from our internal data structures. However, I agree that we should not remove it, and I will fix this in an update.

> Does that warning message indicate a problem with the files? The --enforce_integer option does not produce the same warning, but is the warning indicative of another issue?

No, it is not really a problem. The default assumption is that the 'count' data is stored as integers, so we emit this warning whenever we store floats. For HiCExplorer, floats are no issue at all; the warning is aimed at users who open the cool files with other software that may depend on integer counts. For the same reason we offer the --enforce_integer parameter.
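If the cool files will also be read by software that expects integer counts, the same conversion can simply add --enforce_integer. A minimal sketch: the helper name convert_integer and the .int file suffix are hypothetical choices, the input name follows the output_<resolution>.cool pattern from this thread, and it only prints the command unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same cool -> cool conversion as in the thread, but with
# --enforce_integer so 'count' is written as integers (this avoids the float64
# warning, which only matters for tools that may depend on integer counts).
set -euo pipefail

convert_integer() {
  local in="$1" out="$2" correction="$3"
  local -a cmd
  cmd=(hicConvertFormat -m "$in" -o "$out"
    --inputFormat cool --outputFormat cool
    --correction_name "$correction" --enforce_integer)
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"   # dry run (default): print the command only
  else
    "${cmd[@]}"                 # DRY_RUN=0: run hicConvertFormat
  fi
}

convert_integer output_5000.cool output_5000.KR.int.cool KR
```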

> I am unable to apply the VC_SQRT correction at the smaller resolutions, I assume due to sparseness. Is that expected?

You should be able to apply it if the hic file provides it for this resolution. If you are sure the factors are available, getting an inf somewhere indicates that we may have a bug. Did this happen with a public data set? If I can reproduce the error, it will be much easier for me to fix it.

Best,

Joachim

@ChristopherBarrington
Author

Thank you for the clarifications. The 'Normalizations' attribute does list VC_SQRT, so I guess it should be usable:

... Normalizations:  ['VC', 'VC_SQRT', 'KR']

I have copied an example .hic file to the Dropbox link.

Thanks again,

Chris

@LeilyR mentioned this issue May 18, 2020
@yanchunzhang

yanchunzhang commented Oct 1, 2020

Hi Joachim, @joachimwolff
I have some relevant questions about normalization/correction methods.
I want to do downstream analysis such as hicFindTADs, hicDifferentialTAD and hicDetectLoops using a matrix converted from a hic file. I also want to use the hicNormalize command to normalize my data to the same sequencing depth.
I'm not sure which file format and correction method is best for my analysis, so I will list some options here and hope you can give me some suggestions.
1. hicNormalize on cool -> hicConvertFormat from cool to h5 -> hicCorrectMatrix on h5 (ICE)
2. hicConvertFormat from cool to h5 -> hicNormalize on h5 -> hicCorrectMatrix on h5 (ICE)
3. hicNormalize on cool -> hicConvertFormat from cool to corrected cool (KR, VC, VC_SQRT)
Is ICE the most recommended correction method?
And what criteria do you suggest for choosing the ICE correction thresholds for comparative analysis among samples? (Sorry, I don't think I saw a detailed description of this in the documentation.)

Thanks!
