hicDetectLoops normalisation method #433

ChristopherBarrington · 2019-09-16T16:32:37Z

Which normalisation method is used by hicDetectLoops? I have a cooler that I converted from a Juicer hic:

> hicInfo -m 5kb.cool
# Matrix information file. Created with HiCExplorer's hicInfo version 3.2
File:   5kb.cool
Date:   2019-09-16T14:07:19.256017
Genome assembly:        genome.chrom.sizes
Size:   2,528
Bin_length:     5000
Number of chromosomes:  6
Non-zero elements:      3,037,127
The following columns are available: ['chrom' 'start' 'end' 'KR' 'VC' 'VC_SQRT']


Generated by:   hic2cool-0.7.1

>

I guess it is the first in the list, so KR? Is there any way to change the normalisation that is used - I can't see anything about this in the docs.


joachimwolff · 2019-09-16T19:49:01Z

Hi,

In HiCExplorer we apply correction factors the way the cooler file format does by default. Unfortunately, hic2cool and the hic format do not create this kind of matrix, so you need an additional step: run hicConvertFormat on the matrix from hic2cool and set the parameter --correction_name KR to move the KR correction factors into the correct format. The output matrix will then carry the correction factors, and all HiCExplorer tools apply them. For more information, have a look at our documentation: https://hicexplorer.readthedocs.io/en/latest/content/tools/hicConvertFormat.html#cool-to-cool

Best,

Joachim
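To make the reply above concrete, here is a minimal sketch of the cooler convention for applying per-bin correction factors, where the balanced value is the raw count multiplied by the weights of both bins. It assumes that "the correct format" refers to cooler's default `weight` column. The helper `set_weight`, the toy pixel table, and all numbers are invented for illustration and are not HiCExplorer code. Whether the vectors stored by hic2cool need any conversion before being used this way is not addressed here.

```python
# Toy illustration of cooler-style balancing; all values are invented.
# Per-bin correction columns, named as in the hicInfo output above.
bins = {
    "KR": [0.5, 2.0, 1.0],
    "VC_SQRT": [0.8, 1.5, 1.1],
}

# Upper-triangle pixels: (bin1, bin2, raw count).
pixels = [(0, 0, 40), (0, 1, 10), (1, 2, 6), (2, 2, 25)]


def set_weight(bins, correction_name):
    """Return a copy of the bin table with the chosen column copied to 'weight'.

    Hypothetical helper: it only mimics the idea behind --correction_name.
    """
    out = dict(bins)
    out["weight"] = list(bins[correction_name])
    return out


def apply_weights(pixels, weight):
    """Cooler convention: balanced = count * weight[bin1] * weight[bin2]."""
    return [(i, j, count * weight[i] * weight[j]) for i, j, count in pixels]


kr_bins = set_weight(bins, "KR")
balanced = apply_weights(pixels, kr_bins["weight"])
for i, j, value in balanced:
    print(i, j, value)
```

Passing "VC_SQRT" instead of "KR" to `set_weight` would select the other column, which is the sense in which the chosen correction name decides which normalisation downstream tools see.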

ChristopherBarrington · 2019-09-17T10:55:07Z

Thank you for the extra information. I think I understand now; I can add the KR correction using hicConvertFormat as described in the doc and can make the cool file.

Using the snippet below, I have a few issues:

ml Anaconda3/2019.07
source activate hicexplorer-3.2

# get resolution cool
hicConvertFormat -m inter_30.hic -o output.cool --inputFormat hic --outputFormat cool -r 5000
hicConvertFormat -m inter_30.hic -o output.cool --inputFormat hic --outputFormat cool -r 1000000

# add weights using KR
hicConvertFormat -m output_5000.cool -o output_5000.KR.cool --inputFormat cool --outputFormat cool --correction_name KR 
hicConvertFormat -m output_1000000.cool -o output_1000000.KR.cool --inputFormat cool --outputFormat cool --correction_name KR 

## WARNING:hicmatrix.lib.cool:Writing non-standard cooler matrix. Datatype of matrix['count'] is: float64
## loses the genome chrom.sizes attribute

# add weights using VC_SQRT
hicConvertFormat -m output_5000.cool -o output_5000.VC_SQRT.cool --inputFormat cool --outputFormat cool --correction_name VC_SQRT 
hicConvertFormat -m output_1000000.cool -o output_1000000.VC_SQRT.cool --inputFormat cool --outputFormat cool --correction_name VC_SQRT 

## OverflowError: cannot convert float infinity to integer

The first is that when I do the cool2cool step, I lose the 'Genome assembly' attribute that contained the path to the chrom.sizes file after hic2cool (via hicConvertFormat).

Does that warning message indicate a problem with the files? The --enforce_integer option does not produce the same warning, but is it indicative of another issue?

I am unable to use the VC_SQRT correction on smaller resolutions - I assume due to sparseness - is that expected?

Thanks for your help.

joachimwolff · 2019-09-18T09:20:44Z

Hi Christopher,

The first is that when I do the cool2cool step, I lose the 'Genome assembly' attribute that contained the path to the chrom.sizes file after hic2cool (via hicConvertFormat).

This is only metadata that you don't need for analysis with HiCExplorer. If this information is needed, it is quite easy to retrieve from our internal data structures. However, I agree with you that we should not remove it, and I will fix this with an update.

Does that warning message indicate a problem with the files? The --enforce_integer option does not produce the same warning, but is it indicative of another issue?

No, not really a problem. The default assumption is that the 'count' data is stored as integers, so we emit this warning whenever we store floats. With HiCExplorer, using floats is no issue at all; the warning is meant for users who pass the cool files to other software that may depend on integers. For the same reason we offer the --enforce_integer parameter.
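A small sketch of why corrected counts end up as floats, and what is lost if they are forced back to integers. It assumes that enforcing integers amounts to a plain cast, which is an assumption and not something confirmed in this thread. The counts and factors are invented.

```python
raw_counts = [7, 3, 12]
factors = [0.9, 1.25, 0.5]  # invented correction factors, one per pixel

# Multiplying integer counts by correction factors yields floats.
corrected = [count * factor for count, factor in zip(raw_counts, factors)]

# A plain int() cast truncates toward zero and drops the fractional part.
as_integers = [int(value) for value in corrected]

# True when at least one corrected value is not a whole number.
has_float_counts = any(not value.is_integer() for value in corrected)
print(as_integers, has_float_counts)
```

Software that expects integer counts would read the truncated values, while tools that accept floats keep the corrected values as they are.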

I am unable to use the VC_SQRT correction on smaller resolutions - I assume due to sparseness - is that expected?

You should be able to apply it if the hic format provides it for this resolution. If you are sure the factors are available, then getting an inf somewhere suggests that we may have a bug. Does this happen with a public data set? If I can reproduce the error, it will be easier for me to fix it.

Best,

Joachim
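The error text reported earlier in the thread is what Python raises when an infinite float is converted to an integer. The sketch below reproduces that message with invented values and shows one possible guard. It does not claim to identify where the inf arises inside HiCExplorer; `to_int_count` is a hypothetical helper, and mapping non-finite values to 0 is only one of several reasonable choices.

```python
import math


def to_int_count(value):
    """Round a corrected count to int, mapping inf and nan to 0 (hypothetical policy)."""
    if not math.isfinite(value):
        return 0
    return int(round(value))


# Invented corrected counts; inf and nan stand in for bins with unusable factors.
corrected = [12.4, float("inf"), float("nan"), 3.0]

message = None
try:
    naive = [int(value) for value in corrected]
except OverflowError as err:
    # int(float("inf")) raises OverflowError.
    message = str(err)

safe = [to_int_count(value) for value in corrected]
print(message)
print(safe)
```

Checking a correction column for non-finite or zero entries before conversion would be a quick way to test whether sparse bins at fine resolutions are involved.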

ChristopherBarrington · 2019-09-20T08:17:33Z

Thank you for the clarifications. The 'Normalizations' attribute does list it so I guess it should be usable.

... Normalizations:  ['VC', 'VC_SQRT', 'KR']

I have copied an example .hic file to the Dropbox link.

Thanks again,

Chris

yanchunzhang · 2020-10-01T22:02:06Z

Hi Joachim, @joachimwolff
I have some related questions about normalization/correction methods.
I want to do downstream analyses such as hicFindTADs, hicDifferentialTAD and hicDetectLoops using a matrix converted from a hic file. I also want to use the hicNormalize command to normalize my data to the same sequencing depth.
I'm not sure which file format and correction method are best for my analysis. I will list some options here and hope you can give me some suggestions.
1. hicNormalize on cool -> hicConvertFormat from cool to h5 -> hicCorrectMatrix on h5 (ICE)
2. hicConvertFormat from cool to h5 -> hicNormalize on h5 -> hicCorrectMatrix on h5 (ICE)
3. hicNormalize on cool -> hicConvertFormat from cool to corrected cool (KR, VC, VC_SQRT)
Is ICE the most recommended correction method?
And what criteria do you suggest for choosing the ICE correction thresholds for comparative analysis among samples? (Sorry, I don't think I saw a detailed description of this in the documentation.)

Thanks!
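As a rough illustration of what bringing samples "to the same sequencing depth" can mean, the sketch below scales every sample so that its total count matches the smallest total among them. It shows the general idea only and is not HiCExplorer's implementation; the function names and the toy matrices are invented.

```python
def total_count(pixels):
    """Sum the counts of a list of (bin1, bin2, count) pixels."""
    return sum(count for _, _, count in pixels)


def normalize_to_smallest(samples):
    """Scale each sample so its total equals the smallest total among all samples."""
    totals = {name: total_count(pixels) for name, pixels in samples.items()}
    target = min(totals.values())
    return {
        name: [(i, j, count * target / totals[name]) for i, j, count in pixels]
        for name, pixels in samples.items()
    }


# Invented toy matrices with different depths.
samples = {
    "sample_a": [(0, 0, 100), (0, 1, 50), (1, 1, 50)],  # total 200
    "sample_b": [(0, 0, 30), (0, 1, 20), (1, 1, 50)],   # total 100
}

normalized = normalize_to_smallest(samples)
for name, pixels in normalized.items():
    print(name, total_count(pixels))
```

In this toy example the deeper sample is scaled down and the shallower one is left numerically unchanged, so both end with the same total.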

LeilyR mentioned this issue May 18, 2020

TODO #544

joachimwolff closed this as completed Nov 17, 2020
