main.py fails #8

francicco · 2024-08-21T19:57:09Z

Hi,

I'm testing your amazing method on my data after running it on the toy example you provide.

But this is the error I get:

main.py  --epochs 25 --batch_size 256 --datapath . --n_rep 6 --modelpath ./rcl.ckpt
Reading coverage files
Traceback (most recent call last):
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/main.py", line 269, in <module>
    cov = read_data_new(file)
          ^^^^^^^^^^^^^^^^^^^
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/ios.py", line 90, in read_data_new
    return np.array(dat)
           ^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (99357,) + inhomogeneous part.

Any help?
Thanks a lot
F

The text was updated successfully, but these errors were encountered:

hhvu0102 · 2024-08-21T22:52:44Z

Hello Francicco,

Thank you for your interest in our tool. It looks like the input files are not of the expected format, or there are too many files in the current directory that you are working on. Would you mind providing the following details:

Did you run bash /path/to/preprocessing.bash -d <directory> -b <bam-file> -t <number of threads> -n <name> and obtain a file with similar format as our example file MCF7_chr10_rep*.covBga.txt? If yes, can you post a snippet of the input files?
Did the file names coming from different replicates get numbered correctly, such as rep1, rep2, etc.?

It's also helpful sometimes to provide absolute path instead of ..

If possible, you can send me a file of one chromosome (for example, chr10), so I can help debugging.

Looking forward to hearing from you.

francicco · 2024-08-22T11:30:18Z

Hi Vu,

Thanks a lot for your quick reply. I've now run the preprocessing again with just 2 reps, I have 6 in total.

This is the command I executed:
preprocessing.bash -d $PWD -b "LB43.PP.DupRem.bam LB44.PP.DupRem.bam" -t 12 -n PP.Hmel

I've got a few warning for scaffolds with no regions, something expected. But then at the end:

[STAGE 5] (/user/work/tk19812/software/UnsupervisedPeakCaller/preprocessing.bash:488) Extracting initial 1000 bp segments from candidate regions (regions3)....
[STAGE 5] (/user/work/tk19812/software/UnsupervisedPeakCaller/preprocessing.bash:519) Writing initial 1000-length segments to file "/user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/chr*/PP.Hmel_Hmel.ATACseq.LB43.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem_*-regions3.txt"
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  O. Tange (2018): GNU Parallel 2018, Mar 2018, ISBN 9781387509881,
  DOI https://doi.org/10.5281/zenodo.1146014

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

To silence this citation notice: run 'parallel --citation' once.

Come on: You have run parallel 817 times. Isn't it about time 
you run 'parallel --citation' once to silence the citation notice?

Error in serverSocket(port = port) : 
  creation of server socket failed: port 11044 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) : 
  creation of server socket failed: port 11351 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) : 
  creation of server socket failed: port 11022 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) : 
  creation of server socket failed: port 11980 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) : 
  creation of server socket failed: port 11056 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
[ERROR] Failed to write "/user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/chrHmel200083o/PP.Hmel_Hmel.ATACseq.LB43.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem_Hmel200083o-regions3.txt" regions3 file.

I then rerun it using -g Hmel because this is a butterfly. It went through STAGE 7 with some errors in small scaffold with negative starting positions:

Hmel200006o	-381	619	Hmel200006osegment1

It ended:

Candidate regions are in file:
	PP.Hmel_Hmel.ATACseq.LB43.PP_bigInputs.txt

Then I run:
run_rcl.sh -d $PWD -w -b "LB43.PP.DupRem.bam LB44.PP.DupRem.bam"

And main.py fails:

main.py  --epochs 25 --batch_size 256 --datapath 'mypath' --n_rep 2 --modelpath 'mypath'/rcl.ckpt
Finish training, start writing results (if your data is large, please give more memory)
Traceback (most recent call last):
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/rcl_score.py", line 133, in <module>
    classification = Classify(str(args.model))    ## train on all (80 or 90 %)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/rcl_score.py", line 25, in __init__
    base_model = ContrastLearn.load_from_checkpoint(embeddings_model_path).model
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/pytorch_lightning/utilities/model_helpers.py", line 125, in wrapper
    return self.method(cls, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/pytorch_lightning/core/module.py", line 1582, in load_from_checkpoint
    loaded = _load_from_checkpoint(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/pytorch_lightning/core/saving.py", line 63, in _load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=map_location)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/lightning_fabric/utilities/cloud_io.py", line 59, in _load
    with fs.open(path_or_url, "rb") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/fsspec/spec.py", line 1303, in open
    f = self._open(
        ^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/fsspec/implementations/local.py", line 191, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/fsspec/implementations/local.py", line 355, in __init__
    self._open()
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/fsspec/implementations/local.py", line 360, in _open
    self.f = open(self.path, mode=self.mode)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/rcl.ckpt'

The *rep1.covBga.txt files look like:

==> Hmel.ATACseq.LB43.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem.covBga.txt <==
Hmel221001o	20131	20132	Hmel221001osegment1	33
Hmel221001o	20132	20139	Hmel221001osegment1	36
Hmel221001o	20139	20141	Hmel221001osegment1	34
Hmel221001o	20141	20142	Hmel221001osegment1	30
Hmel221001o	20142	20144	Hmel221001osegment1	31
Hmel221001o	20144	20156	Hmel221001osegment1	32
Hmel221001o	20156	20168	Hmel221001osegment1	33
Hmel221001o	20168	20177	Hmel221001osegment1	35
Hmel221001o	20177	20178	Hmel221001osegment1	33
Hmel221001o	20178	20179	Hmel221001osegment1	32

==> Hmel.ATACseq.LB44.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem.covBga.txt <==
Hmel221001o	20131	20138	Hmel221001osegment1	9
Hmel221001o	20138	20139	Hmel221001osegment1	10
Hmel221001o	20139	20168	Hmel221001osegment1	9
Hmel221001o	20168	20179	Hmel221001osegment1	10
Hmel221001o	20179	20196	Hmel221001osegment1	9
Hmel221001o	20196	20198	Hmel221001osegment1	10
Hmel221001o	20198	20250	Hmel221001osegment1	9
Hmel221001o	20250	20320	Hmel221001osegment1	10
Hmel221001o	20320	20323	Hmel221001osegment1	12
Hmel221001o	20323	20325	Hmel221001osegment1	56

Did the file names coming from different replicates get numbered correctly, such as rep1, rep2, etc.?

I think so. I've now changed to a full path.

The files are too big, I've made a link to it.
https://drive.google.com/file/d/1DyKIxm2m2BNSqc3GzfcZ06xBQafnL8Vw/view?usp=sharing

Thanks a lot
F

hhvu0102 · 2024-08-23T22:26:03Z

Hi Francesco,
Thank you very much for the details.

The error

Error in serverSocket(port = port) :
creation of server socket failed: port 11044 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) :
creation of server socket failed: port 11351 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) :
creation of server socket failed: port 11022 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) :
creation of server socket failed: port 11980 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) :
creation of server socket failed: port 11056 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted

means that not all of the cores you supplied were available at the time of process. Our code in preprocessing.bash assumes the number of threads in -t can also be used as the number of cores for some internal parallel processing. This also explains why the next time you ran the code, it went through successfully - it's because the second time there were 12 cores available to run.
I will update the code so that it stops this behavior. In the meantime, you may consider reducing the number of threads, but it may take a bit longer to process.

About negative starting positions:

Hmel200006o -381 619 Hmel200006osegment1

I apologize for this error. We have some sanity check step in the code for this and it worked ok for mouse and human, but maybe it failed for non-mouse/human chromosome names. I will work on an update soon.

About main.py failing:
This output message Finish training, start writing results (if your data is large, please give more memory) means that main.py may have run successfully. Would you mind confirming if /user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/rcl.ckpt actually exist?
I'll send a request to access the data and try working on it next week.

Thank you for your patience!

hhvu0102 · 2024-08-28T23:07:31Z

Hi Francesco,

I have not completely resolved all of the bugs, but I'd like to follow up with what I have so far. Unfortunately, I could not replicate your error. With your data, I have successfully trained a model and predicted peaks. Here's what I did:

cd /home/vthihong/tools/UnsupervisedPeakCaller #here's where I store RCL code, i.e., where one would clone the code into

source  ~/1_denoisedATAC/envRCL/bin/activate #activate RCL environment
bash run_rcl.sh -d "/home/vthihong/testTools/0_testRCL/chrHmel221001o/" -b "Hmel.ATACseq.LB43.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem.bam Hmel.ATACseq.LB44.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem.bam" -r "" -g -s -w
deactivate

Under the directory /home/vthihong/testTools/0_testRCL/chrHmel221001o/, I stored your data in a subdirectory called Hmel221001o which corresponds to the chromosome you sent.

Upon inspecting the code using your data, it appears to me that one region is not 1000bp long (Hmel221001osegment4306), which indicates there's some minor bug in the upstream steps that I will work on soon. This problem should not affect if the model is trained or not, but will affect the rcl_score.py step. In order to successfully run rcl_score.py, I simply removed Hmel221001osegment4306 from the covBga.txt files, but this is not a long term solution.

Here are my suggestions for now:

Check if rcl.ckpt actually exists in the $PWD directory.
If rcl.ckpt does not exist, I assume there should be a file called out.err in the $PWD directory. We'll need to look into that output to debug.

Thanks,
Ha

francicco · 2024-08-29T13:38:53Z

Hi @hhvu0102,

Thanks a lot! I'll give it another go and I'll let you know!
Cheers
F

francicco · 2024-08-29T14:17:59Z

Another thing I noticed.

I hade to change a line in conclu.py.

From
from scipy.signal import gaussian
to
from scipy.signal.windows import gaussian

what version of python/scipy is better using?

Cheers
F

hhvu0102 · 2024-08-29T14:39:13Z

Hi,

I'm using Python 3.7.10 for RCL. Here are other packages I have in the environment:

Package                  Version
------------------------ ------------
-orch                    1.10.0
aiohttp                  3.8.6
aiosignal                1.3.1
annotated-types          0.5.0
anyio                    3.7.1
arrow                    1.2.3
async-timeout            4.0.3
asynctest                0.13.0
attrs                    23.1.0
beautifulsoup4           4.12.2
blessed                  1.20.0
boto3                    1.29.4
botocore                 1.32.4
certifi                  2023.11.17
charset-normalizer       3.3.2
click                    8.1.7
croniter                 1.3.15
cycler                   0.11.0
dateutils                0.6.12
deepdiff                 6.7.1
exceptiongroup           1.1.3
fastapi                  0.88.0
fonttools                4.38.0
frozenlist               1.3.3
fsspec                   2023.1.0
h11                      0.14.0
idna                     3.4
importlib-metadata       6.7.0
inquirer                 2.10.1
itsdangerous             2.1.2
Jinja2                   3.1.2
jmespath                 1.0.1
joblib                   1.3.2
kiwisolver               1.4.5
lightning                1.9.5
lightning-cloud          0.5.55
lightning-utilities      0.10.0
markdown-it-py           2.2.0
MarkupSafe               2.1.3
matplotlib               3.5.3
mdurl                    0.1.2
multidict                6.0.4
numpy                    1.21.6
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
ordered-set              4.1.0
packaging                23.2
pandas                   1.3.5
Pillow                   9.5.0
pip                      20.1.1
psutil                   5.9.6
pydantic                 2.5.1
pydantic-core            2.14.3
pygments                 2.17.1
PyJWT                    2.8.0
pyparsing                3.1.1
python-dateutil          2.8.2
python-editor            1.0.4
python-multipart         0.0.6
pytorch-lightning        1.9.5
pytz                     2023.3.post1
PyYAML                   6.0.1
readchar                 4.0.5
requests                 2.31.0
rich                     13.7.0
s3transfer               0.7.0
scikit-learn             1.0.2
scipy                    1.7.3
seaborn                  0.12.2
setuptools               47.1.0
six                      1.16.0
sniffio                  1.3.0
soupsieve                2.4.1
starlette                0.29.0
starsessions             1.3.0
threadpoolctl            3.1.0
torch                    1.10.0
torchaudio               0.10.0
torchmetrics             0.11.4
torchvision              0.11.0
tqdm                     4.66.1
traitlets                5.9.0
typing-extensions        4.7.1
urllib3                  2.0.7
uvicorn                  0.22.0
wcwidth                  0.2.10
websocket-client         1.6.1
websockets               11.0.3
wheel                    0.41.3
yarl                     1.9.3
zipp                     3.15.0

francicco · 2024-08-30T09:22:44Z

Hi @hhvu0102,

I reinstalled everything, re-run the example, and it worked but still I'm not able to generate the model:

/user/home/tk19812/.local/lib/python3.12/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
/user/home/tk19812/.local/lib/python3.12/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Reading coverage files
Reading RCL input file /user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/rep1.txt.
Traceback (most recent call last):
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/main.py", line 267, in <module>
    cov = read_data_new(file)
          ^^^^^^^^^^^^^^^^^^^
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/ios.py", line 90, in read_data_new
    return np.array(dat)
           ^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (100584,) + inhomogeneous part.

:(

F

hhvu0102 self-assigned this Aug 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

main.py fails #8

main.py fails #8

francicco commented Aug 21, 2024

hhvu0102 commented Aug 21, 2024

francicco commented Aug 22, 2024

hhvu0102 commented Aug 23, 2024

hhvu0102 commented Aug 28, 2024

francicco commented Aug 29, 2024

francicco commented Aug 29, 2024

hhvu0102 commented Aug 29, 2024

francicco commented Aug 30, 2024

main.py fails #8

main.py fails #8

Comments

francicco commented Aug 21, 2024

hhvu0102 commented Aug 21, 2024

francicco commented Aug 22, 2024

hhvu0102 commented Aug 23, 2024

hhvu0102 commented Aug 28, 2024

francicco commented Aug 29, 2024

francicco commented Aug 29, 2024

hhvu0102 commented Aug 29, 2024

francicco commented Aug 30, 2024