Skip sequences that are less than minimum sequence length #44

Merged
ajtritt merged 1 commit into main from bug/min_seq_len
Aug 11, 2023

@ajtritt ajtritt commented Aug 11, 2023

Motivation

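Sequences shorter than the minimum length were not being filtered out. As a result, unfold was called with a window (1024) larger than the tensor's size along dimension 1 (896), which raised a torch error: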

Traceback (most recent call last):
  File "/pscratch/sd/a/ajtritt/.conda/envs/gtnet-dev/bin/gtnet", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/global/u1/a/ajtritt/projects/exabiome/gtnet.git/src/gtnet/main.py", line 45, in run
    func(sys.argv[2:])
  File "/global/u1/a/ajtritt/projects/exabiome/gtnet.git/src/gtnet/classify.py", line 54, in classify
    output = run_torchscript_inference(args.fastas, model, conf_models, window, step, vocab, seqs=args.seqs,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/u1/a/ajtritt/projects/exabiome/gtnet.git/src/gtnet/predict.py", line 123, in run_torchscript_inference
    for file_path, seq_name, seq_len, seq_chunks in reader:
  File "/global/u1/a/ajtritt/projects/exabiome/gtnet.git/src/gtnet/sequence.py", line 150, in readfiles
    raise e
  File "/global/u1/a/ajtritt/projects/exabiome/gtnet.git/src/gtnet/sequence.py", line 147, in readfiles
    batches = encoder.encode(values)
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/global/u1/a/ajtritt/projects/exabiome/gtnet.git/src/gtnet/sequence.py", line 61, in encode
    ret = ret.unfold(1, self.window, self.step)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: maximum size for tensor at dimension 1 is 896 but size is 1024
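
The failure mode in the traceback can be sketched without torch. The snippet below is a minimal illustration, not the change made in this PR: `count_windows` and `skip_short` are hypothetical names, the step of 512 is an assumed value, and using the window size as the minimum length is also an assumption. It applies the same windowing rule as `Tensor.unfold` (a window larger than the dimension is an error) and shows that filtering short sequences first avoids the error.

```python
from typing import Dict, Iterable, Iterator, Tuple


def count_windows(seq_len: int, window: int, step: int) -> int:
    """Number of full sliding windows, using the same rule as Tensor.unfold."""
    if seq_len < window:
        # Tensor.unfold raises a RuntimeError in this situation.
        raise ValueError(f"window {window} exceeds sequence length {seq_len}")
    return (seq_len - window) // step + 1


def skip_short(records: Iterable[Tuple[str, str]],
               min_len: int) -> Iterator[Tuple[str, str]]:
    """Yield only the (name, sequence) pairs that are at least min_len long."""
    for name, seq in records:
        if len(seq) < min_len:
            continue  # too short to form even one window
        yield name, seq


# Hypothetical input: one sequence of 1200 bases and one of 896 bases.
records = [("contig_1", "ACGT" * 300), ("contig_2", "ACGT" * 224)]

window, step = 1024, 512  # window as in the traceback; step is assumed

# Filtering before windowing means count_windows never sees a short sequence.
kept = list(skip_short(records, min_len=window))
counts: Dict[str, int] = {
    name: count_windows(len(seq), window, step) for name, seq in kept
}
```

Without the `skip_short` call, the 896-base sequence would reach `count_windows` and raise, which mirrors the `RuntimeError` above.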

How to test the behavior?

gtnet classify bin1.fna

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running ruff from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

codecov bot commented Aug 11, 2023

Codecov Report

Merging #44 (cc739dd) into main (8270e52) will decrease coverage by 0.24%.
The diff coverage is 37.50%.

@@            Coverage Diff             @@
##             main      #44      +/-   ##
==========================================
- Coverage   22.29%   22.05%   -0.24%     
==========================================
  Files           7        7              
  Lines         471      476       +5     
  Branches       59       61       +2     
==========================================
  Hits          105      105              
- Misses        359      362       +3     
- Partials        7        9       +2     
Files Changed            Coverage Δ
src/gtnet/sequence.py    72.05% <37.50%> (-2.76%) ⬇️

@ajtritt ajtritt merged commit adb23bf into main Aug 11, 2023
@ajtritt ajtritt deleted the bug/min_seq_len branch August 11, 2023 06:05