
fix: enable compilation with ONNX 1.13 #1835

Merged · 7 commits · Feb 10, 2023
Conversation

@gagnonlg (Contributor) commented Feb 7, 2023

Simple PR to fix a compilation error against ONNX Runtime 1.13.1. Basically, the `Ort::Session::Get{Input,Output}Name` functions were removed starting from 1.13, and we have to use the `Ort::Session::Get{Input,Output}NameAllocated` versions, which were introduced in 1.12.

One implication is that this PR introduces a lower bound of 1.12.0 on the ONNX runtime version.
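For illustration, the migration pattern looks roughly like this (a sketch, not the actual ACTS diff; `Ort::AllocatedStringPtr` is the RAII handle returned by the 1.12+ accessors):

```cpp
#include <onnxruntime_cxx_api.h>

#include <cstddef>
#include <string>
#include <vector>

// Sketch: collect all input names from a session with the 1.12+ API.
// The returned Ort::AllocatedStringPtr frees the name automatically, so
// we copy into std::string before the handle goes out of scope.
std::vector<std::string> inputNames(Ort::Session& session) {
  Ort::AllocatorWithDefaultOptions allocator;
  std::vector<std::string> names;
  for (std::size_t i = 0; i < session.GetInputCount(); ++i) {
    // Pre-1.13 (removed): char* raw = session.GetInputName(i, allocator);
    Ort::AllocatedStringPtr name = session.GetInputNameAllocated(i, allocator);
    names.emplace_back(name.get());
  }
  return names;
}
```

The output-name accessors follow the same pattern (`GetOutputNameAllocated`), which is why the lower bound on the ONNX Runtime version becomes 1.12.0.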

@gagnonlg added the labels Bug (Something isn't working), Component - Plugins (Affects one or more Plugins), and Impact - Minor (Nuisance bug and/or affects only a single module) on Feb 7, 2023
@gagnonlg gagnonlg added this to the next milestone Feb 7, 2023
@paulgessinger (Member) commented Feb 7, 2023

Let me tag @benjaminhuth and @Corentin-Allaire. Thoughts?

@benjaminhuth (Member) commented Feb 7, 2023

Very good to stay compatible with newer ONNX versions. However, I see two points before this gets merged:

  • I think we need to update our CI image to a newer ONNX version
  • This probably also has implications for the Exa.TrkX plugin with the (currently unused) ONNX backend; it can be enabled via the `ACTS_EXATRKX_ENABLE_ONNX` CMake flag.

@Corentin-Allaire (Contributor) commented Feb 7, 2023

I have made those exact same changes (plus some others) in my own branch. In general, I think we should discuss how ONNX is implemented in ACTS right now: with the current implementation you need to create a new class that inherits from OnnxRuntimeBase each time you import a new network. But we can discuss this once I open a PR for the ambiguity resolver.

@gagnonlg (Contributor, Author) commented Feb 7, 2023

Good point about Exa.TrkX, tagging @xju2 -- do you know if your pipeline works with ONNX Runtime >= 1.12.0?

@gagnonlg (Contributor, Author) commented Feb 7, 2023

> In general, I think we should discuss how ONNX is implemented in ACTS right now: with the current implementation you need to create a new class that inherits from OnnxRuntimeBase each time you import a new network. But we can discuss this once I open a PR for the ambiguity resolver.

@Corentin-Allaire Do you mean that in your ambiguity-solver work, you rely on changing the current implementation? If so, maybe it'd be worth merging that part in sooner (or starting the discussion as you suggest) so as to minimize potential divergence? just my .02$

@Corentin-Allaire (Contributor) commented Feb 7, 2023

> In general, I think we should discuss how ONNX is implemented in ACTS right now: with the current implementation you need to create a new class that inherits from OnnxRuntimeBase each time you import a new network. But we can discuss this once I open a PR for the ambiguity resolver.
>
> @Corentin-Allaire Do you mean that in your ambiguity-solver work, you rely on changing the current implementation? If so, maybe it'd be worth merging that part in sooner (or starting the discussion as you suggest) so as to minimize potential divergence? just my .02$

@gagnonlg My main change for now (though it is not definitive) was to switch runONNXInference to public; that way your algorithm can have an Acts::OnnxRuntimeBase member that loads the model of your choice, instead of writing a new class for each model you import, as is done right now.
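As a rough sketch of that composition-over-inheritance idea (hypothetical code: the header path, the `Acts::OnnxRuntimeBase` constructor, and the `runONNXInference` signature are all assumptions for illustration, not taken from this PR):

```cpp
#include <Acts/Plugins/Onnx/OnnxRuntimeBase.hpp>  // assumed header path

#include <vector>

// Hypothetical: an algorithm owns a model instance directly instead of
// deriving a new subclass per imported network.
class DuplicateClassifier {
 public:
  DuplicateClassifier(Ort::Env& env, const char* modelPath)
      : m_model(env, modelPath) {}  // assumed constructor signature

  std::vector<float> score(std::vector<float>& input) const {
    // Works only if runONNXInference is made public, as Corentin proposes.
    return m_model.runONNXInference(input);
  }

 private:
  Acts::OnnxRuntimeBase m_model;  // member, not base class
};
```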

@gagnonlg (Contributor, Author) commented Feb 7, 2023

> @gagnonlg My main change for now (though it is not definitive) was to switch runONNXInference to public; that way your algorithm can have an Acts::OnnxRuntimeBase member that loads the model of your choice, instead of writing a new class for each model you import, as is done right now.

I see; seems like that would be a backwards-compatible change if that decision is made, so there's no hurry, I guess.

> What do you mean by .02$ ? Is it a typo or something I don't understand ?

Oh, it's literally "my 2 cents", an idiom that basically means "in my opinion", i.e. feel free to disagree

@Corentin-Allaire (Contributor) commented

> @gagnonlg My main change for now (though it is not definitive) was to switch runONNXInference to public; that way your algorithm can have an Acts::OnnxRuntimeBase member that loads the model of your choice, instead of writing a new class for each model you import, as is done right now.
>
> I see; seems like that would be a backwards-compatible change if that decision is made, so there's no hurry, I guess.
>
> What do you mean by .02$ ? Is it a typo or something I don't understand ?
>
> Oh, it's literally "my 2 cents", an idiom that basically means "in my opinion", i.e. feel free to disagree

I got the 2 cents afterwards; I knew the expression, but this is the first time I see it written that way!

codecov bot commented Feb 7, 2023

Codecov Report

Merging #1835 (a63f605) into main (97a8458) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1835   +/-   ##
=======================================
  Coverage   49.50%   49.50%           
=======================================
  Files         407      407           
  Lines       22646    22646           
  Branches    10334    10334           
=======================================
  Hits        11210    11210           
  Misses       4247     4247           
  Partials     7189     7189           


github-actions bot commented Feb 7, 2023

📊 Physics performance monitoring for a63f605

Full report
Seeding: seeded, truth estimated, orthogonal
CKF: seeded, truth smeared, truth estimated, orthogonal
IVF: seeded, truth smeared, truth estimated, orthogonal
Ambiguity resolution: seeded, orthogonal
Truth tracking
Truth tracking (GSF)


@benjaminhuth benjaminhuth self-assigned this Feb 7, 2023
@benjaminhuth (Member) commented
We have a new image now. If you replace `ghcr.io/acts-project/ubuntu2004_exatrkx:v30` with `ghcr.io/acts-project/ubuntu2004_exatrkx:v31` in .gitlab-ci.yaml, then we should see if it actually builds @gagnonlg

@gagnonlg (Contributor, Author) commented Feb 9, 2023

@benjaminhuth awesome, let's try it now

@gagnonlg (Contributor, Author) commented Feb 9, 2023

@benjaminhuth are you sure it's v31? It looks like ONNX Runtime 1.10 is still being picked up.

@benjaminhuth (Member) commented
Yeah, I also noticed it. I'll check what I messed up.

@benjaminhuth (Member) commented Feb 9, 2023

Ah, I mixed up the numbers. It is release 34, not 31. Sorry!

@gagnonlg (Contributor, Author) commented Feb 9, 2023

> Ah, I mixed up the numbers. It is release 34, not 31. Sorry!

No worries, it's an easy fix!

@benjaminhuth (Member) commented

Hmm, this time the error is:

section_end:1675953156:prepare_executor ERROR: Job failed: failed to pull image "ghcr.io/acts-project/ubuntu2004_exatrkx:v34" with specified policies
[always]: write /var/lib/docker/tmp/GetImageBlob814379636: no space left on device (manager.go:203:71s)

Do we somehow need to reduce the size of the image, @paulgessinger?

@paulgessinger (Member) commented
No, I guess the disk is simply full on the VM. In principle I clear the image storage regularly, but you can try rerunning, which will hopefully end up on another VM.

@xju2 (Contributor) commented Feb 9, 2023

Hi @gagnonlg, the ONNX models were not working because some operations were not supported at that time. Maybe the new version will fix those. ;-)

@gagnonlg (Contributor, Author) commented
@benjaminhuth from the ci-bridge:

$ python3 src/Examples/Scripts/Python/exatrkx.py onnx
Traceback (most recent call last):
  File "src/Examples/Scripts/Python/exatrkx.py", line 5, in <module>
    import acts.examples
  File "/builds/acts/ci-bridge/build/python/acts/__init__.py", line 7, in <module>
    from .ActsPythonBindings import *
ImportError: libonnxruntime.so.1.13.1: cannot open shared object file: No such file or directory

@benjaminhuth (Member) commented
Nice that it builds without errors!

The test workflow still runs on the v30 image; you would also need to update it to v34 (also in .gitlab-ci.yaml). But then I'm confident it runs through!

@gagnonlg (Contributor, Author) commented
@benjaminhuth of course, I need to bump the test image as well 🤡

@gagnonlg (Contributor, Author) commented
All fine now, I think this can go in. @benjaminhuth can you approve?

@kodiakhq kodiakhq bot merged commit c2ef676 into acts-project:main Feb 10, 2023
paulgessinger pushed a commit to paulgessinger/acts that referenced this pull request Feb 13, 2023
@paulgessinger paulgessinger modified the milestones: next, v23.3.0 Feb 21, 2023
5 participants