forked from triton-inference-server/server
Babak/upgrade triton to v2.43.0 #3
Status: Open
babakbehzad wants to merge 630 commits into main from babak/upgrade-triton-to-v2.43.0 (base: main)
Conversation
…ver#5911) * Add test for detecting S3 http2 upgrade request * Enhance testing * Copyright year update
…5922) * Add HPCX dependencies to search path * Copy hpcx to CPU-only container * Add ucc path to CPU-only image * Fixed if statement * Fix df variable * Combine hpcx LD_LIBRARY_PATH
…riton-inference-server#5915) * Add test case for metric lifetime error handling * Address comment * Use different MetricFamily name
…erver#5810) * Add testing for Pytorch instance group kind MODEL * Remove unused item * Update testing to verify the infer result * Add copyright * Remove unused import * Update pip install * Update the model to use the same add sub logic * Add torch multi-gpu and multi-device models to L0_io * Fix up model version
…rence-server#5937) * Add test for passing config via load api * Add more docs on instance update behavior * Update to suggested docs Co-authored-by: Ryan McCormick <[email protected]> * Use dictionary for json config * Modify the config fetched from Triton instead --------- Co-authored-by: Ryan McCormick <[email protected]>
…ence-server#5945) * Add redis config and use local logfile for redis server * Move redis log config to CLI * Have separate redis logs for unit tests and CLI tests
…ce-server#5885) * Add test on rate limiter max resource decrease update * Add test with explicit resource * Check server log for decreased resource limit
…ntrypoint updates (triton-inference-server#5910) * Allow changing ping behavior based on env variable in SageMaker * Add option for additional args * Make ping further configurable * Allow further configuration of grpc and http ports * Update docker/sagemaker/serve * Update docker/sagemaker/serve --------- Co-authored-by: GuanLuo <[email protected]>
…ce-server#5967) * Be more specific with MPI removal * Delete all libmpi libs
…rver#5963) * Add print statements for debugging * Add debugging print statements * Test using grpc client with stream to fix race * Use streaming client in all non-batch tests * Switch all clients to streaming GRPC * Remove unused imports, vars * Address comments * Remove random comment * Set inputs as separate function * Split set inputs based on test type
* Auto-format * Change to clang-format-15 in CONTRIBUTING
…-server#5976) * Add test for >1000 files * Capitalization for consistency * Add bucket cleaning at end * Move test pass/fail to end * Check number of files in model dir at load time
* Add testing for GPU tensor error handling * Fix up * Remove exit 0 * Fix jetson * Fix up
* Add test for Python BLS model loading API * Fix up
* Update trace_summery script * Remove GRPC_WAITREAD and Overhead
* Add gsutil cp retry helper function * Add max retry to GCS upload * Use simple sequential upload
…ver#6833) * Handle empty output * Add test case for 0 dimension output * Fix up number of tests
* tensorrt-llm benchmarking test
* Update README and versions for 2.42.0 / 24.01 (triton-inference-server#6789) * Update versions * Update README and versions for 2.42.0 / 24.01 * Fix documentation generation (triton-inference-server#6801) * Set version of Sphinx to 5.0 * Set version 5.0.0 * Update README.md and versions post 24.01
…und (triton-inference-server#6834) * Update miniconda version * Install pytest for different py version * Install pytest
* Add test for shutdown while loading * Fix intermittent failure on test_model_config_overwrite
Adding OpenTelemetry Batch Span Processor --------- Co-authored-by: Theo Clark <[email protected]> Co-authored-by: Ryan McCormick <[email protected]>
…erence-server#6755) * Support Double-Type Infer/Response Parameters
* Base Python Backend Support for Windows
…r#6886) * Update README and versions for 2.43.0 / 24.02 * Update Dockerfile to reduce image size. * Update path in patch file for model generation
…ference-server#6873) (triton-inference-server#6881) * Add unit test reports to L0_dlpack_multi_gpu * Add unit test reports to L0_warmup
* Eliminated usage of onnx models in tests / disabled some tests * Verified green: batch 1 * Verified tests: batch 2 * Verified tests: batch 3 * Verified tests: batch 4 * Verified tests: batch 5 * Verified tests: batch 6 * Verified tests: batch 7 * Verified tests: batch 8 * Verified tests: batch 9 * Verified tests: batch 10 * Verified tests: batch 11 * Verified tests: batch 12 * Verified tests: batch 12 follow-up * Verified tests: batch 13 * Verified tests: batch 14 * Verified tests: batch 15 * Verified tests: batch 16 * Removed exits
* Revert to previous way of CMake installation * Win10: set python version back to 3.8.10
* Update README.md for 24.02 * Update version to 24.02
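One commit above adds a gsutil cp retry helper with a max-retry cap for GCS uploads. The wrapper below is a hypothetical sketch of that pattern only; the function name, retry count, and backoff are assumptions, not the PR's actual code:

```shell
#!/bin/sh
# Hypothetical retry wrapper around `gsutil cp`: retries a failed
# upload up to max_retries times before giving up.
gsutil_cp_retry() {
    src=$1
    dst=$2
    max_retries=3
    attempt=1
    while ! gsutil cp "$src" "$dst"; do
        if [ "$attempt" -ge "$max_retries" ]; then
            echo "gsutil cp $src $dst failed after $max_retries attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep 2  # simple fixed backoff between attempts
    done
}
```

A wrapper like this keeps the call sites unchanged while making flaky-network failures non-fatal, which matches the commit's move to a "simple sequential upload" with bounded retries.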
Review thread on the documentation config diff:

    # },
    "use_edit_page_button": False,
    "use_issues_button": True,
    "use_repository_button": True,

Code scanning / CodeQL warning (documentation): Duplicate key in dict literal. Dictionary key 'use_repository_button' is subsequently overwritten.
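CodeQL flags this because a Python dict literal accepts duplicate keys without error and silently keeps only the last occurrence. A minimal sketch (the option names mirror the diff above, but the duplicated entry and its value are illustrative):

```python
# A dict literal with a repeated key: no error is raised, and the
# last occurrence silently overwrites the earlier one.
html_theme_options = {
    "use_edit_page_button": False,
    "use_issues_button": True,
    "use_repository_button": True,
    "use_repository_button": False,  # silently replaces the True above
}

# Only three keys survive, and the duplicated key holds the last value.
print(len(html_theme_options))                      # → 3
print(html_theme_options["use_repository_button"])  # → False
```

Because the overwrite is silent, the earlier `True` setting never takes effect, which is exactly the kind of latent config bug this query exists to catch.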
Review thread on Config:

    class Config(dict):

Code scanning / CodeQL warning: `__eq__` not overridden when adding attributes. The class 'Config' does not override '__eq__', but adds the new attribute 's3_regex'.
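This warning arises because `dict.__eq__` compares only the mapping contents, so any extra instance attribute is ignored in equality checks. A hedged sketch of the pattern, where the `s3_regex` handling is invented for illustration and is not the PR's actual code:

```python
# A dict subclass that adds an attribute but inherits dict.__eq__:
# equality ignores the new attribute, which is what CodeQL warns about.
class Config(dict):
    def __init__(self, *args, s3_regex=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.s3_regex = s3_regex  # new attribute, invisible to ==

a = Config({"region": "us-east-1"}, s3_regex=r"^s3://")
b = Config({"region": "us-east-1"}, s3_regex=None)

print(a == b)  # → True, even though s3_regex differs
```

Whether this is a real bug depends on whether two `Config` objects with different `s3_regex` values should ever compare equal; if not, the fix is to override `__eq__` (and `__hash__`) to include the attribute.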
babakbehzad force-pushed the babak/upgrade-triton-to-v2.43.0 branch from 4c6b7af to 93973e7 on March 23, 2024 at 02:04.
No description provided.