This repository has been archived by the owner on May 23, 2024. It is now read-only.

ECR images outdated #206

Open

agoblet opened this issue Apr 18, 2021 · 1 comment

Comments

@agoblet

agoblet commented Apr 18, 2021

Describe the bug
The images stored on ECR appear to be outdated. I would like to use the SAGEMAKER_GUNICORN_WORKERS feature introduced in 1.8. I deploy my SageMaker endpoint in eu-west-1 with the image 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.1.0-gpu. In the endpoint logs I do not see multiple gunicorn workers starting, and the logged startup command INFO:__main__:gunicorn command: gunicorn -b unix:/tmp/gunicorn.sock -k gevent --chdir /sagemaker --pythonpath /opt/ml/model/code -e TFS_GRPC_PORT=12000 -e SAGEMAKER_MULTI_MODEL=False python_service:app differs from the command produced by the 1.8 code. I also see very little CPU activity on the instance while sending many requests, which suggests only a single worker is handling them.
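For reference, this is roughly how the endpoint is deployed and how I would expect to enable the feature. It is a minimal sketch using the SageMaker Python SDK; the S3 path, IAM role, worker count, and instance type below are placeholders, not my actual configuration.

# Minimal sketch (SageMaker Python SDK); paths, role, and values are illustrative.
from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(
    model_data="s3://my-bucket/model.tar.gz",      # placeholder model artifact
    role="arn:aws:iam::123456789012:role/MyRole",  # placeholder execution role
    image_uri="763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.1.0-gpu",
    env={
        # Feature from serving container 1.8; it appears to have no effect
        # with the ECR image above.
        "SAGEMAKER_GUNICORN_WORKERS": "4",
    },
)

model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # placeholder GPU instance type
)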

To reproduce

  • Deploy a SageMaker TensorFlow endpoint with the image 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.1.0-gpu and some Python handler.
  • Check the logs in CloudWatch.
  • Verify that these logs do not come from the 1.8 code.

Expected behavior

  • Multiple gunicorn workers are spawned, and this is visible in the logs.
  • The gunicorn command shown in the logs corresponds to the 1.8 code.
  • The SageMaker instance is able to fully utilize its CPUs.

Screenshots or logs


INFO:__main__:starting services
INFO:__main__:using default model name: model
INFO:__main__:tensorflow serving model config:
model_config_list: {
  config: {
    name: "model",
    base_path: "/opt/ml/model",
    model_platform: "tensorflow"
  }
}
INFO:__main__:nginx config:
load_module modules/ngx_http_js_module.so;
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr error;
worker_rlimit_nofile 4096;
events {
  worker_connections 2048;
}
http {
  include /etc/nginx/mime.types;
  default_type application/json;
  access_log /dev/stdout combined;
  js_include tensorflow-serving.js;
  upstream tfs_upstream {
    server localhost:12001;
  }
  upstream gunicorn_upstream {
    server unix:/tmp/gunicorn.sock fail_timeout=1;
  }
  server {
    listen 8080 deferred;
    client_max_body_size 0;
    client_body_buffer_size 100m;
    subrequest_output_buffer_size 100m;
    set $tfs_version 2.1;
    set $default_tfs_model model;
    location /tfs {
        rewrite ^/tfs/(.*) /$1  break;
        proxy_redirect off;
        proxy_pass_request_headers off;
        proxy_set_header Content-Type 'application/json';
        proxy_set_header Accept 'application/json';
        proxy_pass http://tfs_upstream;
    }
    location /ping {
        js_content ping_without_model;
    }
    location /invocations {
        proxy_pass http://gunicorn_upstream/invocations;
    }
    location ~ ^/models/(.*)/invoke {
        js_content invocations;
    }
    location /models {
        proxy_pass http://gunicorn_upstream/models;
    }
    location / {
        return 404 '{"error": "Not Found"}';
    }
    keepalive_timeout 3;
  }
}
INFO:__main__:batching is enabled
WARNING:__main__:num_batch_threads defaulted to 8,the number of CPUs. Set SAGEMAKER_TFS_NUM_BATCH_THREADS to override default.
max_enqueued_batches defaulted to 8. Set SAGEMAKER_TFS_MAX_ENQUEUED_BATCHES to override default. Tuning this parameter may be necessary to tune out-of-memory errors occur.
INFO:__main__:batching config:
max_batch_size { value: 32 }
batch_timeout_micros { value: 1000000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 8 }
INFO:__main__:tensorflow version info:
TensorFlow ModelServer: 2.1.0-rc1+dev.sha.56ab43f
TensorFlow Library: 2.1.0
INFO:__main__:tensorflow serving command: tensorflow_model_server --port=12000 --rest_api_port=12001 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0 --enable_batching=true --batching_parameters_file=/sagemaker/batching-config.cfg
INFO:__main__:started tensorflow serving (pid: 15)
INFO:__main__:installing packages from requirements.txt...
2021-04-18 09:17:16.694250: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2021-04-18 09:17:16.694301: I tensorflow_serving/model_servers/server_core.cc:573]  (Re-)adding model: model
2021-04-18 09:17:16.794528: I tensorflow_serving/util/retrier.cc:46] Retrying of Reserving resources for servable: {name: model version: 0} exhausted max_num_retries: 0
2021-04-18 09:17:16.794552: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: model version: 0}
2021-04-18 09:17:16.794559: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 0}
2021-04-18 09:17:16.794566: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 0}
2021-04-18 09:17:16.794586: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /opt/ml/model/0
2021-04-18 09:17:17.191099: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-04-18 09:17:17.191150: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /opt/ml/model/0
2021-04-18 09:17:17.192358: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2021-04-18 09:17:17.195306: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-04-18 09:17:17.459224: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:17.459806: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-04-18 09:17:17.459818: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2021-04-18 09:17:17.459866: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:17.460440: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:17.460972: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
Collecting numpy
  Downloading numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
Collecting opencv-python-headless
  Downloading opencv_python_headless-4.5.1.48-cp36-cp36m-manylinux2014_x86_64.whl (37.6 MB)
Installing collected packages: numpy, opencv-python-headless
2021-04-18 09:17:19.843425: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-18 09:17:19.843448: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2021-04-18 09:17:19.843454: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2021-04-18 09:17:19.845196: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:19.845770: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:19.846306: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 09:17:19.846833: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14125 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2021-04-18 09:17:20.610839: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "BoxIntersectionOverUnion" device_type: "GPU"') for unknown op: BoxIntersectionOverUnion
2021-04-18 09:17:20.610880: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "BoxEncode" device_type: "GPU"') for unknown op: BoxEncode
2021-04-18 09:17:20.882372: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
Successfully installed numpy-1.19.5 opencv-python-headless-4.5.1.48
WARNING: You are using pip version 20.0.2; however, version 21.0.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
INFO:__main__:gunicorn command: gunicorn -b unix:/tmp/gunicorn.sock -k gevent --chdir /sagemaker --pythonpath /opt/ml/model/code -e TFS_GRPC_PORT=12000 -e SAGEMAKER_MULTI_MODEL=False python_service:app
INFO:__main__:gunicorn version info:
gunicorn (version 20.0.4)
INFO:__main__:started gunicorn (pid: 89)
[2021-04-18 09:17:22 +0000] [89] [INFO] Starting gunicorn 20.0.4
[2021-04-18 09:17:22 +0000] [89] [INFO] Listening at: unix:/tmp/gunicorn.sock (89)
INFO:__main__:gunicorn server is ready!
[2021-04-18 09:17:22 +0000] [89] [INFO] Using worker: gevent
[2021-04-18 09:17:22 +0000] [93] [INFO] Booting worker with pid: 93
INFO:__main__:nginx version info:
nginx version: nginx/1.18.0
built by gcc 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
built with OpenSSL 1.1.1  11 Sep 2018
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.18.0/debian/debuild-base/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'
INFO:__main__:started nginx (pid: 94)
2021-04-18 09:17:23.383629: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /opt/ml/model/0
2021-04-18 09:17:24.514820: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 7720231 microseconds.
2021-04-18 09:17:24.763369: I tensorflow_serving/servables/tensorflow/saved_model_bundle_factory.cc:169] Wrapping session to perform batch processing
2021-04-18 09:17:24.763420: I tensorflow_serving/servables/tensorflow/bundle_factory_util.cc:153] Wrapping session to perform batch processing
2021-04-18 09:17:24.764123: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /opt/ml/model/0/assets.extra/tf_serving_warmup_requests
2021-04-18 09:17:24.772618: I tensorflow_serving/util/retrier.cc:46] Retrying of Loading servable: {name: model version: 0} exhausted max_num_retries: 0
2021-04-18 09:17:24.772644: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: model version: 0}
2021-04-18 09:17:24.778584: I tensorflow_serving/model_servers/server.cc:362] Running gRPC ModelServer at 0.0.0.0:12000 ...
[warn] getaddrinfo: address family for nodename not supported
2021-04-18 09:17:24.782789: I tensorflow_serving/model_servers/server.cc:382] Exporting HTTP/REST API at:localhost:12001 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...

System information
A description of your system. Please provide:

  • Toolkit version:
  • Framework version: 2.1.0 (also happens in 2.0.0)
  • Python version:
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N


@agoblet
Author

agoblet commented Apr 19, 2021

For now I have worked around this issue using gunicorn's WEB_CONCURRENCY environment variable. With that set, I can see more workers spawning in the logs, and my CPU utilization is also better.
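For completeness, the workaround at deploy time looks roughly like this. It is only a sketch; the worker count, S3 path, role, and instance type are illustrative.

# Sketch of the workaround: gunicorn reads WEB_CONCURRENCY from the container
# environment to decide how many workers to spawn; values are illustrative.
from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(
    model_data="s3://my-bucket/model.tar.gz",      # placeholder
    role="arn:aws:iam::123456789012:role/MyRole",  # placeholder
    image_uri="763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.1.0-gpu",
    env={"WEB_CONCURRENCY": "4"},  # instead of SAGEMAKER_GUNICORN_WORKERS
)
model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")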
