model_fn and input_fn called multiple times #1073

Open
aunz opened this issue Mar 4, 2020 · 10 comments
@aunz commented Mar 4, 2020

I am using the prebuilt SageMaker SKLearn container (https://github.com/aws/sagemaker-scikit-learn-container), version 0.20.0.
As the entry_point, I include a script that carries out the batch transform job.

import time

def model_fn(model_dir):
    # load and return the model from model_dir (/opt/ml/model)
    ...

def input_fn(input_data, content_type):
    # deserialize the request payload
    ...

def predict_fn(input_data, model):
    '''
        A long-running process to preprocess the data before calling the model
        https://aws.amazon.com/blogs/machine-learning/preprocess-input-data-before-making-predictions-using-amazon-sagemaker-inference-pipelines-and-scikit-learn/
    '''
    time.sleep(60 * 11)  # sleep for 11 minutes to simulate a long-running process
    ...

def output_fn(prediction, accept):
    # serialize the prediction for the response
    ...

I noticed in the CloudWatch logs that model_fn() was called multiple times:

21:11:43 model_fn called /opt/ml/model 0.3710819465747405
21:11:43 model_fn called /opt/ml/model 0.1368146211634631
21:11:44 model_fn called /opt/ml/model 0.09153953459183728

The input_fn() was also called multiple times

20:41:31 input_data <class 'str'> application/json 0.3936440317990033 {
20:51:30 input_data <class 'str'> application/json 0.4852180186010707 {
21:01:30 input_data <class 'str'> application/json 0.9954036507047136 {
21:11:30 input_data <class 'str'> application/json 0.0806271844985188 {

More precisely, it was called every 10 minutes.

I used an ml.m4.xlarge instance with BatchStrategy = SingleRecord and SplitType = None. I also set the environment variable SAGEMAKER_MODEL_SERVER_TIMEOUT = '9999' to overcome the 60 s timeout. I expected model_fn and input_fn to be called only once, but in this case they were called multiple times, and in the end the container crashed with "Internal Server Error".
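
For context, a batch transform job with these settings would be set up roughly as follows with the SageMaker Python SDK (the S3 paths, role ARN, and script name below are placeholders, not the actual values):

from sagemaker.sklearn.model import SKLearnModel

model = SKLearnModel(
    model_data='s3://my-bucket/model.tar.gz',      # placeholder
    role='arn:aws:iam::123456789012:role/MyRole',  # placeholder
    entry_point='inference.py',                    # script containing model_fn/input_fn/predict_fn/output_fn
    framework_version='0.20.0',
    env={'SAGEMAKER_MODEL_SERVER_TIMEOUT': '9999'},
)

transformer = model.transformer(
    instance_count=1,
    instance_type='ml.m4.xlarge',
    strategy='SingleRecord',
)

transformer.transform(
    data='s3://my-bucket/input/',                  # placeholder
    content_type='application/json',
    split_type=None,
)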

I saw a similar issue before, #341, where model_fn was called on each invocation. But in this case there is no /invocations; model_fn, input_fn, predict_fn, and output_fn were still called multiple times before the container crashed.

@ikennanwosu

How did you resolve this, please? I am getting the same issue.

@EKami commented Nov 9, 2020

Same issue here =/

@raydazn commented Dec 14, 2020

Same issue here. If model_fn provides the model-loading functionality, do we need to load the model for every batch?

@uday1212

Same issue here! Has anyone found a solution to this?

@naresh129

How was this issue solved? Same issue here too.

@llealgt commented Oct 11, 2024

Has anyone found a solution? I'm facing the same issue: the function runs 4 times, which seems to be once per available GPU.

@HubGab-Git

Can you show your code? I would like to reproduce it.

@kurtgdl commented Dec 7, 2024

Is there any update on this? It seems there's a problem with sagemaker-inference-toolkit; sagemaker-huggingface-inference-toolkit has the same issue: aws/sagemaker-huggingface-inference-toolkit#133

@athewsey (Contributor) commented Dec 16, 2024

In general, as far as I'm aware, it's expected that model_fn will be called multiple times: the default behaviour is for the server to load multiple copies of your model and use them to serve concurrent requests on multiple worker threads.

I've worked pretty closely with SageMaker but am not part of their core inference engineering team, so the following is based on an imperfect (and potentially outdated) understanding:

I believe both sagemaker-scikit-learn-container and sagemaker-huggingface-inference-toolkit (for the Hugging Face DLCs) use AWS Labs' multi-model-server as their base inference server. The core sagemaker-inference-toolkit depends on it too, as mentioned in its readme, but I know other DLCs like PyTorch and TensorFlow have been using their own ecosystems' serving stacks, TorchServe and TFX.

It does make sense for the stack to support multiple worker threads so you can effectively utilize resources like instances with multiple GPUs or a large number of CPU cores - and in general the stack should be configurable - but (IMO) it's a bit difficult to navigate, with the serving stacks for these containers being split across so many different layers of code repositories...

To explicitly control/limit the number of worker threads created and best utilize the hardware, I'd suggest trying the following environment variables (see the sketch after this list):

  • SAGEMAKER_MODEL_SERVER_WORKERS (as per SM Inference Toolkit parameters.py)
  • MMS_DEFAULT_WORKERS_PER_MODEL, MMS_NETTY_CLIENT_THREADS, and possibly also MMS_NUMBER_OF_NETTY_THREADS (as per MMS configuration doc and underlying ConfigManager)
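
For example, a minimal sketch of pinning a single model worker (assuming the serving stack reads these variables at container startup; the values are illustrative, not verified against every container version):

# Pass via the `env` argument of the SageMaker Model, e.g. SKLearnModel(..., env=env)
env = {
    'SAGEMAKER_MODEL_SERVER_WORKERS': '1',   # SM Inference Toolkit: number of model workers to spawn
    'MMS_DEFAULT_WORKERS_PER_MODEL': '1',    # MMS: workers started per loaded model
}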

input_fn being called multiple times for a single request is more concerning, as that looks like a retry. You may have to set MMS-specific timeout & payload size configurations if the SAGEMAKER_ one isn't getting picked up. For example, for large-payload/long-running inference on the Hugging Face v4.28 container in the past, I used MMS_DEFAULT_RESPONSE_TIMEOUT, MMS_MAX_REQUEST_SIZE, and MMS_MAX_RESPONSE_SIZE.
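
As an illustration only (the values below are arbitrary examples, not recommendations), these MMS-side overrides can be passed the same way through the container environment:

# Pass alongside (or instead of) the SAGEMAKER_ settings, e.g. HuggingFaceModel(..., env=env)
env = {
    'MMS_DEFAULT_RESPONSE_TIMEOUT': '3600',            # seconds before MMS gives up waiting on a response
    'MMS_MAX_REQUEST_SIZE': str(100 * 1024 * 1024),    # request payload limit, in bytes
    'MMS_MAX_RESPONSE_SIZE': str(100 * 1024 * 1024),   # response payload limit, in bytes
}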

Hope this helps, but it'd be great to hear from anybody who manages to clarify exactly which env vars are sufficient to control the number of model workers spawned on these containers.

@kurtgdl commented Dec 17, 2024

Thanks a lot @athewsey. Your points are very useful.
