Hello,
I am trying to deploy this GitHub solution, but I do not have access to an ml.g5.12xlarge instance and am hoping to run it on an ml.g5.4xlarge instead. Based on the error I am getting (see the dump below, showing timeout=60s), I am wondering whether there is some sort of timeout variable I can set when creating the SageMaker endpoint to increase how long the client waits for the model to respond to a query.
PS: I am pretty sure the issue is not related to this post here
Thank you,
---------------------------------------------------------------------------
TimeoutError Traceback (most recent call last)
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/urllib3/connectionpool.py:467, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
463 except BaseException as e:
464 # Remove the TypeError from the exception chain in
465 # Python 3 (including for exceptions like SystemExit).
466 # Otherwise it looks like a bug in the code.
--> 467 six.raise_from(e, None)
468 except (SocketTimeout, BaseSSLError, SocketError) as e:
TimeoutError: The read operation timed out
During handling of the above exception, another exception occurred:
ReadTimeoutError Traceback (most recent call last)
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/botocore/httpsession.py:464, in URLLib3Session.send(self, request)
463 request_target = self._get_request_target(request.url, proxy_url)
--> 464 urllib_response = conn.urlopen(
465 method=request.method,
466 url=request_target,
467 body=request.body,
468 headers=request.headers,
469 retries=Retry(False),
470 assert_same_host=False,
471 preload_content=False,
472 decode_content=False,
473 chunked=self._chunked(request.headers),
474 )
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/urllib3/connectionpool.py:358, in HTTPConnectionPool._raise_timeout(self, err, url, timeout_value)
357 if isinstance(err, SocketTimeout):
--> 358 raise ReadTimeoutError(
359 self, url, "Read timed out. (read timeout=%s)" % timeout_value
360 )
362 # See the above comment about EAGAIN in Python 3. In Python 2 we have
363 # to specifically catch it and throw the timeout error
**ReadTimeoutError: AWSHTTPSConnectionPool(host='runtime.sagemaker.us-east-1.amazonaws.com', port=443):
Read timed out. (read timeout=60)**
During handling of the above exception, another exception occurred:
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/botocore/httpsession.py:501, in URLLib3Session.send(self, request)
500 except URLLib3ReadTimeoutError as e:
--> 501 raise ReadTimeoutError(endpoint_url=request.url, error=e)
502 except ProtocolError as e:
ReadTimeoutError: Read timeout on endpoint URL: "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/aws-genai-mda-blog-flan-t5-xxl-endpoint-6ecf4020/invocations"
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[14], line 20
7 query = "How many covid cases are there in the state of NY"
**---> 20 response = run_query(query)**
21 print("----------------------------------------------------------------------")
22 print(f'SQL and response from user query {query} \n {response}')
Cell In[13], line 51, in run_query(query)
**---> 51 channel, db = identify_channel(query)**
Thank you for the insights. I added values to the timeout variables you mentioned, but I still got similar errors. I also tried a couple of different Hugging Face models and deep learning containers (DLCs) that are supposed to perform faster, but that seemed to make very little difference. Do you think trying a smaller model, flan-t5-xl, is worth it?
Just to summarize the models used and the errors I am getting:

Model: Flan-T5-XXL
DLC: pytorch-inference:1.12.0-gpu-py38
Error: ReadTimeoutError: AWSHTTPSConnectionPool(host='runtime.sagemaker.us-east-1.amazonaws.com', port=443): Read timed out. (read timeout=60)

Model: Flan-T5-XXL-FP16
DLC: huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04
Error: An error occurred (ModelError) when calling the InvokeEndpoint operation: "InternalServerException", {"message": "addmm_impl_cpu_" not implemented for 'Half'}

Model: Flan-T5-XXL-BNB-INT8
DLC: pytorch-inference:1.12.0-gpu-py38
Error: ReadTimeoutError: AWSHTTPSConnectionPool(host='runtime.sagemaker.us-east-1.amazonaws.com', port=443): Read timed out. (read timeout=60)