I have an existing SageMaker inference endpoint that I'm successfully calling from Aurora PostgreSQL using the aws_ml extension's invoke_endpoint function. I'm now trying to use the same endpoint from Redshift.
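For context, the Aurora PostgreSQL side looks roughly like this (the wrapper function, input column, and endpoint name are placeholders; the 1000 is the max_rows_per_batch argument I mention below):

```sql
-- Sketch of the working Aurora PostgreSQL setup, assuming the aws_ml extension
-- is installed. aws_sagemaker.invoke_endpoint takes the endpoint name, a
-- max-rows-per-batch value, and the model inputs.
CREATE FUNCTION classify_row (IN features numeric, OUT label int)
AS $$
  SELECT aws_sagemaker.invoke_endpoint('my-endpoint', 1000, features)::int
$$ LANGUAGE SQL PARALLEL SAFE COST 5000;

-- Called like any scalar function:
SELECT classify_row(features) FROM my_table;
```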
Based on Getting started with Amazon Redshift ML, I've set up the necessary IAM policies, created a model for the endpoint in Redshift, and called it via the model's registered function. However, I'm getting an error after 370 seconds no matter what I try.
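The Redshift setup is essentially the bring-your-own-endpoint flow from that guide; a minimal sketch (model name, function signature, endpoint name, and IAM role are placeholders):

```sql
-- Sketch: register an existing SageMaker endpoint as a Redshift ML model.
CREATE MODEL my_model
FUNCTION my_model_fn (int, float)
RETURNS float
SAGEMAKER 'my-endpoint'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole';

-- The registered function is then used in ordinary queries:
SELECT my_model_fn(col_a, col_b) FROM my_table;
```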
Query 1 ERROR: ERROR: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-
DETAIL:
-----------------------------------------------
error: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-
code: 32207
context:
query: 4076
location: exfunc_client.cpp:136
process: query1_125_4076 [pid=29885]
-----------------------------------------------
I can see work being performed in the endpoint containers, and no errors are reported there. One major difference between Aurora PostgreSQL and Redshift is that Redshift offers no control over batch size. In Aurora PostgreSQL, I typically pass a batch size of around 1000 to invoke_endpoint. Redshift sends 50,000 to 220,000 rows per batch, which can take a couple of minutes to complete.
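For reference, this is roughly how I've been pulling the container latency metrics the error message points to (endpoint name, variant name, and time window are placeholders):

```shell
# Sketch: fetch worst-case per-invocation model latency (reported in
# microseconds) for the endpoint from CloudWatch.
aws cloudwatch get-metric-statistics \
  --namespace AWS/SageMaker \
  --metric-name ModelLatency \
  --dimensions Name=EndpointName,Value=my-endpoint Name=VariantName,Value=AllTraffic \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 60 \
  --statistics Maximum
```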
Does anyone have any suggestions on how I can debug this? The query failure always occurs at 370 seconds, and I'm not sure what the significance of that number is.