Accessing AWS Client SDKs from long-running NextJs server actions/route handlers

I'm building a dashboard UI in NextJs that needs to pull data from various AWS services, such as Timestream, DynamoDB and S3.

I'm running into a problem with TimestreamQueryClient where after 1 hour of the server being up, a send() of a query command will fail with the following exception:

Error: The operation to discover endpoint failed. Please retry, or provide a custom endpoint and disable endpoint discovery to proceed.
    at eval (webpack-internal:///(action-browser)/./node_modules/@aws-sdk/middleware-endpoint-discovery/dist-es/updateDiscoveredEndpointInCache.js:44:48)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  reason: ExpiredTokenException: The security token included in the request is expired
      at throwDefaultError (webpack-internal:///(action-browser)/./node_modules/@smithy/smithy-client/dist-es/default-error-handler.js:11:22)
      at eval (webpack-internal:///(action-browser)/./node_modules/@smithy/smithy-client/dist-es/default-error-handler.js:20:9)
      at de_CommandError (webpack-internal:///(action-browser)/./node_modules/@aws-sdk/client-timestream-query/dist-es/protocols/Aws_json1_0.js:316:20)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async eval (webpack-internal:///(action-browser)/./node_modules/@smithy/middleware-serde/dist-es/deserializerMiddleware.js:8:24)
      at async eval (webpack-internal:///(action-browser)/./node_modules/@smithy/core/dist-es/middleware-http-signing/httpSigningMiddleware.js:25:20)
      at async eval (webpack-internal:///(action-browser)/./node_modules/@smithy/middleware-retry/dist-es/retryMiddleware.js:41:46)
      at async eval (webpack-internal:///(action-browser)/./node_modules/@aws-sdk/middleware-logger/dist-es/loggerMiddleware.js:9:26)

I'm using a NextJs server action to implement the query - so the client code sends the request to the server, the server sends the actual TSQ to AWS, and formats the response back to the client (the data is metrics which are being plotted into charts on the client).

One would expect the client to need to find a new endpoint periodically, and that seems to happen hourly for Timestream, but if you look specifically at the 'reason' given, it's down to an expired security token.

I've searched high and low, and found conflicting advice about whether credential refresh is performed automatically or manually. I've attempted to use the fromNodeProviderChain credentials provider (@aws-sdk/credential-providers), which I suspect is what the client uses by default; supplying it explicitly at least allows me to intercept the call and inspect when the credentials are re-requested. This shows that the token is always the same, but I'm not actually sure what generates it in the first place (it's not stored in my ~/.aws/credentials file, which is where it is getting my access id and key from).
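For anyone wanting to reproduce the interception described above, here is a minimal sketch of a logging wrapper around a credential provider. The names `withResolutionLog`, `ResolutionRecord` and the local `Credentials` interface are hypothetical and exist only for this illustration; the interface mirrors the credential shape the AWS SDK for JavaScript v3 uses (access key id, secret, optional session token, optional expiration), and the wrapper assumes a provider is simply an async function returning that shape.

```typescript
// Local mirror of the SDK v3 credential shape, declared here so the sketch
// has no dependencies. This is an assumption for illustration only.
interface Credentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
  expiration?: Date;
}

type CredentialProvider = () => Promise<Credentials>;

// What gets reported on each resolution. Only a short suffix of the token is
// kept so that the full secret never ends up in logs.
interface ResolutionRecord {
  at: Date;
  hasSessionToken: boolean;
  tokenSuffix: string | undefined;
  expiration: Date | undefined;
  changedSincePrevious: boolean;
}

// Hypothetical helper: wraps any provider and reports every resolution,
// including whether the credentials differ from the previous resolution.
function withResolutionLog(
  inner: CredentialProvider,
  onResolve: (record: ResolutionRecord) => void,
  now: () => Date = () => new Date(),
): CredentialProvider {
  let previousKey: string | undefined;
  return async () => {
    const creds = await inner();
    const key = `${creds.accessKeyId}:${creds.sessionToken ?? ""}`;
    onResolve({
      at: now(),
      hasSessionToken: creds.sessionToken !== undefined,
      tokenSuffix: creds.sessionToken?.slice(-4),
      expiration: creds.expiration,
      // False on the first call; true only when the identity actually changed.
      changedSincePrevious: previousKey !== undefined && previousKey !== key,
    });
    previousKey = key;
    return creds;
  };
}
```

In the real app the wrapped provider would be passed as the `credentials` option, for example `new TimestreamQueryClient({ credentials: withResolutionLog(fromNodeProviderChain(), (r) => console.log(r)) })`. If `changedSincePrevious` never becomes true across the one-hour mark, the underlying source is handing back the same token each time.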

Note that my client instance is created simply with new TimestreamQueryClient({}).

I've also seen people suggest catching the exception in the user code, creating a new client instance and repeating the request.
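The catch-and-recreate pattern mentioned above can be sketched roughly as follows. `createResilientRunner` and `looksLikeExpiredToken` are hypothetical names for this illustration; the error check is based only on the shape visible in the stack trace above, where the outer error carries a `reason` whose name is `ExpiredTokenException`. The sketch is generic over the client type so it does not depend on any SDK package.

```typescript
// Read a string `name` property from an unknown value, if it has one.
function nameOf(value: unknown): string | undefined {
  if (typeof value === "object" && value !== null && "name" in value) {
    const name = (value as { name?: unknown }).name;
    return typeof name === "string" ? name : undefined;
  }
  return undefined;
}

// Matches either the error itself or its `reason` (as in the endpoint
// discovery failure shown in the trace) being an ExpiredTokenException.
function looksLikeExpiredToken(err: unknown): boolean {
  if (nameOf(err) === "ExpiredTokenException") return true;
  if (typeof err === "object" && err !== null && "reason" in err) {
    const reason = (err as { reason?: unknown }).reason;
    return nameOf(reason) === "ExpiredTokenException";
  }
  return false;
}

// Hypothetical helper: keeps one client, and on an expired-token failure
// builds a new client and retries the operation exactly once.
function createResilientRunner<C>(createClient: () => C) {
  let client = createClient();
  return async function run<T>(operation: (client: C) => Promise<T>): Promise<T> {
    try {
      return await operation(client);
    } catch (err) {
      if (!looksLikeExpiredToken(err)) throw err;
      client = createClient(); // discard the old client and its cached state
      return operation(client); // a second failure propagates to the caller
    }
  };
}
```

Usage would look something like `const run = createResilientRunner(() => new TimestreamQueryClient({}))` followed by `run((c) => c.send(new QueryCommand({ QueryString })))`. Note that this only helps if the new client resolves different credentials from the old one; if the provider returns the same expired token, the retry fails in the same way.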

I've tried all of the above approaches, and they all fail - recreating the TSQ client is the oddest failure of them all IMHO, as this should be a completely new instance, unless the default client is caching the credential provision internally and the token isn't expired/refreshed on the same basis as the server. When using fromNodeProviderChain I injected a custom expiration and whilst this would cause the credential provider to request according to the specified expiration, the token was identical on each attempt and was ultimately rejected by the AWS server on a subsequent request.
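One possibility that would be consistent with seeing an identical token on every attempt is that the credentials are being read from the process environment, which the default Node provider chain checks before the shared credentials file. This is only a guess, not a diagnosis. A small diagnostic along these lines could confirm or rule it out; `describeEnvCredentials` is a hypothetical helper, and it takes the environment as a parameter so it can be run against `process.env` or against a test object.

```typescript
type Env = Record<string, string | undefined>;

interface EnvCredentialReport {
  hasAccessKeyId: boolean;
  hasSecretAccessKey: boolean;
  hasSessionToken: boolean;
  expiration: Date | undefined;
  // "temporary" means a session token accompanies the key pair.
  kind: "none" | "long-term" | "temporary";
}

// Reports which of the standard AWS credential environment variables are set,
// without returning any of the secret values themselves.
function describeEnvCredentials(env: Env): EnvCredentialReport {
  const has = (name: string): boolean => (env[name] ?? "") !== "";
  const hasAccessKeyId = has("AWS_ACCESS_KEY_ID");
  const hasSecretAccessKey = has("AWS_SECRET_ACCESS_KEY");
  const hasSessionToken = has("AWS_SESSION_TOKEN");

  // Optional expiry timestamp; ignored if missing or unparseable.
  const rawExpiry = env["AWS_CREDENTIAL_EXPIRATION"];
  const parsed = rawExpiry ? new Date(rawExpiry) : undefined;
  const expiration =
    parsed !== undefined && !Number.isNaN(parsed.getTime()) ? parsed : undefined;

  const complete = hasAccessKeyId && hasSecretAccessKey;
  const kind = !complete ? "none" : hasSessionToken ? "temporary" : "long-term";

  return { hasAccessKeyId, hasSecretAccessKey, hasSessionToken, expiration, kind };
}
```

Calling `describeEnvCredentials(process.env)` from inside the server action and logging the result would show whether a session token is present in the environment of the running process. If `kind` is "temporary", the token was put there by whatever launched the process, and re-reading the same environment would not be expected to produce a new one.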

The AWS docs just don't seem to cover this, which makes me think that it -should- work but perhaps doesn't in my environment for reasons that aren't otherwise clear from the error message.

I'm deploying the NextJs app via ion.sst.dev and all the permissions are correct at the outset (I can get a client connection, query multiple times from multiple tables) but after the server has been up for 1+ hours, subsequent commands fail and I have to restart the process. SST is a layer on top of Route53 and CloudFront to simplify the deployment, and I'm using that alongside an existing CDK deployment. Part of the setup is to ensure that the Lambda is correctly configured for Timestream access including DescribeEndpoints and, as I said, this all works from a fresh start, but fails later when the token has expired. In the exception above I think it's clear that there is an internal retry and an attempt at a fresh endpoint discovery, so that part seems to be working ok.

In all the tutorials and questions that I've been able to uncover, no one seems to show anything more complicated than the simple TSQ client construction, save for either specifying a region or injecting access id and key from environment variables. I've seen advice that says that static credentials actually disable credential refresh (but again there are contradictory sources for that).

Thanks.

No Answers
