AI Company Document Q&A


We've created a Python app (Streamlit, LangChain, OpenAI, etc.) that chats with our company docs, including user guides, blogs, FAQs, support questions, and forum posts. We've also added the ability for customers to create and upload libraries of custom documents (these are saved in per-user directories on the server and accessed via login). Everything works great on our local machines and local network, but now we want to roll it out to our external team and alpha testers. What tier/plan, configurations, and issues should we be considering when deploying this product for our next alpha/beta phase? We briefly tried the free tier (Linux, 1 vCPU, 1 GB RAM), which hangs and drops connections frequently; none of this happens on the local network.
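For context, the per-user library storage looks roughly like this (a simplified sketch; the root path and helper names are illustrative, not our exact code):

    from pathlib import Path

    # Illustrative layout: one directory of uploaded documents per logged-in user.
    USER_LIB_ROOT = Path("/srv/docqa/user_libraries")  # placeholder server path

    def user_library_dir(username: str) -> Path:
        """Return (creating if needed) the custom-document directory for a user."""
        lib_dir = USER_LIB_ROOT / username
        lib_dir.mkdir(parents=True, exist_ok=True)
        return lib_dir

    def list_user_documents(username: str) -> list[Path]:
        """List the files a logged-in user has uploaded."""
        return sorted(p for p in user_library_dir(username).iterdir() if p.is_file())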

Many Thanks

Tim

CADMAN
asked 10 months ago · 315 views
1 Answer

Hi, SageMaker Serverless Inference offers optimal costs for low-traffic workloads thanks to its serverless nature: https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html

    For Serverless Inference with Provisioned Concurrency, you pay for the compute capacity used to process inference requests, billed by the millisecond, and the amount of data processed. You also pay for Provisioned Concurrency usage, based on the memory configured, duration provisioned, and the amount of concurrency enabled.

Pricing is detailed here: https://aws.amazon.com/sagemaker/pricing/
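As a rough sketch, you could create a serverless endpoint with Provisioned Concurrency via boto3 like this (the model and endpoint names are placeholders, and "docs-qa-model" must already be registered in SageMaker):

    import boto3

    sm = boto3.client("sagemaker")

    # Endpoint config with a serverless variant; memory and concurrency are examples.
    sm.create_endpoint_config(
        EndpointConfigName="docs-qa-serverless-config",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": "docs-qa-model",
            "ServerlessConfig": {
                "MemorySizeInMB": 3072,       # 1024-6144, in 1 GB increments
                "MaxConcurrency": 10,         # cap on concurrent invocations
                "ProvisionedConcurrency": 2,  # keeps capacity warm; billed separately
            },
        }],
    )

    # Deploy the endpoint from that config.
    sm.create_endpoint(
        EndpointName="docs-qa-serverless",
        EndpointConfigName="docs-qa-serverless-config",
    )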

Hope it helps!

Didier

AWS EXPERT
answered 10 months ago
