Questions tagged with ML Ops with Amazon SageMaker and Kubernetes
Kubernetes is an open source system used to automate the deployment, scaling, and management of containerized applications. Kubeflow Pipelines is a workflow manager that offers an interface to manage and schedule machine learning (ML) workflows on a Kubernetes cluster. Using open source tools offers flexibility and standardization, but requires time and effort to set up infrastructure, provision notebook environments for data scientists, and stay up-to-date with the latest deep learning framework versions.
Hello AWS team!
I am trying to run a suite of inference recommendation jobs leveraging NVIDIA Triton Inference Server on a set of GPU instances (ml.g5.12xlarge, ml.g5.8xlarge, ml.g5.16xlarge) as well...
Hello,
I am trying to run a suite of inference recommendation jobs on a set of GPU instances (ml.g5.12xlarge, ml.g5.8xlarge, ml.g5.16xlarge) as well as AWS Inferentia machines (ml.inf2.2xlarge,...
Hi,
How would one go about designing a serverless ML application in AWS?
Currently, our project is using the [Serverless Framework](https://www.serverless.com/) and Lambda functions to accomplish...
I want to create a training step in a SageMaker pipeline and use a custom processor like the one below. But instead of Python code, I want to use Java code in place of [code = 'src/processing.py' ]. Is it...
I am trying to build an architecture for custom anomaly AI on AWS for my startup. Please let me know whether my way of thinking is correct.
1. Data Ingestion: Ingesting the data into AWS S3 in JSON...
I am calling the SageMaker model endpoint with contentType `application/octet-stream`, which is also being captured in the Data Capture logs.
What would be the ideal way to transform the data such that model...
Based on the AWS docs/examples (https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-version.html), one can create/register a model that is generated by your training pipeline. First we need to...
Hi,
I'm working on an end-to-end ML project which, for the moment, goes from training (it takes already processed train/val/test data from an S3 bucket) to deployment, passing through hyperparameter...
I can't save the Neuron model after compiling the model into an AWS Neuron optimized TorchScript.
My code:
```
import tensorflow # to workaround a protobuf version conflict issue
import torch
import...
```
Are there any step-by-step guides/tutorials on how to implement Kubeflow with custom OIDC providers?
I want to install Kubeflow in the Jakarta region with EKS, but Cognito is not available in the JKT region...
Hi MLOps Gurus,
I'd like to seek guidance on the situation below.
This is regarding SageMaker Project creation in AWS. The use case is to take the final model (built by the DS team) from S3 and do all...
I am experimenting with a SageMaker serverless endpoint (sample code below to create an endpoint from the AWS documentation), but I keep getting an error when the endpoint is invoked. Has anyone run into...