
Seamless Amazon Keyspaces Integration with SageMaker Using SigV4 Authentication

This post shows how to connect your SageMaker environment to Amazon Keyspaces using the AWS Signature Version 4 (SigV4) authentication plugin.

This lets you access data from Keyspaces tables for training, inference, or analysis directly from a SageMaker notebook. The approach simplifies credential management by using temporary credentials obtained through your SageMaker execution role. It also improves security by letting you enforce fine-grained permissions with IAM policies.

Amazon Keyspaces (for Apache Cassandra) provides a scalable, highly available, and managed Cassandra-compatible database service. When working with machine learning workflows in Amazon SageMaker, you often need to access your Keyspaces data for training, inference, or analysis. This article shows how to connect to Amazon Keyspaces from SageMaker notebooks using AWS Signature Version 4 (SigV4) authentication with temporary credentials.

Why SigV4 Authentication?

Cassandra clients typically authenticate with a username and password. Amazon Keyspaces also supports IAM-based authentication through SigV4, which offers several advantages:

  • No credential management: Uses temporary credentials from IAM roles
  • Enhanced security: Leverages AWS IAM policies and permissions
  • Seamless integration: Works naturally with SageMaker execution roles
  • No service-specific credentials: Eliminates the need to generate Keyspaces-specific credentials

Prerequisites

Before implementing this solution, ensure your SageMaker execution role has the necessary permissions:

Amazon Keyspaces Access

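Attach either the AmazonKeyspacesReadOnlyAccess or the AmazonKeyspacesFullAccess managed policy, following the principle of least privilege. For read-only workloads such as training or analysis, the read-only policy is sufficient.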
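
If the managed policies are broader than you need, a customer-managed policy scoped to a single table is one alternative. The sketch below builds such a policy document. The helper name `read_only_table_policy`, the Region, the account ID, and the keyspace and table names are all placeholders, not values from this article. It assumes that read access to the `system*` keyspaces is also required, because Cassandra drivers read system tables when they connect.

```python
import json


def read_only_table_policy(region, account_id, keyspace, table):
    """Return an IAM policy document (as a dict) that allows SELECT on one table.

    Hypothetical helper for illustration; adjust actions and resources to your needs.
    """
    base = f"arn:aws:cassandra:{region}:{account_id}:/keyspace"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["cassandra:Select"],
                "Resource": [
                    f"{base}/{keyspace}/table/{table}",
                    # Assumed to be needed so the driver can read system tables.
                    f"{base}/system*",
                ],
            }
        ],
    }


# Placeholder values; replace with your own Region, account, keyspace, and table.
policy = read_only_table_policy(
    "us-east-1", "111122223333", "your_keyspace", "your_table"
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as a customer-managed policy and attached to the execution role in place of a managed policy.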

STS AssumeRole Permissions

Add the following policy so that the execution role can call sts:AssumeRole, which Step 4 uses to obtain temporary credentials:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["sts:AssumeRole"],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
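
The wildcard `Resource` above lets the role attempt to assume any role that trusts it. A narrower variant, sketched below, limits the permission to a single role ARN. The helper name `scoped_assume_role_policy` and the example ARN are hypothetical. Note also that when a role assumes itself, its trust policy generally has to allow that role as a principal as well; verify this against your own IAM setup.

```python
import json


def scoped_assume_role_policy(role_arn):
    """Return a policy document that allows sts:AssumeRole on one role only.

    Hypothetical helper for illustration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["sts:AssumeRole"],
                "Resource": role_arn,
            }
        ],
    }


# Placeholder ARN; in a notebook this would be the value of get_execution_role().
example_arn = "arn:aws:iam::111122223333:role/YourSageMakerExecutionRole"
scoped_policy = scoped_assume_role_policy(example_arn)
print(json.dumps(scoped_policy, indent=2))
```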

Implementation

Step 1: Install Required Dependencies

%pip install cassandra-sigv4

Step 2: Download SSL Certificate

!curl https://certs.secureserver.net/repository/sf-class2-root.crt -O

Step 3: Import Required Libraries

from sagemaker import get_execution_role
from cassandra.cluster import Cluster
from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED
from cassandra_sigv4.auth import SigV4AuthProvider
import boto3
from pandas import DataFrame

Step 4: Obtain Temporary Credentials

# Get STS client and SageMaker execution role
client = boto3.client('sts')
role = get_execution_role()

# Assume the role to get temporary credentials
role_info = {
    'RoleArn': role,
    'RoleSessionName': 'keyspaces-session'
}

credentials = client.assume_role(**role_info)
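
The response from `assume_role` includes an `Expiration` timestamp, which boto3 returns as a timezone-aware `datetime`. Unless `DurationSeconds` is passed, the credentials last one hour, and connections opened after that point are likely to fail authentication. The sketch below shows one way to check how much time is left. The helper names and the five-minute margin are assumptions for illustration, and the example uses a fabricated response rather than a real STS call.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_expiry(assume_role_response, now=None):
    """Return the number of seconds before the temporary credentials expire."""
    now = now or datetime.now(timezone.utc)
    expiration = assume_role_response["Credentials"]["Expiration"]
    return (expiration - now).total_seconds()


def needs_refresh(assume_role_response, margin_seconds=300, now=None):
    """Return True when the credentials expire within the safety margin."""
    return seconds_until_expiry(assume_role_response, now) <= margin_seconds


# Fabricated response with the same shape as the assume_role() result.
fake_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fake_response = {"Credentials": {"Expiration": fake_now + timedelta(hours=1)}}

print(seconds_until_expiry(fake_response, now=fake_now))  # 3600.0
print(needs_refresh(fake_response, now=fake_now))         # False
```

In a long-running notebook, calling `needs_refresh(credentials)` before opening a new connection indicates when Steps 4 and 5 should be repeated.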

Step 5: Establish Keyspaces Connection

# Create boto3 session with temporary credentials
session = boto3.session.Session(
    aws_access_key_id=credentials['Credentials']['AccessKeyId'],
    aws_secret_access_key=credentials['Credentials']['SecretAccessKey'],
    aws_session_token=credentials['Credentials']['SessionToken']
)

# Configure SSL context
ssl_context = SSLContext(PROTOCOL_TLSv1_2)
ssl_context.load_verify_locations('sf-class2-root.crt')
ssl_context.verify_mode = CERT_REQUIRED

# Create SigV4 auth provider
auth_provider = SigV4AuthProvider(session)

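# Build the service endpoint hostname from the session's Region.
# Note: session.region_name is None if no default Region is configured.
region_name = session.region_name
keyspaces_host = f'cassandra.{region_name}.amazonaws.com'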

# Connect to Keyspaces
cluster = Cluster(
    [keyspaces_host], 
    ssl_context=ssl_context, 
    auth_provider=auth_provider, 
    port=9142
)
keyspaces_session = cluster.connect()
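
A boto3 session created from credentials alone takes its Region from the environment or the local AWS configuration, so `region_name` can be `None`, which would yield the hostname `cassandra.None.amazonaws.com`. The sketch below resolves the Region explicitly and fails early when none is found. The helper name `keyspaces_endpoint` and its `env` parameter are hypothetical, and the hostname pattern is the one used in this article; other AWS partitions may use a different domain suffix.

```python
import os


def keyspaces_endpoint(region_name=None, env=None):
    """Build the Keyspaces hostname, falling back to standard AWS env variables.

    Raises ValueError instead of silently producing an invalid hostname.
    """
    env = os.environ if env is None else env
    region = region_name or env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    if not region:
        raise ValueError(
            "No AWS Region found; pass region_name or set AWS_DEFAULT_REGION."
        )
    return f"cassandra.{region}.amazonaws.com"


print(keyspaces_endpoint("us-east-1"))  # cassandra.us-east-1.amazonaws.com
```

In the notebook, `keyspaces_endpoint(session.region_name)` could stand in for the f-string that builds `keyspaces_host`.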

Step 6: Query Data

# Query system tables
result = keyspaces_session.execute('SELECT * FROM system_schema.keyspaces')
df = DataFrame(result)
print(df)

# Query your own tables
# keyspace_name = 'your_keyspace'
# table_name = 'your_table'
# rows = keyspaces_session.execute(f'SELECT * FROM {keyspace_name}.{table_name}')
# df = DataFrame(rows)
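
Keyspace and table names cannot be passed as bind parameters, so the commented example interpolates them into the query string. If those names ever come from user input or configuration, validating them first is a sensible precaution. The sketch below is one way to do that. The helper name `build_select`, the identifier pattern, the 48-character limit, and the default `LIMIT` are assumptions for illustration. Column values in a `WHERE` clause should still be passed separately as parameters to `execute`.

```python
import re

# Unquoted CQL identifier: a letter followed by letters, digits, or underscores.
# The 48-character cap is an assumed limit; adjust it if your names differ.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,47}")


def build_select(keyspace_name, table_name, limit=100):
    """Return a SELECT statement after validating the identifiers and limit."""
    for name in (keyspace_name, table_name):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Invalid CQL identifier: {name!r}")
    limit = int(limit)
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {keyspace_name}.{table_name} LIMIT {limit}"


query = build_select("your_keyspace", "your_table", limit=10)
print(query)  # SELECT * FROM your_keyspace.your_table LIMIT 10
```

The resulting string can be passed to `keyspaces_session.execute(query)` as in the example above. The `LIMIT` clause keeps an exploratory query from scanning a whole table.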

Key Benefits

1. Security First

  • Uses IAM roles instead of hardcoded credentials
  • Temporary credentials automatically expire
  • Fine-grained access control through IAM policies

2. Simplified Authentication

  • No need to manage Keyspaces service-specific credentials
  • Leverages existing SageMaker execution role
  • Seamless integration with AWS services

3. Production Ready

  • SSL/TLS encryption for data in transit
  • Compatible with SageMaker's security model
  • Supports all Keyspaces regions

Use Cases

This authentication pattern is ideal for:

  • ML Feature Engineering: Extract features from Keyspaces for model training
  • Real-time Inference: Query Keyspaces during model inference
  • Data Analysis: Analyze large datasets stored in Keyspaces
  • ETL Pipelines: Process data between Keyspaces and other AWS services

Best Practices

  1. Use Least Privilege: Grant only necessary Keyspaces permissions
  2. Monitor Access: Use CloudTrail to track Keyspaces API calls
  3. Handle Errors: Implement proper error handling for connection failures
  4. Connection Pooling: Reuse connections for better performance
  5. Regional Deployment: Deploy in the same region as your Keyspaces tables
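
For the error-handling practice, a small retry wrapper with exponential backoff is one common pattern. The sketch below is generic and not part of the Cassandra driver. The helper name `with_retries` and its parameters are hypothetical, and the demonstration uses a simulated failure rather than a real connection.

```python
import time


def with_retries(operation, retry_on=(Exception,), attempts=3,
                 base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated connection that fails twice before succeeding.
calls = {"count": 0}


def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "connected"


delays = []  # Record the delays instead of sleeping, to keep the demo fast.
result = with_retries(flaky_connect, retry_on=(ConnectionError,), sleep=delays.append)
print(result, delays)  # connected [1.0, 2.0]
```

In the notebook, this could wrap the connection call, for example `with_retries(cluster.connect, retry_on=(NoHostAvailable,))`, with `NoHostAvailable` imported from `cassandra.cluster`. Keeping the returned session in a variable and reusing it for all queries also covers the connection reuse practice.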

Conclusion

Integrating Amazon Keyspaces with SageMaker using SigV4 authentication provides a secure, scalable solution for accessing Cassandra data in your ML workflows. This approach eliminates credential management overhead while maintaining enterprise-grade security standards.

The combination of Amazon Keyspaces' managed Cassandra compatibility and SageMaker's ML capabilities, connected through IAM-based authentication, creates a powerful platform for data-driven machine learning applications.

AWS EXPERT · published 3 months ago · 90 views