Reading HBase data directly from S3 after migrating HBase to S3 without EMR


Hello. As part of a cloud migration and modernization initiative on AWS, the requirement is to migrate HBase data directly to S3 and then read that data from S3 using Java microservices (EMR will not be used). Can the following approach be taken? **We use AWS DataSync to move on-premises HBase data to S3.** https://aws.amazon.com/blogs/storage/using-aws-datasync-to-move-data-from-hadoop-to-amazon-s3/ Then Spring Boot microservices, deployed as pods on Fargate, read the data from S3.

  • One approach is to move the HBase files to an S3 bucket, run a Glue crawler to build a Data Catalog database, and then use the Athena API to fetch the data from the microservice. But are there alternative approaches?
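For context on that approach: once a Glue crawler has catalogued the exported files, the microservice issues ordinary SQL through the Athena API. A hypothetical query is sketched below; the database, table, and column names are illustrative assumptions, not anything from this post.

```sql
-- Hypothetical schema: assumes the crawler registered a table hbase_export
-- with columns row_key and payload in a database named hbase_migration
SELECT row_key, payload
FROM hbase_migration.hbase_export
WHERE row_key = 'customer#12345'
LIMIT 10;
```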

1 Answer
Accepted Answer

Hi bnaha,

Please go through the steps below; I hope they help resolve your issue.

Approach 1: Direct S3 Access with Hadoop APIs

If you want to read the HBase data directly from S3 without using Glue and Athena:

1. Move HBase Files to S3: Use AWS DataSync or another method to move your HBase files to S3.

Document Link: https://aws.amazon.com/blogs/storage/using-aws-datasync-to-move-data-from-hadoop-to-amazon-s3/

2. Hadoop APIs: Use Hadoop APIs to read the HBase data directly from S3.

You can integrate the Hadoop APIs into your Spring Boot microservices to access the data.

Example:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// For production, prefer IAM roles (e.g., the Fargate task role) over hard-coded keys
conf.set("fs.s3a.access.key", "<YOUR_ACCESS_KEY>");
conf.set("fs.s3a.secret.key", "<YOUR_SECRET_KEY>");
conf.set("fs.s3a.endpoint", "s3.amazonaws.com");

FileSystem fs = FileSystem.get(URI.create("s3a://your-bucket-name"), conf);
Path path = new Path("s3a://your-bucket-name/path-to-hbase-files");
try (FSDataInputStream inputStream = fs.open(path)) {
    // Process the input stream to read the HBase data
}

3. Spring Boot Microservices: Deploy your Spring Boot microservices in Fargate with the necessary configuration to read data from S3 using Hadoop APIs.
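A caveat on step 2: HBase stores data in HFiles, a block-based binary format with compression and a trailer, so in practice you would read them through HBase's own HFile reader classes rather than parsing bytes yourself. Purely as an illustration of consuming a binary stream of length-prefixed key/value records (a simplified stand-in for the real HFile layout, not the actual format), a self-contained sketch might look like:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordStreamSketch {

    // Parse a stream of [keyLen][valueLen][key bytes][value bytes] records.
    // This mirrors HBase's length-prefixed KeyValue idea but is NOT the real HFile format.
    static Map<String, String> readRecords(InputStream in) throws IOException {
        Map<String, String> out = new LinkedHashMap<>();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            int keyLen;
            try {
                keyLen = data.readInt();
            } catch (EOFException eof) {
                break; // clean end of stream
            }
            int valueLen = data.readInt();
            byte[] key = new byte[keyLen];
            byte[] value = new byte[valueLen];
            data.readFully(key);
            data.readFully(value);
            out.put(new String(key, StandardCharsets.UTF_8),
                    new String(value, StandardCharsets.UTF_8));
        }
        return out;
    }

    // Helper that writes one record in the same layout (used here only for testing).
    static byte[] writeRecord(String key, String value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buf);
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        data.writeInt(k.length);
        data.writeInt(v.length);
        data.write(k);
        data.write(v);
        return buf.toByteArray();
    }
}
```

In the real setup, the InputStream would be the FSDataInputStream opened over s3a as shown above, and the parsing would be done by HBase's HFile reader.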

Approach 2: Using AWS SDK for S3

1. Move HBase Files to S3: Use AWS DataSync to move your HBase files to S3.

2. AWS SDK for Java: Use the AWS SDK for Java to directly read the data from S3 in your Spring Boot microservices.

Example:

import java.io.InputStream;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.DEFAULT_REGION)
        .build();

S3Object s3object = s3Client.getObject(
        new GetObjectRequest("your-bucket-name", "path-to-hbase-files"));
try (InputStream inputStream = s3object.getObjectContent()) {
    // Process the input stream to read the HBase data
}

3. Spring Boot Microservices: Deploy your Spring Boot microservices in Fargate with the necessary configuration to read data from S3 using the AWS SDK.
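For the processing step above: getObjectContent() hands you a plain InputStream, so the draining logic can be written and tested independently of the SDK. A minimal sketch, using a local stream here as a stand-in for the S3 object content (an assumption for illustration, not the SDK call itself):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class S3StreamSketch {

    // Drain an InputStream fully into memory. Fine for small objects;
    // large HBase exports should be processed in a streaming fashion instead.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toString(StandardCharsets.UTF_8.name());
    }
}
```

With the SDK, this would be called as `readAll(s3object.getObjectContent())`; at the scale you mention elsewhere (around 150 TB), you would stream and process objects incrementally rather than buffering them in memory.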

Considerations

Performance: Evaluate the performance of reading large HBase files directly from S3. Depending on your data size and access patterns, Glue and Athena might offer optimizations.

Security: Ensure that your S3 bucket and data access are properly secured with IAM roles and policies.

Cost: Compare the cost implications of using Glue and Athena versus direct S3 access.

EXPERT
answered 4 months ago
EXPERT
reviewed 4 months ago
  • Hello @Pandurangaswamy, thank you for the suggestion. We would migrate around 150 TB of data from HBase to S3. Based on your suggestions, these seem to be the pros/cons; let me know if you agree. 1) Direct S3 access using the Hadoop API / AWS SDK: a) lower cost; b) no UI for end users to view the data; c) performance might be faster, since there are no intermediary layers. 2) Athena API over Glue on S3: a) higher cost due to Athena and Glue; b) UI available; c) queries might be more optimized than direct access.
