Hi bnaha,
Please go through the steps below; I hope they help resolve your issue.
Approach 1: Direct S3 Access with Hadoop APIs
If you want to read the HBase data directly from S3 without using Glue and Athena:
1. Move HBase Files to S3: Use AWS DataSync or another method to move your HBase files to S3.
Document Link: https://aws.amazon.com/blogs/storage/using-aws-datasync-to-move-data-from-hadoop-to-amazon-s3/
2. Hadoop APIs: Use Hadoop APIs to read the HBase data directly from S3.
You can integrate the Hadoop APIs into your Spring Boot microservices to access the data.
Example:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// Hard-coded keys are shown only for illustration; prefer IAM roles in practice.
conf.set("fs.s3a.access.key", "<YOUR_ACCESS_KEY>");
conf.set("fs.s3a.secret.key", "<YOUR_SECRET_KEY>");
conf.set("fs.s3a.endpoint", "s3.amazonaws.com");
FileSystem fs = FileSystem.get(URI.create("s3a://your-bucket-name"), conf);
Path path = new Path("s3a://your-bucket-name/path-to-hbase-files");
FSDataInputStream inputStream = fs.open(path);
// Process the input stream to read the HBase data
3. Spring Boot Microservices: Deploy your Spring Boot microservices in Fargate with the necessary configuration to read data from S3 using Hadoop APIs.
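For step 3, a minimal sketch of wrapping the Hadoop S3A read in a Spring Boot service is shown below. The bucket, key, and class names are placeholders, and it assumes the Fargate task role supplies credentials through the default AWS credential chain rather than hard-coded keys.

// Minimal sketch, not production code: bucket, key, and class names are placeholders.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.stereotype.Service;

@Service
public class HBaseFileService {

    private static final String BUCKET = "s3a://your-bucket-name"; // placeholder

    public byte[] readFile(String key) throws IOException {
        Configuration conf = new Configuration();
        // Assumption: let S3A use the SDK's default credential chain so the
        // Fargate task role is picked up instead of hard-coded keys.
        conf.set("fs.s3a.aws.credentials.provider",
                "com.amazonaws.auth.DefaultAWSCredentialsProviderChain");

        try (FileSystem fs = FileSystem.get(URI.create(BUCKET), conf);
             FSDataInputStream in = fs.open(new Path(BUCKET + "/" + key))) {
            // For large HFiles, stream the data instead of loading it all into memory.
            return in.readAllBytes();
        }
    }
}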
Approach 2: Using AWS SDK for S3
1. Move HBase Files to S3: Use AWS DataSync to move your HBase files to S3.
2. AWS SDK for Java: Use the AWS SDK for Java to directly read the data from S3 in your Spring Boot microservices.
Example:
import java.io.InputStream;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.DEFAULT_REGION)
        .build();
S3Object s3object = s3Client.getObject(new GetObjectRequest("your-bucket-name", "path-to-hbase-files"));
InputStream inputStream = s3object.getObjectContent();
// Process the input stream to read the HBase data
3. Spring Boot Microservices: Deploy your Spring Boot microservices in Fargate with the necessary configuration to read data from S3 using the AWS SDK.
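Similarly, for step 3 of this approach, a minimal sketch of exposing the S3 client and read logic as Spring beans is shown below. The bucket, key, and region are placeholders; on Fargate the default credentials provider chain picks up the task role, so no keys are configured here.

// Minimal sketch: region, bucket, key, and class names are placeholders.
import java.io.IOException;
import java.io.InputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
class S3Config {
    @Bean
    public AmazonS3 amazonS3() {
        // Uses the default region/credentials chain (the task role on Fargate).
        return AmazonS3ClientBuilder.standard()
                .withRegion("us-east-1") // placeholder region
                .build();
    }
}

@Service
class HBaseObjectReader {
    private final AmazonS3 s3;

    HBaseObjectReader(AmazonS3 s3) {
        this.s3 = s3;
    }

    public byte[] read(String bucket, String key) throws IOException {
        try (S3Object object = s3.getObject(bucket, key);
             InputStream in = object.getObjectContent()) {
            // For large HFiles, stream the content instead of buffering it all.
            return in.readAllBytes();
        }
    }
}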
Considerations
- Performance: Evaluate the performance of reading large HBase files directly from S3. Depending on your data size and access patterns, Glue and Athena might offer optimizations.
- Security: Ensure that your S3 bucket and data access are properly secured with IAM roles and policies.
- Cost: Compare the cost implications of using Glue and Athena versus direct S3 access.
Hello @Pandurangaswamy, thank you for the suggestion. We would migrate around 150 TB of data from HBase to S3. Based on your suggestions, these are the pros/cons I see. Let me know if you agree.
1. S3 direct access using Hadoop API / AWS SDK
a) Cost would be lower.
b) In case the end user wants a UI to view the data, that is not possible.
c) Performance might be faster as there are no intermediary layers.
2. Athena API over Glue using S3
a) Higher cost due to Athena and Glue.
b) UI available.
c) Performance might be more optimized than direct access.
One approach is to move the HBase files to an S3 bucket, use a Glue crawler to create a Data Catalog database, and then use the Athena API to fetch data from the microservice. But are there alternative approaches?
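For reference, the Glue crawler + Athena route mentioned above could look roughly like the sketch below from a Java microservice. It assumes a Glue crawler has already catalogued the data (the database name "hbase_catalog", table name "hbase_table", and the S3 output location are placeholders).

// Hedged sketch of querying the Glue Data Catalog via the Athena API (AWS SDK for Java v1).
import com.amazonaws.services.athena.AmazonAthena;
import com.amazonaws.services.athena.AmazonAthenaClientBuilder;
import com.amazonaws.services.athena.model.GetQueryExecutionRequest;
import com.amazonaws.services.athena.model.GetQueryResultsRequest;
import com.amazonaws.services.athena.model.GetQueryResultsResult;
import com.amazonaws.services.athena.model.QueryExecutionContext;
import com.amazonaws.services.athena.model.ResultConfiguration;
import com.amazonaws.services.athena.model.StartQueryExecutionRequest;

public class AthenaQueryExample {
    public static void main(String[] args) throws InterruptedException {
        AmazonAthena athena = AmazonAthenaClientBuilder.defaultClient();

        // Start the query against the Glue Data Catalog database (names are placeholders).
        String queryExecutionId = athena.startQueryExecution(new StartQueryExecutionRequest()
                .withQueryString("SELECT * FROM hbase_table LIMIT 10")
                .withQueryExecutionContext(new QueryExecutionContext().withDatabase("hbase_catalog"))
                .withResultConfiguration(new ResultConfiguration()
                        .withOutputLocation("s3://your-bucket-name/athena-results/")))
                .getQueryExecutionId();

        // Poll until the query finishes (simplified; use backoff and timeouts in practice).
        String state;
        do {
            Thread.sleep(1000);
            state = athena.getQueryExecution(new GetQueryExecutionRequest()
                            .withQueryExecutionId(queryExecutionId))
                    .getQueryExecution().getStatus().getState();
        } while (state.equals("QUEUED") || state.equals("RUNNING"));

        // Fetch and print the result rows on success.
        if (state.equals("SUCCEEDED")) {
            GetQueryResultsResult results = athena.getQueryResults(
                    new GetQueryResultsRequest().withQueryExecutionId(queryExecutionId));
            results.getResultSet().getRows()
                    .forEach(row -> System.out.println(row.getData()));
        }
    }
}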