Hi,
First of all, let's assume that your roles/permissions are correct. If so, can you verify that the S3 folder exists? If not, try creating this folder first, before running:
s3://s3.us-west-2.amazonaws.com/a2g-hive-test/tempsensores/data/
If the folder exists, then you will need to carefully review the IAM permissions and make sure that the service roles that allow S3 access are properly passed/assumed, so that the service making the call to S3 has the proper permissions. Can you please include all of the IAM roles/policies and trust relationships to help debug?
-randy
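One way to check whether the "folder" exists is to ask S3 whether any object key starts with that prefix. The following is only a minimal sketch, not something posted in this thread: the helper name prefix_exists is made up for illustration, and it assumes you pass in a boto3 S3 client (or anything with a compatible list_objects_v2 method) whose credentials are allowed to list the bucket.

```python
def prefix_exists(s3_client, bucket, prefix):
    """Return True if at least one object key in `bucket` starts with `prefix`.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    S3 has no real folders, so a "folder" exists only if some key has the prefix.
    """
    # MaxKeys=1 keeps the call cheap: we only need to know whether anything matches.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


# Hypothetical usage (requires boto3 and valid AWS credentials):
#   import boto3
#   s3 = boto3.client("s3", region_name="us-west-2")
#   print(prefix_exists(s3, "a2g-hive-test", "tempsensores/data/"))
```

Note that the bucket name and the key prefix are passed separately here, with no endpoint hostname in either one.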
Hi Randy,
The folder exists; I had already put data in ORC format there before starting the tutorial.
The IAM roles being used are the defaults created by EMR when I created the first cluster: arn:aws:iam::753682516828:role/EMR_EC2_DefaultRole
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
                "cloudwatch:*",
                "dynamodb:*",
                "ec2:Describe*",
                "elasticmapreduce:Describe*",
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ListClusters",
                "elasticmapreduce:ListInstanceGroups",
                "elasticmapreduce:ListInstances",
                "elasticmapreduce:ListSteps",
                "kinesis:CreateStream",
                "kinesis:DeleteStream",
                "kinesis:DescribeStream",
                "kinesis:GetRecords",
                "kinesis:GetShardIterator",
                "kinesis:MergeShards",
                "kinesis:PutRecord",
                "kinesis:SplitShard",
                "rds:Describe*",
                "s3:*",
                "sdb:*",
                "sns:*",
                "sqs:*",
                "glue:CreateDatabase",
                "glue:UpdateDatabase",
                "glue:DeleteDatabase",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetTableVersions",
                "glue:CreatePartition",
                "glue:BatchCreatePartition",
                "glue:UpdatePartition",
                "glue:DeletePartition",
                "glue:BatchDeletePartition",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition",
                "glue:CreateUserDefinedFunction",
                "glue:UpdateUserDefinedFunction",
                "glue:DeleteUserDefinedFunction",
                "glue:GetUserDefinedFunction",
                "glue:GetUserDefinedFunctions"
            ]
        }
    ]
}
The trust relationship is as follows:
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
Is there any S3 permission missing?
Hi,
The service role permissions look fine to me. The only two other possibilities that I can think of at the moment are:
- If the LOCATION value does not exist, S3 will return an 'access denied' error instead of a 'not found' message (for security reasons, so that probing tools can't easily deduce the structure of your bucket):
Try changing:
LOCATION 's3://s3.us-west-2.amazonaws.com/a2g-hive-test/tempsensores/data/';
To just:
LOCATION 's3://a2g-hive-test/tempsensores/data/';
- Double check the bucket policy for a2g-hive-test to see if there is anything that might restrict reading of the data.
-randy
Hi Randy,
That worked like a charm. I changed the LOCATION so that it references just the bucket and path:
CREATE EXTERNAL TABLE sensor
(
    room string,
    energy double,
    temp double,
    occupancy int,
    awhen timestamp
)
PARTITIONED BY (year string, month string, day string)
STORED AS ORC
LOCATION 's3://a2g-hive-test/tempsensores/data/';
With that, Hive was able to create the table without issues.
Thanks for your help,
Best.
Carlos.
All,
Apparently the Hive S3 code does not know how to handle the region endpoint in the first URL path element and instead only expects a globally unique bucket name there.
If you want to be able to specify the region endpoint, it will probably take an EMR (or Hive) feature request to support it.
Regards,
-Kurt
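For anyone generating LOCATION values programmatically, here is a rough sketch of how the mistake in this thread could be caught before the DDL reaches Hive. It is only an illustration under stated assumptions: the function name normalize_s3_location and the hostname pattern are made up for this example, and the pattern is a heuristic that only recognizes hostnames shaped like s3.amazonaws.com or s3.us-west-2.amazonaws.com in the bucket position.

```python
import re
from urllib.parse import urlparse

# Heuristic (an assumption, not an official list): hostnames such as
# s3.amazonaws.com, s3.us-west-2.amazonaws.com or s3-us-west-2.amazonaws.com.
_S3_ENDPOINT_RE = re.compile(r"^s3([.-][a-z0-9-]+)?\.amazonaws\.com$")


def normalize_s3_location(location):
    """Rewrite s3://<endpoint>/<bucket>/<key> to s3://<bucket>/<key>.

    Locations that already start with a bucket name are returned unchanged.
    """
    parsed = urlparse(location)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// location: {location!r}")

    # The first path element after s3:// is what Hive treats as the bucket.
    if not _S3_ENDPOINT_RE.match(parsed.netloc.lower()):
        return location

    # An endpoint sits where the bucket should be, so the real bucket
    # is the first segment of the path and the rest is the key prefix.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError(f"no bucket name found in: {location!r}")
    return f"s3://{bucket}/{key}"


print(normalize_s3_location(
    "s3://s3.us-west-2.amazonaws.com/a2g-hive-test/tempsensores/data/"
))
```

With the LOCATION from the original question, this prints the bucket-first form that worked for Carlos, s3://a2g-hive-test/tempsensores/data/.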