Are you sure these IPs / FQDNs belong to the same EMR cluster and that they have a route to each other?
ip-xxx-xx-xx-xxx.us-west-1.compute.internal/172.31.29.217
ip-yyy-yy-yy-yyy.us-west-1.compute.internal:8020
Since this is happening on a CREATE TABLE statement, I suspect you are creating a Hive table in a Hive DB that points to a different cluster's HDFS. Were you using a common Hive metastore for this cluster and a previously terminated EMR cluster? You can verify the LOCATION of your DB / table by running
DESCRIBE DATABASE EXTENDED your_database_name
or by going to the metastore DB and checking the relevant schema.
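As a rough sketch of that check from the Hive shell, the statements below show both the database-level and the table-level location. The name your_table_name is a placeholder that is not from the original question. If the reported Location is an hdfs:// URI whose host is not a node of the current cluster, that would be consistent with the error above.

```sql
-- Database-level check: the output includes the database's location URI.
DESCRIBE DATABASE EXTENDED your_database_name;

-- Table-level check (your_table_name is a hypothetical placeholder):
-- look at the "Location:" row in the output.
DESCRIBE FORMATTED your_database_name.your_table_name;
```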
You are completely correct. The problem is that we took the default storage location for the AWS Glue catalog, which is in EMR HDFS and disappears when the cluster is terminated. In fact we were using a different cluster, but Glue was trying to reach the place where its detailed catalog data used to be, which no longer existed.
Besides the fact that the IP address wasn't in our current cluster, the other clue was that the error message mentioned the core was trying to find out whether the data was encrypted, which is metadata that only exists in the catalog.
For others who may be new to Glue, the following info may be interesting.
When you create a Hive table without specifying a LOCATION, the table data is stored in the location specified by the hive.metastore.warehouse.dir property. By default, this is a location in HDFS. If another cluster needs to access the table, it fails unless it has adequate permissions to the cluster that created the table. Furthermore, because HDFS storage is transient, if the cluster terminates, the table data is lost, and the table must be recreated.
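One way to avoid depending on the warehouse default is to give the database and table an explicit S3 LOCATION at creation time. The following is only a sketch: example_db, example_events, the columns, and the paths under the bucket are hypothetical names chosen for illustration, and the cluster is assumed to have permission to read and write that bucket.

```sql
-- Hypothetical database whose default location is in S3 rather than cluster HDFS.
CREATE DATABASE IF NOT EXISTS example_db
LOCATION 's3://cr_test_bucket/prefix1/example_db/';

-- Hypothetical external table with an explicit S3 location,
-- so its data outlives any single EMR cluster.
CREATE EXTERNAL TABLE IF NOT EXISTS example_db.example_events (
  event_id   STRING,
  event_time TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3://cr_test_bucket/prefix1/example_db/example_events/';
```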
We can update the property "hive.metastore.warehouse.dir" in any one of the following ways.
- In the Hive shell, with a set command like: set hive.metastore.warehouse.dir=s3://cr_test_bucket/prefix1/
- In the EMR console; please refer to [2] for sample configurations.
- In the /etc/hive/conf/hive-site.xml file on the primary node; we need to restart the Hive services for the update to take effect.

[2] Configure applications - https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
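Whichever way is used, the value the current session actually sees can be checked from the Hive shell. As a small sketch, running set with only the property name prints its current value:

```sql
-- Prints the effective value for this session,
-- in the form hive.metastore.warehouse.dir=<current value>
SET hive.metastore.warehouse.dir;
```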