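[The following question has been translated.] I am using Spark on EMR to read an Aurora Postgres table. A Glue crawler has successfully crawled the Aurora Postgres table, and the corresponding table has been created in the Glue Data Catalog. The EMR cluster is configured to use the Glue Data Catalog as Spark's metastore, with the settings described in the following documentation: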
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html
However, when I run a query against the table in Spark, I get the following error:
scala> spark.sql("SELECT * FROM `aurora-glue`.`glue_public_distributors`")
18/09/11 14:17:40 WARN CredentialsLegacyConfigLocationProvider: Found the legacy config profiles file at [/home/hadoop/.aws/config]. Please move it to the latest default location [~/.aws/credentials].
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table glue_public_distributors. StorageDescriptor#InputFormat cannot be null for table: glue_public_distributors (Service: null; Status Code: 0; Error Code: null; Request ID: null);
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:808)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:385)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.isRunningDirectlyOnFiles(Analyzer.scala:682)
...
Am I doing something wrong here?
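One way to check whether the problem lies in the catalog entry rather than in the cluster setup is to read the same table directly through Spark's built-in JDBC data source, which does not consult the Glue Data Catalog at all. A Glue crawler pointed at a JDBC source records a connection rather than an S3 location, and such catalog tables usually carry no Hive `InputFormat`, which would be consistent with the `StorageDescriptor#InputFormat cannot be null` error above; treat that as an unverified hypothesis. This is a minimal sketch assuming the PostgreSQL JDBC driver is on the classpath (for example via `--jars`). The endpoint, database name, credentials, and the source table `public.distributors` (guessed from the crawler's `glue_public_` prefix) are hypothetical placeholders, as are the helper names `jdbcOptions` and `readTable`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AuroraJdbcRead {
  // Builds the option map for Spark's built-in "jdbc" data source.
  // All connection details passed in are hypothetical placeholders.
  def jdbcOptions(host: String, port: Int, database: String, table: String,
                  user: String, password: String): Map[String, String] =
    Map(
      "url"      -> s"jdbc:postgresql://$host:$port/$database",
      "dbtable"  -> table,                  // schema-qualified source table (assumed name)
      "user"     -> user,
      "password" -> password,
      "driver"   -> "org.postgresql.Driver" // needs the PostgreSQL JDBC jar on the classpath
    )

  // Reads the table over JDBC, bypassing the Glue Data Catalog entry entirely.
  def readTable(spark: SparkSession, options: Map[String, String]): DataFrame =
    spark.read.format("jdbc").options(options).load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("aurora-jdbc-read").getOrCreate()
    val opts = jdbcOptions(
      "my-aurora-endpoint.example.com", 5432, "mydb",
      "public.distributors", "dbuser", sys.env.getOrElse("PGPASSWORD", ""))
    readTable(spark, opts).show(10)
    spark.stop()
  }
}
```

In `spark-shell`, the existing `spark` session can be passed to `readTable` directly. If this read succeeds while `spark.sql` on the catalog table still fails, the cluster's Glue configuration is probably fine and the issue is specific to how the crawled JDBC table is described in the catalog.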