Create table with array of json objects

0

The files in the bucket have json arrays in a single line separated by comma like below: [{"key" : 20}, {"key" : 30}, {"key" : 40}] When i create table with this, only the first record is being read.

AWS
質問済み 3ヶ月前135ビュー
1回答
0

I'm having the same issue. Data is stored in below format in s3 as JSON array with partitions S3 path - s3://fleet-fuelcard-data-import-dev/lambda/fuelsoft-morgan/660306/2024/Apr/03-Apr-2024.json.

{ "_1": [ { "EntityMapping": null, "AccountId": "xx", "CardNumber": "xxxx", "CardType": "xxxxx", "CheckDigit": "5", "CostCentre": "xx Amb Station", "Currency": "EUR", "DriverName": "", "EmployeeId": "", "ExtractedDate": "03-Apr-2024 11:38:28", "Goods": 44.64 }, { "EntityMapping": null, "AccountId": "660306", "CardNumber": "700047660306000053", "CardType": "Morgan", "CheckDigit": "5", "CostCentre": "Thurles Amb Station", "Currency": "EUR", "DriverName": "", "EmployeeId": "", "ExtractedDate": "03-Apr-2024 11:37:52", "Goods": 67.88 } ]

I used glue crawler to create the table in Athena. Below the respective DDL for the same and when I try to access the records using select, only the first record is being selected.

CREATE EXTERNAL TABLE fi_fuelsoft_morgan( array array<structentitymapping:string,accountid:string,cardnumber:string,cardtype:string,checkdigit:string,costcentre:string,currency:string,drivername:string,employeeid:string,extracteddate:string,goods:double,isonumber:string,id:string,invoiceno:int,invoiceddate:string,mileage:int,productdescription:string,producttype:string,receiptnumber:string,registrationnumber:string,shortcardnumber:string,sitedescription:string,sitenumber:string,total:double,transactiondate:string,unitprice:double,vat:double,volume:double,cf_account_name:string> COMMENT 'from deserializer') PARTITIONED BY ( partition_0 string, partition_1 string, partition_2 string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'paths'='array') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://fleet-fuelcard-data-import-dev/lambda/fuelsoft-morgan/' TBLPROPERTIES ( 'CrawlerSchemaDeserializerVersion'='1.0', 'CrawlerSchemaSerializerVersion'='1.0', 'UPDATED_BY_CRAWLER'='fleet-fuelcard-integration', 'averageRecordSize'='22644', 'classification'='json', 'compressionType'='none', 'objectCount'='1393', 'partition_filtering.enabled'='true', 'recordCount'='1393', 'sizeKey'='31560393', 'typeOfData'='file')

I'm stuck with this , Can anyone help on resolving this issue ? on how to get all the records in JSON array? Do I need to change the structure of array ? to have one single record in one line? as per the document https://github.com/rcongiu/Hive-JSON-Serde The following example will not work.

{ "key" : 10 } { "key" : 20 }

回答済み 15日前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ