- 新しい順
- 投票が多い順
- コメントが多い順
I'm having the same issue. Data is stored in below format in s3 as JSON array with partitions S3 path - s3://fleet-fuelcard-data-import-dev/lambda/fuelsoft-morgan/660306/2024/Apr/03-Apr-2024.json.
{ "_1": [ { "EntityMapping": null, "AccountId": "xx", "CardNumber": "xxxx", "CardType": "xxxxx", "CheckDigit": "5", "CostCentre": "xx Amb Station", "Currency": "EUR", "DriverName": "", "EmployeeId": "", "ExtractedDate": "03-Apr-2024 11:38:28", "Goods": 44.64 }, { "EntityMapping": null, "AccountId": "660306", "CardNumber": "700047660306000053", "CardType": "Morgan", "CheckDigit": "5", "CostCentre": "Thurles Amb Station", "Currency": "EUR", "DriverName": "", "EmployeeId": "", "ExtractedDate": "03-Apr-2024 11:37:52", "Goods": 67.88 } ]
I used glue crawler to create the table in Athena. Below the respective DDL for the same and when I try to access the records using select, only the first record is being selected.
CREATE EXTERNAL TABLE fi_fuelsoft_morgan
(
array
array<structentitymapping:string,accountid:string,cardnumber:string,cardtype:string,checkdigit:string,costcentre:string,currency:string,drivername:string,employeeid:string,extracteddate:string,goods:double,isonumber:string,id:string,invoiceno:int,invoiceddate:string,mileage:int,productdescription:string,producttype:string,receiptnumber:string,registrationnumber:string,shortcardnumber:string,sitedescription:string,sitenumber:string,total:double,transactiondate:string,unitprice:double,vat:double,volume:double,cf_account_name:string> COMMENT 'from deserializer')
PARTITIONED BY (
partition_0
string,
partition_1
string,
partition_2
string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'paths'='array')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://fleet-fuelcard-data-import-dev/lambda/fuelsoft-morgan/'
TBLPROPERTIES (
'CrawlerSchemaDeserializerVersion'='1.0',
'CrawlerSchemaSerializerVersion'='1.0',
'UPDATED_BY_CRAWLER'='fleet-fuelcard-integration',
'averageRecordSize'='22644',
'classification'='json',
'compressionType'='none',
'objectCount'='1393',
'partition_filtering.enabled'='true',
'recordCount'='1393',
'sizeKey'='31560393',
'typeOfData'='file')
I'm stuck with this , Can anyone help on resolving this issue ? on how to get all the records in JSON array? Do I need to change the structure of array ? to have one single record in one line? as per the document https://github.com/rcongiu/Hive-JSON-Serde The following example will not work.
{ "key" : 10 } { "key" : 20 }