(resolved) Redshift Spectrum: Not able to query struct data in Parquet file stored in S3


[updated] Resolved with the approach described in my comment on the answer.

Hey there,

I have a struct column called members in a Parquet file stored in S3. The column structure looks like this:

{
  "members": {
    "member0": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    },
    "member1": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    }
  }
}

I tried to unnest it with the following query:

SELECT t.members.member0
FROM my_schema.my_table AS t

but received the following error:

Struct type "t.members.member0" cannot be accessed directly.
Hint: Use dot notation to access specific attributes of the struct.

According to the documentation, this should work with struct data, and I couldn't get any further on my own.

asked 6 months ago, 354 views
1 Answer

Hello there,

The key is to use dot notation to access specific attributes of the struct and retrieve the individual values.

In your case, instead of selecting t.members.member0 as a whole, select the specific attributes inside it. Here's an example of how you can modify your query:

SELECT t.members.member0.timeFrames[1].id AS id,
       t.members.member0.timeFrames[1].startAt AS startAt,
       t.members.member0.timeFrames[1].endAt AS endAt,
       t.members.member0.bufferTime AS bufferTime
FROM my_schema.my_table AS t;

Make sure to replace [1] with the appropriate index for your data.

For further reference, take a look at the Redshift documentation on querying nested data in S3.

Hope this helps!

AWS
answered 6 months ago
  • Hey, thanks a lot for the quick response. Unfortunately, accessing the array with [index] still doesn't work. But your suggestion did inspire me to try out some different approaches. To access an array inside a struct, I need to join on it. The following query works for me to access these values.

    SELECT 
         frame.id AS id,
         frame.startAt AS start_at
    FROM my_schema.my_table AS t
    INNER JOIN t.members.member0.timeFrames AS frame;
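
The same join pattern can probably be extended to return the scalar bufferTime field next to the unnested array elements. The following is a minimal sketch, not a tested query: it assumes the table, schema, and column names from the question, and the output aliases (buffer_time, start_at, end_at) are made up for illustration. It uses LEFT JOIN ... ON true, which the Redshift Spectrum documentation describes for keeping parent rows whose array is empty or null.

```sql
-- Sketch only: names are taken from the question; aliases are illustrative.
-- Scalar struct fields (bufferTime) are read with dot notation.
-- The array (timeFrames) is unnested by joining on it, one output row per element.
SELECT
    t.members.member0.bufferTime AS buffer_time,
    frame.id                     AS id,
    frame.startAt                AS start_at,
    frame.endAt                  AS end_at
FROM my_schema.my_table AS t
LEFT JOIN t.members.member0.timeFrames AS frame ON true;
```

With INNER JOIN, rows whose timeFrames array is empty or null are dropped. With LEFT JOIN ... ON true they should be kept, with NULL in the frame columns.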
    
