(Resolved) Redshift Spectrum: Unable to query struct data in a Parquet file stored in S3


[Updated] Resolved by joining on the nested array, as described in my comment on the answer below.

Hey there,

I have a struct column called members in a Parquet file stored in S3. Its structure looks like this:

{
  "members": {
    "member0": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    },
    "member1": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    }
  }
}
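For reference, a Redshift Spectrum external table with this shape might be declared roughly as below. This is a sketch, not the poster's actual DDL: it assumes my_schema is an external schema, the varchar lengths are arbitrary, and the S3 location is a hypothetical placeholder. The struct and array type syntax follows the form used in the Redshift Spectrum nested-data documentation.

```sql
-- Hypothetical DDL sketch: varchar lengths and the S3 path are placeholders.
CREATE EXTERNAL TABLE my_schema.my_table (
  members struct<
    member0:struct<timeFrames:array<struct<id:varchar(256), startAt:varchar(256), endAt:varchar(256)>>, bufferTime:bigint>,
    member1:struct<timeFrames:array<struct<id:varchar(256), startAt:varchar(256), endAt:varchar(256)>>, bufferTime:bigint>
  >
)
STORED AS PARQUET
LOCATION 's3://your-bucket/your-prefix/';  -- hypothetical location
```

Seeing the types spelled out makes the problem visible: members.member0 is itself a struct, and timeFrames is an array of structs, so neither is a scalar that can be returned directly.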

I tried to unnest it with the following query:

SELECT t.members.member0
FROM my_schema.my_table AS t

but received the following error:

Struct type "t.members.member0" cannot be accessed directly.
Hint: Use dot notation to access specific attributes of the struct.

According to the documentation, this should work with struct data, and I couldn't get any further.

Asked 6 months ago · Viewed 355 times
1 answer

Hello there,

The key is to use dot notation to reach the individual scalar attributes of the struct, rather than selecting the struct itself.

In your case, t.members.member0 is itself a struct, so instead of selecting it directly, select the specific attributes inside it. Here's an example of how you can modify your query:

SELECT t.members.member0.timeFrames[1].id AS id,
       t.members.member0.timeFrames[1].startAt AS startAt,
       t.members.member0.timeFrames[1].endAt AS endAt,
       t.members.member0.bufferTime AS bufferTime
FROM my_schema.my_table AS t;

Make sure to replace [1] with the appropriate index for your data, and keep following the dot path down the nested structure until you reach the values you need.
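Because the error is raised when a struct value itself appears in the select list, a query whose paths end at scalar fields should not trigger it. A minimal sketch against the same table that touches only the non-array bufferTime fields (the column aliases are illustrative):

```sql
-- Each path ends at a scalar (bigint) field rather than at a struct.
SELECT t.members.member0.bufferTime AS member0_buffer_time,
       t.members.member1.bufferTime AS member1_buffer_time
FROM my_schema.my_table AS t;
```

Array fields such as timeFrames are a separate case, since they hold multiple elements per row.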

For further reference, see the Redshift documentation on querying nested data in S3.

Hope this helps!

AWS
Answered 6 months ago
  • Hey, thanks a lot for the quick response. Unfortunately, accessing the array with [index] still doesn't work. But your suggestion did inspire me to try other approaches: to access an array inside a struct, I need to join on it. The following query works for me to access these values:

    SELECT 
         frame.id AS id,
         frame.startAt AS start_at
    FROM my_schema.my_table AS t
    INNER JOIN t.members.member0.timeFrames AS frame;
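
The same join pattern can be extended to unnest both members in one result set. A sketch, assuming the table shape shown in the question: it uses the comma-join form from the Redshift Spectrum nested-data documentation, which for this purpose is equivalent to the INNER JOIN above, and the member label column is purely illustrative.

```sql
-- Unnest timeFrames for each member and stack the results.
SELECT 'member0' AS member,
       t.members.member0.bufferTime AS buffer_time,
       frame.id AS id,
       frame.startAt AS start_at,
       frame.endAt AS end_at
FROM my_schema.my_table AS t, t.members.member0.timeFrames AS frame
UNION ALL
SELECT 'member1' AS member,
       t.members.member1.bufferTime AS buffer_time,
       frame.id AS id,
       frame.startAt AS start_at,
       frame.endAt AS end_at
FROM my_schema.my_table AS t, t.members.member1.timeFrames AS frame;
```

Like any inner-join style unnesting, this produces no output rows for a member whose timeFrames array is empty.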
    
