(resolved) Redshift Spectrum: Not able to query struct data in parquet file stored in s3

0

[updated] Resolved with this approach.

Hey there,

I have a struct column called members in parquet stored in S3, the column structure is like this:

{
  "members": {
    "member0": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    },
    "member1": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    }
  }
}

And I tried to untested it with following query:

SELECT t.members.member0
FROM my_schema.my_table AS t

but received the following error

Struct type "t.members.member0" cannot be accessed directly.
Hint: Use dot notation to access specific attributes of the struct.

According to documentation, it should work with struct data and I couldn't resolve it further.

已提问 6 个月前355 查看次数
1 回答
0

Hello there,

The key is to use dot notation to access specific attributes of the struct in order to retrieve your individual values.

In your case, instead of using the syntax t.members.member0, you should use dot notation to access the specific attributes within the struct. Here's an example of how you can modify your query:

SELECT t.members.member0.timeFrames[1].id AS id,
       t.members.member0.timeFrames[1].startAt AS startAt,
       t.members.member0.timeFrames[1].endAt AS endAt,
       t.members.member0.bufferTime AS bufferTime
FROM my_schema.my_table AS t;

Make sure to replace [1] with the appropriate index based on your data structure. The key is to use dot notation to navigate through the nested structure and access the desired values.

For further reference, you may want to take a look at this Redshift doc to Query your nested data in S3.

Hope this helps!

AWS
已回答 6 个月前
  • Hey, thanks a lot for quick response. Unfortunately, it's still not working with [index] to access array . But I do inspired by your suggestion. I further tried out different approaches. To access array inside a struct, I need to join on it. The following query works for me to access this kind of values.

    SELECT 
         frame.id AS id,
         frame.startAt AS start_at
    FROM my_schema.my_table AS t
    INNER JOIN t.members.member0.timeFrames AS frame;
    

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则