(resolved) Redshift Spectrum: Not able to query struct data in parquet file stored in s3

0

[updated] Resolved with this approach.

Hey there,

I have a struct column called members in parquet stored in S3, the column structure is like this:

{
  "members": {
    "member0": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    },
    "member1": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    }
  }
}

And I tried to untested it with following query:

SELECT t.members.member0
FROM my_schema.my_table AS t

but received the following error

Struct type "t.members.member0" cannot be accessed directly.
Hint: Use dot notation to access specific attributes of the struct.

According to documentation, it should work with struct data and I couldn't resolve it further.

질문됨 6달 전354회 조회
1개 답변
0

Hello there,

The key is to use dot notation to access specific attributes of the struct in order to retrieve your individual values.

In your case, instead of using the syntax t.members.member0, you should use dot notation to access the specific attributes within the struct. Here's an example of how you can modify your query:

SELECT t.members.member0.timeFrames[1].id AS id,
       t.members.member0.timeFrames[1].startAt AS startAt,
       t.members.member0.timeFrames[1].endAt AS endAt,
       t.members.member0.bufferTime AS bufferTime
FROM my_schema.my_table AS t;

Make sure to replace [1] with the appropriate index based on your data structure. The key is to use dot notation to navigate through the nested structure and access the desired values.

For further reference, you may want to take a look at this Redshift doc to Query your nested data in S3.

Hope this helps!

AWS
답변함 6달 전
  • Hey, thanks a lot for quick response. Unfortunately, it's still not working with [index] to access array . But I do inspired by your suggestion. I further tried out different approaches. To access array inside a struct, I need to join on it. The following query works for me to access this kind of values.

    SELECT 
         frame.id AS id,
         frame.startAt AS start_at
    FROM my_schema.my_table AS t
    INNER JOIN t.members.member0.timeFrames AS frame;
    

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인