(resolved) Redshift Spectrum: Not able to query struct data in parquet file stored in s3

0

[updated] Resolved with this approach.

Hey there,

I have a struct column called members in parquet stored in S3, the column structure is like this:

{
  "members": {
    "member0": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    },
    "member1": {
      "timeFrames": [
        {
          "id": "string",
          "startAt": "string",
          "endAt": "string"
        }
      ],
      "bufferTime": "bigint"
    }
  }
}

And I tried to untested it with following query:

SELECT t.members.member0
FROM my_schema.my_table AS t

but received the following error

Struct type "t.members.member0" cannot be accessed directly.
Hint: Use dot notation to access specific attributes of the struct.

According to documentation, it should work with struct data and I couldn't resolve it further.

preguntada hace 6 meses357 visualizaciones
1 Respuesta
0

Hello there,

The key is to use dot notation to access specific attributes of the struct in order to retrieve your individual values.

In your case, instead of using the syntax t.members.member0, you should use dot notation to access the specific attributes within the struct. Here's an example of how you can modify your query:

SELECT t.members.member0.timeFrames[1].id AS id,
       t.members.member0.timeFrames[1].startAt AS startAt,
       t.members.member0.timeFrames[1].endAt AS endAt,
       t.members.member0.bufferTime AS bufferTime
FROM my_schema.my_table AS t;

Make sure to replace [1] with the appropriate index based on your data structure. The key is to use dot notation to navigate through the nested structure and access the desired values.

For further reference, you may want to take a look at this Redshift doc to Query your nested data in S3.

Hope this helps!

AWS
respondido hace 6 meses
  • Hey, thanks a lot for quick response. Unfortunately, it's still not working with [index] to access array . But I do inspired by your suggestion. I further tried out different approaches. To access array inside a struct, I need to join on it. The following query works for me to access this kind of values.

    SELECT 
         frame.id AS id,
         frame.startAt AS start_at
    FROM my_schema.my_table AS t
    INNER JOIN t.members.member0.timeFrames AS frame;
    

No has iniciado sesión. Iniciar sesión para publicar una respuesta.

Una buena respuesta responde claramente a la pregunta, proporciona comentarios constructivos y fomenta el crecimiento profesional en la persona que hace la pregunta.

Pautas para responder preguntas