- 最新
- 最多得票
- 最多評論
@ramiro Have you managed to find a workaround? I'm also facing the same thing!
Hello Bisha, unfortunately no. What we end up doing was to store the variable JSON as a string, that way we don't have to validate against any schema, as the struc forced us to do.
The catch is that you will have to deserialize the json strings at query time, but so far we found there is not a big penalty on the performance of the queries to have that extra convertion layer. That may depend on your data size though.
@ramiro Cool! I've also done the same thing, but it is not intuitive to have to remember in your SQL queries when to use json_extract_scalar or dot notation queries
I agree, if there was the equivalent of redshift's SUPER as a column type in athena, with all the flexibility it brings, that would be great
Hi,
Glue supports JSON natively: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-json-home.html
The native JSON support offers several interesting options: withSchema
and jsonPath
are very useful for example to save you lots of code to validate and retrieve your data.
Best,
Duduer
Thanks for the comment Duduer
Indeed Glue does, but what is the data type to pick for those kind of columns, where I am to store it in the Glue Data Catalog table?
Check the attached file for the options here (when defining Glue Data Catalog column type), there is no "json" option:
The only one that seem to fit storing JSON is struct<your_json_obj_properties_here>
My problem is that struct force you to strictly define the "your_json_obj_properties_here" part, and for our data, that could be different across the different records we are planning to store.
For example: sometimes that column value could look like this:
{ prop1: "value1" }
Sometimes it may have more than one property:
{ prop1: "value1", moreComplex: { dataStructures: "here" } }
Etc. etc...
In other words I wish there was a way to just say: "this column will be for ANY JSON, don't ask me to define each and every of the properties when I create the column type, because I don't know them beforehand, and they may be different on each row! Just take any valid JSON structure I send instead."
In redshift, that is called a SUPER column type:
https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html
Is there an equivalent in the Glue catalog tables, for column types?
相關內容
- AWS 官方已更新 1 年前
- AWS 官方已更新 3 年前
I don't think so. It requires that you define the inner schema of a struct or array.