[Pandas] [Glue] ResolveChoice cannot cast to JSON

Hi, I'm trying to cast a DataFrame column that contains JSON strings to a JSON type so that I can write it to a database. I'm following the example here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-ResolveChoice.html

But the following cast to JSON is not working

df_events.resolveChoice(specs = [('detail','cast:json')])

and I get the following error

IllegalArgumentException: 'Invalid type name json'

Any suggestions on how to cast to JSON? This is required to write the DataFrame to PostgreSQL.

Michael
Asked 2 years ago, viewed 579 times
1 Answer

JSON is not a data type supported by Glue. You have to choose one of these: BOOLEAN | DATE | DECIMAL | DOUBLE | LONG | STRING | BINARY. In your case, since the column holds JSON, define it as a String column and cast it to String.
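A minimal sketch of the "keep it as a string" approach, assuming the `detail` values may arrive either as Python dicts or as already-serialized strings. The helper name `to_json_string` and the sample records are hypothetical and only illustrate the idea. In a Glue job the corresponding cast would presumably be `resolveChoice(specs = [('detail','cast:string')])`, following the pattern from the question.

```python
import json


def to_json_string(value):
    """Return `value` as a JSON-encoded string.

    Strings are checked to be valid JSON and returned unchanged;
    any other value is serialized with json.dumps.
    """
    if isinstance(value, str):
        # Raises json.JSONDecodeError (a ValueError) if the text is not valid JSON.
        json.loads(value)
        return value
    return json.dumps(value)


# Hypothetical sample rows: `detail` appears as a dict and as a JSON string.
records = [
    {"id": 1, "detail": {"tag1": "a", "count": 2}},
    {"id": 2, "detail": '{"tag1": "b"}'},
]

# Normalize every row so that `detail` is always a plain string column.
rows = [{**r, "detail": to_json_string(r["detail"])} for r in records]
```

After this step every `detail` value is a plain string, which fits the STRING type listed above. Whether the target database accepts that string for a JSON column depends on the database and driver, which this sketch does not cover.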

You should be able to query this data with Athena. When you query the table, you can use the JSON functions on the JSON columns, for example:

SELECT json_extract_scalar(attributes, '$.tag1') ...

Glue: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-common.html

Athena: https://docs.aws.amazon.com/athena/latest/ug/data-types.html

AWS
Answered 2 years ago
  • What should you do when you cannot change a DB schema that has a JSON column? None of the suggested types would work, as the write action would throw an exception. This seems like a massive oversight for Glue.
