[Pandas] [Glue] ResolveChoice cannot cast to JSON


Hi, I'm trying to cast a DataFrame column that contains JSON strings to a JSON type so that I can write it to a database. I'm using an example from here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-ResolveChoice.html

But the following cast to JSON is not working:

df_events.resolveChoice(specs = [('detail','cast:json')])

and I get the following error

IllegalArgumentException: 'Invalid type name json'

Any suggestions on how to cast to JSON? It's required in order to write a DataFrame to PostgreSQL.

Michael
asked 2 years ago, 582 views
1 Answer

json is not a data type supported by Glue. You have to choose one of these: BOOLEAN | DATE | DECIMAL | DOUBLE | LONG | STRING | BINARY. Since your column holds JSON, define it as a String column and cast it to String.
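
As a rough illustration of "keep the JSON as a string", here is a minimal plain-Python sketch. It does not use the Glue API; the helper name `detail_to_json_string` and the sample records are hypothetical. In a Glue job the analogous step would presumably be a `cast:string` action in the `resolveChoice` specs, which I have not verified against this dataset.

```python
import json


def detail_to_json_string(record, field="detail"):
    """Return a copy of `record` with `field` serialized as a JSON string.

    Hypothetical helper for illustration only; not part of Glue or pandas.
    Values that are already strings (or None) are left unchanged.
    """
    out = dict(record)
    value = out.get(field)
    if value is not None and not isinstance(value, str):
        # sort_keys gives a stable, reproducible serialization
        out[field] = json.dumps(value, sort_keys=True)
    return out


# Made-up sample events: nested dict, pre-serialized string, and a null.
events = [
    {"id": 1, "detail": {"tag1": "a", "count": 2}},
    {"id": 2, "detail": '{"tag1": "b"}'},
    {"id": 3, "detail": None},
]

rows = [detail_to_json_string(e) for e in events]

# Assumed Glue equivalent (not run here):
#   df_events.resolveChoice(specs=[('detail', 'cast:string')])
```

After this step every non-null `detail` value is a plain string, which matches the STRING type from the list above.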

You should be able to query this data with Athena. When you query this table you can use the JSON functions to query the JSON columns, for example:

SELECT json_extract_scalar(attributes, '$.tag1') ...

glue : https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-common.html

athena : https://docs.aws.amazon.com/athena/latest/ug/data-types.html

AWS
answered 2 years ago
  • What can you do when you cannot change a DB schema that has a JSON column? None of the suggested types would work, as the write action would throw an exception. This seems like a massive oversight for Glue.
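
One workaround that is sometimes used for this situation, and that nobody in this thread has confirmed for Glue, is the PostgreSQL JDBC driver's `stringtype=unspecified` connection parameter. It makes the driver send string parameters untyped so that the server infers the target type (such as `json`) from the column. The sketch below only builds such a URL; the helper name, host, and database name are hypothetical, and whether your Glue connection passes the parameter through is an assumption you would need to check.

```python
def with_stringtype_unspecified(jdbc_url: str) -> str:
    """Append stringtype=unspecified to a PostgreSQL JDBC URL.

    Hypothetical helper for illustration. If the URL already sets
    `stringtype`, it is returned unchanged.
    """
    if "stringtype=" in jdbc_url:
        return jdbc_url
    # Use '&' when the URL already has a query string, '?' otherwise.
    separator = "&" if "?" in jdbc_url else "?"
    return f"{jdbc_url}{separator}stringtype=unspecified"


# Placeholder host and database; replace with your own connection details.
url = with_stringtype_unspecified("jdbc:postgresql://example-host:5432/exampledb")
print(url)
```

With a URL like this, the `detail` column would stay a String on the Glue side and the cast to `json` would happen in PostgreSQL at insert time.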
