[Pandas] [Glue] ResolveChoice cannot cast to JSON


Hi, I'm trying to cast a DataFrame column that contains JSON strings to a JSON type so that I can write it to a database. I'm using an example from here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-ResolveChoice.html

However, the following cast to JSON does not work:

df_events.resolveChoice(specs = [('detail','cast:json')])

and I get the following error

IllegalArgumentException: 'Invalid type name json'

Any suggestions on how to cast to JSON? It is required in order to write the DataFrame to PostgreSQL.

Michael
Asked 2 years ago, 583 views
1 Answer

JSON is not a data type supported by Glue. You have to choose one of these: BOOLEAN | DATE | DECIMAL | DOUBLE | LONG | STRING | BINARY. In your case, where the column holds JSON, define it as a STRING column and cast it to a string.
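As a rough sketch of what "keep the JSON as a string" can look like, the snippet below uses only the Python standard library. `STRING_SPECS` mirrors the shape of the spec in the question with `cast:string` in place of `cast:json`; passing it to `df_events.resolveChoice(specs=STRING_SPECS)` is an assumption based on the suggestion above, not something tested here. The helper `detail_as_json_text` and the sample row are hypothetical and only illustrate serializing a nested value to JSON text.

```python
import json

# Same shape as the spec in the question, but with a type name Glue should accept.
# Assumed usage (not run here): df_events.resolveChoice(specs=STRING_SPECS)
STRING_SPECS = [("detail", "cast:string")]


def detail_as_json_text(record, field="detail"):
    """Return a copy of `record` whose `field` holds JSON text (a str)."""
    out = dict(record)
    value = out.get(field)
    if value is not None and not isinstance(value, str):
        # Nested dicts and lists become compact JSON text with stable key order.
        out[field] = json.dumps(value, separators=(",", ":"), sort_keys=True)
    return out


# Hypothetical sample row, for illustration only.
row = {"id": 1, "detail": {"tag1": "a", "count": 2}}
converted = detail_as_json_text(row)
print(converted["detail"])
```

Values that are already strings are left untouched, so the helper can be applied to mixed input without double-encoding.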

You should be able to query this data with Athena. When you query this table you can use the JSON functions to query the JSON columns, for example:

SELECT json_extract_scalar(attributes, '$.tag1') ...

Glue: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-common.html

Athena: https://docs.aws.amazon.com/athena/latest/ug/data-types.html

AWS
Answered 2 years ago
  • What can be done when the DB schema cannot be changed and it has a JSON column? None of the suggested types would work, because the write action would throw an exception. This seems like a massive oversight for Glue.
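One workaround that may be worth trying when the target schema is fixed: keep the column as a string on the Glue side and add the PostgreSQL JDBC driver's `stringtype=unspecified` connection parameter, which sends string parameters untyped so that the server can infer the column type (for example `json`). Whether a given Glue connection or writer passes this parameter through is an assumption that has not been verified here. The sketch below only builds such a URL; the helper `postgres_jdbc_url`, the host, and the database name are hypothetical.

```python
from urllib.parse import urlencode


def postgres_jdbc_url(host, port, database, **params):
    """Build a PostgreSQL JDBC URL, appending any driver parameters as a query string."""
    base = f"jdbc:postgresql://{host}:{port}/{database}"
    return f"{base}?{urlencode(params)}" if params else base


# Hypothetical host and database, for illustration only.
url = postgres_jdbc_url("db.example.internal", 5432, "events", stringtype="unspecified")
print(url)
```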
