AWS re:Post을(를) 사용하면 다음에 동의하게 됩니다. AWS re:Post 이용 약관

Athena Table timestamp column with null values

0

I have some data in a csv in s3 in the format:

idconfirmed_atcancelled_atsuspended_at
a16784147162691678414716269
b1678316522493
c167823191578716782318982201678231915787

I've defined my table with the structure

column namedata type
idstring
confirmed_attimestamp
cancelled_attimestamp
suspended_attimestamp

but when I am using Athena to query my data, it fails on rows where cancelled_at or suspended_at are blank with the error Error parsing field value '' for field 1: For input string: ""

i am using OpenCsvSerde but have tried LazySimpleSerDe with 'serialization.null.format'='' but get the same error

is it possible to have athena support having a timestamp column that could be null / blank?

질문됨 2년 전3.5천회 조회
1개 답변
1

Hello,

Please note that errors that specify a null or empty input string ("For input string: "") happen when both of the following are true:

  • You're using Athena with OpenCSVSerDe, which means that your source data uses double quotes (") as the default quote character.
  • The source data contains null values ("") or empty cells.

You can refer the same in the below document:

 https://aws.amazon.com/premiumsupport/knowledge-center/athena-hive-bad-data-error-csv/

It seems that this error is occurring because there are empty cells for some of columns

As mentioned in the above document in order to mitigate the issue “Define each column as STRING. The parser in Athena parses the values from STRING into actual types based on what it finds. This prevents Athena from throwing an error when it finds null values (empty strings with double quotes and no spaces) or empty cells (no values or double quotes).”

AWS
답변함 2년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠