There are non-valid characters in my Amazon Redshift data. How do I remove them?
Short description
If your data contains non-printable ASCII characters, such as null, bell, or escape characters, you might have trouble retrieving the data or unloading the data to Amazon Simple Storage Service (Amazon S3). For example, a string that contains a null terminator, such as "abc\0def," is truncated at the null terminator, resulting in incomplete data.
Resolution
Use the TRANSLATE function to remove the non-valid character. In the following example, the data contains "abc\u0000def". The TRANSLATE function removes the null character "\u0000" and replaces it with an empty value, which removes it from the string:
admin@testrs=# select a,translate(a,chr(0),'') from invalidstring;
a | translate
--------+-----------
abc | abcdef
abcdef | abcdef
(2 rows)
To remove specified non-valid characters from all rows in a table, run the UPDATE command with the TRANSLATE function, as shown in this example:
admin@testrs=# select * from invalidstring;
a
--------
abc
abcdef
(2 rows)
admin@testrs=# update invalidstring set a=translate(a,chr(0),'') where a ilike '%'||chr(0)||'%';
UPDATE 1
admin@testrs=# select * from invalidstring;
a
--------
abcdef
abcdef
(2 rows)
Related information
CHR Function