I'm following this tutorial to perform text analytics using UDFs.
https://aws.amazon.com/blogs/machine-learning/translate-and-analyze-text-using-sql-functions-with-amazon-athena-amazon-translate-and-amazon-comprehend/
The example where data are not saved works fine work fine in Athena v.3
USING
EXTERNAL FUNCTION detect_sentiment_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf',
EXTERNAL FUNCTION detect_entities_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf'
SELECT *,
detect_sentiment_all(review_body, language) AS sentiment,
detect_entities_all(review_body, language) AS entities
FROM amazon_reviews_with_language
WHERE language IN ('ar', 'hi', 'ko', 'zh-TW', 'ja', 'zh', 'de', 'pt', 'en', 'it', 'fr', 'es')
However, the example where a table is created to save the results does not work in Athena v.3, but it works in Athena v.2.
CREATE TABLE amazon_reviews_with_text_analysis WITH (format='parquet') AS
USING
EXTERNAL FUNCTION detect_sentiment_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf',
EXTERNAL FUNCTION detect_entities_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf'
SELECT *,
detect_sentiment_all(review_body, language) AS sentiment,
detect_entities_all(review_body, language) AS entities
FROM amazon_reviews_with_language
WHERE language IN ('ar', 'hi', 'ko', 'zh-TW', 'ja', 'zh', 'de', 'pt', 'en', 'it', 'fr', 'es')
It fails with an error:
line 2:1: mismatched input 'USING'. Expecting: '(', <query>
The question is how can I save data into a table when using external function in Athena v.3?
Thanks, mate. Appreciate the suggestion. I'm using the database that was created previously. The location was not set in the Glue Catalog, so I set it. Then I moved 'USING' clause before 'CREATE TABLE' and now I get the following error: GENERIC_INTERNAL_ERROR: Can not create a Path from an empty string
I also tried adding a location in the 'CREATE TABLE' statement like this:
USING EXTERNAL FUNCTION detect_dominant_language(col1 VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf' CREATE TABLE amazon_reviews_with_language WITH ( format='parquet' ) LOCATION 's3://my-bucket/db_path/amazon_reviews_with_language' AS SELECT *, detect_dominant_language(review_body) AS language FROM amazon_reviews_parquet LIMIT 5000
But it is not working in Athena v.3: Error: mismatched input 'LOCATION'. Expecting: '%', ')', '*', '+', ',', '-', '.', '/', 'AND', 'AS', 'AT', 'OR', '[', '||', <EOF>, <predicate>
I'm also getting errors in Engine V3 with a Lambda UDF + CTAS statement. My current workaround is to revert to Engine V2 and place USING EXTERNAL FUNCTION after CREATE TABLE AS.