Questions tagged with Analytics
Hi all, I ran a simple query on Athena using Athena engine version 3:

```
create table if not exists "test"."total_customer" as
select * from "A_customer"
union all
select * from "B_customer"
```

And I got this error: `HIVE_PATH_ALREADY_EXISTS: Target directory for table 'test.total_customer' already exists: s3://xxxx/total_customer/2022/11/27/tables/ef02d72d-75e2-4b14-b53f-95d591094cfa. You may need to manually clean the data at location 's3://xxxx/total_customer/2022/11/27/tables/ef02d72d-75e2-4b14-b53f-95d591094cfa' before retrying. Athena will not delete data in your account.` The weird thing is that it works perfectly when I use UNION instead of UNION ALL, so I'm not sure whether this is an Athena engine version 3 error or a query error. Thank you for taking a look at my issue.
Athena time travel queries with Apache Hudi tables
Hi, is there a way to perform time travel with Athena queries on Apache Hudi tables, similar to the approach described in [Implement a cdc based upsert in a data lake using Apache Iceberg and AWS Glue](https://aws.amazon.com/blogs/big-data/implement-a-cdc-based-upsert-in-a-data-lake-using-apache-iceberg-and-aws-glue/)? Or is the *spark.sql* library the only way at the moment? Thanks
How to download encrypted Athena query results in readable format
I want to download Athena query results as CSV in a readable format. The problem is that the results are encrypted. What is the best practice here? Or is this the wrong approach? I just need to share a few tables with colleagues who don't use Athena.
DynamoDB pagination last page
Hi Team, I'm implementing pagination using AWS DynamoDB. My UI has 4 buttons: first, next, previous, and last. I fetch the data in segments of 25 rows to avoid a full table scan. When the user clicks the "last" button to go directly to the last page, do I need to do a full table scan? Or is there a way to get the last 25 rows without scanning the whole table?
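One possible direction, sketched under assumptions the post does not confirm: if the rows being paged share a partition key and are ordered by a sort key, a `Query` with `ScanIndexForward` set to false and `Limit` set to 25 reads from the end of that partition instead of scanning. The helper name `last_page_query_params`, the table name, and the key names below are hypothetical, and the partition key is assumed to be a string.

```python
from typing import Any, Dict

PAGE_SIZE = 25  # page size taken from the post


def last_page_query_params(
    table_name: str,
    pk_name: str,
    pk_value: str,
    page_size: int = PAGE_SIZE,
) -> Dict[str, Any]:
    """Build keyword arguments for a low-level DynamoDB Query call
    that reads one partition starting from its last item."""
    return {
        "TableName": table_name,
        # Placeholders avoid clashes with DynamoDB reserved words.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": pk_name},
        # Assumption: the partition key attribute is of type string ("S").
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        # False returns items in descending sort-key order.
        "ScanIndexForward": False,
        "Limit": page_size,
    }


if __name__ == "__main__":
    # Hypothetical table and key names, for illustration only.
    print(last_page_query_params("orders", "customer_id", "c-1"))
```

The resulting dict could be passed as `boto3.client("dynamodb").query(**params)`, with the returned items reversed for display. Note that this returns the final 25 items, which matches the "last page" of forward paging only when the item count is a multiple of 25. It does not help if the UI pages over a plain `Scan` of the whole table.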
HIVE_UNSUPPORTED_FORMAT: Output format org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat with SerDe org.openx.data.jsonserde.JsonSerDe is not supported. If a data manifest file was generated
HIVE_UNSUPPORTED_FORMAT: Output format org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat with SerDe org.openx.data.jsonserde.JsonSerDe is not supported. If a data manifest file was generated at 's3://athena-one-output-bucket/Unsaved/2022/11/24/0a5467bf-8b9a-4119-bc89-c891d1e26744-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account. This query ran against the "covid_dataset" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 0a5467bf-8b9a-4119-bc89-c891d1e26744
In need of exporting dynamodb table into csv
Hi, I want to automate the CSV export so my team can access the CSV file, which I would store in an S3 bucket. I find many solutions for importing CSV files into DynamoDB, but none of them answer how to export to CSV. I know there is an "export to csv" button in DynamoDB, and that's exactly what I want, but automated. I've been trying Lambda, Glue, Exports, and Streams, but none of them seem to work, or I get stuck because I can't find any good documentation. Can somebody please help me out? Thanks
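As a rough sketch of the conversion step only: assuming the items have already been read from the table (for example by a scheduled Lambda) and deserialised into plain Python dicts, the standard `csv` module can produce the file body. The function name `items_to_csv` and the sample items are hypothetical; reading from DynamoDB and uploading to S3 are left out.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def items_to_csv(items: Iterable[Dict[str, Any]]) -> str:
    """Render a list of item dicts as CSV text.

    DynamoDB items need not share the same attributes, so the header
    is the sorted union of all keys and missing values are left empty.
    """
    rows: List[Dict[str, Any]] = list(items)
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical items, for illustration only.
    print(items_to_csv([{"id": "1", "name": "a"}, {"id": "2"}]))
```

The returned string could then be written to the bucket with an S3 `put_object` call from the same job.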
Extract only specific data from invoice using amazon textract
I want to extract only specific data from an invoice using Amazon Textract. Currently all the data, including form fields and table fields, is extracted, and it contains data that the user does not need. The user only cares about specific fields such as invoice date, invoice number, etc. So I was wondering if there is a way to resolve this issue.
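One way to narrow the output, sketched under the assumption that the invoice is processed with Textract's `AnalyzeExpense` operation (the post does not say which API is in use): filter the response's summary fields by their normalised type after the call. The helper name `pick_summary_fields` and the sample response are hypothetical.

```python
from typing import Any, Dict, Iterable

# Assumed field types for invoice number and invoice date.
WANTED_TYPES = ("INVOICE_RECEIPT_ID", "INVOICE_RECEIPT_DATE")


def pick_summary_fields(
    response: Dict[str, Any],
    wanted: Iterable[str] = WANTED_TYPES,
) -> Dict[str, str]:
    """Keep only the wanted summary fields from an AnalyzeExpense-style
    response, using the first value found for each type."""
    wanted_set = set(wanted)
    picked: Dict[str, str] = {}
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            value = field.get("ValueDetection", {}).get("Text", "")
            if field_type in wanted_set and field_type not in picked:
                picked[field_type] = value
    return picked


if __name__ == "__main__":
    # Hand-written sample shaped like a response, not real output.
    sample = {
        "ExpenseDocuments": [
            {
                "SummaryFields": [
                    {"Type": {"Text": "VENDOR_NAME"},
                     "ValueDetection": {"Text": "Example Corp"}},
                    {"Type": {"Text": "INVOICE_RECEIPT_ID"},
                     "ValueDetection": {"Text": "INV-001"}},
                ]
            }
        ]
    }
    print(pick_summary_fields(sample))
```

Textract still analyses the whole document, so the filtering only controls what is passed on to the user.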
Quicksight email frequency >1day or location of Athena queries / Quicksight dataset?
Hello, We have Amazon Connect CTRs streaming into Athena, and in Quicksight we refresh this Athena SQL view/SPICE every 30 minutes. We have a client request to email the report every 30 minutes (48 times per day); however, Quicksight only allows the report to be emailed daily. In addition, we would like the ability to FTP the data out of Athena/Quicksight every 30 minutes, similar to what we can do with Filezilla and S3. I don't see where Athena query results or Quicksight Analysis/Dashboards are stored for access. How can we get this data emailed/FTPed every 30 minutes? Is there a way to copy an Athena query or Quicksight Analysis/Dashboards to S3 or similar for email/FTP access? Thank you. John
HIVE_FILESYSTEM_ERROR: Input path does not exist: when querying a table created by glue crawler using Athena. Possibly due to missing slash in s3 path.
Hello. I am running into the following error when querying a delta lake table that was built using a glue crawler. The s3 path has been modified to hide sensitive info. Notice how there is a slash missing between the name of the glue catalog table and the partition. I believe this is the cause of the error, but I do not think this is coming from our end or something we can fix on our end: "HIVE_FILESYSTEM_ERROR: Input path does not exist: s3://bucket_name_here/glue_catalog_db_name_here/**table_namepartition**/additional_partition/part_name.snappy.parquet This query ran against the "db_name" database, unless qualified by the query."