Questions tagged with Amazon Athena
Hi, could anyone help with this?
We are facing the error below after migrating to the Glue catalog: Athena cannot handle the timestamp partition value, which makes the table unqueryable.
HIVE_INVALID_PARTITION_VALUE: Invalid partition value ‘2022-08-09 23%3A59%3A59’ for TIMESTAMP partition key: xxx_timestamp=2022-08-09 23%253A59%253A59
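One workaround sketch, in case it is useful: since partition values are stored as strings in the catalog anyway, declaring the partition key as string (and parsing it at query time) avoids the TIMESTAMP parsing that fails here. The table name, file format, and location below are placeholders, not taken from the post.
```
-- Hypothetical DDL: keep the partition key as string so the URL-encoded value
-- no longer has to parse as a TIMESTAMP.
CREATE EXTERNAL TABLE my_table (
  id string
)
PARTITIONED BY (xxx_timestamp string)
STORED AS PARQUET
LOCATION 's3://my-bucket/path/';

-- Decode and cast at query time instead (url_decode and CAST are built-in
-- Athena/Trino functions):
SELECT *
FROM my_table
WHERE cast(url_decode(xxx_timestamp) AS timestamp) >= timestamp '2022-08-09 00:00:00';
```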
I am receiving this error while querying in Athena: GENERIC_INTERNAL_ERROR: java.lang.IllegalArgumentException: Error: type expected at the position 0 of 'set<string>' but 'set' is found.
What does that mean and how can I debug or resolve this?
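For context on the error text: Athena's engine has no `set` complex type, so a Glue column typed `set<string>` (which some crawlers and connectors emit) cannot be parsed; a common fix is to change that column to `array<string>` in the Glue table definition. A hypothetical sketch of the equivalent DDL, with all names and the storage format made up for illustration:
```
-- Hypothetical DDL: the multi-valued column (set<string> in the Glue catalog)
-- is declared as array<string>, which Athena understands.
CREATE EXTERNAL TABLE my_table (
  id string,
  tags array<string>
)
STORED AS PARQUET
LOCATION 's3://my-bucket/path/';
```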
Driver version: 2.0.35. I downloaded it from the Amazon download links (the build bundled with the AWS SDK).
Error:Exception in thread "main" java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:awsathena://AwsRegion=us-east-1;
at org.example.AthenaQueryRun.init(AthenaQueryRun.java:21)
at org.example.AthenaQueryRun.main(AthenaQueryRun.java:11)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:awsathena://AwsRegion=us-east-1;
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
at org.example.AthenaConnection.init(AthenaConnection.java:35)
at org.example.AthenaQueryRun.init(AthenaQueryRun.java:17)
... 1 more
My environment: JDK Corretto 11.
Any help here would be appreciated.
I converted a CSV (from S3) to Parquet (to S3) using AWS Glue, and the resulting Parquet file was named randomly. How do I choose the name of the Parquet file that the CSV is converted to?

When I add data.parquet at the end of the S3 path (in the target) without a '/', AWS Glue creates a subfolder in the bucket named data.parquet instead of using it as the file name, while the new Parquet file inside it is still named something like "run-1678983665978-part-block-0-r-00000-snappy.parquet".
Where should I specify a name for the Parquet file?
Hi,
I have followed the [documentation](https://wellarchitectedlabs.com/cost/200_labs/200_cloud_intelligence/cost-usage-report-dashboards/dashboards/deploy_dashboards/) linked here.
But I'm not getting all accounts' data in the dashboard; it only shows data for the destination account. Since I can't use or access the management account, I am creating a CUR in each individual account, storing it in S3, and replicating everything to one destination account, like below:
s3://cur-buck****/account/SourceAccount1/cur/cur/year=2023/month=3/0001.snappy.parquet
s3://cur-buck****/account/SourceAccount2/cur/cur/year=2023/month=3/0001.snappy.parquet
s3://cur-buck****/account/SourceAccount3/cur/cur/year=2023/month=3/0001.snappy.parquet
s3://cur-buck****/account/DestinationAccount1/cur/cur/year=2023/month=3/0001.snappy.parquet
I'm crawling the location s3://cur-buck****/account/ with the Glue crawler. My dashboards were deployed successfully and are working, but I'm only getting data for the destination account.
The account_map view also returns a single entry, for the destination account. I'm unable to get the other accounts' data even with an Athena query.
Please help; I'm still not able to get the CURBucketPath right when we have multiple accounts. Maybe I added the wrong prefix while creating the stack via CloudFormation.
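For reference, the kind of Athena check I'm running to see which source accounts actually made it into the crawled table looks roughly like this (database and table names are placeholders):
```
-- Placeholder names: database cur_db, table cur_table created by the crawler.
-- Which accounts are present in the crawled CUR data?
SELECT DISTINCT line_item_usage_account_id
FROM cur_db.cur_table;

-- Which S3 prefixes did the crawler actually register?
SELECT DISTINCT "$path"
FROM cur_db.cur_table
LIMIT 50;
```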

Thanks!
Macie provides the detailed positions of sensitive data in its findings output, and I want to extract that data using those positions. However, Macie reveals only 10 samples.
Is there any way to get more than 10 samples from AWS Macie's "reveal samples" feature? If not, is there another solution (such as a SQL query) to extract the complete data from the sensitive files?
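To illustrate the SQL route mentioned in the question: if the flagged objects are, say, CSV files in S3, an external Athena table over that prefix makes the full columns readable rather than only Macie's revealed samples. Everything below (bucket path, table, and column names) is hypothetical:
```
-- Hypothetical table over the flagged CSV objects; adjust the columns to the
-- real file layout reported by Macie.
CREATE EXTERNAL TABLE flagged_csv (
  customer_name string,
  email string,
  card_number string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '"')
LOCATION 's3://my-bucket/flagged-prefix/'
TBLPROPERTIES ('skip.header.line.count' = '1');

-- Pull the complete column that Macie sampled, not just 10 values.
SELECT card_number FROM flagged_csv;
```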
I'm trying to use Athena to run queries over VPC flow logs stored in an S3 bucket. I followed the "Querying Amazon VPC flow logs" guide [1], and I tried pointing the table at both the folder and the individual files.
However, the query results were empty, with only the column headers and no rows. I double-checked that the files are not empty and that the path is correct. I also tried creating partitions, but the results were unreadable.
Is there a way to read the contents of the S3 bucket from Athena? Thank you!
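A condensed sketch of the table described in that guide, with the S3 location as a placeholder; for a partitioned table, header-only results are usually the symptom of partitions that were never registered. This is an illustration of the setup being described, not the exact DDL from the post:
```
-- Sketch based on the "Querying Amazon VPC flow logs" guide; bucket, prefix,
-- and account ID are placeholders. Field order must match the flow log
-- format of the files in S3.
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
  version int,
  account_id string,
  interface_id string,
  srcaddr string,
  dstaddr string,
  srcport int,
  dstport int,
  protocol bigint,
  packets bigint,
  bytes bigint,
  `start` bigint,
  `end` bigint,
  action string,
  log_status string
)
PARTITIONED BY (`date` date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://my-flow-log-bucket/prefix/AWSLogs/'
TBLPROPERTIES ('skip.header.line.count' = '1');

-- Rows only show up after partitions are registered, e.g.:
ALTER TABLE vpc_flow_logs
  ADD PARTITION (`date` = '2023-03-20')
  LOCATION 's3://my-flow-log-bucket/prefix/AWSLogs/111122223333/vpcflowlogs/us-east-1/2023/03/20/';
```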



I have a date stored as a string type, and that column sometimes contains the word 'None'.
My query for casting the date is below (*I only need the Month and Year from it*):
date_format(cast(c.enddate as date), '%M') as "Month",
date_format(cast(c.enddate as date), '%Y') as "Year"
The error returned is:
INVALID_CAST_ARGUMENT: Value cannot be cast to date: None-
Can somebody help me with this problem, so that I can still get the Month and Year only?
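A sketch of an expression that sidesteps the failing cast, using Athena's try_cast (which returns NULL for values it cannot parse) together with nullif to blank out the literal 'None'; the column and aliases match the query above:
```
-- Treat the literal 'None' as NULL so the cast no longer fails.
date_format(try_cast(nullif(c.enddate, 'None') as date), '%M') as "Month",
date_format(try_cast(nullif(c.enddate, 'None') as date), '%Y') as "Year"
```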
Thank you in advance!
Hi, I'd appreciate AWS Athena support for TIMESTAMP data type with microsecond precision for all row formats and table engines. Currently, the support is very inconsistent. See the SQL script below.
```
drop table if exists test_csv;
create external table if not exists test_csv (
id int,
created_time timestamp
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties('separatorChar'=',', 'quoteChar'='"', 'escapeChar'='\\')
location 's3://my-bucket/tmp/timestamp_csv_test/';
-- result: OK
drop table if exists test_parquet;
create external table if not exists test_parquet (
id int,
created_time timestamp
)
row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
stored as inputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
location 's3://my-bucket/tmp/timestamp_parquet_test/'
tblproperties ('parquet.compress' = 'snappy');
-- result: OK
drop table if exists test_iceberg;
create table if not exists test_iceberg (
id int,
created_time timestamp
)
location 's3://my-bucket/tmp/timestamp_iceberg_test/'
tblproperties ( 'table_type' ='iceberg');
-- result: OK
insert into test_csv values (1, timestamp '2023-03-22 11:00:00.123456');
/*
result: ERROR [HY000][100071] [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. GENERIC_INTERNAL_ERROR: class org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector cannot be cast to class org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector and org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector are in unnamed module of loader io.trino.server.PluginClassLoader @1df1bd44). If a data manifest file was generated at 's3://my-bucket/athena_results/ad44adee-2a80-4f41-906a-17aa5dc27730-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account. [Execution ID: ***]
*/
insert into test_parquet values (1, timestamp '2023-03-22 11:00:00.123456');
-- result: OK
select * from test_parquet;
-- result: OK DATA: 1,2023-03-22 11:00:00.123000 BUT THE TIMESTAMP VALUE IS TRUNCATED TO MILLISECONDS!
insert into test_iceberg values (1, timestamp '2023-03-22 11:00:00.123456');
-- result: OK
select * from test_csv;
select * from test_iceberg;
-- result: OK DATA: 1,2023-03-22 11:00:00.123456 THIS IS FINE
```
Before I upgrade Athena to version 3, I'd like to know if I can roll it back to version 2 afterwards.
If I can roll it back, then I can test in my main dev environment. If I cannot roll Athena back to version 2, then I need a new environment to test the upgrade.
(I didn't find this question after searching on athena rollback, athena upgrade, or athena version)
Thanks!
Started getting this error today when querying data in Athena from a table created from Parquet files in our S3 bucket:

I'm thinking it's a bad file somewhere, but I'm unable to narrow it down. Any steps or tips to resolve this would be much appreciated. Thanks!
I'm getting the below error while querying in Athena:
HIVE_INVALID_PARTITION_VALUE: Invalid partition value 'Unsaved-2023-03 22:00:00' for TIMESTAMP partition key: ingest_timestamp=Unsaved-2023-03 22%3A00%3A00
This query ran against the "due_eventdb" database unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: b923ebaa-8ea4-42f4-bbb4-b5f6b1d6041a
I have deleted the directory "Unsaved-2023-03 22%3A00%3A00" from the S3 bucket, but I am still getting the same error.
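One detail that may be relevant here: the partition entry lives in the Glue/Hive catalog, not only in S3, so deleting the S3 prefix alone does not remove it. A sketch of the catalog-side cleanup, with the table name as a placeholder; if the DDL itself rejects the malformed value, the partition can also be removed through the Glue console or the Glue DeletePartition API:
```
-- Placeholder table name; the partition value is the bad one from the error
-- message (it may need to be given exactly as it was registered, i.e. encoded
-- or decoded).
ALTER TABLE my_table DROP IF EXISTS PARTITION (ingest_timestamp = 'Unsaved-2023-03 22:00:00');
```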