current_time minus 1hr in Glue PySpark


I need to fetch files that arrived in my S3 bucket up to one hour before the current time, so that I can process them. My file names are in the format yyyymmdd-hhmmsssss.parquet (including milliseconds), so I am running a Glue job to fetch the files whose names are <= current_timestamp - 1hr. I used the code below to get the current time in the required format:

desired_timezone = pytz.timezone('America/New_York')  # Replace with your actual time zone
current_datetime_2 = datetime.now(desired_timezone).strftime("%Y%m%d-%H%M%S")

I do not know how to get the time for current_time - 1hr with the above commands in Glue job PySpark code. Can someone please help me achieve this?

1 Answer
Accepted Answer

Just subtract an hour from the current time with timedelta(hours=1) and format it like your file names using strftime("%Y%m%d-%H%M%S").

You will have something like:

from datetime import datetime, timedelta
import pytz

desired_timezone = pytz.timezone('America/New_York')  # Replace 'America/New_York' with your actual time zone
current_datetime = datetime.now(desired_timezone)
one_hour_ago_datetime = current_datetime - timedelta(hours=1)

formatted_current_datetime = current_datetime.strftime("%Y%m%d-%H%M%S")
formatted_one_hour_ago_datetime = one_hour_ago_datetime.strftime("%Y%m%d-%H%M%S")

print("Current time:", formatted_current_datetime)
print("One hour ago:", formatted_one_hour_ago_datetime)
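
Note that the format string above stops at seconds, while the file names in the question also carry three millisecond digits. As a rough sketch of how the cutoff could be compared against file names, the snippet below appends the milliseconds and keeps only names at or before the cutoff. The helper names to_stamp and files_at_or_before_cutoff are hypothetical, and the sketch assumes bare file names (no S3 key prefix) whose timestamps are in the same time zone as the datetime passed in.

```python
from datetime import datetime, timedelta


def to_stamp(dt):
    """Format a datetime as yyyymmdd-hhmmss followed by 3-digit milliseconds."""
    return dt.strftime("%Y%m%d-%H%M%S") + f"{dt.microsecond // 1000:03d}"


def files_at_or_before_cutoff(file_names, now, age=timedelta(hours=1)):
    """Return the names whose timestamp part is <= (now - age).

    Assumes names look like 'yyyymmdd-hhmmsssss.parquet'. Because the
    timestamp is fixed-width and zero-padded, plain string comparison
    orders the names chronologically.
    """
    cutoff = to_stamp(now - age)
    selected = []
    for name in file_names:
        stamp = name.rsplit(".", 1)[0]  # drop the '.parquet' extension
        if stamp <= cutoff:
            selected.append(name)
    return selected
```

In the Glue job you could pass datetime.now(desired_timezone) as the now argument, together with the list of file names obtained from your bucket listing.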

EXPERT
answered 2 months ago
AWS
SUPPORT ENGINEER
reviewed 16 days ago
  • thanks a lot. The way you added the TIMEDELTA made the difference. Your solution worked for me :)
