Questions tagged with Job Scheduling

What is a scalable solution for running Lambda at specific times in the future? We are building a SaaS platform in which our users can request a task to happen on a recurring schedule (a simple parameter of {"frequency": "NNN minutes"}). They can also edit or delete this schedule, which affects future occurrences. There will be thousands of users requesting millions of tasks over time (with intervals from 1 minute to 1 year). The tasks will not be created in the order they should execute, and I need high integrity on completion. I've ruled out plain SQS. I've looked at CloudWatch Events, but have concerns about scaling. I've considered putting the tasks into a DB table and polling. Is there something else I should look at?
2 answers · 0 votes · 32 views · asked a month ago
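A hedged sketch for the question above: EventBridge Scheduler (the newer service behind CloudWatch Events rules for this use case) is designed for very large numbers of schedules per account, supports `rate()` expressions down to 1 minute, and has per-target retry policies that help with completion guarantees. All ARNs below are placeholders; the role must trust scheduler.amazonaws.com and be allowed to invoke the function.

```python
import boto3

scheduler = boto3.client("scheduler")

def create_task_schedule(task_id: str, frequency_minutes: int) -> None:
    """Create one recurring schedule per user task (use update_schedule to edit)."""
    scheduler.create_schedule(
        Name=f"task-{task_id}",
        ScheduleExpression=f"rate({frequency_minutes} minutes)",
        FlexibleTimeWindow={"Mode": "OFF"},
        Target={
            # Placeholder ARNs: target Lambda and the execution role.
            "Arn": "arn:aws:lambda:us-east-1:123456789012:function:task-runner",
            "RoleArn": "arn:aws:iam::123456789012:role/scheduler-invoke-task-runner",
            "Input": f'{{"taskId": "{task_id}"}}',
            # Retry failed deliveries for up to a day, which supports the
            # "high integrity on completion" requirement.
            "RetryPolicy": {
                "MaximumEventAgeInSeconds": 86400,
                "MaximumRetryAttempts": 10,
            },
        },
    )

def delete_task_schedule(task_id: str) -> None:
    scheduler.delete_schedule(Name=f"task-{task_id}")
```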
Hello, my name is Victor, I am 18 years old, and I have been studying cloud computing for some time. I have 2 Azure certifications, but my great passion is AWS, and today I come to ask for tips and guidance on how to grow on this platform. I am studying to take the AWS Cloud Practitioner exam this month, and in January I will take the Solutions Architect exam. What is the main thing I have to know and understand to become an excellent professional in the future?
2 answers · 0 votes · 26 views · asked 2 months ago
I have a scenario with multiple interdependent Glue ETL jobs that must follow a logical order. I am looking for the best AWS-native approach to trigger one group of Glue jobs based on the success/failure state of another group, i.e., setting up a combination of series and parallel execution of jobs under one entity, which can then be reused inside another such entity to avoid rebuilding the whole flow (much like using a shell script to group and conditionally orchestrate Python scripts). A GUI to visualize the dataflows would be an added advantage. I have tried the Workflows feature within Glue to simulate this requirement: I was able to group jobs using triggers, but the major drawback was that I could not invoke existing workflows from a bigger workflow (like a parent workflow that fires the end-to-end ETL), so I would have to rebuild the whole flow every time. I have a background in SAP BODS ETL, if comparisons help. Requesting experts' views on how to address this requirement. Thanks in advance!
1 answer · 0 votes · 62 views · asked 2 months ago
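One approach worth testing for the workflow-nesting question above is Step Functions: its optimized Glue integration (`arn:aws:states:::glue:startJobRun.sync`) runs jobs synchronously, nested executions (`arn:aws:states:::states:startExecution.sync`) give the parent/child reuse that Glue Workflows lack, and Workflow Studio provides the GUI. In the spirit of the shell-script comparison, here is a minimal boto3 sketch that runs one group of jobs in parallel and gates the next group on success; the job names are placeholders.

```python
import time
import boto3

glue = boto3.client("glue")

def run_jobs(job_names):
    """Start a group of Glue jobs in parallel and block until all finish.

    Returns True only if every job in the group succeeds.
    """
    runs = {name: glue.start_job_run(JobName=name)["JobRunId"] for name in job_names}
    pending = dict(runs)
    ok = True
    while pending:
        time.sleep(30)
        for name, run_id in list(pending.items()):
            state = glue.get_job_run(JobName=name, RunId=run_id)["JobRun"]["JobRunState"]
            if state in ("SUCCEEDED", "FAILED", "ERROR", "TIMEOUT", "STOPPED"):
                del pending[name]
                ok = ok and state == "SUCCEEDED"
    return ok

# Series of parallel groups: group B runs only if all of group A succeeded.
if run_jobs(["extract_orders", "extract_customers"]):   # placeholder job names
    run_jobs(["transform_join", "load_to_s3"])           # placeholder job names
```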
Hi, when defining the compute options on a RunTask call with launch type "FARGATE" and platform version "1.4.0", it seems that only the platform version is passed on to the "requestParameters", thus triggering the following error message: "The platform version must be null when specifying an EC2 launch type." Screenshot from EventBridge Scheduler: ![EventBridge Scheduler compute options](/media/postImages/original/IMT22IN3dkSjO5kKtwwyHOGw) Screenshot of the output in CloudTrail: ![CloudTrail requestParameters output](/media/postImages/original/IMqOGxKg7LQ9Klj_FXO3ZbdA) I have successfully triggered and run this task from an EventBridge rule.
1 answer · 1 vote · 91 views · asked 2 months ago
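For comparison with the console behavior described above, when the schedule is created through the API both fields can be set explicitly on the target's EcsParameters; if LaunchType is not carried into the RunTask request, ECS does not assume Fargate, which would explain the error. A minimal sketch with placeholder ARNs and subnet:

```python
import boto3

scheduler = boto3.client("scheduler")

scheduler.create_schedule(
    Name="run-fargate-task",                      # placeholder name
    ScheduleExpression="rate(1 hour)",
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",          # placeholder
        "RoleArn": "arn:aws:iam::123456789012:role/scheduler-ecs-run-task",      # placeholder
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/my-task:1",
            "LaunchType": "FARGATE",       # must be set alongside PlatformVersion
            "PlatformVersion": "1.4.0",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0123456789abcdef0"],   # placeholder
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
    },
)
```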
I can't figure out why my permissions for letting Scheduler invoke Lambda are wrong. The only thing I get is CloudWatch metrics telling me something is wrong, and I only notice because the invocation never happens. Where are the logs telling me what went wrong? Sorry for the harsh question, but re:Post is a pain to use and it took me about 10 minutes to get here to ask this.
1 answer · 0 votes · 21 views · asked 3 months ago
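A common cause of silent Scheduler-to-Lambda failures like the one above is the schedule's execution role: it must trust scheduler.amazonaws.com and hold lambda:InvokeFunction on the target. A minimal sketch of creating such a role; the role name and function ARN are placeholders:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy: let EventBridge Scheduler assume the role.
trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "scheduler.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="scheduler-invoke-my-function",   # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust),
)

# Permission policy: allow the role to invoke the target Lambda.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "lambda:InvokeFunction",
        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",  # placeholder
    }],
}

iam.put_role_policy(
    RoleName="scheduler-invoke-my-function",
    PolicyName="invoke-lambda",
    PolicyDocument=json.dumps(policy),
)
```

For visibility into failed deliveries, attaching a dead-letter queue to the schedule target (the target's DeadLetterConfig) captures the events that could not be sent, since Scheduler does not write per-invocation logs on its own.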
Hello, we are executing batch jobs on EKS as pods, and we are facing an issue where workloads with low resource requests are spread across a large number of nodes after scale-up. Running jobs can't be migrated to another node, so the autoscaler ignores them, and this prevents scale-down. It might help to bin-pack these job pods onto fewer nodes, similar to what is described in https://alibaba-cloud.medium.com/the-burgeoning-kubernetes-scheduling-system-part-3-binpack-scheduling-that-supports-batch-jobs-372b4704722 or https://kubernetes.io/docs/concepts/scheduling-eviction/resource-bin-packing/#enabling-bin-packing-using-requestedtocapacityratio Is it possible to activate a bin-packing scheduler on EKS? Or which approach would you recommend for this situation? Thanks, Martin
0 answers · 1 vote · 32 views · asked 3 months ago
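One pattern for the question above, since EKS does not expose the managed kube-scheduler's configuration: deploy a second scheduler configured with the RequestedToCapacityRatio/MostAllocated scoring strategy (per the second link) and point only the batch Jobs at it via the pod spec's schedulerName. A minimal sketch of the Job side with the official Python client, assuming a custom scheduler is already deployed under the (hypothetical) name bin-packing-scheduler:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

container = client.V1Container(
    name="batch-task",
    image="registry.example.com/batch-task:latest",   # placeholder image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "500m", "memory": "1Gi"},
    ),
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="batch-task"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                # Route this Job's pods to the bin-packing scheduler instead
                # of the default kube-scheduler.
                scheduler_name="bin-packing-scheduler",   # assumed deployment
                restart_policy="Never",
                containers=[container],
            )
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="batch", body=job)
```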
Hello. When I have the Amazon WorkSpaces application open on my machine, which runs 24/7, Excel's =NOW() formula shows the right time. However, if I do not have the application open, the time inside Excel shows 1 hour behind each time my script runs on the virtual machine, although the save time of the sheets published as HTML shows the right one. Any idea what's going on or why this is happening? Pictures are attached showing the time when the script was running, what the Excel file shows inside =NOW(), and the date and time of the files saved from Excel. ![Date & time shown in Task Scheduler](/media/postImages/original/IMXxYkV0HhQnqOavZSNnrIKg) ![Date & time shown in the Excel sheet with the formula =NOW()](/media/postImages/original/IM1Mk9hwdVRnCkFG7SGwM7ew) ![Date & time of the Excel sheets saved as HTML](/media/postImages/original/IMnEIbL04CQoOIe41pnlbDhQ) Any help is appreciated! I need the time to show correctly whether or not I am logged in to that application from my personal laptop.
1 answer · 0 votes · 54 views · asked 5 months ago
I get the error "Failed to execute with exception: Task allocated capacity exceeded the limit. (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: InvalidInputException; Request ID: 8d31d112-91e4-4892-900f-048d993e0da1; Proxy: null)" whenever I run a job.
1 answer · 0 votes · 385 views · saksham · asked 5 months ago
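The error above typically indicates the run would push the account past a Glue capacity quota (DPUs or concurrent runs) because of other jobs running at the same time. A hedged diagnostic sketch that lists the applied Glue quotas rather than guessing their exact names:

```python
import boto3

sq = boto3.client("service-quotas")

# List the applied Glue quotas and surface anything DPU- or
# concurrency-related; exact quota names vary, so print them all.
paginator = sq.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="glue"):
    for quota in page["Quotas"]:
        name = quota["QuotaName"]
        if "DPU" in name or "concurrent" in name.lower():
            print(f'{name}: {quota["Value"]}')
```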
Hi, I'm having an issue with the new **FLEX** feature. In our company, we are trying to save costs when running Glue jobs (we are validating our product fit). The same day that Flex was released, we tried it; since then I have not been able to make it work. I thought that simply checking the Flex checkbox would suffice, but I think I'm doing something wrong. The jobs still run 100% OK as before (without that checkbox checked). Simply put, we are reading from an RDS SQL Server table, doing basic ETL processing, and storing the result in an S3 bucket in CSV format. I also don't think there's an issue with the job timeout, since it is set to 60 minutes and the job takes barely a couple of minutes to fail. The failed job status shows:

* Glue version: 3.0
* Start-up time: 16 seconds
* Execution time: 6 minutes 25 seconds
* Timeout: 45 minutes
* Worker type: G.1X
* Number of workers: 10
* Execution class: FLEX
* Max capacity: 10 DPUs

The successful job status is the same except:

* Execution class: STANDARD

In the job monitor we read:

> An error occurred while calling o87.getDynamicFrame. Job 0 cancelled because SparkContext was shut down caused by threshold for executors failed after launch reached. Note: This run was executed with Flex execution. Check the logs if run failed due to executor termination.

In the CloudWatch logs, part of the output error: `An error occurred while calling o90.getDynamicFrame.\n: org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$cleanUpAfterSchedulerStop$1(DAGScheduler.scala:1130)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$cleanUpAfterSchedulerStop$1$adapted(DAGScheduler.scala:1128)\n\tat scala.collection.mutable.HashSet.foreach(HashSet.scala:79)\n\tat org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:1128)\n\tat org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:2703)\n\tat org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)\n\tat org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2603)\n\tat org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:2111)\n\tat org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1419)\n\tat org.apache.spark.SparkContext.stop(SparkContext.scala:2111)\n\tat org.apache.spark.SparkContext.$anonfun$new$39(SparkContext.scala:681)\n\tat org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)\n\tat org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)\n\tat scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)\n\tat org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)\n\tat org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)\n\tat scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)\n\tat scala.util.Try$.apply(Try.scala:213)\n\tat org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)\n\tat org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n\tat 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:914)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2259)\n\tat org.apache.spark.SparkContext.runJob(SparkContext.scala:2278)\n\tat org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:477)\n\tat org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:430)\n\tat org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47)\n\tat org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3733)\n\tat org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2762)\n\tat org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3724)\n\tat org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)\n\tat org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)\n\tat org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)\n\tat org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)\n\tat org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)\n\tat org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)\n\tat org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)\n\tat org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)\n\tat org.apache.spark.sql.Dataset.withAction(Dataset.scala:3722)\n\tat org.apache.spark.sql.Dataset.head(Dataset.scala:2762)\n\tat org.apache.spark.sql.Dataset.take(Dataset.scala:2969)\n\tat com.amazonaws.services.glue.JDBCDataSource.getLastRow(DataSource.scala:1089)\n\tat com.amazonaws.services.glue.JDBCDataSource.getJdbcJobBookmark(DataSource.scala:929)\n\tat com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:953)\n\tat com.amazonaws.services.glue.DataSource.getDynamicFrame(DataSource.scala:99)\n\tat com.amazonaws.services.glue.DataSource.getDynamicFrame$(DataSource.scala:99)\n\tat com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:714)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:750)\n","Stack Trace":[{"Declaring Class":"get_return_value","Method Name":"format(target_id, \".\", name), value)","File Name":"/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py","Line Number":328},{"Declaring Class":"deco","Method Name":"return f(*a, **kw)","File 
Name":"/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py","Line Number":111},{"Declaring Class":"__call__","Method Name":"answer, self.gateway_client, self.target_id, self.name)","File Name":"/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py","Line Number":1305},{"Declaring Class":"getFrame","Method Name":"jframe = self._jsource.getDynamicFrame()","File Name":"/opt/amazon/lib/python3.6/site-packages/awsglue/data_source.py","Line Number":36},{"Declaring Class":"create_dynamic_frame_from_catalog","Method Name":"return source.getFrame(**kwargs)","File Name":"/opt/amazon/lib/python3.6/site-packages/awsglue/context.py","Line Number":185},{"Declaring Class":"from_catalog","Method Name":"return self._glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, push_down_predicate, additional_options, catalog_id, **kwargs)","File Name":"/opt/amazon/lib/python3.6/site-packages/awsgl` Any help would be much appreciated, Agustin.
2 answers · 0 votes · 138 views · asked 5 months ago
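For what it's worth, Flex can also be requested per run through the API rather than the console checkbox, which makes it easy to A/B the two execution classes on the same job. A minimal sketch (the job name is a placeholder); note that Flex runs on spare capacity, so executors can be reclaimed mid-run and jobs need to tolerate some executor loss:

```python
import boto3

glue = boto3.client("glue")

# Run the same job under both execution classes to compare behavior.
# "my-etl-job" is a placeholder; Flex requires Glue 3.0+ and G.1X/G.2X workers.
for execution_class in ("STANDARD", "FLEX"):
    run = glue.start_job_run(
        JobName="my-etl-job",
        ExecutionClass=execution_class,
    )
    print(execution_class, run["JobRunId"])
```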
I am a beginner in AWS and just want to know: is AWS Batch free? And what are the prerequisites for creating a sample job?
1 answer · 0 votes · 53 views · asked 6 months ago
The performance drop we see with just a couple of hundred connections might be typical of AWS T3 instances. Our server-side Apache modules maintain "sleeping" connections to clients, which is not quite typical for regular web applications. Any time a new request comes in for a host PC, the client's server-side Apache instance wakes up, puts the request's data into a queue for the host PC, and signals the host PC's server-side Apache instance, which wakes up and sends the data to the host PC. Hence, every data exchange through the server involves quite a lot of server-process scheduling. It appears that our T3 starts experiencing scheduling lag at some point, and this is not reflected in CPU usage.
1 answer · 0 votes · 65 views · asked 6 months ago
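If the instance above is a T3 in standard (non-unlimited) mode, lag that does not show up in average CPU usage is often CPU-credit exhaustion: once the credit balance hits zero, the instance is throttled to its baseline. A hedged diagnostic sketch reading CPUCreditBalance from CloudWatch (the instance ID is a placeholder):

```python
from datetime import datetime, timedelta, timezone
import boto3

cw = boto3.client("cloudwatch")

resp = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",          # burstable-instance credit pool
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.now(timezone.utc) - timedelta(hours=6),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    # A balance near zero means the instance is throttled to its baseline.
    print(point["Timestamp"], point["Average"])
```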
I have 5 batch jobs running on AWS Batch with Fargate. While they were running, I noticed traffic to S3 spiking through the NAT Gateway. I queried the VPC flow logs using Athena and found that the destination IPs belong to S3. None of my code uses S3, and when I turn the jobs off, the traffic going to S3 stops completely. I don't understand why my Batch jobs use S3 when my code doesn't. Is there any way to investigate exactly where the traffic is coming from? (Other than https://aws.amazon.com/premiumsupport/knowledge-center/vpc-find-traffic-sources-nat-gateway/?nc1=h_ls , which I have already read.) I understand I could use an S3 VPC endpoint to keep this traffic off the NAT Gateway, but I want to find the root cause.
1 answer · 0 votes · 67 views · iamnick · asked 6 months ago
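To narrow down the question above, one hedged approach is to aggregate flow-log bytes toward the observed S3 addresses by ENI, since each Fargate task gets its own network interface; that ties the traffic to a specific task. The database, table, column names, and output location below are placeholders for your flow-logs Athena setup:

```python
import boto3

athena = boto3.client("athena")

# Sum bytes toward the observed S3 address range per network interface.
# Column names follow your flow-logs table DDL; adjust as needed.
query = """
SELECT interface_id, SUM(bytes) AS total_bytes
FROM vpc_flow_logs                    -- placeholder table
WHERE dstaddr LIKE '52.216.%'         -- replace with the S3 addresses you observed
GROUP BY interface_id
ORDER BY total_bytes DESC
LIMIT 20
"""

resp = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "default"},                       # placeholder
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},   # placeholder
)
print(resp["QueryExecutionId"])
```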