Questions tagged with AWS Data Pipeline


I'm attempting to use AWS Data Pipeline to move a CSV file from my computer to an AWS data lake as a Parquet file. I'm unable to find the exact template to select for migrating from my local computer. Please help me choose the source. ![.](/media/postImages/original/IMcwPjZ5cqQrKctVYzb3sm3A)
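Data Pipeline templates read from AWS resources such as S3 rather than from a local machine, so a common first step is to convert the CSV to Parquet and upload it to the data lake bucket yourself. A minimal sketch under that assumption, with hypothetical file, bucket, and prefix names, requiring pandas, pyarrow, and boto3:

```
import boto3
import pandas as pd

# Hypothetical bucket and key names for illustration.
BUCKET = "my-datalake-bucket"

# Convert the local CSV to Parquet (requires pyarrow or fastparquet).
df = pd.read_csv("local_data.csv")
df.to_parquet("local_data.parquet", index=False)

# Upload the Parquet file into the data lake prefix in S3.
s3 = boto3.client("s3")
s3.upload_file("local_data.parquet", BUCKET, "curated/local_data.parquet")
```

Once the Parquet object is in S3, the S3-based templates (or Glue) can pick it up from there.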
0 answers · 0 votes · 13 views · asked 10 days ago
Hello, when we open the AWS Data Pipeline console we see a message that AWS is planning to remove console access by 04/03/2023, so we are checking how we can work with our data pipelines through AWS CLI commands. Could you let me know which AWS CLI command can be used to check the error stack trace of a pipeline with status FAILED? I have checked https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-troubleshoot-locate-errors.html, which describes checking errors from the AWS console, and I have also checked https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-error-logs.html, but from the pipelineLogUri we can only retrieve the activity log, not the error stack trace. Please help.
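Not an authoritative answer, but the runtime fields the console shows should also be retrievable through the QueryObjects and DescribeObjects APIs (`aws datapipeline query-objects` / `aws datapipeline describe-objects` on the CLI). A minimal boto3 sketch, assuming a known pipeline ID and that failed instance objects carry `errorMessage`/`errorStackTrace` fields:

```
import boto3

# Hypothetical pipeline ID for illustration.
PIPELINE_ID = "df-0123456789EXAMPLE"

dp = boto3.client("datapipeline")

# List the runtime instance objects for the pipeline.
instances = dp.query_objects(pipelineId=PIPELINE_ID, sphere="INSTANCE")

# Describe them and print error details for any FAILED instance.
if instances["ids"]:
    described = dp.describe_objects(pipelineId=PIPELINE_ID,
                                    objectIds=instances["ids"])
    for obj in described["pipelineObjects"]:
        fields = {f["key"]: f.get("stringValue") for f in obj["fields"]}
        if fields.get("@status") == "FAILED":
            print(obj["id"])
            print(fields.get("errorMessage"))
            print(fields.get("errorStackTrace"))
```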
1 answer · 0 votes · 29 views · asked 12 days ago
I am adding new columns to my table with the SQL Query transform in AWS Glue Studio. ![visual diagram for transformation](/media/postImages/original/IMwcLXRM0iTROC0Uqb5lvOGg)

- SQL alias: study
- Existing schema from the Data Catalog: study id, patient id, patient age
- New column: AccessionNo
- Desired transformed schema: study id, patient id, patient age, AccessionNo
- SQL query: **alter table study add columns (AccessionNo int)**

The error it gives: pyspark.sql.utils.AnalysisException: Invalid command: 'study' is a view not a table.; line 2 pos 0; 'AlterTable V2SessionCatalog(spark_catalog), default.study, 'UnresolvedV2Relation [study], V2SessionCatalog(spark_catalog), default.study, [org.apache.spark.sql.connector.catalog.TableChange$AddColumn@1e7cbfec]

The AWS documentation for the SQL transform (https://docs.aws.amazon.com/glue/latest/ug/transforms-sql.html) says queries should be in Spark SQL syntax, and my query is in Spark SQL syntax. What is the exact issue, and how can I resolve it? Thanks.
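Not a definitive diagnosis, but the error message suggests the SQL transform exposes its input as a Spark temporary view, and a view cannot be ALTERed. A minimal PySpark sketch of the usual workaround, projecting the extra column in a SELECT instead of altering the table (the column names and the NULL placeholder value are assumptions):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the data the SQL transform exposes as the temp view "study";
# column names are assumptions based on the question.
df = spark.createDataFrame(
    [(1, 101, 45)], ["study_id", "patient_id", "patient_age"]
)
df.createOrReplaceTempView("study")

# A temp view cannot be ALTERed, so add the column by selecting it instead.
result = spark.sql("SELECT *, CAST(NULL AS INT) AS AccessionNo FROM study")
result.printSchema()
```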
1 answer · 0 votes · 50 views · Prabhu · asked 25 days ago
I want to run multiple Python scripts one after another. They are connected (each depends on the previous one). What is the best way to do this? I want to use an EC2 instance.
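For a single EC2 instance, one simple option is a small driver script that runs the connected scripts in order and stops the chain if any step fails (Step Functions, cron, or Airflow/MWAA would be alternatives for more complex dependencies). A minimal sketch with hypothetical script names:

```
import subprocess
import sys

# Hypothetical script names; replace with the real pipeline steps, in order.
SCRIPTS = ["extract.py", "transform.py", "load.py"]

for script in SCRIPTS:
    print(f"Running {script} ...")
    result = subprocess.run([sys.executable, script])
    if result.returncode != 0:
        # Stop the chain so downstream scripts don't run on bad input.
        sys.exit(f"{script} failed with exit code {result.returncode}")

print("All scripts completed successfully.")
```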
2 answers · 0 votes · 31 views · asked a month ago
I am looking for a way to share my Terraform-based modern data stack solution, which I have created using various AWS services, with other companies privately via AWS. However, in the future, we might also consider using AWS CDK for infrastructure creation. Can anyone provide guidance on how I can accomplish this? Specifically, what are the best practices or methods for securely sharing this solution with other organizations while maintaining control over access and usage, and being able to adapt to different infrastructure creation tools in the future?
1 answer · 0 votes · 46 views · asked 2 months ago
Where has the option to configure request and response mapping templates gone when creating a resolver for your data source, the way it was done with VTL?
0 answers · 0 votes · 40 views · asked 2 months ago
I am attempting to deploy a Flask (python3.8) application via AWS Elastic Beanstalk (EB). I was able to successfully deploy the application on a `--single` EC2 instance configuration in the public subnet of my VPC. I have a backend data pipeline that generates a `serve.json` file, which contains metadata about the S3 prefix to use for serving up data on the frontend. `serve.json` is updated and overwritten every day, and I have an S3 trigger on the prefix containing `serve.json` that calls a Lambda function to restart the WSGI web-app server managing the Flask app, which is running on the EC2 instances created by EB. After each restart, my Flask app reads in data from .parquet files on the S3 prefix specified in the updated `serve.json` and serves a RESTful app.

**Problem**: When I `restart-app-server` via boto3/Lambda, the application restarts, but my EB environment (`my-eb-environment`) becomes degraded (`Health: Red`) with `cause` = `Incorrect application version found on all instances. Expected version n/a.`.

**CLI route**: The CLI action works as expected. The effect of the action can be seen in the `eb health` console below:

```
$ aws elasticbeanstalk restart-app-server --environment-name "my-eb-environment"
```

![Enter image description here](/media/postImages/original/IMS5kZ5ZW9TW-8p0oeYSia8Q)

**Lambda route**: An S3 trigger calls the Lambda function below when `serve.json` is uploaded to the trigger prefix.

Lambda function:

```
from datetime import datetime

import boto3


def lambda_handler(event, context):
    """Restart the WSGI app server running on the EC2 instance(s)
    associated with an Elastic Beanstalk environment."""
    print(str(datetime.now()))
    EB_ENV_NAME = 'my-eb-environment'
    try:
        eb_client = boto3.client('elasticbeanstalk')
        eb_response = eb_client.restart_app_server(EnvironmentName=EB_ENV_NAME)
        print('SUCCESS! RESTARTED WEB SERVER FOR EnvironmentName {}'.format(EB_ENV_NAME))
        response = 200
        return (eb_response, response)
    except Exception as e:
        print('BAD REQUEST: COULD NOT RESTART WEB SERVER\nMESSAGE: {}'.format(e))
        response = 400
        return (None, response)
```

I can `describe-log-streams` and `get-log-events` to view the Lambda function's logs. It is clear that the app has refreshed:

![Enter image description here](/media/postImages/original/IM9A4gS9D-R4KKgLmItYs1FA)

But `eb health` reveals that the environment is now degraded:

![Enter image description here](/media/postImages/original/IMU9E_ZeRFRFOKC7H3jgAvxg)

Running the CLI command from my terminal again refreshes the application server and makes the environment healthy:

```
$ aws elasticbeanstalk restart-app-server --environment-name "my-eb-environment"
```

![Enter image description here](/media/postImages/original/IMTR30ppcWTHK26FeoSCTZRw)

1. How do I resolve the application version issue on the **Lambda route** when performing `restart-app-server` with this pipeline, so I can automate app refreshing with each uploaded `serve.json`?
2. Any alternative solutions for automated EB application refreshing based on S3 triggers are also appreciated. I do not want to reboot the EC2 instances, because I would like to avoid the downtime if I can.
0 answers · 0 votes · 42 views · asked 3 months ago
Hi all, I saw this announcement today:

> Please note that Data Pipeline service is in maintenance mode and we are not planning to expand the service to new regions. We plan to remove console access by 02/28/2023.

But I cannot find it in any official AWS post, and I wonder whether I can still create new data pipelines using CloudFormation. Thank you for helping.
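On the CloudFormation side, the `AWS::DataPipeline::Pipeline` resource type still exists; it maps to the CreatePipeline/PutPipelineDefinition APIs, which can also be called directly. A minimal boto3 sketch of that API route, with hypothetical names and an intentionally incomplete, illustrative definition (a real pipeline needs activities, data nodes, a schedule, and the default Data Pipeline IAM roles):

```
import boto3

dp = boto3.client("datapipeline")

# Create the pipeline shell; uniqueId makes the call idempotent.
created = dp.create_pipeline(name="my-pipeline", uniqueId="my-pipeline-token")
pipeline_id = created["pipelineId"]

# Attach a definition. Only the "Default" object is sketched here; activities,
# data nodes, and schedules would be added as further pipeline objects.
result = dp.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        {
            "id": "Default",
            "name": "Default",
            "fields": [
                {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
                {"key": "role", "stringValue": "DataPipelineDefaultRole"},
                {"key": "resourceRole",
                 "stringValue": "DataPipelineDefaultResourceRole"},
            ],
        }
    ],
)
print(result.get("validationErrors", []))

# Once the definition validates, the pipeline can be started with:
# dp.activate_pipeline(pipelineId=pipeline_id)
```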
0 answers · 0 votes · 159 views · asked 4 months ago
Hello, I need to purge all data from a DynamoDB table except the last year of data. I do not have a TTL attribute set on the table; what is the best approach? As far as I know, writing a TTL attribute to every record will cost a lot, even though expiring items via TTL is free. I do have an attribute called "created_on_date" in the table, though.
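Not an authoritative recommendation, but since `created_on_date` already exists, one option is a one-time scan-and-batch-delete of items older than a year (writing a TTL attribute only onto the old items would be the other common route). A minimal boto3 sketch, with a hypothetical table name, a placeholder key attribute `pk`, and the assumption that `created_on_date` is stored as an ISO-8601 string:

```
from datetime import datetime, timedelta

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name
cutoff = (datetime.utcnow() - timedelta(days=365)).isoformat()

scan_kwargs = {"FilterExpression": Attr("created_on_date").lt(cutoff)}
with table.batch_writer() as batch:
    while True:
        page = table.scan(**scan_kwargs)
        for item in page["Items"]:
            # Delete by the table's key; "pk" is a placeholder key name.
            batch.delete_item(Key={"pk": item["pk"]})
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

If most of the table is older than a year, copying only the recent items into a new table and dropping the old one can be cheaper than deleting row by row.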
1 answer · 0 votes · 132 views · asked 6 months ago
Hi all, we have multiple SQL Server instances installed on Windows in an on-premises environment, and the plan is to migrate them to Amazon RDS. We are planning to use these methods: 1) native backup/restore using S3, and 2) DMS. My question is about native backup/restore: 1) We do not have access to those SQL Server database servers, and when we take a backup it is always stored on the local machine where SQL Server is installed. 2) If we have to take backups of all these databases on a remote server (jump server) and then upload them to S3, how can we achieve this?
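For the upload half of point 2, once the `.bak` files have landed on the jump server (for example by pointing `BACKUP DATABASE ... TO DISK` at a UNC share that host exposes, which the SQL Server service account must be able to write to), pushing them to S3 can be scripted. A minimal boto3 sketch with placeholder paths and bucket name:

```
import pathlib

import boto3

# Placeholder backup folder and bucket for illustration.
BACKUP_DIR = pathlib.Path(r"D:\sql-backups")
BUCKET = "my-sqlserver-backups"

s3 = boto3.client("s3")
for bak in BACKUP_DIR.glob("*.bak"):
    # upload_file handles large .bak files with managed multipart uploads.
    s3.upload_file(str(bak), BUCKET, f"native-backups/{bak.name}")
    print(f"Uploaded {bak.name}")
```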
1 answer · 0 votes · 93 views · asked 6 months ago
I'm a beginner at data engineering tasks. I have a task to create a data lakehouse, and I'm trying to understand how to do it using these tools: DMS, S3, Glue, and Hudi. I have already created a simple data lake without much difficulty, but building a data lakehouse is very hard for me because I couldn't find any simple example. My environment is like this: the source database is PostgreSQL, and I need to update the data daily as incremental updates rather than a full copy. Does AWS have an example of how to build this?
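Not an official AWS example, but the incremental piece usually comes down to DMS writing full-load and CDC files to S3 and a Glue (Spark) job upserting them into a Hudi table instead of recopying everything. A minimal PySpark sketch of the Hudi upsert step, with hypothetical paths, table name, and key columns, assuming the Hudi libraries are available to the Glue job (handling deletes via DMS's `Op` column would still be a separate step):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical locations and key columns for illustration.
cdc_path = "s3://my-raw-bucket/dms-output/public/customers/"
hudi_path = "s3://my-lakehouse-bucket/hudi/customers/"

# DMS change files (full load + CDC) written as Parquet.
changes = spark.read.parquet(cdc_path)

hudi_options = {
    "hoodie.table.name": "customers",
    "hoodie.datasource.write.recordkey.field": "customer_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert the changes into the Hudi table instead of rewriting the full copy.
changes.write.format("hudi").options(**hudi_options).mode("append").save(hudi_path)
```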
1 answer · 0 votes · 210 views · asked 8 months ago