What is the most cost-efficient and fastest way to start AWS Glue ETL development?


We are about to start developing AWS Glue ETL jobs. The options available are:

  1. From the AWS Console: this seems costly, slow, and not very efficient for developing scripts
  2. From Dev Endpoints: the billing rate is high
  3. Using the AWS Glue Docker image: lacks functionality
  4. Interactive Sessions
  5. Local setup

Which of these is simple to set up and does not incur cost?

Asked 2 years ago, 1068 views
3 Answers
Accepted Answer

Hello,

I would say the most cost-efficient, simple, and fast ways to start Glue ETL development are:

  1. Use the Glue Docker image
  2. Use Interactive Sessions

References:

[1] https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/

[2] https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html

AWS
Answered 2 years ago
AWS
Expert
Reviewed 2 years ago

Yes, I agree. I am more inclined to use the Docker image, but I have faced quite a lot of challenges setting it up, and still do. For starters, the Jupyter notebook does not have nbextensions preinstalled, and I could not get it installed either (I asked a separate question about that). On top of that, Spark is very slow.

I haven't tried Interactive Sessions yet; I will start with them soon.

Answered 2 years ago
  • I started using VS Code, which was much simpler to set up and work in: start the Docker image for Glue, open VS Code, and attach to the running container. It will automatically install Python and other libraries. Good to go.
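
For anyone following the same route, here is a rough sketch of the "start the Docker image" step, written as a small Python helper that assembles the `docker run` command. The image tag and the container paths are what I recall from the Glue 3.0 blog post cited in the accepted answer [1], so please verify them there. The helper name `build_glue_container_command`, the container name `glue_dev`, and the workspace folder are made up for illustration.

```python
import shlex
from pathlib import Path

# Assumption: image tag as given in the Glue 3.0 blog post [1];
# change it to match the Glue version you are targeting.
GLUE_IMAGE = "amazon/aws-glue-libs:glue_libs_3.0.0_image_01"


def build_glue_container_command(workspace, aws_profile="default", name="glue_dev"):
    """Return the argv list for starting a local Glue container.

    `workspace` is a local folder with your scripts; it is mounted into the
    container so that VS Code can see it after attaching.
    """
    workspace_path = Path(workspace).expanduser().resolve()
    aws_dir = Path.home() / ".aws"  # local AWS credentials and config
    return [
        "docker", "run", "-it", "--rm",
        "-v", f"{aws_dir}:/home/glue_user/.aws",
        "-v", f"{workspace_path}:/home/glue_user/workspace/",
        "-e", f"AWS_PROFILE={aws_profile}",
        "-e", "DISABLE_SSL=true",
        "--name", name,
        GLUE_IMAGE,
        "pyspark",  # start a PySpark shell so the container keeps running
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it in a terminal.
    print(shlex.join(build_glue_container_command("./glue-workspace")))
```

Once the container is up, attaching from VS Code (for example with the Dev Containers extension) should show the mounted workspace folder inside the container.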


Hi,

It always depends on your actual priorities, which define the trade-off between cost efficiency and developer experience.

I agree that the most cost-efficient option is the Docker container. If you prefer a traditional IDE environment over a notebook, you should be able to use the container for that as well. Performance will depend on the local machine.

If you give more weight to a flexible notebook-based developer experience and to performance when running jobs or cells, then Interactive Sessions would be a better choice. You can set up your own notebook with all the extensions you want and tune the number of DPUs, changing it from one session to the next.

If the developer is fine with a managed notebook, Glue Studio Notebooks cost the same as Interactive Sessions (the cost depends only on the number of DPUs you select for the session and on the session's duration) and use the same session-based configuration. Glue Studio Notebooks are currently available only in some regions.
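
To make the "DPUs times duration" point concrete, here is a rough back-of-the-envelope sketch. The per-DPU-hour rate below is only an assumed, illustrative figure (check the AWS Glue pricing page for your region), the names `SessionPlan` and `estimate_session_cost` are made up for this example, and any minimum billing duration is ignored.

```python
from dataclasses import dataclass

# Assumption: illustrative rate only, not an official price.
ASSUMED_RATE_PER_DPU_HOUR = 0.44


@dataclass
class SessionPlan:
    dpus: int        # number of DPUs selected for the session
    minutes: float   # how long the session stays alive


def estimate_session_cost(plan, rate_per_dpu_hour=ASSUMED_RATE_PER_DPU_HOUR):
    """Estimate cost as DPUs x hours x rate (minimum billing ignored)."""
    if plan.dpus <= 0 or plan.minutes < 0:
        raise ValueError("dpus must be positive and minutes non-negative")
    return plan.dpus * (plan.minutes / 60.0) * rate_per_dpu_hour


if __name__ == "__main__":
    # Compare a small and a larger session of the same length.
    for plan in (SessionPlan(dpus=2, minutes=60), SessionPlan(dpus=5, minutes=60)):
        print(f"{plan.dpus} DPUs for {plan.minutes:.0f} min: "
              f"~${estimate_session_cost(plan):.2f}")
```

Since the session is billed for as long as it is alive, stopping it when you finish (or setting a short idle timeout) matters as much as the DPU count.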

Hope this helps.

AWS
Expert
Answered 2 years ago
  • Is there a way to increase the DPUs or change the Spark config in the Docker container?
