How do I use other AWS services to submit EMR Serverless jobs?


I want to use other AWS services to submit Amazon EMR Serverless jobs.

Resolution

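**Note:** If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent version of the AWS CLI.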

To submit an EMR Serverless job with other AWS services, use one of the following methods:

AWS Step Functions

To submit an EMR Serverless job with Step Functions, create a state machine that submits the job as a step. The following example state machine definition submits an EMR Serverless job and waits for the job run to finish. The state machine then transitions to the Success state if the job run succeeds, or to the Failure state if the job run fails.

{
  "Comment": "Submit an EMR Serverless job and wait for it to finish",
  "StartAt": "Submit EMR Serverless Job",
  "States": {
    "Submit EMR Serverless Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
      "Parameters": {
        "ApplicationId": "example-application-id",
        "ExecutionRoleArn": "example-execution-role-arn",
        "JobDriver": {
          "SparkSubmit": {
            "EntryPoint": "example-entry-point",
            "SparkSubmitParameters": "--class example-main-class --jars example-jar-paths"
          }
        },
        "ConfigurationOverrides": {
          "MonitoringConfiguration": {
            "ManagedPersistenceMonitoringConfiguration": {
              "Enabled": true
            }
          }
        }
      },
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "Failure"
        }
      ],
      "Next": "Success"
    },
    "Success": {
      "Type": "Succeed"
    },
    "Failure": {
      "Type": "Fail"
    }
  }
}

**Note:** Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your values.
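A transition that names a state that doesn't exist is easy to introduce when you edit a definition by hand. The following is a minimal sketch, not part of the original guidance, of a local check that every StartAt, Next, Default, Choices, and Catch target refers to a defined state. The helper name find_undefined_targets and the trimmed sample definition are illustrative assumptions. The check covers only top-level states and doesn't replace the validation that Step Functions performs.

```python
import json


def find_undefined_targets(definition: dict) -> list:
    """Return transition targets that are not defined in "States".

    Hypothetical helper for illustration. It inspects only top-level
    states, not states nested inside Map or Parallel states.
    """
    states = definition.get("States", {})
    targets = [definition.get("StartAt")]
    for state in states.values():
        targets.append(state.get("Next"))
        targets.append(state.get("Default"))
        # Choice rules and Catch handlers each carry their own "Next".
        for rule in state.get("Choices", []) + state.get("Catch", []):
            targets.append(rule.get("Next"))
    return sorted({t for t in targets if t is not None and t not in states})


# Sample definition trimmed to the fields that control transitions.
SAMPLE = json.loads("""
{
  "StartAt": "Submit EMR Serverless Job",
  "States": {
    "Submit EMR Serverless Job": {
      "Type": "Task",
      "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Failure"}],
      "Next": "Success"
    },
    "Success": {"Type": "Succeed"},
    "Failure": {"Type": "Fail"}
  }
}
""")

if __name__ == "__main__":
    # An empty list means that every transition target is defined.
    print(find_undefined_targets(SAMPLE))
```

To check your own definition, load it with json.load and pass the resulting dictionary to the function.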

AWS SDK

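To submit jobs programmatically, use an AWS SDK to interact with the EMR Serverless API. The following example uses the AWS SDK for Python (Boto3) to submit an EMR Serverless job:

**Note:** The following Python script uses the Boto3 library to call the start_job_run method of the EMR Serverless API. Then, it prints the job run ID that the API returns.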

import boto3

# Create an EMR Serverless client that uses your default credentials and Region.
emr_serverless = boto3.client('emr-serverless')

# Start the job run. Boto3 parameter names for this API are lowerCamelCase.
response = emr_serverless.start_job_run(
    applicationId='example-application-id',
    executionRoleArn='example-execution-role-arn',
    jobDriver={
        'sparkSubmit': {
            'entryPoint': 'example-entry-point',
            'sparkSubmitParameters': '--class example-main-class --jars example-jar-paths'
        }
    },
    configurationOverrides={
        'monitoringConfiguration': {
            'managedPersistenceMonitoringConfiguration': {
                'enabled': True
            }
        }
    }
)

# The response contains the ID of the new job run.
job_run_id = response['jobRunId']
print(f'Submitted EMR Serverless job with ID: {job_run_id}')

**Note:** Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your values.
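The start_job_run call returns as soon as the job run is accepted, so the script doesn't show whether the job succeeds. The following is a minimal sketch, assuming that you want to block until the job run finishes, that polls the get_job_run method. The function name wait_for_job_run, the polling interval, and the poll limit are illustrative assumptions, not part of the original script.

```python
import time

# States after which a job run no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job_run(client, application_id, job_run_id,
                     poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_job_run until the job run reaches a terminal state.

    `client` is a Boto3 "emr-serverless" client, or any object with a
    compatible get_job_run method. Returns the final state as a string.
    The defaults (30 seconds, 120 polls) are arbitrary example values.
    """
    for _ in range(max_polls):
        response = client.get_job_run(
            applicationId=application_id, jobRunId=job_run_id
        )
        state = response["jobRun"]["state"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(
        f"Job run {job_run_id} did not finish after {max_polls} polls"
    )


# Example usage with a real client (requires AWS credentials):
#   import boto3
#   emr_serverless = boto3.client("emr-serverless")
#   state = wait_for_job_run(emr_serverless, "example-application-id", job_run_id)
```

The sleep parameter is injectable so that you can exercise the loop in tests without waiting.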

AWS CLI

To submit an EMR Serverless job with the AWS CLI, run the following start-job-run command:

aws emr-serverless start-job-run \
    --application-id example-application-id \
    --execution-role-arn example-execution-role-arn \
    --job-driver '{"sparkSubmit": {"entryPoint": "example-entry-point", "sparkSubmitParameters": "--class example-main-class --jars example-jar-paths"}}' \
    --configuration-overrides '{"monitoringConfiguration": {"managedPersistenceMonitoringConfiguration": {"enabled": true}}}'

**Note:** Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your values.
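Inline JSON arguments are a common source of shell quoting mistakes. As a hedged sketch, the following Python helper builds the same start-job-run command as an argument list and lets json.dumps produce the --job-driver value. The function name build_start_job_run_command is an illustrative assumption, and the JSON field names follow the StartJobRun API (sparkSubmit, entryPoint, sparkSubmitParameters).

```python
import json
import shlex


def build_start_job_run_command(application_id, execution_role_arn,
                                entry_point, spark_submit_parameters):
    """Return the argument list for `aws emr-serverless start-job-run`."""
    job_driver = {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "sparkSubmitParameters": spark_submit_parameters,
        }
    }
    return [
        "aws", "emr-serverless", "start-job-run",
        "--application-id", application_id,
        "--execution-role-arn", execution_role_arn,
        # json.dumps emits valid JSON even if the values contain quotes.
        "--job-driver", json.dumps(job_driver),
    ]


command = build_start_job_run_command(
    "example-application-id",
    "example-execution-role-arn",
    "example-entry-point",
    "--class example-main-class --jars example-jar-paths",
)

if __name__ == "__main__":
    # shlex.join quotes each argument for display in a POSIX shell.
    print(shlex.join(command))
    # To run the command: subprocess.run(command, check=True)
```

Because the command is passed as a list, no shell is involved when you run it with subprocess.run, so the JSON needs no extra escaping.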

AWS CloudFormation

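You can use AWS CloudFormation to define and provision EMR Serverless resources. The following example CloudFormation template creates an EMR Serverless application.

**Note:** The following CloudFormation template creates an EMR Serverless application with the specified release label and initial capacity.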

Resources:
  EMRServerlessApplication:
    Type: AWS::EMRServerless::Application
    Properties:
      # EMR Serverless requires Amazon EMR release 6.6.0 or later.
      ReleaseLabel: emr-6.15.0
      Type: Spark
      # Pre-initialized workers, keyed by worker type (Driver and Executor for Spark).
      # The Driver entry is an example value added for completeness.
      InitialCapacity:
        - Key: Driver
          Value:
            WorkerCount: 1
            WorkerConfiguration:
              Cpu: 2 vCPU
              Memory: 8 GB
        - Key: Executor
          Value:
            WorkerCount: 2
            WorkerConfiguration:
              Cpu: 2 vCPU
              Memory: 8 GB

**Note:** CloudFormation doesn't provide a resource type for EMR Serverless job runs, so the template provisions only the application. To submit a job to the application, use Step Functions, an AWS SDK, or the AWS CLI with the application's ID.
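To submit jobs to an application that a stack creates, you need the application's ID. The following is a minimal sketch, assuming a Boto3 CloudFormation client, that creates the stack, waits for it to complete, and reads the ID from the stack resource. The function name create_application_stack is hypothetical, and the sketch assumes that the physical resource ID of an AWS::EMRServerless::Application resource is its application ID.

```python
def create_application_stack(cfn, stack_name, template_body,
                             logical_id="EMRServerlessApplication"):
    """Create the stack, wait for completion, and return the application ID.

    `cfn` is a Boto3 "cloudformation" client, or any object with compatible
    create_stack, get_waiter, and describe_stack_resource methods.
    """
    cfn.create_stack(StackName=stack_name, TemplateBody=template_body)
    # Block until CloudFormation reports that stack creation completed.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    detail = cfn.describe_stack_resource(
        StackName=stack_name, LogicalResourceId=logical_id
    )["StackResourceDetail"]
    # Assumption: the physical resource ID is the application ID.
    return detail["PhysicalResourceId"]


# Example usage with a real client (requires AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   with open("template.yaml") as f:
#       app_id = create_application_stack(cfn, "example-stack-name", f.read())
```

The file name template.yaml and the stack name are placeholders. Pass the returned ID as the application ID when you submit a job run.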

Related information

Running jobs

Run an EMR Serverless job

Orchestrate EMR Serverless jobs with AWS Step Functions

EMR Serverless samples on the GitHub website

AWS OFFICIAL, updated 3 months ago