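How do I use other AWS services to submit EMR Serverless jobs?
I want to use other AWS services to submit Amazon EMR Serverless jobs.
Resolution
**Note:** If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent version of the AWS CLI.
To use other AWS services to submit EMR Serverless jobs, use one of the following methods: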
AWS Step Functions
To use Step Functions to submit an EMR Serverless job, create a Step Functions state machine that submits the job as a step. The following example state machine definition uses the optimized EMR Serverless integration (startJobRun.sync), so the task waits for the job run to finish. The state machine then transitions to the Success state when the job run succeeds, or to the Failure state when the job run fails, is cancelled, or can't be submitted.
{
  "Comment": "Submit an EMR Serverless job",
  "StartAt": "Submit EMR Serverless Job",
  "States": {
    "Submit EMR Serverless Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
      "Parameters": {
        "ApplicationId": "example-application-id",
        "ExecutionRoleArn": "example-execution-role-arn",
        "JobDriver": {
          "SparkSubmit": {
            "EntryPoint": "example-entry-point",
            "SparkSubmitParameters": "--class example-main-class --jars example-jar-paths"
          }
        },
        "ConfigurationOverrides": {
          "MonitoringConfiguration": {
            "ManagedPersistenceMonitoringConfiguration": {
              "Enabled": true
            }
          }
        }
      },
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "Failure"
        }
      ],
      "Next": "Success"
    },
    "Success": {
      "Type": "Succeed"
    },
    "Failure": {
      "Type": "Fail"
    }
  }
}
**Note:** Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your values.
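If the same single-task workflow is deployed for several applications, it can be easier to generate the definition than to edit JSON by hand. The following is a minimal sketch under stated assumptions: build_definition is a hypothetical helper (not part of any AWS SDK), and the definition uses the optimized startJobRun.sync integration so that the task waits for the job run to finish.

```python
import json


def build_definition(application_id, execution_role_arn, entry_point, spark_params):
    """Return a state machine definition (as a dict) that runs one EMR Serverless job.

    Hypothetical helper for illustration only.
    """
    return {
        "Comment": "Submit an EMR Serverless job",
        "StartAt": "Submit EMR Serverless Job",
        "States": {
            "Submit EMR Serverless Job": {
                "Type": "Task",
                # Optimized integration: the task waits until the job run finishes.
                "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
                "Parameters": {
                    "ApplicationId": application_id,
                    "ExecutionRoleArn": execution_role_arn,
                    "JobDriver": {
                        "SparkSubmit": {
                            "EntryPoint": entry_point,
                            "SparkSubmitParameters": spark_params,
                        }
                    },
                },
                # Any task error, including a failed job run, routes to Failure.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Failure"}],
                "Next": "Success",
            },
            "Success": {"Type": "Succeed"},
            "Failure": {"Type": "Fail"},
        },
    }


if __name__ == "__main__":
    definition = build_definition(
        "example-application-id",
        "example-execution-role-arn",
        "example-entry-point",
        "--class example-main-class --jars example-jar-paths",
    )
    # Print the JSON document that Step Functions expects as the definition.
    print(json.dumps(definition, indent=2))
```

The resulting JSON string can be supplied as the definition when you create the state machine, for example through the Boto3 Step Functions client's create_state_machine call.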
AWS SDK
To submit jobs programmatically, use an AWS SDK to interact with the EMR Serverless API. The following example uses the AWS SDK for Python (Boto3) to submit an EMR Serverless job:
**Note:** The following Python script uses the Boto3 library to call the start_job_run method of the EMR Serverless API. The script then prints the job run ID that the API returns.
import boto3

# Create an EMR Serverless client. Credentials and Region come from your default configuration.
emr_serverless = boto3.client('emr-serverless')

# Submit the Spark job. Boto3 uses camelCase parameter names for this API.
response = emr_serverless.start_job_run(
    applicationId='example-application-id',
    executionRoleArn='example-execution-role-arn',
    jobDriver={
        'sparkSubmit': {
            'entryPoint': 'example-entry-point',
            'sparkSubmitParameters': '--class example-main-class --jars example-jar-paths'
        }
    },
    configurationOverrides={
        'monitoringConfiguration': {
            'managedPersistenceMonitoringConfiguration': {'enabled': True}
        }
    }
)

# The response contains the ID of the new job run.
job_run_id = response['jobRunId']
print(f'Submitted EMR Serverless job with ID: {job_run_id}')
**Note:** Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your values.
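The start_job_run call returns as soon as the job is accepted, so a script that needs the final result has to check the job run afterward. The following is a minimal polling sketch, assuming a client created with boto3.client('emr-serverless'). The wait_for_job_run helper, the polling interval, and the polling limit are illustrative choices, not part of the SDK.

```python
import time

# Terminal states of an EMR Serverless job run.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_job_run(client, application_id, job_run_id,
                     poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_job_run until the job run reaches a terminal state.

    Returns the terminal state, or raises TimeoutError after max_polls checks.
    Hypothetical helper for illustration only.
    """
    for _ in range(max_polls):
        response = client.get_job_run(
            applicationId=application_id,
            jobRunId=job_run_id,
        )
        state = response["jobRun"]["state"]
        if state in TERMINAL_STATES:
            return state
        # Not finished yet: wait before the next status check.
        sleep(poll_seconds)
    raise TimeoutError(
        f"Job run {job_run_id} did not finish after {max_polls} status checks"
    )
```

For example, `wait_for_job_run(emr_serverless, 'example-application-id', job_run_id)` returns 'SUCCESS' when the job run completes successfully. The sleep parameter is injectable only so that the loop can be exercised without real delays.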
AWS CLI
To use the AWS CLI to submit an EMR Serverless job, run the following start-job-run command:
aws emr-serverless start-job-run \
    --application-id example-application-id \
    --execution-role-arn example-execution-role-arn \
    --job-driver '{"sparkSubmit": {"entryPoint": "example-entry-point", "sparkSubmitParameters": "--class example-main-class --jars example-jar-paths"}}' \
    --configuration-overrides '{"monitoringConfiguration": {"managedPersistenceMonitoringConfiguration": {"enabled": true}}}'
**Note:** Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your values.
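The JSON value for --job-driver is easy to misquote when it's typed in a shell. One way to avoid that, sketched here under stated assumptions, is to assemble the argument list in Python and let json.dumps produce the JSON. The build_start_job_run_command helper is hypothetical and only builds the command; it doesn't run it.

```python
import json
import shlex


def build_start_job_run_command(application_id, execution_role_arn,
                                entry_point, spark_submit_parameters):
    """Return the argument list for `aws emr-serverless start-job-run`.

    Hypothetical helper for illustration only.
    """
    job_driver = {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "sparkSubmitParameters": spark_submit_parameters,
        }
    }
    return [
        "aws", "emr-serverless", "start-job-run",
        "--application-id", application_id,
        "--execution-role-arn", execution_role_arn,
        # json.dumps handles the quoting of the nested JSON document.
        "--job-driver", json.dumps(job_driver),
    ]


if __name__ == "__main__":
    command = build_start_job_run_command(
        "example-application-id",
        "example-execution-role-arn",
        "example-entry-point",
        "--class example-main-class --jars example-jar-paths",
    )
    # Show the shell-quoted form of the command for review.
    print(shlex.join(command))
```

Because the result is a list, it can be passed to subprocess.run(command, check=True) without going through a shell, which sidesteps shell quoting entirely.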
AWS CloudFormation
Use AWS CloudFormation to define and provision EMR Serverless resources. The following example CloudFormation template creates an EMR Serverless application.
**Note:** The following CloudFormation template creates an EMR Serverless application with the specified release label and initial capacity. CloudFormation provisions only the application (AWS::EMRServerless::Application). To run a job on the application, use Step Functions, an AWS SDK, or the AWS CLI with the application ID that the stack returns.
Resources:
  EMRServerlessApplication:
    Type: AWS::EMRServerless::Application
    Properties:
      ReleaseLabel: emr-6.15.0
      Type: Spark
      InitialCapacity:
        - Key: Executor
          Value:
            WorkerCount: 2
            WorkerConfiguration:
              Cpu: 2 vCPU
              Memory: 8 GB
Outputs:
  ApplicationId:
    Value: !Ref EMRServerlessApplication
**Note:** Replace emr-6.15.0 with the release label that you want to use. EMR Serverless supports Amazon EMR releases 6.6.0 and later.
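After the stack creates the application, a script can look up the application ID and submit a job to it. The following is a minimal sketch, assuming that the stack exposes the application ID as an output named ApplicationId (an assumption about your template) and that the clients come from boto3.client('cloudformation') and boto3.client('emr-serverless'). Both helper functions are hypothetical.

```python
def get_stack_output(describe_stacks_response, output_key):
    """Return one output value from a CloudFormation describe_stacks response."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        raise LookupError("The response contains no stacks")
    for output in stacks[0].get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(output_key)


def submit_job_to_stack_application(cfn_client, emr_client, stack_name,
                                    execution_role_arn, entry_point):
    """Submit a Spark job to the application that the given stack created.

    Assumes a stack output named "ApplicationId"; returns the job run ID.
    """
    response = cfn_client.describe_stacks(StackName=stack_name)
    application_id = get_stack_output(response, "ApplicationId")
    run = emr_client.start_job_run(
        applicationId=application_id,
        executionRoleArn=execution_role_arn,
        jobDriver={"sparkSubmit": {"entryPoint": entry_point}},
    )
    return run["jobRunId"]
```

Passing the clients in as arguments keeps the lookup logic separate from credentials and Region configuration.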
Related information
Orchestrate EMR Serverless jobs with AWS Step Functions
EMR Serverless samples on the GitHub website