AWS Batch Job Submission Example: A Step-by-Step Guide
AWS Batch is a managed service that efficiently runs batch computing workloads on the AWS cloud. It simplifies job submission, resource scaling, and cost management, making it an essential tool for high-performance computing, data processing, and other batch-oriented tasks. This article provides a clear example of how to submit an AWS Batch job, covering everything from setup to execution.
What Is AWS Batch?
AWS Batch enables developers to:
- Define job queues and compute environments.
- Automatically scale resources based on job requirements.
- Integrate seamlessly with other AWS services like S3 and CloudWatch.
Common use cases include large-scale simulations, data transformation, and report generation.
AWS Batch Job Submission Example
Objective
We’ll create and submit an AWS Batch job to process a dataset stored in an S3 bucket using a Dockerized Python script.
Step 1: Prerequisites
- AWS Account: Ensure you have access to AWS Batch and related services (EC2, S3, IAM).
- Docker Installed: For creating the container image.
- Python Script: Prepare a Python script (e.g., process_data.py) to process the dataset.
- S3 Bucket: Upload your dataset to an S3 bucket (e.g., s3://example-batch-data/); a small upload sketch follows this list.
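If the dataset is not in S3 yet, a minimal boto3 upload sketch (assuming the bucket example-batch-data already exists and your AWS credentials are configured locally) looks like this:

import boto3

# Upload the local dataset to the bucket used throughout this guide.
# Assumes example-batch-data already exists and credentials are configured.
s3 = boto3.client("s3")
s3.upload_file("input.csv", "example-batch-data", "input.csv")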
Step 2: Create a Docker Image
Build and Push the Image (run these commands from the directory containing the Dockerfile and script shown below, after authenticating Docker to your ECR registry):
docker build -t process-data-job .
docker tag process-data-job:latest <your_ecr_repository_url>:latest
docker push <your_ecr_repository_url>:latest
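If you do not have an ECR repository yet, a rough boto3 sketch for creating one and retrieving its URI might look like the following; the repository name process-data-job is an assumption, and the printed URI is what <your_ecr_repository_url> stands for in the commands above:

import boto3

# Create the ECR repository, or look it up if it already exists, and print its URI.
ecr = boto3.client("ecr")
try:
    repo = ecr.create_repository(repositoryName="process-data-job")["repository"]
except ecr.exceptions.RepositoryAlreadyExistsException:
    repo = ecr.describe_repositories(repositoryNames=["process-data-job"])["repositories"][0]
print(repo["repositoryUri"])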
Dockerfile:
FROM python:3.9-slim
# Install dependencies
RUN pip install boto3
# Copy the script
COPY process_data.py /app/process_data.py
# Set the working directory
WORKDIR /app
# Define the command
ENTRYPOINT ["python", "process_data.py"]
Python Script (process_data.py):
import sys
import boto3  # installed in the image; available if the script needs to talk to S3

def main(input_path, output_path):
    print(f"Processing data from {input_path}...")
    # Simulate data processing
    print("Data processing complete!")
    print(f"Results saved to {output_path}")

if __name__ == "__main__":
    input_path = sys.argv[1]
    output_path = sys.argv[2]
    main(input_path, output_path)
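The script above only simulates processing. If you want it to actually read from and write to S3, one possible extension (not part of the original example; the bucket/key parsing is deliberately simplified and assumes paths of the form s3://bucket/key) is:

import boto3

def copy_via_s3(input_path, output_path):
    # Split "s3://bucket/key" into bucket and key (simplified assumption).
    s3 = boto3.client("s3")
    in_bucket, in_key = input_path.replace("s3://", "").split("/", 1)
    out_bucket, out_key = output_path.replace("s3://", "").split("/", 1)
    s3.download_file(in_bucket, in_key, "/tmp/input.csv")
    # ...real processing would go here...
    s3.upload_file("/tmp/input.csv", out_bucket, out_key)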
Step 3: Configure AWS Batch
1. Create a Compute Environment
- Navigate to AWS Batch → Compute Environments.
- Click Create.
- Configure the environment:
- Type: Select Managed so AWS Batch provisions and scales the instances for you.
- Instance Types: optimal (lets AWS Batch choose suitable instance types).
- Maximum vCPUs: Define based on your workload.
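The same environment can be created programmatically. A rough boto3 sketch is shown below; the name example-compute-env, the subnet and security-group IDs, and the role names/ARNs are placeholders you must replace:

import boto3

batch = boto3.client("batch")

# Managed EC2 compute environment; all IDs and role references are placeholders.
batch.create_compute_environment(
    computeEnvironmentName="example-compute-env",
    type="MANAGED",
    computeResources={
        "type": "EC2",                      # use "SPOT" for Spot-based cost savings
        "instanceTypes": ["optimal"],
        "minvCpus": 0,
        "maxvCpus": 16,
        "subnets": ["subnet-xxxxxxxx"],
        "securityGroupIds": ["sg-xxxxxxxx"],
        "instanceRole": "ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)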
2. Create a Job Queue
- Navigate to Job Queues and click Create.
- Configure the queue:
- Name: example-job-queue.
- Compute Environment: Link the environment you created earlier.
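Programmatically, the equivalent call might look like this, assuming the compute environment above is named example-compute-env:

import boto3

batch = boto3.client("batch")

# Job queue that routes work to the compute environment created earlier.
batch.create_job_queue(
    jobQueueName="example-job-queue",
    state="ENABLED",
    priority=1,
    computeEnvironmentOrder=[{"order": 1, "computeEnvironment": "example-compute-env"}],
)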
3. Create a Job Definition
- Navigate to Job Definitions → Create.
- Configure the job:
- Name: process-data-job.
- Container Image: Use the ECR image URL (e.g., <your_ecr_repository_url>:latest).
- vCPUs and Memory: Allocate resources (e.g., 2 vCPUs, 4 GB memory).
- Command Override: Set the script arguments (e.g., ["s3://example-batch-data/input.csv", "s3://example-batch-data/output.csv"]).
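A boto3 sketch of the same job definition follows; the image URI is the placeholder from Step 2, and the jobRoleArn is an assumed role name that grants the container access to the S3 bucket:

import boto3

batch = boto3.client("batch")

# Container job definition matching the console settings above; replace the placeholders.
batch.register_job_definition(
    jobDefinitionName="process-data-job",
    type="container",
    containerProperties={
        "image": "<your_ecr_repository_url>:latest",
        "vcpus": 2,
        "memory": 4096,  # MiB
        "command": ["s3://example-batch-data/input.csv", "s3://example-batch-data/output.csv"],
        "jobRoleArn": "arn:aws:iam::123456789012:role/BatchJobS3Role",  # assumed role name
    },
)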
Step 4: Submit a Batch Job
- Navigate to Jobs → Submit Job.
- Provide details:
- Job Name: example-batch-job.
- Job Queue: Select example-job-queue.
- Job Definition: Select process-data-job.
- Click Submit Job.
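The same submission can be done from code; a minimal boto3 sketch is:

import boto3

batch = boto3.client("batch")

# Submit the job; containerOverrides can change the input/output arguments per run.
response = batch.submit_job(
    jobName="example-batch-job",
    jobQueue="example-job-queue",
    jobDefinition="process-data-job",
    containerOverrides={
        "command": ["s3://example-batch-data/input.csv", "s3://example-batch-data/output.csv"],
    },
)
print(response["jobId"])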
Step 5: Monitor Job Execution
- AWS Batch Console:
- Check the job status (e.g., RUNNING, SUCCEEDED).
- CloudWatch Logs:
- View logs to ensure the job processed data correctly.
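Status checks can also be scripted. A small sketch using the job ID returned by submit_job (the ID value below is a placeholder):

import boto3

batch = boto3.client("batch")

# Look up the job's status and its CloudWatch Logs stream name.
job_id = "<job-id-from-submit_job>"
job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
print(job["status"])                          # e.g., RUNNING or SUCCEEDED
print(job["container"].get("logStreamName"))  # stream in the /aws/batch/job log group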
Best Practices for AWS Batch
- Optimize Resources: Use spot instances in your compute environment for cost savings.
- Container Reusability: Build generic containers that can handle different datasets with input arguments.
- Monitor and Debug: Use CloudWatch Logs to debug errors or optimize job performance.
- Scaling Policies: Set sensible minimum and maximum vCPU limits on your compute environment so it scales up for bursts and back down (ideally to zero) when idle.
Conclusion
AWS Batch simplifies batch processing by automating job orchestration and resource management. In this example, we walked through setting up a compute environment, creating a job definition, and submitting a job to process data using a Dockerized Python script. AWS Batch is a powerful tool for scaling batch workloads efficiently and cost-effectively.