How to Configure the GlueJobOperator in Apache Airflow

Aline Rodrigues
Art of Data Engineering
3 min read · Feb 9, 2025

Data engineering often requires setting up workflows that seamlessly connect multiple tools. One common challenge is integrating Apache Airflow with AWS Glue to create, manage, and schedule ETL (Extract, Transform, Load) jobs. In this guide, I’ll walk you through the steps to configure the GlueJobOperator in Airflow and share practical tips to avoid common pitfalls.
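
To preview where we're headed: in Airflow, the Glue job you'll build in the next steps is triggered with the GlueJobOperator from the apache-airflow-providers-amazon package. Here's a minimal sketch of such a DAG; the job name, script path, IAM role, and region are placeholders, not values prescribed by this guide:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="glue_etl_example",
    start_date=datetime(2025, 1, 1),
    schedule=None,  # trigger manually while testing
    catchup=False,
) as dag:
    run_glue_job = GlueJobOperator(
        task_id="run_glue_job",
        job_name="my-glue-job",  # placeholder: the Glue job from Step 1
        script_location="s3://enabledata/scripts/my-glue-job.py",  # placeholder path
        iam_role_name="my-glue-role",  # placeholder: the role from Step 2
        region_name="us-east-1",  # placeholder region
        wait_for_completion=True,  # the task succeeds only when the Glue job does
    )
```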

If you prefer the video version, click here!

Step 1: Create Your Glue Job

AWS Glue provides two ways to define jobs:

  1. Visual ETL: A drag-and-drop interface ideal for simpler use cases.
  2. Custom Python Scripts: Offers flexibility for complex transformations.

For simplicity, let’s assume you’re using Visual ETL to extract data from S3, transform it, and save it back to S3. Save your job script in a clearly defined S3 path for later reference, such as s3://enabledata/scripts/.
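
Whether Visual ETL generates it or you write it by hand, the script follows Glue's standard PySpark boilerplate: resolve the job arguments, initialize a GlueContext, do the ETL work, and commit the job. Here's a minimal hand-written sketch of an S3-to-S3 job; the raw and processed paths are placeholders:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Glue passes the job name (and any custom arguments) on the command line.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read raw CSV files from S3 (placeholder path).
df = spark.read.csv("s3://enabledata/raw/", header=True)

# Transform: a deliberately trivial example -- drop fully empty rows.
df = df.dropna(how="all")

# Load: write the result back to S3 as Parquet (placeholder path).
df.write.mode("overwrite").parquet("s3://enabledata/processed/")

job.commit()
```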

Step 2: Configure AWS Permissions

Setting up permissions can be tricky. Here’s a quick breakdown:

For Glue

  • Create an IAM role that the Glue service can assume, with the permissions your jobs need (including access to your S3 buckets).
  • As a baseline, attach the AWS-managed policy AWSGlueServiceRole (a sketch of creating the role programmatically follows below).
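
If you'd rather script this than click through the IAM console, here's a boto3 sketch; the role name is a placeholder, and note that AWSGlueServiceRole by itself only grants S3 access to buckets named aws-glue-*, so your own buckets need an additional policy:

```python
import json

import boto3

iam = boto3.client("iam")

# Trust policy that lets the Glue service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Placeholder role name -- match it to iam_role_name in your DAG.
iam.create_role(
    RoleName="my-glue-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the AWS-managed baseline policy for Glue.
iam.attach_role_policy(
    RoleName="my-glue-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
)
```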



Written by Aline Rodrigues

Co-founder @ enabledata. Check out our 2-week FREE Data Journey for companies at enabledata.io
