Step-by-Step Guide: Running Apache Airflow in Docker

Apache Airflow is an open-source platform used to programmatically author, schedule, and monitor workflows. Docker simplifies the setup process by providing an isolated and reproducible environment. This guide walks you through setting up Apache Airflow with Docker on your system. For a guide on running a standalone Airflow instance, please follow this article.

Prerequisites

Before diving into the setup, ensure your system meets the following prerequisites:

  1. Operating System:

    • Windows, macOS, or Linux (Windows users may need Docker Desktop with the WSL 2 backend enabled).
  2. Installed Software:

    • Docker: Docker Desktop (for Windows/macOS) or Docker Engine (for Linux); a quick version check follows this list.

    • Docker Compose: Typically included with Docker Desktop, but verify it’s available by running:

        docker-compose --version
      

  3. System Resources:

    • At least 4GB of RAM allocated to Docker.

    • Adequate disk space (Airflow containers and logs may require several GB).

  4. Basic Knowledge:

    • Familiarity with command-line tools.

    • Understanding of environment variables and Docker basics.
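
To confirm Docker itself is installed and the daemon is running, you can use these two standard Docker CLI commands:

# Print the installed Docker version
docker --version

# Verify the Docker daemon is running (errors out if it isn't)
docker info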

Step 1: Pull the Docker Compose File

To configure Airflow with Docker, download the docker-compose.yaml file from the official Apache Airflow repository.

Windows PowerShell

Run the following command to download the file:

# Create a new directory and navigate into it
mkdir airflow_docker_setup
cd airflow_docker_setup

# Download the docker-compose.yaml file
Invoke-WebRequest -Uri "https://airflow.apache.org/docs/apache-airflow/2.10.4/docker-compose.yaml" -OutFile "docker-compose.yaml"

Linux/macOS

Use curl to download the file:

# Create a new directory and navigate into it
mkdir airflow_docker_setup
cd airflow_docker_setup

# Download the docker-compose.yaml file
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.10.4/docker-compose.yaml'
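
The compose file mounts a few local folders into the containers. Here is a quick sketch for creating them, assuming the folder set (dags, logs, plugins, config) used by the official 2.10.4 quick-start; shown for Linux/macOS, Windows users can create the same folders with mkdir:

# Create the folders mounted by docker-compose.yaml
mkdir -p ./dags ./logs ./plugins ./config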

Step 2: Configure Environment Variables

Apache Airflow requires environment variables for proper setup. Create a .env file to define the required variables.

For Linux/macOS

Run the following command:

echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env

For Windows PowerShell

Use the following command:

Set-Content -Path .env -Value "AIRFLOW_UID=50000`nAIRFLOW_GID=0"

Note: Replace 50000 with the UID suitable for your Docker setup if required.
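
Either way, the resulting .env file should contain two lines similar to the following (on Linux/macOS the UID comes from id -u, so your value may differ):

AIRFLOW_UID=50000
AIRFLOW_GID=0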

Step 3: Initialize the Airflow Environment

Airflow uses a metadata database to store information about DAGs and tasks. Initialize the database before starting the containers.

Run the following command:

docker-compose up airflow-init

This creates the necessary directories, runs the database migrations, and creates the default airflow user account.

Step 4: Start Airflow Containers

Once the initialization is complete, start the Airflow containers using:

docker-compose up

This command starts the following services:

  • Webserver: Serves the Airflow UI, accessible via a browser.

  • Scheduler: Monitors all DAGs and triggers task runs.

  • Metadata Database: Stores Airflow’s metadata (Postgres in the default compose file).

  • Other Services: Includes the Celery worker, triggerer, and Redis broker if configured.
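
To keep your terminal free, you can instead start the services in the background and check on them with two standard Compose commands:

# Start all services in detached mode
docker-compose up -d

# List the containers and their health status
docker-compose ps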

Step 5: Access the Airflow Web Interface

After the containers are running, open your browser and navigate to:

http://localhost:8080 (or http://127.0.0.1:8080)

Default Credentials

  • Username: airflow

  • Password: airflow
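
If the page doesn’t load, Airflow’s built-in health endpoint is a quick way to check the webserver from the command line:

# Returns JSON with the status of the metadata database and scheduler
curl http://localhost:8080/health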

Step 6: Add DAGs to Airflow

DAGs (Directed Acyclic Graphs) define workflows in Airflow. To add a new DAG:

  1. Place your DAG Python file in the dags folder within the Airflow directory (a minimal example follows this list).

  2. Refresh the Airflow UI to see your DAG listed.
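
Below is a minimal sketch of such a file, assuming Airflow 2.x; the file name, DAG id, and schedule are illustrative:

# dags/hello_dag.py: a minimal example DAG (names are illustrative)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_airflow",           # name shown in the Airflow UI
    start_date=datetime(2024, 1, 1),  # first logical run date
    schedule="@daily",                # run once per day
    catchup=False,                    # skip backfilling past runs
) as dag:
    # A single task that prints a greeting inside the container
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow!'",
    )

The scheduler scans the dags folder periodically, so a new DAG can take a minute or two to show up in the UI.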

Step 7: Stopping and Restarting Airflow

To stop the Airflow containers, press Ctrl+C in the terminal where they are running, or stop and remove them with:

docker-compose down

To restart the containers:

docker-compose up
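
If you ever want to reset the environment completely, Compose can also remove the volumes and images. Note that this deletes the metadata database volume, so all DAG run history is lost:

# Stop containers and remove volumes and downloaded images
docker-compose down --volumes --rmi all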

Common Issues and Troubleshooting

  1. Port Already in Use:

    • If localhost:8080 is already in use, modify the ports section in docker-compose.yaml.

        ports:
          - "8081:8080"
      

      Then access Airflow at http://localhost:8081 (or http://127.0.0.1:8081).

  2. Permission Errors:

    • Ensure the .env file is correctly configured and matches the UID and GID requirements for your system.
  3. Container Errors:

    • Check logs for errors:

        docker-compose logs
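
To narrow the output to a single service, append its name from docker-compose.yaml (the service names below assume the official compose file):

# Follow only the webserver's logs
docker-compose logs -f airflow-webserver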
      

Conclusion

In this blog, we’ve learned how to set up Apache Airflow with Docker. Docker provides an excellent way to run Airflow on any platform, and with just a few commands you can start automating complex workflows.

If you’re looking for a guide to setting up a standalone version of Airflow, follow this guide.

Stay tuned for upcoming blogs where I’ll discuss more advanced Airflow features, and please let me know if you’re facing any issues with the setup :)

Join the DevHub community for more such articles, free resources, job opportunities, and much more!