Apache Airflow is an open-source platform used to programmatically author, schedule, and monitor workflows. Docker simplifies the setup process by providing an isolated and reproducible environment. This guide walks you through setting up Apache Airflow with Docker on your system. For a guide on running a standalone Airflow instance, please follow this article.
Prerequisites
Before diving into the setup, ensure your system meets the following prerequisites:
Operating System:
- Windows, macOS, or Linux (Windows users may need Docker Desktop with the WSL 2 backend enabled).
Installed Software:
- Docker: Docker Desktop (for Windows/macOS) or Docker Engine (for Linux).
- Docker Compose: Typically included with Docker Desktop, but verify it's available by running:
docker-compose --version
(Newer Docker installations ship Compose V2, which is invoked as docker compose version instead.)
System Resources:
- At least 4 GB of RAM allocated to Docker.
- Adequate disk space (Airflow containers and logs may require several GB).
Basic Knowledge:
- Familiarity with command-line tools.
- An understanding of environment variables and Docker basics.
Step 1: Pull the Docker Compose File
To configure Airflow with Docker, download the docker-compose.yaml file from the official Apache Airflow documentation site.
Windows PowerShell
Run the following command to download the file:
# Create a new directory and navigate into it
mkdir airflow_docker_setup
cd airflow_docker_setup
# Download the docker-compose.yaml file
Invoke-WebRequest -Uri "https://airflow.apache.org/docs/apache-airflow/2.10.4/docker-compose.yaml" -OutFile "docker-compose.yaml"
Linux/macOS
Use curl to download the file:
# Create a new directory and navigate into it
mkdir airflow_docker_setup
cd airflow_docker_setup
# Download the docker-compose.yaml file
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.10.4/docker-compose.yaml'
Step 2: Configure Environment Variables
Apache Airflow requires environment variables for proper setup. Create a .env file to define the required variables.
For Linux/macOS
Run the following command:
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
For Windows PowerShell
Use the following command:
Set-Content -Path .env -Value "AIRFLOW_UID=50000`nAIRFLOW_GID=0"
Note: Replace 50000 with a UID suitable for your Docker setup if required.
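The compose file mounts the dags, logs, plugins, and config directories from your project folder into the containers. On Linux/macOS it's worth creating them up front (the official quickstart does the same), so they end up owned by your user rather than root:
# Create the directories mounted by docker-compose.yaml
mkdir -p ./dags ./logs ./plugins ./config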
Step 3: Initialize the Airflow Environment
Airflow uses a metadata database to store information about DAGs and tasks. Initialize the database before starting the containers.
Run the following command:
docker-compose up airflow-init
This creates the necessary directories, initializes the database, and sets up the Airflow environment.
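Once initialization completes, you can sanity-check the installation by running a one-off CLI command in a fresh container; the official docs use this same pattern (the airflow-worker service is part of the default compose file, but any Airflow service in it would do):
# Print version and configuration details for the installed Airflow
docker-compose run --rm airflow-worker airflow info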
Step 4: Start Airflow Containers
Once the initialization is complete, start the Airflow containers using:
docker-compose up
This command starts the following services:
- Webserver: Accessible via a browser for managing Airflow.
- Scheduler: Schedules tasks and hands them off for execution.
- Metadata Database: Stores Airflow's metadata (a Postgres container in the default compose file).
- Other Services: Includes worker, triggerer, and Redis services if configured.
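If you prefer to get your terminal back, you can start the stack in the background with the -d (detached) flag and follow the output separately:
# Start all services in the background
docker-compose up -d
# Follow the combined logs (Ctrl+C stops following, not the containers)
docker-compose logs -f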
Step 5: Access the Airflow Web Interface
After the containers are running, open your browser and navigate to:
http://localhost:8080 (or http://127.0.0.1:8080)
Default Credentials
Username: airflow
Password: airflow
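Before logging in, you can also confirm the webserver is up from the command line; Airflow's webserver exposes a /health endpoint that reports the status of the metadata database and scheduler:
# Should return a small JSON document with "healthy" statuses
curl http://localhost:8080/health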
Step 6: Add DAGs to Airflow
DAGs (Directed Acyclic Graphs) define workflows in Airflow. To add a new DAG:
Place your DAG Python file in the dags folder within the Airflow directory, then refresh the Airflow UI to see your DAG listed. It can take a short while for the scheduler to pick up a new file; a minimal example DAG follows below.
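As a quick sanity check, here is a minimal sketch of a DAG you could drop into the dags folder. The file name (hello_dag.py), the dag_id, and the schedule are arbitrary choices for illustration:
# dags/hello_dag.py
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal DAG with a single Bash task that prints a greeting.
with DAG(
    dag_id="hello_docker_airflow",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # run once per day
    catchup=False,       # skip runs for dates before today
) as dag:
    BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow on Docker!'",
    )
Once the scheduler has scanned the file, the hello_docker_airflow DAG should appear on the DAGs page, where you can unpause and trigger it manually.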
Step 7: Stopping and Restarting Airflow
To stop the Airflow containers, press Ctrl+C in the terminal where the containers are running, or use:
docker-compose down
To restart the containers:
docker-compose up
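If you ever want to reset the environment completely (containers, the database volume, and the downloaded images), the official quickstart suggests the command below; note that this deletes all Airflow metadata, so use it only when you want a truly fresh start:
docker-compose down --volumes --rmi all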
Common Issues and Troubleshooting
Port Already in Use:
If localhost:8080 is already in use, modify the ports section of the webserver service in docker-compose.yaml:
ports:
  - "8081:8080"
Then access Airflow at http://localhost:8081 or 127.0.0.1:8081.
Permission Errors:
- Ensure the .env file is correctly configured and matches the UID and GID requirements for your system.
Container Errors:
Check logs for errors:
docker-compose logs
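To narrow a problem down, you can also inspect one service at a time. The service names below match the official compose file; run docker-compose ps to list the names used in yours:
# List running services and their states
docker-compose ps
# Inspect individual services
docker-compose logs airflow-webserver
docker-compose logs airflow-scheduler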
Conclusion
In this blog, we've learned how to set up Apache Airflow with Docker. Docker provides an excellent way to run Airflow on any platform, and with just a few commands you can automate complex workflows.
If you're looking for the guide to set up a standalone version of Airflow, follow this guide.
Stay tuned for upcoming blogs where I’ll discuss more advanced Airflow features, and please let me know if you’re facing any issues with the setup :)
Join the DevHub community for more such articles, free resources, job opportunities, and much more!