Securing the data pipelines is crucial when running Airflow in production.
Building DAGs involves many different connections and variables, and the sensitive ones must be encrypted. Connecting to different data sources requires usernames, passwords, and API keys. These credentials are stored in the Airflow metadata database and are not encrypted by default, so anyone who gained access to the metadata database would also gain access to the credentials. Although the infrastructure will be secured on AWS with multi-factor authentication, IAM roles, virtual private clouds, private subnets, security groups, and strong passwords, the credentials will additionally be encrypted with Fernet keys as a further layer of defense.
Fernet is a symmetric encryption method that guarantees that an encrypted value cannot be read or modified without the Fernet key. The key itself is 32 random bytes encoded as URL-safe base64, and every encrypted value (a Fernet token) also records the time at which it was encrypted. To encrypt a value, a Fernet object is instantiated from the key and its encrypt method is called; decryption works the same way through the decrypt method.
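The encrypt and decrypt cycle can be sketched with the `Fernet` class from the Python `cryptography` package, which Airflow itself relies on. This is a minimal illustration, assuming `cryptography` is installed; the credential value is a placeholder. It also shows the timestamp stored in each token and how a tampered token is rejected rather than decrypted.

```python
import base64

from cryptography.fernet import Fernet, InvalidToken

# Generate a random key: 32 bytes, encoded as URL-safe base64.
key = Fernet.generate_key()
fernet = Fernet(key)

# Encrypt a credential (placeholder value). The token records its creation time.
token = fernet.encrypt(b"example-db-password")

# Decrypting with the same key returns the original bytes.
plaintext = fernet.decrypt(token)

# The creation time embedded in the token, as Unix seconds.
created_at = fernet.extract_timestamp(token)

# Flip one bit in the token's final byte (part of its HMAC) to simulate tampering.
raw = bytearray(base64.urlsafe_b64decode(token))
raw[-1] ^= 0x01
tampered = base64.urlsafe_b64encode(bytes(raw))

try:
    fernet.decrypt(tampered)
    tamper_detected = False
except InvalidToken:
    # Fernet refuses to return data whose signature does not verify.
    tamper_detected = True

print(plaintext, created_at, tamper_detected)
```

Because the signature check happens before decryption, a modified token never yields partially decrypted data.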
A unique Fernet key will be generated randomly using Docker. Airflow then uses this key to encrypt the connection passwords and variables entered through the user interface before storing them in the metadata database.
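One way to produce the key is a short Python script run inside a container that already has `cryptography` installed (for example, the Airflow image); the exact Docker invocation is an assumption and depends on the setup. The sketch prints the key in the form of the `AIRFLOW__CORE__FERNET_KEY` environment variable, which Airflow maps to the `fernet_key` option in its `[core]` configuration section. The helper name `fernet_env_line` is hypothetical.

```python
from cryptography.fernet import Fernet


def fernet_env_line() -> str:
    """Hypothetical helper: return an .env-style line holding a fresh Fernet key."""
    key = Fernet.generate_key().decode("ascii")
    return f"AIRFLOW__CORE__FERNET_KEY={key}"


if __name__ == "__main__":
    print(fernet_env_line())
```

The printed line can be added to the environment of every Airflow container (webserver, scheduler, workers), since all of them must share the same key to read each other's encrypted values.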
The security of the data pipelines can be further improved by rotating the Fernet key. Replacing the key automatically at a regular interval is a security best practice, because it limits the damage if a single key is ever leaked.
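Airflow's documented rotation procedure is roughly: set `fernet_key` to a comma-separated list with the new key first and the old key second, run `airflow rotate-fernet-key` (the Airflow 2.x CLI name) to re-encrypt existing connections and variables, then set `fernet_key` to the new key alone. The sketch below illustrates the same mechanism with `MultiFernet` from the `cryptography` package; it is a standalone demonstration with placeholder values, not Airflow's rotation code.

```python
from cryptography.fernet import Fernet, InvalidToken, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()

# A credential that was encrypted before rotation, using only the old key.
old_token = Fernet(old_key).encrypt(b"example-api-key")

# During rotation both keys are configured, newest first. MultiFernet
# encrypts with the first key and tries every key when decrypting.
keyring = MultiFernet([Fernet(new_key), Fernet(old_key)])

# rotate() decrypts the token with whichever key works and
# re-encrypts it with the first (new) key.
new_token = keyring.rotate(old_token)

# Once every stored value has been rotated, the new key alone is enough.
rotated_plaintext = Fernet(new_key).decrypt(new_token)

# The retired key can no longer read the re-encrypted value.
try:
    Fernet(old_key).decrypt(new_token)
    old_key_can_read_new_token = True
except InvalidToken:
    old_key_can_read_new_token = False

print(rotated_plaintext, old_key_can_read_new_token)
```

Listing the new key first matters: it is the one used for all new encryptions, while the old key stays available only for reading values that have not been rotated yet.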
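Airflow has a configuration option for hiding sensitive variables, covering both variables defined in Airflow and those supplied through environment variables. When it is enabled, the values of variables whose names mark them as sensitive are hidden in the Airflow user interface.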
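As far as we are aware, in Airflow 2.x this behavior is controlled by the `hide_sensitive_var_conn_fields` option in the `[core]` section, and extra names can be declared through `sensitive_var_conn_names`. The sketch below only illustrates the idea of name-based masking; it is not Airflow's implementation, and the keyword list and the helpers `is_sensitive` and `display_value` are hypothetical approximations.

```python
# Hypothetical keyword list approximating the names Airflow treats as sensitive.
SENSITIVE_KEYWORDS = (
    "password",
    "passwd",
    "secret",
    "api_key",
    "apikey",
    "access_token",
    "authorization",
)


def is_sensitive(name: str) -> bool:
    """Return True if the variable name contains a sensitive keyword."""
    lowered = name.lower()
    return any(word in lowered for word in SENSITIVE_KEYWORDS)


def display_value(name: str, value: str) -> str:
    """Return the value as it would be shown in a UI: masked if sensitive."""
    return "***" if is_sensitive(name) else value


print(display_value("warehouse_password", "example"))  # masked
print(display_value("s3_bucket", "raw-data"))  # shown as-is
```

Masking is based on the variable name, so naming secrets consistently (for example with a `_password` or `_api_key` suffix) is what makes this protection effective.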
Restricting access to the Airflow user interface is a recommended security best practice. A user account with its own password will be created for every person who needs access to the user interface.
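A minimal sketch of preparing such accounts: it generates a strong random password with Python's `secrets` module and builds the argument list for the Airflow 2.x `airflow users create` command. The helper names, the 24-character length, and the default `Viewer` role (one of Airflow's built-in roles) are assumptions; the command is only constructed here, not executed.

```python
import secrets
import string

# Letters, digits, and punctuation give a large alphabet for random passwords.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24) -> str:
    """Hypothetical helper: cryptographically secure random password."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def users_create_command(username: str, first: str, last: str, email: str,
                         role: str = "Viewer") -> list[str]:
    """Hypothetical helper: argument list for `airflow users create`."""
    return [
        "airflow", "users", "create",
        "--username", username,
        "--firstname", first,
        "--lastname", last,
        "--email", email,
        "--role", role,
        "--password", generate_password(),
    ]


print(users_create_command("analyst", "Ada", "Lovelace", "ada@example.com")[:5])
```

Passing the list to `subprocess.run` without a shell avoids quoting problems with punctuation in the password, though a password given on the command line can still be visible in the process list while the command runs.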
This post is part of a 23-part mini-series about implementing a cost-effective modern data infrastructure for a small organization. It is a small part of a whitepaper that will be released at the end of this series.