Configuration management (CM) is one of the DevOps topics that has gained a lot of traction lately. If you want to start a career as a DevOps engineer, or are just curious about the many terms you hear in DevOps circles, like Ansible, Chef, Puppet, and SaltStack, please read along.
Back then, there were physical servers
In the old days of system administration, when a sysadmin was tasked with deploying a web application for the company's new online store product, she'd do the following:
- Talk to the data center team to assign a new hardware server (if one is available).
- Grab the DVD containing the OS image and use it to install Linux or Windows on the server.
- Once the server is up and running, she'd start deploying the application.
Deploying a LAMP stack the ad-hoc way
Let's get more specific. Say the application the sysadmin is going to deploy runs on the LAMP stack (Linux + Apache + MySQL + PHP). Each of those layers needs to be set up individually, independently of the others. So, for example, she'd go ahead and install PHP 7.2, only to discover that some of the application code only runs on PHP 5.3. She'd then have to uninstall PHP 7.2 and install the correct version.
Lack of documentation/communication
Of course, you might say that this is mainly due to a lack of proper documentation and/or communication between the ops team and the dev team: she should have been told early on that the application requires PHP 5.3 to function correctly. But even with proper documentation, these things do happen.
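Documentation is easy to miss, so one cheap safeguard is to turn the requirement into a pre-deployment check. Below is a minimal shell sketch of that idea; the function names and the REQUIRED_PHP/INSTALLED_PHP values are hypothetical, and the check only compares the MAJOR.MINOR part of the version strings.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deployment check: does the installed PHP version match
# the MAJOR.MINOR version the application was written for?

# Print only the MAJOR.MINOR part of a version string, e.g. 7.2.24 -> 7.2
major_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed (exit status 0) when both versions share the same MAJOR.MINOR.
php_version_matches() {
  local required=$1 installed=$2
  [ "$(major_minor "$required")" = "$(major_minor "$installed")" ]
}

# Illustrative values; on a real server the installed version could be
# read with: php -r 'echo PHP_VERSION;'
REQUIRED_PHP=5.3
INSTALLED_PHP=7.2.24

if php_version_matches "$REQUIRED_PHP" "$INSTALLED_PHP"; then
  echo "OK: PHP $INSTALLED_PHP matches required $REQUIRED_PHP"
else
  echo "Mismatch: application needs PHP $REQUIRED_PHP, found $INSTALLED_PHP" >&2
fi
```

With these values the check reports a mismatch before anything is deployed, which is exactly the PHP 7.2 versus 5.3 situation described above.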
Then came virtual machines
Because so much cost and effort was wasted working with physical machines, enterprises turned to virtual ones. A virtual machine is simply a server that shares hardware resources with other virtual servers, all of them hosted on one or more large, powerful computers called virtualization hosts. Working with virtual machines is much easier: with a simple command or a mouse click, you have a fully functional server, optionally loaded with your desired OS. Think about it for a moment: if $1000 could buy you one physical server running one application, virtualization could let that same server host as many as ten virtual machines, each serving a different application. That means more done with the same or less money. As hardware servers gained more power and capacity, more and more virtual machines could be created, with more and more application stacks to deploy, which meant more work for devs and sysadmins.
The need for automation
Automation is as old as the very first computer program or shell script ever written. After all, computers are very good at running a task any number of times with exactly the same steps; humans aren't. Back to our example. Now that our company has embraced virtualization in its infrastructure, our sysadmin is tasked with deploying that LAMP stack to four different environments: development, testing, staging, and production. All of them should run the same LAMP stack, perhaps with some minor differences. For example, the development environment has an extra machine running Nginx as a load balancer and reverse proxy in front of the application. Since this additional layer has not been tested yet, it cannot be deployed to production.
Using a shell script
Our sysadmin thought about using a shell script to automate the process. The script should deploy all the layers of the stack, and it should be intelligent enough to install Nginx only in the development environment. A script like that might read as follows:
    TARGETENV=prod
    DB=mydatabase
    rootuser=root
    rootpass=admin

    # Install Apache (-y answers the confirmation prompt so the script does not hang)
    sudo apt-get install -y apache2
    # Configure the virtual hosts and other Apache conf files using commands like grep, sed, and awk
    # Start the service
    sudo systemctl start apache2
    # Ensure that the service will start on system boot
    sudo systemctl enable apache2

    # Install MySQL
    sudo apt-get install -y mysql-server
    # Configure the root user and password - in clear text??
    # Start the service
    sudo systemctl start mysql
    # Ensure that the service is started on system boot
    sudo systemctl enable mysql

    # Optionally import a dump into the database
    mysql -u "$rootuser" -p"$rootpass" "$DB" < mydb.sql
    # and so on
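The excerpt defines TARGETENV but never acts on it, so it does not yet follow the rule from the environment example (Nginx only in development). One way to encode that rule is a small case statement. In this minimal sketch, the packages_for_env function, the dev/test/staging/prod names, and the package list are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the packages an environment should receive.
# Every environment gets the base LAMP packages; Nginx (the untested
# reverse proxy) is added only in development.
packages_for_env() {
  local env=$1
  local pkgs="apache2 mysql-server php"
  case "$env" in
    dev) pkgs="$pkgs nginx" ;;
    test|staging|prod) ;;  # base stack only, no Nginx yet
    *)
      echo "unknown environment: $env" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$pkgs"
}

TARGETENV=${TARGETENV:-prod}
# In the real script the list would feed the installer, e.g.
#   sudo apt-get install -y $(packages_for_env "$TARGETENV")
packages_for_env "$TARGETENV"
```

Keeping the rule in one function means the environment differences live in a single place instead of being scattered as if-statements throughout the script.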
This is just an excerpt of a script that could do the task. Now let's look at the drawbacks of this approach.
Drawbacks of using shell scripts
Let's have a look at this script and see some potential pitfalls:
- First and foremost, the root username and password for the MySQL database are stored in the script in clear text. This is a huge security flaw and a practice that should be avoided at all costs.
- Going through the script, you will discover that it depends on other parties. For example, it needs to download the .sql file, perhaps from an SFTP site, a version control repo, or even a web server. Does it need credentials of its own?
- To run this script, you have to manually upload it to each target machine, log in to that machine, and run it, which wastes a lot of time. And what if she were asked to replicate this setup on two more environments (bringing the total to over a dozen servers)?
- Our sysadmin is not working alone; she has other sysadmins on her team. One of them stopped the Apache server, perhaps to make a particular change, and forgot to restart it. This is a very common scenario. Now the application is down, and someone has to investigate which exact component stopped or was misconfigured. One option would be to re-run the setup script to bring things back to the desired state. However, shell scripts need a lot of tweaking to make sure that running them several times will not throw an error or duplicate settings. The property of reaching the same end state no matter how many times something is run is known as idempotence.
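To make the idempotence problem concrete: commands such as apt-get install -y or systemctl start are already safe to repeat, but appending a setting to a configuration file is not, because each run adds another copy. A minimal sketch of the usual workaround follows; the ensure_line helper is hypothetical, and the temporary file stands in for a real Apache config file.

```shell
#!/usr/bin/env bash
# Hypothetical idempotent helper: append a line to a file only if an
# identical line is not already present. Running it once or many times
# leaves the file in the same state.
ensure_line() {
  local line=$1 file=$2
  touch "$file"
  # -q: no output, -x: match whole lines only, -F: fixed string, not a regex
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Demonstration on a throwaway file standing in for an Apache config file.
conf=$(mktemp)
ensure_line "ServerName www.example.com" "$conf"
ensure_line "ServerName www.example.com" "$conf"   # second run changes nothing
grep -c "ServerName" "$conf"                       # prints 1
rm -f "$conf"
```

Every edit in a hand-written script needs a check like this before it changes anything; configuration management tools build the check-before-change pattern into their modules.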
Using a configuration management tool
Configuration management tools were created to address these drawbacks, among others. For example, here is an Ansible playbook that performs the same tasks as the first part of our shell script:
    - hosts: webservers-prod
      vars_files:
        - secret.yaml
      become: yes   # escalate privileges (sudo by default): installing packages requires root
      tasks:
        - name: Install Apache
          apt:
            name: apache2
            state: present
        - name: Start and enable Apache2
          service:
            name: apache2
            state: started
            enabled: yes
        - name: Install mysql
          apt:
            name: mysql-server
            state: present
This is an excerpt of a real Ansible playbook that was used to create the LAMP stack in my course Beginning Ansible. Notice the following:
- You can run this playbook right from your own laptop. There is no need to log in to each remote machine, upload the script, and run it there, so you can run the playbook against as many target machines as you need. All you need to change is the hosts line to reflect the group of machines you want to target.
- Ansible has a vast number of modules that cover most tasks a sysadmin might need. For example, if you want to download the SQL file from an FTP, SFTP, or HTTP server, there is likely a module that abstracts the whole operation, saving you time while keeping the playbook consistent.
- Configuration management tools, Ansible included, are designed around idempotence, and modules such as apt and service are idempotent. Running this playbook thousands of times against the same target machines will not throw an error, because each run only makes the changes required to drive the machine to the desired state.
- In the vars_files section of the playbook, we refer to the secret.yaml file. This is called a variables file: you can put all the variables your playbook uses in it. Optionally, you can encrypt the file (for example, with Ansible Vault) to protect sensitive information like usernames and passwords.
Is Ansible the right tool for you?
In this post, I used Ansible as an example of a configuration management tool, but it is not the only one. Other tools on the market include Chef, Puppet, and SaltStack. So, what makes Ansible different? Let's see:
It does not require an agent
In their typical client/server setup, Chef and Puppet need a central server that manages and orchestrates the instructions sent to each target machine. This also requires an agent running continuously on each target server to receive and apply the configuration from the central server. Ansible needs neither: it connects to the target machines directly from wherever you run it.
It uses the native SSH protocol
Ansible does not need any new ports opened in your firewall to enable communication between your laptop and the target machines. It uses the same old SSH protocol on port 22 (you can configure a different port if you want to).
It uses Python
Ansible is written in Python, so sysadmins who are used to writing scripts in this amazing language will feel right at home. But Python knowledge is not a prerequisite for mastering Ansible; you will only need a good Python background if you want to write complex, advanced playbooks.
I've used Ansible, Chef, and Puppet, and I found that Ansible has the gentlest learning curve. It requires little to no setup, and you can start configuring servers within minutes of trying it out.
However, Ansible is not a silver bullet. It is not the answer to all your configuration management problems. You may want to consider a more advanced tool like Chef or Puppet if:
- You have an environment that contains thousands of hosts, with complex application dependencies and connections.
- You need a GUI.
- You need roles and permissions to control who executes which tasks on which servers, and so on. That said, Red Hat does offer a commercial product, Ansible Tower (a free trial is available), that provides a GUI and access control.
If you liked this article and want professional-level training on Ansible, you can enroll in my course Learn Ansible from the ground up on Udemy. I'm giving away a discount coupon exclusively for the readers of this blog: just use the coupon code ZSAVE2018 when you check out.