Ansible is quickly becoming the de facto configuration management tool. It's very easy to use, has a ton of modules, and does not need a management server: it can be run right from your laptop. The only requirement on the target machine is a reasonably modern version of Python. But what if Python is not (yet) installed?

But I thought Python ships with Linux by default

Yes, you're correct. All major Linux distributions have Python 2, Python 3, or both installed by default. However, if you launch an AWS EC2 machine from one of the Ubuntu AMIs (in this example it's ami-0bbe6b35405ecebdb), you'll notice that Python is not installed.

OK, I will just install it and then run Ansible

That might be a perfectly good choice if your project is hosted on one or two machines. But what if you have a dozen or more? Having to manually log in to each and every host and run sudo apt install python is the opposite of automation. A DevOps engineer should have a better option.

The raw module

Ansible is a Python library just like requests, numpy and others. Almost every module that Ansible runs against the target host uses Python internally to make the required changes. But what if Python is not installed in the first place? Fortunately, Ansible has the raw module, which simply sends a command over SSH. If you have a look at the module's page at https://docs.ansible.com/ansible/devel/modules/raw_module.html, you'll see the official definition:

This is useful and should only be done in a few cases. A common case is installing python on a system without python installed by default.

Preparing Ansible to connect to the EC2 instance

Before starting to write the playbook, we'll need to make some changes to how Ansible connects to remote hosts.

Create a new file in your current directory and call it ansible.cfg. Ansible looks for this file in the current directory before it falls back to /etc/ansible/ansible.cfg, so the settings in it take precedence. My file looks as follows:

[defaults]
host_key_checking = False
private_key_file = ~/.ssh/mykey.pem
remote_user = ubuntu

The first line disables SSH host key checking. When you use SSH to connect to a host for the first time, you'll see a warning that the remote host's key is about to be added to the list of known hosts, and you have to type yes to establish the connection. When dealing with lots of hosts in a controlled environment, such a prompt wastes a lot of time unnecessarily.

The second line defines the location of the SSH private key used to connect to the remote host.

The last line instructs Ansible to use ubuntu as the username when connecting to the remote host. By default, Ansible uses the name of the currently logged-in user to establish the connection.

Now, we need to create the hosts file. I've created a group called development containing my EC2 instance IP address:

[development]
34.214.124.121
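If you prefer to keep the connection details next to the host rather than in ansible.cfg, the same inventory can probably be expressed in Ansible's YAML inventory format. This is only a sketch: the file name hosts.yml is an assumption, and the two variables simply mirror the remote_user and private_key_file settings shown earlier.

```yaml
# hosts.yml (hypothetical file name): YAML equivalent of the INI inventory
development:
  hosts:
    34.214.124.121:
      # Same values as remote_user and private_key_file in ansible.cfg
      ansible_user: ubuntu
      ansible_ssh_private_key_file: ~/.ssh/mykey.pem
```

With this variant you would pass -i hosts.yml instead of -i hosts when running the playbook.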

Running the playbook

Now for the important part: the playbook that configures our target host. I saved it as deploy.yml:

- hosts: development
  become: yes
  gather_facts: no
  pre_tasks:
  - name: 'install python'
    raw: 'sudo apt-get -y install python'

Notice that you MUST set gather_facts to no if Python is not installed yet, because fact gathering uses Python to collect information about the target host.

The Python installation must go in the pre_tasks section of the playbook, because this task always has to be the first thing Ansible does once it connects to the remote host. This becomes even more important if you use Ansible roles: tasks defined in a role are executed before the tasks in the playbook's tasks section, whereas pre_tasks run before any role.
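To make the ordering concrete, here is a minimal sketch of a play that combines all three sections. The role name webserver is purely hypothetical and only there to show where roles fall in the sequence.

```yaml
- hosts: development
  become: yes
  gather_facts: no

  pre_tasks:
    # Runs first, before any role is applied
    - name: install python
      raw: 'sudo apt-get -y install python'

  roles:
    # Runs second; "webserver" is a hypothetical role name
    - webserver

  tasks:
    # Runs last, after all roles have finished
    - name: confirm that Python-based modules now work
      ping:
```

The ping module needs Python on the target, so it doubles as a quick check that the pre_tasks step did its job.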

Now the raw module comes into play. Obviously, we are using a plain Ubuntu command, apt-get. We're even using sudo to make sure that the operation is carried out with root privileges, even though we already set become: yes.

Running the playbook is as easy as issuing the following command:

ansible-playbook -i hosts deploy.yml

Double check that Python got installed by logging in to the host and running python --version.
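If you would rather not log in to every host, the same check can be sketched as a small playbook that reuses the raw module. The file name check.yml and the variable name python_version are assumptions made for this example.

```yaml
# check.yml (hypothetical file name)
- hosts: development
  gather_facts: no
  tasks:
    - name: check python version
      raw: 'python --version'
      register: python_version
      # A version query never changes the host
      changed_when: false

    - name: print whatever the command returned
      debug:
        var: python_version
```

Run it the same way as the main playbook: ansible-playbook -i hosts check.yml.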

Drawbacks of this approach

Ansible was designed to use Python for almost all of its work. Using the raw module has some disadvantages of its own:

  1. It is not idempotent. Configuration management tools are meant to be run thousands of times without repeating changes that have already been made to the target machines. This is called idempotence. The raw module does not offer this, so each time this playbook runs, the sudo apt-get -y install python command is run again unnecessarily.
  2. You cannot gather important information about the target machine up front. The gather_facts setting can be very important if you intend to use machine-related information later in the playbook, but it has to be set to no if the playbook is going to be used to install Python.
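Both drawbacks can be softened inside the playbook itself. The following is a sketch under a few assumptions: that the interpreter ends up at /usr/bin/python, that an empty command output means nothing was installed, and that an apt-get update beforehand is acceptable (it is not part of the original command). The variable name python_install is made up for this example.

```yaml
- hosts: development
  become: yes
  gather_facts: no

  pre_tasks:
    - name: install python only if it is missing
      # Assumes the interpreter lives at /usr/bin/python
      raw: 'test -e /usr/bin/python || (sudo apt-get -y update && sudo apt-get -y install python)'
      register: python_install
      # Report "changed" only when apt-get actually printed something
      changed_when: python_install.stdout != ""

    - name: gather facts now that Python is available
      setup:
```

The command still runs on every play, but it exits immediately when Python is already present, and the explicit setup task makes facts available to everything that follows.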

Possible workarounds for the drawbacks

You can create a separate playbook that will just make sure that all target machines have Python installed. Subsequent Ansible tasks can be added to another playbook where gather_facts can be turned on and the Python installation command won't be run over and over.
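As a rough sketch of that split, the two parts are shown here as two plays in a single file. Moving the first play into its own playbook, as described above, lets you run it only once per machine. The debug task is just a placeholder to show that facts are available in the second play.

```yaml
# Play 1: bootstrap only; no facts, no Python required on the target
- hosts: development
  become: yes
  gather_facts: no
  tasks:
    - name: install python
      raw: 'sudo apt-get -y install python'

# Play 2: regular configuration; fact gathering is back on
- hosts: development
  become: yes
  gather_facts: yes
  tasks:
    - name: show a gathered fact (placeholder task)
      debug:
        msg: "Running on {{ ansible_distribution }} {{ ansible_distribution_version }}"
```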

Another workaround is not directly related to Ansible: you can create your own AMI that has Python installed and use it whenever you need to spawn new EC2 instances on which Ansible is going to be used.