rlucas.net: The Next Generation Rotating Header Image

September, 2014:

Vagrant/Ansible SSH problem with older OpenSSH lacking ControlPersist

Intro; skip to the meat below if you found this through a google search on the error about “-c ssh … ControlPersist”

Vagrant is a Ruby-based abstraction layer that manages a mixture of VirtualBox (or other VM software), SSH, and “Provisioners” like Chef, Puppet, or in our case, Ansible. It’s meant mainly for setting up development and testing environments consistently; it lets you ignore the vagaries of each dev’s local box mess by actually running and testing the software inside a well-defined, consistently configured virtual machine.

Ansible is a Python-based configuration management tool that has a much more straightforward “up and running” learning curve than its ostensible peers. Notably, it is generally “agentless,” in the sense that all of the Ansible software gets run on your local box (your devops guy’s box at world headquarters) without any part of Ansible being installed on your remote nodes, and the actual process of configuring each node is done mainly by opening up ssh connections to each box and running generic, non-Ansible software (such as that remote box’s package manager).

Vagrant can invoke Ansible as a provisioner. Ansible can also be invoked to provision “real” machines, like EC2 instances or (does anyone even have anymore?) actual physical machines.

The holy grail of devops here would be to re-use your dev, test, and prod configs, varying them only in the necessary parts. Ansible is modular enough to do this, and so in theory, you do something very schematically like this:

core_software: x, y, z ...
some_debugging_stuff: a, b, c ...
real_live_security_stuff: m, n, o ...

dev_vm: core_software, some_debugging_stuff
prod_box: core_software, real_live_security_stuff

You can now invoke Vagrant to create a VM and provision it with Ansible to give you a “dev_vm”, while directly using Ansible to create a “prod_box” at your data center. Theoretically, you now have some assurance that the two boxen have exactly the same core software of x, y, z.

The meat of it

Ansible’s heavy reliance upon outbound SSH connections from your local box is OK but throws some kinks in the works when you try to use identical Vagrant + Ansible configurations on two machines that do not share identical software versions like SSH (say, one brand-new Mac OS X and one older Linux). Specifically, you may see this fatal error while performing a “vagrant up” or “vagrant provision”:

using -c ssh on certain older ssh versions may not support ControlPersist

If it’s not clear, that error is coming from Ansible which is sanity-checking the SSH options which it’s being asked to use by Vagrant. Test your local system with:

$ man ssh | grep ControlPersist

If that fails, you have an SSH which is too old to support the ControlPersist option, but Ansible thinks it’s being asked to use that. (ControlPersist is used by default by recent Ansible versions to speed up the reinvocation of SSH connections, since Ansible uses lots and lots and lots of them.)

Optional: to help you understand debug this, you’ll need to get Ansible more verbose. You can do this through the Vagrantfile you’re using by giving the option:

ansible.verbose = "vvv"

The error message you get from Ansible will suggest that you set ANSIBLE_SSH_ARGS=”” as a remedy. If you try this on the command line while invoking Vagrant merely by prepending it, like ‘ANSIBLE_SSH_ARGS=”” vagrant provision’, it won’t work; the “-vvv” output from Ansible will show that it’s been invoked with a long list of ANSIBLE_SSH_ARGS including the troublesome ControlPersist.

Further Googling may suggest that you can override the ssh args either in an “ansible.cfg” file (in one of /etc/ansible/ansible.cfg, ./ansible.cfg, or ~/.ansible.cfg) or in the Vagrantfile with “ansible.raw_ssh_args=[”]”. It is possible that none of these will seem to work; read on.

After much stomping around and examining of the Vagrant source as of 4ef996d at https://github.com/mitchellh/vagrant/blob/master/plugins/provisioners/ansible/provisioner.rb the problem became clear: Vagrant’s “get_ansible_ssh_args” function WILL permit you to set an empty list of ssh_args (thereby leading to Vagrant setting ANSIBLE_SSH_ARGS=”” for you), but only if there are NONE of the following set: an array (more than one) in config.ssh.private_key_path, true in config.ssh.forward_agent, or ANYTHING in the raw_ssh_args. If anything is set in any of those at that point, the ControlMaster and ControlPersist options will be set.

It’s kind of vexing because you don’t expect that setting forward_agent will cause these other things always to be set, even when you have tried explicitly to set raw_ssh_args to empty.

So in sum:

  • No ssh_args in any of the ansible.cfg files that may be looked at
  • Vagrantfile: ensure no more than one private ssh key in config.ssh.private_key_path
  • Vagrantfile: ensure config.ssh.forward_agent=false
  • Vagrantfile: ensure ansible block has ansible.raw_ssh_args=[]

(My problem was with OpenSSH OpenSSH_5.3p1 Debian-3ubuntu7.1, Ansible ansible 1.7.1, and Vagrant Vagrant 1.6.3, and was specifically triggered by the config.ssh.forward_agent=true.)

That solves it for me — I am happy with it because it’s almost 100% used for local Vagrant VMs. I have yet to see how managing remote boxes works from my older Linux machine with the ControlPersist optimization removed (though remember, in the case that you’re using Ansible directly and not through Vagrant, the above fix won’t apply.)