rlucas.net: The Next Generation Rotating Header Image


s3put fails with ssl.CertificateError suddenly after upgrade

We had been using periods / dots in Amazon S3 bucket names in order to create some semblance of namespace / order. Pretty common convention.

A short while ago a cron job doing backups stopped working after some Python upgrades. Specifically, we were using s3put to upload a file to “my.dotted.bucket“. The error was:

ssl.CertificateError: hostname 'my.dotted.bucket.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'

It turns out that per Boto issue #2836 a recent strictifying of SSL certificate validation breaks the ability to validate the SSL cert when there are extra dots on the LHS of the wildcard. Boo.

If you don’t have the luxury of monkey-patching (or actually patching) the code that sits atop this version of boto, you can put the following section into your (possibly new) ~/.boto config file:

calling_format = boto.s3.connection.OrdinaryCallingFormat

(Of course, expect that all of the nasty MITM attacks that stricter SSL validation is meant to mitigate to come back and bite you!)

Vagrant/Ansible SSH problem with older OpenSSH lacking ControlPersist

Intro; skip to the meat below if you found this through a google search on the error about “-c ssh … ControlPersist”

Vagrant is a Ruby-based abstraction layer that manages a mixture of VirtualBox (or other VM software), SSH, and “Provisioners” like Chef, Puppet, or in our case, Ansible. It’s meant mainly for setting up development and testing environments consistently; it lets you ignore the vagaries of each dev’s local box mess by actually running and testing the software inside a well-defined, consistently configured virtual machine.

Ansible is a Python-based configuration management tool that has a much more straightforward “up and running” learning curve than its ostensible peers. Notably, it is generally “agentless,” in the sense that all of the Ansible software gets run on your local box (your devops guy’s box at world headquarters) without any part of Ansible being installed on your remote nodes, and the actual process of configuring each node is done mainly by opening up ssh connections to each box and running generic, non-Ansible software (such as that remote box’s package manager).

Vagrant can invoke Ansible as a provisioner. Ansible can also be invoked to provision “real” machines, like EC2 instances or (does anyone even have anymore?) actual physical machines.

The holy grail of devops here would be to re-use your dev, test, and prod configs, varying them only in the necessary parts. Ansible is modular enough to do this, and so in theory, you do something very schematically like this:

core_software: x, y, z ...
some_debugging_stuff: a, b, c ...
real_live_security_stuff: m, n, o ...

dev_vm: core_software, some_debugging_stuff
prod_box: core_software, real_live_security_stuff

You can now invoke Vagrant to create a VM and provision it with Ansible to give you a “dev_vm”, while directly using Ansible to create a “prod_box” at your data center. Theoretically, you now have some assurance that the two boxen have exactly the same core software of x, y, z.

The meat of it

Ansible’s heavy reliance upon outbound SSH connections from your local box is OK but throws some kinks in the works when you try to use identical Vagrant + Ansible configurations on two machines that do not share identical software versions like SSH (say, one brand-new Mac OS X and one older Linux). Specifically, you may see this fatal error while performing a “vagrant up” or “vagrant provision”:

using -c ssh on certain older ssh versions may not support ControlPersist

If it’s not clear, that error is coming from Ansible which is sanity-checking the SSH options which it’s being asked to use by Vagrant. Test your local system with:

$ man ssh | grep ControlPersist

If that fails, you have an SSH which is too old to support the ControlPersist option, but Ansible thinks it’s being asked to use that. (ControlPersist is used by default by recent Ansible versions to speed up the reinvocation of SSH connections, since Ansible uses lots and lots and lots of them.)

Optional: to help you understand debug this, you’ll need to get Ansible more verbose. You can do this through the Vagrantfile you’re using by giving the option:

ansible.verbose = "vvv"

The error message you get from Ansible will suggest that you set ANSIBLE_SSH_ARGS=”” as a remedy. If you try this on the command line while invoking Vagrant merely by prepending it, like ‘ANSIBLE_SSH_ARGS=”” vagrant provision’, it won’t work; the “-vvv” output from Ansible will show that it’s been invoked with a long list of ANSIBLE_SSH_ARGS including the troublesome ControlPersist.

Further Googling may suggest that you can override the ssh args either in an “ansible.cfg” file (in one of /etc/ansible/ansible.cfg, ./ansible.cfg, or ~/.ansible.cfg) or in the Vagrantfile with “ansible.raw_ssh_args=[”]”. It is possible that none of these will seem to work; read on.

After much stomping around and examining of the Vagrant source as of 4ef996d at https://github.com/mitchellh/vagrant/blob/master/plugins/provisioners/ansible/provisioner.rb the problem became clear: Vagrant’s “get_ansible_ssh_args” function WILL permit you to set an empty list of ssh_args (thereby leading to Vagrant setting ANSIBLE_SSH_ARGS=”” for you), but only if there are NONE of the following set: an array (more than one) in config.ssh.private_key_path, true in config.ssh.forward_agent, or ANYTHING in the raw_ssh_args. If anything is set in any of those at that point, the ControlMaster and ControlPersist options will be set.

It’s kind of vexing because you don’t expect that setting forward_agent will cause these other things always to be set, even when you have tried explicitly to set raw_ssh_args to empty.

So in sum:

  • No ssh_args in any of the ansible.cfg files that may be looked at
  • Vagrantfile: ensure no more than one private ssh key in config.ssh.private_key_path
  • Vagrantfile: ensure config.ssh.forward_agent=false
  • Vagrantfile: ensure ansible block has ansible.raw_ssh_args=[]

(My problem was with OpenSSH OpenSSH_5.3p1 Debian-3ubuntu7.1, Ansible ansible 1.7.1, and Vagrant Vagrant 1.6.3, and was specifically triggered by the config.ssh.forward_agent=true.)

That solves it for me — I am happy with it because it’s almost 100% used for local Vagrant VMs. I have yet to see how managing remote boxes works from my older Linux machine with the ControlPersist optimization removed (though remember, in the case that you’re using Ansible directly and not through Vagrant, the above fix won’t apply.)

Never Trust Microsoft Products On Mac OS X

In Microsoft Word 2008 for Mac OS X, if you open a “Find” dialog, and if it has focus, it will capture cmd-S (the universal “Save” command) and cause it to toggle the “sounds like” option in advanced find options. It will sort of capture cmd-W (the universal “close Window,” which will close the Find dialog). It will not capture cmd-Q (the universal “Quit”), which will quit the program and close all its windows, assuming your document has been saved.

Which you thought you’d been doing by hitting cmd-S. And then you wonder why your search results were so weird.

Thanks again, Mac BU!

TrueCrypt breaks after OS X 10.8 (Mountain Lion) upgrade

1. You run TrueCrypt as before, and once you finally remember your passphrase and type it correctly, you get an error: “the MacFUSE file system is not available (71)”

2. You look around and you still have /usr/local/lib/*fuse*.so etc., in addition to /Library/Filesystems/fusefs.fs

3. Your MacFuse control panel in System Preferences still exists (but the “uninstall” button doesn’t seem to work!). Also, the script which that button runs, /Library/Filesystems/fusefs.fs/Support/uninstall-macfuse-core.sh — that doesn’t work either (run with sudo).

4. You’ve tried installing “fuse4x” and “fuse4x-kext” from MacPorts, but still nothing.

What I did:

1. Remove MacFUSE manually, completely, using the list of installed files at http://code.google.com/p/macfuse/wiki/FAQ

MacFUSE file system bundle (/Library/Filesystems/fusefs.fs)
MacFUSE Objective-C framework (/Library/Frameworks/MacFUSE.framework)
MacFUSE C-based libraries (/usr/local/lib/*fuse*.dylib) and headers (/usr/local/include/fuse*)
MacFUSE preference pane (/Library/PreferencePanes/MacFUSE.prefPane)
MacFUSE project templates for Xcode (/Library/Application Support/Developer/Shared/Xcode/Project Templates/MacFUSE)
2. Remove /Applications/TrueCrypt.app

3. port uninstall fuse4x and fuse4x-kext

4. Reboot. (Might not be necessary but can’t hurt here.)

5. port install fuse4x fuse4x-kext

6. Reinstall truecrypt (you might need to option-right-click to bypass the security check). Note that installing truecrypt, I believe, re-links the TC binary against the fuse libraries that you just re-installed using MacPorts, so you end up with a different TC binary than before.

7. Run TrueCrypt.app again and hope that it all worked…


MySQL silently ignores aggregate query errors by default

In a SQL query, if you use aggregate functions (min, max, count, sum, etc.) and mix them with non-aggregate columns, you have to indicate how to “group” things. Otherwise, the output is not predictable.

MySQL by default will just ignore these problems and make up something. This can make bugs in complex queries hard to track down (and it virtually guarantees that a novice or dullard will slip some errors into such queries eventually).

You can fix this with:


(That is, you want the “ONLY_FULL_GROUP_BY” option set. The SET above can be run in the mysql> prompt and affects that session only; thinking DBAs should strongly consider enforcing this as a server option.)

I am too tired and busy to give in to the temptation to unleash a rant about MySQL here, but PLEEEEZ. It’s the year 2013 and this is still an issue??

Hat tip Michael McLaughlin: http://blog.mclaughlinsoftware.com/2010/03/10/mysql-standard-group-by/

MySQL docs on this “extension” http://dev.mysql.com/doc/refman/5.1/en/group-by-extensions.html

Quick Django Foreign-key Gotcha

When your Django app poops out with:

'RelatedManager' object is not callable

The problem might be that you’re trying to call up related fields via a foreign key field type’s auto-generated reverse-lookup attribute — but mistakenly trying to call it as a method instead of as an attribute that holds a query set. In code terms, you put:

tracks = cd.track_set()

What you meant to do was:

tracks = cd.track_set.all()

Some Gotchas with using svndumpfilter

A few things:

1. svndumpfilter can take multiple args, e.g.

$ svndumpfilter include /x /y /z > mydump

to include /x, and /y, and /z. It can’t, however, do both include and exclude at once. In theory, you can run multiple dumps when you later load them, so you could (sort of; see below) accomplish the sameish thing with

$ svndumpfilter include /x > mydumpx
$ svndumpfilter include /y > mydumpy
$ svndumpfilter include /z > mydumpz

2. HOWEVER, if you have ever MOVED a file within the repository between (in the example above), /x and /y, you can’t rely upon doing it piecewise. That’s because the references within the loading process during the load of /y will no longer be valid as they point to /x/whatever.

3. It is commonly suggested that one edit “Node-path:” entries within the dump in order to fix up directory structure issues. NOTE that you MUST also change “Node-copyfrom-path:” in the same manner. Trickily, Node-copyfrom-path is only present in nodes that were (surprise!) copied from another node. This is of course tied to 2. above.

What is all this about, you ask? Well, it turns out that if you have a single respository with a lot of sprawling projects all under /trunk, you might need to break them out. (For example, if you intend to upload your repository dump to someone like a CVSDude or Beanstalk).

The error message that got me was something like:

svnadmin: File not found: revision 91, path ‘/trunk/x/whatever’