rlucas.net: The Next Generation

python

s3put fails with ssl.CertificateError suddenly after upgrade

We had been using periods / dots in Amazon S3 bucket names in order to create some semblance of namespace / order. Pretty common convention.

A short while ago a cron job doing backups stopped working after some Python upgrades. Specifically, we were using s3put to upload a file to “my.dotted.bucket”. The error was:

ssl.CertificateError: hostname 'my.dotted.bucket.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'

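It turns out that per Boto issue #2836 a recent strictifying of SSL certificate validation breaks the ability to validate the SSL cert when the bucket name contains dots: the wildcard in '*.s3.amazonaws.com' only covers a single hostname label, so the extra dots on the LHS of the wildcard no longer match. Boo.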
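To see why the extra dots matter, here is a minimal sketch of the one-label-per-wildcard rule. The function name wildcard_matches is made up for illustration, and the real check in Python's ssl module handles more cases than this.

```python
def wildcard_matches(pattern, hostname):
    """Simplified check: a '*' label stands in for exactly one DNS label.

    Hypothetical helper for illustration only; real certificate
    validation is more involved than this.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    # A wildcard never spans dots, so the label counts must agree.
    if len(pattern_labels) != len(host_labels):
        return False
    for p, h in zip(pattern_labels, host_labels):
        if p != "*" and p != h:
            return False
    return True


# One label in front of s3.amazonaws.com: fine.
print(wildcard_matches("*.s3.amazonaws.com", "mybucket.s3.amazonaws.com"))  # True
# Three labels in front: the wildcard cannot stretch over them.
print(wildcard_matches("*.s3.amazonaws.com", "my.dotted.bucket.s3.amazonaws.com"))  # False
```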

If you don’t have the luxury of monkey-patching (or actually patching) the code that sits atop this version of boto, you can put the following section into your (possibly new) ~/.boto config file:

[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat

(Of course, expect all of the nasty MITM attacks that stricter SSL validation is meant to mitigate to come back and bite you!)
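For the curious, the calling format decides where the bucket name ends up in the request URL. The two functions below are a rough sketch of the two styles and are not boto's own code; the function names and the key "backup.tgz" are invented for the example.

```python
def virtual_hosted_url(bucket, key, endpoint="s3.amazonaws.com"):
    # Subdomain style: the bucket becomes part of the hostname, so a
    # dotted bucket name adds extra labels in front of the endpoint.
    return "https://%s.%s/%s" % (bucket, endpoint, key)


def path_style_url(bucket, key, endpoint="s3.amazonaws.com"):
    # "Ordinary" style: the bucket moves into the path and the hostname
    # stays the bare endpoint.
    return "https://%s/%s/%s" % (endpoint, bucket, key)


print(virtual_hosted_url("my.dotted.bucket", "backup.tgz"))
# https://my.dotted.bucket.s3.amazonaws.com/backup.tgz
print(path_style_url("my.dotted.bucket", "backup.tgz"))
# https://s3.amazonaws.com/my.dotted.bucket/backup.tgz
```

With the path style the hostname is plain s3.amazonaws.com, which is one of the two names the error message lists.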

DB Transaction “BEGIN” in Django shell

Django provides a handy “shell” which can be invoked using the manage.py for a project, and which will usefully set up the necessary Django environment and even invoke ipython for completion, syntax highlighting, debugger, etc.

Also usefully, but very much separate from the shell functionality, Django provides a nice framework for dealing with database transactions through its ORM. One can use django.db.transaction.rollback() for example.

However, the shell by default will be invoked with autocommit, meaning that each individual SQL statement gets committed as soon as it runs. When one is poking around freehand in the shell, this might not be for the best, so one may want to turn off autocommit and keep the option to rollback().
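The difference is easy to see outside Django. This sketch uses the standard library's sqlite3 module as a stand-in, not Django's ORM; the table name and the helper count_after_rollback are invented for the example.

```python
import sqlite3


def count_after_rollback(autocommit):
    """Insert one row, roll back, and report how many rows survived."""
    if autocommit:
        # isolation_level=None puts sqlite3 into autocommit mode.
        conn = sqlite3.connect(":memory:", isolation_level=None)
    else:
        # Default mode: sqlite3 opens a transaction before the INSERT.
        conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE important (title TEXT)")
    conn.commit()
    conn.execute("INSERT INTO important VALUES ('Emabrassing Tpyos')")
    conn.rollback()  # oops
    (count,) = conn.execute("SELECT COUNT(*) FROM important").fetchone()
    conn.close()
    return count


print(count_after_rollback(autocommit=True))   # 1: the row was already committed
print(count_after_rollback(autocommit=False))  # 0: the rollback undid the insert
```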

Unfortunately for that use case, all of the Django infrastructure for beginning database transactions is focused on how to begin a transaction in your code, where it rightly would be expected to be within a function or at least a “with” block. Hence, the docs and the code focus on using decorators, e.g. “@transaction.commit_on_success” or context managers, e.g. “with transaction.commit_on_success():”. Obviously not helpful in the shell / REPL.

If you are in your “manage.py shell” and need to do some romping around in your single-database Django app while being wrapped up in the warm fuzzy security blanket of a DB transaction lest you fat-finger something, you can get the same effect for your subsequent few commands in the shell with:

from django.db import transaction
transaction.enter_transaction_management()
transaction.managed(True)
# do stuff (my_models stands in for your own app's models module)
imp = my_models.ImportantObject(title="Emabrassing Tpyos In the Titel")
imp.save()
# oops
transaction.rollback()
# this is too stressful, let's quit
transaction.leave_transaction_management()

Caveats: this only works in a one-database-connection setup where using the default connection does what you want; Django 1.6 and later deprecated these functions in favor of transaction.set_autocommit() and transaction.atomic(); don’t trust my random blog post with your production data!

Python matrix initialization gotcha

If you want to spin up a list of lists — a poor man’s matrix — in Python, you may want to initialize it first. That way you can use indices to point directly (random access) into the matrix, with something like:

matrix[i][j][k]

without having to worry whether you’ve managed to make the matrix “big enough” through appending, looping, whatever.

If you are an idiot like me, you will skim StackOverflow and come away with the naive use of the “*” operator to create lists.

In [1]: lol = [[[None]*1]*3]*2

In [2]: lol
Out[2]: [[[None], [None], [None]], [[None], [None], [None]]]

That seems to work fine for our case — a small 3-D matrix (trivial in the third dimension I admit) initialized to None, the pseudo-undefined object of Python. Sounds good. Wait…

In [3]: lol[0][0][0] = 'asdf'

In [4]: lol
Out[4]: [[['asdf'], ['asdf'], ['asdf']], [['asdf'], ['asdf'], ['asdf']]]

Um. The * operator doesn’t copy the inner lists; it just repeats references to the same list object. Every slot in the outer levels points at one and the same [None] list, so changing it in one place changes it everywhere.

Facepalm.
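A quick way to confirm what is going on is to compare the inner lists with the "is" operator. This is just a sanity check on the examples in this post, with "fixed" as a throwaway name:

```python
lol = [[[None] * 1] * 3] * 2

# Every "row" and every "cell" is one and the same list object.
print(lol[0] is lol[1])        # True
print(lol[0][0] is lol[1][2])  # True

# Building each level with a comprehension makes a fresh list every time.
fixed = [[[None for k in range(1)] for j in range(3)] for i in range(2)]
print(fixed[0] is fixed[1])        # False
print(fixed[0][0] is fixed[0][1])  # False
```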

To do what you actually want to do, use the list comprehension syntax and leave the monstrosity of the * operator alone:


In [21]: lolfixed = [[[None for k in range(1)] for j in range(3)] for i in range(2)]

In [22]: lolfixed
Out[22]: [[[None], [None], [None]], [[None], [None], [None]]]

In [23]: lolfixed[0][0][0] = 'asdf'

In [24]: lolfixed
Out[24]: [[['asdf'], [None], [None]], [[None], [None], [None]]]
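If you need this for more than one shape, the comprehension can be wrapped up in a small recursive helper. The name make_matrix is my own invention, and this is only a sketch: the fill value should be immutable (like None), since the innermost level still uses * to repeat it.

```python
def make_matrix(dims, fill=None):
    """Build nested lists of the given dimensions; every list is distinct."""
    if len(dims) == 1:
        # Repeating an immutable value such as None with * is harmless;
        # a mutable fill value would be shared across the slots.
        return [fill] * dims[0]
    # Each pass through the comprehension builds a brand new sub-matrix.
    return [make_matrix(dims[1:], fill) for _ in range(dims[0])]


m = make_matrix((2, 3, 1))
m[0][0][0] = 'asdf'
print(m)  # [[['asdf'], [None], [None]], [[None], [None], [None]]]
```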

Django auto_now types wreck get_or_create functionality

I recently had occasion to lazily use the Django “get_or_create” convenience method to instantiate some database ORM records. This worked fine the first time through, and I didn’t have to write the tedious “query, check, then insert” rigamarole. So far so good. These were new records so the “or_create” part was operative.

Then, while actually testing the “get_” part by running over the same input dataset, I noticed it was nearly as slow as the INSERTs, even though it should have been doing indexed SELECTs over a modest, in-memory table. Odd. I checked the debug output from Django and discovered that get_or_create was invoking UPDATE calls.

The only field it was updating was:

updated_at = models.DateTimeField(auto_now=True)

Thanks, Django. You just created a tautological truth. It was updated at that datetime because … you updated it at that datetime.

Interestingly, its sister field did NOT get updated:

created_at = models.DateTimeField(auto_now_add=True, editable=False)

This field, true to its args, was not editable, even by whatever evil gnome was going around editing the updated_at field.

Recommendation: if you want actual timestamps in your database, why don’t you use a real database and put a trigger on it? ORM-layer nonsense is a surefire way to have everything FUBAR as soon as someone drops to raw SQL for performance or clarity.

Quick Django Foreign-key Gotcha

When your Django app poops out with:

'RelatedManager' object is not callable

The problem might be that you’re trying to call up related fields via a foreign key field type’s auto-generated reverse-lookup attribute — but mistakenly trying to call it as a method instead of as an attribute that holds a query set. In code terms, you put:

tracks = cd.track_set()

What you meant to do was:

tracks = cd.track_set.all()
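The error message itself is plain Python, nothing Django-specific, and can be reproduced without a database. The classes below are toy stand-ins I made up (not Django's real RelatedManager or a real model); they only mimic the shape of the problem.

```python
class RelatedManager(object):
    """Toy stand-in: has methods such as all(), but defines no __call__."""

    def __init__(self, items):
        self._items = list(items)

    def all(self):
        return list(self._items)


class CD(object):
    """Toy stand-in for a model with a reverse-lookup attribute."""

    def __init__(self, tracks):
        self.track_set = RelatedManager(tracks)


cd = CD(["Intro", "Outro"])

try:
    tracks = cd.track_set()  # wrong: the attribute is an object, not a method
except TypeError as exc:
    message = str(exc)
    print(message)  # 'RelatedManager' object is not callable

tracks = cd.track_set.all()  # right: call a method on the manager
print(tracks)  # ['Intro', 'Outro']
```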