Sohliloquies: Diving Deeper Into Deep Dream: Different Distortions

The moment you get Google Research's Deep Dream project set up, you can make it do incredible things. If you want to generate trippy zooms with lots of dogs and eyes, the code Google provides works more or less out-of-the-box.

For reference, here is a gist for a .py file containing all the code from the notebook. This is meant to be run through an interactive interpreter, e.g. by providing the -i flag at runtime or, in iPython, starting a new notebook and then executing e.g. "run demo.py".

And, here is a gist for a similar but extended piece of code which takes an input file on the command line and performs the iterative process described at the very end of the researchers' original iPython notebook. Both these files are almost entirely composed of code from that notebook; they're just provided as references, so we can have some well-established starting points from which to branch out.

Prelude

Cards on the table. I'm not here to teach you about neural nets or about Deep Dream. I'm here to show you some really fucking cool graphics (watch that link to the end, I promise it's worth it), and I'm here to give you some ideas about how you could make your own. If you aren't interested in the blow-by-blow, and want to hurry up and get to the blow-my-mind, just pop on down to the bottom of the post, where you'll find a list of every graphic linked to in the body here, compiled for your browsing pleasure.

Okay, with that out of the way, let's do a quick overview of how those gists work, just so we know where we're starting from. I'm going to just quickly walk through the latter one, because its functionality is basically a superset of the former's.

First, a Caffe model is loaded from a baked-in path. This is all fairly standard boilerplate. One important feature is that we require that the model be able to compute its own gradients. This is used later, in the optimization step.

Then we make a directory to store hallucinated (excuse me--dreamed) images in, load an image to start from, and we're off to the races. Inside the loop, we save our image, distort it, and apply a scaling factor. The image is represented internally as a multidimensional NumPy array, which makes it easy to transform. SciPy's ndimage library works well here, and is used in the code provided above.

The distortion step, which is where we actually consult our neural network, is done by deepdream(), which in turn delegates in part to make_step(). The interaction between the two is tricky to unravel, so suffice it to say that together they use gradient ascent to find small distortions ("details" in the parlance of the source comments) which maximize the increase in neural net layer activation. Or, in other words, they figure out small changes to the image which produce big changes in how much the neural net sees.
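To make the gradient ascent idea concrete, here's a minimal toy sketch. This is not the Caffe code from the notebook: a random matrix `weights` stands in for a network layer, the objective (half the squared norm of the activations) is an assumption picked for simplicity, and `objective` and `ascent_step` are hypothetical names. Dividing the step by the mean absolute gradient is one way to make `step_size` behave similarly no matter how large the raw gradient is.

```python
import numpy as np

def objective(x, weights):
    """Toy stand-in for 'how much the layer sees': 0.5 * ||weights @ x||^2."""
    return 0.5 * np.sum(weights.dot(x) ** 2)

def ascent_step(x, weights, step_size=1.5):
    """Take one normalized gradient-ascent step on the toy objective."""
    activation = weights.dot(x)
    grad = weights.T.dot(activation)      # gradient of the objective w.r.t. x
    scale = np.abs(grad).mean()
    if scale == 0:                        # flat spot: nothing to climb
        return x
    return x + (step_size / scale) * grad

rng = np.random.RandomState(0)
weights = rng.randn(8, 16)                # hypothetical "layer"
image = rng.randn(16)                     # hypothetical flattened "image"
stepped = ascent_step(image, weights)     # a small nudge that raises the objective
```

Each call nudges the input in the direction that raises the toy objective, which is the same shape of idea as nudging pixels to raise a layer's activation.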

The reason for zooming after deepdream() is so that we can keep finding new features. If the image was left unchanged, new and interesting features would be generated for a while, but eventually we'd settle into a basin of attraction and end up just sharpening preexisting features in minute ways. This is not necessarily a bad thing, but it could pose problems if you're hoping for a neverending stream of visuals.

Critically, zooming avoids falling into a basin by stretching the image, creating gaps within which new features can appear. The goal here is to make sure that we're never "too optimized", and thus things stay interesting. Note that if that's all we want to accomplish, lots of non-zoom distortions would also suffice. More on that later.
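The loop described above can be sketched with the dream step and the distortion passed in as plain functions, which makes it obvious where a non-zoom distortion would slot in. This is a sketch, not the gist's code: `dream_frames` is a hypothetical name, and the toy `dream` and `distort` lambdas in the usage lines are placeholders for deepdream() and the zoom.

```python
import numpy as np

def dream_frames(frame, dream, distort, n_frames):
    """Yield n_frames images, following the save / dream / distort order above."""
    for _ in range(n_frames):
        yield frame                # the caller saves this frame to disk
        frame = dream(frame)       # stand-in for the deepdream() call
        frame = distort(frame)     # zoom, or any other distortion

# Toy usage: "dreaming" adds 1, "distorting" doubles every pixel.
start = np.zeros((2, 2, 3))
frames = list(dream_frames(start, lambda f: f + 1, lambda f: f * 2, 3))
```

Swapping in a different `distort` is all it takes to try the transforms discussed later.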

The functions provided come with plenty of parameters to fiddle with and tweak. It can be interesting to see, for instance, how changing octave_n and octave_scale in deepdream() changes the vividness (and runtime) of a single dream iteration. I won't talk too much about this, because it's easier to see than to explain. You're encouraged to try it out yourself, perhaps in an iPython GUI notebook session.

Animation

The first trick is animation. It's super easy to generate a long series of stills, each of which is striking on its own, but it's even cooler to see them flow naturally in sequence. Since each frame is generated by applying the same transformation to the frame before it, there's a sort of continuity across frames which is visually striking. You can get some sense of what a dream would look like animated by opening a slideshow and holding down an arrow key.

But, if you want the real deal, animated GIFs are the way to go. Python support for this is tenuous -- Pillow says it supports writing GIFs, but the feature is completely undocumented, so I'm not going there. GIMP supports creating GIFs, but it's a bit of an involved process and the memory overhead is non-negligible.

The easy way is with ImageMagick. Download the ImageMagick suite through your package manager, then open a terminal in the folder where all your frames are. They should be numbered so that alphabetical sort puts them in the correct order -- the second gist given at the top of this post will ensure that. Then, run

convert -delay 7 -loop 0 *.jpg out.gif

and ImageMagick will take care of the rest. If you don't want your animation to loop after it finishes, use -loop 1, and if it's a bit slow for your taste, lower the delay (it's measured in hundredths of a second per frame). You can also reverse these animations, which can look seriously cool, with the seed image emerging from the surreal haze in a really neat sort of way. To reverse an already-created gif, use

convert out.gif -reverse reversed.gif

then open it and stare in awe.

Affine Transforms

Some really incredible visuals can result from rolling your own transforms to replace the zoom effect. The zoom happens on the following line:

frame = nd.affine_transform(frame, [1-s,1-s,1], [h*s/2,w*s/2,0], order=1)

which essentially just performs a matrix transformation at an offset. [1-s,1-s,1] is taken by nd to represent the nonzero entries of a diagonal matrix, and [h*s/2,w*s/2,0] provides offsets to keep the image centered. One simple change is to only stretch along one dimension. For instance,

frame = nd.affine_transform(frame, [1-s,1,1], [h*s/2,0,0], order=1)

will stretch the image vertically but not horizontally. Here's a before and after comparison to drive home just how different the end results are. The "before" animation actually had about twice as many frames, but I cut half of them off because they were just an infinite zoom into a sea of eyes and it was honestly kind of scary. The vertical stretch, on the other hand, kept going with beautiful imagery for an almost unbelievable 1200 frames. The animation I linked near the very start of the article is a reversed version of the "after" animation here.
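Both of the calls above are the same recipe with different scale factors, so it can be handy to wrap them in one helper. The sketch below assumes an (h, w, 3) image and uses a hypothetical name, `scale_about_center`. One thing worth knowing: affine_transform uses the matrix to map each output coordinate back to an input coordinate, which is why a factor of 1-s (less than 1) magnifies the image rather than shrinking it.

```python
import numpy as np
import scipy.ndimage as nd

def scale_about_center(frame, s_rows, s_cols):
    """Stretch an (h, w, 3) image by independent amounts along rows and columns."""
    h, w = frame.shape[:2]
    matrix = [1 - s_rows, 1 - s_cols, 1]              # diagonal entries; channels untouched
    offset = [h * s_rows / 2.0, w * s_cols / 2.0, 0]  # keeps the image centered
    return nd.affine_transform(frame, matrix, offset, order=1)

flat = np.ones((20, 30, 3))
zoomed = scale_about_center(flat, 0.05, 0.05)      # the original zoom
stretched = scale_about_center(flat, 0.05, 0.0)    # vertical stretch only
```

Passing 0 for one of the factors leaves that axis alone, which reproduces the one-dimensional stretch.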

I've been finding this scaled-down doge picture, by the way, to be a really nice image to use for experiments. It's small enough that you can bang out 50+ iterations super quickly. I'm going to keep using it as a reference point, so we can focus on the changes different distortions bring about more than what the net does with different images.

Matrices can get you just about any affine transformation, and so there's a lot of potential ground to cover there, but I'll let you experiment with that. If you want to provide a full matrix instead of a diagonal one, just replace the 1D list with a 2D one and nd will figure it out. I'd recommend trying a rotation matrix.
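Here's a rough sketch of what the rotation version might look like. The helper name `rotate_about_center` and the choice to pivot around the middle pixel are my own assumptions. The only fiddly part is the offset: to keep the center fixed, it has to be the center minus the matrix applied to the center.

```python
import numpy as np
import scipy.ndimage as nd

def rotate_about_center(frame, degrees):
    """Rotate an (h, w, 3) image about its center with a full 3x3 matrix."""
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    matrix = np.array([[c, -s, 0.0],
                       [s,  c, 0.0],
                       [0.0, 0.0, 1.0]])   # third axis is color: leave it alone
    h, w = frame.shape[:2]
    center = np.array([(h - 1) / 2.0, (w - 1) / 2.0, 0.0])
    offset = center - matrix.dot(center)   # makes the center a fixed point
    return nd.affine_transform(frame, matrix, offset=offset, order=1)

frame = np.random.RandomState(0).rand(21, 21, 3)
rotated = rotate_about_center(frame, 2.0)  # a small twist per iteration
```

Corners that rotate in from outside the frame come back as zeros (black) under the default boundary mode, so small angles per frame are probably the place to start.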

Crazy Transforms

But, of course, we need not limit ourselves to affine transforms! We have our image as a NumPy array -- there are any number of things we can do to it. I'll take the time to go into a few, but really, the world's your oyster. Let's go through a few ideas.

Let it roll

Here's one cool idea: np.roll can be used to "roll" a row or column in a multidimensional array so that all its elements are shifted "down" or "up", and those elements which fall off one end reappear on the other. For instance, np.roll([1,2,3,4,5], 2) returns array([4, 5, 1, 2, 3]). What if we shift each column in our image using this? Maybe that'll look cool.

We can define a function to decide how much each column should shift and then apply it in a loop. Strictly speaking it's not necessary to use a function, but I find that it helps as far as readability is concerned. Before the main loop, we put something along the lines of:

def shiftfunc(n): return n // 15

or any other function you can come up with, and then within the loop, we replace the affine transform with the following loop:

for n in range(frame.shape[1]):
    frame[:, n] = np.roll(frame[:, n], 3*shiftfunc(n))

This'll shift every column by the amount specified by shiftfunc, which is worthwhile to mess around with. The reason for multiplying by 3 in the main loop is that np.roll, when called without an axis argument, flattens the array before shifting it. Each column slice is a list of RGB triples, so if our shift value isn't a multiple of 3, red values slide into green slots (and so on), we end up mixing up the image's RGB data, and it just doesn't look good.
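An alternative to the multiply-by-3 trick is to tell np.roll which axis to roll along, so whole pixels move and the channels can't get mixed. Here's a small sketch, assuming an (h, w, 3) image; `roll_columns` is a hypothetical helper name.

```python
import numpy as np

def roll_columns(frame, shiftfunc):
    """Shift column n of an (h, w, 3) image down by shiftfunc(n) pixels, wrapping around."""
    out = np.empty_like(frame)
    for n in range(frame.shape[1]):
        # axis=0 rolls along the rows of the column slice, so RGB triples stay intact
        out[:, n] = np.roll(frame[:, n], int(shiftfunc(n)), axis=0)
    return out

frame = np.arange(4 * 2 * 3).reshape(4, 2, 3)
rolled = roll_columns(frame, lambda n: n)   # column 0 stays put, column 1 moves down by 1
```

For a single column slice, rolling by 1 with axis=0 gives the same result as rolling the flattened data by 3, so the two approaches agree.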

A roll shift like this keeps all of the image's original contents, and just moves them around, which means that you don't see too much change in image contents or color palette. It still is fun to look at, though. Food for thought: what about combining an effect like this with some sort of palette shift and palette flattening?

Of course, one can use just about any function. Here's one that uses sine waves, and here's the code for it. I like how this one turned out -- the shearing creates some interesting emergent properties, like larger dogheads becoming smaller ones as the constituent parts shear apart and some roll over the image's border.
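For a feel of what a sine-based shift function could look like, here's a hypothetical sketch. It is not the code from the gist mentioned above; the name `make_sine_shift` and the `amplitude` and `period` parameters are assumptions made for illustration.

```python
import math

def make_sine_shift(amplitude=10, period=40):
    """Build a shiftfunc whose shift varies sinusoidally with the column number."""
    def shiftfunc(n):
        # round to an int, since np.roll needs a whole number of positions
        return int(round(amplitude * math.sin(2 * math.pi * n / period)))
    return shiftfunc

shiftfunc = make_sine_shift(amplitude=10, period=40)
shifts = [shiftfunc(n) for n in range(80)]   # two full waves across 80 columns
```

A shorter period gives tighter ripples, and a larger amplitude gives more violent shearing.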

Stochastic shifts

Another idea: random shifts. Just have shiftfunc return randint(-5, 5). To see how that turns out, look here. It's interesting to see how thin, vertical features like legs are now more likely to appear, since they're closer to invariant across random column shifts. Fiddling with parameters like random number range, shift frequency (as demoed in the sine gist), and octave number would likely induce substantial changes in this sort of transform's output.

In particular, the current "shimmer" effect, while cool to look at, never takes us too far from the original image; this is because making the lower and upper bounds equal in magnitude (-5 and 5) means the shifts average out to zero, so the image jitters around but never strays too far from home. If the bounds are lopsided, we start to see some more interesting effects. For instance, this is the result of setting our lower bound to 0 instead of -5, so that the image can only ever shift down. I like how that one turned out, but it does end up "converging" after a while in a way that's strangely analogous to how unshifted images converge.
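The difference between symmetric and lopsided bounds is easy to see by just averaging the shifts. The sketch below assumes the standard library's random.randint, which includes both endpoints (the post doesn't say which randint was used), and `make_random_shift` is a hypothetical name.

```python
import random

def make_random_shift(low, high, seed=None):
    """Build a shiftfunc that ignores the column and returns a random shift."""
    rng = random.Random(seed)
    def shiftfunc(n):
        return rng.randint(low, high)   # inclusive at both ends
    return shiftfunc

jitter = make_random_shift(-5, 5, seed=1)   # symmetric bounds: the "shimmer"
drift = make_random_shift(0, 5, seed=1)     # lower bound 0: shifts only go down

jitter_mean = sum(jitter(n) for n in range(1000)) / 1000.0
drift_mean = sum(drift(n) for n in range(1000)) / 1000.0
```

The symmetric version averages out near zero, while the one-sided version has a steady downward drift on top of the noise.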

One interesting property of this transform is that the long-term stochastic behavior of the shift function isn't too different from just shifting the image down by a constant. One could go over the generated frames and shift them back up, but because the neural net doesn't recognize the roll operation's rollover, there would be an unavoidable seam.

A potentially fruitful direction to take these randomness experiments in would be to try making the bounds a function of the column number. For instance, maybe even-numbered columns get twice as big a range as odd ones. This might forestall the "convergence" effect discussed above. One might also want to try grouping adjacent columns into larger groups of e.g. 3 or 4, and shifting them as groups, to allow more locally emergent features.
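Both of those ideas fit in one small helper. This is an untested-on-real-dreams sketch rather than anything from the gists: `roll_column_groups` and `bounds_for_group` are hypothetical names, and the image is assumed to be (h, w, 3).

```python
import random
import numpy as np

def roll_column_groups(frame, group_width, bounds_for_group, rng=random):
    """Roll groups of adjacent columns together, with per-group random bounds."""
    out = np.empty_like(frame)
    for g, start in enumerate(range(0, frame.shape[1], group_width)):
        low, high = bounds_for_group(g)     # bounds depend on the group number
        shift = rng.randint(low, high)      # one shift shared by the whole group
        block = frame[:, start:start + group_width]
        out[:, start:start + group_width] = np.roll(block, shift, axis=0)
    return out

frame = np.arange(6 * 8 * 3).reshape(6, 8, 3)
# Even-numbered groups get twice the range of odd-numbered ones.
shifted = roll_column_groups(frame, 4, lambda g: (-10, 10) if g % 2 == 0 else (-5, 5))
```

Setting group_width to 1 gets you back to per-column shifts.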

Crazy shit

Okay, that's all well and good, but what about using something totally nutty for our distortion function? Earlier I promised that we'd explore this in depth, and that exploration wouldn't be complete without a couple of completely out-there selections. Here's one to get us started.

That's the result of using the Sobel operator, often used to help with edge detection, as our transform. Here's my code to plug it into Deep Dream. The way that edges move across the image is almost reminiscent of the emergent patterns in Conway's Game of Life. Just for funsies, here's that same Sobel gif, but in reverse and with no looping.
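For readers who want a starting point, here's one hypothetical way a Sobel distortion step could be written with SciPy. It is not the code linked above: the grayscale conversion, the gradient-magnitude combination, and the rescale to 0-255 are all assumptions, as is the name `sobel_transform`.

```python
import numpy as np
import scipy.ndimage as nd

def sobel_transform(frame):
    """Replace an (h, w, 3) image with its Sobel edge magnitude, scaled to 0-255."""
    gray = frame.mean(axis=2)                 # collapse RGB to one channel
    gx = nd.sobel(gray, axis=1)               # horizontal gradient
    gy = nd.sobel(gray, axis=0)               # vertical gradient
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak > 0:                              # avoid dividing by zero on flat images
        magnitude = magnitude * (255.0 / peak)
    return np.repeat(magnitude[:, :, np.newaxis], 3, axis=2)  # back to 3 channels

frame = np.zeros((16, 16, 3))
frame[:, 8:] = 255.0                          # a single vertical edge
edges = sobel_transform(frame)
```

Swapping nd.sobel for nd.prewitt gives the Prewitt variant with no other changes.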

Other filters like the Prewitt operator can be used to produce similar effects. Take a look. Interestingly, this one falls apart way more quickly than the Sobel one does.

Since these filters emphasize edges, and since earlier layers in a neural net also emphasize edges, one might be tempted to combine the two. It turns out that very quickly their feedback erases any sign of the original image. Here's the result of setting end='inception_3b/5x5_reduce' in the Sobel code above. The reader is encouraged to experiment with different output layers for any or all of the transforms discussed.

Sobel, Prewitt, & co are nice, but they lose all color data. What about transforms that retain more of the original image? I haven't dived too deep yet into the possibilities here. Some ideas to ponder: Gaussian blur (here's the best I've come up with -- nothing too special, but look, the doge sprouts legs!), dividing an image into subregions and performing distinct affine transforms on each, weird color shifts, combining hardcore high-octave-count, high-iteration-count deepdream steps with classic glitch effects (e.g. pixel sorting), using roll effects but alternating between horizontal and vertical, and so on. The sky's the limit! If this post gave you a super cool idea, be sure to leave a comment.
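As a starting point for the Gaussian blur idea, here's a minimal sketch (again, not the code behind the linked result; `blur_frame` is a hypothetical name). The one thing to watch is the sigma for the color axis: setting it to 0 blurs across pixels but not across channels, so the colors don't get averaged together.

```python
import numpy as np
import scipy.ndimage as nd

def blur_frame(frame, sigma=1.5):
    """Blur an (h, w, 3) image spatially while keeping the color channels separate."""
    return nd.gaussian_filter(frame, sigma=[sigma, sigma, 0])

frame = np.zeros((15, 15, 3))
frame[7, 7, 0] = 255.0            # one bright red pixel
blurred = blur_frame(frame)
```

The red dot spreads into a soft blob, and the green and blue channels stay empty.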