We need to talk about K = viN…

Every tech entrepreneur dreams about making a product that goes viral, but true virality is as rare as a unicorn and poorly understood. A proper understanding of the viral coefficient, K, can give insights that enable a more systematic approach to maximising it. Because as far as virality is concerned, K is hiding something pretty important…

I’m going to distinguish different types of virality and go into detail about the various ways to model it. I’m also going to introduce the idea of “pre-virality” and why it shouldn’t be discounted. The stakes are high: viral growth means companies can achieve explosive popularity and success without spending a penny on marketing. However, unless you have a maths background, virality can be quite counter-intuitive. (Beware, this article contains equations!)

My experience of viral growth

I’m no stranger to virality. I’ve been watching and playing with viral experiments since 1999, and I’ve witnessed abject failures as well as surprising successes. My interest began with setting up an experimental R&D team within moonfruit.com (a dotcom started by my two besties from Imperial, Wendy and Eirik). My team built intelligent agents to help people discover the web by learning how and what they searched for. The agent project never saw the light of day (funding was pulled), but along the way we released two interactive desktop toys to test out some of the cool tech we were building, thinking they would make good viral branding experiments. Neither went viral; instead they both quickly died out upon release. That was the end of that.

A few years later, I noticed that suddenly everyone, it seemed, was inviting me by email to sign up to a web service called Birthday Alarm. It was clearly going massively viral. I didn’t know it at the time, but it turned out this was the brainchild of one of my physics classmates from Imperial. Michael had been experimenting both systematically and persistently, and finally hit a jackpot. He went on to create and sell Ringo and then Bebo, so it was well worth the effort.

Then in 2009, Wendy masterminded a simple but ingenious Twitter competition which went so viral that at one point more than 2% of all tweets globally contained the #moonfruit hashtag. Twitter actually pulled the campaign, it was so disruptive. Interestingly, copycat competitions failed to go viral.

More recently, I had a surprising, albeit smaller, viral success in 2015 when an article about how to learn a language quickly, which I’d written on Quora, suddenly started sending so much traffic to my latest startup, Kwiziq, that we upgraded our servers to cope. It was something I’d written months before and completely forgotten about. Unbeknownst to me it had been growing slowly and then suddenly, boom!

Virality is no easy thing to achieve – and certainly comes with no guarantees – but all my experience has shown me that it is possible to approach it systematically and tip the odds in one’s favour. I decided to write this article to pull together all my own thoughts and learnings over the last sixteen years because at Kwiziq we are trying to grow as virally as possible.

The classic viral loop

This is how the classic “viral loop” is generally viewed:

viral-loop

Intuitively, we see that, if the average number of people invited by each user, N, multiplied by the percentage that accept the invitation, i, is greater than 1, then a positive feedback loop is created since each user effectively generates at least one more.

For this loop then, we define the viral coefficient as:

 K = i N 

This is a great starting point for understanding virality, but as we will see, the problem with it is that it obscures more than it reveals. As a result, it gives almost no insight into what to do to increase K. We also mustn’t simply accept, without question, that this is a good model for the realities of word-of-mouth sharing, because a bad model will misguide us. We need to understand the viral process in more detail.

Viral Models

“All models are wrong,” as we physicists like to say, “but some are useful.”

How we model virality for a product depends very much upon when and how people can and do share it or invite their network. Models are idealisations and therefore always wrong in some way, but they have two very important roles to play in a startup:

  1. They can help predict growth, but more importantly:
  2. They give us insights that can guide product and process design

So the choice of model is rather important. There are two popular models which display very different behaviour and yet both mathematically model the same loop.

The Steady Sharing Model

The simplest model (often the starting point for more sophisticated models used by viral blogger Andrew Chen) bases growth on the idea that the user base will always bring in a proportion, k, of new users (via sharing) every viral cycle.

Note that we don’t actually need to know what the viral cycle length is, but we’ll see how important it is later.

If we start with a number of users,  U_{0} , then the total number of users we have after one viral cycle,  \Delta t , is:

 U_{\Delta t} = U_{0} K

where  K = 1 + k

e.g. if k = 15% growth, then K = 1 + 0.15 = 1.15

The number of users after an arbitrary time, t, is just this formula applied iteratively  t / \Delta t times:

 U_{t} = U_{0} K^{(\frac{t}{\Delta t})}

This formula acts exactly the same as compound interest on a principal amount. So long as  k is more than zero, you’ll get some growth. But the model is only valid if users do actually continuously share.
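To make the behaviour concrete, here’s a minimal Python sketch of the Steady Sharing formula (purely illustrative, not code from any particular analytics stack):

# Steady Sharing: the whole user base brings in a fraction k of new users every cycle.
def steady_sharing(u0, k, cycles):
    K = 1 + k
    return u0 * K ** cycles

# e.g. 1,000 seed users growing 15% per viral cycle
print(steady_sharing(1000, 0.15, 20))  # roughly 16x the seed base after 20 cycles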

If we ignore network saturation for now – i.e. if we assume that for the first twenty cycles we still have plenty of room to grow – then Steady Sharing looks like this for varying values of K:

steady-sharing-model-k-behaviour

At very low values, there’s minimal contribution to growth.  Effects like this would easily be masked by any inbound or paid marketing channels. As K grows though, since the growth is exponential, the viral channel would soon make itself very clear indeed.

The Big Bang Theory

David Skok proposes a replacement model for Steady Sharing which I call the “Big Bang” viral model. In this model, new customers invite or share with all the people in their circle of influence in one go during their first cycle, but then stop. The formula for growth in this model is slightly more complex, and  K here has a slightly different meaning.

If we start with a number of users,  U_{0} , and every new user subsequently brings  K new users within one viral cycle,  \Delta t , then the number of new users after one cycle is:

 U_{\Delta t} = U_{0} K

bringing the total users to  U_{\Delta t} + U_{0} .

(Note that K doesn’t have to be an integer – if a thousand users bring 33 more, say, then K would be 0.033).

Since only new users bring new growth, the equation for the number of users after time t is a little more complex:

U_t = U_{0} \cdot {K^{(\frac{t}{\Delta t}+1)} -1 \over K - 1}

(if you’re curious why it’s that, there’s a nice derivation here).
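In case that link ever goes stale, the gist of the derivation is just a geometric series: each cycle’s crop of new users is K times the previous cycle’s, so summing the generations up to n = t / \Delta t cycles gives

 U_{t} = U_{0}(1 + K + K^{2} + \dots + K^{n}) = U_{0} \cdot {K^{n+1} - 1 \over K - 1}

which is the formula above with  n = t / \Delta t .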

At large values of K, the two models behave in an increasingly similar manner (Steady Sharing lags by one cycle); however, for low values (compare the graphs on the left for both models), the two mechanisms are starkly different:

big-bang-model-k-behaviour

A K value of exactly 1.0 produces linear growth, and above 1.0 produces exponential growth, but below 1.0 this type of sharing dies out: the number of users reached approaches a limit.
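A quick sketch of the Big Bang formula makes that limit easy to see (again Python, purely illustrative; for K < 1 the total stalls at U_0 / (1 - K)):

# Big Bang: only each cycle's new users invite, each bringing K more.
def big_bang(u0, K, cycles):
    if K == 1:
        return u0 * (cycles + 1)  # the geometric series degenerates to linear growth
    return u0 * (K ** (cycles + 1) - 1) / (K - 1)

print(round(big_bang(1000, 0.5, 20)))    # ~2000 users...
print(round(big_bang(1000, 0.5, 1000)))  # ...and still ~2000: growth stalls at u0 / (1 - K)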

Models versus reality

So, which of these models is true?

Neither. And both.

Skok claims the Steady Sharing Model is wrong, but it’s no more wrong than any other model. Which model is more accurate and applicable depends on the type of product, the user behaviour, and how sharing is enabled. Indeed, it’s possible for one, both or neither to apply.

Some products naturally support steady sharing – indeed, multiple opportunities for users to share with the same people: content-based products like YouTube, for example. Others encourage Big Bang invites of the user’s entire contact list, all in one go.

The Big Bang model is a very good fit for one-offs: articles, books, songs, videos etc. – things that people experience once and where there’s an immediate desire and opportunity to share but little incentive to return after. It’s also a good fit for products that haven’t achieved strong stickiness yet or where only one opportunity to invite people is given.

How do these models match reality though?

Learning from real examples

I don’t have enough data or insight for the moonfruit competition or Birthday Alarm, so I’ll pick two of my own examples.

Example #1 Desktop Toys

These were simple .exe files that we sent to a seed base of users – by email – and tracked the spread via code in them that pinged our servers when they were activated. They pretty much followed the Big Bang model exactly, with a K value somewhere less than 0.5 (I can’t recall the seed user number exactly, but we didn’t get a large multiple of the seed users). We weren’t expecting miracles – we knew that sending a .exe install file was a pretty hefty ‘payload’ for a viral to carry and that a proportion of people would be suspicious (even though it was a harmless install).

We were disappointed, naturally, but if I knew then what I know now, I’d have had a better idea of how to measure and increase K.

Example #2 Quora article

The Quora article also went viral in very much a Big Bang way, at first anyway, only slowing when it reached saturation in the primary network – roughly a million views (reaching what Chen calls the network’s “carrying capacity”).

However, the growth didn’t slow to zero as predicted by Big Bang.

In fact, it went pretty linear and still garners about 5-6k views a month which I’m more than happy with as it generally keeps me top of this list and continues to drive traffic to Kwiziq. In reality, it didn’t follow either model closely.

Why?

All models are wrong. In the real world, things are fantastically more complicated than our simple models. In reality, it’s quite likely that some users will be Steady Sharing (I know several teachers who share it on an ongoing basis, for example), others will Big Bang share when they encounter it, and others still will do both. It’s possible to be a hybrid of both models.

Furthermore, and this is vital to understand, the source of new users (or readers in this case) is not limited to those generated by the sharing of the article. New readers are being brought in by Quora (internally via emails they send out highlighting popular articles, and via external search). They also get traffic from links from syndicated versions of the article such as these Huffington Post and Business Insider versions.

Neither of the models accounts for sources external to the viral cycle. In both, though, this would add an extra boost to the number of users in any given cycle.

Finally, neither accounts for multiple sharing mechanisms.

The article gets shared by email, Facebook and Twitter as well, to name a few (Twitter being the most obvious to me as I often get an @gruffdavies mention).

So in fact, instead of one value for K, we really have one for each channel, and they will be different.

The important learning from this is that, even if the sharing mechanism is Big Bang and has a K lower than 1.0, if there are other mechanisms for introducing fresh users each cycle, then K becomes a magnifier.

Pre-virality

If we have external sources of users (paid or inbound marketing, for example), then values of K even lower than 1.0 are potentially still hugely valuable. Values between 0.5 and 1.0 are what I call pre-viral and still well worth attaining, because each user then multiplies into more, even though the effect is self-limiting:

K value    Multiplying Effect
0.5        200%
0.7        333%
0.8        495%
0.9        891%
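These multipliers appear to be the Big Bang totals over the twenty cycles plotted earlier, i.e. the geometric sum (1 - K^21)/(1 - K), which tends towards 1/(1 - K) as the number of cycles grows. A quick sketch to reproduce them, assuming that’s the calculation:

# Reproduces the table above, assuming each multiplier is the Big Bang total
# over ~20 cycles; it approaches 1 / (1 - K) as the number of cycles grows.
for K in (0.5, 0.7, 0.8, 0.9):
    multiplier = (1 - K ** 21) / (1 - K)
    print(f"K = {K}: {multiplier:.0%}")  # 200%, 333%, 495%, 891%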

Using models as a guide to design

Models aren’t just good for telling us how a product or service might spread, but can guide our tactics and strategies underpinning growth. They can and should guide product and process decisions. 

It’s clear that non-viral sources of acquisition should not be overlooked. Any web-based service is going to have SEO and paid marketing channels – and these channels can help to continuously feed viral growth mechanisms even when the K values are low. It’s also clear that, if possible, steady sharing mechanisms are going to have a huge benefit in the long-term.

There are obvious major learnings to be had immediately from these models, and a third which may be less obvious:

  1. Big Bang is better than Steady Sharing for high values of K, but Steady Sharing produces continual growth at low values; therefore the ideal scenario should support and encourage a hybrid of both.
  2. In Steady Sharing for all K, and in Big Bang for K > 1.0, reducing the sharing cycle time has an extraordinary impact on growth because growth is exponential in the number of cycles. Optimise for the shortest possible cycle times.
  3. K is actually the combined result of multiple steps of a funnel. Each step reduces K by a factor. So decompose K into its components and then optimise each.

All this brings me to the title of this blog post.

We really need to talk about K = viN

Kwiziq already supports multiple sharing mechanisms and one invitation mechanism (teachers, tutors or study group coordinators can invite students to join). We haven’t supported users-inviting-users yet, but this is high on our agenda. However, it seemed obvious to me that, with respect to virality, supporting only one language right now dramatically reduces K.

The classic viral loop is missing a vital detail: relevance to the user. A user won’t even think about accepting an invitation unless the offer is relevant to their needs.  In our case, if they aren’t learning French then they won’t consider accepting. So the viral loop really looks like this:

viral-loop-including-relevance

Only a percentage of the users invited, v, will find the product relevant, of which a further percentage, i, will accept, which gives us:

 K = v i N 

Very high relevance is a key reason that Michael had such success with Birthday Alarm – everyone has a birthday and everyone has people whose birthdays they want to remember. It’s the reason moonfruit’s Twitter campaign was so successful – almost everyone wanted a MacBook, and the “cost” of retweeting a hashtag for the chance to win one was negligible to the user.

We estimate that a third of the world speaks another language badly (essentially our market), which puts an absolute upper limit on v. In reality, though, it’s currently much smaller: it’s whatever proportion of people are interested specifically in learning French. So clearly, offering as many popular languages as possible has the potential to multiply the current value of v (and therefore K) several-fold. By comparison, any efforts to optimise the acceptance rate, i, are going to be limited and far harder to achieve per percentage point than expanding into new languages.

We already know it’s possible to achieve virality with more languages, since at least three of the major players in the language learning space grew virally to 30M – 150M users over the space of just a few years after they started offering the most popular languages.

Analyse Feasibility: the Kevin test

The K = viN equation lets us do a decent feasibility study on whether we even stand a chance of making a product grow virally. Let’s plug some values in. We first need to know the maximum potential values for each factor, based on our knowledge of the product and the market. Note that these will be way higher than anything we could actually achieve; we’re just looking for an absolute ceiling, because if even this figure isn’t viral then we can’t ever hope to get there. We’re looking for a ceiling value of at least ten, because realistically we might only achieve 10% of the ceiling (and that 10% needs to be at least 1.0 for virality), or there’s just no point. Next we’ll use what we think our current starting values could be, and then target values that we hope are attainable. For N, I’m going to use the median number of Facebook friends people have, as this is a published figure: about 200. The mean figure is higher at 338, but we should always use conservative estimates.

For Kwiziq, then, the K = viN feasibility analysis looks like this:

                 Relevance (v)   Invite Acceptance Rate (i)   Average Network Size (N)   K = viN
ceiling :            33%                   50%                         200               33 !!
est. starting :       5%                    5%                         200               0.5  pre-viral
target :             15%                   10%                         200               3  hugely viral

So, Kwiziq passes the “Kevin feasibility test”.
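The arithmetic behind the table is just three multiplications; a throwaway sketch if you want to plug in your own numbers:

# K = v * i * N for each scenario in the table above
scenarios = {
    "ceiling":       (0.33, 0.50, 200),
    "est. starting": (0.05, 0.05, 200),
    "target":        (0.15, 0.10, 200),
}
for name, (v, i, N) in scenarios.items():
    print(f"{name:>13}: K = {v * i * N:g}")  # 33, 0.5 and 3 respectively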

Detailed viral analysis

This is just the beginning of what we must do in analysing the viral loop though. It behooves us to understand every single step in the funnel, whether it’s a real step involving a physical action, or a step in the user’s decision process in their head. What we need is something reminiscent of the Drake Equation.

Here’s an example as a starting point for a specific product’s viral loop, but in real life this should be an ongoing process of identifying and optimising every possible step of attrition that can reduce  K, including anything in the UX itself that could inhibit sharing at the top of the cycle.

kwiziq-viral-loop

There are, fairly obviously, a great many potential UX improvements between a user joining, wanting to invite others, and being easily able to do so, and these will be worth careful attention.
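To make the Drake-Equation idea concrete, here’s a hypothetical decomposition in Python. The step names and conversion rates are entirely made up for illustration; the point is simply that K is a product of per-step factors, so the weakest factor is usually the best place to start optimising:

# Hypothetical funnel: each factor a user must pass through multiplies K.
# All names and numbers below are invented for illustration only.
funnel = {
    "notices the invite feature": 0.60,
    "decides to invite":          0.40,
    "people invited (N)":         20,
    "finds it relevant (v)":      0.15,
    "accepts the invite (i)":     0.10,
}

K = 1.0
for step, factor in funnel.items():
    K *= factor
print(f"K = {K:.3f}")  # 0.072 with these made-up numbers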

Measure

“If you cannot measure it, you cannot improve it” – Lord Kelvin

One final piece of advice: measure everything. The minute you have a measure to track, you’ll start thinking of ways to improve it.

You may not always have direct measures (how do you measure real word-of-mouth, for example?) If, like Kwiziq, your product supports many different acquisition types and channels, and is a hybrid viral model, then you need to look for indirect measures that will indicate when they are working.

I would suggest you track something like the ratio of New Paying Users to New Registrations per week/month. This is especially revealing because paying users will contain most of your “sneezers” (they like your product enough to pay for it, right?). New users coming through viral channels will make this metric rise non-linearly at the beginning (it will eventually become asymptotic towards 100% if viral takes over). Your efforts to improve UX and address anything that’s reducing K ought to be visible here early. If your paying users are growing faster (at higher order) than new users, you’re doing something right!

For a “freemium SaaS” product like Kwiziq, measuring the ratio of new paying users to all new users will reveal any viral contributions to growth before they become apparent in your overall acquisition measures.
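A sketch of the kind of tracking I mean, with invented numbers purely to show the shape of the calculation:

# Ratio of new paying users to all new registrations per month.
# The figures are invented; plug in your own from your analytics.
months         = ["Jan", "Feb", "Mar", "Apr"]
new_paying     = [40,    55,    80,    130]
new_registered = [1000,  1050,  1150,  1300]

for month, paying, registered in zip(months, new_paying, new_registered):
    print(f"{month}: {paying / registered:.1%} of new users are paying")
# A ratio rising faster than linearly is an early hint that sneezer-driven
# (viral) acquisition is starting to kick in.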

NB: you will need to understand and analyse any other mechanisms that can cause non-linearity here to be sure, but over time, a metric like this ought to detect viral growth well before it becomes visible in your overall acquisition metrics.


Good luck!

P.S. The flow diagrams in this blog post were created in seconds using Pidgin.

Use Pidgin for faster Flowcharts and Relationship Diagrams

I’ve released a new version of Pidgin (still in beta) this weekend, with some cool new features. Although I’ve used Pidgin in posts before, I’ve never actually blogged about its original release, for lack of time.

Pidgin is a (free) tool for people who, like me, aren’t very good at drawing and would rather a computer do that for them. If diagram tools are more of a fiddle than a help to you, then you might prefer Pidgin too.

Pidgin is a simple language for describing graphs, which it then draws for you, so you can focus on jotting down your thoughts instead of battling some drawing interface. It can now draw two useful types of diagram: Entity Relationship Diagrams and (new this week) Flowcharts.

Entity Relationship Diagrams

These are especially useful to software engineers, but in fact any hierarchy, tree or object graph can be drawn, so they’re good for all sorts of things. In Pidgin, stick to the singular form of each thing and then describe how they’re related, like this:

Car contains one Engine
Car uses Petrol
Engine contains many SparkPlug
Petrol is Fuel

Pidgin will then draw it for you:


Notice that has, uses and contains have a special significance in Object Oriented Programming, and they’re represented with symbols stolen from UML. If you just want a simple hierarchy, then stick to using has. You can also wrap entities in speech marks if you need more than one word to describe them. Here’s an example planning the menu structure for a website (but it could just as easily be an org chart):

"Homepage" has "Footer"
"Homepage" has "Top Menu"
"Homepage" has "Homepage Body"
"Homepage Body" has "Blog News Summary"
"Footer" has "Privacy Policy"
"Top Menu" has "1. About Us"
"Footer" has "Contact Us"
"1. About Us" has "1.1 History & Mission"
"1. About Us" has "1.2 Staff"
"1. About Us" has "1.3 Blog Recent Posts"
"Top Menu" has "2. Program & Services"
"2. Program & Services" has "2.1 Overview"
"2. Program & Services" has "2.2 Great Programme A"
"2. Program & Services" has "2.3 Great Programme Z"


You can create a wide variety of useful relationship diagrams like this, pretty much as fast as you think – it’s a real time saver. See more examples of Entity Relationship Diagrams made with Pidgin.

Flowcharts – new!

Pidgin now supports a very simple flowchart syntax so you can describe processes. I haven’t bothered with the usual diamond, round and square box styles as I’ve always thought they make flowcharts messy and hard to read, and they don’t add anything. Pidgin flowcharts are simple and clean to read. You just write then between steps, and if you need a conditional step, simply write the condition before then, like this (the condition can be anything):

EnterCar then StartEngine
StartEngine then "Did it start?"
"Did it start?" yes then DriveHome
"Did it start?" no then "Try again?"
"Try again?" yes then StartEngine
"Try again?" no then Panic
DriveHome then Done

Notice I’ve wrapped the conditional steps in quotes. This isn’t just because they’re more than one word: currently you must use speech marks if you want to use a question mark.


Editing Pidgin – new code editor

The code editor has some powerful new features. If you double-click an entity, the other instances are highlighted too. You can CTRL-double-click some or all of them to edit more than one name at once, which is super handy. To rename all instances of a token, use CTRL-H for search and replace. (Be aware Pidgin is case sensitive, so MyEntity is not the same as myEntity – you’ll end up with a wonky diagram if you mix them up.)

pidgin-token-highlight-and-multi-select

Pidgin is still very much in beta, so save your text and graphs if they’re important, just in case. It now catches many basic syntax errors and gives line numbers (but it doesn’t catch everything yet and sometimes you just won’t get a graph if you make a mistake):

pidgin-basic-error-help


New Display Options

There  are two new display options for graphs: layout and privacy. 

Private graphs let you elect to hide a graph from public lists (although, in fact, no graphs are listed anywhere yet, but this is something I intend to support in the future since it’s nice to share work.)

pidgin-display-options

Layouts are left to right (horizontal) by default; vertical layouts are more readable for certain graphs:


That’s pretty much it for this release!

If you find it useful, or find any bugs or have any feedback, do let me know.

Gruff

Make your own AI Art with Deep Neural Networks

I’m seriously rubbish at drawing and painting, but who needs arty fingers when you can whip up your own AI to make fine art for you?

Back in 2010, when The Looking Glass Club was published, my then-agent had complained that the technology in 2035 seemed too advanced for just a few decades hence, but despite justifying this with The Singularity, even I’m surprised by just how much AI has advanced in the five years since. I gave a talk about AI at Imperial a few months ago, and demo-ed how Deep Neural Networks are starting to get scarily good at things that, until now, have remained firmly in the realm of human capability, like learning to play Breakout better than you without knowing anything about the game beforehand, and imagining trippy works of art.

Yesterday, I came across @DeepForger (a Twitter bot by @alexjc) that paints impressionistic versions of your photos by copying the style and techniques of famous artists. How’s that for living in the future?  It painted this amazing portrait of me in the style of Picasso:

gruff-picassoed

Okay, it’s not quite Picasso, but it’s really quite remarkable. It’s understood the stylistic essence of the art piece it was given and created something in a similar vein!

This extraordinary AI is an implementation of a neural art algorithm that was only published at the end of August!

Unsurprisingly @DeepForger has a long queue of commissions so I decided to have a go at creating my own AI Artist that I could whip into making art at will. If you want to do the same, here’s how I did it, and what I learned in the process, so that hopefully you can avoid some of the mis-steps I took.

I’m working on a setup that can handle bigger files as I write this, but here’s an Escher-inspired memorial of mum’s cat Misty, who sadly passed away a few days ago:

escher-misty

How to create an instance of “Neural-Style” on an Ubuntu VM

Neural-style is one of several implementations of Leon Gatys et al.’s neural art algorithm on GitHub. If you know what you’re doing, and already have an Ubuntu server or VM, that link should be enough to get you started.

I’ve been experimenting with three different setup flavours:

  1. An Ubuntu VM running on VirtualBox on my laptop
  2. A more powerful Ubuntu VM created as a droplet on DigitalOcean
  3. An AWS GPU instance

I’ll give detailed steps here for the first two configurations, and you can find steps for setting up an AWS GPU instance nicely documented here. (UPDATE: the AWS+Docker implementation caused me too many issues, but I finally got a GPU-accelerated instance up and producing amazing results; I’ll post the steps at the end.) First though, here are some important things that can save you some pain, and why I tried all three:

Learnings/Gotchas – read these first!

The two main causes of pain were lack of memory and directory permissions.

Neural-style needs at least 4GB of RAM on the Ubuntu instance if you use the default settings. You can easily brick your VM if you don’t have enough (VirtualBox hangs and goes into Guru Meditation mode, and you have to start over). It has a mode which only needs 1GB of RAM but it doesn’t give good results AFAICT.

You can get it to work with less RAM by setting the image_size switch to either 256 or 128 pixels (default is 512) but then the images are too small to be good.

If you’re using VirtualBox you can probably give the VM about 45% of the host RAM safely – more than this and you’ll run into trouble. This is why I ended up trying DigitalOcean and AWS, although I did eventually get a version working on my laptop too.

If you use Vagrant to set up your VM then make sure you run all the commands using sudo to avoid permission errors which can be arcane and hard to figure out.

The default mode of neural-style is optimised to use a GPU but there’s a switch to force CPU.

I found that VMs with lots of cores (whether CPU or GPU) didn’t make the code run any faster, so there’s no point renting a cloud VM with loads (e.g. AWS g2.2xlarge is enough). GPUs do run the code substantially faster than CPUs though (because matrix ops are done directly on the chip).

Setting up a VM on DigitalOcean is far simpler than AWS, but they don’t offer GPU virtualisation so they are quite slow, no matter how many CPUs you go for.

Step 1 – Spin up a fresh Ubuntu VM

If you use Vagrant and VirtualBox:

  1. On your host create a new directory for your project and then in powershell or cmd run vagrant init ubuntu/trusty64
  2. vagrant up
  3. You can either increase the memory before step 2, using the Virtual Manager, or poweroff and do it after checking your VM works.

Alternatively, if your home machine isn’t powerful enough, you can use DigitalOcean to create a cloud VM. Just create a new droplet with at least 4GB (preferably 8GB) based on Ubuntu 14.04 and SSH in. (Remember to use sudo for everything if you’re not root, which you won’t be in a VirtualBox machine.)

Then, follow these commented steps in turn to configure your VM (you have to run each step separately; this isn’t a script, but if anyone fancies knocking up a Chef recipe to do this, do share!)

# you'll need git
sudo apt-get install git

# 1. install lua package manager
sudo apt-get install luarocks

# 2. install lua just-in-time compiler 
sudo apt-get install luajit

# 3. install torch - a scientific computing framework for luajit
curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-all | bash

# 4. protocol buffers - google's data exchange format
sudo apt-get install libprotobuf-dev protobuf-compiler

# 5. install caffe - a deep learning framework
sudo luarocks install loadcaffe

# 6. to handle various image formats
sudo luarocks install image

# 7. a neural network library
sudo luarocks install nn

# 8. and finally, get "neural-style" from github repo
sudo git clone https://github.com/jcjohnson/neural-style.git

# 9. go into the directory you've just cloned
cd neural-style

# 10. Download the neural models that contain the art genius - this step may take a while
sudo sh models/download_models.sh 

You should now be almost ready to make some art. You may also want to set up another shared folder for image resources, but I was lazy and just used the vagrant folder on my host (which is shared by default) and copied stuff in and out of this directory as needed.

You will need a source art image for the style (in fact, you can use more than one but start simple) and a photo that you want to artify.

A heads-up

Before you start, here’s what to expect if everything runs smoothly: we’ll use a verbose flag (-print_iter 1) so you should see steps as they happen. If things go quiet for more than 30s or so, check the state of your VM using the VBox manager to make sure it’s alive – if not, it’s probably run out of memory.

When neural-style is running, it starts by creating convolution layers which you’ll see in the output, and then after a few more steps, it should start producing image iterations. You need between 1,000 and 2,000 iterations for a good image. If you don’t see output like this with iterations every few seconds then something has gone wrong:

conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
Setting up style layer          2       :       relu1_1
Setting up style layer          7       :       relu2_1
Setting up style layer          12      :       relu3_1
Setting up style layer          21      :       relu4_1
Setting up content layer        23      :       relu4_2
Setting up style layer          30      :       relu5_1
WARNING: Skipping content loss
Running optimization with L-BFGS
Iteration 1 / 1000
  Content 1 loss: 2094767.343750
  Style 1 loss: 26350.109863
  Style 2 loss: 4467230.468750
  Style 3 loss: 1869238.085938
  Style 4 loss: 90886093.750000
  Style 5 loss: 12086.410522
  Total loss: 99355766.168823
   creating recyclable direction/step/history buffers
Iteration 2 / 1000
  Content 1 loss: 2094763.906250
  Style 1 loss: 26350.109863
  Style 2 loss: 4467230.468750
  Style 3 loss: 1869238.085938
  Style 4 loss: 90886087.500000
  Style 5 loss: 12086.410522
  Total loss: 99355756.481323

Finally: let’s make some AI art!

If you want to do a small test run with safe options, try this (replacing the two image filenames with your real files):

# small fast images
sudo th neural_style.lua -style_image ORIGINALARTISTIMAGE.jpg -content_image YOURPHOTO.jpg -gpu -1 -image_size 128 -print_iter 1

If that works, then you can try something larger:

# default image size is 512 anyway, takes more memory the larger the file
sudo th neural_style.lua -style_image ORIGINALARTISTIMAGE.jpg -content_image YOURPHOTO.jpg -gpu -1 -image_size 512 -print_iter 1

If you absolutely have to work with less than 3GB available then you can change the optimiser but to get good results you’ll need to experiment A LOT more, and normalising gradients is recommended:

sudo th neural_style.lua -style_image ORIGINALARTISTIMAGE.jpg -content_image YOURPHOTO.jpg -gpu -1 -image_size 512 -print_iter 1 -optimizer adam -normalize_gradients 

UPDATE – AWS GPU-accelerated version with CUDA:

Docker seemed to complicate everything for me, so I went back and unpicked the dependencies so I could use the plain install with AWS.

Here are the modified steps:

# follow these steps for AWS/GPU 
# http://tleyden.github.io/blog/2015/11/22/cuda-7-dot-5-on-aws-gpu-instance-running-ubuntu-14-dot-04/

# then follow ONLY the mount GPU steps here (not the docker stuff)
# http://tleyden.github.io/blog/2015/11/22/running-neural-style-on-an-aws-gpu-instance/
# and then the steps later marked optional

# you'll need git
sudo apt-get install git

# install lua package manager
sudo apt-get install luarocks

# install lua just-in-time compiler 
sudo apt-get install luajit

# install torch - a scientific computing framework for luajit
# (takes quite a while!)
curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-all | bash

# protocol buffers - google's data exchange format
sudo apt-get install libprotobuf-dev protobuf-compiler

# install caffe - a deep learning framework
sudo luarocks install loadcaffe

# to handle various image formats
sudo luarocks install image

# a neural network library
sudo luarocks install nn

# and finally, get "neural-style" from github repo
sudo git clone https://github.com/jcjohnson/neural-style.git

# (optional for GPU/AWS) install cutorch - a CUDA backend for torch 
# see https://en.wikipedia.org/wiki/CUDA
sudo luarocks install cutorch
sudo luarocks install cunn

cd neural-style

sudo sh models/download_models.sh

# you may need SFTP set up too - I assume you can work that out

# now you can paint FAST!
sudo th neural_style.lua -style_image style.jpg -content_image subject.jpg -print_iter 1

And this is the first result I got! Stunning!

yvette

And a much better Escher-cat:

out

Have fun! And share your art!

Gruff


A neat MySQL hack to create a VIEW with subquery in the FROM clause…

I’ve grown to love MySQL, but every now and then it has some strange restriction that temporarily stumps me. Then I discover a marvelous workaround and fall in love with it all over again.

This hack lets you overcome the restriction that you’re not allowed to have a subquery in the FROM clause of a VIEW. The trick is to refactor the subquery into a view of its own, which you can then join to in the VIEW you want! The subquery is likely to be something you’ll re-use anyway, so this is doubly useful.

Here’s a worked example.

I’m working with some currency information for products that is (unfortunately) denormalised into columns instead of being in Boyce-Codd normal form. You see this a lot in databases as it can make front-end code easier, but it makes the data harder to work with in SQL.

It looks like this:

denormalised prices

I’ve renamed and simplified tables and columns here to make the example clearer.

When payments come in, they’re in a specific currency and they may be discounted from the standard package price.  I want to create a view that gives me exchange rates so I can report total revenues in a common currency.  I need to use the exchange rate that reflects the prices stored in the packages.

I want to be able to join a payment that came in against a package with the effective exchange rate for that currency in that package, so I need a view that shows package_id, currency, per_GBP.

To get this, I want to join a list of active currencies with the packages, and do a CASE on each currency to give the ratio in question. Easy enough, it looks like this:

-- NB: this DOESN'T WORK
create or replace view exchange_rates as
select  packages.id
,       active_currencies.currency
,       CASE currency
            WHEN 'GBP' THEN 1.0
            WHEN 'USD' THEN price_usd / price_gbp
            WHEN 'EUR' THEN price_eur / price_gbp
            WHEN 'CAD' THEN price_cad / price_gbp
            -- etc
        END as per_GBP 
from (  select distinct currency
        from payments) as active_currencies -- unfortunately currently ILLEGAL in MySQL
join packages -- deliberate cross join
;

MySQL spits a dummy over this, but we can trick it into playing nicely like this:

create or replace view active_currencies as
select distinct currency
from payments
;

create or replace view exchange_rates as
select  packages.id
,       active_currencies.currency
,       CASE currency
            WHEN 'GBP' THEN 1.0
            WHEN 'USD' THEN price_usd / price_gbp
            WHEN 'EUR' THEN price_eur / price_gbp
            WHEN 'CAD' THEN price_cad / price_gbp
            -- etc
        END as per_GBP 
from active_currencies -- ah, that's better!
join packages -- deliberate cross join
;

Now, when I want to convert payments to a common currency (GBP), I just join to my exchange_rates view on the package id and the currency, and then use the exchange rate to convert to GBP!

Since I’m likely to use active currencies elsewhere, it’s actually a bonus to have this view. I suspect this technique may be an alternative to using this hack with stored procedures to solve restrictions with VIEWs containing group/count aggregate functions.

Writing games in HTML5 and CoffeeScript combining Box2D and EaselJS in an OOP/MVC architecture

Crap. That was a mouthful.

There seems to be a huge amount of stuff on Her Majesty’s Interwebs about all of these topics, but very little explaining how to pull them all together into something coherent. There’s a shit-ton of bad code to copy to get something bouncing around on screen, but none of it cares about scaling well into anything complex.  And there seems to be a massive dearth of diagrams showing how anything connects together, whether it’s classes within libraries, or how you’re supposed to fit them together in your own code. I don’t know about you, but I can’t design anything without diagrams, so expect lots here.  In fact, let’s have one now.

A diagram about diagrams

There. I feel better already.

(If you want to sketch diagrams like this quickly to help you design your code, try Pidgin. It’s a cool little tool that lets you write out relationships and it does the drawing-y bit.)

Now, I’m fairly new to Javascript having spent most of my (front-end) programming life mucking around with languages designed explicitly for Object Orientation, not bastard sons of webpage scripts. Javascript was a bit of a shock for me. Not an entirely unpleasant one, I admit: I discovered the language was far more powerful and flexible than I had realised – but, having delved beyond dabbling with it, and faced with the challenge of creating something substantial that needs to combine two powerful 3rd party libraries like Box2DWeb and CreateJS, I have definitely decided Javascript is not for me. Its syntax is horrendously ugly and hard to read, its “gotchas” many and dangerous, and writing OO code that will be easy to maintain and DRY is just expecting too much from it.

Or perhaps me.

CoffeeScript

Enter CoffeeScript. It’s JavaScript, The Good Parts with a Ruby-like syntax. Excuse me a moment while I just fetch some Kleenex.

Let’s look at JavaScript versus CoffeeScript syntax around Inheritance, for example. I’ve stolen this from the CoffeeScript site; it shows what you would write in CoffeeScript (on the left) to do what JavaScript is doing on the right:

CoffeeScript versus JavaScript for Inheritance

Okay, the death text might have been me.

There’s no contest.  Given this syntax (JavaScript’s) any benefits provided by Inheritance are just eaten up by the unintelligible, un-maintainable gobbledygook you have to write to get it.  We don’t need inheritance; we can always use composition to achieve re-use (and we still will), but there’s something very elegant about having your game entities extend an underlying class that (say) understands about being dynamic in a 2D world.

CoffeeScript comes with its own cons, of course: it’s compiled, not run natively in browsers, so that means you have to have a build process. You can either have the build done by the server so you actually deploy CoffeeScript, or you can build as part of testing and deployment.

EaselJS and Box2D libraries

If you’re just hacking around and trying to get your head around either EaselJS, and/or Box2D, and you’ve never tried CoffeeScript, you might want to just go ahead and get something working first using JavaScript (that’s certainly what I did). To write CoffeeScript, you need to understand JavaScript anyway (it compiles to JavaScript after all), and you have enough on your plate to learn as it is. It’s a challenge to take powerful libraries for display and physics and combine them in a way that won’t make you cry when you look at your code in a year’s time (also what I did). I’m going to assume that you care about writing code that’s nicely architected and easy to maintain. If you don’t, you’re reading the wrong blog post.

I’ve chosen to work with EaselJS (part of the CreateJS suite) and Box2DWeb (for now), but the real challenge here is actually independent of the choice of libraries. It’s not about which one you choose to render graphics and which one you use to handle physics, but how to combine them well. A decent game architecture should allow you to switch either fairly easily, and also port your game to a different platform, like a specific mobile device (iOS/Android) or a platform-independent SDK such as Corona.

The Model-View-Controller Pattern

This is where MVC comes in. Model-View-Controller is the architectural design pattern adopted by web frameworks like Ruby on Rails and CakePHP. It’s more of a principle than a specific architecture, since it can take different forms depending on what you’re actually building. A game is not a web application built around a relational database, so the architecture of a game isn’t going to look exactly the same as a web system. However, the real principle at the heart of MVC is Separation of Concerns.

Now, I’m not actually an architecture purist – I firmly believe there is a balance to be had between writing beautifully architected code and getting something built and shipped. I am a big fan of Lean. There is absolutely nothing wrong in my view, in hacking some hideous Frankenstein of code together to get something working and out there. Software architecture isn’t like building architecture; its beauty is not on display for all to admire. I do think though, that once you have something out there, it is worth thinking about how you can refactor it so that the code is easier to maintain and scale. (It’s not even about code re-use. To paraphrase Matz, it’s not about achieving perfection, it’s about how you feel when programming. I don’t mind hacking prototypes to get something working or to learn how something works, but I hate working with that code afterwards.)

Productivity not purity

MVC comes with its own overhead in terms of the number of classes, complexity and messages being thrown around. It’s interesting that the reason MVC hasn’t (apparently) been widely adopted in games industry projects is usually cited as A) performance (probably true) and B) the fact that the Model in games is often “the same as the View” (partially true: e.g. if you have a dedicated GPU for rendering then you may well rely on its ability to do other fast vertex operations like collision detection, which breaks MVC completely). In a web game though, the Model can be properly separated from the View.

Remember though, it’s not about perfection or purity of the paradigm but about getting the benefits of the underlying principle of Separation of Concerns. Architecture is a function of project scale, not just the technologies involved: if you’re planning something tiny, it’s certainly overkill to go for a fully OOP and MVC design, but you should still think about separating concerns and writing modular, DRY code. The great thing about libraries like EaselJS and Box2DWeb is that they already deal with separate domains. EaselJS is all about the View. Box2D is all about the Model.

So, is your game code the Controller? Hmm, no. It’s all three, and mostly Model, in fact.  We need to understand what MVC really means. What is each part concerned with and how do they interact? I keep seeing diagrams showing MVC that are wrong.  Let’s fix that now:

MVC

Note: those diamonds are not arrowheads. They’re UML… the lines point away from the diamond.

However, the responsibilities of Models, Views and Controllers, and the dividing lines of the separation of concerns, are very different in a game than in a web application, as is the decision about what goes where. For example, visual effects, like particles, will probably use the physics library, but really they’re View code because they’re not truly part of the game Model. The game is independent of them. They’re visual gloss. They could be rendered completely differently without affecting the game.

Controllers

Controller classes are all about handling action input from the User. This can get a little fuzzy when we consider that this can include input from elements from the screen, e.g. like on-screen buttons, but remember even when the user taps a button on screen, it’s the touchscreen on top of the View that’s handling the touch.  Controller classes are responsible for responding to the User’s desires and making the internal game state change appropriately. This includes creating the game world in the first place when the User clicks ‘play’.  Controllers don’t necessarily fire events or call methods on Models – they can also be polled to get their state.  You can use both: i.e. your game start button’s click event can call Game.start() but during the game, you might want to know if the left key is down or not during a game tick to decide whether to apply a force left to your hero. You poll the keyboard controller for this information.

Models

The Model classes are all about representing the internal state of the game and the logic for changing state. So that’s the physical world itself and its contents, and all game elements in play. This is more than just the physics; physics is just part of the game logic. Notice that although the Model is not concerned with rendering, it does “update” the View. All this means is that it is responsible for telling the View what’s changed.

Views

View classes are only concerned with rendering things so the user can see them.  We should also include in the View anything sense-related actually, so that means audio too.

To MVC or not to MVC…

If you’re planning to port your game to run on different platforms, or you’re not sure about which libraries you’re going to use for physics or rendering, then you will probably want to be very clean about keeping these domains decoupled. In other words, you will want to write your own classes to wrap around any library classes. Remember, we’re not just Separating Concerns for fun; we do it because it will make life easier and coding more enjoyable. If it is going to make things harder overall, don’t do it!

In a (business) web application, separating the model and view is easy because there’s usually nothing visual about the business model. Games are intrinsically visual: the view and model are coupled, and there’s no escaping this.  It’s a question of loose-coupling versus tight-coupling.

For example, we might create a BallModel class that contains a Box2D ball body, and a BallView class that contains the EaselJS Bitmap. Any calls from BallModel to update the view go via BallView, so your model is now completely agnostic about which library you choose. If we have these inherit from base classes that do the common stuff, then we can benefit from Polymorphism. So we associate our base ObjectModel class with a base ObjectView class, and ObjectModel.update() can call ObjectView.update(), which will do standard transformations of the physics coordinates to pixels (say).

Now if we decide to switch to a different graphics library, our code changes are all (hopefully) contained in our View classes. The same goes for your physics engine.  You will need one class at least that knows about both in order to convert from model units (kg/metres/seconds) into view units (pixels). This could be a helper, but it’s also reasonable to have it handled in the View. It’s the View’s job to render the Model.

MVC in games therefore is going to look broadly like this:

Model-view-controller pattern in games

Controllers will kick off the game and handle user input during it. The models will concern themselves with the (internal/invisible) world of the game, and the views will make it all visible to the user. Each of these domains (MVC) should have a nice class hierarchy for nice modular code – not just our own code, but 3rd-party library code too.

This is all very abstract, so let’s look at some concrete examples to see what the challenge is when working with real libraries.

First off, EaselJS.  EaselJS is a brilliant library that makes graphics and sound in HTML5 really easy. It interacts with a <canvas> element in your HTML document.

How EaselJS works in an HTML doc


I’m using “is” here to show inheritance. I’m only showing the core visual classes here; there are dozens more that do all sorts of fancy things like image filters, event handling etc. I’ve also included some other libraries provided by CreateJS as you’ll almost certainly need them. You can just include the EaselJS library on its own if you prefer.

Easel has a Stage which represents the Canvas in your HTML doc.  It is (inherits from) a Container which both is and can contain DisplayObjects. There are classes like Bitmap and Sprite (animated bitmaps) which are DisplayObjects. The documentation for EaselJS is excellent although lacking lovely diagrams like this one to help see how it fits.

Now let’s look at Box2D.  The architecture of Box2D is pretty much platform-independent so it shouldn’t matter which port you use, or whether you read the (excellent) C++ manual even though we’re using JS/CoffeeScript.  Box2D is a fully-fledged rigid body physics engine. It’s complex. I’m going to assume you’re fine with that. I’m not going to be covering all the details of Box2D here, only the bits needed to work out how to create an HTML5 game architecture with it and EaselJS.

So, while Box2D is actually in three modules (Common, Collision and Dynamics) I’m just going to look at a small part of the Dynamics to get started.

Box2D basics

Box2D worlds are the counterpart of our Stage in EaselJS (but can be bigger). Then we have bodies, which are rather more complicated than our Bitmap / DisplayObjects from EaselJS. Box2D separates physical concerns into three classes for objects.

Bodies are concerned with having positions, moving about, rotating, responding to forces etc. but they’re a bit like ghosts; they don’t have a shape or material.

Fixtures add material properties to a body (so it’s like telling a body what it’s made of, how dense the material is and how springy for example).  Fixtures also have a Shape, which in turn tells the body how large it is.

Shapes are all about collisions.  They determine the edges and extent of bodies and therefore how they interact with other bodies/shapes.  Shapes are also the most tightly-coupled concept in our Model to our View, because normally our graphics will need to match our shapes. There’s no escaping this. It’s nothing to worry about.  Our models are responsible for creating and updating their views.  I.e. initiating and causing those events. They’re just not responsible for the details of what happens inside them.

Notice that a body can have more than one fixture.  I.e. you can create compound bodies by fixing (say) a rectangle and circle to a body.  These two shapes will never move with respect to each other.

We don’t actually need to add anything else to get a basic physical system going.  We could add Constraints and Joints to our world to remove degrees of freedom or join bodies together, but let’s leave that for now. It’s detail we’re not concerned with yet.

The Box2D world doesn’t come with its own ticker. Once we set up the rules (gravity etc.) we call world.Step() with some parameters to run it one timestep. This ticker is provided by EaselJS, so here the GameApplication View will be asking the Model to update itself. The Model in turn will update all of its entities and they will all communicate those updates back to their Views.

Hopefully, you’re starting to get a clear sense of how to organise your code.

In part 2, I’m going to get into the details of a class hierarchy that brings all this together with examples.

How to write DRY SQL in MySQL – Part 2: Stored Procedure Hacks

In Part 1 we looked at using VIEWS to keep our SQL DRY. Unfortunately, there are cases when VIEWS are either not allowed or perform terribly and can’t be optimised. Stored Procedures are, of course, the mainstay of writing DRY SQL, and of optimisation. Unfortunately, you can’t treat the result set of a Stored Procedure as a query; therefore you can’t join stored proc results in queries, and so developers often end up having to copy their SQL, leading to WET code. However, there’s a neat little hack which effectively lets you treat stored procedures like views. It’s a little more work, but it’s worth it.

There are some very common queries that MySQL doesn’t support as VIEWs:

1. Any query that contains a subquery in the WHERE clause. E.g. where myfoo not in (select foo from bar). Fortunately, you can usually rewrite these using a left join and then keep only the rows with nulls on the right-hand side.

2. VIEWs that contain group/count aggregate functions cannot be made to use ALGORITHM=MERGE.  THIS IS BAD, especially if you naively forgot to set the algorithm, or just didn’t know about them: MySQL will happily build the view using ALGORITHM=TEMPTABLE instead of merging it with your query at runtime. If you have a generic query that covers a large result set, you won’t be able to use a view without bringing down the whole system!  You might not even notice this is happening if the project is new and as your data grows it’ll come back and bite you in the ass.

It’s pretty likely anyway that you will want to encapsulate, or already have encapsulated, complex queries in parameterised stored procs, so making them work like normal queries is incredibly useful and keeps your SQL DRY.

In order to make sprocs joinable, we can take advantage of temporary session tables.

Temporary tables stick around as long as the session is active, unless you explicitly drop them. So, instead of returning the result set from a stored procedure, we just put the results into a temporary session table, and whatever calls the sproc will have access to that table.  We can parameterise the sproc to keep the set small, and we can have queries as complex and procedural as we want!

This little hack will even work with your middleware code so long as it’s capable of running complex SQL (i.e. more than a single statement).  This is almost always the case.

Here’s an example:


DROP PROCEDURE IF EXISTS sp_create_tmp_myquery;
CREATE PROCEDURE sp_create_tmp_myquery(  )
BEGIN
-- this sproc creates a session temp table for use OUTSIDE the procedure 
-- essentially allowing DRY reuse of the query 
-- if the query or sproc calling this doesn't drop the temp table it'll be dropped in the next call 
-- to this sproc or when the session closes
  
  drop temporary table if exists tmp_myquery; 

  create temporary table tmp_myquery
  as
  -- some complex sql. Knock yourself out.

END;

-- to use the sproc:
call sp_create_tmp_myquery();

-- work with the results 
select foo
from   bar
join  tmp_myquery on <some condition>;

-- or just select them
select * from tmp_myquery;
-- drop the table explicitly; not necessary but cleaner
drop temporary table if exists tmp_myquery; 

You’re not limited to a single data set either. If you want or need to, you could create multiple temp tables in the sproc and work with them afterwards.
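
For example, here’s a rough sketch of a sproc that builds two temp tables in one call (all the table and column names are invented purely for illustration):

DROP PROCEDURE IF EXISTS sp_create_tmp_sales_report;

DELIMITER //
CREATE PROCEDURE sp_create_tmp_sales_report( IN p_year INT )
BEGIN
  -- both temp tables are visible to whatever runs after the CALL in the same session
  drop temporary table if exists tmp_sales;
  drop temporary table if exists tmp_product_returns;

  create temporary table tmp_sales
  as
  select product_id, sum(quantity) as units_sold
  from   sales
  where  year(sold_at) = p_year
  group by product_id;

  create temporary table tmp_product_returns
  as
  select product_id, count(*) as units_returned
  from   product_returns
  where  year(returned_at) = p_year
  group by product_id;
END //
DELIMITER ;

-- afterwards, in the same session:
call sp_create_tmp_sales_report(2015);

select s.product_id, s.units_sold, coalesce(r.units_returned, 0) as units_returned
from   tmp_sales s
left join tmp_product_returns r on r.product_id = s.product_id;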

How to write DRY SQL in MySQL – Part 1: Views

Writing DRY code is a fundamental principle of software development. DRY, or Don’t Repeat Yourself, was popularised by The Pragmatic Programmer, the 1999 book on coding practices (the alternative being WET code, meaning “Write Everything Twice”, or possibly “We Enjoy Typing”).

DRY code is optimal, not just because it’s less typing, but because when you come to refactor code (make changes to it), it’s much easier if there’s a single place where that change needs to be made.  If you’ve repeated the same code blocks all over the place, you’ll need to go edit all of them.

The wider principle of DRY is, “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” Given that one of the fundamental design principles of good Relational Databases, Normalisation, is also about removing redundancy and repetition, you’d imagine that relational databases would support DRY coding practices well.

Unfortunately, writing database server code (SQL queries, stored procedures, triggers, etc.) in a DRY way is surprisingly hard in most Relational Databases I’ve come across. So much so that modern frameworks like Ruby on Rails and CakePHP tend to do far more of the database work than older systems did. But languages like Ruby and PHP aren’t set-based languages and can be very inefficient and slow, especially if they have to make lots of round trips to the DB to get data.

There are several techniques though that anyone working with a database can learn to help DRY up DB code.

Use Views to DRY Common Joins

If your data tables are properly normalised (to at least 3rd or Boyce-Codd Normal Form, which they absolutely should be) then you will no doubt have primary tables that store data linked to various ‘type tables’ which store category or attribute data for entries in your primary tables.

Let’s say, for example, you have a table called books to store details of published and unpublished books.  Some attributes of books are many-to-one relationships, meaning a given attribute might have many entries but only one of them applies to each book.  Two examples of such attributes might be:

  1. publishers (let’s assume each book has just one publisher for this)
  2. publishing_states (a book has one state, “published”, or “unpublished”, or any other state we want to define, such as “work in progress”)

Attribute tables like these tend to be one of the most common types of join that you need to write in your SQL queries, so they are perfect candidates for DRYing up.

You might also have a table called print_formats to store possible formats such as “Hard Cover”, “Kindle”, “Paperback”, etc. (a book may have several formats) so this is a one-to-many relationship.

Finally, of course, you’ll also store many-to-many relationships such as authors (a book can have many authors and an author can have many books).

Let’s say in your first iteration with a small sample of data your tables look like this:

books

id  title                   publisher_id  publishing_state_id  author_id
1   The Looking Glass Club  1             1                    1
2   Harry Potter            1             1                    2

publishers

id  name
1   Sencillo Press
2   Bloomsbury Press

publishing_states

id  name
1   published
2   unpublished

authors

id  name
1   Gruff Davies
2   JK Rowling
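
If it helps to picture the schema behind these, here’s a minimal DDL sketch (the column types, keys and constraints are my assumptions for illustration, not part of the original example):

create table publishers (
  id    int unsigned not null auto_increment primary key,
  name  varchar(255) not null
);

create table publishing_states (
  id    int unsigned not null auto_increment primary key,
  name  varchar(64) not null
);

create table authors (
  id    int unsigned not null auto_increment primary key,
  name  varchar(255) not null
);

create table books (
  id                   int unsigned not null auto_increment primary key,
  title                varchar(255) not null,
  publisher_id         int unsigned null,      -- null until the book has a publisher
  publishing_state_id  int unsigned not null,
  author_id            int unsigned not null,
  foreign key (publisher_id)        references publishers (id),
  foreign key (publishing_state_id) references publishing_states (id),
  foreign key (author_id)           references authors (id)
);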

Given these tables, you find yourself writing SQL code something like this all over the system.

select *
from books b
join publishers p on b.publisher_id = p.id
join publishing_states ps on b.publishing_state_id = ps.id;

When you want authors and books you’d write:

select *
from books b
join publishers p on b.publisher_id = p.id
join publishing_states ps on b.publishing_state_id = ps.id
join authors a on a.id = b.author_id;

You’ll no doubt be joining to other tables in different combinations as well as these, but this triple join appears in all of them. You’re repeating yourself, which is a classic “bad code smell” (and one that happens all the time in SQL, unfortunately).

These joins end up being used in your non-DB server code (Perl, PHP or Ruby), in the stored procedures you write for efficiency, and inside the queries you use for administration and testing. The join is everywhere already.

Now, let’s say you start adding some more book data, but this time you’ve been given some book titles that haven’t yet been published. Your books table is now:

id  title                   publisher_id  publishing_state_id  author_id
1   The Looking Glass Club  1             1                    1
2   Harry Potter            1             1                    2
3   Supernova               null          2                    1
4   Wonderland              null          2                    1

You go to run the system and none of your standard queries bring back any of these new titles.

You scratch your head for a while and then realise that when you did the first iteration, you only had published books in the system, and it didn’t occur to you when writing your queries that unpublished books wouldn’t have a publisher_id yet.

Oops.

All those triple joins are written as inner joins when the publisher join should have been a left join in order to bring back rows with no publisher.

You now have to go and search out every single query in the system, and all your testing and admin code, and fix the joins. You might even forget your own admin code because you don’t run it except when needed, and one day that will come back and bite you because you won’t spot the error.

Views to the rescue

As soon as you realise that you’re typing the same joins all the time (you usually know in advance with many-to-one attributes that you’ll be doing that), you can write a VIEW to do the join for you.

So, in iteration 1 of the system (in MySQL), we’d have written this early on:

CREATE OR REPLACE ALGORITHM=MERGE VIEW books_publishers_publishing_states
AS
select b.*,
       p.name  as publisher_name,
       ps.name as publishing_state_name
from books b
join publishers p on b.publisher_id = p.id
join publishing_states ps on b.publishing_state_id = ps.id;

I’ve called the view the name of the tables concatenated. Yes, it’s long, but it’s fully descriptive and still far shorter than writing out the full triple join. (Note that the select list uses b.* plus aliased name columns rather than a plain select *: MySQL refuses to create a view whose columns would have duplicate names, and both id and name appear in more than one of these tables.)

Very Important: MySQL has two types of view and you always want to use MERGE where possible. MERGE views are just SQL statements and get merged at runtime with any query they are used in. If you don’t define the algorithm as MERGE, MySQL can decide to use the other type, TEMPTABLE, which evaluates the view’s query, selecting all the data into a temporary table, and then joins that temporary table with whatever your query returns. That could be A LOT OF DATA if, like this one, your view has no WHERE clause. It can bring the whole system down. Trust me, I’ve done it.
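
A quick way to sanity-check which algorithm you’re actually getting (a diagnostic habit of mine, not part of the example itself) is to run EXPLAIN on a query that uses the view:

-- with a merged view, the plan references books, publishers and publishing_states directly
EXPLAIN
select *
from books_publishers_publishing_states
where id = 1;

-- if a DERIVED row shows up in select_type, the view has been materialised (TEMPTABLE)
-- and the whole result set is being built before your WHERE clause is applied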

Now that we have a VIEW, we can use it anywhere we would previously have written out the full triple join, making our code lovely and DRY.

So we can write:

select *
from books_publishers_publishing_states;

or our “all books and authors” query, which now looks much simpler too:

select *
from books_publishers_publishing_states bpps
join authors a on a.id = bpps.author_id;

Now, in this case, when we add our new data, the same problem occurs, but when we realise our join is wrong, we only have to edit it in one place.

We refactor the VIEW with the left join:

CREATE OR REPLACE ALGORITHM=MERGE VIEW books_publishers_publishing_states
AS
select b.*,
       p.name  as publisher_name,
       ps.name as publishing_state_name
from books b
left join publishers p on b.publisher_id = p.id
join publishing_states ps on b.publishing_state_id = ps.id;

Once we run this against the DB, everything works again with unpublished books.

Now, you might think this example is contrived (examples usually are), but we still save on typing and get to reuse code here. And I promise you will want to refactor your views as the system grows.

A more realistic example of such a refactor is adding a WHERE clause to filter certain data sets globally, so that the change takes effect everywhere at once. Views make this sort of refactor extremely quick to do.
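
For instance, suppose you later introduce a “deleted” publishing state and want every existing query to ignore deleted books (the state name is invented here purely for illustration). One edit to the view does it:

CREATE OR REPLACE ALGORITHM=MERGE VIEW books_publishers_publishing_states
AS
select b.*,
       p.name  as publisher_name,
       ps.name as publishing_state_name
from books b
left join publishers p on b.publisher_id = p.id
join publishing_states ps on b.publishing_state_id = ps.id
where ps.name <> 'deleted';  -- hypothetical global filter, now applied everywhere the view is used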

DRY limits with MySQL VIEWs

Unfortunately, you can’t always create a view for every case. If your query uses a subquery in the WHERE clause, for example, MySQL can’t create a view for it. Or you might have something more complex than a single query that you want to keep DRY. Fortunately, we can handle these cases another way.

In my next post, I’m going to show you a nice little hack that effectively lets you join to the results of stored procedures as if they were tables or views.

The inability to treat stored procedure results like queries is another main source of WET database code.