How to write DRY SQL in MySQL – Part 1: Views

Writing DRY code is a fundamental principle of software development.  DRY, or Don’t Repeat Yourself, was popularised by the 1999 Pragmatic Programmer coding practices book (the alternative being WET code, meaning Write Everything Twice, or possibly “We Enjoy Typing”).

DRY code is optimal, not just because it’s less typing, but because when you come to refactor code (make changes to it), it’s much easier if there’s a single place where that change needs to be made.  If you’ve repeated the same code blocks all over the place, you’ll need to go edit all of them.

The wider principle of DRY is, “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.”  Given that one of the fundamental design principles of good Relational Databases, Normalisation, is also about removing redundancy and repetition, you’d imagine that relational database would support DRY coding practices well.

Unfortunately, writing database server code (SQL queries, stored procedures, triggers, etc.) in a DRY way, is surprisingly hard in most Relational Databases I’ve come across.  So much so, that modern frameworks like Ruby on Rails and CakePHP tend to do far more of the database work than older systems.  But languages like Ruby and PHP aren’t set-based languages and they can be very inefficient and slow, especially if they have to make lots of round trips to the DB to get data.

There are several techniques though that anyone working with a database can learn to help DRY up DB code.

Use Views to DRY Common Joins

If your data tables are properly normalised (to at least 3rd or Boyce-Codd Normal Form which they absolutely should be) then you will no doubt have primary tables that store data linked to various ‘type tables’ which store category or attribute data for of entries in your primary tables.

Let’s say, for example, you have a table called books to store details of published and unpublished books.  Some attributes of books are many-to-one relationships, meaning a given attribute might have many entries but only one of them applies to each book.  Two examples of such attributes might be:

  1. publishers (let’s assume each book has just one publisher for this)
  2. publishing_states (a book has one state, “published”, or “unpublished”, or any other state we want to define, such as “work in progress”)

Attribute tables like these tend to be one the most common types of join that you need to write in your SQL queries and so these are perfect candidates for DRYing up.

You might also have a table called print_formats to store possible formats such as “Hard Cover”, “Kindle”, “Paperback”, etc. (a book may have several formats) so this is a one-to-many relationship.

Finally, of course, you’ll also store many-to-many relationships such as authors (a book can have many authors and an author can have many books).

Let’s say in your first iteration with a small sample of data your tables look like this:

books

id title publisher_id publishing_state_id author_id
1 The Looking Glass Club 1 1 1
2 Harry Potter 1 1 2

publishers

id name
1 Sencillo Press
2 Bloomsbury Press

publishing_states

id name
1 published
2 unpublished

authors

id name
1 Gruff Davies
2 JK Rowling

Given these tables, you find yourself writing SQL code something like this all over the system.

select *
from books b
join publishers p on b.publisher_id = b.id
join publishing_state ps on b.publishing_state_id = ps.id;

When you want authors and books you’d write:

select *
from books b
join publishers p on b.publisher_id = b.id
join publishing_state ps on b.publishing_state_id = ps.id
join authors a on a.id = b.author_id = a.id;

You’ll no doubt be joining to other tables in different combinations as well as these, but the this triple join appears in all of them.  You’re repeating yourself, which is kind of a “bad code smell”, (which happens all the time in SQL, unfortunately).

These joins end up being used in your non-DB server code (PERL, PHP or RUBY) and stored procedures you write for efficiency, and inside queries that you use for administration and testing.  It’s everywhere already.

Now, let’s say you start adding some more book data, but this time you’ve been given some book titles that haven’t yet been published.  So now your books table is now:

id title publisher_id published_state_id author_id
1 The Looking Glass Club 1 1 1
2 Harry Potter 1 1 2
3 Supernova null 2 1
4 Wonderland null 2 1

You go to run the system and none of your standard queries bring back these any of these new titles.

You scratch your head for a while and then realise that when you did the first iteration, you only had published books in the system and it didn’t occur when writing your queries that unpublished books wouldn’t have a publisher_id yet.

Oops.

All those triple joins are written as inner joins when the publisher join should have been a left join in order to bring back rows with no publisher.

You now have go search and edit every single query in the system and all your testing and admin code and fix the joins. You might be even forget your own admin code because you don’t run it except when needed and one day that will come back and bite you because you won’t spot the error.

Views to the rescue

As soon as you realise that you’re typing the same joins all the time (you usually know in advance with many-to-one attributes that you’ll be doing that), you can write a VIEW to do the join for you.

So, in iteration 1 of the system, (in MySQL) we’d have written this early on:

CREATE OR REPLACE ALGORITHM=MERGE VIEW books_publishers_publishing_states
AS
select *
from books b
join publishers p on b.publisher_id = b.id
join publishing_state ps on b.publishing_state_id = ps.id;

I’ve called the view the name of the tables concatenated. Yes it’s long, but it’s fully descriptive and it’s far shorter than writing the full triple join.

Very Important: MySQL has two types of view and you always want to use type MERGE where possible. MERGE views are just SQL statements and get merged at runtime with any query they are used in. If you don’t define the algorithm as MERGE, MySQL can decide to use the other type, TEMPTABLE which will evaluate the query, selecting all the data and then join the resulting temporary table with whatever data set your query returns. This could be A LOT OF DATA if your view has no WHERE clause like this. It can bring the whole system down.  Trust me, I’ve done it.

Now we have a VIEW, we can use it anywhere where we would before have written out the full triple join making our code lovely and DRY.

So we can write:

select *
from books_publishers_publishing_states;

or, our “all books and authors” query looks much simpler to:

select *
from books_publishers_publishing_states bpps
join authors a on a.id = bpps.author_id;

Now, in this case, when we add our new data, the same problem occurs, but when we realise our join is wrong, we only have to edit it in one place.

We refactor the VIEW with the left join:

CREATE OR REPLACE ALGORITHM=MERGE VIEW books_publishers_publishing_states
AS
select *
from books b
left join publishers p on b.publisher_id = b.id
join publishing_state ps on b.publishing_state_id = ps.id;

Once we run this against the DB, everything works again with unpublished books.

Now, you might think this example is contrived (examples usually are), but we still save on typing and get to reuse code here. And I promise you will want to refactor your views as the system grows.

A more realistic example of such refactor is that you might want to add a where clause to filter certain data sets globally and change these quickly across the whole system. Views make this sort of refactor extremely quick to do.

DRY limits with MySQL VIEWs

Unfortunately, you can’t always create a view for every case. If you query uses a subquery in the where clause for example, MySQL can’t create a view for that case. Or, you might something more complex than a single query that you want to keep DRY. Fortunately, we can handle these cases another way.

In my next post, I’m going to show you a nice little hack that effectively lets you joins to the results of stored procedures as if they were tables or views.

The inability to treat stored procedure results like queries is another main source of WET database code.