Chapter 8: Composite Keys and Domain-Key Normal Form

Back to Table of Contents.

Composite Keys and Domain-Key Normal Form

So far we have improved our data model and Rails model layer substantially from the small set of tables we began with. Specifically, we have:

  1. Added constraints everywhere
  2. Enforced referential integrity
  3. Added basic indexes
  4. Factored out repeated data model chunks with database inheritance, and created an analogous Rails plugin to facilitate reuse
  5. Factored out columns that were teetering on the edge of violating third normal form into fully-fledged tables
  6. Created domain tables for our domain data, and analogous Rails models and constants

We have done quite a bit, but our data model is still not enterprise solid. In this chapter, we will discuss two related topics that can help us get closer to our goal. The first is the idea of keys made up of multiple columns, otherwise known as composite keys. The second is a topic that proverbial wars have been fought over, which boils down to whether primary keys should be simple id columns, as is the Rails default, or the more complex composite or natural keys, which rely on unique identifying information in the records themselves.

In Rails, as it comes out of the box, the decision to use id columns has been made for you. However, composite keys have inherent benefits over simple id keys. In reality, both conventions have pluses and minuses. In this chapter, we’ll learn the pros and cons of each convention. We’ll learn how to make composite keys work in Rails through use of a plugin. Then we’ll have our cake and eat it too by making both conventions coexist happily. We’ll gain the benefits of each system without the addition of too much more work.

Let’s begin by going over the benefits of an id column system.

The first and most obvious benefit is that it is sitting right there waiting for you to use it. The Rails associations mechanism—which allows you to define has many, belongs to, and has and belongs to many relationships—relies on id column primary and foreign keys. If you are prototyping a quick-and-dirty application, you can get going quickly without any hassle.

The second benefit is that an id column is an easy handle on a piece of data. For example, when editing the contents of a record via a web form, all of the values other than the id itself may be editable. There is no chance of the primary key of the object changing when it is edited, so logic involved in mutating that object is simple.

Conversely, editing a primary key can be tricky, because records in other tables may be referencing the primary key that is about to undergo a change. This is the third benefit: id columns provide a level of indirection to the real data. Because of that indirection, records in tables referencing a table with an id column do not need to be changed when the referenced record is updated. For example, let’s say the primary key of our ratings table was not an id column, but instead the rating_name column itself. The column rating_id in the movies table would now have to reference this column instead of the id column. If we then decided that PG-13 ought to have a rating_name of “PG13” rather than “PG-13”, we would have to update every movie that referenced the PG-13 record in our ratings domain table. This is definitely something we would like to avoid if we can.

The final benefit is that the primary key for the next row to be added is always close at hand, and is guaranteed to be unique. Either via database sequence, or serial column type, a built-in mechanism for uniqueness is responsible for generating the next key.

The above list seems like a lot benefits. It’s not so easy to simply list the benefits of using composite keys, because, for one thing, you can’t always use them. They serve a special purpose in maintaining data integrity. Whereas you can always slap an id column on a table, there isn’t always a natural composite key available. However, when there is such a natural composite key present, ignoring it in favor an independent single-column key can lead to big trouble.

When people argue that single-column primary keys are better than composite keys because of the long list of advantages mentioned above, they’re missing the reality that special circumstances call for special measures. Problems arise when a situation requires composite keys but single-column id keys are used instead. The next section will show how we can pierce a giant hole in our movie showtime database’s schema—which we thought we had locked down with full referential integrity—simply because we ignored a composite key.

Why Composite Natural Keys Matter

The first step in understanding the value of composite keys is knowing how to spot them. Figure 8-1 shows the auditoriums table along with two related tables, the theatres table, and the movie_showtimes table.

er_0801Figure 8-1. The auditoriums table and two related tables

We’ve defined the following references so far, indicated by the interconnecting lines above:

  • movie_showtimes(auditorium_id) references auditoriums(id)
    auditoriums(theatre_id) references theatres(id)

But as we saw in Chapter 6, when we normalized on the auditorium column, we lost a reference from movie_showtimes to the theatres table. The reference was lost because it violated 3NF, but its absence is quite a nuisance. Simple queries such as “how many movies are playing in theatre x today?” become needlessly complex. We’d like to say:

select count(*)
  from movie_showtimes
 where theatre_id = ?
Instead, we need the following query:
select count(*)
  from auditoriums a,
       movie_showtimes ms,
 where ms.auditorium_id = a.id
   and a.theatre_id = ?

Any query linking theatres to showtimes must go through the auditoriums table, which feels quite unnatural. It is extremely tempting to add the theatre_id column back to the movie_showtimes table, as shown in Figure 8-2. We’d then add this reference:

movie_showtimes(theatre_id) references theatres(id)

Unfortunately, there is now a big referential integrity hole left wide open. Can you see it?

er_0802Figure 8-2. The movie_showtimes table with a reference to the theatres table, a violation of 3NF

In this data model, a movie showtime can exist for which the auditorium does not exist in the theatre the movie is expected to play in. Consider the following data:

movies_development=# select id, name from theatres;
 id |       name        
----+-------------------
  1 | Steller Theatre
  2 | Old Towne Theatre
(2 rows)

movies_development=# select * from auditoriums;
 id | theatre_id | room | seats_available 
----+------------+------+-----------------
  1 |          1 | A    |             150
  2 |          2 | B    |             150
(2 rows)

movies_development=# 
 select id, movie_id, theatre_id, auditorium_id from movie_showtimes;

 id | movie_id | theatre_id | auditorium_id 
----+----------+------------+---------------
  1 |        1 |          1 |             2
(1 row)

Our single movie showtime is playing in the theatre called Stellar Theatre and is showing in auditorium B. Unfortunately, auditorium B happens to be in Old Towne Theatre! This doesn’t make any sense, yet it’s perfectly valid based on our foreign key references and Rails associations. Even though we thought we had ensured referential integrity, we’re actually still allowing for bogus data to enter our system. In script/console, we can cause nonsense to occur, as shown in Example 8-1.

Example 8-1. A functionally dependent reference can lead to bogus associations

>> t = Theatre.find_by_name('Steller Theatre')
>> puts t.movie_showtimes.first.auditorium.theatre.name
=> "Old Towne Theatre"

The reason this is happening is not because our references are wrong, but because our primary keys are. Though the referential integrity of our data model is satisfied, the referential integrity of our problem domain is not. id columns do not always provide enough information to ensure referential integrity for interrelationships between multiple tables. That’s where composite keys come in.

Spotting Composite Keys

Simply put, a composite key is one that is made up of more than one column. But how do you determine what a composite key should be? Finding them actually turns out to be pretty straightforward.

First, let’s examine the properties of a primary key. Actually, there is only one. A primary key must be unique for all records.

This works in the opposite direction as well. If you have a real-world situation that has a uniqueness constraint on a set of columns, then in your data model, those columns are likely to be the right candidate for a composite key. When a primary key is based on attributes of the data that make it unique from all other data, the key is known as a natural key.

If we look back to our definition of the auditoriums table, we see that we did, in fact, have a unique constraint on the columns, (theatre_id, room). Certainly, it does not make sense for a single theatre to have two auditoriums called A. That would be most confusing. But this unique set of columns is also a great way to refer to the auditorium itself. Indeed, referring to an auditorium as “auditorium ‘A’ in theatre #1” is much more natural than referring to it as “auditorium #47,” a number based solely on a sequence, which specifies nothing intrinsic about the auditorium itself. The former provides much more information, and as such, it provides much better guarantees for our referential integrity constraints.

Figure 8-3 reproduces the segment of our data model from above, but we’ve removed the id column from auditoriums and replaced it with the more natural composite key. Primary keys are shown in bold.

er_0803Figure 8-3. auditoriums table with a composite key

In order to reference the auditorium from the movie_showtimes table, we need to reference both parts of the key. Therefore, we now have the following references:

movie_showtimes(theatre_id) references theatres(id)
movie_showtimes(theatre_id, room) references auditoriums(theatre_id, room)
auditoriums(theatre_id) references theatres(id)

The table definition now looks like this:

create table auditoriums (
   room varchar(64) not null
      check (length(room) >= 1),
   theatre_id integer not null
      references theatres(id),
   seats_available integer not null,
   primary key (room, theatre_id)
);

create sequence movie_showtimes_id_seq;
create table movie_showtimes (
   id integer not null 
      default nextval('movie_showtimes_id_seq'),
   movie_id integer not null
      references movies(id),
   theatre_id integer not null
      references theatres(id),
   room varchar(64) not null,
   start_time timestamp with time zone not null,
   primary key (id),
   foreign key (theatre_id, room)
      references auditoriums(theatre_id, room) initially deferred
);

It is now impossible for the inconsistent state demonstrated above to exist. Because the theatre_id column in movie_showtimes references both the theatre_id column in auditoriums and the id column in theatres, the three are guaranteed to be the same.

Atop the Shoulders of…

In Chapter 4, I discussed the idea of standing atop the shoulder of giants. Information architecture and database normalization techniques are highly developed areas of research. It should be no surprise that scholarly papers dealing with key selection go back more than 25 years. Here we have a perfect example where we should not neglect to sit atop the shoulders of our predecessors.

In 1981, Ronald Fagin of IBM Research Laboratories introduced domain key/normal form (DK/NF) in his paper A Normal Form for Relational Databases That is Based on Domains and Keys, published in the ACM Transactions on Database Systems Journal. In this paper, Fagin proved mathematically that a schema design in which keys are chosen as the smallest set of columns that naturally and uniquely identify a row of data absolutely prevent anomalies such as the one we created above, where a movie showtime can occur in an auditorium that doesn’t exist in the theatre the movie is set to play in. Sometimes these keys are a single id column, sometimes they are single columns that have intrinsic, natural meaning, and sometimes they are composite keys made up of mutiple columns. The overriding point is that there is no one-size-fits-all solution. Each table must be analyzed on a case-by-case basis to determine how it contributes to the whole of the schema.

The best relational databases we have today grew out of the research of past decades. Even a concept that may be taken for granted today, such as the ability to define a primary key based on multiple columns, is a database feature that grew up out of research such as Fagin’s. While it’s not surprising that over the course of decades scholars investigated these research areas, and that their findings made it into the database products we use today as fundamental features, it is surprising that many of these critical concepts are lacking from the core Ruby on Rails framework.

Many users of Ruby on Rails who are new to schema design take that lack of features as a cue that the concepts are not relevant or that you can live without them. Many more, if Rails is their first platform, may never have heard about DK/NF or natural keys, and won’t know of the benefits they are giving up by not using them when they are appropriate. Now that you know about natural keys, you don’t have to be in this crowd of poor saps.

Migrating to Domain-Key Normal Form

Before we dive into the how of implementation, first let’s get our schema into DK/NF so we know what we are going to implement.

Often getting your schema into DK/NF can be an onerous task. However, because our schema was already properly refactored into third normal form, all that remains is the judicious selection of keys.

We’ve already handled the auditoriums table, which was an example of a table deserving of a composite key. Next we’ll look at tables that have single-column keys. We’ll further break down the single-column case into two groups: primary key columns that should be left as monotonically increasing ids (the Rails default), and primary key columns that are candidates to be based on intrinsic, or natural, data. Then we’ll come back to the auditoriums table to see how to implement composite primary keys in Rails using a plugin. Finally, we’ll look at the movie showtimes table, which is also a case of a table deserving a composite primary key, but we’ll treat it as a special case. We’ll explain what heuristics should be used to decide when to not use composite keys, and instead of using a strict natural key, we’ll introduce the concept of a Rails-DK/NF hybrid, which gives us the benefits of natural keys, but preserves some of the conveniences of Rails id columns.

Single column keys

The tables that hold movies, ratings, payment types, orders, ticket purchase line-items, and zip codes are examples of tables that have single column primary keys. As it happens, this is the majority of our tables, and this may be the reason why only single-column keys are available in Rails by default.

To decide whether a table’s primary key should be a monotonically increasing (and therefore arbitrary) integer, we check to see if there is a more natural key. As we did earlier in this chapter, we do this by determining if there are unique constraints in the table other than the id primary key itself. Figure 8-4 shows all of these tables with the default Rails id column. Primary keys are in bold, and columns with a uniqueness constraint are in italics.

er_0804Figure 8-4. The tables with the default Rails id column

Examining these tables, we find that the zip_codes, ratings, and orders tables each have a single column with a uniqueness constraint apart from the primary key itself: zip, rating_name, and confirmation_code, respectively. Effectively, this means that there are two distinct ways to access a row of data, but the distinction is not meaningful. These columns can be merged into one without any loss of functionality in retrieving data. Because Rails doesn’t actually care what kind of data is in the id column, and because you override the primary key’s column name with the set_primary_key directive in your models, you also don’t lose any built-in Rails functionality by merging the columns and choosing more descriptive column names than “id.”

We already saw an example of this in Chapter 7, where we defined the primary key of the zip_codes table to be the zip column. For zip codes and ratings, which are domain tables and may never be modified after an initial data load, it’s easy to forego reliance on the Rails mechanism for choosing new keys. These tables are shown Figure 8-5, without the redundancy of two sets of unique columns.

Note that what we are doing here is only appropriate if the primary key is not likely to change, or at least if changes are infrequent enough that they aren’t normal business use-cases within our application. For domain data, which—with the exception of zip codes—has a corresponding constant within the codebase, and for which we have no plans to build an edit interface, we can be reasonably assured that we are safe on this point.

er_0805Figure 8-5. Domain tables that may never be modified after an initial data load

For the orders table, the confirmation code also has no valid use case for change, so we can merge the id and confirmation_code columns. It may seem more difficult to break with convention, though, because we need to generate a special value before saving each record. But since we need to generate the confirmation code regardless of whether it is the primary key, it’s no more difficult. In fact, we can easily generate a confirmation code in a before_create method in the Order class by hashing the next value of the sequence that would have filled the id column, as shown below. Note that even though we changed the primary key column name to confirmation_code, Rails still forces us to refer to all single-column primary keys via a column called id at the application level. Thus when we are setting self.id below, we are actually filling the confirmation_code column of our table:

class Order < ActiveRecord::Base
   set_primary_key :confirmation_code
   
   has_many :purchased_tickets, :foreign_key => 'order_confirmation_code'

   def before_create
      next_ordinal_id = Order.connection.select_value(
         "select nextval('orders_id_seq')"
      )
      self.id = next_ordinal_id.crypt("CONF_CODE")
   end
end

The following output from script/console shows this code in action. Our primary keys are now beautiful random strings of text worthy of any confirmation code system:

>> o = Order.create({:movie_showtime_id => 1,
                     :purchaser_name => 'Joe Moviegover'})
=> #
>> o.id
=> "COtW6pplX6z7o"

Adding dependent objects still works just as we expect:

>> o.purchased_tickets << PurchasedTicket.new(:purchase_price_cents => 650)
=> [#
>> o.confirmation_code
=> "COtW6pplX6z7o"

There are other benefits to having the natural key be the primary key, too. Consider the orders table, and the dependent table purchased_tickets, shown in Figure 8-6. On the left, we have Rails default id columns. On the right, the orders table has been updated to use the confirmation_code column as the primary key. Now, the purchased_tickets table has a order_confirmation_code column rather than an order_id column to reference the orders table. The added benefit is that if we have a confirmation number—likely to be provided by a visitor to our site to look up their order or to print out their tickets at a kiosk—we can select directly from the purchased_tickets table. We don’t have to first find the order record in the orders table, and then join against the purchased_tickets table to get the information we are looking for. When we removed the indirection id primary key column, indirection went away along with it.

er_0806Figure 8-6. The orders table and its dependent purchased_tickets table

The remaining tables, theatres and movies, keep the default id primary key column.

Using Composite Keys in Rails

Although Rails doesn’t have support for composite primary keys built in, there are two good methods for getting the benefits they provide. The first method is via a plugin by Dr. Nic Williams called, naturally, composite_primary_keys. The second is via a hybrid model – a Rails-DK/NF hybrid – discussed later in this chapter.

Using the composite_primary_keys plugin

The composite_primary_keys plugin is available at http://compositekeys.rubyforge.org. There you’ll find more extensive documentation on how to use the plugin, but to install it as a gem, just type the following from the command line:

sudo gem install
 composite_primary_keys

Then add the following line to the end of your

config/environment.rb file:
require 'composite_primary_keys'

Next, in your models, define the composite primary key with the plugin’s pluralized analog to set_primary_key, set_primary_keys:

class Auditorium < ActiveRecord::Base
  # we do this because Rails inflection fails for this class name
  set_table_name 'auditoriums'
   set_primary_keys :room, :theatre_id
 
  belongs_to :theatre
  has_many :movie_showtimes, :dependent => :destroy
end

Then, in models that reference the composite key, specify the foreign key as an array of column names:

class MovieShowtime < ActiveRecord::Base
  belongs_to :movie
  belongs_to :theatre
   belongs_to :auditorium, :foreign_key => [:room, :theatre_id]
end

Working with models that have composite primary keys is straightforward; they behave just like regular models. Notice below that we don’t need to do anything special when creating an Auditorium object. Similarly, when we create a MovieShowtime object, we don’t need to specify the separate pieces of the foreign key. Just passing the object is enough for the plugin to pull out the appropriate key columns to create the reference:

m = Movie.create!(
  :name => 'Casablanca',
  :length_minutes => 120,
  :rating => Rating::PG13)
t = Theatre.create!(
  :name => 'Kendall Cinema',
  :phone_number => '5555555555')
a = Auditorium.create!(
  :theatre => t,
  :room => '1',
  :seats_available => 100)
ms = MovieShowtime.create!(
  :movie => m,
  :theatre => t,
  :auditorium => a,
  :start_time => Time.new)

Using a DKNF-Rails hybrid

The next table we’ll investigate is the movie_showtimes table. The combination of (movie_id, theatre_id, room, start_time) is certainly unique, and therefore it’s a candidate to be a primary key. In fact, we have a greater constraint on showtimes than this: a movie cannot begin playing in an auditorium until the previous movie has finished, and we’ll see how to deal with this sort of constraint in the next chapter). But the decision to make a composite key a primary key is not based solely on whether the key is unique. The next question we must ask is whether changing the key is a valid use case or not.

Are movie times set in stone once tickets have been purchased, or can the times be changed, patrons be damned? One school of thought would say the answer is yes. Why shouldn’t a movie showtime be changeable?

On the other hand this may feel like a bait-and-switch operation to those who’ve already bought tickets, so another school of thought would say no. Once there is a dependent record—especially a dependent record that has been paid for, such as an order—the record cannot change. The appropriate course of action would be to mark the original record as canceled, create a new one for the new time, and take action to refund the original ticket purchases. Patrons could then buy tickets to a different showtime if they so choose. That would certainly be fairer to patrons, and the database would then accurately reflect the events that transpired as well.

In the interest of a pure DK/NF data model, we would, therefore, define the full set of columns in the movies_showtimes table as a composite primary key. However, for the purposes of providing a breadth of examples, we’ll say here that movie times are changeable. Although it’s not impossible to make changes to a primary key, in practice it can be a bit more inconvenient to do so than it is for other columns. With ActiveRecord, it is actually impossible to change a primary key at all unless one resorts to custom SQL statements. Therefore, for those who are faint of heart when it comes to custom SQL, for now, we will retain our id column for movie_showtimes, but we’ll also update the orders table to reference the rest of the columns, in addition to the standard Rails id column reference of movie_showtime_id (Figure 8-7). This will allow us the referential integrity benefits of a natural key while also maintaining some of the benefits of an id primary key: namely, it will be possible within Rails to make changes to the key without resorting to custom SQL.

er_0807

Figure 8-7. Updating the orders table

Note that in order to add a foreign key reference from one table to another, the target columns of the constraint must constrained to be unique. This has two purposes. First, if there was no unique key, we could risk referencing more than one record, which would not be meaningful (in fact, it would be a bug, although such a relationship is allowed by MySQL). Second, the unique constraint also adds an implicit index on the columns, which facilitates a quick internal database check to make sure the reference is valid. Without the uniqueness constraint, we get the following error when we try to add the foreign key constraint:

movies_development=# alter table orders 
  add constraint movie_showtimes_movie_theatre_room_start_time_fkey
  foreign key (movie_id, theatre_id, room, start_time)
  references movie_showtimes(movie_id, theatre_id, room, start_time);

ERROR: there is no unique constraint matching given keys for referenced table “movie_showtimes”

To correct this, we add the unique constraint on the rest of the columns in movie_showtimes: movie_id, theatre_id, room, start_time, and everything works fine. The two tables, movie_showtimes and orders, can be defined compactly as follows:

create sequence movie_showtimes_id_seq;
create table movie_showtimes (
  id integer not null 
    default nextval('movie_showtimes_id_seq'),
  movie_id integer not null
    references movies(id),
  theatre_id integer not null
    references theatres(id),
  room varchar(64) not null,
  start_time timestamp with time zone not null,
  primary key (id),
  unique(movie_id, theatre_id, room, start_time),
  foreign key (theatre_id, room)
    references auditoriums(theatre_id, room) initially deferred
);

create sequence orders_id_seq;
create table orders (
  confirmation_code varchar(16) not null
    check (length(confirmation_code) > 0),
  movie_showtime_id integer not null
    references movie_showtimes(id),
  movie_id integer not null,
  theatre_id integer not null,
  room varchar(64) not null,
  start_time timestamp with time zone,
  purchaser_name varchar(128) not null
    check (length(purchaser_name) > 0),
  primary key (confirmation_code),
  foreign key (movie_id, theatre_id, room, start_time)
    references movie_showtimes (movie_id, theatre_id, room, start_time)
) inherits (addresses);

Making assignment easier with method overrides

One of the drawbacks of a hybrid model in which the natural key exists in addition to an arbitrary id column is that the columns must be assigned explicitly even in places where the Rails associations would normally do the magic for us. For example, ordinarily we would assume we could do the following to create an order object, assuming we already have a movie_showtime object in the variable ms:

>> o = Order.create!(
?>   :movie_showtime => ms,
?>   :purchaser_name => 'Joe Moviegoer')

This won’t work with our hybrid model, however, because the columns movie_id, theatre_id, room, and start_time need to be specified as well. When not using the composite_primary_keys plugin, Rails only knows that the second line above implies the movie_showtime_id column’s value. The rest of the values must be specified manually, as follows:

o = Order.create!(
  :movie_showtime => ms,
  :movie => ms.movie,
  :auditorium => ms.auditorium,
  :start_time => ms.start_time,
  :purchaser_name => 'Joe Moviegoer')

Note that we did get to skip the definition of the theatre, because the auditorium composite key takes care of that for us.

We’d rather keep things simple, though. We really want Rails to accept the simpler syntax that we’d get if we chose to use the composite keys plugin, and with a little extra magic, we can get assignment to behave the way we expect it to.

In order to make this work, we use the alias method to save the old movie_showtime= assignment method under a new name, old_movie_showtime=. We then rewrite the original method to do all the assignments we expected it to do. Finally, we call the old framework method to be sure our method doesn’t have any unintended side effects. Here is our method:

class Order < ActiveRecord::Base
  alias :old_movie_showtime= :movie_showtime=
  def movie_showtime=(ms)
    self.movie_id = ms.movie_id
    self.theatre_id = ms.theatre_id
    self.room = ms.room
    self.start_time = ms.start_time
    self.old_movie_showtime=(ms)
  end
end

Deferrable Foreign Key Constraints

Because natural keys are by definition not arbitrary—they are made up of the intrinsic information that identifies the row—care must be taken when changing that data. For example, if you sell a ticket for a movie showtime, and then try to change the auditorium that movie is set to play in, you will immediately get a referential integrity violation exception from the database. The composite foreign key reference in the orders table will no longer have a corresponding row in the movie_showtimes table.

Deferrable constraints allow you to postpone the checking of foreign key relationships until the end of a transaction. Note that being inside a transaction is the key to deferrable constraints. You are only granted a short, closed period during which referential integrity can be violated. The following unit test illustrates how to change part of a key without violating referential integrity in dependent tables:

def setup
    @m = Movie.create!(
      :name => 'Casablanca',
      :length_minutes => 120,
      :rating => Rating::PG13)
    @t = Theatre.create!(
      :name => 'Kendall Cinema',
      :phone_number => '5555555555')
    @a = Auditorium.create!(
      :theatre => @t,
      :room => '1',
      :seats_available => 100)
    @ms = MovieShowtime.create!(
      :movie => @m,
      :theatre => @t,
      :auditorium => @a,
      :start_time => Time.new)
    @o = Order.create!(
      :movie_showtime => @ms,
      :movie => @m,
      :theatre => @t,
      :auditorium => @a,
      :start_time => @ms.start_time,
      :purchaser_name => 'Joe Moviegoer')
  end

  def test_deferrable_constraints
    MovieShowtime.transaction do 
      @ms.start_time = @ms.start_time + 1.hour
      @ms.save!
      Order.update_all(["start_time = ?", @ms.start_time], 
        ["movie_showtime_id = ?", @ms.id])
    end
  end

Of course, the test fails, since we haven’t yet defined our foreign key reference to be deferrable:

ChakBookPro:chapter-7-dknf chak$ ruby test/unit/movie_showtime_test_case.rb 
Loaded suite test/unit/movie_showtime_test_case
Started
E
Finished in 0.657148 seconds.

  1) Error:
test_deferrable_constraints(MovieShowtimeTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:  update or delete on table "movie_showtimes" violates foreign key constraint "orders_movie_id_fkey" on table "orders"
DETAIL:  Key (movie_id,theatre_id,room,start_time)=(20,20,1,2007-12-16 00:53:49.076398-05) is still referenced from table "orders".
: UPDATE movie_showtimes SET "start_time" = '2007-12-16 01:53:49.076398', "theatre_id" = 20, "movie_id" = 20, "room" = '1' WHERE "id" = 20

To create a deferrable constraint, simply append the keyword initially deferred when defining the constraint:

create table orders (
  confirmation_code varchar(16) not null
    check (length(confirmation_code) > 0),
  movie_showtime_id integer not null
    references movie_showtimes(id),
  movie_id integer not null,
  theatre_id integer not null,
  room varchar(64) not null,
  start_time timestamp with time zone,
  purchaser_name varchar(128) not null
    check (length(purchaser_name) > 0),
  primary key (confirmation_code),
  foreign key (movie_id, theatre_id, room, start_time)
    references movie_showtimes (movie_id, theatre_id, room, start_time)
    initially deferred
);

With our initially deferred foreign key, our unit test now passes:

ChakBookPro: chak$ ruby test/unit/movie_showtime_test_case.rb 
Loaded suite test/unit/movie_showtime_test_case
Started
.
Finished in 0.093331 seconds.

1 tests, 0 assertions, 0 failures, 0 errors

Deferrable constraints solve the problem of “seldom changing.” They give you cake and let you eat it, too. You can have true referential integrity, but you also are granted a window where referential integrity can be compromised—as long as you are using that window to make appropriate updates in dependent tables.

Note one caveat to this method of testing our deferrable constraint: deferrable constraints only have meaning within a transaction. Unit tests are all run within transactions to facilitate cleanup between each test run—all the effects of a unit test are rolled back before the next test is run. This means that in tests, even if you forget to start and end your own transaction, the test will appear to pass even though the same code would throw a database exception elsewhere. This also, unfortunately, makes it impossible to write a negative unit test for deferrable constraints.

Understanding the Trade-offs

In this chapter we’ve seen three variations on referential integrity. The first is based on the Rails default of id column primary keys. Through a rather simple example, we showed that single-column primary keys are not always enough to guarantee that a problem domain’s referential integrity is maintained. We can easily be given a false sense of security that our data is protected when we have referential integrity constraints throughout a data model, but we run into problems when those constraints don’t match the real-world constraints behind the problem. For those situations, only natural keys can give our data the protection it deserves.

We then saw two ways to implement natural keys. One method is a strict adherence to domain-key normal form (DK/NF) using the composite_primary_keys plugin. The other method is a hybrid that utilizes both Rails id columns for the primary keys, but also enforces referential integrity with unique constraints and references on natural keys. The trade-offs between these methods are summarized in Table 8-1.

Table 8-1. Trade-offs of the methods for implementing natural keys

Rails id columns only Strict DK/NF with composite primary keys plugin Rails-DK/NF hybrid

Supported out of the box

Domain-specific referential integrity

Natural key can be updated via Rails API N/A

Efficient use of indexes

Ease of writing code

The last two metrics, efficient use of indexes and ease of writing code, are worth discussing in more detail.

Efficient use of indexes

It’s obvious that with the Rails-DK/NF hybrid, we need two sets of indexes and references as compared with a strict DK/NF data model. First we need to index and reference the Rails id columns. We additionally need to place unique constraints on the natural key and then reference that key as well in dependent tables. This imposes a cost in our database in terms of both time and space. There is a time cost because each insert, update, or delete on a referenced table requires that two indexes be updated and checked. Inserts and updates to dependent tables also require two checks rather than one to be sure the reference is valid. It’s only worth paying this price if the problem domain requires that the natural key be mutable. In our movie showtime example, we argued rather unconvincingly that this might be the case. We also presented a convincing argument that the natural key for showtimes be fixed, and that showtimes should be cancelled if, for example, the time needed to be changed and orders had already been placed. In many other real-world problems, it’s similarly unwise to change natural keys once real-world dependencies are in play, as will often be the case.

Often newcomers to DK/NF are initially wary of leaving the well-understood behind. If you feel like you really need the security of being able to change natural keys through Rails after dependent objects have been created, take a step back and re-examine the business case for doing so. As natural keys are those that map in some way to real life, in all but the most bizarre of circumstances it makes more sense to deactivate old records and create new ones when this situation arises.

Ease of writing code

One of the major strength of Rails is that much can be said in very few lines of code, so conventions that reduce programmer efficiency should be anathema to a Rails programmer. As we saw, using the composite_primary_keys plugin does not require much in the way of additional code to make it work: just a single line in the class with the composite key to define it, and an extra parameter when defining associations in dependent classes. After that, coding proceeds as usual. In the Rails-DK/NF hybrid, we don’t need any special definition for our natural key (other than database indexes and constraints), but because the relationship doesn’t factor into the association relationship from Rails’ point of view, we must constantly specify all of the natural key columns when creating objects that reference the key. That is, unless we override the association assignment method to make the assignments for us, but that too, is additional code to write.

While this is not exactly onerous, it should be considered when choosing to use the DK/NF-Rails hybrid over a strict natural key model.

Figure 8-8 shows the end product of this chapter’s additions.

er_0808Figure 8-8. The theatre tickets schema with domain-key normal form and Rails-DK/NF hybrid

Exercises

  1. Try to reproduce the anomaly demonstrated in Example 8-1. Convince yourself that with the correct composite keys, it is impossible.
  2. Make a list of real-world queries that are simplified by a composite key reference from orders to movie_showtimes.

Refactor Steps

  1. Examine each table. Are there any non-primary key columns that have a unique constraint, or which should have a unique constraint?
  2. If you found places where you must add a unique constraint, add it:
    create unique index concurrently table_name_column_one_column_two_uniq_idx
      on table_name(column_one, column_two, ...);
    
  3. Choose the next steps based on whether your unique constraint is based on one column or multiple columns.

Single column refactor

  1. In the table’s model class, change the primary key column using set_primary_key:
    set_primary_key :unique_column
    
  2. In tables that reference this table, add a column to reference the new key:
    alter table dependent_table
      add column referenced_table_unique_column coltype;
    
  3. Fill the column with appropriate values:
    update dependent_table
      from referenced_table r
       set referenced_table_unique_column = r.unique_col
     where referenced_table_id = r.id;
    
  4. Add the foreign key reference constraint to each dependent table:
    alter table dependent_table
      add constraint referenced_table_unique_column_fkey
         (referenced_table_unique_col)
      references referenced_table(unique_col);
    
  5. Drop the original id column and reference:
    alter table referenced_table
     drop column id;
    alter table dependent_table
     drop column referenced_table_id;
    
  6. Set the new column as the primary key:
    alter table referenced_table
      add primary key(unique_column);
    

Multiple column refactor

  • Install the composite_primary_keys gem:
    gem install composite_primary_keys
    
  • Load the gem in your application. In environment.rb, add the following:
    require 'composite_primary_keys'
    
  • In the table’s model class, change the table’s primary key using set_primary_keys:
    set_primary_keys [:col1, :col2, ...]
    
  • Follow the steps for single column refactor, starting with 2.
Chapter 7 : Domain Data
Chapter 9 : Guaranteeing Complex Relationships with Triggers