Chapter 5: Building a Solid Data Model

Back to Table of Contents.

Building a Solid Data Model

Data modeling is an art form. Based on the previous chapter, we begin with the principal that the database is not simply a place where information is temporarily dumped but rather is the fortress that houses and protects your company’s critical information assets. You’ve got to design the data layer so that it does its job of protecting your data.

This is accomplished in part by ensuring your schema provides for complete referential integrity, and has appropriate constraint checking. Further, the data layer’s design should be one that makes future changes and additions easy. Designed incorrectly, seemingly small changes in business needs, if they require changes to the data layer, can often become Herculean efforts if the initial design is not flexible, or is crippled by inconsistencies revealed by the design change. Adhering to Domain Key Normal Form (DKNF) ensures, among other things, that it is easy to add additional layers of complexity on top of your data model without the need to redesign the entire schema. As it happens, DKNF is also a necessity to get the most protection out of referential integrity checks.

Building a Solid Data Model: Theatre Tickets

Over the course of this book, we will build a website for movie ticket sales. In this chapter we construct a first pass at the data model. We will start simple with a very small set of tables, and secure those tables with database constraints and referential integrity checks in the data layer, and analogous checks at the application layer. In Chapter 6, we will expand upon our data model, and refactor it into Third Normal Form (3NF) to remove redundancy, achieve greater data integrity, and to ensure flexibility for future needs. In Chapter 8, we will dive into DKNF to achieve referential integrity for more complex relationships that emerge. We conclude with advanced considerations in Chapter 9, such as the use of stored procedures and triggers for constraint checking that exceeds the database’s built-in capabilities.

After we have added all of this structure, layer by layer, we have, finally, a very solid data layer atop which we can confidently build an application.

Starting Simple

The data model in Figure 5-1 shows a tiny subset of tables necessary for our ticket sales website.

er_0501Figure 5-1. A simple schema for a movie ticket website

At the heart of the schema is a table that stores basic information about movies, called movies. Movies have a name, a running length, and a rating. An entry in this table might be Casablanca, with a running time of 102 minutes, and a rating of PG.

There are also theatres, stored in the theatres table, which in addition to the theatre’s name, contains the address and phone number of the theatre as well.

Movie showtimes are listed in the movie_showtimes table. Each movie showtime record stores the movie being shown, the theatre in which it plays, and the start time of movie. Additionally, there is an auditorium field, which we can use to prevent double-booking a single auditorium. Here’s the DDL for this schema:

create sequence movies_id_seq;
create table movies (
   id integer,
   name varchar(256),
   length_minutes integer,
   rating varchar(8),
   primary key (id)
);

create sequence theatres_id_seq;
create table theatres (
   id integer,
   name varchar(256),
   address_line_1 varchar(256),
   address_line_2 varchar(256),
   address_city varchar(128),
   address_state varchar(2),
   address_zip_code varchar(9),
   phone_number varchar(10),
   primary key (id)
);

create sequence movie_showtimes_id_seq;
create table movie_showtimes (
   id integer,
   movie_id integer,
   theatre_id integer,
   auditorium varchar(16),
   start_time timestamp with time zone,
   primary key (id)
);

This is where most books that focus only on the framework generally stop; their next step would be to create model classes and a user interface and begin playing around with a budding site. That’s strategy’s fine for books that are essentially framework manuals, which don’t have time to spend on issues that do not directly concern the framework.

This book does not stop here, however. First, because we know a lot more about our problem domain than we have shown here. For example, this is a ticket sales site, but we don’t even have a place to store orders yet. Mapping out everything we know up front will reveal areas for refactoring, or other types of improvements that we can’t see with just a subset of our overall data model. It doesn’t make much sense to spend time and effort on application logic when we almost certainly will have to throw most or all of it away after the next tiny database revision.

The second reason we don’t stop here is because this data model is tragically fragile. Indeed, this is a data model that represents a dumping ground for data, rather than a fortress. This is not only a problem from the perspective of our mantra: that an enterprise architecture cannot rely on the beneficence of the application layer to maintain order in the database layer; it is also a problem because our work in the application layer would be fraught with time wasted hunting for simple bugs that we would rather the data layer prevented us from creating in the first place. These problems would debilitate our site and our development process in short order.

It’s a tenet held by the Rails community that development should start with a small kernel and proceed with many quick iterations, building organically upon that kernel until a perfect, living, breathing application has emerged. This author does not disagree with the tenets of agile development, but rather with the size of the starting kernel.

In fact, there are a number of problems with our simple data model already, and they should be managed before they cause us problems. These problems will be revealed in this chapter via negative unit tests. We will expect our invalid use of our models to cause errors, which our tests will check for. However, the tests will pass, making it plain that it would be foolhardy to place any faith in what we’ve got so far, and that additional work must be done to lock down the data layer before it makes sense to proceed.

Even after we have solved the problems described in this chapter, it is still too early to proceed. Although this simple data model will be locked down, as noted above, there are still many more things wrong with our schema which will become evident when we start coding and find ourselves exploring obvious business needs that we could have foreseen easily. The next few chapters will be devoted to eradicating these flaws, with an ultimate goal of making spotting problems and building solid, flexible data models second nature.

To the point of kernel size, then, the following principle:

At each step, we should add everything we know we need to the layer we are working on, and no more. It’s at that point that we can rush off to tackle the layer above.

To get started, we must first define the classes against which we write our unit tests. At this point, there are no model-based validations because we are working on the data layer. We’ll add model validations later in this chapter. We do, however, define the relationships at this point:

class Movie < ActiveRecord::Base
 has_many :movie_showtimes
end

class Threatre < ActiveRecord::Base
   has_many :movie_showtimes 
end

class MovieShowtimes < ActiveRecord::Base   
   belongs_to :movie
   belongs_to :theatre 
end

Constraints

The first problem with our data model is that there is nothing constraining the values in any of the table columns except for our own good intentions and, frankly, that is not enough. Looking at the movies table, the rating column of the table could be anything, regardless of whether the value is a valid rating identifier: asdf would be acceptable. Any of the columns could be null or empty. The movie length could be zero or negative, which doesn’t make sense for a movie. The schema as-is also supports duplicate entries, which we know we don’t want.

Below is a set of unit tests that test for all of these conditions:

require File.dirname(__FILE__) + '/../test_helper'
require 'movie'

class MovieTestCase < Test::Unit::TestCase

protected

   def test_for_db_error(error_message, &block)
      begin
         yield
      rescue ActiveRecord::StatementInvalid
         database_threw_error = true
      rescue
         something_else_threw_error = true
      end
      assert !something_else_threw_error, "There is an error in our test code"
      assert database_threw_error && !something_else_threw_error, error_message
   end

public

   def test_db_no_name
      movie = Movie.new(:rating => 'PG', :length_minutes => '10')
      test_for_db_error("Database did not catch null name") do
         movie.save!
      end
   end

   def test_db_empty_name
      movie = Movie.new(:name => '', :rating => 'PG', :length_minutes => '10')
      test_for_db_error("Database did not catch empty name") do
         movie.save!
      end
   end

   def test_db_same_movie
      movie = Movie.new(:name => 'Casablanca', :rating => 'PG',
                                 :length_minutes => '10')
      movie_dup = movie.clone
      test_for_db_error("Database did not catch duplicate movie") do
         movie.save!
         movie_dup.save!
      end
   end

   def test_db_no_rating
      movie = Movie.new(:name => 'Casablanca', :length_minutes => '10')
      test_for_db_error("Database did not catch null rating") do
         movie.save!
      end
   end

   def test_db_invalid_rating
      movie = Movie.new(:name => 'Casablanca', :rating => 'Fred',
                                 :length_minutes => '10')
      test_for_db_error("Database did not catch invalid rating") do
         movie.save!
      end
   end

   def test_db_no_length
      movie = Movie.new(:name => 'Casablanca', :rating => 'PG')
      test_for_db_error("Database did not catch null movie length") do
         movie.save!
      end
   end

   def test_db_zero_length
      movie = Movie.new(:name => 'Casablanca', :rating => 'PG',
                                 :length_minutes => '0')
      test_for_db_error("Database did not catch zero length movie") do
         movie.save!
      end
   end

   def test_db_negative_length
      movie = Movie.new(:name => 'Casablanca', :rating => 'PG',
                                 :length_minutes => '-10')
      test_for_db_error("Database did not catch negative movie length") do
         movie.save!
      end
   end

end

Note that these are all negative unit tests; the tests attempt to do something that should not work and verify that it was not allowed. Negative unit tests are just as important as positive tests, if not more so. Often developers are drawn toward writing unit tests that test the obvious functionality of a class in the way they intend to use that class five minutes later. While those tests are fine, and may detect a code change that causes a bug down the road, negative tests that test the boundaries of valid data provide much more piece of mind. If code is written with a bug that might lead to data corruption, running negative unit tests ensures that the bug will cause no long-term damage.

To simplify writing the tests, and to conform to the tenet of Don’t Repeat Yourself (DRY), a method test_for_db_error was created, which takes a block containing the offending negative test code. This method verifies that a database exception is thrown, and not some other type of exception, such as a typo. If there is no exception thrown, the test fails, because we were expecting our error to be caught by the data layer. To be even more DRY, we’ll move this method to test_helper.rb before our next revision so that other test cases can utilize the method as well.

Below are the results of running these tests, highly condensed to show only the relevant information. As we expected, they all fail, indicating there is still work to be done in the data layer before we start getting any fancier in our application layer:

dan-chaks-computer-2:~/web/theatre-tickets chak$ ruby test/unit/movie_test_case.rb
Loaded suite test/unit/movie_test_case
Started
FFFFFFFF
Finished in 0.059249 seconds.

   1) Failure: test_db_empty_name(MovieTestCase)
Database did not catch empty name.
   2) Failure: test_db_invalid_rating(MovieTestCase)
Database did not catch invalid rating.
   3) Failure: test_db_negative_length(MovieTestCase)
Database did not catch negative movie length.
   4) Failure: test_db_no_length(MovieTestCase)
Database did not catch null movie length.
   5) Failure: test_db_no_name(MovieTestCase)
Database did not catch null name.
   6) Failure: test_db_no_rating(MovieTestCase)
Database did not catch null rating.
   7) Failure: test_db_same_movie(MovieTestCase)
Database did not catch duplicate movie.
   8) Failure: test_db_zero_length(MovieTestCase)
Database did not catch zero length movie.

8 tests, 16 assertions, 8 failures, 0 errors

Here is the movies table reworked to disallow invalid data:

create sequence movies_id_seq;
create table movies (
   id integer not null
      default nextval('movies_id_seq'),
   name varchar(256) not null unique
      check (length(name) > 0),
   length_minutes integer not null
      check (length_minutes > 0),
   rating varchar(8) not null
      check (rating in ('Unrated', 'G', 'PG', 'PG-13', 'R', 'NC-17')),
   primary key (id)
);

First, all columns that should not be null (all of them in this case) are marked with not null.

Next, we’ve added check constraints to each column that takes freeform data. We check that the length of the name column is at least one character. We check that the running length of the movie is at least one minute. We also check that the movie rating is one of six possible valid values: Unrated, G, PG, PG-13, R, or NC-17.

To ensure there are no duplicate movie entries, we also add the unique keyword to the name definition. Note that in Postgres, this creates an implicit index on the name column. The index is used to make the process of checking for uniqueness fast internally within the database, but it also will speed up any queries where we query movies by name. Indexing will be covered in greater detail later in this chapter.

Finally, the id column has a default added. It’s the same value that Rails would assign automatically to that column, but this helps if we ever have to add data from the psql prompt, or if we write advanced application code that bypasses ActiveRecord and uses raw SQL.

When we run our test cases against our updated schema, all of the tests pass, which is a good step forward:

dan-chaks-computer-2:~/web/theatre-tickets chak$ ruby test/unit/movie_test_case.rb
Loaded suite test/unit/movie_test_case
Started
........
Finished in 0.0751310000000001 seconds.

8 tests, 16 assertions, 0 failures, 0 errors

Unfortunately, we still have problems to work out. The goal of the tests was for a database exception to be thrown. However, we never want an exception to be thrown during normal operation. Exceptions are meant for exceptional situations only, and bad user input, for example, is not so exceptional. Our model class should detect when we are about to do something bad and prevent it. For example, when the error is related to user input, the application should prompt the user for better input.

What we would like, then, is for exactly the same code as we had in our previous set of tests to run, but rather than throw a database exception during a save call, the model class should simply refuse to save through to the database and return false.

Below, we’ve recast all of the database tests as model tests. In doing so, we’ve removed the wrapping of the code with test_for_db_error. These are still negative tests, but we want to assert that the records that would have been invalid did not get saved. Because save returns false and we want to ensure validations fails, we assert that !save is true.

Keep in mind that while the database tests themselves have been removed to save space, we would certainly want to keep them around in practice.

require File.dirname(__FILE__) + '/../test_helper'
require 'movie'

class MovieTestCase < Test::Unit::TestCase

   # model constraints tests

   def test_no_name
      movie = Movie.new(:rating => 'PG', :length_minutes => '10')
      assert movie.new_record?, "Model constraints did not catch null name"
   end

   def test_empty_name
      movie = Movie.new(:name => '', :rating => 'PG', :length_minutes => '10')
      assert !movie.save, "Model constraints did not catch empty name"
   end

   def test_same_movie
      movie = Movie.new(:name => 'Casablanca', :rating => 'PG',
                                 :length_minutes => '10')
      movie_dup = movie.clone
      movie.save
      assert !movie_dup.save, "Model constraints did not catch duplicate movie"
   end

   def test_no_rating
      movie = Movie.new(:name => 'Casablanca', :length_minutes => '10')
      assert !movie.save, "Model constraints did not catch null rating"
   end

   def test_invalid_rating
      movie = Movie.new(:name => 'Casablanca', :rating => 'Fred',
                                 :length_minutes => '10')
      assert !movie.save, "Model constraints did not catch invalid rating"
   end

   def test_no_length
      movie = Movie.new(:name => 'Casablanca', :rating => 'PG')
      assert !movie.save, "Model constraints did not catch null movie length"
   end

   def test_zero_length
      movie = Movie.new(:name => 'Casablanca', :rating => 'PG',
                                 :length_minutes => '0')
      assert !movie.save, "Model constraints did not catch zero length movie"
   end

   def test_negative_length
      movie = Movie.new(:name => 'Casablanca', :rating => 'PG', :length_minutes => '-10')
      assert !movie.save, "Model constraints did not catch negative movie length"
   end

end

The output from running these new tests is shown below, again compressed for brevity. The eight database tests that test for exceptions continue to pass, but the model tests now fail because of those very exceptions. At this point, this is what we expect. The database is doing its job, but the model is not:

dan-chaks-computer-2:~/web/theatre-tickets chak$ ruby test/unit/movie_test_case.rb
Loaded suite test/unit/movie_test_case
Started
........EEEEEEEE
Finished in 0.050854 seconds.

   1) Error: test_empty_name(MovieTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   new row for relation "movies"
 violates check constraint "movies_name_check"

   2) Error: test_invalid_rating(MovieTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   new row for relation "movies"
 violates check constraint "movies_rating_check"

   3) Error: test_negative_length(MovieTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   new row for relation "movies"
 violates check constraint "movies_length_minutes_check"

   4) Error: test_no_length(MovieTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   null value in column "length_minutes"
 violates not-null constraint

   5) Error: test_no_name(MovieTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   null value in column "name"
 violates not-null constraint

   6) Error: test_no_rating(MovieTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   null value in column "rating"
 violates not-null constraint

   7) Error: test_same_movie(MovieTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   duplicate key violates unique
 constraint "movies_name_key"

   8) Error: test_zero_length(MovieTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   new row for relation "movies"
 violates check constraint "movies_length_minutes_check"

16 tests, 16 assertions, 0 failures, 8 errors

Note that the tests did not fail due to an assertion failure, but with a full-on error. That’s because our data layer is now an active participant in enforcing our rules. Exceptions were thrown and consequently no records were saved. Without the constraints we added earlier, the tests would not have failed as a result of database exceptions, but due to logic errors. Our assertions would have failed, but meanwhile invalid data would have snuck through to the data layer. Try it yourself: remove the database constraints and run the tests again. How does the test output differ?

It’s time now to add model validations that will prevent normal code flow from throwing database exceptions. Here is our updated model class for the Movie class:

class Movie < ActiveRecord::Base
   validates_presence_of :name, :rating, :length_minutes
   validates_uniqueness_of :name
   validates_length_of :name, :maximum => 256
   validates_numericality_of :length_minutes, :only_integer => true
   
   has_many :movie_showtimes
   
   VALID_RATINGS = ['Unrated', 'G', 'PG', 'PG-13', 'R', 'NC-17']
   
   def validate_length_minutes
      if length_minutes && length_minutes <= 0
         errors.add 'length_minutes',
            'must be greater than zero'
      end
   end
   
   def validate_rating_type
      if !VALID_RATINGS.include?(rating)
         errors.add 'rating',
            "must be #{VALID_RATINGS[0..-2].join(', ')} or #{VALID_RATINGS[-1]}"
      end
   end
   
   def validate
      validate_length_minutes
      validate_rating_type
   end
end

Each database constraint has an analogous model validation:

  • Each not null constraint is translated into a validates_presence_of validation.
  • Our unique constraint appears as a validates_uniqueness_of validation.
  • The check constraint on the length of the name column is represented via a validates_length_of call. It’s important to test upper-length boundaries on all freeform fields, as failing to do so would result in an exception thrown if a user enters an extremely long movie name.
  • Similarly, for length_minutes, we verify that the input is of the correct type for the physical storage, with validates_numericality_of and the :only_integer flag set to true.

There are two tests accomplished easily in the data layer for which there aren’t any analogous built-in Rails validators. For those—validating that the movie length is greater than zero, and that the rating is valid—we must write our own custom validators.

Armed with our new model validators, we run our unit tests again. Here are the results:

dan-chaks-computer-2:~/web/theatre-tickets chak$ ruby test/unit/movie_test_case.rb
Loaded suite test/unit/movie_test_case
Started
FFFFFFFF........
Finished in 1.194743 seconds.

   1) Failure: test_db_empty_name(MovieTestCase)
There is an error in our test code.
   2) Failure: test_db_invalid_rating(MovieTestCase)
There is an error in our test code.
   3) Failure: test_db_negative_length(MovieTestCase)
There is an error in our test code.
   4) Failure: test_db_no_length(MovieTestCase)
There is an error in our test code.
   5) Failure: test_db_no_name(MovieTestCase)
There is an error in our test code.
   6) Failure: test_db_no_rating(MovieTestCase)
There is an error in our test code.
   7) Failure: test_db_same_movie(MovieTestCase)
There is an error in our test code.
   8) Failure: test_db_zero_length(MovieTestCase)
There is an error in our test code.

16 tests, 16 assertions, 8 failures, 0 errors

All of the new unit tests pass, but now our old ones fail. This is because our database tests assume a database exception is thrown to validate the negative test. However, now our models correctly prevent the exception from ever being reached. We still want to test the data layer, though, and we can still do so. Rails provides a mechanism to skip validation on save. The save method, it happens, is a wrapper around calling the method save_with_validation with the parameter true. To make the tests run as we expect, we need to replace the save calls with calls to save_with_validation with the parameter false, like this:

def test_db_no_name
      movie = Movie.new(:rating => 'PG', :length_minutes => '10')
      test_for_db_error("Database did not catch null name") do
         movie.save_with_validation(false)
      end
   end

Here’s the output of running the tests with this change:

dan-chaks-computer-2:~/web/theatre-tickets chak$ ruby test/unit/movie_test_case.rb
Loaded suite test/unit/movie_test_case
Started
................
Finished in 0.445988 seconds.

16 tests, 24 assertions, 0 failures, 0 errors

Now all tests pass.

Mythbusting

It’s time again to dispel some frequently propagated myths.

The first myth is that database constraints impose an undue burden on the database, and slow down your application. It’s worth pointing out, based on this controlled example, that the time consumed by checking database constraints in negligible. Adding database constraints only contributed 0.015 seconds to the overall test run.

The next myth is that model validations are all you need, and that database constraints are a waste of time. The repeated refrain is You Aren’t Going to Need It (YAGNI). Unfortunately, the fact that the Rails API itself provides easy mechanisms to skip validation, as we did in our second iteration of database unit tests with save_with_validation(false), means you are going to need it. Perhaps as a disciplined developer you will pledge to avoid using the API this way. However, when your site becomes a success and you hire a dozen additional developers, will your own discipline be enough?

In addition to

save_with_validation(false),

there are also more innocuous sounding API methods that also skip validation, such as update_attribute. See the sidebar, where the API for this method and the related method update_attribute_with_validation_skipping is reproduced.

Rails API Methods That Lack Validation

Here are Rails API methods that skip validation:

update_attribute(name, value)

Updates a single attribute and saves the record. This is especially useful for boolean flags on existing records. Note: This method is overwritten by the Validation module that’ll make sure that updates made with this method doesn‘t get subjected to validation checks. Hence, attributes can be updated even if the full object isn’t valid. [sic]

update_attribute_with_validation_skipping(name, value)

Updates a single attribute and saves the record without going through the normal validation procedure. This is especially useful for boolean flags on existing records. The regular update_attribute method in Base is replaced with this when the validations module is mixed in, which it is by default.

Although the documentation says skipping validation can be useful for toggling a Boolean value, there is nothing about these methods that ensures that’s all you are doing. The reality of the situation is that the maintainers of the API had a need for a lightweight mechanism to make updates they were sure would be safe, but now Pandora’s box has been opened, and along with it the ability to circumvent the safety of model validation on a whim.

Further, there are many types of constraints that are difficult or impossible to express or guarantee at the application level, but are very simply expressed in the data layer. Our custom validators for movie length and rating are simple examples, and we will see more examples in the following chapters. As application-level validators become more and more complex, it becomes easier for bugs to creep in and prevent them from fulfilling their purpose. Data layer constraints, on the other hand, are generally very succinct, and in these cases, simple really does make a difference.

Finally, because model validations are often more verbose than the corresponding database constraints, it’s easy to omit them, or to lose them during refactorings or rewrites of application code. Database constraints, on the other hand, do not need to be constantly declared and maintained in order to function. After they have been specified once, they become part of the data model and require a deliberate action to remove them. In the database, there is no such thing as save_with_validation(false).

Referential Integrity

We now have a full set of constraints: both physical at the data layer, and also logical at the application layer. However, we still have a major problem. Although we have references from the movie_showtimes table to the theatres and movies tables in the form of the theatre_id and movie_id columns, there is nothing in our data layer to guarantee that those references are valid, or even present, for that matter. Similarly, although we defined our belongs_to relationships in the MovieShowtime class, there is nothing guaranteeing those references are present from the model perspective, either. Our simple data model lacks referential integrity enforcement.

Now that we have gotten the hang of adding database and model tests, we’ll proceed at a quicker pace in testing the MovieShowtime class. Below are our test cases. We define a setup method, which creates a theatre object and a movie object. Most of our other tests will be concerned with leaving one, the other, or both of these objects out of the attempts to save a new MovieShowtime object.

We normally would have tests for constraint checking for the presence and length of the auditorium field, and the presence of the start_time field. However, you’ve gotten the hang of that now from our experience with the movie test cases, so we’ll leave those out to save space:

require File.dirname(__FILE__) + '/../test_helper'

class MovieShowtimeTestCase < Test::Unit::TestCase

   def setup
      @theatre = Theatre.create!(
          :name => 'Ruby Palace',
          :address_line_1 => '123 Broadway',
          :address_city => 'Cambridge',
          :address_state => 'MA',
          :address_zip_code => '02139',
          :phone_number => '5555555555')
      @movie = Movie.create!(
          :name => 'Casablanca',
          :rating => 'PG',
          :length_minutes => '10')
   end

   #
   # model tests - referential integrity
   #

   def test_add_showtime_no_movie
      st = MovieShowtime.new(:theatre => @theatre,
                                        :auditorium => '1',
                                        :start_time => Time.now.xmlschema)
      assert !st.save, "Model validation allowed save with no movie"
   end

   def test_add_showtime_no_theatre
      st = MovieShowtime.new(:movie => @movie,
                                        :auditorium => '1',
                                        :start_time => Time.now.xmlschema)
      assert !st.save, "Model validation allowed save with no theatre"
   end   

   #
   # database tests - referential integrity
   #

   def test_db_add_showtime_no_movie
      test_for_db_error "Database allowed save with no movie." do 
         st = MovieShowtime.new(:theatre => @theatre,
                                           :auditorium => '1',
                                           :start_time => Time.now.xmlschema)
         st.save_with_validation(false)
      end
   end

   def test_db_add_showtime_no_theatre
      test_for_db_error "Database allowed save with no theatre." do 
         st = MovieShowtime.new(:movie => @movie,
                                           :auditorium => '1',
                                           :start_time => Time.now.xmlschema)
         st.save_with_validation(false)
      end
   end

   def test_db_add_showtime_invalid_references
      test_for_db_error "Database allowed save with invalid references." do 
         st = MovieShowtime.new(:movie_id => 12,
                                           :theatre_id => 99,
                                           :auditorium => '1')
         st.save_with_validation(false)
      end      
   end

end

As expected, since we have neither database constraints nor model validations, all our tests fail:

dan-chaks-computer-2:~/web/theatre-tickets chak$ ruby test/unit/movie_showtime_test_case.rb 
Loaded suite test/unit/movie_showtime_test_case
Started
FFFFF
Finished in 0.112901 seconds.

   1) Failure: test_add_showtime_no_movie(MovieShowtimeTestCase) 
Model validation allowed showtime save with no movie.
   2) Failure: test_add_showtime_no_theatre(MovieShowtimeTestCase) 
Model validation allowed showtime save with no theatre.
   3) Failure: test_db_add_showtime_invalid_references(MovieShowtimeTestCase)
Database allowed save of movie_showtime with invalid references.
   4) Failure: test_db_add_showtime_no_movie(MovieShowtimeTestCase)
Database allowed save of movie_showtime with no movie.
   5) Failure: test_db_add_showtime_no_theatre(MovieShowtimeTestCase)
Database allowed save of movie_showtime with no theatre.
5 tests, 8 assertions, 5 failures, 0 errors

Here is our updated

movie_showtime

table, redesigned to catch these problems at the data layer and prevent invalid data from entering our database:

create table movie_showtimes (
   id integer not null
      default nextval('movie_showtimes_id_seq'),
   movie_id integer not null
      references movies(id),
   theatre_id integer not null
      references theatres(id),
   auditorium varchar(16) not null
        check (length(auditorium) > 0),
   start_time timestamp with time zone not null,
   primary key (id)
);

Here, because for each column we only reference one foreign column at a time, we use the simplest syntax for a foreign key reference:

column_name type references reftable(refcolumn)

We could also add the reference outside of the table definition. This gives us the added benefit of being able to name the foreign key reference, and also specify compound key references:

alter table table_name add foreign key ( column_name [, ... ] )
   references reftable [ ( refcolumn [, ... ] ) ];

At the moment , we don’t have any need for compound key references, but we will see them in Chapter 7.

Here’s our updated model class definition, which also enforces the presence of the appropriate references:

class MovieShowtime < ActiveRecord::Base
   belongs_to :movie
   belongs_to :theatre
   
   validates_presence_of :movie, :theatre
   validates_presence_of :start_time, :auditorium
   validates_presence_of :auditorium

   validates_length_of :auditorium, :maximum => 16
end

Now all of our tests pass. Here are the results:

dan-chaks-computer-2:~/web/theatre-tickets chak$ ruby test/unit/movie_showtime_test_case.rb 
Loaded suite test/unit/movie_showtime_test_case
Started
...........
Finished in 0.117669 seconds.

5 tests, 8 assertions, 0 failures, 0 errors

One important point to note about our two sets of tests is that although our model tests checked for the presence of the foreign key references, they did not check the validity of those references. Take a closer look at our third test, test_db_add_showtimes_invalid_references. We didn’t create an analogous model test because there is no practical way to make the test pass. A scenario in which this type of bug might turn up would be if two editors were working with our site at the same time. One editor loads a page to create a movie showtime, and selects Casablanca. The other editor, meanwhile, deletes Casablanca from the movie database. When the first editor submits her showtime, passing the movie id for Casablanca to the backend, the record is already gone and the save of the showtime would cause an error. The way around this possibility requires a hefty performance trade-off: load the record to be referenced first, and pass the object rather than the id, as in our other tests.

Another problem is that if we call the destroy method on a movie record referenced by the movie_showtimes table, ActiveRecord is happy to oblige, leaving the associated movie_showtime record in an invalid state. Luckily, our database foreign key references are already set up to prevent this from happening. Below are four new tests; the first two are database tests that attempt to destroy the movie and theatre objects after they have been referenced by a MovieShowtime object. These tests actually pass, because we are testing that the data layer catches the bugs, and it does. The associated model tests fail, however. They throw database exceptions, as shown below the tests. Note that without the database constraints, the tests still would not pass, but due to an assertion failure.

   #
   # database invalid destroy tests
   #
   def test_db_prevents_invalid_theatre_destroy
      st = MovieShowtime.create!(
         :theatre => @theatre,
         :movie => @movie,
         :auditorium => '1',
         :start_time => Time.now.xmlschema)
      test_for_db_error "Database allowed referenced theatre to be deleted" do
         @theatre.destroy_without_callbacks
      end
   end

   def test_db_prevents_invalid_movie_destroy
      st = MovieShowtime.create!(
         :theatre => @theatre,
         :movie => @movie,
         :auditorium => '1',
         :start_time => Time.now.xmlschema)
      test_for_db_error "Database allowed referenced movie to be deleted" do
         @movie.destroy_without_callbacks
      end
   end


   #
   # model invalid destroy tests
   #
   def test_prevents_invalid_theatre_destroy
      st = MovieShowtime.create!(
         :theatre => @theatre,
         :movie => @movie,
         :auditorium => '1',
         :start_time => Time.now.xmlschema)
      @theatre.destroy
      dependent_showtimes_count = 
         MovieShowtime.find_all_by_theatre_id(@theatre.id).size
      assert dependent_showtimes_count == 0,
         "Model allowed destruction of theatre with dependent objects"
   end

   def test_prevents_invalid_movie_destroy
      st = MovieShowtime.create!(
         :theatre => @theatre,
         :movie => @movie,
         :auditorium => '1',
         :start_time => Time.now.xmlschema)
      @movie.destroy
      dependent_showtimes_count = 
         MovieShowtime.find_all_by_movie_id(@movie.id).size
      assert dependent_showtimes_count == 0,
         "Model allowed destruction of movie with dependent objects"
   end

dan-chaks-computer-2:~/web/theatre-tickets chak$ ruby test/unit/movie_showtime_test_case.rb 
Loaded suite test/unit/movie_showtime_test_case
Started
............EE
Finished in 0.144437 seconds.

   1) Error: test_prevents_invalid_movie_destroy(MovieShowtimeTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   update or delete on table "movies"
 violates foreign key constraint "movie_showtimes_movie_id_fkey" on table "movie_showtimes"
   2) Error: test_prevents_invalid_theatre_destroy(MovieShowtimeTestCase):
ActiveRecord::StatementInvalid: PGError: ERROR:   update or delete on table "theatres"
 violates foreign key constraint "movie_showtimes_theatre_id_fkey" on table "movie_showtimes"

14 tests, 19 assertions, 0 failures, 2 errors
dan-chaks-computer-2:~/web/theatre-tickets chak$

Here we find another situation in which model validations can leave us high and dry. Rails does provide a mechanism for dealing with these kind of interrelationships. Whenever we define a model relationship, such as belongs_to, we can define what should happen to dependent objects when the object in question is destroyed. The dependent objects can be defined to have the destroy method called on them first, with:

has_many :movie_showtimes, :dependent => :destroy

This, in turn, calls destroy on any objects dependent on the original dependent object. Or, if it’s known that the dependent objects themselves don’t have dependencies, the model callbacks on the dependent objects can be skipped with:

has_many :movie_showtimes, :dependent => :delete

Because we have the option of skipping dependent objects’ callbacks, it’s important to note that if dependent objects later have bestowed upon them new dependent objects, we have to remember to commute any existing deletes into destroys.

Sometimes it really is an error to destroy data that has dependent objects, and the correct response to a deletion attempt is to fail, and fail fast.

Consider a medical records database. If you add a patient record and realize immediately that you misspelled the patient’s name, or otherwise created the record with some set of invalid information, and it is easiest to just start over. In fact, it might be all right to delete the record. However, once you’ve added dependent data, such as patient history or test results, deleting the patient record would not only be a mistake, it would also be grounds for a lawsuit. If there is a software bug that allows a delete statement to be issued when there is dependent data, a database constraint is the only surefire way to ensure that the deletion fails, as it should. In this case we wouldn’t declare a destroy or delete dependency, because we wouldn’t want that action to propogate; we want it to throw an exception.

Once we add declarations of what to do to dependent objects upon deletion to the movie and theatre classes, all tests pass again.

dan-chaks-computer-2:~/web/theatre-tickets chak$ ruby test/unit/movie_showtime_test_case.rb 
Loaded suite test/unit/movie_showtime_test_case
Started
..............
Finished in 0.146988 seconds.

14 tests, 21 assertions, 0 failures, 0 errors

Intro to Indexing

Our simple schema is almost complete. At this point, it is completely locked down at the data layer and at the model layer. At this point we could begin coding a frontend. However, if a site built atop this data model has any success whatsoever, it will quickly slow to a crawl as our tables become pregnant with data, and will topple over the minute the site gets listed on Slashdot. This is due to lack of indexes. Without indexes, every query must scan the entirety of each table referenced in the query—not a problem when we have just a few entries in each table. However, if we load all movies ever made into our movies table, and all of the theatres in the United States into the theatres table, and then proceed to add movie showtimes, the speed (or lack thereof) would be unbearable.

Like all of data modeling, appropriate indexing is an art form. In our simple example, however, it’s easy to identify a couple rules of thumb.

Rule #1: Every foreign key reference should be indexed. Queries that ask questions such as, “What are all the movie showtimes for this movie?” or “What are all the movie showtimes in this theatre?” are sure to be frequent. In fact, these are exactly the queries generated when traversing ActiveRecord associations such as

theatre.movie_showtimes or movie.movie_showtimes.

We create the indexes like this:

create index movie_showtimes_movie_id_idx on movie_showtimes(movie_id);
create index movie_showtimes_theatre_id_idx on movie_showtimes(theatre_id);

Rule #2: Any column that will appear in any SQL where clause should be indexed. In our example, looking up theatres by zip code or showtimes by their start time are the most obvious queries, and they therefore deserve indexes:

create index theatres_zip_idx on theatres(address_zip_code);
create index movie_showtimes_start_time_idx on movie_showtimes(start_time);

The other obvious column we might want to search on is the movie name, but we don’t need to create an explicit index for it. Recall that when we declared the movie name to be unique, Postgres created an implicit index for us.

Once you’ve created indexes, you don’t have to do anything special in your application to use them. If the database, in planning how to execute a query decides use of the index will improve performance, it will use it. If the index won’t help a particular query, it will be ignored.

Chapter 4 : Database As a Fortress
Chapter 6 : Refactoring to Third Normal Form
Advertisements