Chapter 7: Domain Data

Back to Table of Contents.

Domain Data

Domain tables are a special type of table in data modeling. They contain knowledge known a priori about the application’s domain, and the contents within these tables don’t normally change through interaction with the application. In our data model, we already have an example of a domain table: the ratings table. Regardless of what our application looks like or how it changes over time, regardless of what data we intend to store about users, movies, or orders, the set of ratings and their meanings in the context of the domain of motions pictures is set.

Domain tables are particularly special because at first glance they can seem extraneous. Since they don’t change frequently, you could encode the data they contain in application code rather than directly in the database. In fact, many beginners to data modeling leave domain tables out completely, preferring to store the concepts they encode in the code itself. However, promoting domain data to fully fledged relational tables and records provides numerous benefits.

First, domain tables play a key role in helping to maintain referential integrity. If multiple tables reference ratings, having them defined once in a single table ensures that all tables reference the same set. If multiple tables have a text column for the movie rating as our movies table did at the start of Chapter 5, it is be very easy for each table to store different sets of ratings, perhaps “PG-13” in one table, and “pg13” in another. It would be extremely difficult for our application to know these are the same thing. Even if we had check constraints on each table, we’d have to search for and update each check constraint if the motion picture association added a new rating. That process is also prone to error or omission, and it’s certainly not very DRY.

The second benefit of domain tables is that they help maintain third normal form and keep our data model flexible; as with our example above, we were able to easily add a description for each rating without any need to change our software and without sacrificing 3NF. Just like with rating information, domain tables often “come from somewhere,” meaning that the information can be researched ahead of time—if not on the Internet then by interviewing someone familiar with the business behind your application. It’s a good idea to put everything you know in the domain table from the very beginning. What was initially a single column in a table sometimes turns out to be a large table full of rich data that can be leveraged in interesting ways as the application matures.

Perhaps the best thing about domain tables is how well they interface with Rails. A common convention to deal with the problem of mixing up literals like “PG-13” and “pg13” is to declare constants for the appropriate values. Of course, having a convention doesn’t mean people will follow it. Data from domain tables can be treated like constants, which encourage good convention, but also enforce the use of the convention because referential integrity requires it.

Let’s take the ratings table as an example. The first thing we do, since this is a domain table and is unlikely to change, is add insert statements to our schema definition for each rating type:.

insert into ratings(name, description)
   values('G', 'General audiences');
insert into ratings(name, description)
   values('PG', 'Parental guidance suggested');
insert into ratings(name, description)
   values('PG-13', 'Parents strongly cautioned');
insert into ratings(name, description)
   values('R', 'Restricted');
insert into ratings(name, description)
   values('NC-17', 'No one under 17 admitted');
insert into ratings(name, description)
   values('Unrated', 'This movie has not been rated');

We can insert these records directly because we won’t need an interface to manipulate domain data. Even though the table has an ActiveRecord model, the frequency with which it changes is likely to be much slower than the frequency with which our application goes through complete rewrites.

The next step is to create the constants that convention dictates you create anyway. In this case, however, rather than the value of constant being set to some arbitrary string value, such as “PG-13” for the constant PG13, we can set the value to the actual ActiveRecord object itself:

class Rating < ActiveRecord::Base
   validates_presence_of :name, :description
   validates_length_of :name, :maximum => 16
   
   G          = Rating.find_by_name('G')
   PG         = Rating.find_by_name('PG')
   PG13      = Rating.find_by_name('PG-13')
   R          = Rating.find_by_name('R')
   NC17      = Rating.find_by_name('NC-17')
   UNRATED = Rating.find_by_name('Unrated')
   
end

Now, to assign a rating to a movie, we can use the constant, such as Rating::R or Rating::PG13. Each of these constants corresponds directly to the database record itself, but has the additional flexibility of being a constant. Also, since we initialize these constants in the model class definition, the database query to get the record is performed only once, the first time the class is loaded:

>> Rating.find(:all).map{|r| r.name}
=> ["PG", "PG-13", "NC-17", "Unrated", "G", "R"]
>> Rating::R
=> #"R", "id"=>"10", "description"=>"Restricted"}>
>> Rating::PG13
=> #"PG-13", "id"=>"3", "description"=>"Parents strongly cautioned"}>
>> Rating::G.description
=> "General audiences"

And of course, when we want to reference the rating, such as when creating new movies, we just use the constant. The ActiveRecord object is referenced just as expected, and the movie record saves normally:

>> m = Movie.new(:name => 'Rocky Horror Picture Show', :length_minutes => 100, :rating => Rating::R)
=> #"Rocky Horror Picture Show", "length_minutes"=>100, "rating_id"=>10}, @rating=#"R", "id"=>"10", "description"=>"Restricted"}>>
>> m.save
=> true

Dealing with Zip Codes

Zip codes, while they may not benefit from having constants defined within the ZipCode model, will nonetheless benefit from a table of all the valid zip codes (which certainly are known ahead of time), making validating zip codes a snap. If the zip exists in the table, it’s valid. Similarly, if the zip does not exist, a database exception will be thrown before an application-level oversight lets faulty data through.

Generally for a zip code table, the zip code itself should be the primary key, contained in a varchar column to accommodate zip codes that begin with a zero. In our table, we’ve decided to name the primary key column zip to keep things clear. Even though Rails prefers the primary to be called id, we can use the declaration set_primary_key to inform Rails we’ve chosen a different name. This makes our table definition much more readable, and more importantly, self-documenting:

create table zip_codes (
   zip varchar(16) not null,
   city varchar(255) not null,
   state_abbreviation varchar(2) not null,
   county varchar(255) not null,
   latitude numeric not null,
   longitude numeric not null,
   primary key(zip)
);

Zip code data is available from a variety of sources online, and with some massaging, can be easily imported into your database. As we discussed above, one of the primary benefits of having a zip code domain table is the ability to do distance calculations, to answer questions like, “What theatres are playing Casablanca within 10 miles of my home?” Almost all zip code databases available online provide latitude and longitude coordinates for zip codes. To facilitate calculating the answer to our question quickly, we create a stored procedure, miles_between_lat_long, which provides a reasonably good approximation of distance with a minimal set of complex calculations (note that this equation is an approximation only; if your application requires high precision distances, you will want to use a better—but likely slower—formula):

create or replace function miles_between_lat_long(
   lat1 numeric, long1 numeric, lat2 numeric, long2 numeric
) returns numeric
language 'plpgsql' as $$
declare
   x numeric = 69.1 * (lat2 - lat1);
   y numeric = 69.1 * (long2 - long1) * cos(lat1/57.3);
begin
	return sqrt(x * x + y * y);
end
$$;
//We can then add a method,
zips_within_miles

to our ZipCode model, which returns all of the zip code objects within a given distance from the zip in question:

class ZipCode < ActiveRecord::Base
   set_primary_key 'zip'
   
   def zips_within_miles(miles)
      ZipCode.find(:all, 
                         :conditions => ["miles_between_lat_long(?, ?, 
                                  zip_codes.latitude, zip_codes.longitude) < ?",
                                  self.latitude, self.longitude, miles])
   end
end

The following example, using script/console, finds all of the cities within two miles of Cambridge, Massachusetts:

>> z = ZipCode.find('02139')=> #"CAMBRIDGE", "latitude"=>#<bigdecimal:3463d10,'0.42365079e2',12(12)>, "zip"=>"02139", "county"=>"MIDDLESEX", "state_abbreviation"=>"MA", "longitude"=>#<bigdecimal:3463ce8,'-0.71104519e2',12(12)>}>
 z.zips_within_miles(2).collect{|z| z.city}.uniq 
=> ["BOSTON", "CHARLESTOWN", "ALLSTON", "BRIGHTON", "CAMBRIDGE", "BROOKLINE", "BROOKLINE VILLAGE"]

With a variation on the method above, we can bestow upon any model that “acts as addresses”

the ability to find instances within a given distance from a particular latitude-longitude coordinate. We add the following to the ClassMethods section of our Addresses plugin:

module ClassMethods
   def acts_as_address
         # validation / association code was here...

         # define the find_within_miles proc for all models that contain addresses
         find_within_miles = Proc.new do |zip, miles|
            z = ZipCode.find_by_zip(zip)
            !z ? [] :
            self.class.find(:all,
                                    :include => :zip_code,
                                    :conditions => ["miles_between_lat_long(?, ?, 
                                        zip_codes.latitude, zip_codes.longitude) < ?",
                                        z.latitude, z.longitude, miles])
         end
         self.send(:define_method, 'find_within_miles', find_within_miles)         
      end
   end

We can answer our question of which theatres are close by with very natural language:

Theatre.find_within_miles('02139', 10)

Strategy Pattern with Domain Tables

Another use of domain tables, when the number of records is tractable enough to be represented in code, is to point to collections of methods or data contained in classes, much as a trigger for the Gang of Four strategy pattern.

We will explore this idea with the payment types found in the orders table. We begin with the standard domain table refactoring performed above. Then, we’ll enhance our solution by making use of ActiveRecord’s single table inheritance, to achieve the Domain Table Strategy Pattern. We start out with our initial cut at the Order class, before we create the domain table for payment types. It looks very similar to our initial Ratings class, with an array containing the acceptable payment types to match the database constraint:

class Order < ActiveRecord::Base
   belongs_to :movie_showtime_id
   validates_uniqueness_of :confirmation_code
   validates_presence_of :confirmation_code,
      :purchaser_name, :payment_type,
      :credit_card_number, :credit_expiration_month, :credit_expiration_year
   
   PAYMENT_TYPES = ['Visa', 'MasterCard', 'American Express']
   
   def validate_payment_type
      unless PAYMENT_TYPES.include?(payment_type)
         errors.add('payment_type',
            "must be #{PAYMENT_TYPES[0..-2].join(', ')} or #{PAYMENT_TYPES[-1]}")
   end
   
   def validate
      validate_payment_type
   end
end

Following the ratings example, we factor out the payment type into its own table, payment_types, and we insert the appropriate data for our domain of credit card processing:

create sequence payment_types_id_seq;
create table payment_types(
   id integer not null
      default nextval('payment_types_id_seq'),
   name varchar(128) not null unique
      check (length(name) > 0),
   primary key (id)
);

insert into payment_types (id, name)
   values(nextval('payment_types_id_seq'), 'MasterCard');
insert into payment_types (id, name)
   values(nextval('payment_types_id_seq'), 'Visa');
insert into payment_types (id, name)
   values(nextval('payment_types_id_seq'), 'American Express');

We also create a corresponding class, PaymentType, which contains constants for each payment type. Just as with the movie ratings, each of these constants contains the actual database record, and can be treated like a real ActiveRecord object in our code:

class PaymentType < ActiveRecord::Base
   has_many :orders
   
   VISA = PaymentType.find_by_name('Visa')
   MASTER_CARD = PaymentType.find_by_name('MasterCard')
   AMEX = PaymentType.find_by_name('American Express')

   def validate_card_number(card_number, expiry)
      case self
         when VISA
            validate_visa_card_number(name, address, amount)
         when MASTER_CARD
            validate_master_card_number(name, address, amount)
         when AMEX
            validate_amex_card_number(name, address, amount)
      end
   end

   def process_order(name, address, amount)
      case self
         when VISA
            process_visa_order(name, address, amount)
         when MASTER_CARD
            process_mc_order(name, address, amount)
         when AMEX
            process_amex_order(name, address, amount)
      end
   end
end

What’s new here is that we’ve added some methods that must switch on the constant. Credit card validation is different for each credit card type, and the processing is often different, also. There is likely to be a different processing provider for Visa than for American Express, and the API to secure the funds is likely to be different as well.

Whenever there are if-then-else or switch statements that switch on a type property repeated in many of a class’s methods, it’s usually an indication that we need subclasses to override some common method instead.

The astute reader might point out that we can avoid repetitive

if-then-else

clauses by applying some well-thought-out method-naming conventions, and create a dispatch mechanism. For example:

def process_order(*args)
   self.send(“process_#{self.name.tableize}_order”, *args)
end

While this will certainly will work, we are still left with three copies of each method in a single class— – a recipe for confusion and bugs. Better is to effect a simple change with which we can turn our basic domain table into one that invokes Rails single table inheritance mechanism to pick the right methods for us. All we need is a type column that differentiates each record. Here is our payment_types table again, but we’ve renamed the name column to type, and to make single table inheritance work correctly, our domain records are modified slightly to insert types that are friendly to the Rails inflection mechanism. (“American Express” is now one word.):

create table payment_types(
   id integer not null
      default nextval('payment_types_id_seq'),
   type varchar(128) not null unique
      check (length(type) > 0),
   primary key (id)
);

insert into payment_types (id, type)
   values(nextval('payment_types_id_seq'), 'Visa');
insert into payment_types (id, type)
   values(nextval('payment_types_id_seq'), ‘MasterCard');
insert into payment_types (id, type)
   values(nextval('payment_types_id_seq'), 'AmericanExpress');

The model class for PaymentType is now vastly simplified. We no longer have repeated switch statements. Instead we define stubs that throw an exception if the method is not overridden in a subclass, and we expect that subclasses will override each method. Also note that to load our constants, we now say find_by_type rather than find_by_name:

class PaymentType < ActiveRecord::Base
   has_many :orders
   
   VISA = PaymentType.find_by_type('Visa')
   MASTER_CARD = PaymentType.find_by_type('MasterCard')
   AMEX = PaymentType.find_by_type('AmericanExpress')

   def validate_card_number(card_number, expiry)
      raise "This method must be redefined in the subclass"
   end

   def process_order(name, address, amount)
      raise "This method must be redefined in the subclass"
   end
end

The following is an example of one of the subclasses. In each one, we place the logic for the particular type of credit card:

class AmericanExpress < PaymentType
   def validate_card_number(card_number, expiry)
      # implement algorithm for card number validation
   end

   def process_order(name, address, amount)
      # implement credit card processing logic
   end
end

It’s worth noting that now each constant not only points to a different object, but to an object of a different class. Notice the script/console output below, and how it differs from the same exercise performed for the ratings:

>> PaymentType::VISA
=> #"Visa", "id"=>"1"}>
>> PaymentType::MASTER_CARD
=> #"MasterCard", "id"=>"2"}>
>> PaymentType::AMEX
=> #"AmericanExpress", "id"=>"3"}>

Now within the order class, we can write very elegant and clear statements that delegate to the appropriate PaymentType subclass. Keep in mind that everything except credit card validation has been removed from the class definition below, to save space—you wouldn’t remove them in practice:

class Order < ActiveRecord::Base
   def validate_credit_card_number
      unless self.payment_type.validate_card_number(
         credit_card_number,
         credit_expiration_month,
         credit_expiration_year)
         errors.add('credit_card_number', 'is invalid')
   end
   
   def validate
      validate_credit_card_number
   end
end

Refactor from Day One

Our final data model, with addresses factored into their own inherited table, and with ratings, payment types, and zip codes moved into domain tables, is shown in Figure 7-1.

er_0701Figure 7-1. Our data model after creating domain tables for zip codes and payment types

In terms of understandability, design elegance, and therefore maintainability, this is a much better design than where we were when began this chapter. We haven’t created any controllers or views yet, but that’s okay—they are consumers of our models, and making changes to models becomes infinitely more difficult once active code is relying on them.

The real goal is to be able to spot opportunities for refactoring before design choices become entrenched in client code, and before the design becomes problematic due to scale or lack of understandability, leading to bugs. We are taking an intentionally slow and measured approach to data modeling here to point out patterns of refactoring that you should learn to spot in your own projects. Of course, we could have started with the “right” data model from the start, but then the patterns of how to move from wrong to right would not have emerged.

Are there other opportunities for refactorings that may save us headaches down the road?

One possibility might be the credit card information stored in the orders table. Although it’s not likely to repeat in another table, the binding of the information to the orders table may not be ideal. Imagine that not long after this site becomes a smashing success, we want to add member accounts and recall users’ addresses and credit cards when they log in. Splitting out the credit card information into its own table named credit_cards will make it a breeze to later add a mapping table users_credit_cards, which links the information together. Since the address is actually used for credit card verification, it’s probably credit_cards that should inherit from addresses, not orders.

The phone_number column left behind in the theatres table is a good candidate for normalization as well. We may someday want to list multiple phone numbers, perhaps a local number and a toll-free number. Rather than adding a column each time we think of a new phone number we’d like to store, a table containing a number_type column, along with a join table theatres_phone_numbers does the trick much more elegantly. The credit card service that processes the payments might require a phone number for verification purposes as well, and we could easily reference the phone number with a foreign key reference from the credit_cards table.

How far to go in the first round of normalization and refactoring is a matter of judgment. In this book, we won’t perform the refactorings we just listed, but only because we won’t run into the problems we listed within this text. (Also, the examples would become monotonous.) However, if we were really building this site, it would behoove us to normalize as much as we could up front. While correct (and full) normalization can seem to result in an explosion of tables and therefore an explosion in the amount of work associated to write models and tests, in practice the upfront work proves minimal, because each table and model class is simpler to write and easier to test. Fewer bugs tend to creep in, and when they do they are localized to a smaller subset of code, and are much easier to find.

Chapter 6 : Refactoring to Third Normal Form
Chapter 8 : Composite Keys and Domain Key/Normal Form