James's Database
~~James
La Borde
Normalization
As
I have mentioned in a previous articles, planning your database before you begin
is highly important. This month we will be looking at a very key piece of that
puzzle, normalization. Normalization is the process that makes a relational
database relational. If you are following the advice from earlier articles you
are already practicing normalization and don’t even know it. It is removing
redundant data and moving it into a unique, related table of its own.
The Normal Forms
There
are three forms (or degrees) of normalization. The First Normal Form
basically says that there can not be any repeating groups and that the data in
each field can not be divided. This is the stage at which you create a primary
key. Second Normal Form states that all fields within the table should
refer to the primary key. Third Normal Form requires that all the
criteria of First and Second Normal Forms be met and that you remove all fields
that describe a non-key value.
What
does all of that mean in English? Let’s tackle First Normal Form first.
Basically, First Normal Form is the breaking up of your data into the smallest
fields possible and then moving duplicate information into another table, where
it will appear only once. In short, this would mean splitting up an Address
field into Street Address, City, State, and ZIP. Second Normal Form ensures that
all fields in a table support only the primary key in the table. This means that
if your primary key is Shipping Address ID, only information pertaining to the
Shipping Address is included in the table. Items that refer to, for example, a
billing address, would be located in another table. Third Normal Form goes
further in that it requires any field containing data that can be derived from
information in another field in the same table must be removed. This is a
difficult concept to grasp. The best example I have seen is using an address. In
an address, the only data that truly refer to the primary key are the street
address and the ZIP code; the city and state can be derived from the ZIP code.
In this instance, a separate table would be required to store the ZIP code
information.
How Far is Far Enough?
As
you may have noticed, going all the way to Third Normal Form does seem quite
tedious and over the top for most database applications. Normalization is
actually a concept that was borrowed from mathematics and therefore does not
always fully apply. Let’s look at an example of what happens when you take
this concept to the fullest extent, such as a table containing the names and
addresses of your customers. Just to meet the First Normal Form completely we
would have to do quite a bit. First, the name would have to be divided into
first and last names. The address would have to be divided as well, further than
you may realize. In order to be truly atomic (the smallest division possible)
the street address itself would be divided into a minimum of two and possibly
three fields. The first would be the number, next the street name, and last (if
you wanted to go this far), would be street suffix. Then we would do the usual
division of the city, state and ZIP, of course, the ZIP code would be divided
into ZIP and ZIP+4 extension. Ah, we finally meet the First Normal Form right?
No, remember there can be no repeating groups or redundant data. Names would
have to be separated into at least two fields: we could have many ‘John’s
for instance; and of course, the potential exists for common last names as well.
The streets could repeat too as well as the street suffix. Now we have our
Address table, first names, last names, streets, and street suffixes in
supporting tables. We finally meet the First Normal Form. Wonderful news, we
also meet the Second Normal form since we have included relatively little
information. To meet the Third Normal Form, we now have to remove any fields
that are not dependent upon the primary key. This means that we have to move all
of the information regarding the city and state to a ZIP code information table
as the city and state are dependant upon that and not upon the primary key. As
you can see, this is entirely too far. Most developers find their own style and
stick with it. The primary thing to worry about is consistency.
Consistency
is the key to successfully developing your database in regards to the
normalization process. Be consistent not only within your tables but also
throughout the entire database. If you use first and last names separately in
one table, then do it throughout your database. Your clients’ needs will
determine just how far you take the normalization process. When you are done
with your database, the end result will show whether or not you did your
preliminary work up front.
Coming
Next Month: Naming Conventions
As
for after that I am open to suggestions from everyone. I was thinking a
question and answer article would be good so feel free to send in your Access
questions.
James La Borde works
in the computer department at a Credit Union,
where he uses Access, SQL Server, VBA, and ODBC daily. He also
teaches online Access classes at
Eclectic
Academy.