Exploring the structure and architecture of databases.
Electromagnetic properties can be varied to read as either an 0 or 1. Such a reading is single bit. Eight bits make a byte. A single byte can be translated to a character. (Actually the dominant convention, ASCII, codes characters using 1 byte but the newer convention, Unicode, uses 2 bytes.)
Characters make the data for records (aka rows; tuples;) each of which has a number of fields (aka columns; attributes;). Records can be collected to form a particular subject, and arranged into tables. A table is typically shown like a spreadsheet and lists the records would along the left and lists the fields along the top. A collection of tables forms a database. EG:
I just got an offer to order 7 books for a penny. The postcard I need to send to the book club is a record. That record contains a number of fields, such as: my name, address, city, state, zip code, and the titles of the 7 books. When the book club gets all their postcards back, they can enter all the data to in one ore more tables in their database.
Data in a database is collected by subject (EG: Customer, Product, Sale, or Order.) into tables (aka files, relations, subjects, or entities).

Kernel (aka master or regular). Kernel tables exist on their own. It does not provide descriptive information to other tables. Kernel tables are the heart of the database. EG:
Bridge (aka connector, junction, middle, or associative). Bridge tables tie other tables together. This is the middle table in a many-to-many relationship. EG:A "Product" table is a typical kernel table.
A "Product" kernel table is connected to a "Order" table via a "ProductOrder" table since a product may be in many orders and an order can have many products.
Extension. Extension tables have a one-to-one relationship with a kernel table and may be a sub-set of its related kernel table. EG:
A "Product" table may have a "ProductNew" table just for new products.
Look up (aka validation). Look up tables have a look up key (which is often the primary key of the look up table) that is used to match foreign keys in a foreign table. The foreign table matches its foreign key in the look up table's primary key where it gets more info. EG:
A "Product" (foreign) table may have a "Supplier" (look up) table that describes the supplier of each product.
However, a table may act as both a foreign and a look up table. EG:
"Product" is a foreign table to the "Supplier" table, but "Product" is also a lookup table to the "ProductOrder" table.
Reference tables are look up tables that change very infrequently, eg a table of US states.
SELECT Product.Name
FROM Product INNER JOIN ProductNew
ON Product.ID = ProductNew.ProductID
--The base tables are 'Product' and 'ProductNew'.
In a table, the horizontal rows (aka records, tuples, or instances) are records for each instance in the subject. Each record in the subject will have the same attributes but possibly different values for each attribute.
In a table, the vertical columns (aka fields or attributes) are the different attributes (fields) that all rows or records in the table possess. There are two kinds of fields:
In a table the array of cells (aka domain) formed by the rows and columns hold the data of that describe the value for each attribute of each record in that table. A cell is an intersection of each row/record/instance with a column/field/attribute. EG:
In a database for a bookstore, the different subjects are entitled books, customers, and invoices. For the table called customers, the fields are name, address, and phone number. For the customers table, there are records are for Sue, George, and Amy. For the customers table, the identifier field is called ID. The George's record has a cell value of 773-588-2300 for the phone field.
Bookstore database Books table Customers table Authors table
Customers table ID field Name field Phone field Sue record 1 Sue 773-123-4567 George record 2 George 773-588-2300 Amy record 3 Amy 773-987-6543
Historically database theories were tied to the physical methods used to store the data—hence the terms files, fields, and records. These physical terms were transformed into conceptual terms—hence the terms tables, columns, and rows respectively. In E.F. Codd's breakthrough works on databases he used terms that freed the data from the physical structure—hence the terms relations, attributes, and tuples respectively. Note that this is the origin of "relational" in relational databases, as opposed to the dependency relations discussed next!
The way data in one table is related to data in another table is described as a relationship (aka dependency). There are three basic kinds of relationships:
Employees can have zero or one spouse, so an employee table can have a one-to-one relationship with a spouse table.
An author may have many books, so an author table may can have a one-to-many relationship with a books table.
An author may have many books, but a book can also have many authors. So an authors table and a books table can each have a one-to-many relationship with a third table, the author_books table. The three tables together form a many-to-many relationship.
Many MDBSs have a relationships view, where relationships can be made by dragging-and-dropping fields between tables.
If relationships are in place, then most MDBSs will automatically implement the same relationships when you make new queries. Also some MDBSs can enforce referential integrity. EG:
If two tables are related, then a change in the value of a look up key will also be changed in the foreign key.
Over normalization can be counter productive, especially if too joins are required to get useful information. Denormalization, usually by making redundant data, may be required to improve system performance.
The five normal forms were formulated by C. J. Date in his 1974 book An Introduction to Database Systems, although E.F. Codd of IBM was the first to present a relational model in 1969 with his 12 rules.
"A relation R is in the first normal form (1NF) if and only if all underlying domains contain atomic values only."
In other words, each cell in a table can only contain one value. Or some say that a row cannot have duplicate columns and a column cannot have more than 1 value. EG:
If a table has a field "CustomerID", then a record with multiple customers in the field "CustomerID" would violate this rule. The usual solution would be to create another table with records for the different customers.
"A relation R is in second normal from (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key."
In other words, every non-key field in a table is entirely dependent on the entire primary key. EG:
If a table has a composite primary key of "CustomerID" and "ProductID", then a field like "Product Description" violates this rule because it is dependent on only part of the primary key. The usual solution is to move the offending field to another table.
"A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key."
In other words, every field in a table depends only on the primary key and does not depend on another non-key field. EG:
If an "Orders" table has a primary key of "OrderID" and non-key fields of "CustomerID", "ProductID", and "Address", then "Address" violates this rule because it is dependent on the non-key field of "CustomerID". The usual solution is to move the offending field to another table.
"A relation R is in fourth normal form (4NF) if and only if, whenever there exists an MVD [MultiValued Dependence] in R, say A-->B, then all attributes of R are also functionally dependent on A (i.e., A-->X for all attributes X of R)."
In other words, if a table is the "many" table of a one-to-many relationship, then all of its fields are tied to that one-to-many relationship and it cannot be the "many" table to other unrelated one-to-many relationships and have non-key fields connected to that independent one-to-many relationship. EG:
If a customer has many addresses and many orders, then to put the "Customers" table in a one-to-many relationship with an "AddressesAndOrderses" violates this rule. The usual solution would be to create separate tables for the independent one-to-many relationships, i.e. create an "Addresses" table and an "Orders" table.
"A relation R is in fifth normal form (5NF) -- also called projection-join normal form (PJ/NF) -- if and only if every dependency in R is implied by the candidate keys of R."
In other words, if a table has a field with values that are often repeated, then the values should be replaced with keys that can be related to a look up table. EG:
If a "Popsicle" table has a field "Color", with many instances of "cherry" and "aqua", then the table violates this rule. The solution would be to make the "Color" field a look up key, and replace the values of "cherry" and "aqua" with the key values of "red" and "blue". Those keys could then be related to a look up table called "Seasonal", where the foreign keys of "red" and "blue" can be translated back to "cherry" and "aqua".
This would enable all values of "cherry" to be replaced with "strawberry" by making a single change in the "Seasonal" table, instead of changing every instance of "cherry" in the "Popsicle" table.
There are rules of thumb and variations on the rules of normalization that accomplish roughly the same goals:
In addition to the basic table structures (one-to-one, one-to-many, and many-to-many), there are other common data structures that are repeated in different databases.
Tree (aka Parent-Child). A brilliant and simple way to store a tree of data in a single table is to use an ID for each row and also to have each row identify its parent row. EG:
| Family Table | ||
|---|---|---|
| FamilyID | ParentID | Name |
| 1 | 1 | Grandfather |
| 2 | 1 | Son |
| 3 | 1 | Daughter |
| 4 | 2 | Grandson via Son |
| 5 | 2 | Grandson via Son |
| 6 | 3 | Granddaughter va Daughter |
These are variations where tables may have a column that may reference different tables.
Normally there are three kinds of relationships. Each can be rewritten to show that it is a t1 table, i.e. a table where each column references one other table.
A tn table is a table that has a column that may reference different tables. This is usually done by a pair of columns like TableName and TableID, or a single column like TableName. tn tables enable other kinds of relationships especially since some of the relationships are not commutative. I am calling structures with tn tables "shared" structures.
I will not do an example of each kind of shared structure, but I will do a few.
| Category Table | ||||
|---|---|---|---|---|
| CategoryID | TableName | Name | CatValue | NonNameOrder |
| 1 | Car | Type | SUV | 3 |
| 2 | Car | Type | Minivan | 2 |
| 3 | Car | Type | Passenger | 1 |
| 4 | Car | Color | Red | 2 |
| 5 | Car | Color | Blue | 1 |
| 6 | Boat | Power | Sail | 1 |
| 7 | Boat | Color | Green | 2 |
| 8 | Car | Type | Convertible | 4 |
| Car Table | ||||
|---|---|---|---|---|
| CarID | TypeCatID | ColorCatID | Name | NonNameOrder |
| 1 | 1 | 4 | SUV in Red | 1 |
| 2 | 2 | 4 | Minivan in Red | |
| 3 | 3 | 4 | Passenger in Red | |
| 4 | 1 | 5 | SUV in Blue | 2 |
| 5 | 2 | 5 | Minivan in Blue | |
| 6 | 3 | 5 | Passenger in Blue | |
| 7 | 8 | 4 | Convertible in Red | |
| Phone Table | |
|---|---|
| Columns | EGs |
| PhoneID | 1 |
| Table | Employee |
| TableID | 3 |
| Phone | 773-588-2300 |
| Description | Fax |
| Employee Table | Customer Table | Address Table | ||
|---|---|---|---|---|
| EmployeeID | CustomerID | AddressID | ||
| Name | Name | Address |
| Addressed Table | |||
|---|---|---|---|
| AddressedID | TableName | TableID | AddressID |
| 1 | Employee | 1 | 1 |
| 2 | Employee | 2 | 1 |
| 3 | Employee | 3 | 3 |
| 4 | Customer | 1 | 1 |
| 5 | Customer | 2 | 1 |
| 6 | Customer | 3 | 3 |
| 7 | Customer | 4 | 7 |
If you are given a date, you should be able to pull all rows that cover that date. Simply provide columns for DateEffective and DateRetired. If the DateEffective is NULL, then the row item has been effective since the beginning of time. If the DateRetired is NULL, then the row item is current. An item that has never been active should not be in the table.
The statement below gets all fee schedules that are current for a given date. Usually a variable would be used instead of a literal date value.
SELECT FS.Name
FROM FeeSchedule AS FS
WHERE (
(FS.DateEffective IS NULL AND FS.DateRetired IS NULL) OR
(FS.DateEffective IS NULL AND FS.DateRetired >= '10/1/01') OR
(FS.DateEffective <= '10/1/01' AND FS.DateRetired IS NULL) OR
(FS.DateEffective <= '10/1/01' AND FS.DateRetired >= '10/1/01')
)
This is a black box table for storing nearly any kind of simple information. However it might be simpler to store variable datatypes simply as a type of string since a string can store dates, integers, floats, etc. (I do this in Survey below).
| Event Table | EventType Table | |
|---|---|---|
| EventID | EventTypeID | |
| EventTypeID | EventCatID | |
| PertinentStaffID | EventName | |
| EntryStaffID | NonNameOrder | |
| EntryStamp | IsMenu | |
| StartStamp | RelatedFile | |
| EndStamp | ||
| EventString | EventStringNote | |
| EventFloat | EventFloatNote | |
| EventStamp | EventStampNote | |
| Note | Note |
Respondents take surveys. A survey has sections with questions. A question has one or more choices. Some questions allow 0, 1, n, or * answers. A respondent can take a survey 0, 1, n, or * times but it would be better to just give them a new survey, or if they re-take it, then just take the latest answer.
Respondent 1:* Take *:1 Survey 1:* Section
*:1 1:*
Answer *:1 Choice *:1 Question
create table Answer( AnswerID int identity primary key, ChoiceID int, TakeID int, AnsDatatype varchar(3), AnsActualValue Txt varchar(512), AnsDtm datetime) create table Choice( ChoiceID int primary key, QuestionID int, DescriptionHTML varchar(256), SortKey varchar(4), ChoiceDatatype varchar(3), ChoiceFixedValue varchar(512)) create table Question( QuestionID int primary key, SurveyID int, NameHTML varchar(256), DescriptionHTML varchar(1024), SortKey varchar(4), AnswersAllowed varchar(2) --0, 1, n, or * answers) create table Respondent( RespondentID varchar(64) primary key, Name varchar(256), NameFirst varchar(128), NameLast varchar(128), Note varchar(512)) create table Section( SectionID int identity primary key, SeurveyID, NameHTML varchar(256), DescriptionHTML varchar(1024), SortKey varchar(4)) create table Survey( SurveyID int identity primary key, NameHTML varchar(256), DescriptionHTML varchar(1024) DateActivated datetime, TakesAllowed varchar(2), Note varchar(512)) create table Take( TakeID int identity primary key, RespondentID varchar(64), SurveyID int, DateTaken datetime, TakenByMeans varchar(64), --web, postal, etc EnteredByWhom(256), --usually the Respondent RespondentComment varchar(1024), EnteredByComment varchar(1024))
Page Modified: (Hand noted: 2007-10-06 06:27:21Z) (Auto noted: 2007-11-17 06:47:56Z)