5 How Do We Design a Database?

Database Design

With database design we are trying to organize the data in such a way that redundancy is minimized without losing any data. Redundancy is when we repeat the same data more than once. For example, each time a student registers for a class their name and address might be repeated for each class in which they are registered. Another way you might have repeating data is if you have a list of those same students and their addresses in a separate file listing parking passes or ticket violations. Say the student’s address changes, then changing the address would mean editing in different locations; not only in multiple places in registration, but also in a different file at the parking office. There is a greater chance of making a data entry mistake when you must edit in multiple places. It takes more time and resources as well. Having multiple copies of information also increases the chances for unauthorized access to data that should be secure.

Good design also means our data should be accurate and reliable. Setting up constraints on our data is a good way to ensure it has integrity and is consistent. We learned about what data integrity and data consistency were earlier and now we will see this in action as we design a database.

For relational database design, reducing redundancy means splitting the data up into separate entities or tables to minimize redundancy. But it is not always clear how to do this, especially with large ‘real-life’ databases. Database design can be complex. You may not be the person who will design a database, usually this is the database administrator, but knowing the proper way to design a database will help you understand the structure of the database you work with.

The Statement of Work

It is wise to start with determining what output is needed and work backwards from there to determine what data is even going to be used in the database. As you analyze the needs of the business you can determine what is needed for the database. This may require a lengthy time period to accomplish. Remember this process can begin with the Statement of Work (SOW) that was covered earlier. A SOW is a document that describes the work that needs to be done on a project. It can help clarify the project for both the users and the database designer.

Basic Steps for Proper Design

Relational databases model real-world environments. The designer must analyze the real-world system and design it into a relational database.

A data model organizes data and standardizes how the data relate to each other. The purpose of the model is to show the data needed and created by business processes. The data model determines the structure of data and the relationships and constraints of the data.

Let’s look at an example of designing for a system that uses books and their associated information as output. Here’s a flat file showing all the data that is needed for the system:

ISBN	Title	Author	Author Address	Publisher	Publisher Address
1-1111-1111-1	Intro to Databases	Sue Smith	123 Street	Acme Publishing	60 Main Street
2-2222-2222-2	Database Design	John Jones	44 Fourth Ave	Frontier Inc	25th N 700 E
3-3333-3333-3	Data Retrieval	Sue Smith	123 Street	Budd Publisher	345 Allen Blvd
4-4444-4444-4	Data Types	Tim, Thomas, John Jones	67 E Bird Blvd, 44 Fourth Ave	Acme Publishing	60 Main Street
5-5555-5555-5	Queries for Fun	Mary Miller	32 Launa Dr	Perfect Inc	309 Wish Way

Each book is listed with the international standard book number (ISBN), title, author, and publisher. For relational databases, one table in the database will represent one entity in the real world with each row being one instance of that entity and one column storing an attribute associated with that entity.

Step 1 - Each table should be only one real-world entity.

In the table above, there are 3 different entities: books, authors and publishers. In the table, we see some of the authors and publishers are repeated. Remember we want to eliminate redundancy.

In the table above, the first two columns belong to the real world entity of books. The next two columns belong to the entity of author, and the last two columns belong to the entity of publisher. If we separate the data into 3 tables, we can eliminate the redundancy and have Sue Smith and Acme Publishing listed only once.

ISBN	Title
1-1111-1111-1	Intro to Databases
2-2222-2222-2	Database Design
3-3333-3333-3	Data Retrieval
4-4444-4444-4	Data Types
5-5555-5555-5	Queries for Fun

Author	Author Address
Sue Smith	123 Street
John Jones	44 Fourth Ave
Tim Thomas	67 E Bird Blvd
Mary Miller	32 Launa Dr.

Publisher	Publisher Address
Acme Publishing	60 Main Street
Frontier Inc	25th N 700 E
Budd Publisher	345 Allen Blvd
Perfect Inc	309 Wish Way

Differentiating between entities and attributes within those entities, can be tricky sometimes. For example, maybe we find that we want to store both of the author’s home address and work address. Suddenly the address attributes of the author’s table would become a separate table called Author Address with a foreign key that relates to the author’s primary key. You will ultimately make these decisions based on what output is needed from the database. For now we will leave the entities as they are.

Step 2 - Identify primary and foreign keys.

The book entity already has a great natural key that uniquely identifies each book; the ISBN. But there is a possibility of two different authors having the same name or even publishing companies having the same name, so we need a unique primary key for each of those tables. Here they have been assigned surrogate keys for the primary keys. A surrogate key is a key that the system automatically assigns to each row to each row unique.

Author ID	Author	Author Address
22	Sue Smith	123 Street
23	John Jones	44 Fourth Ave
24	Tim Thomas	67 E Bird Blvd
25	Mary Miller	32 Launa Dr.

Publisher ID	Publisher	Publisher Address
001	Acme Publishing	60 Main Street
002	Frontier Inc	25th N 700 E
003	Budd Publisher	345 Allen Blvd
004	Perfect Inc	309 Wish Way

Let’s look at the relationships of the entities now, which will help us establish the foreign keys. The relationship between Publishers and Books is one to many. A book belongs to one publisher and a publisher can publish many books. The foreign key will be located in the table with the many part of the relationship. So the primary key of Publisher becomes a foreign key of Books. Notice how Acme Publishing, 001 is listed multiple times as a foreign key in the Book table. They are the publisher for two different books, “Intro to Databases” and “Date Types”.

ISBN	Publisher ID	Title
1-1111-1111-1	001	Intro to Databases
2-2222-2222-2	002	Database Design
3-3333-3333-3	003	Data Retrieval
4-4444-4444-4	001	Data Types
5-5555-5555-5	004	Queries for Fun

Publisher ID	Publisher	Publisher Address
001	Acme Publishing	60 Main Street
002	Frontier Inc	25th N 700 E
003	Budd Publisher	345 Allen Blvd
004	Perfect Inc	309 Wish Way

The Books and Authors tables have a many-to-many relationship; each book can have many authors, and each author can write many books. Therefore, there will be a linking table between the two tables to resolve the many-to-many into 2 one-to-many relationships. The linking table will have the foreign keys made up of each primary key of the other tables. So in our linking table we have a composite primary key, meaning two attributes together ensure each row is unique. Together these two foreign keys make up the primary key of the linking table. We have author ID 22 repeated a few times and ISBN 4-4444-4444-4 repeated a few times but when we put each foreign key together with the other foreign key they uniquely identify each row. No two rows have both those attributes together exactly the same. We can see the many-to- many being resolved here. Sue is the author of multiple books and the book ‘Data Types’ has two authors. And it is all designed properly now for a relational database.

ISBN	Publisher ID	Title
1-1111-1111-1	001	Intro to Databases
2-2222-2222-2	002	Database Design
3-3333-3333-3	003	Data Retrieval
4-4444-4444-4	001	Data Types
5-5555-5555-5	004	Queries for Fun

ISBN	Author ID
1-1111-1111-1	22
2-2222-2222-2	23
3-3333-3333-3	22
4-4444-4444-4	23
4-4444-4444-4	24
5-5555-5555-5	25

Author ID	Author	Author Address
22	Sue Smith	123 Street
23	John Jones	44 Fourth Ave
24	Tim Thomas	67 E Bird Blvd
25	Mary Miller	32 Launa Dr.

Step 3 - Make sure the data is normalized.

Separating the data into real-world entities and establishing relationships and reducing redundancy are is part of normalizing the data. But there are a standard set of steps to follow to ensure your data is normalized.

There are 7 normal forms that must be applied in order. Our course covers the first 3 normal forms.

First Normal Form (1NF) - The value stored at the intersection of each row and column must be a scalar value, and the table must not contain any repeating columns. Scalar means – being described as a single value that might belong to a number of fields, but itself is one value. The table should also have a primary key.

Second Normal Form (2NF) - Every non-key column must depend on the entire primary key. This normal form has to do with composite keys where each field in the row has to rely on both parts of the composite primary key.

Third Normal Form (3NF) - Every non-key column must depend only on the primary key. Like 2NF but now we are looking at only the one part primary key.

By following normalization rules (or normal forms) we reduce data redundancy. The attributes of each entity are organized in a way that increases data consistency.

Normalization can slow performance and sometimes data is denormalized to improve this performance. Denormalizing can ignore some normal forms if the organization see this as a benefit to the system.

5 How Do We Design a Database?

Database Design

The Statement of Work

Basic Steps for Proper Design

Step 1 - Each table should be only one real-world entity.

Step 2 - Identify primary and foreign keys.

Step 3 - Make sure the data is normalized.