DATABASE NOMALIZATION

From a previous post we talked about MySQL databases. There we create databases and then create tables inside database. When creating a table we have to think about some problems which will occur in reality. If data are repeated or if it contains any multivalued attributes then there will be some anomalies as below,

  1. Insertion Anomaly : Think about an example where in an organization they keep information about the employees in following manner.
    • employee-id
    • employee-name
    • department-name
    • department-id
    • In this scenario if we try to enter data to the table we have to enter department information as well. If an Intern joins the organization and if he is not assigned to any department then there will be a insertion anomaly.
  2. Update Anomaly : If there exists any repeating groups such as the department name and department id in the previous example, if we want to change the name of the department name, then we have to find all the repeating elements and then change it. If we missed one or two to update then it will make a great trouble one day.
  3. Deletion Anomaly : If there exists a department where only one employee is working and if that only one employee is resigned from the job, then we have to remove him from the table. But this will lead that department to be removed from the table as well. Then there will be no entry about the department as well. .

So as a solution for these anomalies we can suggest Normalization. In normalization what we do is try to identify those anomalies and then remove them to make our data more reliable and efficient. There are few Normalization forms as well.

  1. 1st Normal Form
  2. 2nd Normal Form
  3. 3rd Normal Form
  4. Boyce-cod Normal Form
  5. 4th Normal Form
  6. 5th Normal Form

1st NORMAL FORM

In the first normal form we try to identify any repeating groups and multivalued attributes existing in the table and then remove them. To apply the 1st normal form the table should be in the Unnormalized form. Then we can apply the 1st normal form. After the 1st normal form their won’t be any multivalued attributes or any repeating groups in the table.

example :

  • There is a table called ‘Project’ with following column names.
    • Project_code
    • Project_name
    • Project_budget
    • Employee_name
    • Employee_id

Here as all the project details and the employee details are on the same table and a project can have multiple employees assigned to it, Project_code, Project_name, Project_budget can be repeated. So this table is in unnormalized form. Let’s make it to the 1st normal form.

After 1st normalization,

  • Project
    • Project_code
    • Project_name
    • Project_budget
  • Employee
    • Employee_name
    • Employee_id
    • Project_code

We can get two tables and use the Project_code as a foreign key to the Employee table in order to keep the relationship.

2nd NORMAL FORM

In the 2nd Normal Form we talk about the Partial Dependency. Partial Dependency means there are two or more keys in a table and all the non key attributes are depending on a single key. Then there is a problem. So to apply the 2nd Normal Form we have assure that the table is in 1st Normal Form. Then we can remove any partial dependencies if available.

example :

In the above example after we creating the Employee table, now there are two keys. Employee_id and the Project_code. But the non key attribute Employee_name depends only on the Employee_id. So there is a partial dependency in this table. So let’s apply 2nd Normal Form.

  • Project
    • Project_id
    • Project_name
    • Project_budget
  • Employee
    • Employee_id
    • Employee_name
  • Employee-Project
    • Employee_id
    • Project_id

Here I have created another table and put Employee_id and Project_id to it. Now there is no partial dependencies in the tables.

3rd NORMAL FORM

When a table is in 2nd Normal Form, we can apply the 3rd Normal Form. What we do here is, remove the Transitive Dependencies if available. Transitive Dependency means if any non key attribute depends on another non key attribute, it is known as transitive dependency. We can identify any transitive dependencies if available and then remove them in this step.

example :

In the above example think we keep Employee_city and City_code as well. Here both are non key attributes. But the City_code depends on the Employee_city. So it is a transitive dependency. So we can remove it.

  • Employee
    • Employee_id
    • Employee_name
    • Employee_city
  • City
    • City_name
    • City_code

BOYCE-COD NORMAL FORM

This is much more advanced topic. This applies when all the attributes have a link to the other attribute. For an example think about a scenario where a student is doing some subject and has an extra class for that. But the class is assigned to a Instructor accordng to the subject.

  • Class
    • Student
    • Instructor
    • Subject

So there can be different types of combinations as below,

  • { Student – Instructor } { Instructor – Subject }
  • { Student – Subject } { Instructor – Student }

So we can say the table is in Boyce-cod normal form when the tables are in one of the above combinations.

4th Normal Form and the 5th Normal Form has not much to do with. A relation 4th Normal Form should not have any multivalued dependencies available. Fifth Normal Form deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy. 1st, 2nd, 3rd and Boyce-cod are the most important normal forms to deal with.

Hope you got a clear idea about Database Normalization concepts. See you soon with another valuable topic. Thank You!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s