Tuesday 11 March 2014

Database Design Concepts-Theory of Normalization.

                                   Database Design Concepts-Theory of Normalization

Objectives:
  • What is Normalization and what is the need of Normalization in Relational Database Designing?
  • Normal Forms
Concept of normalization(also called Functional Dependency), was introduced by Professor Codd,in  1970.
What is Normalization? 

A relational database which has not been normalized may suffer from some or all of the below problems:
  1. Data redundancy
  2. Data Loss with increasing data size.
  3. Anomalies :Insertion, Update and Deletion.
Insertion Anomaly means that we cannot insert some data in the database.
Update Anomaly means that  we have data redundancy in the database and to make a modification we have to modify at various places in the database.
Deletion Anomaly means by deleting some particular data causes loss of another data.
Normalization is used to avoid or eliminate the three types of anomalies (insertion, deletion and update anomalies) which a database may suffer from.
To perform normalization the relation is converted into different normal forms.
First Normal Form:
A relation or table is called in first normal form if all it's attributes  are simple or none of the attribute is a relation in itself.
 
In other words, the relation should not have a composite attribute.
This normal form is called as FLAT-FILE  form also. 
Consider the below Student_Courses relation:
Student_Courses (Stud_id, Stud_name, Stud_phone, Stud_courses_taken)
Stud_id
Stud_name
Stud_phone
Stud_courses_taken
100
Deep
487 2454
AJ129-Advanced Java-12-A
200
Rachna
671 8120
DC123-Dataabse Concepts-3-B
300
Shreya
871 2356
OS123-Operating Systems-8-A
400
Bhoj
871 2356
OS123-Operating Systems-6-C
100
Deep
487 2454
DC123-Dataabse Concepts-5-A
200
Rachna
671 8120
AJ129-Advanced Java-8-B
600
Ajay
367 3452
ME567-Mechanical Engineering-9-B
  
If you observe the relation(table) carefully, then you will find that the attribute Stud_courses_taken is a composite attribute and further analysis will, explain that it's a another relation in itself i.e.
Stud_courses_taken is a relation consisting of the below attributes:
Course_id, Course_name, Course_duration and Course_Grade 
Now if we check the Student_courses relation for the data redundancy and anomalies then the below observations are found:
Data Redundancy : The Stud_name Deep appears more than one times in the same table, and the same case is for the course Operating Systems.  
Insertion Anomaly : If we want to add a new course to the database we cannot add it until a student has opted for it.
Update Anomaly : If we want to update the name of the course Database Concepts to Advanced Database Concepts we have to update it at various places.
Deletion Anomaly : If we want to delete the course Mechanical Engineering, then we will also lose a student record for Ajay.
So, our relation Student_courses suffer from all the problems and it has also an attribute Stud_courses_taken, which is composite, hence we can say the relation is not in first normal form.
To convert the Student_courses relation to first normal form we have to break the composite attribute into simple attributes and the relation Student_courses will be as below:
Student_Courses (Stud_id,Stud_name,Stud_phone,Course_id,Course_name,Course_duration,Course_grade)
Stud_id
Stud_name
Stud_phone
Couse_id
Course_name
Course_duration
Course_grade
100
Deep
487 2454
AJ129
Advanced Java
12
A
200
Rachna
671 8120
DC123
Databse Concepts
3
B
300
Shreya
871 2657
OS123
Operating Systems
8
A
400
Bhoj
871 2356
OS123
Operating Systems
8
C
100
Deep
487 2454
DC123
Databse Concepts
3
A
200
Rachna
671 8120
AJ129
Advanced Java
12
B
500
Ajay
367 3452
ME567
Mechanical Engineering
9
B
Now if we look to this new Student_courses relation then the primary key for this relation will be a composite primary key consisting of the two attributes Stud_id and Course_id, because neither
Stud_id nor Course_id, can individually define a tuple(row) for the relation.
But if we revisit the relation we find the below facts:
1)The attributes Stud_name and Stud_phone can be traced only using the Stud_id, or we can say that the attributes Stud_name and Stud_phone depends functionally on Stud_id
2)The attributes Course_name and Course_duration can be traced using only Course_id or we can say the attributes Course_name and Course_duration depends functionally on Course_id.
3)The attribute Course_grade needs both Stud_id and course_id to get traced or the attribute Course_grade functionally depend on the Stud_id and Course_id.
A determinant is an attribute or a group of attributes that determine the value of other attributes.
Dependent Attribute is an attribute whose value is determined by a determinant attribute.
 
If we draw a Functional Dependency diagram for the relation Student_course, it will be as below

   
 
Attribute Course_grade is fully functionally dependent on the primary key (Stud_id, Course_id) because both parts of the primary keys are needed to determine Course_grade.
 
On the other hand both Stud_name, and Stud_phone attributes are not fully functionally dependent on the primary key, because only a part of the primary key namely Stud_id is needed to determine both Stud_name and Stud_phone.
 
Also attributes  Course_name and Course_durationn are not fully functionally dependent on the primary key because only Course_id is needed to determine their values.

The new relation Student_courses still suffers from all three anomalies for the following reasons:
 
1.The relation contains redundant data (Note Database Concepts as the course _name  for  DC123 appears in more than one place).
 
2. The relation contains information about two entities Student and course.
Following is the detail description of the anomalies that relation Student-courses suffers from.
  • Insertion anomaly: We cannot add a new course such as MF247 with course name Mainframes Technology to the database unless we add a student who to take the course.
  •  Update anomaly: If we change the course description for DC123 from Database Concepts to New Database Concepts we have to make changes in more than one place or else the database will be inconsistent. In other words in some places the course description will be New Database Concepts and in any place were we forgot to make the changes the description still will be Database Concepts.
  • Deletion anomaly: If student Ajay is deleted from the database we also loose information that we had on course ME567 with description Mechanical Engineering.
To resolve this we have to further normalize the relation.
 
Second normal form.
A first normal form relation is in second normal form if all its non-primary attributes are fully functionally dependent on the primary key.
 
Primary attributes are those attributes, which are parts of the primary key, and
Non-primary attributes do not participate in the primary key.
 
In Student_courses relation both Stud_id  and Course_id are primary attributes because they are components of the primary key.However attributes Stud_name, Stud_phone, Course_name, Course_duration and Stud_grade all are non primary attributes because non of them is a component of the primary key.

To convert Student_courses to second normal relations we have to make all non-primary attributes to be fully functionally dependent on the primary key.
 
To do that we need to project (that is we break it down to two or more relations) Student_courses table into two or more tables.

However projections may cause problems. To avoid such problems it is important to keep attributes, which are dependent on each other in the same table, when a relation is projected to smaller relations.

Following this principle and examination of last Student_courses relation indicate that we should divide Student_courses relation into following three relations:
Student- Stud_id(PK), Stud_name, Stud_phone.
courses- Course_id(PK), Course_name, Course_duration.
Student_Course_Grade- Stud_id(FK-Student Table) ,Course_id(FK-Courses Table).

Student


Stud_id
Stud_name
Stud_phone
100
Deep
487 2454
200
Rachna
671 8120
300
Shreya
871 2657
400
Bhoj
871 2356
500
Ajay
367 3452
Courses


Couse_id
Course_name
Course_duration
AJ129
Advanced Java
12
DC123
Databse Concepts
3
OS123
Operating Systems
8
ME567
Mechanical Engineering
9








Student_Course_Grade


Std_id
Course_id
Std_grade
100
IS380
A
100
IS416
B
200
IS380
B
200
IS416
B
200
IS420
C
300
IS417
A


All these three relations are in second normal form.
 
Examination of these relations shows that we have eliminated the redundancy in the database

Now relation Student contains information only related to the entity student, relation Courses contains information related to entity Courses only, and the relation Student_Course_Grade contains information related to the relationship between these two entity.

Further these three sets are free from all anomalies. Let's check out the same:

  • Insertion anomaly: Now a new Course such as MF247 with course name Mainframes Technology can be inserted to the table Course.Equally we can add any new students to the database by adding their Stud_id, Stud_name and Stud_phone to Students table. Therefore our database, which made up of these three tables does not suffer from insertion anomaly.
  • Update anomaly: Since redundancy of the data was eliminated no update anomaly can occur.To change the Course_name for DC123 only one change is needed in table Courses.
  • Deletion anomaly: the deletion of student Ajay from the database is achieved by deleting Ajay's records from both Student and Student_Course_Grade relations and this does not have any side effect because the course ME567 with description Mechanical Engineering remains untouched in the table Courses.
Third Normal Form
A second normal form relation is in third normal form if all non-primary attributes (that is attributes that are not parts of the primary key or of any candidate key) have non-transitivity dependency on the primary key.


Partial Functional Dependency – A non-key column is dependent on some, but not all the columns in a composite primary key.
Transitive Functional Dependency- A transitive dependency is a type of functional dependency in which the value in a non-key column is determined by the value in another non-key column
.