Database Design Concepts-Theory of Normalization
Objectives:
A relational database which has not been normalized may suffer from some or all of the below problems:
Second normal form.
Following this principle and examination of last Student_courses relation indicate that we should divide Student_courses relation into following three relations:
Objectives:
- What is Normalization and what is the need of Normalization in Relational Database Designing?
- Normal Forms
A relational database which has not been normalized may suffer from some or all of the below problems:
- Data redundancy
- Data Loss with increasing data size.
- Anomalies :Insertion, Update and Deletion.
Update Anomaly means that we have data redundancy in the database and to make a modification we have to modify at various places in the database.
Deletion Anomaly means by deleting some particular data causes loss of another data.
Normalization is used to avoid or eliminate the three types of anomalies (insertion, deletion and update anomalies) which a database may suffer from.
To perform normalization the relation is converted into different normal forms.
First Normal Form:
A relation or table is called in first normal form if all it's attributes are simple or none of the attribute is a relation in itself.
In other words, the relation should not have a composite attribute.
This normal form is called as FLAT-FILE form also.
Consider the below Student_Courses relation:
Student_Courses (Stud_id, Stud_name, Stud_phone, Stud_courses_taken)
Stud_id
|
Stud_name
|
Stud_phone
|
Stud_courses_taken
|
100
|
Deep
|
487 2454
|
AJ129-Advanced Java-12-A
|
200
|
Rachna
|
671 8120
|
DC123-Dataabse Concepts-3-B
|
300
|
Shreya
|
871 2356
|
OS123-Operating Systems-8-A
|
400
|
Bhoj
|
871 2356
|
OS123-Operating Systems-6-C
|
100
|
Deep
|
487 2454
|
DC123-Dataabse Concepts-5-A
|
200
|
Rachna
|
671 8120
|
AJ129-Advanced Java-8-B
|
600
|
Ajay
|
367 3452
|
ME567-Mechanical Engineering-9-B
|
If you observe the relation(table) carefully, then you will find that the attribute Stud_courses_taken is a composite attribute and further analysis will, explain that it's a another relation in itself i.e.
Stud_courses_taken is a relation consisting of the below attributes:
Course_id, Course_name, Course_duration and Course_Grade
Now if we check the Student_courses relation for the data redundancy and anomalies then the below observations are found:
Data Redundancy : The Stud_name Deep appears more than one times in the same table, and the same case is for the course Operating Systems.
Insertion Anomaly : If we want to add a new course to the database we cannot add it until a student has opted for it.
Update Anomaly : If we want to update the name of the course Database Concepts to Advanced Database Concepts we have to update it at various places.
Deletion Anomaly : If we want to delete the course Mechanical Engineering, then we will also lose a student record for Ajay.
So, our relation Student_courses suffer from all the problems and it has also an attribute Stud_courses_taken, which is composite, hence we can say the relation is not in first normal form.
To convert the Student_courses relation to first normal form we have to break the composite attribute into simple attributes and the relation Student_courses will be as below:
Student_Courses (Stud_id,Stud_name,Stud_phone,Course_id,Course_name,Course_duration,Course_grade)
Stud_id
|
Stud_name
|
Stud_phone
|
Couse_id
|
Course_name
|
Course_duration
|
Course_grade
|
100
|
Deep
|
487 2454
|
AJ129
|
Advanced Java
|
12
|
A
|
200
|
Rachna
|
671 8120
|
DC123
|
Databse Concepts
|
3
|
B
|
300
|
Shreya
|
871 2657
|
OS123
|
Operating Systems
|
8
|
A
|
400
|
Bhoj
|
871 2356
|
OS123
|
Operating Systems
|
8
|
C
|
100
|
Deep
|
487 2454
|
DC123
|
Databse Concepts
|
3
|
A
|
200
|
Rachna
|
671 8120
|
AJ129
|
Advanced Java
|
12
|
B
|
500
|
Ajay
|
367 3452
|
ME567
|
Mechanical Engineering
|
9
|
B
|
Now if we look to this new Student_courses relation then the primary key for this relation will be a composite primary key consisting of the two attributes Stud_id and Course_id, because neither
Stud_id nor Course_id, can individually define a tuple(row) for the relation.
But if we revisit the relation we find the below facts:
1)The attributes Stud_name and Stud_phone can be traced only using the Stud_id, or we can say that the attributes Stud_name and Stud_phone depends functionally on Stud_id
2)The attributes Course_name and Course_duration can be traced using only Course_id or we can say the attributes Course_name and Course_duration depends functionally on Course_id.
3)The attribute Course_grade needs both Stud_id and course_id to get traced or the attribute Course_grade functionally depend on the Stud_id and Course_id.
A determinant is an attribute or a group of attributes that determine the value of other attributes.
Dependent Attribute is an attribute whose value is determined by a determinant attribute.
Attribute Course_grade is fully functionally dependent on the primary key (Stud_id, Course_id) because both parts of the primary keys are needed to determine Course_grade.
On the other hand both Stud_name, and Stud_phone attributes are not fully functionally dependent on the primary key, because only a part of the primary key namely Stud_id is needed to determine both Stud_name and Stud_phone.
Also attributes Course_name and Course_durationn are not fully functionally dependent on the primary key because only Course_id is needed to determine their values.
The new relation Student_courses still suffers from all three anomalies for the following reasons:
1.The relation contains redundant data (Note Database Concepts as the course _name for DC123 appears in more than one place).
2. The relation contains information about two entities Student and course.
Following is the detail description of the anomalies that relation Student-courses suffers from.
Following is the detail description of the anomalies that relation Student-courses suffers from.
- Insertion anomaly: We cannot add a new course such as MF247 with course name Mainframes Technology to the database unless we add a student who to take the course.
- Update anomaly: If we change the course description for DC123 from Database Concepts to New Database Concepts we have to make changes in more than one place or else the database will be inconsistent. In other words in some places the course description will be New Database Concepts and in any place were we forgot to make the changes the description still will be Database Concepts.
- Deletion anomaly: If student Ajay is deleted from the database we also loose information that we had on course ME567 with description Mechanical Engineering.
A first normal form relation is in second normal form if all its non-primary attributes are fully functionally dependent on the primary key.
Primary attributes are those attributes, which are parts of the primary key, and
Non-primary attributes do not participate in the primary key.
In Student_courses relation both Stud_id and Course_id are primary attributes because they are components of the primary key.However attributes Stud_name, Stud_phone, Course_name, Course_duration and Stud_grade all are non primary attributes because non of them is a component of the primary key.
To convert Student_courses to second normal relations we have to make all non-primary attributes to be fully functionally dependent on the primary key.
To do that we need to project (that is we break it down to two or more relations) Student_courses table into two or more tables.
However projections may cause problems. To avoid such problems it is important to keep attributes, which are dependent on each other in the same table, when a relation is projected to smaller relations.
Following this principle and examination of last Student_courses relation indicate that we should divide Student_courses relation into following three relations:
Student- Stud_id(PK), Stud_name, Stud_phone.
courses- Course_id(PK), Course_name, Course_duration.
Student_Course_Grade- Stud_id(FK-Student Table) ,Course_id(FK-Courses Table).
courses- Course_id(PK), Course_name, Course_duration.
Student_Course_Grade- Stud_id(FK-Student Table) ,Course_id(FK-Courses Table).
Student
Stud_id
|
Stud_name
|
Stud_phone
|
100
|
Deep
|
487 2454
|
200
|
Rachna
|
671 8120
|
300
|
Shreya
|
871 2657
|
400
|
Bhoj
|
871 2356
|
500
|
Ajay
|
367 3452
|
Courses
Couse_id
|
Course_name
|
Course_duration
|
AJ129
|
Advanced Java
|
12
|
DC123
|
Databse Concepts
|
3
|
OS123
|
Operating Systems
|
8
|
ME567
|
Mechanical Engineering
|
9
|
Student_Course_Grade
Std_id
|
Course_id
|
Std_grade
|
100
|
IS380
|
A
|
100
|
IS416
|
B
|
200
|
IS380
|
B
|
200
|
IS416
|
B
|
200
|
IS420
|
C
|
300
|
IS417
|
A
|
All these three relations are in second normal form.
Examination of these relations shows that we have eliminated the redundancy in the database
Now relation Student contains information only related to the entity student, relation Courses contains information related to entity Courses only, and the relation Student_Course_Grade contains information related to the relationship between these two entity.
Further these three sets are free from all anomalies. Let's check out the same:
Now relation Student contains information only related to the entity student, relation Courses contains information related to entity Courses only, and the relation Student_Course_Grade contains information related to the relationship between these two entity.
Further these three sets are free from all anomalies. Let's check out the same:
- Insertion anomaly: Now a new Course such as MF247 with course name Mainframes Technology can be inserted to the table Course.Equally we can add any new students to the database by adding their Stud_id, Stud_name and Stud_phone to Students table. Therefore our database, which made up of these three tables does not suffer from insertion anomaly.
- Update anomaly: Since redundancy of the data was eliminated no update anomaly can occur.To change the Course_name for DC123 only one change is needed in table Courses.
- Deletion anomaly: the deletion of student Ajay from the database is achieved by deleting Ajay's records from both Student and Student_Course_Grade relations and this does not have any side effect because the course ME567 with description Mechanical Engineering remains untouched in the table Courses.
A second normal form relation is in third normal form if all non-primary attributes (that is attributes that are not parts of the primary key or of any candidate key) have non-transitivity dependency on the primary key.
Partial Functional Dependency – A non-key column is dependent on some, but not all the columns in a composite primary key.
Transitive Functional Dependency- A transitive dependency is a type of functional dependency in which the value in a non-key column is determined by the value in another non-key column.
Transitive Functional Dependency- A transitive dependency is a type of functional dependency in which the value in a non-key column is determined by the value in another non-key column.