[Relational Theory] [Carol's HomePage]
Coloured Line

Finding Candidate Keys Using FDs

Coloured Line

Coloured Line

A candidate key uniquely identifies each row in a relational table.

To find a candidate key for a relation, we can use FDs to find a minimal set of attributes in the relation which functionally determines (uniquely identifies) all the attributes in the relation.
An attribute always functionally determines itself - trivial but true.


Coloured Line

Elmasri & Navathe's Algorithm

Elmasri and Navathe, in Fundamentals of Database Systems, give an algorithm for  Determining X+, the Closure of X under F 
where  X  is a subset of the attibutes in a given relation, and  F  is a set of functional dependencies specified for the given relation.

This algorithm allows us to find candidate keys using FDs.

If the closure of a set of attributes, X, in a relation is equal to the set of all the attributes in the relation, we know that  X  is a superkey of the relation.

If we cannot find a proper subset of  X  such that the closure of the proper subset is equal to the set of all the attributes in the relation, we know that  X  is a candidate key of the relation.


This algorithm is given as Algorithm 13.1 on page 370 of the First Edition; as Algorithm 12.1 on page 405 of the Second Edition; as Algorithm 14.1 on page 481 of the Third Edition; as Algorithm 10.1 on page 310 of the Fourth Edition; and as Algorithm 10.1 on page 353 of the Fifth Edition.

Algorithm <n>.1: Determining X+, the Closure of X under F  
X+ := X;
repeat
    oldX+ := X+;
    for each functional dependency YZ in F do
    if X+ ⊇ Y then X+ := X+ ∪ Z; 
  until (X+ = oldX+);

Coloured Line

Example Using Elmasri and Navathe's Algorithm

Problem:

We are given a relation

    R(A,B,C,D,E)

and the functional dependencies (FDs) amongst its attributes

    F = {A  -> B
        AC -> D
        B  -> E}

and asked to find the candidate key(s) of R.

Working:

I could apply the above algorithm to each and every subset of R, butI know that a *good* place to start looking for candidate keys is amongst the determinants in F.
A determinate is the LHS (Left Hand Side) of an FD.

If the closure of a determinant is equivalent to R (if it is not a proper subset of R) we know that the determinant is a superkey of R and may therefore be a candidate key of R.

Using  Algorithm <n>.1  to see if determinate A is a superkey

   
  X+ := {A}
  
  oldX+:= {A}
  for A  -> B,
    A  is a subset of X+
    so B is added to X+
  for AC -> D,
    AC is not a subset of X+
  for B  -> E,
    B  is a subset of X+
    so E is added to X+
  oldX+ != X+ repeat
  
  oldX+:= {A,B,E}
  for A  -> B,
    A  is a subset of X+, B is already in X+
  for AC -> D,
    AC is not a subset of X+
  for B  -> E,
    B  is a subset of X+, E is already in X+
  X+ = oldX+ stop
  
  X+ is {A}
	
  
	

  X+ is now {A,B}
	
  X+ is still {A,B}
	

  X+ is now {A,B,E}
	
	
  
	
  X+ is still {A,B,E}
	
  X+ is still {A,B,E}
	
  X+ is still {A,B,E}
  

The closure of {A} over F is {A,B,E} which is a proper subset of R.
Therefore, {A} is not a superkey of R and cannot be a candidate key for R.


Using  Algorithm <n>.1  to see if determinate AC is a superkey


  X+ := AC 
  
  oldX+:= {A,C}
  for A  -> B,
    A  is a subset of X+
    so B is added to X+
  for AC -> D,
    AC is a subset of X+
    so D is added to X+
  for B  -> E,
    B  is a subset of X+
    so E is added to X+
  oldX+ != X+ repeat
  
  oldX+:= {A,C,B,D,E}
  for A  -> B,
    A  is a subset of X+, B is already in X+
  for AC -> D,
    AC is not a subset of X+
  for B  -> E,
    B  is a subset of X+, E is already in X+
  X+ = oldX+ stop
	
  X+ is {A,C}
	

	

  X+ is now {A,C,B}

	
  X+ is now {A,C,B,D}


  X+ is now {A,C,B,D,E}
	


	
  X+ is still {A,C,B,D,E}
	
  X+ is still {A,C,B,D,E}

  X+ is still {A,C,B,D,E}
	
  

The closure of {A,C} over F is {A,B,C,D,E} which is not a proper subset of R.
{A,C} is a superkey for R and may be a candidate key for R.


Using  Algorithm <n>.1  to see if determinate B is a superkey


  X+ := B
  
  oldX+:= {B}
  for A -> B,
    A is a not a subset of X+ 
  for AC -> D,
    AC is not a subset of X+
  for B -> E,
    B is a subset of X+
    so E is added to X+
  oldX+ != X+ repeat
  
  oldX+:= {B,E}
  for A -> B,
    A is a subset of X+
  for AC -> D,
    AC is not a subset of X+
  for B -> E,
    B is a subset of X+,
    E is already in X+
  X+ = oldX+ stop
  

  X+ is {B}
	
  
	
  X+ is still {B}
	
  X+ is still {B}

	
  X+ is now {B,E}
	

	
	
  X+ is still {B,E}
	
  X+ is still {B,E}

	
  X+ is still {B,E}
	
  

The closure of {B} over F is {B,E} which is a proper subset of R.
Therefore, {B} is not a superkey of R and cannot be a candidate key for R.


Using  Algorithm <n>.1 , I have found one superkey {A,C} for R.

Connolly & Begg define a candidate key as

A superkey such that no proper subset is a superkey within the relation.

So, I need to check the proper subsets of the superkey I have found to see if any of these subsets are superkeys.

The proper subsets of {A,C} are

Examining this possible superkeys, I note that:

So I need to consider only {C} as a possible superkey.


Using  Algorithm <n>.1  to see if C is a superkey


  X+ := C
  
  oldX+:= {C}
  for A  -> B,
    A  is a not a subset of X+ 
  for AC -> D,
    AC is not a subset of X+
  for B  -> E,
    B  is not a subset of X+
  oldX+ = X+ stop
  

  X+ is {C}
	
  
	
  X+ is still {C}
	
  X+ is still {C}
	
  X+ is still {C}
	
 

The closure of {C} over F is {C} which is a proper subset of R.
Therefore, {C} is not a superkey of R and cannot be a candidate key for R.


I have discovered that no proper subsets of {A,C} are superkeys of R, so {A,C} is a candidate key for R.

Are there any other candidate keys for R?
I could use  Algorithm 12.1/14.1  for D and E, but I am exhausted.
I know that any attribute which is in none of the determinants in F cannot be a candidate key.

If you would like to see an exhaustive application of  Algorithm <n>.1 , you can find all the subsets of R and apply the algorithm to each subset.

I have found only one candidate key {A,C} which I must select as the primary key for R.

Therefore, the relational schema for R is

R (A, C, [pk]  B, D, E)

[Top]

Coloured Line

Carol Edmondson   <carol@cs.anu.edu.au>
URL: http://computer/people/Carol.Edmondson/theory/FindingCandidateKeys.shtml
Last modified: 18 September 2006 18:07:21 EST