The Definition of a Candidate Key

Database candidate keys sometimes become primary keys

A candidate key is an attribute, or a combination of attributes, that uniquely identifies a database record without referring to any other data, and from which no attribute can be removed without losing that uniqueness. Each table may have one or more candidate keys. One of these candidate keys is selected as the table's primary key. A table has only one primary key, but it can have several candidate keys. A candidate key composed of two or more columns is called a composite key.
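As a rough illustration of the relationship between a primary key and the other candidate keys, the sketch below uses Python's built-in sqlite3 module. The `employee` table, its `badge_number` column, and the `try_insert` helper are hypothetical names invented for this example. One candidate key is declared as the primary key; the other is declared NOT NULL and UNIQUE so the database enforces it as well.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id  INTEGER PRIMARY KEY,   -- candidate key chosen as the primary key
        badge_number TEXT NOT NULL UNIQUE,  -- a second candidate key (hypothetical column)
        first_name   TEXT NOT NULL,
        last_name    TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'B-100', 'Craig', 'Jones')")


def try_insert(row):
    """Insert a row and report whether the key constraints accepted it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        # Raised for PRIMARY KEY, UNIQUE, and NOT NULL violations alike.
        return "rejected"


print(try_insert((2, "B-100", "Craig", "Beal")))   # duplicate badge_number
print(try_insert((2, None, "Craig", "Beal")))      # null badge_number
print(try_insert((2, "B-200", "Craig", "Jones")))  # repeated name, new keys
```

Only the last insert should succeed: names may repeat freely, but neither candidate key may be duplicated or left null.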

Properties of a Candidate Key

All candidate keys share some common properties. First, the value used for identification must remain the same for the lifetime of the record. Second, the value cannot be null. Lastly, the value must be unique: no two records in the table may share it.

For example, to uniquely identify each employee, a company might use the employee's Social Security number. Some people share the same first name, last name, or position, but no two people are issued the same Social Security number.

Social Security Number   First Name   Last Name   Position
123-45-6780              Craig        Jones       Manager
234-56-7890              Craig        Beal        Associate
345-67-8900              Sandra       Beal        Manager
456-78-9010              Trina        Jones       Associate
567-89-0120              Sandra       Smith       Associate
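
The properties above can be checked mechanically against the sample rows. The following is a minimal sketch in Python; the `EMPLOYEES` list and the helpers `is_unique` and `is_candidate_key` are names made up for this illustration. It tests a set of columns for non-null, unique values and then confirms that no smaller subset of those columns is already unique.

```python
from itertools import combinations

# The five sample employees from the table above.
EMPLOYEES = [
    {"ssn": "123-45-6780", "first_name": "Craig", "last_name": "Jones", "position": "Manager"},
    {"ssn": "234-56-7890", "first_name": "Craig", "last_name": "Beal", "position": "Associate"},
    {"ssn": "345-67-8900", "first_name": "Sandra", "last_name": "Beal", "position": "Manager"},
    {"ssn": "456-78-9010", "first_name": "Trina", "last_name": "Jones", "position": "Associate"},
    {"ssn": "567-89-0120", "first_name": "Sandra", "last_name": "Smith", "position": "Associate"},
]


def is_unique(rows, columns):
    """True if the columns hold no nulls and no repeated value combination."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if any(v is None for v in value) or value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, columns):
    """True if the columns are unique and no proper subset of them is."""
    if not is_unique(rows, columns):
        return False
    for size in range(1, len(columns)):
        for subset in combinations(columns, size):
            if is_unique(rows, subset):
                return False  # a smaller key exists, so this one is not minimal
    return True


print(is_candidate_key(EMPLOYEES, ("ssn",)))                    # True
print(is_candidate_key(EMPLOYEES, ("first_name",)))             # False: two Craigs
print(is_candidate_key(EMPLOYEES, ("ssn", "first_name")))       # False: ssn alone suffices
print(is_candidate_key(EMPLOYEES, ("first_name", "last_name"))) # True for these five rows
```

Note that the last check passes only because these five rows happen to contain no repeated full name. A data sample can rule a column set out, but it cannot prove that the set will stay unique as records are added; that judgment depends on what the data means.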

Examples of Candidate Keys

Some types of data readily lend themselves to use as candidate keys:

  • International Standard Book Numbers: ISBNs uniquely identify books and related media. The issuance of ISBNs is tightly regulated by industry gatekeepers and ISBNs are never re-used by publishers.
  • Bank account numbers: Most banks do not recycle account numbers.
  • Serial numbers: Although serial numbers aren't governed across industries, in the context of a single supplier, a serial number should always be unique.
  • Driver license numbers: Usually, these numbers are not duplicated. However, a person who moves from state to state can have more than one DL number.
  • National Provider ID: Physicians and other licensed medical providers each have at least one NPI that's unique to them, issued by the U.S. Department of Health and Human Services.

However, some types of information that might seem like good candidates actually prove problematic:

  • Phone numbers: Most carriers recycle phone numbers, and individual subscribers can use several phone numbers simultaneously. 
  • Universal Product Codes: UPCs are unique, but the owner of a UPC block can reassign a code to a different product at will.
  • Medical record numbers: MRNs are generally issued on a hospital level, without any sort of national guidance about the structure and format of these identifiers.
  • Social Security numbers: Although they're theoretically unique, the same SSN can turn up on more than one person's records through fraud or data-entry errors, which is common enough to make this identifier problematic across large data sets. (In the context of an employer that verifies SSNs, this challenge isn't a problem.)