How to use primary keys in SQL

0 Shares
0
0
0
0

Introduction

One of the valuable features of relational databases is the formatting of data in a well-defined structure. This structure is achieved by using tables with fixed columns, adhering to well-defined data types, and ensuring that each row has the same format. When you store data as rows in tables, it is equally important to be able to find and refer to them unambiguously. In Structured Query Language (SQL), this can be achieved with primary keys, which act as identifiers for individual rows in tables in a relational database.

In this tutorial, you will learn about primary keys and use several different types to identify unique rows in database tables. Using some sample datasets, you will create primary keys on single columns, multiple columns, and auto-incrementing sequential keys.

Prerequisites

To follow this guide, you need a computer running a SQL-based relational database management system (RDBMS). The instructions and examples in this guide have been validated using the following environment:

  • A server running Ubuntu 20.04, with a non-root user with administrative privileges and a firewall configured with UFW
  • MySQL is installed and secured on the server.
  • Basic introduction to executing SELECT queries to retrieve data from the database

Note: Please note that many RDBMSs use their own unique implementation of SQL. Although the commands mentioned in this tutorial work on most RDBMSs and primary keys are part of the SQL standard, some features are database-specific and therefore the exact syntax or output may differ if you test them on a system other than MySQL.

You will also need an empty database in which to create the tables using primary keys. We encourage you to follow the Connecting to MySQL and Setting Up a Sample Database section below for details on connecting to the MySQL server and creating the test database used in the examples in this guide.

Connecting to MySQL and setting up a sample database

In this section, you will connect to the MySQL server and create a database instance so that you can use the examples in this guide.

If your SQL database system runs on a remote server, SSH into your server from your local machine:

ssh sammy@your_server_ip

Then open the MySQL server prompt and replace Sami with your MySQL username:

mysql -u sammy -p

Create a database called primary_keys:

CREATE DATABASE primary_keys;

If the database was created successfully, you will receive the following output:

Output
Query OK, 1 row affected (0.01 sec)

To select the primary_keys database, run the following USE statement:

USE primary_keys;

You will get the following output:

Output
Database changed

Once you have selected a database, you can create sample tables within it. You are now ready to follow the rest of the guide and start working with primary keys in MySQL.

Introduction to Major Keys

Data in a relational database is stored in tables with a specific, uniform structure of individual rows. The table definition describes what columns there are and what types of data can be stored in the individual columns. This alone is sufficient to store information in the database and find it using various filter criteria using the WHERE clause. However, this structure does not guarantee that every row can be found unambiguously.

Imagine a database of all registered vehicles that are allowed to drive on public roads. The database contains information such as the make, model, year of manufacture, and paint color. However, if you were looking for a red Chevrolet Camaro made in 2007, you could find more than one. After all, car manufacturers sell the same cars to multiple customers. That’s why registered cars have license plates that identify each vehicle. If you searched for a car with the license plate OFP857, you can be sure that this criteria will find only one car. This is because, by law, license plates uniquely identify registered cars. In a relational database, such a piece of data is called a primary key.

Primary keys are unique identifiers found in a column or set of columns that can unambiguously identify each row in a database table. Several rules reflect the technical characteristics of primary keys:

  • A primary key must use unique values. If the primary key consists of more than one column, the combination of values in these columns must be unique across the entire table. Since the key is used to uniquely identify each row, it cannot appear more than once.
  • A primary key should not contain NULL values.
  • Each database table can only use one primary key.

The database engine enforces these rules, so if a primary key is defined on a table, you can trust that those attributes are correct.

In addition to these technical features, you should also consider the content of the data to decide what type of data is a good choice to become a primary key. Natural keys are identifiers that already exist in the data set, while surrogate keys are artificial identifiers.

Some data structures have primary keys that naturally occur in the data set, such as license plate numbers in a vehicle database or Social Security numbers in a U.S. citizen registry. Sometimes such identifiers are not a single value but a pair or combination of several values. For example, in a local city directory of houses, a street name or a street number alone cannot uniquely identify a house. There may be multiple houses on a street, and the same number can appear on multiple streets. But a pair of street names and street numbers can be considered a unique house identifier. Such naturally occurring identifiers are called natural keys.

However, often data cannot be uniquely identified by the values of a single column or a small subset of columns. Then, artificial primary keys are created, for example, using a sequence of numbers or randomly generated identifiers such as UUIDs. Such keys are called surrogate keys.

In the following sections, you will create natural keys based on one or more columns and create alternate keys in tables where a natural key is not an option.

Create a primary key on a column

In many cases, a data set naturally includes a column that can be used to uniquely identify rows in a table. In these cases, you can create a natural key to describe the data. Continuing from the previous example of the registered car database, imagine a table with the following structure:

Sample table
+---------------+-----------+------------+-------+------+
| license_plate | brand | model | color | year |
+---------------+-----------+------------+-------+------+
| ABC123 | Ford | Mustang | Red | 2018 |
| CES214 | Ford | Mustang | Red | 2018 |
| DEF456 | Chevrolet | Camaro | Blue | 2016 |
| GHI789 | Dodge | Challenger | Black | 2014 |
+---------------+-----------+------------+-------+------+

The first and second rows both describe a red 2018 Ford Mustang. You can't uniquely identify the car using the make and model of the car. The license plate is different in both cases and provides a good unique identifier for each row in the table. Since the license plate number is already part of the data, using it as the primary key creates a natural key. If you created the table without using a primary key on the License_plate column, you would run the risk of having a duplicate or empty page appear in the data set at some point.

Next, you create a table similar to the one above with the License_plate column as the primary key and the following columns:

  • License_plate: This column contains the license plate number, represented by the varchar data type.
  • Brand: This column represents the brand of the car, expressed using the varchar data type with a maximum of 50 characters.
  • Model: This column holds the car model, expressed using the varchar data type with a maximum of 50 characters.
  • Color: This column holds the color, expressed using the varchar data type with a maximum of 20 characters.
  • year: This column shows the year of manufacture of the car, expressed using the int data type to store numeric data.

To create the machines table, run the following SQL statement:

CREATE TABLE cars (
license_plate varchar(8) PRIMARY KEY,
brand varchar(50),
model varchar(50),
color varchar(20),
year int
);... */

The PRIMARY KEY statement follows the definition of the License_plate data type. When dealing with primary keys based on single columns, you can use the simplified syntax for creating the key and write the primary key in the column definition.

If the following output is printed, the table has been created:

Output
Query OK, 0 rows affected (0.00 sec)

After that, load the table with the sample rows provided in the example above by executing the following INSERT INTO operation:

INSERT INTO cars VALUES
('ABC123', 'Ford', 'Mustang', 'Red', 2018),
('CES214', 'Ford', 'Mustang', 'Red', 2018),
('DEF456', 'Chevrolet', 'Camaro', 'Blue', 2016),
('GHI789', 'Dodge', 'Challenger', 'Black', 2014);

The database will respond with a success message:

Output
Query OK, 4 rows affected (0.010 sec)
Records: 4 Duplicates: 0 Warnings: 0

You can now verify that the newly created table contains the expected data and format using the SELECT statement:

SELECT * FROM cars;

The output shows a table similar to the one at the beginning of the section:

Output
+---------------+-----------+------------+-------+------+
| license_plate | brand | model | color | year |
+---------------+-----------+------------+-------+------+
| ABC123 | Ford | Mustang | Red | 2018 |
| CES214 | Ford | Mustang | Red | 2018 |
| DEF456 | Chevrolet | Camaro | Blue | 2016 |
| GHI789 | Dodge | Challenger | Black | 2014 |
+---------------+-----------+------------+-------+------+

Next, you can check whether the primary key rules are guaranteed by the database engine. Try inserting a car with a duplicate license plate number by running the following:

INSERT INTO cars VALUES ('DEF456', 'Jeep', 'Wrangler', 'Yellow', 2019);

MySQL responds with an error message saying that the DEF456 tag results in a duplicate entry for the primary key:

Output
ERROR 1062 (23000): Duplicate entry 'DEF456' for key 'cars.PRIMARY'

Note: Under the hood, primary keys are implemented with unique indexes and share many of the same properties as indexes you might manually create for other columns on a table. Most importantly, primary key indexes also improve the performance of querying a table against the column on which the index is defined. For more information about using indexes for this purpose, see the How to Use Indexes guide in this tutorial.

Now you can be sure that duplicate license plates are not allowed. Next, check if a car with a blank license plate can be imported:

INSERT INTO cars VALUES (NULL, 'Jeep', 'Wrangler', 'Yellow', 2019);

This time the database will respond with another error message:

Output
ERROR 1048 (23000): Column 'license_plate' cannot be null

With these two rules enforced by the database, you can be sure that License_plate uniquely identifies each row in the table. If you query the table against each license plate, you can expect to return one row each time.

In the next section, you will learn how to use primary keys with multiple columns.

Creating a primary key across multiple columns

When one column is not enough to uniquely identify a row in a table, you can create primary keys that use more than one column.

For example, imagine a house registry where neither the street name nor the street number alone is sufficient to identify each house:

Sample table
+-------------------+---------------+-------------------+------+
| street_name | street_number | house_owner | year |
+-------------------+---------------+-------------------+------+
| 5th Avenue | 100 | Bob Johnson | 2018 |
| Broadway | 1500 | Jane Smith | 2016 |
| Central Park West | 100 | John Doe | 2014 |
| Central Park West | 200 | Tom Thompson | 2015 |
| Lexington Avenue | 5001 | Samantha Davis | 2010 |
| Park Avenue | 7000 | Michael Rodriguez | 2012 |
+-------------------+---------------+-------------------+------+

The street name Central Park West appears more than once in the table, as does the street number 100. However, no duplicate pairs of street name and street number can be seen. In this case, while neither column can be the primary key, the pair of those two values can be used to uniquely identify each row in the table.

Next, you create a table similar to the one shown above with the following columns:

  • street_name: This column represents the name of the street where the house is located, with a varchar data type limited to 50 characters.
  • street_number: This column holds the street number of the house, represented with the varchar data type. This column can store up to 5 characters. It does not use the numeric int data type because some street numbers may contain letters (e.g. 200B).
  • house_owner: This column contains the name of the house owner, represented by the varchar data type limited to 50 characters.
  • year: This column represents the year the house was built, represented with the int data type to store numeric values.

This time, the primary key uses both the street_name and street_number columns instead of just one. To do this, run the following SQL statement:

CREATE TABLE houses (
street_name varchar(50),
street_number varchar(5),
house_owner varchar(50),
year int,
PRIMARY KEY(street_name, street_number)
);

This time, unlike the previous example, the PRIMARY KEY statement appears below the column definitions. The PRIMARY KEY statement is enclosed in parentheses with two column names: street_name and street_number. This syntax creates a primary key in the houses table, which is located in two columns.

If the following output is printed, the table has been created:

Output
Query OK, 0 rows affected (0.00 sec)

After that, load the table with the sample rows provided in the previous example by executing the following INSERT INTO operation:

INSERT INTO houses VALUES
('Central Park West', '100', 'John Doe', 2014),
('Broadway', '1500', 'Jane Smith', 2016),
('5th Avenue', '100', 'Bob Johnson', 2018),
('Lexington Avenue', '5001', 'Samantha Davis', 2010),
('Park Avenue', '7000', 'Michael Rodriguez', 2012),
('Central Park West', '200', 'Tom Thompson', 2015);

The database will respond with a success message:

Output
Query OK, 6 rows affected (0.000 sec)
Records: 6 Duplicates: 0 Warnings: 0

You can now verify that the newly created table contains the expected data and format using the SELECT statement:

SELECT * FROM houses;

The output shows a table similar to the one at the beginning of the section:

Output
+-------------------+---------------+-------------------+------+
| street_name | street_number | house_owner | year |
+-------------------+---------------+-------------------+------+
| 5th Avenue | 100 | Bob Johnson | 2018 |
| Broadway | 1500 | Jane Smith | 2016 |
| Central Park West | 100 | John Doe | 2014 |
| Central Park West | 200 | Tom Thompson | 2015 |
| Lexington Avenue | 5001 | Samantha Davis | 2010 |
| Park Avenue | 7000 | Michael Rodriguez | 2012 |
+-------------------+---------------+-------------------+------+
6 rows in set (0.000 sec)

Now, let's check if the database allows rows that repeat street names and street numbers, but restricts duplicate full addresses from appearing in the table. Let's start by adding another house on Park Street:

INSERT INTO houses VALUES ('Park Avenue', '8000', 'Emily Brown', 2011);

MySQL responds with a success message because the address 8000 Park Avenue did not previously appear in the table:

Output
Query OK, 1 row affected (0.010 sec)

A similar result occurs when you add a house at 8000 Main Street and repeat the street number:

INSERT INTO houses VALUES ('Main Street', '8000', 'David Jones', 2009);

Once again, this inserts a new row correctly because the entire address is not repeated:

Output
Query OK, 1 row affected (0.010 sec)

However, add another house at 100 5th Avenue using the following INSERT statement:

INSERT INTO houses VALUES ('5th Avenue', '100', 'Josh Gordon', 2008);

The database responds with an error message, informing you that there is a duplicate entry for the primary key for the value pair 5th Avenue and 100:

Output
ERROR 1062 (23000): Duplicate entry '5th Avenue-100' for key 'houses.PRIMARY'

The database correctly enforces primary key rules with the key defined on a pair of columns. You can be sure that the full address, including the street name and street number, will not be duplicated in the table.

In this section, you created a natural key with a pair of columns to uniquely identify each row in the home table. But primary keys can't always be extracted from the data set. In the next section, you'll use artificial primary keys that don't come directly from the data.

Creating a sequential primary key

So far, you have created unique primary keys using columns from the sample dataset. But in some cases, data is inevitably duplicated, preventing the columns from being good unique identifiers. In these cases, you can create sequential primary keys using the generated identifiers. When your data requires you to create new identifiers to identify rows, the primary keys created on those artificial identifiers are called surrogate keys.

Imagine a list of book club members—an informal gathering where anyone can join without showing a government ID. It’s likely that people with similar names will join the club at some point:

Sample table
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| John | Doe |
| Jane | Smith |
| Bob | Johnson |
| Samantha | Davis |
| Michael | Rodriguez |
| Tom | Thompson |
| Sara | Johnson |
| David | Jones |
| Jane | Smith |
| Bob | Johnson |
+------------+-----------+

The names Bob Johnson and Jane Smith are repeated in the table. You would need to use an additional identifier to make sure who is who, and there is no way to uniquely identify the rows in that table. If you were keeping a list of book club members on paper, you could keep auxiliary identifiers to help distinguish people with similar names among the group.

You can do something similar in a relational database by using an additional column that holds generated, non-real IDs whose sole purpose is to uniquely isolate all rows in the table. Let's call it Member_id.

However, it is a pain to create such an identifier every time you want to add another book club member to the database. To solve this problem, MySQL provides a feature for auto-incrementing numeric columns, where the database automatically increments the column value by a sequence of integers.

Let's create a table similar to the one above. You'll add an additional auto-increment column (member_id) to hold an automatically assigned number for each club member. That automatically assigned number will serve as the primary key for the table:

Member_id: This column has an auto-incrementing numeric ID represented by the int data type.

first_name: This column contains the first name of the club members, represented by the varchar data type limited to 50 characters.

last_name: This column holds the last name of the club members, which is displayed with a varchar data type limited to 50 characters.

To create the table, run the following SQL statement:

CREATE TABLE club_members (
member_id int AUTO_INCREMENT PRIMARY KEY,
first_name varchar(50),
last_name varchar(50)
);

While the PRIMARY KEY statement appears after the column type definition, just like a single-column primary key, an additional attribute appears before it: AUTO_INCREMENT. It tells MySQL to automatically generate values for that column, if not explicitly provided, using an increasing sequence of numbers.

Note: The AUTO_INCREMENT property for column definitions is specific to MySQL. Other databases often provide similar methods for generating sequential keys, but the syntax varies between engines. If in doubt, we recommend consulting the official documentation for your RDBMS.

If the following output is printed, the table has been created:

Output
Query OK, 0 rows affected (0.00 sec)

After that, load the table with the sample rows provided in the example above by executing the following INSERT INTO operation:

INSERT INTO club_members (first_name, last_name) VALUES
('John', 'Doe'),
('Jane', 'Smith'),
('Bob', 'Johnson'),
('Samantha', 'Davis'),
('Michael', 'Rodriguez'),
('Tom', 'Thompson'),
('Sara', 'Johnson'),
('David', 'Jones'),
('Jane', 'Smith'),
('Bob', 'Johnson');

The INSERT statement now includes a list of column names (first_name and last_name), which ensures that the database knows that the Member_id column is not provided in the dataset, so the default value for it should be taken instead.

The database will respond with a success message:

Output
Query OK, 10 rows affected (0.002 sec)
Records: 10 Duplicates: 0 Warnings: 0

Use the SELECT statement to verify the data in the newly created table:

SELECT * FROM club_members;

The output shows a table similar to the one at the beginning of the section:

Output
+-----------+------------+-----------+
| member_id | first_name | last_name |
+-----------+------------+-----------+
| 1 | John | Doe |
| 2 | Jane | Smith |
| 3 | Bob | Johnson |
| 4 | Samantha | Davis |
| 5 | Michael | Rodriguez |
| 6 | Tom | Thompson |
| 7 | Sara | Johnson |
| 8 | David | Jones |
| 9 | Jane | Smith |
| 10 | Bob | Johnson |
+-----------+------------+-----------+
10 rows in set (0.000 sec)

However, this time, the Member_id column appears in the result and contains a sequence of numbers from 1 to 10. With this column, the duplicate rows for Jane Smith and Bob Johnson are no longer recognizable because each name is associated with a unique identifier (Member_id).

Now, let's check if the database allows us to add another Tom Thompson to the club's membership list:

INSERT INTO club_members (first_name, last_name) VALUES ('Tom', 'Thompson');

MySQL will respond with a success message:

Output
Query OK, 1 row affected (0.009 sec)

To check what numeric ID the database has assigned to the new entry, run the SELECT query again:

SELECT * FROM club_members;

There is one more row in the output:

Output
+-----------+------------+-----------+
| member_id | first_name | last_name |
+-----------+------------+-----------+
| 1 | John | Doe |
| 2 | Jane | Smith |
| 3 | Bob | Johnson |
| 4 | Samantha | Davis |
| 5 | Michael | Rodriguez |
| 6 | Tom | Thompson |
| 7 | Sara | Johnson |
| 8 | David | Jones |
| 9 | Jane | Smith |
| 10 | Bob | Johnson |
| 11 | Tom | Thompson |
+-----------+------------+-----------+
11 rows in set (0.000 sec)

A new row was automatically assigned the number 11 in the Member_id column via the AUTO_INCREMENT attribute of the database.

If the data you are working with does not have natural candidates for primary keys, and you do not want to come up with invented identifiers every time you add new data to the database, you can safely rely on sequentially generated identifiers as primary keys.

Result

By following this guide, you learned what primary keys are and how to create common types in MySQL to identify unique rows in database tables. You created natural primary keys, created primary keys that spanned multiple columns, and used auto-incrementing sequential keys where natural keys do not exist.

Leave a Reply

Your email address will not be published. Required fields are marked *

You May Also Like