Introduction
Many database designs separate information into different tables based on relationships between specific data points. Even in cases like this, there are likely times when someone will want to retrieve information from more than one table at a time.
A common way to access data from multiple tables in a Structured Query Language (SQL) operation is to combine tables with a JOIN clause. Based on the join operation in relational algebra, a JOIN clause combines separate tables by matching rows in each table that are related to each other. Typically, this relationship is based on a pair of columns—one from each table—that share values, such as the foreign key of one table and the primary key of another table that the foreign key references.
This guide explains how to build various SQL queries that include a JOIN clause. It also shows the different types of JOIN clauses, how to combine data from multiple tables, and how to alias columns to make writing JOIN operations less tedious.
Prerequisites
To follow this guide, you will need a computer that has some kind of relational database management system (RDBMS) that uses SQL. The instructions and examples in this guide have been verified using the following environment:
- A server running Ubuntu 20.04, with a non-root user with administrative privileges and a firewall configured with UFW
- MySQL is installed and secured on the server.
- You will also need a database with some tables loaded with sample data that you can use to practice using JOIN operations. We encourage you to follow the Connecting to MySQL and Setting Up a Sample Database section below for details on how to connect to the MySQL server and create the test database used in the examples in this guide.
Connecting to MySQL and setting up a sample database
If your SQL database system runs on a remote server, SSH into your server from your local machine:
ssh sammy@your_server_ip
Then open the MySQL server prompt and replace Sami with your MySQL username:
mysql -u sammy -p
Create a database called joinsDB:
CREATE DATABASE joinsDB;
If the database is created successfully, you will receive the following output:
Output
Query OK, 1 row affected (0.01 sec)To select the joinsDB database, run the following USE statement:
USE joinsDB;Output
Database changedAfter selecting joinsDB, create a few tables within it. For the examples used in this guide, imagine that you run a factory and you have decided to track information about your production line, sales team employees, and your company's sales in a SQL database. You are going to start with three tables, the first of which will store information about your products. You decide that this first table needs three columns:
- Product ID: The identification number of each product expressed as an int data type. This column acts as the primary key of the table, meaning that each value acts as a unique identifier for the corresponding row. Since each value in a primary key must be unique, this column also has a UNIQUE constraint applied to it.
- productName: The name of each product is expressed using the varchar data type with a maximum of 20 characters.
- Price: The price of each product, expressed using the decimal data type. This specifies that each value in this column is limited to a maximum of four digits in length, with two of those digits to the right of the decimal point. Therefore, the range of allowed values in this column is -99.99 to 99.99.
Create a table named products that has these three columns:
CREATE TABLE products (
productID int UNIQUE,
productName varchar(20),
price decimal (4,2),
PRIMARY KEY (productID)
);The second table stores information about the employees in your company's sales team. You decide that this table also needs three columns:
- empID: Similar to the productID column, this column contains a unique identification number for each employee in the sales team, expressed as an int data type. As such, this column will have a UNIQUE constraint and will act as the primary key for the team table.
- empName: The name of each salesperson expressed using the varchar data type with a maximum of 20 characters.
- productSpecialty: Each member of your sales team is assigned a product as a specialty. They can sell any product your company makes, but their overall focus is on whichever product they specialize in. To represent this in the table, you create this column that holds the productID value of each product that each employee specializes in.
To ensure that the productSpecialty column contains only values that represent valid product ID numbers, you decide to apply a foreign key constraint to the column that references the productID column of the Products table. A foreign key constraint is a way of expressing a relationship between two tables by requiring that values in the column it applies to must exist in the column it references. In the following CREATE TABLE statement, the FOREIGN KEY constraint requires that any value added to the productSpecialty column in the team table must exist in the productID column of the Products table.
Create a table called Team with these three columns:
CREATE TABLE team (
empID int UNIQUE,
empName varchar(20),
productSpecialty int,
PRIMARY KEY (empID),
FOREIGN KEY (productSpecialty) REFERENCES products (productID)
);
The last table you create will hold the company's sales records. This table will have four columns:
- saleID: Similar to the productID and empID columns, this column contains a unique identification number for each sale, expressed as an int data type. This column also has a UNIQUE constraint so that it can act as the primary key of the sales table.
- Quantity: The number of units of each product sold is expressed with the int data type.
- Product ID: The identification number of the product sold, expressed as an int.
- Salesperson: Identification number of the employee who made the sale.
As with the productSpecialty column from the team table, you decide to apply FOREIGN KEY constraints to both the productID and salesperson columns. This ensures that these columns only contain values that are present in the productID column of the products table and the empID columns of the team table, respectively.
Create a table called Sales with these four columns:
CREATE TABLE sales (
saleID int UNIQUE,
quantity int,
productID int,
salesperson int,
PRIMARY KEY (saleID),
FOREIGN KEY (productID) REFERENCES products (productID),
FOREIGN KEY (salesperson) REFERENCES team (empID)
);Then, load the Products table with sample data by executing the following INSERT INTO operation:
INSERT INTO products
VALUES
(1, 'widget', 18.99),
(2, 'gizmo', 14.49),
(3, 'thingamajig', 39.99),
(4, 'doodad', 11.50),
(5, 'whatzit', 29.99);Then load the team table with sample data:
INSERT INTO team
VALUES
(1, 'Florence', 1),
(2, 'Mary', 4),
(3, 'Diana', 3),
(4, 'Betty', 2);Load the sales table with sample data as well:
INSERT INTO sales
VALUES
(1, 7, 1, 1),
(2, 10, 5, 4),
(3, 8, 2, 4),
(4, 1, 3, 3),
(5, 5, 1, 3);Finally, imagine that your company makes a few sales without anyone on your sales team getting involved. To record these sales, run the following operation to add three rows to the Sales table that don't have a value for the Salesperson column:
INSERT INTO sales (saleID, quantity, productID)
VALUES
(6, 1, 5),
(7, 3, 1),
(8, 4, 5);With that done, you are ready to follow the rest of the guide and start learning how to join tables together in SQL.
Understanding the syntax of the JOIN operation
JOIN clauses can be used in a variety of SQL statements, including UPDATE and DELETE operations. However, for illustrative purposes, the examples in this guide use SELECT queries to demonstrate how JOIN clauses work.
The following example shows the general syntax of a SELECT statement that includes a JOIN clause:
SELECT table1.column1, table2.column2
FROM table1 JOIN table2
ON search_condition;This syntax begins with a SELECT statement that returns two columns from two separate tables. Note that since JOIN clauses compare the contents of more than one table, this example syntax specifies which table each column is to be selected from, preceding the column name with the table name and a period. This is known as a fully qualified column reference.
You can use fully qualified column references like these in any operation, but doing so is technically only necessary in operations where two columns from different tables have the same name. However, it is good practice to use them when working with multiple tables, as they can help with the readability and understanding of JOIN operations.
After SELECT comes the FROM clause. In any query, the FROM clause is where you define the data set that needs to be searched to return the data you want. The only difference here is that the FROM clause consists of two tables separated by the JOIN keyword. A useful way to think about writing queries is to remember to select the columns that will be returned from which table you want to query.
Then there is an ON clause that explains how the query should join the two tables by defining a search condition. A search condition is a set of one or more statements or expressions that can evaluate to "true", "false", or "unknown" for a particular condition. It can be useful to think of the JOIN operation as combining every row from both tables and then returning every row for which the search condition in the ON clause evaluates to "true".
In an ON clause, it usually makes sense to include a search condition that tests whether two related columns—such as the foreign key of one table and the primary key of another table that the foreign key references—have equal values. This is sometimes referred to as an equi join.
As an example of how equi can join matching data from multiple tables, run the following query using the sample data you added earlier. This joins the Products and Team tables with a search condition that tests for matching values in their respective productID and productSpecialty columns. It then returns the name of each sales team member, the name of each product they specialize in, and the price of those products:
SELECT team.empName, products.productName, products.price
FROM products JOIN team
ON products.productID = team.productSpecialtyدر اینجا مجموعه نتایج این پرس و جو است:
Output
+----------+-------------+-------+
| empName | productName | price |
+----------+-------------+-------+
| Florence | widget | 18.99 |
| Mary | doodad | 11.50 |
| Diana | thingamajig | 39.99 |
| Betty | gizmo | 14.49 |
+----------+-------------+-------+
4 rows in set (0.00 sec)To illustrate how SQL combines these tables to form this result set, let's take a closer look at the process. To be clear, the following is not exactly how a database management system would join two tables, but it can be helpful to think of the JOIN operation as a procedure.
First, the query prints the products for each row and column in the first table in the FROM clause:
JOIN Process Example
+-----------+-------------+-------+
| productID | productName | price |
+-----------+-------------+-------+
| 1 | widget | 18.99 |
| 2 | gizmo | 14.49 |
| 3 | thingamajig | 39.99 |
| 4 | doodad | 11.50 |
| 5 | whatzit | 29.99 |
+-----------+-------------+-------+JOIN Process Example
+-----------+-------------+-------+-------+----------+------------------+
| productID | productName | price | empID | empName | productSpecialty |
+-----------+-------------+-------+-------+----------+------------------+
| 1 | widget | 18.99 | 1 | Florence | 1 |
| 2 | gizmo | 14.49 | 4 | Betty | 2 |
| 3 | thingamajig | 39.99 | 3 | Diana | 3 |
| 4 | doodad | 11.50 | 2 | Mary | 4 |
| 5 | whatzit | 29.99 | | | |
+-----------+-------------+-------+-------+----------+------------------+JOIN Process Example
+----------+-------------+-------+
| empName | productName | price |
+----------+-------------+-------+
| Florence | widget | 18.99 |
| Mary | doodad | 11.50 |
| Diana | thingamajig | 39.99 |
| Betty | gizmo | 14.49 |
+----------+-------------+-------+
4 rows in set (0.00 sec)Using equi joins is the most common way to join tables, but other SQL operators such as <, >, LIKE, NOT LIKE, or even BETWEEN can be used in the ON clause search condition. However, be aware that using more complex search conditions can make it difficult to predict what data will appear in the result set.
In most implementations, you can join tables by any set of columns that have what the SQL standard calls a «qualified JOIN» data type. This means that, in general, you can join a column that holds numeric data to any other column that holds numeric data, regardless of the corresponding data types. Similarly, you can usually join any column that holds character values to any other column that holds character data. As mentioned earlier, the columns that you match to join two tables are usually columns that represent a relationship between the tables, such as a foreign key and the primary key of another table that it references.
Many SQL implementations allow you to join columns that have the same name with the USING keyword instead of ON. The syntax for such an operation might be:
SELECT table1.column1, table2.column2
FROM table1 JOIN table2
USING (related_column);In this example syntax, the USING statement is equivalent to ON table1.related_column = table2.related_column;.
Since sales and products each have a column called productID, you can join these columns by matching them with the USING keyword. The following statement does this and returns the saleID of each sale, the number of units sold, the name of each product sold, and its price. It also sorts the result set by the saleID value in ascending order:
SELECT sales.saleID, sales.quantity, products.productName, products.price
FROM sales JOIN products
USING (productID)
ORDER BY saleID;Output
+--------+----------+-------------+-------+
| saleID | quantity | productName | price |
+--------+----------+-------------+-------+
| 1 | 7 | widget | 18.99 |
| 2 | 10 | whatzit | 29.99 |
| 3 | 8 | gizmo | 14.49 |
| 4 | 1 | thingamajig | 39.99 |
| 5 | 5 | widget | 18.99 |
| 6 | 1 | whatzit | 29.99 |
| 7 | 3 | widget | 18.99 |
| 8 | 4 | whatzit | 29.99 |
+--------+----------+-------------+-------+
8 rows in set (0.00 sec)When joining tables, the database system sometimes arranges rows in ways that are not easy to predict. Including an ORDER BY clause like this can help make result sets more coherent and readable.
Joining more than two tables
There may be times when you need to combine data from more than two tables. You can join any number of tables by nesting JOIN clauses within other JOIN clauses. The following syntax is an example of what it would look like when joining three tables:
SELECT table1.column1, table2.column2, table3.column3
FROM table1 JOIN table2
ON table1.related_column = table2.related_column
JOIN table3
ON table3.related_column = table1_or_2.related_column;The FROM clause in this example statement begins by joining table 1 with table 2. After this ON clause, it begins a second JOIN that combines the initial set of joined tables with table 3. Note that the third table can be joined to a column in either the first or second table.
For example, imagine you want to know how much your employee's sales generated, but you only care about sales records that include sales of a product in which the employee specializes.
To obtain this information, you can run the following query. This query starts by joining the Products and Sales tables together by matching their respective productID columns. It then joins the Team table to the first two tables by matching each row in the initial JOIN to its productSpecialty column. The query then filters the results with a WHERE clause to return only rows where the matched employee is also the person who made the sale. The query also includes an ORDER BY clause that sorts the final results in ascending order by the value in the saleID column:
SELECT sales.saleID,
team.empName,
products.productName,
(sales.quantity * products.price)
FROM products JOIN sales
USING (productID)
JOIN team
ON team.productSpecialty = sales.productID
WHERE team.empID = sales.salesperson
ORDER BY sales.saleID;
Output
+--------+----------+-------------+-----------------------------------+
| saleID | empName | productName | (sales.quantity * products.price) |
+--------+----------+-------------+-----------------------------------+
| 1 | Florence | widget | 132.93 |
| 3 | Betty | gizmo | 115.92 |
| 4 | Diana | thingamajig | 39.99 |
+--------+----------+-------------+-----------------------------------+
3 rows in set (0.00 sec)All the examples so far have had one type of JOIN expression: an inner JOIN. Continue reading the next section for an overview of inner joins, outer joins, and their differences.
Inner vs. outer JOIN operations
There are two main types of JOIN statements: INNER joins and OUTER joins. The difference between these two types of joins is the data they return. Inner joins return only matching rows from each joined table, while OUTER joins return both matching and non-matching rows.
The syntax and sample queries from the previous sections all use INNER JOIN statements, even though none of them include the INNER keyword. Most SQL implementations treat every JOIN statement as an INNER join unless explicitly stated otherwise.
Queries that specify an outer JOIN combine multiple tables and return any rows that match, as well as any rows that don't. This can be useful for finding rows with missing values or in cases where a partial match is acceptable.
Outer join operations can be divided into three types: LEFT OUTER joins, RIGHT OUTER joins, and FULL OUTER joins. LEFT OUTER joins, or simply left joins, return every matching row from the two joined tables, as well as every non-matching row from the “left” table. In the context of a JOIN operation, the “left” table is always the first table specified immediately after the FROM keyword and to the left of the JOIN keyword. Similarly, the «right» table is the second table, or the table that comes immediately after the JOIN, and a RIGHT OUTER join returns every matching row from the joined tables, along with every non-matching row from the «right» table. FULL OUTER JOIN returns every row from both tables, including rows from either table that do not match.
To demonstrate how these different types of JOIN clauses return data, run the following example queries on the tables created in the previous section Connecting to and Setting Up a Sample Database. These queries are identical except that each specifies a different type of JOIN clause.
This first example uses an inner JOIN to combine the sales and team tables by matching the corresponding salesperson and empID columns. Again, the INNER keyword is implied even though it is not explicitly included:
SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
FROM sales JOIN team
ON sales.salesperson = team.empID;Output +--------+----------+-------------+----------+ | saleID | quantity | salesperson | empName | +--------+----------+-------------+----------+ | 1 | 7 | 1 | Florence | | 4 | 1 | 3 | Diana | | 5 | 5 | 3 | Diana | | 2 | 10 | 4 | Betty | | 3 | 8 | 4 | Betty | +--------+----------+-------------+----------+ 5 rows in set (0.00 sec)
SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
FROM sales LEFT OUTER JOIN team
ON sales.salesperson = team.empID;Output
+--------+----------+-------------+----------+
| saleID | quantity | salesperson | empName |
+--------+----------+-------------+----------+
| 1 | 7 | 1 | Florence |
| 2 | 10 | 4 | Betty |
| 3 | 8 | 4 | Betty |
| 4 | 1 | 3 | Diana |
| 5 | 5 | 3 | Diana |
| 6 | 1 | NULL | NULL |
| 7 | 3 | NULL | NULL |
| 8 | 4 | NULL | NULL |
+--------+----------+-------------+----------+
8 rows in set (0.00 sec)SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
FROM sales RIGHT JOIN team
ON sales.salesperson = team.empID;Note that the JOIN clause of this query reads RIGHT JOIN instead of RIGHT OUTER JOIN. Just as the INNER keyword is not required to specify an INNER JOIN clause, OUTER is implied whenever you write LEFT JOIN or RIGHT JOIN.
The result of this query is the opposite of the previous one, as it returns every row from both tables, but only unique rows from the “right” table:
Output
+--------+----------+-------------+----------+
| saleID | quantity | salesperson | empName |
+--------+----------+-------------+----------+
| 1 | 7 | 1 | Florence |
| NULL | NULL | NULL | Mary |
| 4 | 1 | 3 | Diana |
| 5 | 5 | 3 | Diana |
| 2 | 10 | 4 | Betty |
| 3 | 8 | 4 | Betty |
+--------+----------+-------------+----------+
6 rows in set (0.00 sec)Table and column aliases in JOIN clauses
When joining tables with long or very descriptive names, writing multiple fully qualified column references can become tedious. To avoid this, users sometimes find it useful to provide a shorter table or column name alias.
You can do this in SQL by following each table definition in the FROM clause with the AS keyword and then following it with the alias of your choice:
SELECT t1.column1, t2.column2
FROM table1 AS t1 JOIN table2 AS t2
ON t1.related_column = t2.related_column;This example syntax uses aliases in the SELECT clause, even though they are not defined before the FROM clause. This is possible because in SQL queries, the execution order starts with the FROM clause. This can be confusing, but it is useful to remember this and think about aliases before you start writing the query.
For example, run the following query that joins the Sales and Products tables, providing them with the aliases S and P, respectively:
SELECT S.saleID, S.quantity,
P.productName,
(P.price * S.quantity) AS revenue
FROM sales AS S JOIN products AS P
USING (productID);Output
+--------+----------+-------------+---------+
| saleID | quantity | productName | revenue |
+--------+----------+-------------+---------+
| 1 | 7 | widget | 132.93 |
| 2 | 10 | whatzit | 299.90 |
| 3 | 8 | gizmo | 115.92 |
| 4 | 1 | thingamajig | 39.99 |
| 5 | 5 | widget | 94.95 |
| 6 | 1 | whatzit | 29.99 |
| 7 | 3 | widget | 56.97 |
| 8 | 4 | whatzit | 119.96 |
+--------+----------+-------------+---------+
8 rows in set (0.00 sec)SELECT S.saleID, S.quantity, P.productName, (P.price * S.quantity) revenue
FROM sales S JOIN products P
USING (productID);Even though the AS keyword is not required to define an alias, it is considered good practice to include it. Doing so can help keep the intent of the query clear and improve its readability.
Result
By reading this guide, you learned how to use the JOIN operation to combine separate tables into a single query result set. While the commands shown here should work on most relational databases, note that each SQL database uses its own unique implementation of the language. You should refer to your DBMS documentation for a more complete explanation of each command and their full set of options.









