MySQL basics – Joins
A join in SQL is an attempt to match up rows in one table with rows in another. This allows you to treat the joined rows as if they were in one big table. The differences between the various join types mostly have to do with whether and how non-matching rows are discarded. Everything discussed below is generic SQL and not specific to MySQL unless noted. I use “record” and “row” here interchangeably.
Setup
For purposes of illustration, let’s assume we are working with a movie database.
We have a movie table to hold information about each movie and an actor
table to hold information about actors. Here’s a simple layout of the two
tables and some sample data. This contrived example only allows at most one
actor per movie. I’ll cover other modeling scenarios (such as many-to-many)
in a future post.
actor Table:
| actor_id | actor | gender | is_alive |
|---|---|---|---|
| 1 | Angelina Jolie | F | 1 |
| 2 | Béla Lugosi | M | 0 |
| 3 | Carrie Fisher | F | 1 |
| 4 | Daniel Craig | M | 1 |
movie Table:
| movie_id | actor_id | title | year |
|---|---|---|---|
| 1 | 1 | Mr. & Mrs. Smith | 2005 |
| 2 | 1 | Lara Croft: Tomb Raider | 2001 |
| 3 | 2 | Dracula | 1931 |
| 4 | 3 | Star Wars: A New Hope | 1977 |
| 5 | 3 | Star Wars: The Empire Strikes Back | 1980 |
| 6 | 3 | Star Wars: Return of the Jedi | 1983 |
| 7 | NULL | Lion King | 1994 |
| 8 | NULL | Time Bandits | 1981 |
Join Structure
Joins are added to the FROM clause to indicate additional tables to pull
records from. They consist of an optional type, followed by the JOIN keyword,
then the table to be joined, and then the ON condition(s). You can have as
many joins as needed in a SQL statement, and you can join the same table more
than once (even to itself) as long as you use distinct table aliases.
FROM table1
[type] JOIN table2 ON condition [AND condition ...]
[type] JOIN table3 ON condition ...
Tables listed before the JOIN keyword are considered on the left side of the
join, while tables after the JOIN keyword are considered on the right side
of the join. These directional distinctions matter for outer joins, explained
below.
Inner Join
An INNER join is the most restrictive type, and only returns rows from the left
and right tables which both match the ON condition(s).
SELECT M.movie_id, M.title, A.actor
FROM movie M
INNER JOIN actor A ON A.actor_id = M.actor_id
Results:
| movie_id | title | actor |
|---|---|---|
| 1 | Mr. & Mrs. Smith | Angelina Jolie |
| 2 | Lara Croft: Tomb Raider | Angelina Jolie |
| 3 | Dracula | Béla Lugosi |
| 4 | Star Wars: A New Hope | Carrie Fisher |
| 5 | Star Wars: The Empire Strikes Back | Carrie Fisher |
| 6 | Star Wars: Return of the Jedi | Carrie Fisher |
Notice how Lion King and Time Bandits from the movie table and Daniel Craig
from the actor table are missing from the results. None of those rows matched
the condition (A.actor_id = M.actor_id). Also notice how Angelina Jolie and
Carrie Fisher get returned multiple times. This is because those artist
rows matched the condition multiple times. If we wanted to return a single row
per matching artist, we would need to use a GROUP BY clause, the topic for
another post.
Note that the default join type in MySQL is INNER join, so if you omit the join
type and just use the keyword JOIN, you are performing an INNER join.
Left Outer Join
A LEFT OUTER join returns all rows from the left side of the join, and any matching rows from the right side. That is, rows from the left side of the join which do not match the condition will not be thrown out as with the INNER join above. Instead, they just get NULL values.
SELECT M.movie_id, M.title, A.actor
FROM movie M
LEFT OUTER JOIN actor A ON A.actor_id = M.actor_id
Results:
| movie_id | title | actor |
|---|---|---|
| 1 | Mr. & Mrs. Smith | Angelina Jolie |
| 2 | Lara Croft: Tomb Raider | Angelina Jolie |
| 3 | Dracula | Béla Lugosi |
| 4 | Star Wars: A New Hope | Carrie Fisher |
| 5 | Star Wars: The Empire Strikes Back | Carrie Fisher |
| 6 | Star Wars: Return of the Jedi | Carrie Fisher |
| 7 | Lion King | NULL |
| 8 | Time Bandits | NULL |
Notice how we now get Lion King and Time Bandits from the movie table
even though they do not match any artist records. The A.actor field becomes
NULL for these rows. We still do not see Daniel Craig however as his row
was eliminated from the join.
It’s important to note that it is not the order of the tables in the ON
condition which determines left side and right side, but which side of the
JOIN keyword they are on. If we switch the order of tables in the ON clause
in the query above to be ON M.actor_id = A.actor_id, we will get exactly the
same results.
In MySQL, the OUTER keyword is optional, so LEFT JOIN is equivalent to LEFT
OUTER JOIN.
Right Outer Join
A RIGHT OUTER join returns all rows from the right side of the join, and any matching rows from the left side. This is exactly the same as a LEFT OUTER join with the sides switched.
SELECT M.movie_id, M.title, A.actor
FROM movie M
RIGHT OUTER JOIN actor A ON A.actor_id = M.actor_id
Results:
| movie_id | title | actor |
|---|---|---|
| 1 | Mr. & Mrs. Smith | Angelina Jolie |
| 2 | Lara Croft: Tomb Raider | Angelina Jolie |
| 3 | Dracula | Béla Lugosi |
| 4 | Star Wars: A New Hope | Carrie Fisher |
| 5 | Star Wars: The Empire Strikes Back | Carrie Fisher |
| 6 | Star Wars: Return of the Jedi | Carrie Fisher |
| NULL | NULL | Daniel Craig |
This time, we get a row for Daniel Craig with no matching movie information. Lion King and Time Bandits are eliminated by the join condition as with the INNER join.
Note that you’ll almost never see a RIGHT OUTER join in practice as it can so easily be rewritten as a LEFT OUTER join, just switch the order of the tables. Here is the same query rewritten as a LEFT OUTER join.
SELECT M.movie_id, M.title, A.actor
FROM actor A
LEFT OUTER JOIN movie M ON A.actor_id = M.actor_id
JOIN vs WHERE
When dealing with multiple join conditions for a single join, you may wonder
what the difference is between putting a condition in the WHERE clause versus
the JOIN clause. The WHERE clause applies to rows after they’re joined,
whereas the JOIN conditions apply to rows before they’re joined. For
INNER joins, there is no difference, since all conditions need to be met by both
sides of the join to be returned, so the order doesn’t matter. But for OUTER
joins, the order can cause subtle differences.
Here’s an example. Let’s say that we want to retrieve all movies along with
any matching living actors. The first condition we need is easy, M.actor_id =
A.actor_id, same as the LEFT OUTER join example above. The second condition
is also easy, A.is_alive = 1, but where to put it, in the JOIN or in the
WHERE?
In the WHERE clause:
SELECT M.movie_id, M.title, A.actor, A.is_alive
FROM movie M
LEFT OUTER JOIN actor A ON A.actor_id = M.actor_id
WHERE A.is_alive = 1
Results:
| movie_id | title | actor | is_alive |
|---|---|---|---|
| 1 | Mr. & Mrs. Smith | Angelina Jolie | 1 |
| 2 | Lara Croft: Tomb Raider | Angelina Jolie | 1 |
| 4 | Star Wars: A New Hope | Carrie Fisher | 1 |
| 5 | Star Wars: The Empire Strikes Back | Carrie Fisher | 1 |
| 6 | Star Wars: Return of the Jedi | Carrie Fisher | 1 |
Interesting, but not what we wanted. It joined all eight movies to any matching
artists (same as the LEFT OUTER join example), then eliminated any row where
is_alive != 1. Note that NULL is never equal to anything, even NULL. But
what if we put it in the JOIN clause?
In the JOIN clause:
SELECT M.movie_id, M.title, A.actor, A.is_alive
FROM movie M
LEFT OUTER JOIN actor A ON A.actor_id = M.actor_id AND A.is_alive = 1
Results:
| movie_id | title | actor | is_alive |
|---|---|---|---|
| 1 | Mr. & Mrs. Smith | Angelina Jolie | 1 |
| 2 | Lara Croft: Tomb Raider | Angelina Jolie | 1 |
| 3 | Dracula | NULL | NULL |
| 4 | Star Wars: A New Hope | Carrie Fisher | 1 |
| 5 | Star Wars: The Empire Strikes Back | Carrie Fisher | 1 |
| 6 | Star Wars: Return of the Jedi | Carrie Fisher | 1 |
| 7 | Lion King | NULL | NULL |
| 8 | Time Bandits | NULL | NULL |
Perfect! It eliminated Béla Lugosi from the artist set (poor chap is dead), then joined all eight movies with any remaining matching artists.
So the moral of this story is “order is important”. Remember, pillage before you burn.
Cross Join
A CROSS join, also known as a Cartesian Product, does not attempt to match rows from the left and right sides as an INNER or OUTER join would. Instead it gives rows representing every combination of the left and right sides. It does not therefore have any conditional statements.
SELECT M.title, A.actor
FROM movie M
CROSS JOIN actor A
Results:
| title | actor |
|---|---|
| Mr. & Mrs. Smith | Angelina Jolie |
| Mr. & Mrs. Smith | Béla Lugosi |
| Mr. & Mrs. Smith | Carrie Fisher |
| Mr. & Mrs. Smith | Daniel Craig |
| Lara Croft: Tomb Raider | Angelina Jolie |
| Lara Croft: Tomb Raider | Béla Lugosi |
| Lara Croft: Tomb Raider | Carrie Fisher |
| Lara Croft: Tomb Raider | Daniel Craig |
| Dracula | Angelina Jolie |
| Dracula | Béla Lugosi |
| Dracula | Carrie Fisher |
| Dracula | Daniel Craig |
| Star Wars: A New Hope | Angelina Jolie |
| Star Wars: A New Hope | Béla Lugosi |
| Star Wars: A New Hope | Carrie Fisher |
| Star Wars: A New Hope | Daniel Craig |
| Star Wars: The Empire Strikes Back | Angelina Jolie |
| Star Wars: The Empire Strikes Back | Béla Lugosi |
| Star Wars: The Empire Strikes Back | Carrie Fisher |
| Star Wars: The Empire Strikes Back | Daniel Craig |
| Star Wars: Return of the Jedi | Angelina Jolie |
| Star Wars: Return of the Jedi | Béla Lugosi |
| Star Wars: Return of the Jedi | Carrie Fisher |
| Star Wars: Return of the Jedi | Daniel Craig |
| Lion King | Angelina Jolie |
| Lion King | Béla Lugosi |
| Lion King | Carrie Fisher |
| Lion King | Daniel Craig |
| Time Bandits | Angelina Jolie |
| Time Bandits | Béla Lugosi |
| Time Bandits | Carrie Fisher |
| Time Bandits | Daniel Craig |
As you can see, CROSS joins are not that useful and can create a truly staggering number of results. Cross joining a table with N rows and a table with M rows will create a result set of N x M rows.
Other Join Types
Here are some more obscure joins, some specific to MySQL. I have never needed to use these in practice.
Full Outer Join
A FULL OUTER join simply returns all rows from both sides of the join, matching
them by the ON condition where possible. MySQL does not support JOIN syntax
for a FULL OUTER join, but it can be simulated through the clever use of
UNIONS.
USING keyword
MySQL supports an optional syntax for joins through the USING keyword. Instead
of explicitly listing ON conditions, you can specify USING(column) which
causes column in both tables to be compared. Thus in our example, these two
queries are equivalent:
Explicit condition:
SELECT M.movie_id, M.title, A.actor
FROM movie M
INNER JOIN actor A ON A.actor_id = M.actor_id
USING() keyword:
SELECT M.movie_id, M.title, A.actor
FROM movie M
INNER JOIN actor A USING (actor_id)
I don’t recommend the USING syntax as it is less specific and less portable
than explicitly listing the ON conditions. Also, it is not exactly identical
to the explicit form in that the results will have only actor_id not
M.actor_id and A.actor_id as you would expect.
Natural Join
In MySQL, a NATURAL join is simply a normal INNER or OUTER join where the ON
condition is assumed to be the comparison of all identically named columns in
the tables. I don’t recommend this syntax either for the same reasons as with
the USING keyword. Here are two equivalent queries:
Explicit condition:
SELECT M.movie_id, M.title, A.actor
FROM movie M
INNER JOIN actor A ON A.actor_id = M.actor_id
NATURAL join:
SELECT M.movie_id, M.title, A.actor
FROM movie M
NATURAL INNER JOIN actor A