Home > MySQL > MySQL basics – Joins

MySQL basics – Joins

A join in SQL is an attempt to match up rows in one table with rows in another. This allows you to treat the joined rows as if they were in one big table. The differences between the various join types mostly have to do with whether and how non-matching rows are discarded. Everything discussed below is generic SQL and not specific to MySQL unless noted. I use “record” and “row” here interchangeably.

Setup

For purposes of illustration, let’s assume we are working with a movie database. We have a movie table to hold information about each movie and an actor table to hold information about actors. Here’s a simple layout of the two tables and some sample data. This contrived example only allows at most one actor per movie. I’ll cover other modeling scenarios (such as many-to-many) in a future post.

actor Table:

actor_id actor gender is_alive
1 Angelina Jolie F 1
2 Béla Lugosi M 0
3 Carrie Fisher F 1
4 Daniel Craig M 1

movie Table:

movie_id actor_id title year
1 1 Mr. & Mrs. Smith 2005
2 1 Lara Croft: Tomb Raider 2001
3 2 Dracula 1931
4 3 Star Wars: A New Hope 1977
5 3 Star Wars: The Empire Strikes Back 1980
6 3 Star Wars: Return of the Jedi 1983
7 NULL Lion King 1994
8 NULL Time Bandits 1981

Join Structure

Joins are added to the FROM clause to indicate additional tables to pull records from. They consist of an optional type, followed by the JOIN keyword, then the table to be joined, and then the ON condition(s). You can have as many joins as needed in a SQL statement, and you can join the same table more than once (even to itself) as long as you use distinct table aliases.

FROM table1
[type] JOIN table2 ON condition [AND condition ...]
[type] JOIN table3 ON condition ...

Tables listed before the JOIN keyword are considered on the left side of the join, while tables after the JOIN keyword are considered on the right side of the join. These directional distinctions matter for outer joins, explained below.

Inner Join

An INNER join is the most restrictive type, and only returns rows from the left and right tables which both match the ON condition(s).

SELECT M.movie_id, M.title, A.actor
FROM movie M
INNER JOIN actor A ON A.actor_id = M.actor_id

Results:

movie_id title actor
1 Mr. & Mrs. Smith Angelina Jolie
2 Lara Croft: Tomb Raider Angelina Jolie
3 Dracula Béla Lugosi
4 Star Wars: A New Hope Carrie Fisher
5 Star Wars: The Empire Strikes Back Carrie Fisher
6 Star Wars: Return of the Jedi Carrie Fisher

Notice how Lion King and Time Bandits from the movie table and Daniel Craig from the actor table are missing from the results. None of those rows matched the condition (A.actor_id = M.actor_id). Also notice how Angelina Jolie and Carrie Fisher get returned multiple times. This is because those artist rows matched the condition multiple times. If we wanted to return a single row per matching artist, we would need to use a GROUP BY clause, the topic for another post.

Note that the default join type in MySQL is INNER join, so if you omit the join type and just use the keyword JOIN, you are performing an INNER join.

Left Outer Join

A LEFT OUTER join returns all rows from the left side of the join, and any matching rows from the right side. That is, rows from the left side of the join which do not match the condition will not be thrown out as with the INNER join above. Instead, they just get NULL values.

SELECT M.movie_id, M.title, A.actor
FROM movie M
LEFT OUTER JOIN actor A ON A.actor_id = M.actor_id

Results:

movie_id title actor
1 Mr. & Mrs. Smith Angelina Jolie
2 Lara Croft: Tomb Raider Angelina Jolie
3 Dracula Béla Lugosi
4 Star Wars: A New Hope Carrie Fisher
5 Star Wars: The Empire Strikes Back Carrie Fisher
6 Star Wars: Return of the Jedi Carrie Fisher
7 Lion King NULL
8 Time Bandits NULL

Notice how we now get Lion King and Time Bandits from the movie table even though they do not match any artist records. The A.actor field becomes NULL for these rows. We still do not see Daniel Craig however as his row was eliminated from the join.

It’s important to note that it is not the order of the tables in the ON condition which determines left side and right side, but which side of the JOIN keyword they are on. If we switch the order of tables in the ON clause in the query above to be ON M.actor_id = A.actor_id, we will get exactly the same results.

In MySQL, the OUTER keyword is optional, so LEFT JOIN is equivalent to LEFT OUTER JOIN.

Right Outer Join

A RIGHT OUTER join returns all rows from the right side of the join, and any matching rows from the left side. This is exactly the same as a LEFT OUTER join with the sides switched.

SELECT M.movie_id, M.title, A.actor
FROM movie M
RIGHT OUTER JOIN actor A ON A.actor_id = M.actor_id

Results:

movie_id title actor
1 Mr. & Mrs. Smith Angelina Jolie
2 Lara Croft: Tomb Raider Angelina Jolie
3 Dracula Béla Lugosi
4 Star Wars: A New Hope Carrie Fisher
5 Star Wars: The Empire Strikes Back Carrie Fisher
6 Star Wars: Return of the Jedi Carrie Fisher
NULL NULL Daniel Craig

This time, we get a row for Daniel Craig with no matching movie information. Lion King and Time Bandits are eliminated by the join condition as with the INNER join.

Note that you’ll almost never see a RIGHT OUTER join in practice as it can so easily be rewritten as a LEFT OUTER join, just switch the order of the tables. Here is the same query rewritten as a LEFT OUTER join.

SELECT M.movie_id, M.title, A.actor
FROM actor A
LEFT OUTER JOIN movie M ON A.actor_id = M.actor_id

JOIN vs WHERE

When dealing with multiple join conditions for a single join, you may wonder what the difference is between putting a condition in the WHERE clause versus the JOIN clause. The WHERE clause applies to rows after they’re joined, whereas the JOIN conditions apply to rows before they’re joined. For INNER joins, there is no difference, since all conditions need to be met by both sides of the join to be returned, so the order doesn’t matter. But for OUTER joins, the order can cause subtle differences.

Here’s an example. Let’s say that we want to retrieve all movies along with any matching living actors. The first condition we need is easy, M.actor_id = A.actor_id, same as the LEFT OUTER join example above. The second condition is also easy, A.is_alive = 1, but where to put it, in the JOIN or in the WHERE?

In the WHERE clause:

SELECT M.movie_id, M.title, A.actor, A.is_alive
FROM movie M
LEFT OUTER JOIN actor A ON A.actor_id = M.actor_id
WHERE A.is_alive = 1

Results:

movie_id title actor is_alive
1 Mr. & Mrs. Smith Angelina Jolie 1
2 Lara Croft: Tomb Raider Angelina Jolie 1
4 Star Wars: A New Hope Carrie Fisher 1
5 Star Wars: The Empire Strikes Back Carrie Fisher 1
6 Star Wars: Return of the Jedi Carrie Fisher 1

Interesting, but not what we wanted. It joined all eight movies to any matching artists (same as the LEFT OUTER join example), then eliminated any row where is_alive != 1. Note that NULL is never equal to anything, even NULL. But what if we put it in the JOIN clause?

In the JOIN clause:

SELECT M.movie_id, M.title, A.actor, A.is_alive
FROM movie M
LEFT OUTER JOIN actor A ON A.actor_id = M.actor_id AND A.is_alive = 1

Results:

movie_id title actor is_alive
1 Mr. & Mrs. Smith Angelina Jolie 1
2 Lara Croft: Tomb Raider Angelina Jolie 1
3 Dracula NULL NULL
4 Star Wars: A New Hope Carrie Fisher 1
5 Star Wars: The Empire Strikes Back Carrie Fisher 1
6 Star Wars: Return of the Jedi Carrie Fisher 1
7 Lion King NULL NULL
8 Time Bandits NULL NULL

Perfect! It eliminated Béla Lugosi from the artist set (poor chap is dead), then joined all eight movies with any remaining matching artists.

So the moral of this story is “order is important”. Remember, pillage before you burn.

Cross Join

A CROSS join, also known as a Cartesian Product, does not attempt to match rows from the left and right sides as an INNER or OUTER join would. Instead it gives rows representing every combination of the left and right sides. It does not therefore have any conditional statements.

SELECT M.title, A.actor
FROM movie M
CROSS JOIN actor A

Results:

title actor
Mr. & Mrs. Smith Angelina Jolie
Mr. & Mrs. Smith Béla Lugosi
Mr. & Mrs. Smith Carrie Fisher
Mr. & Mrs. Smith Daniel Craig
Lara Croft: Tomb Raider Angelina Jolie
Lara Croft: Tomb Raider Béla Lugosi
Lara Croft: Tomb Raider Carrie Fisher
Lara Croft: Tomb Raider Daniel Craig
Dracula Angelina Jolie
Dracula Béla Lugosi
Dracula Carrie Fisher
Dracula Daniel Craig
Star Wars: A New Hope Angelina Jolie
Star Wars: A New Hope Béla Lugosi
Star Wars: A New Hope Carrie Fisher
Star Wars: A New Hope Daniel Craig
Star Wars: The Empire Strikes Back Angelina Jolie
Star Wars: The Empire Strikes Back Béla Lugosi
Star Wars: The Empire Strikes Back Carrie Fisher
Star Wars: The Empire Strikes Back Daniel Craig
Star Wars: Return of the Jedi Angelina Jolie
Star Wars: Return of the Jedi Béla Lugosi
Star Wars: Return of the Jedi Carrie Fisher
Star Wars: Return of the Jedi Daniel Craig
Lion King Angelina Jolie
Lion King Béla Lugosi
Lion King Carrie Fisher
Lion King Daniel Craig
Time Bandits Angelina Jolie
Time Bandits Béla Lugosi
Time Bandits Carrie Fisher
Time Bandits Daniel Craig

As you can see, CROSS joins are not that useful and can create a truly staggering number of results. Cross joining a table with N rows and a table with M rows will create a result set of N x M rows.

Other Join Types

Here are some more obscure joins, some specific to MySQL. I have never needed to use these in practice.

Full Outer Join

A FULL OUTER join simply returns all rows from both sides of the join, matching them by the ON condition where possible. MySQL does not support JOIN syntax for a FULL OUTER join, but it can be simulated through the clever use of UNIONS.

USING keyword

MySQL supports an optional syntax for joins through the USING keyword. Instead of explicitly listing ON conditions, you can specify USING(column) which causes column in both tables to be compared. Thus in our example, these two queries are equivalent:

Explicit condition:

SELECT M.movie_id, M.title, A.actor
FROM movie M
INNER JOIN actor A ON A.actor_id = M.actor_id

USING() keyword:

SELECT M.movie_id, M.title, A.actor
FROM movie M
INNER JOIN actor A USING (actor_id)

I don’t recommend the USING syntax as it is less specific and less portable than explicitly listing the ON conditions. Also, it is not exactly identical to the explicit form in that the results will have only actor_id not M.actor_id and A.actor_id as you would expect.

Natural Join

In MySQL, a NATURAL join is simply a normal INNER or OUTER join where the ON condition is assumed to be the comparison of all identically named columns in the tables. I don’t recommend this syntax either for the same reasons as with the USING keyword. Here are two equivalent queries:

Explicit condition:

SELECT M.movie_id, M.title, A.actor
FROM movie M
INNER JOIN actor A ON A.actor_id = M.actor_id

NATURAL join:

SELECT M.movie_id, M.title, A.actor
FROM movie M
NATURAL INNER JOIN actor A
Categories: MySQL Tags:
  1. No comments yet.
  1. No trackbacks yet.