Today we’re going to explore the different kinds of joins that can be used in SQL, as well as when each of those join types would be useful to us. If you’re new to the concept of joins in general we’ll quickly go over why joins are useful in the first place, but feel free to skip ahead if you’re already familiar with the concept. We’ll dive into the SQL join examples after we’ve had a brief general introduction to the basics.

I’m assuming for this post that you have some experience with database concepts like tables, rows, columns, and simple select statements. If you don’t, I suggest checking out Helen Anderson’s excellent Relational Database Fundamentals article, as well as the rest of her SQL 101 posts.

What is a SQL join query, and when would I use one?

An SQL join query will be needed any time you want to relate the rows from two or more tables together. Let’s work through an example so that we can begin thinking in practical terms.

We’ll look at an imaginary scenario involving medications and drug distributors. Let’s pretend that we’re responsible for producing a sales report summarizing the market performance of each drug over a certain time period.

First I’ll outline the structure and content of each table in this scenario.

Products

Code Distributor Price
001 Drug Co. 100.00
002 Drug Co. 125.00
A29 Medicine Co. 75.00
A 3-digit code identifies products in our example. The price is a base cost that can be offset by rebates.

Sales

Code Quantity Date
001 15 2019-01-23
001 27 2019-05-02
002 5 2019-03-01
A29 56 2019-12-15
002 8 2019-07-14
A29 19 2019-08-26
Individual sales and their quantities over time.

Rebates

Code Dollars Date
001 300 2019-01-23
A29 250 2019-08-26
002 350 2019-07-14
Rebates can be redeemed later and act as a partial refund to the purchaser. A rebate must be tied back to a sale through the drug code and date of sale.

There are several questions we might have to answer as an analyst at our fictional company.

  • How much revenue do we have, broken out by drug?
  • How much revenue is each distributor bringing in?
  • What are the lowest and highest prices paid, after rebates have been applied?

We will need to relate our tables together if we are going to answer any of these questions. Let’s take a look at how we could answer the first question using joins.

To calculate revenue, we’ll need to aggregate the sales data. Looking at the sales table, however, you can see that we have the quantity sold in each transaction, but not the price at which it was sold. The products table contains the price, so we’ll need to relate the two tables together. How would you do this without a computer? You would go row by row, taking the Code column from sales and matching it against the Code column in the products table.

That’s basically what a SQL database does while executing a join. Let’s write the SQL that tells our database how to do the same.

SELECT products.code, sales.quantity * products.price AS cost FROM sales 
JOIN products ON sales.code = products.code;

SQL joins consist of a what and a how: What tables are we going to relate together, and how do I do that? The table written after the JOIN keyword is what we’re relating to the sales table, and the condition after the ON keyword tells the database what must be true in order for any given pair of rows to be related.

Let’s take a look at the resulting table after running this query.

Code Quantity Date Code Distributor Price Cost
001 15 2019-01-23 001 Drug Co. 100.00 1500.00
001 27 2019-05-02 001 Drug Co. 100.00 2700.00
002 5 2019-03-01 002 Drug Co. 125.00 625.00
A29 56 2019-12-15 A29 Medicine Co. 75.00 4200.00
002 8 2019-07-14 002 Drug Co. 125.00 1000.00
A29 19 2019-08-26 A29 Medicine Co. 75.00 1425.00

Now that we’ve related the sales and products tables together, we have all of the necessary pieces to answer the original question, but we still need to go a step further. We need to aggregate by drug code, and sum the price across those drug codes.

SELECT products.code, sum(sales.quantity * products.price) AS rev FROM sales 
JOIN products ON sales.code = products.code 
GROUP BY products.code;

This query brings us to the answer for our first question.

Code Rev
001 4200.00
002 1625.00
A29 5625.00

What are the different types of join?

There are several different kinds of join, but fortunately for us, only a few of them make up the vast majority of use cases. Let’s get a high level overview of the different join types before diving into examples for each one. I’ll list them roughly in order of how frequently you might find a use for each.

Type Equivalent To What Does It Do?
JOIN INNER JOIN Result contains rows where there is a match between both tables. Rows without a match are omitted.
LEFT JOIN LEFT OUTER JOIN Result contains all rows from the left side of the join. NULL values appear where there was no match on the right side.
RIGHT JOIN RIGHT OUTER JOIN Result contains all rows from the right side of the join. NULL values appear where there was no match on the left side.
FULL JOIN FULL OUTER JOIN Result contains all rows. NULL values appear where there was no match.
CROSS JOIN Result contains all combinations of the left and right side tables (Cartesian product).

JOIN

The INNER JOIN or simply JOIN is what we started with in the example at the beginning of this post. The INNER part is optional and does not change the meaning of the query in any way. An inner join is going to return only those rows that satisfy the condition in the ON clause. If we refer to the join part of the original example query, we can see that this means our result data will contain rows only where the product code in the sales table has a corresponding entry in the products table.

In our original example, every row from sales has a corresponding row in the products table. What would happen if we had a sale record with a product code that’s not in our products table?

In that case, the extra row from the sales table would not appear at all in our results. This type of join is useful when you would like to omit any rows where there is no match between tables. This behavior is the opposite of OUTER joins (i.e. LEFT, RIGHT, or FULL join types).

LEFT JOIN

The LEFT OUTER JOIN or simply LEFT JOIN guarantees that every row from the left hand table will appear in the output. The OUTER part is optional and does not change the meaning of the query in any way.

SELECT * FROM sales LEFT JOIN products ON sales.code = products.code;

In this case, if we have a record in the sales table whose product code does not exist in the products table, our output will still contain that row. Because there was no matching product, what would the product columns contain? Let’s look at that row of the output specifically.

Code Quantity Date Code Distributor Price
B58 10 2019-09-01 NULL NULL NULL

The NULL value is special in SQL and denotes the absence of a value. The OUTER family of join types will use NULL to populate columns for rows where a match was not found for the left or right side.

We can use the scenario introduced at the beginning as a practical example of when a LEFT JOIN would be needed. In our first example, we have a rebates table that should offset the prices paid in the sales table. We left out the rebates table when we wrote our initial query for analyzing revenue.

While every sale had a corresponding product, the same is not true of rebates. In fact, the rebates table has only 3 rows. If we were to use an inner join, our query would incorrectly output only 3 rows in total. Let’s revise the initial revenue query to include rebates while making use of a LEFT JOIN in the process.

SELECT 
products.code, 
sum(sales.quantity * products.price - coalesce(rebates.dollars, 0)) AS rev 
FROM sales 
JOIN products ON sales.code = products.code
LEFT JOIN rebates ON sales.code = rebates.code AND sales.date = rebates.date
GROUP BY products.code;

The coalesce function returns, in order from left to right, the first non-null value. This is necessary because our LEFT JOIN is going to populate the rebate dollars column with NULL for any sale record that has no corresponding rebate. In those cases, zero will be the first non-null value that the coalesce function encounters.

RIGHT JOIN

The RIGHT OUTER JOIN or simply RIGHT JOIN is similar to the LEFT JOIN in that it ensures every row (from the right hand side in this case) is present in the output.

SELECT * FROM sales RIGHT JOIN products ON sales.code = products.code;

Looking at our previous example of an additional sales record without a matching product, in this case the extra sale would be omitted, because we are essentially starting from the right hand side (i.e. the products table). If instead of an extra sale we had an extra product without a corresponding sale, our output would appear as follows:

Code Quantity Date Code Distributor Price
NULL NULL NULL B58 Medicine Co. 500

FULL JOIN

The FULL JOIN, or equivalently FULL OUTER JOIN, is the combination of the LEFT JOIN and RIGHT JOIN. In this situation, an output row can be missing either the left or right hand side of the join.

SELECT * FROM sales FULL JOIN products ON sales.code = products.code;

Let’s combine the previous LEFT JOIN and RIGHT JOIN examples so that we have both an extra sales record and an extra product. It’s important to note here that the new sale has no corresponding product, and likewise, the new product has no corresponding sale. The output would appear as follows in this case:

Code Quantity Date Code Distributor Price
C26 10 2019-09-01 NULL NULL NULL
NULL NULL NULL B58 Medicine Co. 500

CROSS JOIN

The CROSS JOIN forms a Cartesian product, or in other words, all possible combinations of row pairs between two tables.

SELECT * FROM sales CROSS JOIN products;

Note here that it doesn’t make sense to have an ON clause, because we’re explicitly asking the database to generate all combinations, regardless of whether the rows match in any way. Taking the sales and products tables from our earlier examples and applying a CROSS JOIN would generate 18 rows, or 6 per product, because every product is matched to every sale.

You may sometimes see a CROSS JOIN written as an inner join with an ON condition that always evaluates to true.

SELECT * FROM sales JOIN products ON TRUE;

The FROM clause also supports a syntax that will produce cross joins.

SELECT * FROM sales, products;

You might be wondering when we would need to produce all combinations like this. Referring to our original example, let’s assume that we need to organize our revenue report not only by drug code, but by quarter as well. We also want a row for every possible product and quarter, even if there were no sales for that particular combination.

Let’s assume we have a table containing the quarters that we’re interested in.

Quarter
201901
201902
201903
201904

Because we may not have a sale for all possible product / quarter combinations, it makes sense to use a CROSS JOIN to generate these combinations as a first step. We can then join to the products and sales tables with a LEFT JOIN, while using coalesce to handle cases where a given product / quarter did not have a sale.