Search for Courses, Topics

Joins in SQL

SQL joins are mostly used when a user is trying to extricate data from multiple tables at one time

24 Jul 2021-14 mins read
quiz
Challenge Inside! : Find out where you stand! Try quiz, solve problems & win rewards!

Overview

A SQL Join statement is used to combine data or rows from two or more tables based on a common field between them. This article give breif idea about different types of Joins such as INNER/EQUI JOIN, NATURAL JOIN, CROSS JOIN,SELF JOIN, etc.

Scope

In this article, we will focus on an advanced topic in SQL i.e. SQL Joins.

  • This article explains What is a SQL joins theoratically and then shows a example.
  • This artical also cover Different types of SQL joins with detailed examples.
  • When to use which join is also a part of this article.
  • And in the last we have discussed few Interview questions based on SQL JOIN.

What are Joins?

SQL Joins are mostly used when a user is trying to extricate data from multiple tables (which have one-to-many or many-to-many relationships with each other) at one time. Join keyword merges two or more tables and creates a temporary image of the merged table. Then according to the conditions provided, it extracts the required data from the image table and once data is fetched the temporary image of the merged tables is dumped.

Large databases are often prone to data redundancy, i.e., the creation of repetitive data anomalies by insertion, deletion and updation. But by using SQL Joins we promote database normalization which results in low data redundancy and elimination of redundant data.

In relational databases like SQL, there are two key fields generally used: Primary Key and Foreign Key. While the primary key is necessary for a table to qualify as a part of the relational database and identify each row uniquely of the table to which it belongs, the foreign key is responsible for linking two tables in the database. Here, the foreign key needs to be the primary key of another table.

In some cases, the foreign key and the primary key that it references are present in the same table. In such cases, we use SQL Self Join.

When we use SQL Joins, we often use these two key fields to identify what the user needs and accordingly form our queries.

Example of SQL Joins:

We have a company’s employee database, where Table 1 (emp_dets) contains information about the employee like: employee id, employee name, and supervisor id. Table 2 (supervisor_dets) contains information about supervisors, i.e., their id and name.

Table 1 has emp_id as the primary key and Table 2 has supervisor_id as the primary key. In Table 1, supervisor_id references Table 2. Hence, it is a foreign key for Table 1.

Joins in SQL

Depending on the needs of the users, there are several types of joins. These joins are broadly classified into four types, i.e., Cross Self, Inner, Outer.

Types of SQL Joins

1. Cartesian / Cross Joins

The Cartesian Join a.k.a. Cross Join is the cartesian product of all the rows of the first table with all the rows of the second table.

Let’s say if we have m rows in the first table and n rows in the second table, then the resulting cartesian join table will have m*n rows. This usually happens when the matching column or WHERE condition is not specified.

General Syntax:

SELECT column-name(s)
FROM table1 CROSS JOIN table2;

SELECT is used to specify all columns we need to display in the resulting table. FROM specifies the tables where we need to look for these columns. The type of join, i.e., CROSS JOIN in this case, is placed between the two tables we wish to join.

Example:

Let’s consider the scenario, where the first table contains customer details, i.e., customer id and customer name and the second table contains shopping details, i.e., product id and product name.

Cartesian / Cross Joins in SQL

Problem Statement:

Write a query to give the cartesian product of the Customers and Shopping_Details tables.

Query:

SELECT *
FROM Customers CROSS JOIN Shopping_Details;

2. Self Join

In SQL Self Join, a table is joined to itself. This means each row of the table is joined with itself and all other rows concerning stated conditions if any. In other words, we can say that it is a merge between two copies of the same table. This is extremely helpful when the foreign key references the primary key of the same table.

General Syntax:

SELECT a.column1 , b.column2
FROM table_name a, table_name b
WHERE some_condition;

Here we reference the same table with different names, i.e., a and b. This signifies a SELF JOIN.

Example:

Let’s consider an employee table with employee id, name, phone number, and supervisor id. The supervisors are present in the employee table itself and hence, supervisor id acts like a foreign key which is also the primary key as it references the employee id.

Table_Name: Employees

Self Join in SQL

Problem Statement:

Write a query to get all the employees which are also supervisors of some other employees from the given employees table.

Query:

SELECT a.Name AS Supervisors
FROM Employees a, Employees b
WHERE a.ID = b.supervisor_ID;

Here we use AS to rename the column name of the resultant table.

3. Inner / Equi Join

SQL Inner Join or Equi Join is the most simple join where all rows from the intended tables are cached together if they meet the stated condition. Two or more tables are required for this join. Inner Join can be used with various SQL conditional statements like WHERE, GROUP BY, ORDER BY, etc.

SQL inner join

General Syntax:

SELECT column-name 
FROM table-1 INNER JOIN table-2 
WHERE table-1.column-name = table-2.column-name;

We can alternately use just the “JOIN” keyword instead of “INNER JOIN”.

Example:

Let’s consider two tables of a supermarket set-up. The first table named Customers gives us information about different customers, i.e., their customer id, name and phone number. Here, CustID is the primary key that uniquely identifies each row. The second table named Shopping_Details, gives us information about items bought by customers, i.e., item id, customer id (referencing the customer that bought the item), item name, and quantity.

Equi Join in SQL

Problem Statement:

Write a query to get all the customers who have bought items from the store. Display their name, item bought, and quantity.

Query:

SELECT Customers.Name, Shopping_Details.Item_Name, Shopping_Details.Quantity
FROM Customers INNER JOIN Shopping_Details
WHERE Customers.ID==Shopping_Details.ID;

Special Case of Inner Join: Natural Join

SQL Natural Join is a type of Inner join which is based on the condition that columns having the same name and same datatype present in both the tables to be joined.

General Syntax:

SELECT * FROM 
table-1 NATURAL JOIN table-2;

Example:

Let’s consider two tables of a supermarket set-up. The first table named Customers gives us information about different customers, i.e., their customer id, name and phone number. Here, CustID is the primary key that uniquely identifies each row. The second table named Shopping_Details gives us information about items bought by customers, i.e., item id, customer id (referencing the customer that bought the item), item name, and quantity.

Natural Join in SQL

Problem Statement:

Write a query to find all details of customers who bought something from the store.

Query:

SELECT *
FROM Customers NATURAL JOIN Shopping_Details;

Outer Joins

SQL Outer joins give both matched and unmatched rows of data depending on the type of outer joins. These types are outer joins are sub-divided into the following types:

  • Left Outer Join
  • Right Outer Join
  • Full Outer Join

1. Left Outer Join

In this join a.k.a. SQL Left Join, all the rows of the left-hand table, regardless of following the stated conditions are added to the output table. Whereas, only matching rows of the right-hand table are added.

Rows belonging to the left-hand table, and not having values from the right-hand table are presented as NULL values in the resulting table.

Left Outer Join

General Syntax:

SELECT column-name(s)
FROM table1 LEFT OUTER JOIN table2
ON table1.column-name = table2.column-name;

Example:

Let’s consider two tables of a supermarket set-up. The first table named Customers gives us information about different customers, i.e., their customer id, name and phone number. Here, CustID is the primary key that uniquely identifies each row. The second table named Shopping_Details, gives us information about items bought by customers, i.e., item id, customer id (referencing the customer that bought the item), item name, and quantity.

Left Outer Join in SQL

Problem Statement:

Write a query to display all customers irrespective of items bought or not. Display the name of the customer and item bought. If nothing is bought, display NULL.

Query:

SELECT Customers. Name, Shopping_Details.Item_Name
FROM Customers LEFT OUTER JOIN Shopping_Details;
ON Customers.ID = Shopping_Details.ID;

2. Right Outer Join

Similar to the left outer join, in the case of the Right Outer Join a.k.a. SQL Right Join, all the rows on the right-hand table, regardless of following the stated conditions are added to the output table. Whereas, only matching rows of the left-hand table are added.

Rows belonging to the right-hand table, and not having values from the left-hand table are presented as NULL values in the resulting table.

Right Outer Join

General Syntax:

SELECT column-name(s)
FROM table1 RIGHT OUTER JOIN table2
ON table1.column-name = table2.column-name;

Example:

Let’s consider two tables of a supermarket set-up. The first table named Customers gives us information about different customers, i.e., their customer id, name and phone number. Here, CustID is the primary key that uniquely identifies each row. The second table named Shopping_Details, gives us information about items bought by customers, i.e., item id, customer id (referencing the customer that bought the item), item name, and quantity.

Right Outer Join in SQL

Problem Statement:

Write a query to get all the items bought by customers , even if the customer does not exist in the Customer database. Display customer name and item name. If a customer doesn’t exist, display NULL.

Query:

SELECT Customers.Name, Shopping_Details.Item_Name
FROM Customers RIGHT OUTER JOIN Shopping_Details;
ON Customers.ID = Shopping_Details.ID;

3. Full Outer Join

The full outer join (a.k.a. SQL Full Join) firstly adds all the rows matching to the stated condition in the query and then adds the remaining unmatched rows from both the tables. We need two or more tables for the join.

After the matched rows are added to the output table, the unmatched rows of the left-hand table are added with subsequent NULL values, and then unmatched rows of the right-hand table are added with subsequent NULL values.

Full Outer Join

General Syntax:

SELECT column-name(s)
FROM table1 FULL OUTER JOIN table2
ON table1.column-name = table2.column-name;

Example:

Let’s consider two tables of a supermarket set-up. The first table named Customers gives us information about different customers, i.e., their customer id, name and phone number. Here, CustID is the primary key that uniquely identifies each row. The second table named Shopping_Details gives us information about items bought by customers, i.e., item id, customer id (referencing the customer that bought the item), item name, and quantity.

Full Outer Join in SQL

Problem Statement:

Write a query to provide data for all customers and items ever bought from the store. Display name of customer and item name. If either data does not exist, display NULL.

Query:

SELECT Customers.Name, Shopping_Details.Item_Name
FROM Customers FULL OUTER JOIN Shopping_Details
WHERE Customer.ID = Shopping_Details.ID;

When to use what?

SQL is the most essential skill for people looking for Data Engineering, Data Science and Software Engineering Roles. Joins in SQL is one of the advanced SQL concepts, and is often asked in interviews. In these questions, it is not directly stated what SQL join to use. Hence, we need to use a 4 step analysis before we start forming our SQL query.

  1. Identification: Identify tables relating to the problem statement. We also need to identify relations between these tables, order in which they are connected, primary keys and foreign keys. Example: Let’s say we have Table A and B. Table A and Table B share a relation of Employee Details – Department Details. Table A has 3 fields – ID, Name, and DeptID. Table B has two fields – DeptID, and DeptName. Table A has a primary key ID and Table B’s primary key is DeptID. Table A and Table B are connected with the foreign key in Table A, i.e., Table B’s primary key DeptID.
  2. Observe: Observe which join will be most suitable for the scenario. This means it should be able to retrieve all the required columns and have the least number of columns that need to be eliminated by the condition. Example: If all values of Table A are required irrespective of the condition depending on Table C, we can use a left outer join on A and C.
  3. Deconstruction: Now that we have all requirements to form our query, firstly we need to break it into sub parts. This helps us form the query quicker and make our understanding of the database structure quicker. Here, we also form the conditions on the correctly identified relationships. Example: You need to present data from Table A and Table B. But Table A’s foreign key is Table C’s primary key which is Table B’s foreign key. Hence breaking down the query into results from Table B and C (let’s say Temp) and then common results between its Temp and Table A will give us the correct solution
  4. Compilation: Finally we combine all the parts and form our final query. We can use query optimization techniques like heuristic optimization(link to article) which will result in quicker responses.

Let’s take a look at some interview question based on SQL Joins:

  1. Write a query in SQL to find the names of departments where more than two employees are working. Sample Table: emp_dept
dpt_codedpt_name
57Sales
63Finance
47HR

Sample Table: emp_details

emp_idemp_fnameemp_lnameemp_dpt
1001JimHalpert 5757
1002KevinMalone63
1003DwightShrute57

Solution:

Query:

SELECT emp_dept.dpt_name
FROM emp_details 
INNER JOIN emp_dept
ON emp_dept = dpt_code
GROUP BY emp_department.dpt_name
HAVING COUNT(*) > 2;

Output:

dpt_name 
Sales   

Explanation:

Since the question directly gives a single condition which we can directly execute without any loopholes, we directly link both tables using an INNER JOIN.

  1. Write a SQL statement to make a list in ascending order for the salesmen who work either for one or more customers or not yet join under any of the customers.

Sample Table: customers

cust_idcust_namecitysalesman_id
101Nick RimandoNew York648
102Brad DavisScranton271
103Graham ZusiAtlanta271
104Julian GreenNew York648

Sample Table: salesman

salesman_idsalesman_namecity
648Jim HalpertNew York
271Dwight ShruteScranton
017Pam BeeslyScranton

Solution:

Query:

SELECT a.cust_name,a.city,b.name AS "Salesman", b.city 
FROM customer a 
RIGHT OUTER JOIN salesman b 
ON b.salesman_id=a.salesman_id 
ORDER BY b.salesman_id;

Output:

cust_namecitySalesmancity
NULLNULLPam BeeslyScranton
Brad DavisScrantonDwight ShruteScranton
Graham ZusiAtlantaDwight ShruteScranton
Nick RimandoNew YorkJim HalpertNew York
Julian GreenNew YorkJim HalpertNew York

Explanation:

In this question, it states “one or more customers or not yet join under any of the customers”. If it was “one or more customers” only, we could have directly executed the condition with the use of an INNER JOIN. But, the “or not yet” creates the demand of a complex condition formation. To avoid this complexity, we can think of the resultant table by using a RIGHT OUTER JOIN where we will only have to form the condition of the “one or more” statement and the “or not yet” results will be added as part of the RIGHT OUTER JOIN.

Conclusion

We see an in-depth analysis of Joins in SQL. To summarize:

  • What are Joins?
  • Types of SQL Joins and how to use them: Self Join, Inner Joins, Outer Joins – Left, Right, and Full, Cartesian/Cross Joins.
  • How to approach SQL Joins question – 4 step analysis.
  • Examples of Interview level questions and how to identify which Join to use.

Some Tips for acing SQL Join questions:

  • Practice SQL Join questions on real-world data
  • Get familiar with database management basics – like visualizing ER diagrams, making schemas
  • Use heuristic optimization for optimizing queries
Challenge Time!
quiz Time to test your skills and win rewards! Note: Rewards will be credited after the next product update.