Extraction from Databases| SQL

Learn via video course
FREE
View all courses
Python and SQL for Data Science
Python and SQL for Data Science
by Srikanth Varma
1000
5
Start Learning
Python and SQL for Data Science
Python and SQL for Data Science
by Srikanth Varma
1000
5
Start Learning
Topics Covered

Overview

SQL stands for Structured Query Language, which is used to interact with Relational Databases. SQL is a clause-based query language that can enable extracting large amounts of data within a short period of time. It is an important skill for Data Scientists and Data Analysts, who often query and extract data from databases.

Introduction

  • SQL stands for Structured Query Language which is used to query, update, and manage Relational Databases (RDBMS) and extract data. It is an essential skill for Data Scientists and Analysts as Relational Databases are very common in organizations due to their simplicity and ease of maintenance.
  • A Relational Database (RDBMS) is a collection of data items with pre-defined relationships with them. It organizes data in the form of tables, i.e. in the form of rows and columns. In RDBMS, the schema for each feature is pre-defined. A few of the most common Relational Databases are - MySQL, Oracle, etc.
  • Relational Databases are managed and interacted with using SQL language. So to extract data from Relational Database(s), you need to write queries using SQL statements and clauses.

SQL Statements

One of the main advantages of SQL is its simple syntax and learning curve. To extract data from RDBMS, you need to write SQL statements which include _clauses, table names, field names, logical expressions,_ etc. As shown in the below table, there is a basic order for clauses you need to follow while writing SQL statements for data extraction.

Order Clause Description
1 SELECT SELECT statement includes data that you want to display. It can include a list of columns or computed features.
2 FROM FROM clause indicates the table name which you want to query for data extraction
3 WHERE WHERE is used to filter data based on a logical expression
4 GROUP BY GROUP BY clause is used to aggregate data
5 HAVING HAVING is used to filter aggregated data
6 ORDER BY ORDER BY clause is used to sort data in ascending or descending order based on one or multiple columns
7 LIMIT LIMIT is used to limit the number of rows in the final data

Let’s see some examples of SQL queries to understand how to use the above clauses. Suppose we have a table named _student_ as shown below -

Student ID Name Section Marks Grade
1 Chandler A 92 A
2 Monica B 90 A
3 Phoebe C 89 B
4 Joey C 71 C
5 Rachel B 65 C
6 Ross A 97 A

Suppose, if you want to display student name, their marks, section when their grade is A, you can use below query -

The above query will display below table -

Name Section Marks
Chandler A 92
Monica B 90
Ross A 97

If you want to display average marks for each section, you can use the GROUP BY clause as shown in the below query -

Above query will display below table -

Section AverageMarks
A 94.5
B 77.5
C 80

Now, let’s say you want to display section where average marks are >= 80. Then you can use the below query -

Section AverageMarks
A 94.5
C 80

Now, if you want to display the above table in ascending orders of average marks, then you can use the below query -

Section AverageMarks
C 80
A 94.5

If you want to display the section with highest average marks, you can use the query below.

Section AverageMarks
A 94.5

Filtering Data with SQL

In SQL, WHERE clause is used to filter data. Let’s get into some of the methods which you can use in the WHERE clause to filter data -

Using ‘=’ Expression

It is an equal sign which can be used to filter data when a column is equal to the specified value. Below is one example query for the equal operator -

Name Marks Grade
Chandler 92 A
Ross 97 A

Using Relational Expressions

Below are the relational expressions which you can use in WHERE clause for data filtering -

  • > or <: select rows with greater or less than specified value
  • >=: select rows with greater than or equal to specified value
  • <=: select rows with less than or equal to specified value

Let’s have a look at the example mentioned below -

Name Section Marks Grade
Chandler A 92 A
Monica B 90 A
Phoebe C 89 B
Ross A 97 A

Using Logical Expressions

Below are the logical expressions you can use in the WHERE clause to filter data -

  • AND (&) - When two specified conditions are met
  • OR (|) - When any of the two specified conditions is met
  • NOT - When a specified condition is negated

Let’s have a look at below SQL query to understand how logical expression works -

Name Section Marks Grade
Chandler A 92 A
Monica B 90 A
Phoebe C 89 B
Joey C 71 C
Ross A 97 A

Using IN Operator

IN operator can be used to filter data when you want to specify a limited number of options to match for a column. Below is a query that uses IN operator to filter data

Name Section Marks Grade
Chandler A 92 A
Phoebe C 89 B
Joey C 71 C
Ross A 97 A

Using BETWEEN Operator

The BETWEEN operator is used when you want to specify a range in filtering condition.

Name Section Marks Grade
Monica B 90 A
Phoebe C 89 B

Using LIKE Operator

The LIKE operator is used on String columns when you want to match a substring to filter input data.

Name Section Marks Grade
Phoebe C 89 B
Joey C 71 C

Extraction From Multiple Tables

In a real-world scenario, data is stored in disparate databases, and you must extract data from multiple tables. One way of extracting data from multiple tables is by using JOIN clauses. JOIN clauses combine data from multiple tables based on matching conditions of specified key(s). There are various types of join -

  • Left Join
  • Right Join
  • Inner Join
  • Outer Join

JOINS

Suppose we have another table named _student_details_ containing the student’s city and age details as shown below -

Student ID City Age
1 Chicago 21
2 New York 23
3 Seattle 20
4 Boston 19

Now, if we want to fetch a student's city and age and combine it with another table, we can use a join operation. We can join these two tables based on _StudentID_ as it is a common key between these two tables. Let’s use INNER join for our case. Inner Join will display results only for those _StudentID_ which are present in both tables.

Name Section City Age
Chandler A Chicago 21
Monica B New York 23
Phoebe C Seattle 20
Joey C Boston 19

Combined Extraction

Another way to combine data from multiple tables is by using the UNION clause. The UNION operator combines data from multiple tables using multiple SELECT commands. While applying the UNION clause, you must ensure that columns in each SELECT statement should be of the same data types and in the same order.

General SQL Conventions

  • SQL is not case sensitive. So it can be written in lower or upper case letters. However, it is good practice to write SQL clauses such as SELECT, WHERE, FROM, etc. in uppercase letters.
  • Stored data in tables is case-sensitive. So while using LIKE conditions to match a substring, you must ensure that the specified string is in the required case order.
  • You should avoid spaces in table and column names.
  • SQL ignores white space in code, and no indentation is required while writing SQL queries.
  • It is considered a good practice to write comments in your SQL queries.

Conclusion

  • SQL stands for Structured Query Language, and it is used to extract data from Relational Databases quickly.
  • Its main advantage is its simplified syntax, and you can combine its clauses to write a complex query.
  • Using SQL, you can filter data by specifying multiple conditions, and pull out data from multiple tables using JOIN and UNION clauses.