Writing good and performant SQL queries

Devraj Singh
4 min readAug 12, 2021

When the amount of data in the database is small, it doesn’t matter much (in terms of performance) how you write your SQL query and hence you will not notice any noticeable difference in the performance, but when there is significant data in the database, you will notice what is a slow query and what is fast and what is faster. Anyway, here are the few tips to write performant or faster or efficient (whatever you want to call) SQL queries.

Never SELECT *
It is really considered bad practice to use SELECT *. First SELECT * is slow. Many times, we don't need data from all columns. Fetching data for all columns makes it slow. Even if we need data for all columns, SELECT * is not considered a good practice. Using columns names in the SQL query makes it a cleaner query.

Use INNER JOIN rather than WHERE
I always used to join two tables using WHERE a clause(Bad habit). Using WHERE for joining two tables is bad in terms of performance. Here are the two ways to do inner join:
SELECT e.emp_no, s.salary FROM employee e, salary s WHERE e.emp_no=s.emp_no
and
SELECT employee.emp_no, salary.salary FROM employee INNER JOIN salary ON employee.emp_no=salary.emp_no
Now let's say the employee table has 40 rows and salary has 100 rows so in the case of FROM employee e, salary s WHERE , then in case one, it will create a cartesian product and will create 40*100=4000 results internally and then filter out based on WHERE condition.
While in the case of INNER JOIN, it will only create the rows which are matched

Indexes
The index in DB is a very powerful double-edged sword. Use it properly, it fastens your queries by many times. Use it in a bad way, it can hamper the performance. Here are a few things to remember while creating an index:
a) Create indexes on columns that will be queried frequently. By default, many databases will create indexes on the primary key column and column with unique constraints. But you should not rely on that and create indexes manually. Hence, the column on which you are creating indexes should have UNIQUE constraints. Though it is not a hard rule, but a better approach.
b) Prefer integer column than char column for creating indexes.
c) Please please drop the unused indexes. Indexes are overhead and not worth if it is not used.
d) Don’t create indexes on too many columns. It hampers the performance while doing insertion and deletion. So try to create indexes on the minimum number of columns.

wildcard usage
Consider below two SQL queries:

SELECT name FROM employee WHERE name LIKE %john%; // not performantSELECT name FROM employee WHERE name LIKE john%; // faster

When you use the LIKE pattern %john%, then you should be aware of the performance penalty. You should avoid(if possible) a pattern that has a wildcard at the beginning of the pattern because in that case, the DBMS will do a full table scan of every row. On the other hand, hello% is always faster because it will only scan those columns whose column value starts with hello.

WHERE is better than HAVING
Use WHERE wherever possible and use HAVING when WHERE can’t serve the purpose.HAVING works on group functions while WHERE works on conditions based on the value of columns. Hence, HAVING will first bring all rows and then prepares the resultset while WHERE will first apply the condition and prepare the filtered result set. For example:
SELECT count(*),from_date FROM salaries WHERE from_date > '1980-11-24' GROUP BY from_date;
is better than
SELECT count(*),from_date FROM salaries GROUP BY from_date HAVING from_date > '1960-11-24';

Avoid sub-query
sub-query is slow. inner subquery runs for every row for the table mentioned in the outer query. sub-query can mostly be replaced by JOIN operation.
SELECT emp_no FROM employees WHERE emp_no NOT IN (SELECT emp_no FROM salary);
can be replaced with
SELECT employee.emp_no FROM employees LEFT JOIN salaries WHERE salaries.emp_no IS NULL;
Another way to replace sub-query is to run sub-query separately and store the output in the variable which can be used by the main query later.

Deleting all rows
There are two ways to delete all rows in the table:

TRUNCATE TABLE employees; -- faster
DELETE * FROM employees;

Truncate is faster. Why? TRUNCATE is DDL and it will not maintain anything in the transaction logs. In the case of DELETE, it will delete each row at a time, write in the transaction logs and repeat the same for all rows. Hence slower. The downside of TRUNCATE is that it can’t be undone because it doesn’t write anything in the logs.

Readability

I follow this rule while writing SQL: First, make it work(return the correct result), then make it readable and then optimize it.
Write SQL keywords in capital letters so that they can be distinguished with table and column names.

select name from employee; -- bad because SQL keywords are not CAPITALISED.SELECT name FROM employees; -- good. Now I can easily spot the difference between SQL keyword and table and column name because SQL keyword are CAPITALISED.

Conclusion

Having listed all the tips, please note that modern DBMS has an SQL optimizer and may optimize the query without your knowledge. However, it is always good to be aware of what is bad and good queries.

References

https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_in.htm

--

--