As a web developer, SQL is a very integral part of my day-to-day development workflow. Even when I use ORM or NoSQL - I can't imagine an extended period of time where I am not writing SQL.
In this guide - I will cover some of the basic SQL statements you will write often as a web developer.
1 - Create tables
CREATE TABLE users(
id int(11) unsigned PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(155) NOT NULL,
email VARCHAR(155) NOT NULL,
created_at datetime DEFAULT NULL,
verified tinyint(1) DEFAULT NULL,
KEY `user_email` (`email`)
);
You can also run the following query to get the create statement for any table:
SHOW CREATE TABLE users;
2 - GROUP BY
I skipped past SELECT statements as I'm assuming you would already know this. In this statement, we basically want to get a count of posts written by each author.
SELECT author_id, count(id) as total
FROM posts
GROUP BY author_id
This should return:
+-----------+-----------+
| author_id | total |
+-----------+-----------+
| 1 | 6 |
| 2 | 4 |
+-----------+-----------+
Note: Whenever you see "as" in a SQL statement - it's just an alias, a name to identify whatever field you selected. If we didn't use as total. We would see something like:
+-----------+-----------+
| author_id | count(id) |
+-----------+-----------+
| 1 | 6 |
| 2 | 4 |
+-----------+-----------+
As you can imagine, if you working with a programming language or generating some kind of report. "count(id)" is not a very good label hence why we use the alias "total".
You can also use aliases to refer to tables.
3 - JOIN's
In SQL - you often spread data across multiple tables, this is known as normalization and is done in this way, to prevent repetition and keep your tables compact.
Since an Author can have hundreds of blog posts, it does not make sense to store the author's name, email, and other information against every single post in the posts table.
When you need to change the email for example, you now need to update 100+ rows in the posts table which is cumbersome and will be slow as the table grows.
A better approach, store the author in an "authors" table and add a foreign key "author_id" to the posts table.
This is where JOINs come in, they allow you to select data across multiple tables and merge those tables virtually in your select query as if it's one giant table.
There are many different types of JOINs, in this article - I am just going to cover "INNER JOINS". For these types of JOINs, the query will only select rows if there's data in both tables.
Coming back to our previous query in the group by, we select "author_id" - however this is not very useful, and more often than not, we would also want the author name.
Let us rewrite the above to pull in the author's name instead of author ID.
NOTE: depending on your db config. If you get an error - you may need to add all the selected fields after a.id to the group by.
SELECT a.name, count(a.id) as total
FROM posts p JOIN authors a ON(a.id = p.author_id)
GROUP BY a.id
Result:
+-----------+-----------+
| name | total |
+-----------+-----------+
| John | 6 |
| Paul | 4 |
+-----------+-----------+
4 - Aggregate functions
Often you will need to count or total a column. MySQL offers many useful functions to handle these. The common ones are:
count()
sum()
avg()
min()
max()
NOTE: MySQL reserved keywords and functions are case insensitive, therefore COUNT() and count() are the same. We usually uppercase all function and reserved words so it's easy to read.
How to use them:
SELECT COUNT(id) as total_users FROM users;
SELECT MAX(price) as max_price FROM sales;
SELECT MIN(price) as min_price FROM sales;
You can also combine these with GROUP BY. In the above, we did "max_price", which basically selects the highest price an item was sold for.
However what if we want to know the highest-priced item sold by category?
SELECT category_id as cat_id, max(price) as max_price
FROM sales
GROUP BY cat_id;
Result:
+---------+------------+
| cat_id | max_price |
+---------+------------+
| 1 | 55.34 |
| 2 | 9.99 |
| 3 | 12.00 |
+---------+------------
You can of course JOIN categories to get the category name instead of the ID.
5 - Date functions
You almost always need to work with dates. In MySQL - these are the most common date functions/reserved words I use:
NOW() or CURRENT_TIMESTAMP - the current timestamp.
CURRENT_DATE - similar to NOW, except is only the date without time.
DATE_SUB(date, interval) - subtract x number of days/time from a date.
Examples:
SELECT title, category
FROM posts
WHERE the_date=CURRENT_DATE;
# Select all posts that were updated in the last 5 days.
SELECT title, category
FROM posts
WHERE updated_at >= DATE_SUB(NOW(), INTERVAL 5 DAY);
6 - Pagination
There are two popular types of pagination we can use in SQL:
Cursor pagination - incrementally move the cursor based on a column usually a primary key field. In large tables, this can improve performance quite a bit.
LIMIT & Offset - this is the most common. You basically use paging to move through records.
Examples:
# Cursor pagination
# When results are returned, you simply take the last ID
# in the results set and update "x" to move the cursor.
SELECT title,description
FROM posts
WHERE id > x
ORDER By id ASC
# Limit & Offset
# In your app you would have a page and per page variable:
# page = 1 , perPage = 100
# As the user moves from page to page
# we just calculate the offset as follows:
# offset = (page-1) * perPage
# In SQL
SELECT title,description
FROM posts
ORDER By id ASC
LIMIT perPage OFFSET offset
# Note LIMIT & Offset has a shortcut as well:
SELECT title,description
FROM posts
ORDER By id ASC
LIMIT offset, perPage
Bonus
Sometimes, you may need to run multiple queries since a JOIN might not be able to select all the information you need in one query.
Instead of running multiple queries, it may end up improving performance just to use a sub-query.
Examples:
SELECT
p.title,
p.description,
(
SELECT COUNT(id)
FROM post_likes
WHERE post_id = p.id
) as likes,
c.name as category
FROM posts p
JOIN categories c ON (p.category_id=c.id)
ORDER BY p.title
In the above, if you JOIN on post_likes - there could be hundreds of likes. This will lead to duplicate posts in your results, even if you have 50 posts but 5000 likes - the rows returned will be 5000 instead of 50.
Using a subquery - we get only 50 posts, but we also count the number of likes for each row which returns one column instead of each "like".
You could also use a group by together with a JOIN in this case, but it's not always the case and usually, a subquery is a last resort or used for performance reasons.
This brings me to the final query you need to know, and that is EXPLAIN.
When writing SQL - it's always a good idea to first put "EXPLAIN" in front of your query:
EXPLAIN SELECT title, category_id FROM posts where created_at=CURRENT_DATE \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: posts
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 6
Extra: Using where
Explain is a powerful tool, and will need an article, all of its own to explain ๐ how it works, however just in simple terms.
Check the "rows" count, the fewer rows you have to scan usually means a higher performing query.
These as well: key, key_len, extra - will also give you some insights - sometimes you may need to add an index to your table which will help speed up queries.
Conclusion
There is a ton more you can do with SQL, however, understanding these basic concepts will help you tremendously throughout your career. Most other complex SQL queries are built on these constructs.