alias |
---|
Structured Query Language |
- Developed based on the ANSI SQL Standard.
- However, there are a lot of different vendor specific implementations available.
- Despite wide use, some syntax are not part of the standard.
AUTO_INCREMENT
(MySQL),SERIAL
orIDENTITY
(Postgres),AUTOINCREMENT
(SQLite)LIMIT
(MySQL, Postgres)TRUNCATE
,TEMPORARY
- UPSERT operations
ISNULL
(SQL Server)- Some data types:
ENUM
(MySQL & Postgres),MONEY
(SQL Server) - Full-text search capabilities
- Stored procedure languages (T-SQL (SQL Server), PL/SQL (Oracle))
- Each column must have a data type which restricts the type of data that can be assigned to it.
- Despite standardization, implementations vary across different RDBMS.
- Categories:
NULL
- Represents the absence of a value.
- Character
CHAR(n)
(Fixed-length)VARCHAR(n)
(Variable-length)TEXT
(for large amounts of text data, less standardized)
- Numeric
INTEGER
,INT
,SMALLINT
,BIGINT
DECIMAL
/NUMERIC
REAL
DOUBLE PRECISION
FLOAT
(Precision can vary)
- Temporal
DATE
(YYYY-MM-DD)TIME
(HH:MM:SS)TIMESTAMP
(YYYY-MM-DD HH:MM:SS)
- Other Types
- Binary String (MySQL & SQL Server)
BINARY(n)
(Fixed-length)VARBINARY(n)
(Variable-length)
BOOLEAN
- Direct support in Postgres
- Implemented in MySQL as
TINYINT(1)
BLOB
(Binary Large Object) &CLOB
(Character Large Object)- Supported in Oracle
- SQL Server uses
VARBINARY(MAX)
andVARCHAR(MAX)
- MySQL uses
LONGBLOB
andLONGTEXT
JSON
JSONB
(Postgres-specific Binary JSON format)
- Arrays
- Used to store multiple values of the same type in a single column.
- Syntax and support for arrays can vary between different database systems.
XML
- Supported in systems like SQL Server, Oracle, DB2
- e.g.
CREATE TABLE docs (id INT, content XML)
- Binary String (MySQL & SQL Server)
CREATE TABLE Employee (
EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Gender CHAR(1),
DateOfBirth DATE,
Salary DECIMAL(10, 2),
IsManager BOOLEAN,
DepartmentID INT,
JoinDate TIMESTAMP`
);
-- Creating a table with a JSON column
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
profile JSON,
scores INTEGER[]
);
-- Insert data into a table with a JSON column
INSERT INTO users (name, profile)
VALUES (
'Alice',
'{"age": 30, "city": "New York", "interests": ["reading", "hiking"]}',
ARRAY[85, 92, 78]
);
- User-Defined Types (UDTs) can be used to create custom data types tailored to specific needs.
- Syntax and support for UDTs can vary between different database systems.
-- Distinct Type
CREATE TYPE US_DOLLAR AS DECIMAL(10,2);
-- Structured Type
CREATE TYPE Address AS (
street VARCHAR(100),
city VARCHAR(50),
state CHAR(2),
zip VARCHAR(10)
);
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
home_address Address,
credit_balance US_DOLLAR
);
-- Insert data
INSERT INTO customers VALUES (
1,
'John Doe',
('123 Main St', 'Anytown', 'CA', '12345'),
349.45
);
- Arithmetic Operators:
+
,-
,*
,/
,%
- Comparison Operators:
=
,!=
,<
,>
,<=
,>=
- Logical Operators:
AND
,OR
,NOT
- String Concatenation:
||
- Set Operators:
UNION
,INTERSECT
,EXCEPT
Important
/
performs integer division if either operand is an integer.
-
Enforce data integrity and maintain consistency within a relational database.
-
NOT NULL
- Ensures that a column's value is not null.
-
UNIQUE
- Ensures that a column's value is unique in the table.
-
PRIMARY KEY
- Combines
UNIQUE
andNOT NULL
. - Uniquely identifies each row.
- Must contain unique, non-NULL values
- UNIQUE and NOT NULL implicitly.
- A table can only have one primary key composed of single or multiple fields (composite key).
- Generally created while the database and table are created, but can also be created after a table is created.
- There are 3 types of Primary Keys:
- Surrogate Key: Autogenerated by the database.
- Natural Key: Naturally generated by the outside world.
- Composite Key: The combination of 2 or more columns generates a unique identifier which you can utilize as the primary key.
- Combines
-
FOREIGN KEY
- aka referencing key
- Links to a row in another table.
- A column or combination of columns whose values match a primary key in a different table.
- Prevents the destruction of those links.
-
DEFAULT
- Specifies a value for a column, if one is not given.
- If
NULL
is explicitly provided as a value, it will override the default value. - Must be a literal constant.
- Examples:
City VARCHAR(255) DEFAULT 'Unknown'
,ALTER TABLE Employees ALTER COLUMN City SET DEFAULT 'Unknown';
-
CHECK
- Ensures the value of a column satisfies a specific condition.
- Example:
size DECIMAL CHECK (size > 0 AND size <= 100)
-
Table constraints are not tied to any particular column, and can encompass multiple columns.
UNIQUE(column_list)
PRIMARY KEY(column_list)
CHECK(condition)
- applied on inserting and updatingREFERENCES
- Every column constraint can be written as a table constraint.
CREATE TABLE user (
user_id TEXT,
email TEXT,
...
CONSTRAINT pk_user_id PRIMARY KEY (user_id),
CONSTRAINT valid_email CHECK (email ~* '^EMAIL_REGEX$')
);
- It's possible to use a combination of constraints on each column of a table, but in some cases it might lead to redundancy (e.g.
NOT NULL DEFAULT
) and/or conflicting requirements (e.g.DEFAULT CHECK
). - To generate an auto-incrementing number column:
- PostgreSQL -
SERIAL
- MySQL -
AUTO_INCREMENT
- SQLite -
AUTOINCREMENT
- PostgreSQL -
CREATE TABLE table_name (
variable_name variable_datatype AUTO_INCREMENT,
-- Other columns...
);
CREATE TABLE table_name (
variable_name variable_datatype PRIMARY KEY AUTOINCREMENT,
-- Other columns...
);
CREATE TABLE table_name (
variable_name SERIAL PRIMARY KEY, -- Type and Value implicitly created
-- Other columns...
);
- Modifying the
DEFAULT
constraint of a column:
-- Add a `DEFAULT` constraint to an existing column of a table
ALTER TABLE table_name ALTER col_name SET DEFAULT default_value;
-- Remove a `DEFAULT` constraint from an existing column of a table.
ALTER TABLE table_name ALTER col_name DROP DEFAULT;
CREATE TABLE [IF NOT EXISTS] Users(
ID INT NOT NULL,
NAME VARCHAR(20) NOT NULL,
AGE INT NOT NULL,
PRIMARY KEY(ID)
);
CREATE TABLE Orders(
OID INT NOT NULL,
DATE DATETIME,
AMOUNT INT,
USER_ID INT,
FOREIGN KEY (USER_ID) references Users(ID),
-- or USER_ID INT references Users(ID),
PRIMARY KEY(OID)
);
-- Add a Primary Key to an existing table
ALTER TABLE Users ADD PRIMARY KEY (ID);
-- Add a Foreign Key to an existing table
ALTER TABLE Orders
ADD FOREIGN KEY (USER_ID) REFERENCES Users(ID);
- A
UNIQUE
key allows forNULL
column values for records.- For databases that allow
NULL
values for aUNIQUE
field, theUNIQUE
constraint applies only to the non-null values.- i.e. Multiple
NULL
entries for aUNIQUE
field won't be considered as duplicates.
- i.e. Multiple
- For databases that allow
CREATE TABLE users (
user_id INT UNIQUE,
first_name VARCHAR(255)
);
INSERT INTO students (studentId, firstName, lastName) VALUES
(1, 'John'),
(2, 'Jane'),
(NULL, 'Alice'), -- Valid
(NULL, 'Bob'); -- Valid
- A candidate key is a field or combination of fields that uniquely identifies each row in a table.
- All candidate keys that are not primary keys are secondary or alternate keys.
- Not related to foreign keys.
- Help ensure data integrity.
- Used for indexing.
CREATE TABLE Book (
BookID INT PRIMARY KEY, -- Primary Key
ISBN VARCHAR(20) UNIQUE, -- Secondary Key
Name VARCHAR(100)
);
- AKA referential actions.
- Used to simultaneously delete or update data from both the child and parent tables.
- Used to perform operations in a single command without violating the referential integrity.
- Defined in foreign key constraints using
ON DELETE
andON UPDATE
clauses. - Implicitly triggered by
DELETE
orUPDATE
operations on the parent table.- The database engine processes these actions before firing any user-defined triggers (e.g.,
AFTER
triggers).
- The database engine processes these actions before firing any user-defined triggers (e.g.,
- Use cascades
- to ensure data integrity in parent-child relationships.
- to avoid manual updates.
- to avoid orphaned records / dangling references.
- Use
CASCADE
cautiously to avoid unintended deletions.
CREATE TABLE parent (
parent_id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE child (
child_id INT PRIMARY KEY,
parent_id INT,
FOREIGN KEY (parent_id)
REFERENCES parent(parent_id)
ON DELETE CASCADE
ON UPDATE SET NULL
);
ON DELETE CASCADE
- automatically deletes child rows when the parent is deleted.
-- When deleting parent with ID 1,
-- all child rows with parent_id = 1 are deleted
DELETE FROM parent WHERE parent_id = 1;
ON UPDATE CASCADE
- automatically updates child rows when the parent’s key is updated.
-- When updating parent’s ID,
-- all child rows with parent_id = 1 become 2
UPDATE parent SET parent_id = 2 WHERE parent_id = 1;
ON DELETE SET NULL
- sets child’s foreign key toNULL
.- Requires nullable column.
CREATE TABLE child (
child_id INT PRIMARY KEY,
parent_id INT NULL, -- Must be nullable
FOREIGN KEY (parent_id)
REFERENCES parent(parent_id) ON DELETE SET NULL
);
ON DELETE SET DEFAULT
- sets child’s foreign key to its default value.- Requires default definition.
- Not supported by MySQL.
CREATE TABLE child (
child_id INT PRIMARY KEY,
parent_id INT DEFAULT 0,
FOREIGN KEY (parent_id)
REFERENCES parent(parent_id)
ON DELETE SET DEFAULT
);
ON DELETE NO ACTION
- raises an error if the constraint is violated.- Default in some databases.
CREATE TABLE child (
child_id INT PRIMARY KEY,
parent_id INT,
FOREIGN KEY (parent_id)
REFERENCES parent(parent_id)
ON DELETE NO ACTION
);
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
ON DELETE CASCADE
);
INSERT INTO customers (customer_id, name)
VALUES (1, 'Alice');
INSERT INTO orders (order_id, customer_id) VALUES (101, 1);
DELETE FROM customers WHERE customer_id = 1;
-- Result: Both customer and order rows are deleted
- Data Definition Language
- Defines data structure
- Used to create objects on the server.
- Can be used to create:
- Database
- User
- Table
- Index
- Trigger
- Function
- Stored Procedure
- View
- In certain RDBMS with transactional DDL (e.g. Postgres, SQLite), rollbacks are allowed.
CREATE DATABASE [IF NOT EXISTS] my_db;
-- Supported in MySQL, SQL Server
USE my_db;
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] my_table (
col_name data_type constraints,
-- ...
);
Note
IF NOT EXISTS
is not part of the SQL standard, but it's widely used in DDL statements with support in Postgres and MySQL.
- Used to remove objects from the server.
- Can't be rolled back.
- Any object created using
CREATE
can be dropped usingDROP
. - It's not allowed to drop a table referenced by foreign key constraint.
- Objects related to the table like views, procedures needs to be explicitly dropped.
-- Completely remove a table from database.
DROP [TABLE] table_name;
- Used to change some characteristics of an object, i.e. to add, drop, or modify some option on the object.
- Commonly used to change table characteristics, like:
- Add/Drop columns
- Add/Drop constraints
- Modify column data types
- Modify column constraints
-- Alter the table.
ALTER TABLE table_name;
-- Rename a table.
ALTER TABLE old_table_name
RENAME TO new_table_name;
-- Add a column to a table.
ALTER TABLE table_name
ADD column_name INT;
-- MySQL - Modify a column.
ALTER TABLE table_name
MODIFY COLUMN column_name TEXT;
-- Drop a column from a table.
ALTER TABLE table_name
DROP COLUMN column_name;
-- Rename Databse/Roles
ALTER <DATABASE|ROLE> old_name RENAME TO new_name;
-- PostgreSQL: Change the owner of a schema/database.
ALTER SCHEMA bookstore_schema OWNER TO db_owner;
ALTER DATABASE bookstore_db OWNER TO db_owner;
- Used to remove all data from a table along with all space allocated for the records.
- Also deallocates memory for removed objects.
- Can't be rolled back.
- Conditions aren't allowed.
- Unlike
DROP
truncate will preserve the structure of the table.
-- Remove all the data and not the table itself.
TRUNCATE [TABLE] table_name;
- Used to rename objects.
- Availability and syntax of
RENAME
varies between different DBMS.
RENAME TABLE old_name TO new_name [, old_name2 TO new_name2] -- ...
- Typically used to add comments or descriptions to database objects like tables, columns, or views.
- Comments are not used by the database itself but can be helpful for documentation purposes or for providing additional information about the structure of the database.
- Could also be written using
--
for single line and/* */
for multi-line comments.
- Could also be written using
- Data Manipulation Language
- Used to manage data within database objects.
- Some dialects like Postgres and SQLite support the
RETURNING
clause which returns columns after an operation.
- Used to insert records into a table.
INSERT INTO table_name (column1,...columnN)
VALUES (value1,...valueN)[, (valueA,...valueZ)];
- Used to modify whole records or parts of records in a database table.
- Can modify multiple records if conditions don't provide unique results.
UPDATE table_name
SET col_name = value[, col2_name = value2, ...]
[WHERE condition];
- Used to remove data from a database table.
DELETE FROM table_name
[WHERE condition];
- Data Query Language
- Search, filter, group, aggregate stored data
SELECT <projection>
FROM <table_name>
<filter>
<grouping>
<ordering>
<offset>
-- OR
SELECT [ALL | DISTINCT]
select_expr [, select_expr] ...
[into_option]
[FROM table_ref]
[WHERE where_condition]
[GROUP BY {col_name | expr | position}]
[HAVING having_condition]
[ORDER BY {col_name | expr | position}]
[ASC | DESC]
[LIMIT {[offset,] row_count | row_count OFFSET offset}];
SELECT column1, column2, columnN FROM table_name;
-- Aliases
SELECT column1 as col1, column2 as col2, columnN FROM table_name;
SELECT * FROM table_name;
SELECT * -- SELECT Users.age
FROM table_name -- FROM Users
WHERE [condition]; -- WHERE name = 'John'
Note
The WHERE
clause is used in SELECT
, INSERT
and UPDATE
statements
-
Filters individual rows before aggregation.
-
IN
- True if the operand is included in a list of expressions.
- e.g.,
WHERE id IN (23, 45, 67)
,WHERE allergies IN ('Penicillin', 'Morphine')
can be a shorthand to `WHERE allergies = 'Morphine' OR allergies = 'Penicillin')
-
IS
- Used to filter on values that are
NULL
, notNULL
, true or false. SELECT * FROM <table> WHERE <field> IS [NOT] NULL;
- e.g.,
WHERE first_name IS NOT NULL
,WHERE last_name = '' IS NOT FALSE
- Used to filter on values that are
-
NOT
- Meaning: Reverses the value of any boolean expression.
- e.g.,
WHERE NOT (id=100)
-
LIKE
- Meaning: True if the operand matches a pattern (
%
for zero or more characters &_
to match any single character). - e.g.,
WHERE type LIKE 'a%'
(starts with 'a'),WHERE type LIKE '___'
(exactly 3 characters long) ILIKE
- supported by Postgres for case-insensitive pattern matching.
- Meaning: True if the operand matches a pattern (
-
BETWEEN
- Meaning: True if the operand falls within an inclusive range.
- e.g.,
WHERE price BETWEEN 1.5 and 2.5
,WHERE name BETWEEN 'm' AND 'p'
- Places rows into groups and then potentially perform some aggregate function on those groups.
- Often mandatory to be used with aggregate functions like
count
,max
andmin
. - Grouping is done based on the similarity of the row's attribute values.
- Always used before the
ORDER BY
clause inSELECT
. HAVING
can be used to filter out groups that meet a certain condition.- Filters groups after aggregation.
SELECT type, AVG(price) FROM cars GROUP BY type;
Important
SQL doesn't allow mixing aggregate and non-aggregate columns in a SELECT
without a GROUP BY
.
-- Invalid Query
SELECT department, AVG(salary)
FROM employees;
-- Fix
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
The exception is when aggregating the entire table into a single row.
SELECT
'Company' as company_name,
COUNT(*) AS total_employees,
AVG(salary) as avg_salary
FROM employees;
SELECT
CURRENT_DATE AS report_date,
COUNT(*) AS total_employees,
SUM(salary) as total_salary_expense
FROM employees;
- Ensures presentation of columns.
- Output is sorted based on the column's values.
- Always used after the
GROUP BY
clause inSELECT
. - The default sort order in
ORDER BY
clause is ascending (ASC).- This applies universally across major database systems.
- Subsequent columns act as tiebreakers for rows with equal values in earlier columns.
SELECT columns
FROM table_name
ORDER BY col1 [ASC/DESC], col2 [ASC/DESC], col3 [ASC/DESC];
SELECT name, price FROM cars ORDER BY name ASC;
LIMIT
restricts the number of records returned from aSELECT
statement.- Not part of the SQL standard.
- Supported in MySQL and Postgres.
OFFSET
specifies from which record position to start counting from.- Often used in conjunction with the
LIMIT
clause. - Some SQL implementations use the
SKIP
keyword instead ofOFFSET
.
- Often used in conjunction with the
-- Limit
SELECT name, price FROM cars ORDER BY name ASC LIMIT 10;
-- Offset
SELECT name, price, type FROM produce ORDER BY name ASC LIMIT 5 OFFSET 5;
- Part of the SQL standard.
- Used to remove duplicate rows from the result set of a
SELECT
statement. - Operates on the entire row of selected columns.
- It can impact performance on large datasets.
- It's applied before the result set is returned.
NULL
values are considered equal forDISTINCT
operations.
SELECT DISTINCT department FROM employees;
SELECT DISTINCT city, state FROM addresses;
SELECT COUNT(DISTINCT department) AS unique_departments FROM employees;
SELECT DISTINCT(column1), column2 FROM table_name;
DISTINCT
can sometimes be used interchangeably with GROUP BY for simple queries.
SELECT DISTINCT department, job_title FROM employees;
-- Equivalent to:
SELECT department, job_title FROM employees GROUP BY department, job_title;
CASE
WHEN condition_1 THEN result_1
WHEN condition_2 THEN result_2
WHEN condition_3 THEN result_3
ELSE default_result
END [AS column_alias]
CASE
- used to handle conditional logic across different query components / clauses.- Evaluates conditions sequentially and stops at the first match
- Always use
ELSE
to handle unexpected cases
- Works with all comparison operators (
=
,>
,BETWEEN
,LIKE
, etc.) - Avoid using in
GROUP BY
unless necessary for conditional grouping
CREATE TABLE students (
student_id INT PRIMARY KEY,
name VARCHAR(100),
gpa DECIMAL(3,2),
attendance DECIMAL(4,2),
major VARCHAR(50),
scholarship BOOLEAN
);
CREATE TABLE courses (
course_id INT PRIMARY KEY,
course_name VARCHAR(100),
credits INT,
difficulty VARCHAR(20)
);
CREATE TABLE student_courses (
student_id INT,
course_id INT,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
INSERT INTO students (student_id, name, gpa, attendance, major, scholarship)
VALUES
(1, 'Alice', 3.8, 95.0, 'CS', TRUE),
(2, 'Bob', 2.9, 88.0, 'Math', FALSE),
(3, 'Charlie', 3.5, 92.0, 'Biology', TRUE),
(4, 'David', 3.1, 85.0, 'Physics', FALSE),
(5, 'Eve', 3.9, 98.0, 'Chemistry', TRUE);
INSERT INTO courses (course_id, course_name, credits, difficulty)
VALUES
(101, 'Intro to Programming', 3, 'Easy'),
(102, 'Advanced Algorithms', 4, 'Hard'),
(103, 'Data Structures', 3, 'Medium'),
(104, 'Machine Learning', 4, 'Hard');
INSERT INTO student_courses (student_id, course_id)
VALUES
(1, 101),
(1, 102),
(2, 103),
(3, 101),
(3, 104),
(4, 102),
(5, 103);
-- SELECT (GPA to Letter Grade, Sort by Grade then GPA)
SELECT
student_id,
name,
gpa,
CASE
WHEN gpa >= 3.7 THEN 'A'
WHEN gpa >= 3.3 THEN 'B'
WHEN gpa >= 3.0 THEN 'C'
ELSE 'D'
END AS letter_grade
FROM students;
ORDER BY letter_grade, gpa desc;
-- WHERE (Filter by GPA category)
SELECT *
FROM students
WHERE CASE
WHEN gpa >= 3.7 THEN 'A'
WHEN gpa >= 3.3 THEN 'B'
WHEN gpa >= 3.0 THEN 'C'
ELSE 'D'
END IN ('A', 'B');
-- Aggregate Functions (Weighted GPA)
SELECT
s.student_id,
s.name,
SUM(CASE
WHEN c.course_name = 'Intro to Programming' THEN c.credits * 1.0
WHEN c.course_name = 'Advanced Algorithms' THEN c.credits * 1.2
ELSE c.credits * 1.1
END) AS weighted_credits
FROM students s
JOIN student_courses sc ON s.student_id = sc.student_id
JOIN courses c ON sc.course_id = c.course_id
GROUP BY s.student_id, s.name;
-- GROUP BY and HAVING (Filter groups by average GPA)
SELECT
major,
AVG(gpa) AS avg_gpa
FROM students
GROUP BY major
HAVING AVG(gpa) > CASE
WHEN major = 'CS' THEN 3.5
WHEN major = 'Math' THEN 3.2
ELSE 3.0
END;
-- ORDER BY (Sort by custom priority)
SELECT
student_id,
name,
scholarship,
CASE
WHEN scholarship = TRUE THEN 1
ELSE 2
END AS priority
FROM students
ORDER BY priority, gpa DESC;
-- Nested CASE (Combine GPA and Attendance)
SELECT
student_id,
name,
gpa,
attendance,
CASE
WHEN gpa >= 3.7 THEN 'Top Performer'
ELSE CASE
WHEN attendance > 90 THEN 'High Attendance'
ELSE 'Needs Improvement'
END
END AS status
FROM students;
- Used to combine records from two or more tables in a database.
- A means for combining fields from two tables by using values common to each.
- Different Types of Joins
- INNER JOIN
- OUTER JOIN
- LEFT JOIN
- RIGHT JOIN
- FULL JOIN
[!quote]- SQL Joins ![[SQL Joins (Datacamp).pdf]]
Source: Datacamp
[!quote]- JOIN in Relationships ![[Databases#Relationships / Multiplicity]]
-- Implicit Join without using 'JOIN' ()
SELECT Users.ID, Users.Name, Orders.Amount, Orders.OrderDate
FROM Users, Orders
ON Users.ID = Orders.UserID;
-- Inner Join (Explicit Join - Recommended)
SELECT Users.ID, Users.Name, Orders.Amount, Orders.OrderDate
FROM Users
INNER JOIN Orders
ON Users.ID = Orders.UserID;
Source: DbVisualizer
- INNER JOIN / JOIN
- Returns records that have matching values in both tables.
- Equivalent to
JOIN
- Compares each row of the first table with each row of the second table.
- If the join condition is true, it creates a new row combining columns from both tables.
- Use cases:
- Finding related records across tables.
- Filtering out records that don't have corresponding entries in both tables.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
INNER JOIN Departments d
ON e.DepartmentID = d.DepartmentID;
- LEFT JOIN (LEFT OUTER JOIN)
- Returns all records from the left table and matched records from the right table.
- If there's no match, the result is NULL on the right side.
- Includes all rows from the left table.
- For each row in the left table, it looks for matching rows in the right table.
- If a match is found, it combines the data; if not, it fills with NULL for right table columns.
- Use cases:
- Finding all records from one table, regardless of whether they have corresponding records in another table.
- Identifying records in one table that don't have matches in another (by filtering for NULL values in the joined columns).
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
LEFT JOIN Departments d
ON e.DepartmentID = d.DepartmentID;
- RIGHT JOIN (RIGHT OUTER JOIN)
- Similar to LEFT JOIN but returns all records from the right table and matched records from the left.
- Includes all rows from the right table.
- For each row in the right table, it looks for matching rows in the left table.
- If a match is found, it combines the data; if not, it fills with NULL for left table columns.
- Use cases:
- Similar to LEFT JOIN, but when you want to prioritize the right table.
- Often, developers prefer using LEFT JOIN and switching table order instead of using RIGHT JOIN for consistency.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
RIGHT JOIN Departments d
ON e.DepartmentID = d.DepartmentID;
- FULL JOIN (FULL OUTER JOIN)
- Returns all records when there's a match in either the left or right table.
- Combines the results of both LEFT and RIGHT joins.
- Returns all rows from both tables, with NULL values where there's no match.
- Use cases:
- Getting a complete view of data from two tables, including unmatched records from both sides.
- Identifying records that exist in one table but not in the other.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
FULL JOIN Departments d
ON e.DepartmentID = d.DepartmentID;
- CROSS JOIN / CARTESIAN JOIN
- Used to return all possible row combinations from each table.
- If no condition is provided, the result set is obtained by multiplying each row of the first table with all rows in the second table.
- Common use cases:
- Creating a comprehensive dataset for testing
- Finding missing relationships
SELECT *
FROM table_1
CROSS JOIN table_2;
- SELF JOIN
- Used to intersect or join a table in the database to itself.
SELECT table_1.col_1 AS col_a, table_2.col_2 AS col_b
FROM my_table table_1
JOIN my_table table_2 ON table_1.col_3 = table_2.col_4;
-- or
SELECT table_1.col_1 AS col_a, table_2.col_2 AS col_b
FROM my_table table_1, my_table table_2
WHERE table_1.col_3 = table_2.col_4;
- Primarily used in
JOIN
operations as a shorthand way to specify join conditions when the columns being joined have the same name in both tables. - Part of the SQL standard and is supported by many database management systems.
- Equivalent to an
ON
clause where the named columns are equated.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
JOIN Departments d USING (DepartmentID);
- Can be used with multiple columns:
SELECT * FROM table1
JOIN table2 USING (column1, column2, column3);
-- ON Clause Equivalent:
SELECT * FROM table1
JOIN table2
ON table1.column1 = table2.column1
AND table1.column2 = table2.column2
AND table1.column3 = table2.column3;
-
Used to combine the result of two queries.
-
To perform set operations,
- The order and number of columns must be the same.
- Data types must be compatible.
-
UNION
- Merges result sets of multiple
SELECT
statements into a single result set, removing duplicates. UNION ALL
doesn't remove duplicate rows.- The column names in the result-set are usually equal to the column names in the first
SELECT
statement. SELECT * FROM table_a UNION [ALL] SELECT * FROM table_b;
- Merges result sets of multiple
-
INTERSECT
- Retrieves the common rows that appear in the result sets of two
SELECT
statements. - Unsupported by MySQL
SELECT * FROM table_a INTERSECT SELECT * FROM table_b;
- Retrieves the common rows that appear in the result sets of two
-
EXCEPT
- Retrieves rows present in the result set of the first
SELECT
statement but not in the secondSELECT
statement. MINUS
is found in some databases, and is functionally equivalent toEXCEPT
.SELECT * FROM table_a EXCEPT SELECT * FROM table_b;
- Retrieves rows present in the result set of the first
- The general logical order in which SQL clauses are processed:
FROM
ON
(for joins)JOIN
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY
LIMIT
/OFFSET
- This order is why aliases created in the
SELECT
clause can't be used in theWHERE
clause, but can be used inORDER BY
.- It also clarifies why
HAVING
is used for filtering grouped results instead ofWHERE
.
- It also clarifies why
- Modern database engines use query optimizers that may rearrange operations for better performance, as long as the final result remains the same.
- Data Control Language
- Used for access control to grant or revoke access permissions to database objects
GRANT
,REVOKE
-- Revoke a specific privilege from a role
REVOKE SELECT ON table_name FROM role_name;
-- Revoke all privileges on a table from a user or role
REVOKE ALL PRIVILEGES ON table_name FROM user_or_role;
-- Revoke a privilege from multiple roles at once
REVOKE SELECT ON table_name FROM role1, role2, role3;
-- Revoke a privilege from PUBLIC (all roles)
REVOKE SELECT ON table_name FROM PUBLIC;
-- Revoke EXECUTE privilege on a function
REVOKE EXECUTE ON FUNCTION function_name() FROM role_name;
-- Revoke privileges on all tables in a schema
REVOKE SELECT ON ALL TABLES IN SCHEMA schema_name FROM role_name;
-- Revoke a role from a user
REVOKE role_name FROM user_name;
- Transaction Control Language
- Defines concurrent operation boundaries
SAVEPOINT
,ROLLBACK
,COMMIT
-- Create the database
CREATE DATABASE ecommerce;
-- Create roles
CREATE ROLE admin_role;
CREATE ROLE customer_role;
-- Create users and assign roles
CREATE USER admin_user WITH ENCRYPTED PASSWORD 'admin_password';
CREATE USER customer_user WITH PASSWORD 'customer_password';
GRANT admin_role TO admin_user;
GRANT customer_role TO customer_user;
-- Grant privileges
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO admin_role;
GRANT SELECT, INSERT, UPDATE ON customers TO customer_role;
GRANT SELECT ON products TO customer_role;
GRANT SELECT, INSERT ON orders TO customer_role;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO customer_role;
-- Insert sample data
INSERT INTO products (name, description, price, stock_quantity)
VALUES
('Laptop', 'High-performance laptop', 999.99, 50),
('Smartphone', 'Latest model smartphone', 599.99, 100),
('Headphones', 'Noise-cancelling headphones', 199.99, 200);
-- Create a stored procedure for customer registration
CREATE OR REPLACE FUNCTION register_customer(
p_username VARCHAR(50),
p_email VARCHAR(100),
p_password VARCHAR(255)
) RETURNS void AS $$
BEGIN
INSERT INTO customers (username, email, password_hash)
VALUES (p_username, p_email, p_password);
END;
$$
LANGUAGE plpgsql;
-- Grant execute permission on the stored procedure
GRANT EXECUTE ON FUNCTION register_customer(VARCHAR, VARCHAR, VARCHAR) TO customer_role;
-- In Application Code
-- Register a new customer (as customer_user)
SET ROLE customer_user;
SELECT register_customer('john_doe', '[email protected]', 'hashed_password_here');
-- Place an order (as customer_user)
INSERT INTO orders (customer_id, total_amount)
VALUES ((
SELECT customer_id
FROM customers
WHERE username = 'john_doe'
), 1599.98);
-- Reset role
RESET ROLE;
- Pre-written operations that perform specific tasks on data within a database.
- Return a single value or a table.
- Can be used in SQL statements like any other expression, and in different ways:
- to manipulate or transform data as it's being retrieved (in
SELECT
statements) - to filter data based on specific conditions (in
WHERE
clauses) - to group and filter aggregated data (in
GROUP BY
andHAVING
clauses) - to determine the sorting order of results (in
ORDER BY
)
- to manipulate or transform data as it's being retrieved (in
SELECT UPPER(name), LENGTH(city) FROM employees;
SELECT * FROM orders WHERE YEAR(order_date) = 2025;
SELECT department, AVG(salary)
FROM employees
WHERE hire_date > '2025-01-01'
GROUP BY department
HAVING AVG(salary) > 50000;
SELECT * FROM products ORDER BY LENGTH(product_name) DESC;
COALESCE
- A standard SQL function that accepts a list of arguments and returns the first non-NULL value from the list.- Supported by most databases.
- Primarily used for handling NULL values in databases.
- Can be used to replace NULL values with user-defined fallback values during expression evaluation.
ISNULL()
is the SQL Server equivalent ofCOALESCE()
-- Returns First Non-NULL Value - 'Postgres'
SELECT COALESCE(NULL, NULL, NULL, 'Postgres', NULL, 'SQLite');
-- Keeps falling back on subsequent column
-- values until a non-NULL value is found
-- Ensure no NULL names are returned as a result
SELECT COALESCE(
first_name,
middle_name,
last_name,
'NO NAME') AS combined_column_alias
FROM people;
-- e.g. Get salary expense report for company of employees
-- For employees with NULL salary, use a default 50000
SELECT
current_date as report_date,
count(*) as employee_count,
sum(coalesce(salary, 50000)) as salary_expense
FROM employees;
- User-Defined Functions are used to create reusable blocks of SQL code.
-- General Syntax
CREATE OR REPLACE FUNCTION func_name (para_1 datatype, param_2 datatype, ...)
RETURNS datatype AS
BEGIN
-- Function logic
RETURN value;
END;
SELECT function_name(arg_1, arg_2) AS some_name;
-- Example
CREATE FUNCTION full_name (first_name VARCHAR(50), last_name VARCHAR(50))
RETURNS VARCHAR(101) AS
BEGIN
RETURN CONCAT(first_name, ' ', last_name);
END;
SELECT full_name('John', 'Doe') AS name;
-
Aggregate functions operate on a set of values (multiple rows) and return a single result.
- Typically used with the
GROUP BY
clause.
- Typically used with the
-
COUNT([columns])
- Returns the number of non-NULL rows in a set.
SELECT COUNT(*) AS total_employees FROM employees;
SUM()
- Returns the total sum of a numeric column.
SELECT SUM(salary) AS total_salary FROM employees;
SELECT
SUM(gender='M') AS male_count,
SUM(gender='F') AS female_count
FROM patients;
AVG()
- Returns the average value of a numeric column.
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
HAVING avg_salary > 50000;
MIN()
- Returns the smallest value in a set.
SELECT MIN(salary) AS lowest_salary FROM employees;
MAX()
- Returns the largest value in a set.
SELECT MAX(salary) AS highest_salary FROM employees;
- Operate on a single value and return a single value.
ABS()
- Returns the absolute value of a number.
SELECT ABS(-15.7) AS absolute_value;
CEILING()
- Rounds a number up to the nearest integer.
SELECT CEILING(15.2) AS rounded_up;
FLOOR()
- Rounds a number down to the nearest integer.
SELECT FLOOR(15.7) AS rounded_down;
ROUND()
- Rounds a number to a specified number of decimal places.
-- Round to 2 decimal places
SELECT ROUND(123.4567, 2) AS rounded_value;
-- Result: 123.46
-- Round to the nearest integer
SELECT ROUND(123.5) AS rounded_integer;
-- Result: 124
-- Round to negative decimal places (rounds to tens, hundreds, etc.)
SELECT ROUND(1234.56, -2) AS rounded_hundreds;
-- Result: 1200
POWER
- Raises a number to a specified power or exponent.
SELECT POWER(4, 2); -- Returns 16
SELECT
patient_id,
CASE
WHEN weight/(POWER(height/100.0,2)) >= 30 THEN 1
ELSE 0
END AS isObese
FROM patients;
-
Manipulate date and time values.
-
CURRENT_DATE
- Get the current date -
CURRENT_TIME
- Get the current time -
CURRENT_TIMESTAMP
- Get the current timestamp
-- Returns 2025-02-22T00:00:00.000Z
SELECT CURRENT_DATE AS today;
-- Returns 05:17:40.370677+00
SELECT CURRENT_TIME AS time_now;
-- Returns 2025-02-22T05:17:40.370Z
SELECT CURRENT_TIMESTAMP AS now;
EXTRACT()
- Used for year, month, day extraction.
-- Extract year from a date
SELECT EXTRACT(YEAR FROM DATE '2025-02-21') AS year;
-- Result: 2025
-- Extract month from a date
SELECT EXTRACT(MONTH FROM DATE '2025-02-21') AS month;
-- Result: 2
- Other widely supported functions:
YEAR()
- Returns the year from a date.MONTH()
- Returns the month from a date.DAY()
- Returns the day of the month.HOUR()
- Returns the hour from a time.MINUTE()
- Returns the minute from a time.SECOND()
- Returns the second from a time.
UPPER()
/LOWER()
- Converts a string to uppercase and lowercase.
-- Return 'HELLO, WORLD'
SELECT UPPER('hello, world') AS greeting;
-- Return 'hello, world'
SELECT LOWER('HELLO, WORLD') AS greeting;
CONCAT()
- Combines two or more strings.
SELECT CONCAT('Hello', ' ', 'World') AS greeting;
SUBSTRING()
- Extracts a portion of a string.
SELECT SUBSTRING('SQL Tutorial', 1, 3) AS extract;
TRIM()
- Removes leading and trailing spaces.
SELECT TRIM(' SQL Server ') AS trimmed_string;
LENGTH()
- Returns the number of characters in a string.- Widely supported variation of the standard
CHARACTER_LENGTH()
orCHAR_LENGTH()
functions. - In some databases, this function is called
LEN()
(SQL Server) orCHAR_LENGTH()
.
- Widely supported variation of the standard
-- Returns 13
SELECT LENGTH('Hello, World!') AS string_length;
CAST()
- Converts a value from one data type to another.
SELECT CAST('100' AS INT) AS converted_integer;
CONVERT()
- Converts a value from one data type to another with more options.- Syntax and options can vary across different databases.
SELECT CONVERT(DATE, '2025-02-21') AS converted_date;
- [[Databases#Access Control|Access Control (Databases)]] 📄
GRANT <privilege> ON <database_object> TO <role> [WITH GRANT OPTION];
GRANT <provider_role> TO <role_member> [WITH ADMIN OPTION];
GRANT ALL PRIVILEGES ON <table> TO <role>|<user>;
GRANT ALL ON ALL TABLES [IN SCHEMA <schema>] <role>|<user>;
GRANT [SELECT, UPDATE, INSERT, ...] ON <table> [IN SCHEMA <schema>] <role>|<user>;
REVOKE <privilege> ON <database_object> FROM <role>;
REVOKE <role> FROM <user>;
-- Ecommerce Example
-- Roles
CREATE ROLE SalesRole;
CREATE ROLE FulfillmentRole;
-- Permissions
GRANT SELECT, INSERT ON orders TO SalesRole;
GRANT UPDATE ON orders TO FulfillmentRole;
-- Row-Level Security (PostgreSQL)
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
region VARCHAR(50)
);
CREATE POLICY customer_rls ON customers
FOR ALL
USING (region = current_user_region());
- Roles - group users with shared permissions.
CREATE ROLE SalesRole;
GRANT SELECT ON customers TO SalesRole;
- Privileges - actions allowed on database objects.
DENY
takes precedence overGRANT
.
-- Assign permissions
GRANT INSERT, UPDATE ON orders TO SalesRole;
-- Block Access
DENY DELETE ON employees TO ReadOnlyUser;
-- Remove permissions
REVOKE ALL PRIVILEGES ON products FROM Public;
- Role-Based Access (Least Privilege Principle):
CREATE ROLE AccountingRole;
GRANT SELECT ON invoices TO AccountingRole;
- Avoid Direct Table Access (Least Privilege Principle):
-- Bad: Grants full table access
GRANT ALL ON employees TO HRUser;
-- Good: Use a view
CREATE VIEW employee_info AS SELECT name, department FROM employees;
GRANT SELECT ON employee_info TO HRUser;
- Row-Level Security (RLS)
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
total DECIMAL(10, 2)
);
CREATE POLICY order_rls ON orders
FOR ALL
USING (customer_id = current_user_id());
- Avoid Over-Privileging.
-- Bad: Grants excessive rights
GRANT ALL PRIVILEGES ON *.* TO Public;
-- Good: Granular grants
GRANT SELECT, INSERT ON customers TO SalesRole;
- Audit Regularly.
-- Check role permissions
SELECT * FROM pg_user WHERE usename = 'ReadOnlyUser';
- Use Views for [[Data Masking]].
CREATE VIEW masked_customers AS
SELECT name, CONCAT(SUBSTRING(email, 1, 3), '...') AS masked_email
FROM customers;
- Avoid Over-Reliance on
Public
Role.
- A collection of objects that define the structure of the database.
- Acts as a namespace that helps organize and segregate data within a database.
- Use Cases
- Logical Organization - Grouping related tables and other objects together.
- Access Control - Managing permissions and security at a granular level.
- Data Integrity - Enforcing rules and relationships between tables.
- Components
- Tables
- Views
- Stored Procedures
- Functions
- Indexes
CREATE SCHEMA bookstore AUTHORIZATION db_owner;
ALTER SCHEMA bookstore OWNER TO new_db_owner;
CREATE SCHEMA IF NOT EXISTS Products;
CREATE SCHEMA IF NOT EXISTS Orders;
CREATE TABLE IF NOT EXISTS Products.Items (
ItemID SERIAL PRIMARY KEY,
ItemName VARCHAR(100) NOT NULL,
Price DECIMAL(10, 2) NOT NULL,
StockQuantity INT NOT NULL
);
CREATE TABLE IF NOT EXISTS Orders.OrderDetails (
OrderID SERIAL PRIMARY KEY,
ItemID INT NOT NULL,
Quantity INT NOT NULL,
OrderDate DATE NOT NULL,
FOREIGN KEY (ItemID) REFERENCES Products.Items(ItemID)
);
SELECT * FROM Products.Items;
SELECT * FROM Orders.OrderDetails;
-- Multiple Objects in One Statement
CREATE SCHEMA schema_name
CREATE TABLE table_name1 (...),
CREATE TABLE table_name2 (...),
CREATE VIEW view_name AS SELECT ...;
Note
- The
public
schema in Postgres is a default schema that is created automatically when a new database is initialized.- It serves as a shared namespace for database objects, allowing users to create and access tables, views, functions, and other objects without needing to specify a schema name.
CREATE TABLE my_table (id SERIAL PRIMARY KEY);
-- is equivalent to:
CREATE TABLE public.my_table (id SERIAL PRIMARY KEY);
- An index is a database object that provides a fast and efficient way to look up and retrieve data from a table.
- Improves the performance of
SELECT
queries by reducing the amount of data that needs to be scanned. - Partial Indexes - includes only a subset of rows from a table
-- Create a Non-Clustered Index on 'Users'
CREATE INDEX idx_UserID ON Users(user_id);
-- Create a Clustered Index on 'Accounts'
CREATE INDEX idx_AccountID ON Accounts(account_id);
CLUSTER Accounts USING idx_AccountID;
-- Partial Index
CREATE INDEX idx_name
ON table (col0, ...)
WHERE condition;
-- Drop an index
DROP INDEX idx_UserID;
- Covering Index - includes all columns needed for a query, avoiding table lookups.
-- Create a covering index
CREATE INDEX idx_covering ON users (email, status);
-- Query using the index
SELECT * FROM users WHERE email = '[email protected]' AND status = 'active';
- Best Practices
- Choose Columns Wisely
- High Selectivity: Index columns with unique values (e.g.,
email
). - Query Patterns: Index columns used in
WHERE
,JOIN
, orORDER BY
.
- High Selectivity: Index columns with unique values (e.g.,
- Avoid Over-Indexing
- Indexes slow
INSERT
,UPDATE
, andDELETE
operations. - Example: Avoid indexing
boolean
columns with low cardinality.
- Indexes slow
- Maintain Indexes
- Fragmentation: Rebuild indexes periodically (e.g.,
REINDEX
in PostgreSQL). - Auto-Vacuum: PostgreSQL’s auto-vacuum handles index maintenance.
- Fragmentation: Rebuild indexes periodically (e.g.,
- Use Federated Indexes
- Combine multiple columns to enable covering queries.
- Choose Columns Wisely
CREATE INDEX idx_federated ON orders (customer_id, order_date);
- Common Pitfalls
- Using
SELECT *
forces table scans even with indexes.- Select only needed columns.
- Indexing
status
(e.g.,active/inactive
) is rarely useful. - Don't ignore index maintenance.
- Fragmented indexes degrade performance over time.
- Indexing adds overhead for small tables.
- Indexes slow
INSERT/UPDATE
, and should be avoided for frequent writes.
- Using
Important
The order of operations is crucial.
Note
A composite primary key (PK) inherently serves as an index in most database systems.
- Ensure that a series of database operations are treated as a single unit of work, maintaining data integrity and [[Database Consistency|consistency]], even in the face of system failures or concurrent access.
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
-- or ROLLBACK; if there's an error
- PostgreSQL
DO
blocks run in their own transaction by default.- If any statement fails, the entire block is rolled back automatically.
- Explicit
ROLLBACK
inEXCEPTION
blocks is invalid.
- Explicit
- If any statement fails, the entire block is rolled back automatically.
DO $$
BEGIN
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
EXCEPTION
WHEN OTHERS THEN
RAISE NOTICE 'Transaction rolled back: %', SQLERRM;
END $$;
-- SQL Server Error Handling
BEGIN TRANSACTION;
BEGIN TRY
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
END TRY
BEGIN CATCH
ROLLBACK;
-- Log error or handle it appropriately
END CATCH
- A virtual table based on the result of a
SELECT
query. - Provide a way to represent the result of a query as if it were a table.
- Can be used to restrict access to specific columns or rows of a table.
- Users can be granted permission to access a view without granting direct access to the underlying table.
- Can also be used to save commonly used or complex queries by giving them a name.
CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
SELECT * FROM view_name;
- Database objects that contain the pre-computed results of a query, stored as a concrete table rather than a virtual table like standard views.
- Improve query performance, especially for complex queries with joins and aggregations.
- Reduce computational load on the database system.
- Indexes can be built on any column of the materialized view.
- Not part of the SQL standard and implementation varies across different database systems.
CREATE MATERIALIZED VIEW sales_summary AS
SELECT
product_id,
DATE_TRUNC('month', sale_date) AS sale_month,
SUM(sale_amount) AS total_sales
FROM sales_transactions
GROUP BY product_id, sale_month;
- Used to define named temporary result sets within a
SELECT
,INSERT
,UPDATE
,DELETE
, orMERGE
statement. - Also known as "WITH clauses."
- Improves readability and maintainability of complex queries.
- Can be referenced multiple times in the main query.
- Supports recursion (in most implementations).
- Multiple CTEs can be defined separated by commas.
WITH cte_name AS (
SELECT column1, column2, ...
FROM table
WHERE condition
)
SELECT * FROM cte_name;
-- Multiple CTEs
WITH cte1 AS (...),
cte2 AS (...)
SELECT * FROM cte1 JOIN cte2 ON ...;
-- Examples
WITH high_salary_employees AS (
SELECT * FROM employees
WHERE salary > 100000
)
SELECT department, AVG(salary) as avg_high_salary
FROM high_salary_employees
GROUP BY department;
-- Generate a hierarchical list of employees starting from employee_id 1.
WITH RECURSIVE subordinates AS (
SELECT employee_id, manager_id, name
FROM employees
WHERE employee_id = 1
UNION ALL
SELECT e.employee_id, e.manager_id, e.name
FROM employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
)
SELECT * FROM subordinates;
- Precompiled collection of one or more SQL statements stored in the database.
- Can accept parameters, execute complex operations, and return results.
- Help encapsulate business logic and improve performance by reducing the need to send multiple queries from an application to the database.
CREATE PROCEDURE my_procedure (param_1 datatype, param_2 datatype) AS
BEGIN
-- SQL statements
END;
CALL my_procedure(arg_1, arg_2);
- A special type of stored procedure.
- Automatically runs SQL code in response to DML events.
CREATE OR REPLACE TRIGGER delete_user
{BEFORE | AFTER | INSTEAD OF} {INSERT | UPDATE | DELETE}
ON table_name
[FOR EACH ROW]
BEGIN
-- SQL statements to be executed
END;
-- SQLite Example
CREATE TRIGGER update_joined_table
AFTER UPDATE ON customers OR UPDATE ON orders
FOR EACH ROW
BEGIN
-- Drop the existing joined table if it exists
DELETE FROM joined;
-- Create the new joined table
INSERT INTO joined_table
-- Create the new joined table
CREATE TABLE joined_table AS
SELECT *
FROM customers c
INNER JOIN orders o ON c.id = o.id;
END;
-
The SQL standard defines a
JSON
data type to store JSON documents. -
JSON_OBJECT()
- Creates a JSON object -
JSON_ARRAY()
- Creates a JSON array -
JSON_VALUE()
- Extracts a scalar value from a JSON document -
JSON_QUERY()
- Extracts a JSON object or array -
JSON_TRANSFORM()
- Modifies a JSON document -
IS JSON
Checks if a string is valid JSON
-- Creating a JSON object
SELECT JSON_OBJECT('name': 'Alice', 'age': 30) AS person;
-- Creating a JSON array
SELECT JSON_ARRAY('apple', 'banana', 'cherry') AS fruits;
-- Extracting a scalar value from a JSON document
SELECT JSON_VALUE('{"name": "Bob", "age": 35}', '$.name') AS name;
-- Extracting a JSON object or array
SELECT JSON_QUERY('{"address": {"city": "New York", "zip": "10001"}}', '$.address') AS address;
-- Modifying a JSON document
SELECT JSON_TRANSFORM('{"name": "Charlie", "age": 40}',
SET '$.age' = 41,
SET '$.city' = 'London') AS updated_person;
-- Checking if a string is valid JSON
SELECT '{"name": "David"}' IS JSON AS is_valid_json;
-- Comparing JSON objects
SELECT JSON_OBJECT('a': 1, 'b': 2) = JSON_OBJECT('b': 2, 'a': 1) AS are_equal;
- A query nested inside of a larger query.
- Can occur in various subsections of a query:
SELECT
clause (Inner Query)- Create temporary columns on the result set
- The column created by an inner query has a value equal to the result of the query.
- Can also be called
inner query
orinner select
while the container query is called theouter query
orouter select
- The
inner query
executes before theouter query
- Create temporary columns on the result set
WHERE
clause (Nested Query)- Can return single or multiple rows
FROM
clause (Inline View)- Create temporary tables
- Can be nested in a
SELECT
,INSERT
,UPDATE
,DELETE
, or even inside of another subquery. - Logical operators can be used to compare the results of the subquery
- While convenient, subqueries perform worse than joins.
CREATE TABLE IF NOT EXISTS students(
id INT PRIMARY KEY,
name VARCHAR(40) NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS evaluations (
id INT PRIMARY KEY,
studentId INT NOT NULL,
evalName VARCHAR(10) NOT NULL,
mark INT DEFAULT 0,
FOREIGN KEY(studentId) REFERENCES students(id),
CONSTRAINT mark_check CHECK(mark >= 0), CHECK(mark <= 100)
);
INSERT INTO students (id, name)
VALUES (1, 'Steve'), (2, 'Jane'), (3, 'Casey');
INSERT INTO evals (id, studentId, evalName, mark)
VALUES
(1, 1, 'quiz 1', 98),
(2, 2, 'quiz 1', 80),
(3, 3, 'quiz 1', 95),
(4, 1, 'test 1', 72),
(5, 2, 'test 1', 100),
(6, 3, 'test 1', 68);
-- To find all students that scored higher than Jane (2) on 'quiz 1'
-- Nested Query
SELECT a.id, a.name, b.evalName, b.mark
FROM students a, evals b
WHERE a.id = b.studentId AND b.evalName = 'quiz 1' AND b.mark > (
SELECT mark
FROM evals
WHERE evalName = 'quiz 1' AND studentId = 2
);
-- Inline View (Temporary Tables)
SELECT a.name, b.evalName, b.mark
FROM students a, (
SELECT studentId, evalName, mark
FROM evals
WHERE mark > 90
) b
WHERE a.id = b.studentId;
-- Inline Query (Temporary Columns)
SELECT a.id, a.name, (
SELECT AVG(mark)
FROM evals
WHERE studentId = a.id
GROUP BY studentId
) avg
FROM students a;
- Existence Operators (
EXISTS
/NOT EXISTS
) are used in subqueries to check whether the subquery returns any rows.
-- Find departments that have at least one employee
SELECT department_name
FROM departments d
WHERE EXISTS (
SELECT 1
FROM employees e
WHERE e.department_id = d.department_id
);
-- Find departments with no employees
SELECT department_name
FROM departments d
WHERE NOT EXISTS (
SELECT 1
FROM employees e
WHERE e.department_id = d.department_id
);
- Quantifiers (
ALL
/ANY
/SOME
) are used with comparison operators in subqueries.SOME
is synonymous withANY
.
-- Find employees who earn more than all employees in department 5
SELECT employee_name, salary
FROM employees
WHERE salary > ALL (
SELECT salary
FROM employees
WHERE department_id = 5
);
-- Find employees who earn more than any employee in department 5
SELECT employee_name, salary
FROM employees
WHERE salary > [ANY|SOME] (
SELECT salary
FROM employees
WHERE department_id = 5
);
[!note] Object Hierarchies in Postgres
- A database cluster in Postgres is the environment (collection of databases, roles and other entities) managed by a single Postgres server.
- It is the main global object.
- Schemas are defined within databases as containers for tables, functions, data types and operators.
- Postgres uses the
public
schema by default.- Objects within a schema must have unique names.
-
Postgres supports both
IF EXISTS
andIF NOT EXISTS
clauses in various SQL statements, such asCREATE TABLE
,DROP TABLE
,CREATE DATABASE
, andDROP DATABASE
. -
The
createdb
anddropdb
CLI commands are bundled with a Postgres installation.
createdb [--encoding=UTF8 --locale=en_US] db_name
dropdb [--if-exists] db_name
[!tip] Postgres Features
[[#Error Handling|Error handling for transactions]] via
DO
blocks.
ILIKE
operator for case-insensitive pattern matching.RegEx (POSIX) support using the
~*
operator.
- e.g.
SELECT 'PostgreSQL' ~* '^p.*sql';
(True)
psql
- the default command line client implemented as part of the PostgreSQL distribution.- Can connect to local or remote databases and either process queries as a batch or interactively.
- Can be used to modify databases and manage PostgreSQL itself using meta-commands, which are non-SQL shortcuts that start with a "
\
". They allow you to query information about your data structures and the system.- e.g.
\dt
- lists all available tables,\conninfo
- displays info about the current connection,\h
- get help about SQL commands,\?
- get info about meta-commands
- e.g.
# With Options
psql -U username -d database_name [-h [host|localhost] -p port]
# With a Connection String (Percent-Encoded URI)
psql "postgresql://<username>:<password>@<hostname>:<port>/<database>"
Command | Purpose |
---|---|
\conninfo |
Display the current connection details |
\l |
Lists all databases |
\c dbname |
Switches to a specific database |
\g |
Re-runs the last query |
\dt |
Lists tables in the current database |
\di |
Shows indexes on a table |
\d+ tab |
Describes a table’s schema (columns, indexes, constraints) |
\du |
Lists user roles |
\i file |
Executes SQL from a file |
\o file |
Redirects output to a file |
\timing |
Measures query execution time |
\? |
Displays all psql commands |
\h |
Shows help for SQL commands |
- Postgres has a robust extension ecosystem.
- Extensions can add new functions, data types, and even change core database behavior.
PostGIS
- Adds support for geographic objectspgcrypto
- Cryptographic functions- e.g.
crypt()
,gen_salt()
,pgp_sym_encrypt()
- e.g.
hstore
- Key-value pair storagehstore_to_json()
,each()
- Use Cases
- User Preferences - Store customizable settings (e.g., UI themes, notifications).
- Time-Series Data - Track dynamic metrics (e.g., machine sensors).
- Metadata - Attach tags or attributes to records (e.g., product features).
- Extensions can add new functions, data types, and even change core database behavior.
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(255) NOT NULL,
config hstore
);`
INSERT INTO users (username, config)
VALUES ('user1', '"points"=>"879", "theme"=>"dark"');
-- Check if a key exists
SELECT * FROM users WHERE config ? 'theme';
-- Get all keys
SELECT keys(config) FROM users;
[!note] Authentication Peer authentication is the default authentication mechanism configured for most PostgreSQL installations. It assumes that the system administrator is also the database administrator.
-
Postgres uses roles and privileges as authentication and authorization mechanisms.
- A role is a grouping of a specific set of permissions and owned entities.
- Both users and user groups are implemented as a single, unified concept called roles.
- A role is a grouping of a specific set of permissions and owned entities.
-
By default, roles own any object they create themselves.
- Only
superuser
roles can delete or modify objects that they do not own. - Only the database owner can create objects in the public schema by default.
- Only
GRANT ALL ON SCHEMA public TO new_user;
GRANT ALL PRIVILEGES ON DATABASE database_name TO new_user;
- Role attributes - flags or properties assigned to roles that determine their core privileges and capabilities across the entire database cluster.
- Key attributes in PostgreSQL access control:
CREATEDB
/NOCREATEDB
SUPERUSER
/NOSUPERUSER
CREATEROLE
/NOCREATEROLE
LOGIN
/NOLOGIN
INHERIT
- Key attributes in PostgreSQL access control:
CREATE ROLE readonly WITH LOGIN ENCRYPTED PASSWORD 'read-only';
- A user is the same as a role, but it assumes the
LOGIN
attribute by default.
Important
Always encrypt when storing a role that can log in.
[!note] Privilege Inheritance In Postgres, when special role attributes such as
LOGIN
,SUPERUSER
,CREATEDB
, andCREATEROLE
are granted to a role, they are never inherited by users/roles who are based on that role.For example, if
my_user
is based onmy_role
, and theCREATEDB
attribute was granted tomy_role
,my_user
doesn't automatically inherit theCREATEDB
attribute. In order to allow these privileges onmy_user
,SET ROLE
must explicitly be set to the role that has these attributes, or the attribute must explicitly be granted tomy_user
.-- Set role SET ROLE my_role; CREATE DATABASE new_database; -- Grant attribute ALTER ROLE my_user WITH CREATEDB;
- Read More 📄
- Uses a unique concept called "type affinity" for columns.
- Supports five main type affinities:
TEXT
,NUMERIC
,INTEGER
,REAL
, andBLOB
. - Other SQL data types are mapped to these five.
- Supports five main type affinities:
CREATE TABLE example (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
age INTEGER,
salary REAL,
data BLOB,
info TEXT, -- Can store JSON as text
created_at DATETIME DEFAULT current_timestamp
);
-
Uses
AUTOINCREMENT
withINTEGER
PRIMARY KEY
for auto-incrementing columns. -
Uses
BLOB
for binary data. -
Doesn't have a native
JSON
data type. -
Doesn't support multiple schemas within the same database file.
-
Supports the
IF NOT EXISTS
syntax when creating tables, indexes and triggers, and theIF EXISTS
syntax when dropping them. -
Because of SQLite's nature as an embedded database engine, there is no need for a
DROP DATABASE
statement. -
In addition to the built-in functions that come with it, SQLite allows you to create custom functions (User-Defined Functions or UDFs) using C/C++, which are compiled into the SQLite library.
-
In SQLite,
LIKE
is case-insensitive by default. -
SQLite can also load additional functions at runtime through extensions.
- e.g., the JSON1 extension adds JSON manipulation functions, FTS5 allows full-text search capabilities, REGEXP Extension provides the
regexp()
function.
- e.g., the JSON1 extension adds JSON manipulation functions, FTS5 allows full-text search capabilities, REGEXP Extension provides the
-- JSON Support using JSON1
SELECT load_extension('json1');
CREATE TABLE users (id INTEGER PRIMARY KEY, data JSON);
INSERT INTO users (data) VALUES ('{"name": "Alice", "age": 32}');
-- Full-Text Search (FTS) using FTS5
-- Create base table
CREATE TABLE Books (
id INTEGER PRIMARY KEY,
title TEXT,
desc TEXT
);
-- Create FTS virtual table
CREATE VIRTUAL TABLE BookSearch USING fts5(title, desc);
-- Populate FTS table
INSERT INTO BookSearch SELECT title, desc FROM Books;
-- Search for "prog*" in descriptions
SELECT title
FROM BookSearch
WHERE BookSearch MATCH 'desc:prog*'
ORDER BY rank;
- SQLite provides some powerful interactive shell commands.
sqlite3 # Opens an in-memory database
sqlite3 test.db # Creates/opens a file-based database
sqlite3 :memory: # Creates a temporary database
Command | Purpose |
---|---|
.open [filename] |
Open a new database file (e.g., .open mydb.db ) |
.open |
Switch databases |
.database |
List database connections |
.tables |
Lists all tables in the current database |
.schema |
View table structure |
.schema table |
Shows the CREATE TABLE statement for a table |
.indexes table |
Lists indexes on a table |
.dump [table] |
Export all tables or a specific table as SQL |
.import [file] [table] |
Load / Import CSV data into a table |
.backup backup.db |
Backup databases |
.read script.sql |
Executes SQL commands from a file |
.output [file] |
Redirects output to a file |
.help |
Lists all dot commands |
.exit or .quit |
Exit |
.timer on |
Measure query execution time |
.mode [format] |
Outputs results in specific format (csv , column or html ) |
.headers [on|off] |
Displays column headers |
.nullvalue NULL |
Replaces NULL with a custom string (e.g., .nullvalue 'N/A' ) |
[!note]
VACUUM
- In SQLite,
VACUUM
is a command used to optimize and maintain databases.
- It copies the entire database content to a temporary file and then overwrites the original database with the optimized version, which requires free disk space up to twice the size of the original database.
VACUUM; VACUUM main.table_name;
- Benefits:
- Rebuilds the database file, reducing its size by removing unused space left after deletions.
- Reorganizes data, improving performance by reducing fragmentation.
- Rebuilds indexes from scratch, potentially improving query performance.
- Can remove traces of deleted content, making it harder for adversaries to recover deleted data.
- In SQLite, it's possible to add other databases to the current connection.
ATTACH DATABASE "sales.db" AS sales;
sqlite> .database
main: /path/to/customers.db
sales: /path/to/sales.db
sqlite>
[!note]
PRAGMA
ThePRAGMA
clause in SQLite is a special SQL statement used to modify the operation of the SQLite library or to query the SQLite library for internal (non-table) data. It is used to control various aspects of SQLite's behavior and retrieve internal information about the database.PRAGMA pragma_name; PRAGMA pragma_name = value;
- SQLite provides ways to import data from and export data to external file formats:
.import FILE TABLE
– Import data from CSV, JSON etc. into a table.export TABLE FILE
– Export data from table into various formats
.import users.csv users
.export users json users.json
[!note]- Migrating a SQLite Database ![[Migrate SQLite Database]]
- SQLite's primary journal modes:
- Rollback Journal
- The default mode in SQLite.
- It works by creating a temporary journal file that stores a copy of the original database pages before any changes are made.
- When a transaction begins, SQLite creates a rollback journal file.
- If a
ROLLBACK
command is issued, SQLite uses the journal file to restore the original state of the database. - Changes are made directly to the database file.
- If a
COMMIT
command is issued, the journal file is deleted, confirming the changes.
- When somebody is writing to the database, all the readers must stop because the actual file is being modified.
- Ideal for:
- Small databases or low-concurrency environments.
- Situations where simplicity is preferred over performance.
- Write-Ahead Logging (WAL)
- Enhances performance by allowing concurrent reads and writes.
- When a transaction begins, changes are written to a WAL file.
- All changes are appended to the WAL file, which acts as a log of operations.
- A commit record is added to the WAL file. The changes are not immediately applied to the database file.
- Periodically, a checkpoint operation moves changes from the WAL file to the database file.
- Ideal for:
- High-concurrency environments.
- Applications requiring fast write throughput and better data integrity.
- SSD storage where WAL performs optimally.
- Enhances performance by allowing concurrent reads and writes.
- Rollback Journal
PRAGMA journal_mode=WAL;
- Data replication for SQLite can be implemented using services like Litestream and LiteFS.
CREATE ROLE "user1" WITH LOGIN PASSWORD 'secretpassword';
- Primarily used to enclose string literals.
- e.g.,
SELECT * FROM users WHERE name = 'John';
- e.g.,
- If a single quote needs to be included within a string, it can be escaped by doubling it.
- e.g.,
SELECT * FROM users WHERE name = 'O''Reilly';
- e.g.,
- Required by most SQL databases (like SQL Server, PostgreSQL, and Oracle) for string literals.
- Generally used to denote identifiers, such as table or column names, particularly when these names include special characters or spaces.
- e.g.,
SELECT "First Name" FROM "User Info";
- e.g.,
- The behavior of double quotes can differ across database systems.
- MySQL allows both single and double quotes for strings interchangeably, making it more flexible compared to others.
- PostgreSQL uses double quotes strictly for identifiers and requires single quotes for string literals.
- Unlike unquoted identifiers, quoted identifiers are case-sensitive.
- e.g.
"USERS"
and"users"
are treated differently.
- e.g.
- Unlike unquoted identifiers, quoted identifiers are case-sensitive.
Note
It is generally recommended to reserve double quotes for identifiers to avoid confusion and ensure compatibility across different SQL dialects.
- Always store timestamps in [[UTC]] in a database to ensure consistency across different time zones.
- When users input times, convert them to UTC before storing in the database.
- When sending or receiving data via APIs, use UTC and ISO 8601 format for consistency.
CREATE TABLE events (
id INT PRIMARY KEY,
event_name VARCHAR(255),
event_time TIMESTAMP
);
INSERT INTO events (id, event_name, event_time)
VALUES (1, 'User Login', '2025-02-21 21:55:00 UTC');
-- Get all events from the last 24 hours
SELECT * FROM events
WHERE event_time > NOW() - INTERVAL '24 hours';
- When displaying times to users, convert UTC to their local time zone at the application layer / code, not in the database.
- SQL uses a three-valued logic:
- Besides true and false, the result of logical expressions can also be unknown.
- It is a consequence of supporting
NULL
to mark absent data. - In SQL,
NULL
is not equal to anything, not even to anotherNULL
. - It is not possible to tell whether a comparison to
NULL
results true or false.
-- Each of the following comparisons are unknown
NULL = 1
NULL <> 1
NULL > 1
NULL = NULL
-- Evaluates to true and false
(NULL = 1) OR (1 = 1)
(NULL = 1) AND (0 = 1)
-
SQL has the
IS [NOT] NULL
predicate to test whether a value isNULL
or not, and theIS NOT DISTINCT FROM
predicate to compare two values while treating twoNULL
values as the same. -
WHERE
,HAVING
andWHEN
require true conditions.WHERE
rejects all rows when the condition evaluates to unknown.
-
CHECK
constraints accept true and unknown.
-- This query always returns the empty set
SELECT col
FROM t
WHERE col = NULL; -- Always unknown
-- Use 'IS NULL' to search for 'NULL' values
SELECT col
FROM t
WHERE col IS NULL;