Skip to content

Latest commit

 

History

History
2426 lines (1939 loc) · 72.4 KB

SQL.md

File metadata and controls

2426 lines (1939 loc) · 72.4 KB
alias
Structured Query Language

[!quote]- SQL Architecture (Diagram) SQL Architecture Source: TutorialsPoint

Standardization

  • Developed based on the ANSI SQL Standard.
    • However, there are a lot of different vendor specific implementations available.
    • Despite wide use, some syntax are not part of the standard.
      • AUTO_INCREMENT (MySQL), SERIAL or IDENTITY (Postgres), AUTOINCREMENT (SQLite)
      • LIMIT (MySQL, Postgres)
      • TRUNCATE, TEMPORARY
      • UPSERT operations
      • ISNULL (SQL Server)
      • Some data types: ENUM (MySQL & Postgres), MONEY (SQL Server)
      • Full-text search capabilities
      • Stored procedure languages (T-SQL (SQL Server), PL/SQL (Oracle))

Data Types

  • Each column must have a data type which restricts the type of data that can be assigned to it.
  • Despite standardization, implementations vary across different RDBMS.
  • Categories:
    • NULL
      • Represents the absence of a value.
    • Character
      • CHAR(n) (Fixed-length)
      • VARCHAR(n) (Variable-length)
      • TEXT (for large amounts of text data, less standardized)
    • Numeric
      • INTEGER, INT, SMALLINT, BIGINT
      • DECIMAL / NUMERIC
      • REAL
      • DOUBLE PRECISION
      • FLOAT (Precision can vary)
    • Temporal
      • DATE (YYYY-MM-DD)
      • TIME (HH:MM:SS)
      • TIMESTAMP (YYYY-MM-DD HH:MM:SS)
    • Other Types
      • Binary String (MySQL & SQL Server)
        • BINARY(n) (Fixed-length)
        • VARBINARY(n) (Variable-length)
      • BOOLEAN
        • Direct support in Postgres
        • Implemented in MySQL as TINYINT(1)
      • BLOB (Binary Large Object) & CLOB (Character Large Object)
        • Supported in Oracle
        • SQL Server uses VARBINARY(MAX) and VARCHAR(MAX)
        • MySQL uses LONGBLOB and LONGTEXT
      • JSON
        • JSONB (Postgres-specific Binary JSON format)
      • Arrays
        • Used to store multiple values of the same type in a single column.
        • Syntax and support for arrays can vary between different database systems.
      • XML
        • Supported in systems like SQL Server, Oracle, DB2
        • e.g. CREATE TABLE docs (id INT, content XML)
CREATE TABLE Employee (
    EmployeeID INT,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Gender CHAR(1),
    DateOfBirth DATE,
    Salary DECIMAL(10, 2),
    IsManager BOOLEAN,
    DepartmentID INT,
    JoinDate TIMESTAMP`
);
-- Creating a table with a JSON column
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    profile JSON,
    scores INTEGER[]
);

-- Insert data into a table with a JSON column
INSERT INTO users (name, profile)
VALUES (
    'Alice', 
    '{"age": 30, "city": "New York", "interests": ["reading", "hiking"]}',
    ARRAY[85, 92, 78]
);
  • User-Defined Types (UDTs) can be used to create custom data types tailored to specific needs.
    • Syntax and support for UDTs can vary between different database systems.
-- Distinct Type
CREATE TYPE US_DOLLAR AS DECIMAL(10,2);

-- Structured Type
CREATE TYPE Address AS (
    street VARCHAR(100),
    city VARCHAR(50),
    state CHAR(2),
    zip VARCHAR(10)
);

CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    home_address Address,
    credit_balance US_DOLLAR
);

-- Insert data
INSERT INTO customers VALUES (
    1, 
    'John Doe', 
    ('123 Main St', 'Anytown', 'CA', '12345'),
    349.45
);

Operators

  • Arithmetic Operators: +, -, *, /, %
  • Comparison Operators: =, !=, <, >, <=, >=
  • Logical Operators: AND, OR, NOT
  • String Concatenation: ||
  • Set Operators: UNION, INTERSECT, EXCEPT

Important

/ performs integer division if either operand is an integer.

Constraints

  • Enforce data integrity and maintain consistency within a relational database.

  • NOT NULL

    • Ensures that a column's value is not null.
  • UNIQUE

    • Ensures that a column's value is unique in the table.
  • PRIMARY KEY

    • Combines UNIQUE and NOT NULL.
    • Uniquely identifies each row.
    • Must contain unique, non-NULL values
      • UNIQUE and NOT NULL implicitly.
    • A table can only have one primary key composed of single or multiple fields (composite key).
    • Generally created while the database and table are created, but can also be created after a table is created.
    • There are 3 types of Primary Keys:
      • Surrogate Key: Autogenerated by the database.
      • Natural Key: Naturally generated by the outside world.
      • Composite Key: The combination of 2 or more columns generates a unique identifier which you can utilize as the primary key.
  • FOREIGN KEY

    • aka referencing key
    • Links to a row in another table.
    • A column or combination of columns whose values match a primary key in a different table.
    • Prevents the destruction of those links.
  • DEFAULT

    • Specifies a value for a column, if one is not given.
    • If NULL is explicitly provided as a value, it will override the default value.
    • Must be a literal constant.
    • Examples:
      • City VARCHAR(255) DEFAULT 'Unknown',
      • ALTER TABLE Employees ALTER COLUMN City SET DEFAULT 'Unknown';
  • CHECK

    • Ensures the value of a column satisfies a specific condition.
    • Example: size DECIMAL CHECK (size > 0 AND size <= 100)
  • Table constraints are not tied to any particular column, and can encompass multiple columns.

    • UNIQUE(column_list)
    • PRIMARY KEY(column_list)
    • CHECK(condition) - applied on inserting and updating
    • REFERENCES
    • Every column constraint can be written as a table constraint.
CREATE TABLE user (
    user_id TEXT,
    email TEXT,
    ...
    CONSTRAINT pk_user_id PRIMARY KEY (user_id),
    CONSTRAINT valid_email CHECK (email ~* '^EMAIL_REGEX$')
);
  • It's possible to use a combination of constraints on each column of a table, but in some cases it might lead to redundancy (e.g. NOT NULL DEFAULT) and/or conflicting requirements (e.g. DEFAULT CHECK).
  • To generate an auto-incrementing number column:
    • PostgreSQL - SERIAL
    • MySQL - AUTO_INCREMENT
    • SQLite - AUTOINCREMENT
CREATE TABLE table_name (
    variable_name variable_datatype AUTO_INCREMENT,
    -- Other columns...
);

CREATE TABLE table_name (
    variable_name variable_datatype PRIMARY KEY AUTOINCREMENT,
    -- Other columns...
);

CREATE TABLE table_name (
    variable_name SERIAL PRIMARY KEY, -- Type and Value implicitly created
    -- Other columns...
);
  • Modifying the DEFAULT constraint of a column:
-- Add a `DEFAULT` constraint to an existing column of a table
ALTER TABLE table_name ALTER col_name SET DEFAULT default_value;

-- Remove a `DEFAULT` constraint from an existing column of a table.
ALTER TABLE table_name ALTER col_name DROP DEFAULT;

Keys

CREATE TABLE [IF NOT EXISTS] Users(
    ID    INT           NOT NULL,
    NAME  VARCHAR(20)   NOT NULL,
    AGE   INT           NOT NULL,
    PRIMARY KEY(ID)
);

CREATE TABLE Orders(
    OID      INT        NOT NULL,
    DATE     DATETIME,
    AMOUNT   INT,
    USER_ID  INT,
    FOREIGN KEY (USER_ID) references Users(ID),
    -- or USER_ID INT references Users(ID),
    PRIMARY KEY(OID)
);
-- Add a Primary Key to an existing table
ALTER TABLE Users ADD PRIMARY KEY (ID);

-- Add a Foreign Key to an existing table
ALTER TABLE Orders
   ADD FOREIGN KEY (USER_ID) REFERENCES Users(ID);
  • A UNIQUE key allows for NULL column values for records.
    • For databases that allow NULL values for a UNIQUE field, the UNIQUE constraint applies only to the non-null values.
      • i.e. Multiple NULL entries for a UNIQUE field won't be considered as duplicates.
CREATE TABLE users (
    user_id INT UNIQUE,
    first_name VARCHAR(255)
);

INSERT INTO students (studentId, firstName, lastName) VALUES
    (1, 'John'),
    (2, 'Jane'),
    (NULL, 'Alice'),  -- Valid
    (NULL, 'Bob');    -- Valid
  • A candidate key is a field or combination of fields that uniquely identifies each row in a table.
  • All candidate keys that are not primary keys are secondary or alternate keys.
    • Not related to foreign keys.
    • Help ensure data integrity.
    • Used for indexing.
CREATE TABLE Book (
    BookID INT PRIMARY KEY,   -- Primary Key
    ISBN VARCHAR(20) UNIQUE,  -- Secondary Key
    Name VARCHAR(100)
);

Cascading Actions

  • AKA referential actions.
  • Used to simultaneously delete or update data from both the child and parent tables.
  • Used to perform operations in a single command without violating the referential integrity.
  • Defined in foreign key constraints using ON DELETE and ON UPDATE clauses.
  • Implicitly triggered by DELETE or UPDATE operations on the parent table.
    • The database engine processes these actions before firing any user-defined triggers (e.g., AFTER triggers).
  • Use cascades
    • to ensure data integrity in parent-child relationships.
    • to avoid manual updates.
    • to avoid orphaned records / dangling references.
  • Use CASCADE cautiously to avoid unintended deletions.
CREATE TABLE parent (
    parent_id INT PRIMARY KEY,
    name VARCHAR(50)
);
    
CREATE TABLE child (     
    child_id INT PRIMARY KEY,    
    parent_id INT,    
    FOREIGN KEY (parent_id) 
        REFERENCES parent(parent_id) 
        ON DELETE CASCADE 
        ON UPDATE SET NULL
);
  • ON DELETE CASCADE - automatically deletes child rows when the parent is deleted.
-- When deleting parent with ID 1, 
--  all child rows with parent_id = 1 are deleted
DELETE FROM parent WHERE parent_id = 1; 
  • ON UPDATE CASCADE - automatically updates child rows when the parent’s key is updated.
-- When updating parent’s ID, 
--  all child rows with parent_id = 1 become 2
UPDATE parent SET parent_id = 2 WHERE parent_id = 1; 
  • ON DELETE SET NULL - sets child’s foreign key to NULL.
    • Requires nullable column.
CREATE TABLE child ( 
    child_id INT PRIMARY KEY,
    parent_id INT NULL,  -- Must be nullable
    FOREIGN KEY (parent_id) 
        REFERENCES parent(parent_id) ON DELETE SET NULL
);
  • ON DELETE SET DEFAULT - sets child’s foreign key to its default value.
    • Requires default definition.
    • Not supported by MySQL.
CREATE TABLE child (
    child_id INT PRIMARY KEY,
    parent_id INT DEFAULT 0,
    FOREIGN KEY (parent_id) 
        REFERENCES parent(parent_id) 
        ON DELETE SET DEFAULT
);
  • ON DELETE NO ACTION - raises an error if the constraint is violated.
    • Default in some databases.
CREATE TABLE child (
    child_id INT PRIMARY KEY,
    parent_id INT,
    FOREIGN KEY (parent_id) 
        REFERENCES parent(parent_id) 
        ON DELETE NO ACTION
);
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(50)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) 
        REFERENCES customers(customer_id) 
        ON DELETE CASCADE
); 
    
INSERT INTO customers (customer_id, name) 
VALUES (1, 'Alice'); 

INSERT INTO orders (order_id, customer_id) VALUES (101, 1);

DELETE FROM customers WHERE customer_id = 1;
-- Result: Both customer and order rows are deleted

Sublanguages

DDL

  • Data Definition Language
  • Defines data structure

CREATE

  • Used to create objects on the server.
  • Can be used to create:
    • Database
    • User
    • Table
    • Index
    • Trigger
    • Function
    • Stored Procedure
    • View
  • In certain RDBMS with transactional DDL (e.g. Postgres, SQLite), rollbacks are allowed.
CREATE DATABASE [IF NOT EXISTS] my_db;

-- Supported in MySQL, SQL Server
USE my_db; 

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] my_table (
    col_name data_type constraints,
    -- ...
);

Note

IF NOT EXISTS is not part of the SQL standard, but it's widely used in DDL statements with support in Postgres and MySQL.

DROP

  • Used to remove objects from the server.
  • Can't be rolled back.
  • Any object created using CREATE can be dropped using DROP.
  • It's not allowed to drop a table referenced by foreign key constraint.
    • Objects related to the table like views, procedures needs to be explicitly dropped.
-- Completely remove a table from database.
DROP [TABLE] table_name;

ALTER

  • Used to change some characteristics of an object, i.e. to add, drop, or modify some option on the object.
  • Commonly used to change table characteristics, like:
    • Add/Drop columns
    • Add/Drop constraints
    • Modify column data types
    • Modify column constraints
-- Alter the table.
ALTER TABLE table_name;

-- Rename a table.
ALTER TABLE old_table_name 
RENAME TO new_table_name;

-- Add a column to a table.
ALTER TABLE table_name
ADD column_name INT;

-- MySQL - Modify a column.
ALTER TABLE table_name
MODIFY COLUMN column_name TEXT;

-- Drop a column from a table.
ALTER TABLE table_name
DROP COLUMN column_name;

-- Rename Databse/Roles
ALTER <DATABASE|ROLE> old_name RENAME TO new_name;

-- PostgreSQL: Change the owner of a schema/database.
ALTER SCHEMA bookstore_schema OWNER TO db_owner;
ALTER DATABASE bookstore_db OWNER TO db_owner;

TRUNCATE

  • Used to remove all data from a table along with all space allocated for the records.
    • Also deallocates memory for removed objects.
  • Can't be rolled back.
  • Conditions aren't allowed.
  • Unlike DROP truncate will preserve the structure of the table.
-- Remove all the data and not the table itself.
TRUNCATE [TABLE] table_name;

RENAME

  • Used to rename objects.
  • Availability and syntax of RENAME varies between different DBMS.
RENAME TABLE old_name TO new_name [, old_name2 TO new_name2] -- ...

COMMENT

  • Typically used to add comments or descriptions to database objects like tables, columns, or views.
  • Comments are not used by the database itself but can be helpful for documentation purposes or for providing additional information about the structure of the database.
    • Could also be written using -- for single line and /* */ for multi-line comments.

DML

  • Data Manipulation Language
  • Used to manage data within database objects.
  • Some dialects like Postgres and SQLite support the RETURNING clause which returns columns after an operation.

INSERT

  • Used to insert records into a table.
INSERT INTO table_name (column1,...columnN)
VALUES (value1,...valueN)[, (valueA,...valueZ)];

UPDATE

  • Used to modify whole records or parts of records in a database table.
  • Can modify multiple records if conditions don't provide unique results.
UPDATE table_name 
SET col_name = value[, col2_name = value2, ...]
[WHERE condition];

DELETE

  • Used to remove data from a database table.
DELETE FROM table_name 
[WHERE condition];

DQL

  • Data Query Language
  • Search, filter, group, aggregate stored data

SELECT

SELECT <projection> 
FROM <table_name> 
<filter> 
<grouping> 
<ordering> 
<offset>

-- OR

SELECT [ALL | DISTINCT]
    select_expr [, select_expr] ...
    [into_option]
    [FROM table_ref]
    [WHERE where_condition]
    [GROUP BY {col_name | expr | position}]
    [HAVING having_condition]
    [ORDER BY {col_name | expr | position}]
        [ASC | DESC]
    [LIMIT {[offset,] row_count | row_count OFFSET offset}];
SELECT column1, column2, columnN FROM table_name;

-- Aliases
SELECT column1 as col1, column2 as col2, columnN FROM table_name;

SELECT * FROM table_name;

SELECT *           -- SELECT Users.age
FROM table_name    -- FROM Users
WHERE [condition]; -- WHERE name = 'John'

Note

The WHERE clause is used in SELECT, INSERT and UPDATE statements

Filtering using WHERE

  • Filters individual rows before aggregation.

  • IN

    • True if the operand is included in a list of expressions.
    • e.g., WHERE id IN (23, 45, 67), WHERE allergies IN ('Penicillin', 'Morphine') can be a shorthand to `WHERE allergies = 'Morphine' OR allergies = 'Penicillin')
  • IS

    • Used to filter on values that are NULL, not NULL, true or false.
    • SELECT * FROM <table> WHERE <field> IS [NOT] NULL;
    • e.g., WHERE first_name IS NOT NULL, WHERE last_name = '' IS NOT FALSE
  • NOT

    • Meaning: Reverses the value of any boolean expression.
    • e.g., WHERE NOT (id=100)
  • LIKE

    • Meaning: True if the operand matches a pattern (% for zero or more characters & _ to match any single character).
    • e.g., WHERE type LIKE 'a%' (starts with 'a'), WHERE type LIKE '___' (exactly 3 characters long)
    • ILIKE - supported by Postgres for case-insensitive pattern matching.
  • BETWEEN

    • Meaning: True if the operand falls within an inclusive range.
    • e.g., WHERE price BETWEEN 1.5 and 2.5, WHERE name BETWEEN 'm' AND 'p'

Grouping using GROUP BY

  • Places rows into groups and then potentially perform some aggregate function on those groups.
  • Often mandatory to be used with aggregate functions like count, max and min.
  • Grouping is done based on the similarity of the row's attribute values.
  • Always used before the ORDER BY clause in SELECT.
  • HAVING can be used to filter out groups that meet a certain condition.
    • Filters groups after aggregation.
SELECT type, AVG(price) FROM cars GROUP BY type;

Important

SQL doesn't allow mixing aggregate and non-aggregate columns in a SELECT without a GROUP BY.

-- Invalid Query
SELECT department, AVG(salary)
FROM employees;

-- Fix
SELECT department, AVG(salary)
FROM employees
GROUP BY department;

The exception is when aggregating the entire table into a single row.

SELECT 
    'Company' as company_name,
    COUNT(*) AS total_employees,   
    AVG(salary) as avg_salary
FROM employees;

SELECT 
    CURRENT_DATE AS report_date,
    COUNT(*) AS total_employees,   
    SUM(salary) as total_salary_expense
FROM employees;

Ordering using ORDER BY

  • Ensures presentation of columns.
  • Output is sorted based on the column's values.
  • Always used after the GROUP BY clause in SELECT.
  • The default sort order in ORDER BY clause is ascending (ASC).
    • This applies universally across major database systems.
  • Subsequent columns act as tiebreakers for rows with equal values in earlier columns.
SELECT columns 
FROM table_name 
ORDER BY col1 [ASC/DESC], col2 [ASC/DESC], col3 [ASC/DESC];

SELECT name, price FROM cars ORDER BY name ASC;

Offset using LIMIT and OFFSET

  • LIMIT restricts the number of records returned from a SELECT statement.
    • Not part of the SQL standard.
    • Supported in MySQL and Postgres.
  • OFFSET specifies from which record position to start counting from.
    • Often used in conjunction with the LIMIT clause.
    • Some SQL implementations use the SKIP keyword instead of OFFSET.
-- Limit
SELECT name, price FROM cars ORDER BY name ASC LIMIT 10;

-- Offset
SELECT name, price, type FROM produce ORDER BY name ASC LIMIT 5 OFFSET 5;

Removing Duplicates using DISTINCT

  • Part of the SQL standard.
  • Used to remove duplicate rows from the result set of a SELECT statement.
  • Operates on the entire row of selected columns.
    • It can impact performance on large datasets.
  • It's applied before the result set is returned.
  • NULL values are considered equal for DISTINCT operations.
SELECT DISTINCT department FROM employees;

SELECT DISTINCT city, state FROM addresses;

SELECT COUNT(DISTINCT department) AS unique_departments FROM employees;

SELECT DISTINCT(column1), column2 FROM table_name;
  • DISTINCT can sometimes be used interchangeably with GROUP BY for simple queries.
SELECT DISTINCT department, job_title FROM employees; 
-- Equivalent to: 
SELECT department, job_title FROM employees GROUP BY department, job_title;

Conditional Statements using CASE

CASE 
    WHEN condition_1 THEN result_1
    WHEN condition_2 THEN result_2
    WHEN condition_3 THEN result_3
    ELSE default_result
END [AS column_alias]
  • CASE - used to handle conditional logic across different query components / clauses.
    • Evaluates conditions sequentially and stops at the first match
    • Always use ELSE to handle unexpected cases
  • Works with all comparison operators (=, >, BETWEEN, LIKE, etc.)
  • Avoid using in GROUP BY unless necessary for conditional grouping
CREATE TABLE students (
    student_id INT PRIMARY KEY,
    name VARCHAR(100),
    gpa DECIMAL(3,2),
    attendance DECIMAL(4,2),
    major VARCHAR(50),
    scholarship BOOLEAN
);

CREATE TABLE courses (
    course_id INT PRIMARY KEY,
    course_name VARCHAR(100),
    credits INT,
    difficulty VARCHAR(20)
);

CREATE TABLE student_courses (
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

INSERT INTO students (student_id, name, gpa, attendance, major, scholarship)
VALUES
(1, 'Alice', 3.8, 95.0, 'CS', TRUE),
(2, 'Bob', 2.9, 88.0, 'Math', FALSE),
(3, 'Charlie', 3.5, 92.0, 'Biology', TRUE),
(4, 'David', 3.1, 85.0, 'Physics', FALSE),
(5, 'Eve', 3.9, 98.0, 'Chemistry', TRUE);

INSERT INTO courses (course_id, course_name, credits, difficulty)
VALUES
(101, 'Intro to Programming', 3, 'Easy'),
(102, 'Advanced Algorithms', 4, 'Hard'),
(103, 'Data Structures', 3, 'Medium'),
(104, 'Machine Learning', 4, 'Hard');

INSERT INTO student_courses (student_id, course_id)
VALUES
(1, 101),
(1, 102),
(2, 103),
(3, 101),
(3, 104),
(4, 102),
(5, 103);

-- SELECT (GPA to Letter Grade, Sort by Grade then GPA)
SELECT 
    student_id,
    name,
    gpa,
    CASE 
        WHEN gpa >= 3.7 THEN 'A'
        WHEN gpa >= 3.3 THEN 'B'
        WHEN gpa >= 3.0 THEN 'C'
        ELSE 'D'
    END AS letter_grade
FROM students;
ORDER BY letter_grade, gpa desc;

-- WHERE (Filter by GPA category)
SELECT *
FROM students
WHERE CASE 
    WHEN gpa >= 3.7 THEN 'A'
    WHEN gpa >= 3.3 THEN 'B'
    WHEN gpa >= 3.0 THEN 'C'
    ELSE 'D'
END IN ('A', 'B');

-- Aggregate Functions (Weighted GPA)
SELECT 
    s.student_id,
    s.name,
    SUM(CASE 
        WHEN c.course_name = 'Intro to Programming' THEN c.credits * 1.0
        WHEN c.course_name = 'Advanced Algorithms' THEN c.credits * 1.2
        ELSE c.credits * 1.1
    END) AS weighted_credits
FROM students s
JOIN student_courses sc ON s.student_id = sc.student_id
JOIN courses c ON sc.course_id = c.course_id
GROUP BY s.student_id, s.name;

-- GROUP BY and HAVING (Filter groups by average GPA)
SELECT 
    major,
    AVG(gpa) AS avg_gpa
FROM students
GROUP BY major
HAVING AVG(gpa) > CASE 
        WHEN major = 'CS' THEN 3.5
        WHEN major = 'Math' THEN 3.2
        ELSE 3.0
    END;

-- ORDER BY (Sort by custom priority)
SELECT 
    student_id,
    name,
    scholarship,
    CASE 
        WHEN scholarship = TRUE THEN 1
        ELSE 2
    END AS priority
FROM students
ORDER BY priority, gpa DESC;

-- Nested CASE (Combine GPA and Attendance)
SELECT 
    student_id,
    name,
    gpa,
    attendance,
    CASE 
        WHEN gpa >= 3.7 THEN 'Top Performer'
        ELSE CASE 
            WHEN attendance > 90 THEN 'High Attendance'
            ELSE 'Needs Improvement'
        END
    END AS status
FROM students;

JOIN

  • Used to combine records from two or more tables in a database.
  • A means for combining fields from two tables by using values common to each.
  • Different Types of Joins
    • INNER JOIN
    • OUTER JOIN
      • LEFT JOIN
      • RIGHT JOIN
      • FULL JOIN

[!quote]- SQL Joins ![[SQL Joins (Datacamp).pdf]]

Source: Datacamp

[!quote]- JOIN in Relationships ![[Databases#Relationships / Multiplicity]]

-- Implicit Join without using 'JOIN' ()
SELECT Users.ID, Users.Name, Orders.Amount, Orders.OrderDate
FROM Users, Orders
ON Users.ID = Orders.UserID;

-- Inner Join (Explicit Join - Recommended)
SELECT Users.ID, Users.Name, Orders.Amount, Orders.OrderDate
FROM Users 
INNER JOIN Orders
ON Users.ID = Orders.UserID;

[!quote]- SQL Joins Diagram SQL Joins Diagram

Source: DbVisualizer

  • INNER JOIN / JOIN
    • Returns records that have matching values in both tables.
    • Equivalent to JOIN
    • Compares each row of the first table with each row of the second table.
      • If the join condition is true, it creates a new row combining columns from both tables.
    • Use cases:
      • Finding related records across tables.
      • Filtering out records that don't have corresponding entries in both tables.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
INNER JOIN Departments d 
ON e.DepartmentID = d.DepartmentID;
  • LEFT JOIN (LEFT OUTER JOIN)
    • Returns all records from the left table and matched records from the right table.
    • If there's no match, the result is NULL on the right side.
    • Includes all rows from the left table.
      • For each row in the left table, it looks for matching rows in the right table.
      • If a match is found, it combines the data; if not, it fills with NULL for right table columns.
    • Use cases:
      • Finding all records from one table, regardless of whether they have corresponding records in another table.
      • Identifying records in one table that don't have matches in another (by filtering for NULL values in the joined columns).
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
LEFT JOIN Departments d 
ON e.DepartmentID = d.DepartmentID;
  • RIGHT JOIN (RIGHT OUTER JOIN)
    • Similar to LEFT JOIN but returns all records from the right table and matched records from the left.
    • Includes all rows from the right table.
      • For each row in the right table, it looks for matching rows in the left table.
      • If a match is found, it combines the data; if not, it fills with NULL for left table columns.
    • Use cases:
      • Similar to LEFT JOIN, but when you want to prioritize the right table.
      • Often, developers prefer using LEFT JOIN and switching table order instead of using RIGHT JOIN for consistency.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
RIGHT JOIN Departments d 
ON e.DepartmentID = d.DepartmentID;
  • FULL JOIN (FULL OUTER JOIN)
    • Returns all records when there's a match in either the left or right table.
    • Combines the results of both LEFT and RIGHT joins.
      • Returns all rows from both tables, with NULL values where there's no match.
    • Use cases:
      • Getting a complete view of data from two tables, including unmatched records from both sides.
      • Identifying records that exist in one table but not in the other.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
FULL JOIN Departments d 
ON e.DepartmentID = d.DepartmentID;
  • CROSS JOIN / CARTESIAN JOIN
    • Used to return all possible row combinations from each table.
    • If no condition is provided, the result set is obtained by multiplying each row of the first table with all rows in the second table.
    • Common use cases:
      • Creating a comprehensive dataset for testing
      • Finding missing relationships
SELECT *  
FROM table_1
CROSS JOIN table_2;
  • SELF JOIN
    • Used to intersect or join a table in the database to itself.
SELECT table_1.col_1 AS col_a, table_2.col_2 AS col_b
FROM my_table table_1
JOIN my_table table_2 ON table_1.col_3 = table_2.col_4;

-- or

SELECT table_1.col_1 AS col_a, table_2.col_2 AS col_b
FROM my_table table_1, my_table table_2
WHERE table_1.col_3 = table_2.col_4;
USING()
  • Primarily used in JOIN operations as a shorthand way to specify join conditions when the columns being joined have the same name in both tables.
  • Part of the SQL standard and is supported by many database management systems.
  • Equivalent to an ON clause where the named columns are equated.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
JOIN Departments d USING (DepartmentID);
  • Can be used with multiple columns:
SELECT * FROM table1
JOIN table2 USING (column1, column2, column3);

-- ON Clause Equivalent:
SELECT * FROM table1
JOIN table2 
ON table1.column1 = table2.column1
   AND table1.column2 = table2.column2
   AND table1.column3 = table2.column3;

Set Operations

  • Used to combine the result of two queries.

  • To perform set operations,

    • The order and number of columns must be the same.
    • Data types must be compatible.
  • UNION

    • Merges result sets of multiple SELECT statements into a single result set, removing duplicates.
    • UNION ALL doesn't remove duplicate rows.
    • The column names in the result-set are usually equal to the column names in the first SELECT statement.
    • SELECT * FROM table_a UNION [ALL] SELECT * FROM table_b;
  • INTERSECT

    • Retrieves the common rows that appear in the result sets of two SELECT statements.
    • Unsupported by MySQL
    • SELECT * FROM table_a INTERSECT SELECT * FROM table_b;
  • EXCEPT

    • Retrieves rows present in the result set of the first SELECT statement but not in the second SELECT statement.
    • MINUS is found in some databases, and is functionally equivalent to EXCEPT.
    • SELECT * FROM table_a EXCEPT SELECT * FROM table_b;

Order of Clause Processing

  • The general logical order in which SQL clauses are processed:
    1. FROM
    2. ON (for joins)
    3. JOIN
    4. WHERE
    5. GROUP BY
    6. HAVING
    7. SELECT
    8. DISTINCT
    9. ORDER BY
    10. LIMIT / OFFSET
  • This order is why aliases created in the SELECT clause can't be used in the WHERE clause, but can be used in ORDER BY.
    • It also clarifies why HAVING is used for filtering grouped results instead of WHERE.
  • Modern database engines use query optimizers that may rearrange operations for better performance, as long as the final result remains the same.

DCL

  • Data Control Language
  • Used for access control to grant or revoke access permissions to database objects
  • GRANT, REVOKE
-- Revoke a specific privilege from a role
REVOKE SELECT ON table_name FROM role_name;

-- Revoke all privileges on a table from a user or role
REVOKE ALL PRIVILEGES ON table_name FROM user_or_role;

-- Revoke a privilege from multiple roles at once
REVOKE SELECT ON table_name FROM role1, role2, role3;

-- Revoke a privilege from PUBLIC (all roles)
REVOKE SELECT ON table_name FROM PUBLIC;

-- Revoke EXECUTE privilege on a function
REVOKE EXECUTE ON FUNCTION function_name() FROM role_name;

-- Revoke privileges on all tables in a schema
REVOKE SELECT ON ALL TABLES IN SCHEMA schema_name FROM role_name;

-- Revoke a role from a user
REVOKE role_name FROM user_name;

TCL

  • Transaction Control Language
  • Defines concurrent operation boundaries
  • SAVEPOINT, ROLLBACK, COMMIT

Example

-- Create the database
CREATE DATABASE ecommerce;

-- Create roles
CREATE ROLE admin_role;
CREATE ROLE customer_role;

-- Create users and assign roles
CREATE USER admin_user WITH ENCRYPTED PASSWORD 'admin_password';
CREATE USER customer_user WITH PASSWORD 'customer_password';

GRANT admin_role TO admin_user;
GRANT customer_role TO customer_user;

-- Grant privileges
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO admin_role;

GRANT SELECT, INSERT, UPDATE ON customers TO customer_role;
GRANT SELECT ON products TO customer_role;
GRANT SELECT, INSERT ON orders TO customer_role;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO customer_role;

-- Insert sample data
INSERT INTO products (name, description, price, stock_quantity) 
VALUES
    ('Laptop', 'High-performance laptop', 999.99, 50),
    ('Smartphone', 'Latest model smartphone', 599.99, 100),
    ('Headphones', 'Noise-cancelling headphones', 199.99, 200);

-- Create a stored procedure for customer registration
CREATE OR REPLACE FUNCTION register_customer(
    p_username VARCHAR(50),
    p_email VARCHAR(100),
    p_password VARCHAR(255)
) RETURNS void AS $$
BEGIN
    INSERT INTO customers (username, email, password_hash)
    VALUES (p_username, p_email, p_password);
END;
$$
 LANGUAGE plpgsql;

-- Grant execute permission on the stored procedure
GRANT EXECUTE ON FUNCTION register_customer(VARCHAR, VARCHAR, VARCHAR) TO customer_role;

-- In Application Code

-- Register a new customer (as customer_user)
SET ROLE customer_user;
SELECT register_customer('john_doe', '[email protected]', 'hashed_password_here');

-- Place an order (as customer_user)
INSERT INTO orders (customer_id, total_amount) 
VALUES ((
    SELECT customer_id 
    FROM customers 
    WHERE username = 'john_doe'
), 1599.98);

-- Reset role
RESET ROLE;

Functions

  • Pre-written operations that perform specific tasks on data within a database.
  • Return a single value or a table.
  • Can be used in SQL statements like any other expression, and in different ways:
    • to manipulate or transform data as it's being retrieved (in SELECT statements)
    • to filter data based on specific conditions (in WHERE clauses)
    • to group and filter aggregated data (in GROUP BY and HAVING clauses)
    • to determine the sorting order of results (in ORDER BY)
SELECT UPPER(name), LENGTH(city) FROM employees;

SELECT * FROM orders WHERE YEAR(order_date) = 2025;

SELECT department, AVG(salary) 
FROM employees
WHERE hire_date > '2025-01-01'
GROUP BY department
HAVING AVG(salary) > 50000;

SELECT * FROM products ORDER BY LENGTH(product_name) DESC;
  • COALESCE - A standard SQL function that accepts a list of arguments and returns the first non-NULL value from the list.
    • Supported by most databases.
    • Primarily used for handling NULL values in databases.
    • Can be used to replace NULL values with user-defined fallback values during expression evaluation.
    • ISNULL() is the SQL Server equivalent of COALESCE()
-- Returns First Non-NULL Value - 'Postgres'
SELECT COALESCE(NULL, NULL, NULL, 'Postgres', NULL, 'SQLite');

-- Keeps falling back on subsequent column 
-- values until a non-NULL value is found
-- Ensure no NULL names are returned as a result
SELECT COALESCE(
    first_name,
    middle_name,
    last_name, 
    'NO NAME') AS combined_column_alias
FROM people;

-- e.g. Get salary expense report for company of employees
-- For employees with NULL salary, use a default 50000
SELECT 
    current_date as report_date,
    count(*) as employee_count,
    sum(coalesce(salary, 50000)) as salary_expense
FROM employees;
  • User-Defined Functions are used to create reusable blocks of SQL code.
-- General Syntax
CREATE OR REPLACE FUNCTION func_name (para_1 datatype, param_2 datatype, ...)
RETURNS datatype AS
BEGIN
    -- Function logic
    RETURN value;
END;

SELECT function_name(arg_1, arg_2) AS some_name;

-- Example
CREATE FUNCTION full_name (first_name VARCHAR(50), last_name VARCHAR(50))
RETURNS VARCHAR(101) AS
BEGIN
    RETURN CONCAT(first_name, ' ', last_name);
END;

SELECT full_name('John', 'Doe') AS name;

Aggregate Functions

  • Aggregate functions operate on a set of values (multiple rows) and return a single result.

    • Typically used with the GROUP BY clause.
  • COUNT([columns]) - Returns the number of non-NULL rows in a set.

SELECT COUNT(*) AS total_employees FROM employees;
  • SUM() - Returns the total sum of a numeric column.
SELECT SUM(salary) AS total_salary FROM employees;

SELECT 
    SUM(gender='M') AS male_count,
    SUM(gender='F') AS female_count
FROM patients;
  • AVG() - Returns the average value of a numeric column.
SELECT department, AVG(salary) AS avg_salary 
FROM employees
GROUP BY department;
HAVING avg_salary > 50000;
  • MIN() - Returns the smallest value in a set.
SELECT MIN(salary) AS lowest_salary FROM employees;
  • MAX() - Returns the largest value in a set.
SELECT MAX(salary) AS highest_salary FROM employees;

Scalar Functions

  • Operate on a single value and return a single value.

Numeric Functions

  • ABS() - Returns the absolute value of a number.
SELECT ABS(-15.7) AS absolute_value;
  • CEILING() - Rounds a number up to the nearest integer.
SELECT CEILING(15.2) AS rounded_up;
  • FLOOR() - Rounds a number down to the nearest integer.
SELECT FLOOR(15.7) AS rounded_down;
  • ROUND() - Rounds a number to a specified number of decimal places.
-- Round to 2 decimal places
SELECT ROUND(123.4567, 2) AS rounded_value;
-- Result: 123.46

-- Round to the nearest integer
SELECT ROUND(123.5) AS rounded_integer;
-- Result: 124

-- Round to negative decimal places (rounds to tens, hundreds, etc.)
SELECT ROUND(1234.56, -2) AS rounded_hundreds;
-- Result: 1200
  • POWER - Raises a number to a specified power or exponent.
SELECT POWER(4, 2);  -- Returns 16

SELECT 
    patient_id,
    CASE 
      WHEN weight/(POWER(height/100.0,2)) >= 30 THEN 1
      ELSE 0
    END AS isObese
FROM patients;

Date & Time Functions

  • Manipulate date and time values.

  • CURRENT_DATE - Get the current date

  • CURRENT_TIME - Get the current time

  • CURRENT_TIMESTAMP - Get the current timestamp

-- Returns 2025-02-22T00:00:00.000Z
SELECT CURRENT_DATE AS today;

-- Returns 05:17:40.370677+00
SELECT CURRENT_TIME AS time_now;

-- Returns 2025-02-22T05:17:40.370Z
SELECT CURRENT_TIMESTAMP AS now;
  • EXTRACT() - Used for year, month, day extraction.
-- Extract year from a date
SELECT EXTRACT(YEAR FROM DATE '2025-02-21') AS year;
-- Result: 2025

-- Extract month from a date
SELECT EXTRACT(MONTH FROM DATE '2025-02-21') AS month;
-- Result: 2
  • Other widely supported functions:
    • YEAR() - Returns the year from a date.
    • MONTH() - Returns the month from a date.
    • DAY() - Returns the day of the month.
    • HOUR() - Returns the hour from a time.
    • MINUTE() - Returns the minute from a time.
    • SECOND() - Returns the second from a time.

String Functions

  • UPPER() / LOWER() - Converts a string to uppercase and lowercase.
-- Return 'HELLO, WORLD'
SELECT UPPER('hello, world') AS greeting;

-- Return 'hello, world'
SELECT LOWER('HELLO, WORLD') AS greeting;
  • CONCAT() - Combines two or more strings.
SELECT CONCAT('Hello', ' ', 'World') AS greeting;
  • SUBSTRING() - Extracts a portion of a string.
SELECT SUBSTRING('SQL Tutorial', 1, 3) AS extract;
  • TRIM() - Removes leading and trailing spaces.
SELECT TRIM('   SQL Server   ') AS trimmed_string;
  • LENGTH() - Returns the number of characters in a string.
    • Widely supported variation of the standard CHARACTER_LENGTH() or CHAR_LENGTH() functions.
    • In some databases, this function is called LEN() (SQL Server) or CHAR_LENGTH().
-- Returns 13
SELECT LENGTH('Hello, World!') AS string_length;

Conversion Functions

  • CAST() - Converts a value from one data type to another.
SELECT CAST('100' AS INT) AS converted_integer;
  • CONVERT() - Converts a value from one data type to another with more options.
    • Syntax and options can vary across different databases.
SELECT CONVERT(DATE, '2025-02-21') AS converted_date;

Access Control

  • [[Databases#Access Control|Access Control (Databases)]] 📄
GRANT <privilege> ON <database_object> TO <role> [WITH GRANT OPTION];

GRANT <provider_role> TO <role_member> [WITH ADMIN OPTION];

GRANT ALL PRIVILEGES ON <table> TO <role>|<user>;

GRANT ALL ON ALL TABLES [IN SCHEMA <schema>] <role>|<user>;

GRANT [SELECT, UPDATE, INSERT, ...] ON <table> [IN SCHEMA <schema>] <role>|<user>;

REVOKE <privilege> ON <database_object> FROM <role>;

REVOKE <role> FROM <user>;
-- Ecommerce Example

-- Roles
CREATE ROLE SalesRole;
CREATE ROLE FulfillmentRole;

-- Permissions
GRANT SELECT, INSERT ON orders TO SalesRole;
GRANT UPDATE ON orders TO FulfillmentRole;

-- Row-Level Security (PostgreSQL)
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    region VARCHAR(50)
);

CREATE POLICY customer_rls ON customers
    FOR ALL
    USING (region = current_user_region());
  • Roles - group users with shared permissions.
CREATE ROLE SalesRole;
GRANT SELECT ON customers TO SalesRole;
  • Privileges - actions allowed on database objects.
    • DENY takes precedence over GRANT.
-- Assign permissions
GRANT INSERT, UPDATE ON orders TO SalesRole;

-- Block Access
DENY DELETE ON employees TO ReadOnlyUser;

-- Remove permissions
REVOKE ALL PRIVILEGES ON products FROM Public;

Best Practices

  • Role-Based Access (Least Privilege Principle):
CREATE ROLE AccountingRole;
GRANT SELECT ON invoices TO AccountingRole;
  • Avoid Direct Table Access (Least Privilege Principle):
-- Bad: Grants full table access
GRANT ALL ON employees TO HRUser;

-- Good: Use a view
CREATE VIEW employee_info AS SELECT name, department FROM employees;
GRANT SELECT ON employee_info TO HRUser;
  • Row-Level Security (RLS)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    total DECIMAL(10, 2)
);

CREATE POLICY order_rls ON orders
    FOR ALL
    USING (customer_id = current_user_id());
  • Avoid Over-Privileging.
-- Bad: Grants excessive rights
GRANT ALL PRIVILEGES ON *.* TO Public;

-- Good: Granular grants
GRANT SELECT, INSERT ON customers TO SalesRole;
  • Audit Regularly.
-- Check role permissions
SELECT * FROM pg_user WHERE usename = 'ReadOnlyUser';
  • Use Views for [[Data Masking]].
CREATE VIEW masked_customers AS
SELECT name, CONCAT(SUBSTRING(email, 1, 3), '...') AS masked_email
FROM customers;
  • Avoid Over-Reliance on Public Role.

Schemas

  • A collection of objects that define the structure of the database.
  • Acts as a namespace that helps organize and segregate data within a database.
  • Use Cases
    • Logical Organization - Grouping related tables and other objects together.
    • Access Control - Managing permissions and security at a granular level.
    • Data Integrity - Enforcing rules and relationships between tables.
  • Components
    • Tables
    • Views
    • Stored Procedures
    • Functions
    • Indexes
CREATE SCHEMA bookstore AUTHORIZATION db_owner;

ALTER SCHEMA bookstore OWNER TO new_db_owner;
CREATE SCHEMA IF NOT EXISTS Products;
CREATE SCHEMA IF NOT EXISTS Orders;

CREATE TABLE IF NOT EXISTS Products.Items (
    ItemID SERIAL PRIMARY KEY,
    ItemName VARCHAR(100) NOT NULL,
    Price DECIMAL(10, 2) NOT NULL,
    StockQuantity INT NOT NULL
);

CREATE TABLE IF NOT EXISTS Orders.OrderDetails (
    OrderID SERIAL PRIMARY KEY,
    ItemID INT NOT NULL,
    Quantity INT NOT NULL,
    OrderDate DATE NOT NULL,
    FOREIGN KEY (ItemID) REFERENCES Products.Items(ItemID)
);

SELECT * FROM Products.Items;
SELECT * FROM Orders.OrderDetails;
-- Multiple Objects in One Statement
CREATE SCHEMA schema_name
    CREATE TABLE table_name1 (...),
    CREATE TABLE table_name2 (...),
    CREATE VIEW view_name AS SELECT ...;

Note

  • The public schema in Postgres is a default schema that is created automatically when a new database is initialized.
    • It serves as a shared namespace for database objects, allowing users to create and access tables, views, functions, and other objects without needing to specify a schema name.
CREATE TABLE my_table (id SERIAL PRIMARY KEY);
-- is equivalent to:
CREATE TABLE public.my_table (id SERIAL PRIMARY KEY);

Indexing

  • An index is a database object that provides a fast and efficient way to look up and retrieve data from a table.
  • Improves the performance of SELECT queries by reducing the amount of data that needs to be scanned.
  • Partial Indexes - includes only a subset of rows from a table
-- Create a Non-Clustered Index on 'Users'
CREATE INDEX idx_UserID ON Users(user_id);

-- Create a Clustered Index on 'Accounts'
CREATE INDEX idx_AccountID ON Accounts(account_id);

CLUSTER Accounts USING idx_AccountID;

-- Partial Index
CREATE INDEX idx_name 
ON table (col0, ...)
WHERE condition;

-- Drop an index
DROP INDEX idx_UserID;
  • Covering Index - includes all columns needed for a query, avoiding table lookups.
-- Create a covering index
CREATE INDEX idx_covering ON users (email, status);

-- Query using the index
SELECT * FROM users WHERE email = '[email protected]' AND status = 'active';
  • Best Practices
    • Choose Columns Wisely
      • High Selectivity: Index columns with unique values (e.g., email).
      • Query Patterns: Index columns used in WHERE, JOIN, or ORDER BY.
    • Avoid Over-Indexing
      • Indexes slow INSERT, UPDATE, and DELETE operations.
      • Example: Avoid indexing boolean columns with low cardinality.
    • Maintain Indexes
      • Fragmentation: Rebuild indexes periodically (e.g., REINDEX in PostgreSQL).
      • Auto-Vacuum: PostgreSQL’s auto-vacuum handles index maintenance.
    • Use Federated Indexes
      • Combine multiple columns to enable covering queries.
CREATE INDEX idx_federated ON orders (customer_id, order_date);
  • Common Pitfalls
    • Using SELECT * forces table scans even with indexes.
      • Select only needed columns.
    • Indexing status (e.g., active/inactive) is rarely useful.
    • Don't ignore index maintenance.
      • Fragmented indexes degrade performance over time.
    • Indexing adds overhead for small tables.
    • Indexes slow INSERT/UPDATE, and should be avoided for frequent writes.

Important

The order of operations is crucial.

Note

A composite primary key (PK) inherently serves as an index in most database systems.

Transactions

  • Ensure that a series of database operations are treated as a single unit of work, maintaining data integrity and [[Database Consistency|consistency]], even in the face of system failures or concurrent access.
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

COMMIT;
-- or ROLLBACK; if there's an error

Error Handling

  • PostgreSQL DO blocks run in their own transaction by default.
    • If any statement fails, the entire block is rolled back automatically.
      • Explicit ROLLBACK in EXCEPTION blocks is invalid.
DO $$
BEGIN
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
EXCEPTION
    WHEN OTHERS THEN
      RAISE NOTICE 'Transaction rolled back: %', SQLERRM;
END $$;
-- SQL Server Error Handling
BEGIN TRANSACTION;

BEGIN TRY
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT;
END TRY
BEGIN CATCH
    ROLLBACK;
    -- Log error or handle it appropriately
END CATCH

Views

  • A virtual table based on the result of a SELECT query.
  • Provide a way to represent the result of a query as if it were a table.
  • Can be used to restrict access to specific columns or rows of a table.
    • Users can be granted permission to access a view without granting direct access to the underlying table.
  • Can also be used to save commonly used or complex queries by giving them a name.
CREATE OR REPLACE VIEW view_name AS
    SELECT column1, column2, ...
    FROM table_name
    WHERE condition;

SELECT * FROM view_name;

Materialized Views

  • Database objects that contain the pre-computed results of a query, stored as a concrete table rather than a virtual table like standard views.
  • Improve query performance, especially for complex queries with joins and aggregations.
  • Reduce computational load on the database system.
  • Indexes can be built on any column of the materialized view.
  • Not part of the SQL standard and implementation varies across different database systems.
CREATE MATERIALIZED VIEW sales_summary AS
SELECT
    product_id,
    DATE_TRUNC('month', sale_date) AS sale_month,
    SUM(sale_amount) AS total_sales
FROM sales_transactions
GROUP BY product_id, sale_month;

Common Table Expressions (CTEs)

  • Used to define named temporary result sets within a SELECT, INSERT, UPDATE, DELETE, or MERGE statement.
  • Also known as "WITH clauses."
  • Improves readability and maintainability of complex queries.
  • Can be referenced multiple times in the main query.
  • Supports recursion (in most implementations).
  • Multiple CTEs can be defined separated by commas.
WITH cte_name AS (
    SELECT column1, column2, ...
    FROM table
    WHERE condition
)
SELECT * FROM cte_name;

-- Multiple CTEs
WITH cte1 AS (...),
     cte2 AS (...)
SELECT * FROM cte1 JOIN cte2 ON ...;
-- Examples
WITH high_salary_employees AS (
    SELECT * FROM employees
    WHERE salary > 100000
) 
SELECT department, AVG(salary) as avg_high_salary
FROM high_salary_employees 
GROUP BY department;

-- Generate a hierarchical list of employees starting from employee_id 1.
WITH RECURSIVE subordinates AS (
    SELECT employee_id, manager_id, name
    FROM employees
    WHERE employee_id = 1
    UNION ALL
    SELECT e.employee_id, e.manager_id, e.name
    FROM employees e
    INNER JOIN subordinates s ON s.employee_id = e.manager_id
)
SELECT * FROM subordinates;

Stored Procedures

  • Precompiled collection of one or more SQL statements stored in the database.
  • Can accept parameters, execute complex operations, and return results.
  • Help encapsulate business logic and improve performance by reducing the need to send multiple queries from an application to the database.
CREATE PROCEDURE my_procedure (param_1 datatype, param_2 datatype) AS
BEGIN
    -- SQL statements
END;

CALL my_procedure(arg_1, arg_2);

Triggers

  • A special type of stored procedure.
  • Automatically runs SQL code in response to DML events.
CREATE OR REPLACE TRIGGER delete_user 
{BEFORE | AFTER | INSTEAD OF} {INSERT | UPDATE | DELETE}
ON table_name
[FOR EACH ROW]
BEGIN
    -- SQL statements to be executed
END;
-- SQLite Example
CREATE TRIGGER update_joined_table
AFTER UPDATE ON customers OR UPDATE ON orders
FOR EACH ROW
BEGIN
    -- Drop the existing joined table if it exists
    DELETE FROM joined;

    -- Create the new joined table
    INSERT INTO joined_table

    -- Create the new joined table
    CREATE TABLE joined_table AS
    SELECT *
    FROM customers c
    INNER JOIN orders o ON c.id = o.id;
END;

JSON

  • The SQL standard defines a JSON data type to store JSON documents.

  • JSON_OBJECT() - Creates a JSON object

  • JSON_ARRAY() - Creates a JSON array

  • JSON_VALUE() - Extracts a scalar value from a JSON document

  • JSON_QUERY() - Extracts a JSON object or array

  • JSON_TRANSFORM() - Modifies a JSON document

  • IS JSON Checks if a string is valid JSON

-- Creating a JSON object
SELECT JSON_OBJECT('name': 'Alice', 'age': 30) AS person;

-- Creating a JSON array
SELECT JSON_ARRAY('apple', 'banana', 'cherry') AS fruits;

-- Extracting a scalar value from a JSON document
SELECT JSON_VALUE('{"name": "Bob", "age": 35}', '$.name') AS name;

-- Extracting a JSON object or array
SELECT JSON_QUERY('{"address": {"city": "New York", "zip": "10001"}}', '$.address') AS address;

-- Modifying a JSON document
SELECT JSON_TRANSFORM('{"name": "Charlie", "age": 40}', 
    SET '$.age' = 41, 
    SET '$.city' = 'London') AS updated_person;

-- Checking if a string is valid JSON
SELECT '{"name": "David"}' IS JSON AS is_valid_json;

-- Comparing JSON objects
SELECT JSON_OBJECT('a': 1, 'b': 2) = JSON_OBJECT('b': 2, 'a': 1) AS are_equal;

Subquery

  • A query nested inside of a larger query.
  • Can occur in various subsections of a query:
    • SELECT clause (Inner Query)
      • Create temporary columns on the result set
        • The column created by an inner query has a value equal to the result of the query.
      • Can also be called inner query or inner select while the container query is called the outer query or outer select
      • The inner query executes before the outer query
    • WHERE clause (Nested Query)
      • Can return single or multiple rows
    • FROM clause (Inline View)
      • Create temporary tables
  • Can be nested in a SELECT, INSERT, UPDATE, DELETE, or even inside of another subquery.
  • Logical operators can be used to compare the results of the subquery
  • While convenient, subqueries perform worse than joins.
CREATE TABLE IF NOT EXISTS students(
    id INT PRIMARY KEY,
    name VARCHAR(40) NOT NULL UNIQUE
);

CREATE TABLE IF NOT EXISTS evaluations (
    id INT PRIMARY KEY,
    studentId INT NOT NULL,
    evalName VARCHAR(10) NOT NULL,
    mark INT DEFAULT 0,
    FOREIGN KEY(studentId) REFERENCES students(id),
    CONSTRAINT mark_check CHECK(mark >= 0), CHECK(mark <= 100)
);

INSERT INTO students (id, name) 
VALUES (1, 'Steve'), (2, 'Jane'), (3, 'Casey');

INSERT INTO evals (id, studentId, evalName, mark) 
VALUES 
    (1, 1, 'quiz 1', 98),
    (2, 2, 'quiz 1', 80), 
    (3, 3, 'quiz 1', 95), 
    (4, 1, 'test 1', 72), 
    (5, 2, 'test 1', 100), 
    (6, 3, 'test 1', 68);

-- To find all students that scored higher than Jane (2) on 'quiz 1'

-- Nested Query
SELECT a.id, a.name, b.evalName, b.mark
FROM students a, evals b 
WHERE a.id = b.studentId AND b.evalName = 'quiz 1' AND b.mark > (
    SELECT mark 
    FROM evals 
    WHERE evalName = 'quiz 1' AND studentId = 2
);

-- Inline View (Temporary Tables)
SELECT a.name, b.evalName, b.mark 
FROM students a, (
    SELECT studentId, evalName, mark
    FROM evals 
    WHERE mark > 90
) b 
WHERE a.id = b.studentId;

-- Inline Query (Temporary Columns)
SELECT a.id, a.name, (
    SELECT AVG(mark) 
    FROM evals 
    WHERE studentId = a.id 
    GROUP BY studentId
) avg 
FROM students a;
  • Existence Operators (EXISTS / NOT EXISTS) are used in subqueries to check whether the subquery returns any rows.
-- Find departments that have at least one employee
SELECT department_name
FROM departments d
WHERE EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.department_id
);

-- Find departments with no employees
SELECT department_name
FROM departments d
WHERE NOT EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.department_id
);
  • Quantifiers (ALL / ANY / SOME) are used with comparison operators in subqueries.
    • SOME is synonymous with ANY.
-- Find employees who earn more than all employees in department 5
SELECT employee_name, salary
FROM employees
WHERE salary > ALL (
    SELECT salary
    FROM employees
    WHERE department_id = 5
);

-- Find employees who earn more than any employee in department 5
SELECT employee_name, salary
FROM employees
WHERE salary > [ANY|SOME] (
    SELECT salary
    FROM employees
    WHERE department_id = 5
);

Dialects

Postgres

[!note] Object Hierarchies in Postgres

  • A database cluster in Postgres is the environment (collection of databases, roles and other entities) managed by a single Postgres server.
    • It is the main global object.
  • Schemas are defined within databases as containers for tables, functions, data types and operators.
    • Postgres uses the public schema by default.
    • Objects within a schema must have unique names.
  • Postgres supports both IF EXISTS and IF NOT EXISTS clauses in various SQL statements, such as CREATE TABLE, DROP TABLE, CREATE DATABASE, and DROP DATABASE.

  • The createdb and dropdb CLI commands are bundled with a Postgres installation.

createdb [--encoding=UTF8 --locale=en_US] db_name

dropdb [--if-exists] db_name

[!tip] Postgres Features

  • [[#Error Handling|Error handling for transactions]] via DO blocks.

  • ILIKE operator for case-insensitive pattern matching.

  • RegEx (POSIX) support using the ~* operator.

    • e.g. SELECT 'PostgreSQL' ~* '^p.*sql'; (True)

psql

  • psql - the default command line client implemented as part of the PostgreSQL distribution.
    • Can connect to local or remote databases and either process queries as a batch or interactively.
    • Can be used to modify databases and manage PostgreSQL itself using meta-commands, which are non-SQL shortcuts that start with a "\". They allow you to query information about your data structures and the system.
      • e.g. \dt - lists all available tables, \conninfo - displays info about the current connection, \h - get help about SQL commands, \? - get info about meta-commands
# With Options
psql -U username -d database_name [-h [host|localhost] -p port]

# With a Connection String (Percent-Encoded URI)
psql "postgresql://<username>:<password>@<hostname>:<port>/<database>"

[!example]+ Postgres Connection URI String Postgres Connection URI String

Command Purpose
\conninfo Display the current connection details
\l Lists all databases
\c dbname Switches to a specific database
\g Re-runs the last query
\dt Lists tables in the current database
\di Shows indexes on a table
\d+ tab Describes a table’s schema (columns, indexes, constraints)
\du Lists user roles
\i file Executes SQL from a file
\o file Redirects output to a file
\timing Measures query execution time
\? Displays all psql commands
\h Shows help for SQL commands

Extensions

  • Postgres has a robust extension ecosystem.
    • Extensions can add new functions, data types, and even change core database behavior.
      • PostGIS - Adds support for geographic objects
      • pgcrypto - Cryptographic functions
        • e.g. crypt(), gen_salt(), pgp_sym_encrypt()
      • hstore - Key-value pair storage
        • hstore_to_json(), each()
        • Use Cases
          • User Preferences - Store customizable settings (e.g., UI themes, notifications).
          • Time-Series Data - Track dynamic metrics (e.g., machine sensors).
          • Metadata - Attach tags or attributes to records (e.g., product features).
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(255) NOT NULL,
    config hstore
);`

INSERT INTO users (username, config) 
VALUES ('user1', '"points"=>"879", "theme"=>"dark"');

-- Check if a key exists 
SELECT * FROM users WHERE config ? 'theme'; 

-- Get all keys 
SELECT keys(config) FROM users;

Authentication & Authorization

[!note] Authentication Peer authentication is the default authentication mechanism configured for most PostgreSQL installations. It assumes that the system administrator is also the database administrator.

  • Postgres uses roles and privileges as authentication and authorization mechanisms.

    • A role is a grouping of a specific set of permissions and owned entities.
      • Both users and user groups are implemented as a single, unified concept called roles.
  • By default, roles own any object they create themselves.

    • Only superuser roles can delete or modify objects that they do not own.
    • Only the database owner can create objects in the public schema by default.
GRANT ALL ON SCHEMA public TO new_user;

GRANT ALL PRIVILEGES ON DATABASE database_name TO new_user;
  • Role attributes - flags or properties assigned to roles that determine their core privileges and capabilities across the entire database cluster.
    • Key attributes in PostgreSQL access control:
      • CREATEDB / NOCREATEDB
      • SUPERUSER / NOSUPERUSER
      • CREATEROLE / NOCREATEROLE
      • LOGIN / NOLOGIN
      • INHERIT
CREATE ROLE readonly WITH LOGIN ENCRYPTED PASSWORD 'read-only';
  • A user is the same as a role, but it assumes the LOGIN attribute by default.

Important

Always encrypt when storing a role that can log in.

[!note] Privilege Inheritance In Postgres, when special role attributes such as LOGIN, SUPERUSER, CREATEDB, and CREATEROLE are granted to a role, they are never inherited by users/roles who are based on that role.

For example, if my_user is based on my_role, and the CREATEDB attribute was granted to my_role, my_user doesn't automatically inherit the CREATEDB attribute. In order to allow these privileges on my_user, SET ROLE must explicitly be set to the role that has these attributes, or the attribute must explicitly be granted to my_user.

-- Set role
SET ROLE my_role; 
CREATE DATABASE new_database;

-- Grant attribute
ALTER ROLE my_user WITH CREATEDB;

SQLite

  • Uses a unique concept called "type affinity" for columns.
    • Supports five main type affinities: TEXT, NUMERIC, INTEGER, REAL, and BLOB.
    • Other SQL data types are mapped to these five.
CREATE TABLE example (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    age INTEGER,
    salary REAL,
    data BLOB,
    info TEXT,  -- Can store JSON as text
    created_at DATETIME DEFAULT current_timestamp
);
  • Uses AUTOINCREMENT with INTEGER PRIMARY KEY for auto-incrementing columns.

  • Uses BLOB for binary data.

  • Doesn't have a native JSON data type.

  • Doesn't support multiple schemas within the same database file.

  • Supports the IF NOT EXISTS syntax when creating tables, indexes and triggers, and the IF EXISTS syntax when dropping them.

  • Because of SQLite's nature as an embedded database engine, there is no need for a DROP DATABASE statement.

  • In addition to the built-in functions that come with it, SQLite allows you to create custom functions (User-Defined Functions or UDFs) using C/C++, which are compiled into the SQLite library.

  • In SQLite, LIKE is case-insensitive by default.

  • SQLite can also load additional functions at runtime through extensions.

    • e.g., the JSON1 extension adds JSON manipulation functions, FTS5 allows full-text search capabilities, REGEXP Extension provides the regexp() function.
-- JSON Support using JSON1
SELECT load_extension('json1');

CREATE TABLE users (id INTEGER PRIMARY KEY, data JSON);

INSERT INTO users (data) VALUES ('{"name": "Alice", "age": 32}');

-- Full-Text Search (FTS) using FTS5
-- Create base table
CREATE TABLE Books (
    id INTEGER PRIMARY KEY,
    title TEXT,
    desc TEXT
);

-- Create FTS virtual table
CREATE VIRTUAL TABLE BookSearch USING fts5(title, desc);

-- Populate FTS table
INSERT INTO BookSearch SELECT title, desc FROM Books;

-- Search for "prog*" in descriptions
SELECT title 
FROM BookSearch 
WHERE BookSearch MATCH 'desc:prog*' 
ORDER BY rank;
  • SQLite provides some powerful interactive shell commands.
sqlite3           # Opens an in-memory database
sqlite3 test.db   # Creates/opens a file-based database
sqlite3 :memory:  # Creates a temporary database
Command Purpose
.open [filename] Open a new database file (e.g., .open mydb.db)
.open Switch databases
.database List database connections
.tables Lists all tables in the current database
.schema View table structure
.schema table Shows the CREATE TABLE statement for a table
.indexes table Lists indexes on a table
.dump [table] Export all tables or a specific table as SQL
.import [file] [table] Load / Import CSV data into a table
.backup backup.db Backup databases
.read script.sql Executes SQL commands from a file
.output [file] Redirects output to a file
.help Lists all dot commands
.exit or .quit Exit
.timer on Measure query execution time
.mode [format] Outputs results in specific format (csv, column or html)
.headers [on|off] Displays column headers
.nullvalue NULL Replaces NULL with a custom string (e.g., .nullvalue 'N/A')

[!note] VACUUM

  • In SQLite, VACUUM is a command used to optimize and maintain databases.
    • It copies the entire database content to a temporary file and then overwrites the original database with the optimized version, which requires free disk space up to twice the size of the original database.
VACUUM;

VACUUM main.table_name;
  • Benefits:
    • Rebuilds the database file, reducing its size by removing unused space left after deletions.
    • Reorganizes data, improving performance by reducing fragmentation.
    • Rebuilds indexes from scratch, potentially improving query performance.
    • Can remove traces of deleted content, making it harder for adversaries to recover deleted data.
  • In SQLite, it's possible to add other databases to the current connection.
ATTACH DATABASE "sales.db" AS sales;
sqlite> .database
main: /path/to/customers.db
sales: /path/to/sales.db
sqlite>

[!note] PRAGMA The PRAGMA clause in SQLite is a special SQL statement used to modify the operation of the SQLite library or to query the SQLite library for internal (non-table) data. It is used to control various aspects of SQLite's behavior and retrieve internal information about the database.

PRAGMA pragma_name;

PRAGMA pragma_name = value;

Importing & Exporting Data

  • SQLite provides ways to import data from and export data to external file formats:
    • .import FILE TABLE – Import data from CSV, JSON etc. into a table
    • .export TABLE FILE – Export data from table into various formats
.import users.csv users

.export users json users.json

[!note]- Migrating a SQLite Database ![[Migrate SQLite Database]]

Optimizations

  • SQLite's primary journal modes:
    • Rollback Journal
      • The default mode in SQLite.
      • It works by creating a temporary journal file that stores a copy of the original database pages before any changes are made.
        • When a transaction begins, SQLite creates a rollback journal file.
        • If a ROLLBACK command is issued, SQLite uses the journal file to restore the original state of the database.
        • Changes are made directly to the database file.
        • If a COMMIT command is issued, the journal file is deleted, confirming the changes.
      • When somebody is writing to the database, all the readers must stop because the actual file is being modified.
      • Ideal for:
        • Small databases or low-concurrency environments.
        • Situations where simplicity is preferred over performance.
    • Write-Ahead Logging (WAL)
      • Enhances performance by allowing concurrent reads and writes.
        • When a transaction begins, changes are written to a WAL file.
        • All changes are appended to the WAL file, which acts as a log of operations.
        • A commit record is added to the WAL file. The changes are not immediately applied to the database file.
        • Periodically, a checkpoint operation moves changes from the WAL file to the database file.
      • Ideal for:
        • High-concurrency environments.
        • Applications requiring fast write throughput and better data integrity.
        • SSD storage where WAL performs optimally.
PRAGMA journal_mode=WAL;
  • Data replication for SQLite can be implemented using services like Litestream and LiteFS.

Noteworthy

Quotes

CREATE ROLE "user1" WITH LOGIN PASSWORD 'secretpassword';

Single Quotes

  • Primarily used to enclose string literals.
    • e.g., SELECT * FROM users WHERE name = 'John';
  • If a single quote needs to be included within a string, it can be escaped by doubling it.
    • e.g., SELECT * FROM users WHERE name = 'O''Reilly';
  • Required by most SQL databases (like SQL Server, PostgreSQL, and Oracle) for string literals.

Double Quotes

  • Generally used to denote identifiers, such as table or column names, particularly when these names include special characters or spaces.
    • e.g., SELECT "First Name" FROM "User Info";
  • The behavior of double quotes can differ across database systems.
    • MySQL allows both single and double quotes for strings interchangeably, making it more flexible compared to others.
    • PostgreSQL uses double quotes strictly for identifiers and requires single quotes for string literals.
      • Unlike unquoted identifiers, quoted identifiers are case-sensitive.
        • e.g. "USERS" and "users" are treated differently.

Note

It is generally recommended to reserve double quotes for identifiers to avoid confusion and ensure compatibility across different SQL dialects.

Dates & Time Zones

  • Always store timestamps in [[UTC]] in a database to ensure consistency across different time zones.
    • When users input times, convert them to UTC before storing in the database.
    • When sending or receiving data via APIs, use UTC and ISO 8601 format for consistency.
CREATE TABLE events (
    id INT PRIMARY KEY,
    event_name VARCHAR(255),
    event_time TIMESTAMP
);

INSERT INTO events (id, event_name, event_time) 
VALUES (1, 'User Login', '2025-02-21 21:55:00 UTC');

-- Get all events from the last 24 hours
SELECT * FROM events 
WHERE event_time > NOW() - INTERVAL '24 hours';
  • When displaying times to users, convert UTC to their local time zone at the application layer / code, not in the database.

Three-Valued Logic

  • SQL uses a three-valued logic:
    • Besides true and false, the result of logical expressions can also be unknown.
  • It is a consequence of supporting NULL to mark absent data.
  • In SQL, NULL is not equal to anything, not even to another NULL.
  • It is not possible to tell whether a comparison to NULL results true or false.
-- Each of the following comparisons are unknown
NULL = 1
NULL <> 1
NULL > 1
NULL = NULL

-- Evaluates to true and false
(NULL = 1) OR (1 = 1)
(NULL = 1) AND (0 = 1)
  • SQL has the IS [NOT] NULL predicate to test whether a value is NULL or not, and the IS NOT DISTINCT FROM predicate to compare two values while treating two NULL values as the same.

  • WHERE, HAVING and WHEN require true conditions.

    • WHERE rejects all rows when the condition evaluates to unknown.
  • CHECK constraints accept true and unknown.

-- This query always returns the empty set
SELECT col
FROM t
WHERE col = NULL; -- Always unknown

-- Use 'IS NULL' to search for 'NULL' values
SELECT col
FROM t
WHERE col IS NULL;

Further

Books 📚

Resources 🧩

Videos 🎥

SQL Joins Explained