Databases

PostgreSQL

Database

대용량 데이터를 정리, 관리하는 infrastructure.

Pros
- Data integity, user control
- Automate steps for re-use
- Support data for websites and applications

SQL

Sturctured Query Language

Seeing database information

Database > name > Schema > schema_name > Tables > Columns
Database > name > Schema > schema_name > Tables > Constraints
- Primary = golden key
- Foreign = dual key
  - "Dependencies" tab will show referencing tables
  - Right click on key > Properties > Columns will show what table uses foreign key as primary key

Show all current setting of runtime parameters

SHOW ALL
SHOW TIMEZONE
SELECT NOW()

SELECT Statements

SELECT COUNT(DISTINCT columns) 
FROM table
WHERE conditions->column = 'value'/column NOT/BETWEEN value1 AND value2/ value IN (option1, option2)
ORDER BY column ASC/DESC
LIMIT 1
;

Not good practice to use (*), if you do not need all the columns -> increases traffic, slows down retrieval.

DISTINCT - Select unique values

SELECT DISTINCT(columns)
FROM table
;

Casting
- CAST(val AS INTEGER)
- col::INTEGER

COUNT - Number of rows that match the condition of query

One of multiple options

Count unique values in cloumn

SELECT COUNT(DISTINCT column)
FROM table
;

Pattern matching - LIKE, ILIKE, SIMILAR TO

LIKE = case sensitive
- % - any sequence of characters
  - All values beginning with 'A' - A%
  - All values ending with 'a' - %a
- _ - any single charater
ILIKE = case insensitive
SIMILAR TO = Regular expresison

AGGREGATE functions

Take multiple inputs and return a single output.
Happens only in the SELECT or HAVING clause.
Cannot be used to return with normal columns as is.

ROUND(AVG(), 2) COUNT() MAX() MIN() SUM()

GROUP BY

Must come after a FROM or WHERE statement.
SELECT columns must be in aggregate functions or the GROUP BY call.
WHERE statements should not refer to aggregation result, but on the categorical column.

SELECT column1, AGG(col) AS column2
FROM table
GROUP BY column1
HAVING AGG(col) > value
;

AS
- rename aggregate result with alias
- Executed at the end of a query -- cannot be used for filtering (inside a WHERE or HAVING statement)
HAVING - Filter results after an aggregation

Convert timestamps to dates - DATE(column)

Subqueries

SELECT student, grade
FROM test_scores
WHERE grade > (SELECT AVG(grade) FROM test_scores)
;

Using IN

SELECT student, grade
FROM test_scores
WHERE student IN (SELECT sudent FROM honor_roll)
;

EXISTS - Test for existence in subquery

SELECT student, grade
FROM test_scores
WHERE EXISTS (SELECT student FROM table WHERE condition)
;

Conditional Expressions

CASE

Only execute SQL code when certain conditions are met. (= if-else).
Returns results in a new column.
You can call functions on the results of the case column.

CASE
  WHEN condition1 THEN result1
  WHEN condition2 THEN result2
  ...
  ELSE other_result AS col_name
END
;

CASE expression
  WHEN value1 THEN result1
  WHEN value2 THEN result2
  ...
  ELSE other_result AS col_name
END
;

COALESCE

Accepts unlimited number of arguments, returns the first argument that is not null.
If all arguments are null, returns null.
Useful when replacing null values in column w/o affecting the table.
Kind of like get() in Python.

SELECT item, (price - COALESCE(discount, 0))
AS final_price FROM table
;

NULLIF(x, y)

Returns NULL if x == y, otherwise x.
Useful when NULL value can cause an error, such as division by 0 (y=0).

Views

Import/Export

Self-Join

Joining two copies of the same table.
Useful for comparing values in a column of rows within the same table.
Necessary to use alias to avoid ambiguity.

SELECT tableA.col, tableB.col
FROM table AS tableA JOIN table AS tableB ON tableA.col1 = tableB.col2
;

JOIN Statements

Combine information from multiple tabels.

Inner - Only intersection
Full Outer - Merge everything from both sides
Left - Fill in information for data points in Table A, using Table B
Right - Same as reverse order using LEFT
Union - Combine/Concatenate results from two SELECT statements

SELECT * 
FROM TableA INNER JOIN TableB
ON TableA.col = TableB.col

Get rows unique to either table ($Intersection^C$) - FULL JOIN with WHERE

SELECT * 
FROM TableA FULL OUTER JOIN TableB
ON TableA.col = TableB.col
WHERE TableA.id IS null OR TableB.id IS null

Get only unique rows in Table A - RIGHT JOIN with WHERE

SELECT * 
FROM TableA LEFT JOIN TableB
ON TableA.col = TableB.col
WHERE TableB.id IS null

Concatenate results / Stack tables vertically - UNION

SELECT * FROM TableA
UNION
SELECT * FROM TableB

Views

Stored query. When there are combinations and conditions that you use recurrently, you can create a View to quickly repeat the same query.
View can be accessed as a virtual table in the database, but does not store the data physically.
Views cand be altered/updated.

CREATE OR REPLACE VIEW view AS 
(... Query ... )

DROP VIEW IF EXISTS view;

ALTER VIEW view RENAME TO new_view;

TABLE Statements

Create custom database

Data Types

Serial - unique integer sequences - SMALLSERIAL, SERIAL, LARGESERIAL
- Does not adjust. Removed and failed rows will consume a serial number.
Boolean
Character - char, varchar, text
Numeric
Temporal - date, time, timestamp, interval
UUID - Universally Unique Identifiers
Array
JSON
Hstor keyvalue pair
Special types - network address, geodata

Primary and Foreign Keys

Primary key
- Column(s) used to identify a unique row in table
- Integer based, unique, SERIAL
Foreign key
- Primary key of another table
- A table can have multiple foreign keys depending on its relationships w/ other tables

Import from CSV

CREATE TABLE table(
  col TYPE, 
  ...
)

CSV must contain table, SQL will not create a table for you.
Right click: Schema > Table > Import/Export

Constraints

Rules enforced on data columns for data integrity/reliability.
PRIMARY KEY = unique identifier for rows
FOREIGN KEY = data based on columns from other tables
NOT NULL = cannot contain NULL values
UNIQUE = all values must be unqiue
CHECK = other conditions
EXCLUSION = Unique combination of columns. Ensures that if any two rows are compared on the specified column/expression/operator, not all of these comparisons will return True.
REFERENCES = value stored in the column must exist in a column in another table

CREATE

Add new tables to the database.

CREATE TABLE table_name (
  column_name TYPE column_constraint PRIMARY KEY,
  column_name TYPE CHECK (constraint),
  column_name TYPE REFERENCES another_table(col)
  ...
  table_constraint table_constraint,
  ...
) INHERITS existing_table_name
;

INSERT

Add new rows to a table.
Inserted row values must match up for the table, including constraints.
SERIAL columns do not need to be given a value.
CURRENT_TIMESTAMP

INSERT INTO table (col1, col2, ...)
VALUES
  (value1, value2, ...),
  (value1, value2, ...), 
  ...
;

Values from another table

INSERT INTO table (col1, col2, ...)
SELECT col1, col2, ...
FROM another_table
WHERE condition 
;

UPDATE

Change vaules in a table.
Adding RETURNING columns at the end will return affected rows.

Reset everything

UPDATE table 
SET 
  col1 = val1,
  col2 = val2,
  ...
RETURNING col1, col2
;

Conditional replacement

UPDATE table 
SET 
  col1 = val1,
  col2 = val2,
  ...
WHERE 
  condition
;

Set based on another column in same table

UPDATE table 
SET 
  col1 = col2
;

Set based on another column in another table

UPDATE table 
SET original_col = another_table.new_col 
FROM another_table 
WHERE table.id = another_table.id 
;

DELETE

Remove rows from a table.
Adding RETURNING columns at the end will return affected rows.

Remove based on conditional value

DELETE FROM table
WHERE condition
;

Remove based on presence in other tables

DELETE FROM tableA
USING tableB
WHERE tableA.id = tableB.id 
;

Remove all rows from table

DELETE FROM tableA
;

ALTER

ALTER TABLE table action;

Add primary key

ALTER TABLE table
ADD PRIMARY KEY (col);

Add new columns

ALTER TABLE table 
ADD COLUMN column TYPE
;

Remove columns

ALTER TABLE table 
DROP COLUMN col 
;

Alter column data types

ALTER TABLE table
ALTER COLUMN col TYPE CHAR(12);

Set/drop DEFAULT values for columns

ALTER TABLE table 
ALTER COLUMN col
SET/DROP DEFAULT value 
;

Altering constraints

ALTER TABLE table 
ALTER COLUMN col
SET/DROP NOT NULL
ADD CONSTRAINT constraint
;

#### Rename table

```SQL
ALTER TABLE table 
RENAME TO table
;

Rename column

ALTER TABLE table 
RENAME COLUMN col1 TO col2
;

Drop columns

Complete removal of a column.
Also removes all indices and constraints.
Will not remove columns used in views, triggered, or stored procedures without CASCADE clause.

ALTER TABLE table
DROP COLUMN IF EXISTS col CASCADE,
...

Drop tables

DROP TABLE table;

DATE / TIME

Can be used to log when data points were created.

TIME
DATE
TIMESTAMP - datetime
TIMESTAMPTZ - datetime + timezone
NOW
DOW - day of week (0-6)
TIMEOFDAY - datetime + day of week (Monday, Tuesday)
CURRENT_TIME
CURRENT_DATE

Extracting information from a time based data type

EXTARCT(YEAR FROM date_col)
- Extract sub component
- YEAR, MONTH, DAY, WEEK, QUARTER
AGE(date_col)
- Calculate duration Durantion

Formatting Functions

TO_CHAR(date_col, 'mm-dd-yyyy')
- Convert data types into formatted string
  - timestamp, interval, integer, double precision, numeric
TO_DATE

| Pattern | Description | |:-------:|:------------| | HH | Hour (01-12) | | HH12 | Hour (01-12) | | HH24 | Hour (00-23) | | MI | Minutes (00-59) | | SS | Second (00-59) | | MS | Millisecond (000-999) | | US | Microsecond (000000 -999999) | | SSSS | Seconds past midnight (0-86399) | | AM, PM | Meridiem indicator w/o periods | | A.M., P.M. | Meridiem indicator w/ periods | | Y,YYY | Year w/ comma | | YYYY | Year w/o comma | | YYY, YY, Y | Last digits of year |

String Functions

| Function | Description | |:---------:|:-----------| | a || b | Concatenate a and b | | BIT_LENGTH(str) | Number of bits | | CHAR_LENGTH(str) | Number of characters | | LOWER() / UPPER() | Convert case | | OCTET_LENGTH() | Number of bytes | | OVERLAY(base_str PLACING insert_str FROM start_idx FOR end_idx) | Replace/Insert substring into stirng | | POSITION(substr, str) | Index of specified substring | | SUBSTRING() | | | TRIM() | | | LEFT(str, x) |

Math Function

| Function | Description | |:---------:|:-----------| | |/ 25.0 ... SQRT(numeric) | Square root | | ||/ 27.9 ... CBRT(dp) | Cube root | | @ -5.0 ... ABS(x) | Absolute value | | 91 & 15 | Bitwise AND | | 32 | 3 | Bitwise OR | | 17 # 5 | Bitwise XOR | | ~1| Bitwise NOT | | 1 << 4 | Bitwise left shift | | 8 >> 2 | Bitwise right shift | | CEIL(numeric), FLOOR(numeric)) | Ronud up/down to nearest integer |
| ROUND(x, y) | Round to y decimal places | | TRUNC(x, y) | Drop excess decimal places | | SCALE() | Number of decimal points (precision) | | SIGN() | -1, 0, +1 | | DEGREES(dp) | Radians to degrees | | DIV(x, y) | Integer quotient, // | | EXP(numeric) | Exponential | | POWER(a, b) | a raised to the power of b | | PI() | Constant | | LN(numeric) | Natural log | | LOG(x), LOG10(numeric) | Logarithm base 10 | | WIDTH_BUCKET(target_numeric, low, high, num_buckets)
WIDTH_BUCKET(target_numeric, array_containing_buckets) | Which bucket the target is assigned | | RANDOM() | Random value between 0 and 1 | | SETSEED(DP) | Seed for random number generation |

PostgreSQL

Download Link

Restore a Database

Servers > PostgreSQL > Databases > Create > Database
Set database name > Save
Right click database > Restore
Check Restore options - Pre-data, data, post-data

Process failed

This error message can pop up because postgreSQL tried to load the same database twice. You can safely ignore this message, and continue by refreshing the loaded database.

Psycogp2

import psycopg2 as pg2

# secret = password
# Connect to database
conn = pg2.connect(database='database_name', user='postgres', password=secret)

# Create cursor object
cur = conn.cursor()

cur.execute('SQL QUERY')

data = cur.fetchall()
# cur.fetchone() 
# cur.fetchmany(10)

# Proceed with data 

# Commit changes to the database
cur.commit()

conn.close()

Query tool

Query History > Copy to Query Editor
Download query results as a .csv file

Resources

Udemy - The Complete SQL Bootcamp: Go from Zero to Hero