Databases
PostgreSQL
Database
대용량 데이터를 정리, 관리하는 infrastructure.
- Pros
- Data integity, user control
- Automate steps for re-use
- Support data for websites and applications
SQL
Sturctured Query Language
Seeing database information
- Database > name > Schema > schema_name > Tables > Columns
- Database > name > Schema > schema_name > Tables > Constraints
- Primary = golden key
- Foreign = dual key
- "Dependencies" tab will show referencing tables
- Right click on key > Properties > Columns will show what table uses foreign key as primary key
Show all current setting of runtime parameters
- SHOW ALL
- SHOW TIMEZONE
- SELECT NOW()
SELECT Statements
SELECT COUNT(DISTINCT columns)
FROM table
WHERE conditions->column = 'value'/column NOT/BETWEEN value1 AND value2/ value IN (option1, option2)
ORDER BY column ASC/DESC
LIMIT 1
;
- Not good practice to use (*), if you do not need all the columns -> increases traffic, slows down retrieval.
DISTINCT - Select unique values
SELECT DISTINCT(columns)
FROM table
;
- Casting
- CAST(val AS INTEGER)
- col::INTEGER
COUNT - Number of rows that match the condition of query
One of multiple options
Count unique values in cloumn
SELECT COUNT(DISTINCT column)
FROM table
;
Pattern matching - LIKE, ILIKE, SIMILAR TO
- LIKE = case sensitive
- % - any sequence of characters
- All values beginning with 'A' -
A%
- All values ending with 'a' -
%a
- All values beginning with 'A' -
- _ - any single charater
- % - any sequence of characters
- ILIKE = case insensitive
- SIMILAR TO = Regular expresison
AGGREGATE functions
- Take multiple inputs and return a single output.
- Happens only in the SELECT or HAVING clause.
- Cannot be used to return with normal columns as is.
ROUND(AVG(), 2) COUNT() MAX() MIN() SUM()
GROUP BY
- Must come after a FROM or WHERE statement.
- SELECT columns must be in aggregate functions or the GROUP BY call.
- WHERE statements should not refer to aggregation result, but on the categorical column.
SELECT column1, AGG(col) AS column2
FROM table
GROUP BY column1
HAVING AGG(col) > value
;
-
AS
- rename aggregate result with alias
- Executed at the end of a query -- cannot be used for filtering (inside a WHERE or HAVING statement)
-
HAVING - Filter results after an aggregation
Convert timestamps to dates - DATE(column)
Subqueries
SELECT student, grade
FROM test_scores
WHERE grade > (SELECT AVG(grade) FROM test_scores)
;
Using IN
SELECT student, grade
FROM test_scores
WHERE student IN (SELECT sudent FROM honor_roll)
;
EXISTS - Test for existence in subquery
SELECT student, grade
FROM test_scores
WHERE EXISTS (SELECT student FROM table WHERE condition)
;
Conditional Expressions
CASE
- Only execute SQL code when certain conditions are met. (= if-else).
- Returns results in a new column.
- You can call functions on the results of the case column.
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
ELSE other_result AS col_name
END
;
CASE expression
WHEN value1 THEN result1
WHEN value2 THEN result2
...
ELSE other_result AS col_name
END
;
COALESCE
- Accepts unlimited number of arguments, returns the first argument that is not null.
- If all arguments are null, returns null.
- Useful when replacing null values in column w/o affecting the table.
- Kind of like
get()
in Python.
SELECT item, (price - COALESCE(discount, 0))
AS final_price FROM table
;
NULLIF(x, y)
- Returns NULL if x == y, otherwise x.
- Useful when NULL value can cause an error, such as division by 0 (y=0).
Views
Import/Export
Self-Join
- Joining two copies of the same table.
- Useful for comparing values in a column of rows within the same table.
- Necessary to use alias to avoid ambiguity.
SELECT tableA.col, tableB.col
FROM table AS tableA JOIN table AS tableB ON tableA.col1 = tableB.col2
;
JOIN Statements
Combine information from multiple tabels.
- Inner - Only intersection
- Full Outer - Merge everything from both sides
- Left - Fill in information for data points in Table A, using Table B
- Right - Same as reverse order using LEFT
- Union - Combine/Concatenate results from two SELECT statements
SELECT *
FROM TableA INNER JOIN TableB
ON TableA.col = TableB.col
Get rows unique to either table ($Intersection^C$) - FULL JOIN with WHERE
SELECT *
FROM TableA FULL OUTER JOIN TableB
ON TableA.col = TableB.col
WHERE TableA.id IS null OR TableB.id IS null
Get only unique rows in Table A - RIGHT JOIN with WHERE
SELECT *
FROM TableA LEFT JOIN TableB
ON TableA.col = TableB.col
WHERE TableB.id IS null
Concatenate results / Stack tables vertically - UNION
SELECT * FROM TableA
UNION
SELECT * FROM TableB
Views
- Stored query. When there are combinations and conditions that you use recurrently, you can create a View to quickly repeat the same query.
- View can be accessed as a virtual table in the database, but does not store the data physically.
- Views cand be altered/updated.
CREATE OR REPLACE VIEW view AS
(... Query ... )
DROP VIEW IF EXISTS view;
ALTER VIEW view RENAME TO new_view;
TABLE Statements
Create custom database
Data Types
- Serial - unique integer sequences - SMALLSERIAL, SERIAL, LARGESERIAL
- Does not adjust. Removed and failed rows will consume a serial number.
- Boolean
- Character - char, varchar, text
- Numeric
- Temporal - date, time, timestamp, interval
- UUID - Universally Unique Identifiers
- Array
- JSON
- Hstor keyvalue pair
- Special types - network address, geodata
Primary and Foreign Keys
- Primary key
- Column(s) used to identify a unique row in table
- Integer based, unique, SERIAL
- Foreign key
- Primary key of another table
- A table can have multiple foreign keys depending on its relationships w/ other tables
Import from CSV
CREATE TABLE table(
col TYPE,
...
)
- CSV must contain table, SQL will not create a table for you.
- Right click: Schema > Table > Import/Export
Constraints
- Rules enforced on data columns for data integrity/reliability.
- PRIMARY KEY = unique identifier for rows
- FOREIGN KEY = data based on columns from other tables
- NOT NULL = cannot contain NULL values
- UNIQUE = all values must be unqiue
- CHECK = other conditions
- EXCLUSION = Unique combination of columns. Ensures that if any two rows are compared on the specified column/expression/operator, not all of these comparisons will return True.
- REFERENCES = value stored in the column must exist in a column in another table
CREATE
Add new tables to the database.
CREATE TABLE table_name (
column_name TYPE column_constraint PRIMARY KEY,
column_name TYPE CHECK (constraint),
column_name TYPE REFERENCES another_table(col)
...
table_constraint table_constraint,
...
) INHERITS existing_table_name
;
INSERT
- Add new rows to a table.
- Inserted row values must match up for the table, including constraints.
- SERIAL columns do not need to be given a value.
- CURRENT_TIMESTAMP
INSERT INTO table (col1, col2, ...)
VALUES
(value1, value2, ...),
(value1, value2, ...),
...
;
Values from another table
INSERT INTO table (col1, col2, ...)
SELECT col1, col2, ...
FROM another_table
WHERE condition
;
UPDATE
- Change vaules in a table.
- Adding
RETURNING columns
at the end will return affected rows.
Reset everything
UPDATE table
SET
col1 = val1,
col2 = val2,
...
RETURNING col1, col2
;
Conditional replacement
UPDATE table
SET
col1 = val1,
col2 = val2,
...
WHERE
condition
;
Set based on another column in same table
UPDATE table
SET
col1 = col2
;
Set based on another column in another table
UPDATE table
SET original_col = another_table.new_col
FROM another_table
WHERE table.id = another_table.id
;
DELETE
- Remove rows from a table.
- Adding
RETURNING columns
at the end will return affected rows.
Remove based on conditional value
DELETE FROM table
WHERE condition
;
Remove based on presence in other tables
DELETE FROM tableA
USING tableB
WHERE tableA.id = tableB.id
;
Remove all rows from table
DELETE FROM tableA
;
ALTER
ALTER TABLE table action;
Add primary key
ALTER TABLE table
ADD PRIMARY KEY (col);
Add new columns
ALTER TABLE table
ADD COLUMN column TYPE
;
Remove columns
ALTER TABLE table
DROP COLUMN col
;
Alter column data types
ALTER TABLE table
ALTER COLUMN col TYPE CHAR(12);
Set/drop DEFAULT values for columns
ALTER TABLE table
ALTER COLUMN col
SET/DROP DEFAULT value
;
Altering constraints
ALTER TABLE table
ALTER COLUMN col
SET/DROP NOT NULL
ADD CONSTRAINT constraint
;
#### Rename table
```SQL
ALTER TABLE table
RENAME TO table
;
Rename column
ALTER TABLE table
RENAME COLUMN col1 TO col2
;
Drop columns
- Complete removal of a column.
- Also removes all indices and constraints.
- Will not remove columns used in views, triggered, or stored procedures without CASCADE clause.
ALTER TABLE table
DROP COLUMN IF EXISTS col CASCADE,
...
Drop tables
DROP TABLE table;
DATE / TIME
Can be used to log when data points were created.
- TIME
- DATE
- TIMESTAMP - datetime
- TIMESTAMPTZ - datetime + timezone
- NOW
- DOW - day of week (0-6)
- TIMEOFDAY - datetime + day of week (Monday, Tuesday)
- CURRENT_TIME
- CURRENT_DATE
Extracting information from a time based data type
- EXTARCT(YEAR FROM date_col)
- Extract sub component
- YEAR, MONTH, DAY, WEEK, QUARTER
- AGE(date_col)
- Calculate duration Durantion
Formatting Functions
- TO_CHAR(date_col, 'mm-dd-yyyy')
- Convert data types into formatted string
- timestamp, interval, integer, double precision, numeric
- Convert data types into formatted string
- TO_DATE
| Pattern | Description | |:-------:|:------------| | HH | Hour (01-12) | | HH12 | Hour (01-12) | | HH24 | Hour (00-23) | | MI | Minutes (00-59) | | SS | Second (00-59) | | MS | Millisecond (000-999) | | US | Microsecond (000000 -999999) | | SSSS | Seconds past midnight (0-86399) | | AM, PM | Meridiem indicator w/o periods | | A.M., P.M. | Meridiem indicator w/ periods | | Y,YYY | Year w/ comma | | YYYY | Year w/o comma | | YYY, YY, Y | Last digits of year |
String Functions
| Function | Description | |:---------:|:-----------| | a || b | Concatenate a and b | | BIT_LENGTH(str) | Number of bits | | CHAR_LENGTH(str) | Number of characters | | LOWER() / UPPER() | Convert case | | OCTET_LENGTH() | Number of bytes | | OVERLAY(base_str PLACING insert_str FROM start_idx FOR end_idx) | Replace/Insert substring into stirng | | POSITION(substr, str) | Index of specified substring | | SUBSTRING() | | | TRIM() | | | LEFT(str, x) |
Math Function
| Function | Description |
|:---------:|:-----------|
| |/ 25.0 ... SQRT(numeric) | Square root |
| ||/ 27.9 ... CBRT(dp) | Cube root |
| @ -5.0 ... ABS(x) | Absolute value |
| 91 & 15 | Bitwise AND |
| 32 | 3 | Bitwise OR |
| 17 # 5 | Bitwise XOR |
| ~1| Bitwise NOT |
| 1 << 4 | Bitwise left shift |
| 8 >> 2 | Bitwise right shift |
| CEIL(numeric), FLOOR(numeric)) | Ronud up/down to nearest integer |
| ROUND(x, y) | Round to y decimal places |
| TRUNC(x, y) | Drop excess decimal places |
| SCALE() | Number of decimal points (precision) |
| SIGN() | -1, 0, +1 |
| DEGREES(dp) | Radians to degrees |
| DIV(x, y) | Integer quotient, // |
| EXP(numeric) | Exponential |
| POWER(a, b) | a raised to the power of b |
| PI() | Constant |
| LN(numeric) | Natural log |
| LOG(x), LOG10(numeric) | Logarithm base 10 |
| WIDTH_BUCKET(target_numeric, low, high, num_buckets)
WIDTH_BUCKET(target_numeric, array_containing_buckets) | Which bucket the target is assigned |
| RANDOM() | Random value between 0 and 1 |
| SETSEED(DP) | Seed for random number generation |
PostgreSQL
Restore a Database
Servers > PostgreSQL > Databases > Create > Database
- Set database name > Save
- Right click database > Restore
- Check Restore options - Pre-data, data, post-data
Process failed
This error message can pop up because postgreSQL tried to load the same database twice. You can safely ignore this message, and continue by refreshing the loaded database.
Psycogp2
import psycopg2 as pg2
# secret = password
# Connect to database
conn = pg2.connect(database='database_name', user='postgres', password=secret)
# Create cursor object
cur = conn.cursor()
cur.execute('SQL QUERY')
data = cur.fetchall()
# cur.fetchone()
# cur.fetchmany(10)
# Proceed with data
# Commit changes to the database
cur.commit()
conn.close()
Query tool
- Query History > Copy to Query Editor
- Download query results as a .csv file