Databases

PostgreSQL


Database

대용량 데이터를 정리, 관리하는 infrastructure.

  • Pros
    • Data integity, user control
    • Automate steps for re-use
    • Support data for websites and applications

SQL

Sturctured Query Language

Seeing database information

  • Database > name > Schema > schema_name > Tables > Columns
  • Database > name > Schema > schema_name > Tables > Constraints
    • Primary = golden key
    • Foreign = dual key
      • "Dependencies" tab will show referencing tables
      • Right click on key > Properties > Columns will show what table uses foreign key as primary key

Show all current setting of runtime parameters

  • SHOW ALL
  • SHOW TIMEZONE
  • SELECT NOW()

SELECT Statements

SELECT COUNT(DISTINCT columns) 
FROM table
WHERE conditions->column = 'value'/column NOT/BETWEEN value1 AND value2/ value IN (option1, option2)
ORDER BY column ASC/DESC
LIMIT 1
;
  • Not good practice to use (*), if you do not need all the columns -> increases traffic, slows down retrieval.

DISTINCT - Select unique values

SELECT DISTINCT(columns)
FROM table
;
  • Casting
    • CAST(val AS INTEGER)
    • col::INTEGER

COUNT - Number of rows that match the condition of query

One of multiple options

Count unique values in cloumn
SELECT COUNT(DISTINCT column)
FROM table
;

Pattern matching - LIKE, ILIKE, SIMILAR TO

  • LIKE = case sensitive
    • % - any sequence of characters
      • All values beginning with 'A' - A%
      • All values ending with 'a' - %a
    • _ - any single charater
  • ILIKE = case insensitive
  • SIMILAR TO = Regular expresison

AGGREGATE functions

  • Take multiple inputs and return a single output.
  • Happens only in the SELECT or HAVING clause.
  • Cannot be used to return with normal columns as is.

ROUND(AVG(), 2) COUNT() MAX() MIN() SUM()

GROUP BY

  • Must come after a FROM or WHERE statement.
  • SELECT columns must be in aggregate functions or the GROUP BY call.
  • WHERE statements should not refer to aggregation result, but on the categorical column.
SELECT column1, AGG(col) AS column2
FROM table
GROUP BY column1
HAVING AGG(col) > value
;
  • AS

    • rename aggregate result with alias
    • Executed at the end of a query -- cannot be used for filtering (inside a WHERE or HAVING statement)
  • HAVING - Filter results after an aggregation

Convert timestamps to dates - DATE(column)

Subqueries

SELECT student, grade
FROM test_scores
WHERE grade > (SELECT AVG(grade) FROM test_scores)
;

Using IN

SELECT student, grade
FROM test_scores
WHERE student IN (SELECT sudent FROM honor_roll)
;

EXISTS - Test for existence in subquery

SELECT student, grade
FROM test_scores
WHERE EXISTS (SELECT student FROM table WHERE condition)
;

Conditional Expressions

CASE

  • Only execute SQL code when certain conditions are met. (= if-else).
  • Returns results in a new column.
  • You can call functions on the results of the case column.
CASE
  WHEN condition1 THEN result1
  WHEN condition2 THEN result2
  ...
  ELSE other_result AS col_name
END
;
CASE expression
  WHEN value1 THEN result1
  WHEN value2 THEN result2
  ...
  ELSE other_result AS col_name
END
;

COALESCE

  • Accepts unlimited number of arguments, returns the first argument that is not null.
  • If all arguments are null, returns null.
  • Useful when replacing null values in column w/o affecting the table.
  • Kind of like get() in Python.
SELECT item, (price - COALESCE(discount, 0))
AS final_price FROM table
;

NULLIF(x, y)

  • Returns NULL if x == y, otherwise x.
  • Useful when NULL value can cause an error, such as division by 0 (y=0).

Views

Import/Export

Self-Join

  • Joining two copies of the same table.
  • Useful for comparing values in a column of rows within the same table.
  • Necessary to use alias to avoid ambiguity.
SELECT tableA.col, tableB.col
FROM table AS tableA JOIN table AS tableB ON tableA.col1 = tableB.col2
;

JOIN Statements

Combine information from multiple tabels.

  • Inner - Only intersection
  • Full Outer - Merge everything from both sides
  • Left - Fill in information for data points in Table A, using Table B
  • Right - Same as reverse order using LEFT
  • Union - Combine/Concatenate results from two SELECT statements
SELECT * 
FROM TableA INNER JOIN TableB
ON TableA.col = TableB.col

Get rows unique to either table ($Intersection^C$) - FULL JOIN with WHERE

SELECT * 
FROM TableA FULL OUTER JOIN TableB
ON TableA.col = TableB.col
WHERE TableA.id IS null OR TableB.id IS null

Get only unique rows in Table A - RIGHT JOIN with WHERE

SELECT * 
FROM TableA LEFT JOIN TableB
ON TableA.col = TableB.col
WHERE TableB.id IS null

Concatenate results / Stack tables vertically - UNION

SELECT * FROM TableA
UNION
SELECT * FROM TableB

Views

  • Stored query. When there are combinations and conditions that you use recurrently, you can create a View to quickly repeat the same query.
  • View can be accessed as a virtual table in the database, but does not store the data physically.
  • Views cand be altered/updated.
CREATE OR REPLACE VIEW view AS 
(... Query ... )

DROP VIEW IF EXISTS view;
ALTER VIEW view RENAME TO new_view;

TABLE Statements

Create custom database

Data Types

  • Serial - unique integer sequences - SMALLSERIAL, SERIAL, LARGESERIAL
    • Does not adjust. Removed and failed rows will consume a serial number.
  • Boolean
  • Character - char, varchar, text
  • Numeric
  • Temporal - date, time, timestamp, interval
  • UUID - Universally Unique Identifiers
  • Array
  • JSON
  • Hstor keyvalue pair
  • Special types - network address, geodata

Primary and Foreign Keys

  • Primary key
    • Column(s) used to identify a unique row in table
    • Integer based, unique, SERIAL
  • Foreign key
    • Primary key of another table
    • A table can have multiple foreign keys depending on its relationships w/ other tables

Import from CSV

CREATE TABLE table(
  col TYPE, 
  ...
)
  • CSV must contain table, SQL will not create a table for you.
  • Right click: Schema > Table > Import/Export

Constraints

  • Rules enforced on data columns for data integrity/reliability.
  • PRIMARY KEY = unique identifier for rows
  • FOREIGN KEY = data based on columns from other tables
  • NOT NULL = cannot contain NULL values
  • UNIQUE = all values must be unqiue
  • CHECK = other conditions
  • EXCLUSION = Unique combination of columns. Ensures that if any two rows are compared on the specified column/expression/operator, not all of these comparisons will return True.
  • REFERENCES = value stored in the column must exist in a column in another table

CREATE

Add new tables to the database.

CREATE TABLE table_name (
  column_name TYPE column_constraint PRIMARY KEY,
  column_name TYPE CHECK (constraint),
  column_name TYPE REFERENCES another_table(col)
  ...
  table_constraint table_constraint,
  ...
) INHERITS existing_table_name
;

INSERT

  • Add new rows to a table.
  • Inserted row values must match up for the table, including constraints.
  • SERIAL columns do not need to be given a value.
  • CURRENT_TIMESTAMP
INSERT INTO table (col1, col2, ...)
VALUES
  (value1, value2, ...),
  (value1, value2, ...), 
  ...
;

Values from another table

INSERT INTO table (col1, col2, ...)
SELECT col1, col2, ...
FROM another_table
WHERE condition 
;

UPDATE

  • Change vaules in a table.
  • Adding RETURNING columns at the end will return affected rows.

Reset everything

UPDATE table 
SET 
  col1 = val1,
  col2 = val2,
  ...
RETURNING col1, col2
;

Conditional replacement

UPDATE table 
SET 
  col1 = val1,
  col2 = val2,
  ...
WHERE 
  condition
;

Set based on another column in same table

UPDATE table 
SET 
  col1 = col2
;

Set based on another column in another table

UPDATE table 
SET original_col = another_table.new_col 
FROM another_table 
WHERE table.id = another_table.id 
;

DELETE

  • Remove rows from a table.
  • Adding RETURNING columns at the end will return affected rows.

Remove based on conditional value

DELETE FROM table
WHERE condition
;

Remove based on presence in other tables

DELETE FROM tableA
USING tableB
WHERE tableA.id = tableB.id 
;

Remove all rows from table

DELETE FROM tableA
;

ALTER

ALTER TABLE table action;

Add primary key

ALTER TABLE table
ADD PRIMARY KEY (col);

Add new columns

ALTER TABLE table 
ADD COLUMN column TYPE
;

Remove columns

ALTER TABLE table 
DROP COLUMN col 
;

Alter column data types

ALTER TABLE table
ALTER COLUMN col TYPE CHAR(12);

Set/drop DEFAULT values for columns

ALTER TABLE table 
ALTER COLUMN col
SET/DROP DEFAULT value 
;

Altering constraints

ALTER TABLE table 
ALTER COLUMN col
SET/DROP NOT NULL
ADD CONSTRAINT constraint
;

#### Rename table

```SQL
ALTER TABLE table 
RENAME TO table
;

Rename column

ALTER TABLE table 
RENAME COLUMN col1 TO col2
;

Drop columns

  • Complete removal of a column.
  • Also removes all indices and constraints.
  • Will not remove columns used in views, triggered, or stored procedures without CASCADE clause.
ALTER TABLE table
DROP COLUMN IF EXISTS col CASCADE,
...

Drop tables

DROP TABLE table;

DATE / TIME

Can be used to log when data points were created.

  • TIME
  • DATE
  • TIMESTAMP - datetime
  • TIMESTAMPTZ - datetime + timezone
  • NOW
  • DOW - day of week (0-6)
  • TIMEOFDAY - datetime + day of week (Monday, Tuesday)
  • CURRENT_TIME
  • CURRENT_DATE

Extracting information from a time based data type

  • EXTARCT(YEAR FROM date_col)
    • Extract sub component
    • YEAR, MONTH, DAY, WEEK, QUARTER
  • AGE(date_col)
    • Calculate duration Durantion

Formatting Functions

  • TO_CHAR(date_col, 'mm-dd-yyyy')
    • Convert data types into formatted string
      • timestamp, interval, integer, double precision, numeric
  • TO_DATE

| Pattern | Description | |:-------:|:------------| | HH | Hour (01-12) | | HH12 | Hour (01-12) | | HH24 | Hour (00-23) | | MI | Minutes (00-59) | | SS | Second (00-59) | | MS | Millisecond (000-999) | | US | Microsecond (000000 -999999) | | SSSS | Seconds past midnight (0-86399) | | AM, PM | Meridiem indicator w/o periods | | A.M., P.M. | Meridiem indicator w/ periods | | Y,YYY | Year w/ comma | | YYYY | Year w/o comma | | YYY, YY, Y | Last digits of year |

String Functions

| Function | Description | |:---------:|:-----------| | a || b | Concatenate a and b | | BIT_LENGTH(str) | Number of bits | | CHAR_LENGTH(str) | Number of characters | | LOWER() / UPPER() | Convert case | | OCTET_LENGTH() | Number of bytes | | OVERLAY(base_str PLACING insert_str FROM start_idx FOR end_idx) | Replace/Insert substring into stirng | | POSITION(substr, str) | Index of specified substring | | SUBSTRING() | | | TRIM() | | | LEFT(str, x) |

Math Function

| Function | Description | |:---------:|:-----------| | |/ 25.0 ... SQRT(numeric) | Square root | | ||/ 27.9 ... CBRT(dp) | Cube root | | @ -5.0 ... ABS(x) | Absolute value | | 91 & 15 | Bitwise AND | | 32 | 3 | Bitwise OR | | 17 # 5 | Bitwise XOR | | ~1| Bitwise NOT | | 1 << 4 | Bitwise left shift | | 8 >> 2 | Bitwise right shift | | CEIL(numeric), FLOOR(numeric)) | Ronud up/down to nearest integer |
| ROUND(x, y) | Round to y decimal places | | TRUNC(x, y) | Drop excess decimal places | | SCALE() | Number of decimal points (precision) | | SIGN() | -1, 0, +1 | | DEGREES(dp) | Radians to degrees | | DIV(x, y) | Integer quotient, // | | EXP(numeric) | Exponential | | POWER(a, b) | a raised to the power of b | | PI() | Constant | | LN(numeric) | Natural log | | LOG(x), LOG10(numeric) | Logarithm base 10 | | WIDTH_BUCKET(target_numeric, low, high, num_buckets)
WIDTH_BUCKET(target_numeric, array_containing_buckets) | Which bucket the target is assigned | | RANDOM() | Random value between 0 and 1 | | SETSEED(DP) | Seed for random number generation |

PostgreSQL

Download Link

Restore a Database

  • Servers > PostgreSQL > Databases > Create > Database
  • Set database name > Save
  • Right click database > Restore
  • Check Restore options - Pre-data, data, post-data

Process failed

This error message can pop up because postgreSQL tried to load the same database twice. You can safely ignore this message, and continue by refreshing the loaded database.

Psycogp2

import psycopg2 as pg2

# secret = password
# Connect to database
conn = pg2.connect(database='database_name', user='postgres', password=secret)

# Create cursor object
cur = conn.cursor()

cur.execute('SQL QUERY')

data = cur.fetchall()
# cur.fetchone() 
# cur.fetchmany(10)

# Proceed with data 

# Commit changes to the database
cur.commit()

conn.close()

Query tool

  • Query History > Copy to Query Editor
  • Download query results as a .csv file

Resources