IT Stuff

Relational Algebra

RA is a formal language that forms conventions used in implemented languages like SQL.
RA operates on relations and produces relations as result.

Query (expression) on set of relations produces relation as result.

Key is an attribute or a set of attributes whose value is guaranteed to be unique.
Tuple  - row of a relation.
Attribute  - column of a relation.
Relation - table consisting of attribute (column) and tuples (rows).
Schema  - relation header (attribute names)

We'll use simple college admission database with three relations (keys are in bold):

1st relation: College(schema: cName, state, enrollment)
2nd relation: Student(schema: sID, sName, GPA, sizeHS) # sizeHS = size of High School a student attended
3rd relation: Apply(schema: sID, cName, major, decision)

Simplest query in a RA is simply the relation name, for example Student is valid expression (query) in RA and it's returning copy of Student relation.

Use operators to filter, slice, combine relations:

Select - picks certain rows out of a relation - σ (sigma):

Students with GPA > 3.7 (GPA - Grade Point Average)

σ _{GPA > 3.7} Student

Students with GPA > 3.7 and sizeHS < 1000 (caret ^ is logical and operator)

σ _{GPA > 3.7 ^ sizeHS < 1000} Student

Application for Stanford for CS major

σ_{cName="Stanford" ^ major="CS"} Apply

Select operator general case:

σ_condition Relation

Project operator - picks certain columns:

Apply relation with sID and decision columns only

П_sID,decision Apply

Project operator general case:

П_{col1,col2,col3...} Relation

To Select and Project at the same time:

ID and name of students with GPA>3.7:

П_sID,sName (σ_GPA>3.7 Student)

Duplicates:

SQL is based on multisets/bags, so duplicates are also shown
RA is based on sets, so duplicates are eliminated automatically

Cross-product operator (a.k.a. cross-join) - combine two relations (a.k.a. Cartesian product) horizontally:

Student x Apply as result we get big relation which going to have eight (8) attributes
as a convention when cross-product is done and we get the same attributes for both cross-producted relations we preface their name with the name of the relation they came from: Student.sID / Apply.sID
cross-product gives as relation where for every row of Student relation you get all rows of Apply relation
Names and GPA's of students with HS>1000 who applied to CS and were rejected:

П_{sName, GPA} (σ_{Student.sID=Apply.sID ^ HS>1000 ^ major="CS" ^ decision="Reject"} (Student x Apply))

Difference operator A\B or A-B returns elements of A that not in B:

if A = {a, b, c} and B = {b, c, d} then A - B = {a}
IDs of students who didn't apply anywhere:

(П_sID Student) - (П_sID Apply)

IDs and names of students who didn't apply anywhere (we can't just add sName to the project from Student relation because Apply relation has not this attribute and we might not use difference operator on sets with different quantity of attributes):

П_sID,sName( ( (П_sID Student) - (П_sID Apply) ) ⋈ Student )

Union (denoted by ∪) operator is the set of all elements in the collection. It is one of the fundamental operations through which sets can be combined and related to each other. Union combines vertically:

A ∪ B means all members of A and also all members of B that not in A (RA is set based and removes duplicates):

if A = {a, b, c} and B = {b, c, d} then:

all members of A are {a, b, c}
all members of B that not in A is B\A = {d}
combine both {a, b, c} and {d} => {a, b, c, d}
so: A ∪ B = {a, b, c, d}

List of college and student names:

(П_cName College) ∪ (П_sName Student)

Natural join operator performs cross-product operator and then enforces equality on all of the attributes with the same name (as in above example: Student.sID=Apply.sID) also natural join eliminates one copy of duplicate attributes:

Names and GPA's of students with HS>1000 who applied to CS and were rejected:

П_{sName, GPA} (σ_{HS>1000 ^ major="CS" ^ decision="Reject"}(Student ⋈ Apply))

Names and GPA's of students with HS>1000 who applied to CS and were rejected to colleges with the enrollment greater than 20000:

П_{sName, GPA} (σ_{HS>1000 ^ major="CS" ^ decision="Reject" ^ enrollment>20000}(Student ⋈ (Apply ⋈ College)))

Relation between ⋈ and x:

E1 ⋈ E2 => П_{schema(E1) U schema(E2)} (σ_{E1.a1= E2.a1 ^ E1.a2= E2.a2 ^ ...}(E1 x E2))

Theta Join - operator takes two expressions/relations and combines them with bow tie looking operator (⋈) but with a subscript theta (_θ) which means select condition (any select condition you want):

E1 ⋈_θE2 = σ_θ (E1 ⋈ E2 )
Theta join is the basic operation implemented in RDBMS so the term "join" often means theta-join

Intersection operator - A ∩ B of two sets A and B is the set that contains all elements of A that also belong to B (or equivalently, all elements of B that also belong to A), but no other elements:

if A = {1, 2, 3} and B = {2, 3, 4} then A ∩ B = {2, 3}
Names that are both college name and student name:

(П_sName Student) ∩ (П_cName College)

Expressing intersection via difference

A ∩ B => A - ( A - B )

Expressing intersection via natural join:

A ∩ B => A ⋈ B

Rename operator uses ρ (rho), it reassigns schema in the result of expression (relation). Above (in 7.2.1 and 10.2.1 we used operators on relations with different attribute names - different schemas, in practice RA doesn't allow that, so we need to use rename operator):

General form:

ρ_{R(A1,A2,...An)} Relation E # call result of the relation E - R with attributes A1 to An

Unify schemas for set operators:

List of college and student names (7.2.1):

ρ_C1(name) (П_cName College) ∪ ρ_C2(name)(П_sName Student)

Names that are both college name and student name (10.2.1):

ρ_C1(name) (П_sName Student) ∩ ρ_C2(name) (П_cName College)

for disambiguation in self-joins:

pairs of colleges in same state:

With cross-join

σ_{s1=s2 ^ n1!=n2}( ρ_C1(n1,s1,e1) College x ρ_C2(n2,s2,e2) College )

With natural-join:

σ_n1!=n2( ρ_C1(n1,s,e1) College x ρ_C2(n2,s,e2) College )

Select, Project, Cross-join, union, difference, rename are RA basic operators
Natural join, theta join, intersection are not RA basic operators they can be expressed with use of basic operators, so there are actually are abbreviations

These materials were used to write this synopsis-post:

IT Stuff

Tuesday, July 16, 2019

Relational Algebra

No comments:

Post a Comment