Tuesday, July 16, 2019

Relational Algebra

RA is a formal language that forms conventions used in implemented languages like SQL.
RA operates on relations and produces relations as result.
Query (expression) on set of relations produces relation as result. 
Key is an attribute or a set of attributes whose value is guaranteed to be unique.
Tuple  - row of a relation.
Attribute  - column of a relation.
Relation - table consisting of attribute (column) and tuples (rows).
Schema  - relation header (attribute names)
We'll use simple college admission database with three relations (keys are in bold):
  1. 1st relation: College(schema: cName, state, enrollment)
  2. 2nd relation: Student(schema: sID, sName, GPA, sizeHS) # sizeHS = size of High School a student attended
  3. 3rd relation: Apply(schema: sID, cName, major, decision)
Simplest query in a RA is simply the relation name, for example Student is valid expression (query) in RA and it's returning copy of Student relation.
Use operators to filter, slice, combine relations:
  1. Select - picks certain rows out of a relation - σ (sigma):
    1. Students with GPA > 3.7 (GPA - Grade Point Average)
      1. σ GPA > 3.7 Student
    2. Students with GPA > 3.7 and sizeHS < 1000 (caret ^ is logical and operator)
      1. σ GPA > 3.7  ^ sizeHS < 1000 Student
    3. Application for Stanford for CS major
      1. σ cName="Stanford" ^ major="CS" Apply
    4. Select operator general case:
      1. σ condition Relation
  2. Project operator - picks certain columns:
    1. Apply relation with sID and decision columns only
      1. П sID,decision Apply
    2. Project operator general case:
      1. П col1,col2,col3... Relation
  3. To Select and Project at the same time:
    1. ID and name of students with GPA>3.7:
      1. П sID,sName (σ GPA>3.7 Student)
  4. Duplicates:
    1. SQL is based on multisets/bags, so duplicates are also shown
    2. RA is based on sets, so duplicates are eliminated automatically
  5. Cross-product operator (a.k.a. cross-join) - combine two relations (a.k.a. Cartesian product) horizontally:
    1. Student x Apply as result we get big relation which going to have eight (8) attributes
    2. as a convention when cross-product is done and we get the same attributes for both cross-producted relations we preface their name with the name of the relation they came from: Student.sID / Apply.sID
    3. cross-product gives as relation where for every row of Student relation you get all rows of Apply relation
    4. Names and GPA's of students with HS>1000 who applied to CS and were rejected:
      1.  П sName, GPA (σ Student.sID=Apply.sID ^ HS>1000 ^ major="CS" ^ decision="Reject" (Student x Apply))
  6. Difference operator A\B or A-B returns elements of A that not in B:
    1. if A = {a, b, c} and B = {b, c, d} then A - B = {a}
    2. IDs of students who didn't apply anywhere:
      1.  sID Student) - (П sID Apply)
    3. IDs and names of students who didn't apply anywhere (we can't just add sName to the project from Student relation because Apply relation has not this attribute and we might not use difference operator on sets with different quantity of attributes):
      1. П sID,sName ( ( (П sID Student) - (П sID Apply) ) ⋈ Student )
  7. Union (denoted by ∪) operator is the set of all elements in the collection. It is one of the fundamental operations through which sets can be combined and related to each other. Union combines vertically:
    1. A ∪ B means all members of A and also all members of B that not in A (RA is set based and removes duplicates):
      1. if A = {a, b, c} and B = {b, c, d} then:
        1. all members of A are {a, b, c} 
        2. all members of B that not in A is B\A = {d}
        3. combine both {a, b, c} and {d} => {a, b, c, d}
        4. so: A ∪ B = {a, b, c, d}
    2. List of college and student names:
      1.  cName College) ∪ (П sName Student)
  8. Natural join operator performs cross-product operator and then enforces equality on all of the attributes with the same name (as in above example: Student.sID=Apply.sID) also natural join eliminates one copy of duplicate attributes:
    1. Names and GPA's of students with HS>1000 who applied to CS and were rejected:
      1. П sName, GPA (σ HS>1000 ^ major="CS" ^ decision="Reject"(Student ⋈ Apply))
    2. Names and GPA's of students with HS>1000 who applied to CS and were rejected to colleges with the enrollment greater than 20000:
      1. П sName, GPA (σ HS>1000 ^ major="CS" ^ decision="Reject" ^ enrollment>20000(Student ⋈ (Apply ⋈ College)))
    3. Relation between ⋈ and x:
      1. E1 ⋈ E2 => П schema(E1) U schema(E2) (σ E1.a1= E2.a1 ^ E1.a2= E2.a2 ^ ... (E1 x E2))
  9. Theta Join - operator takes two expressions/relations and combines them with bow tie looking operator (⋈) but with a subscript theta (θ) which means select condition (any select condition you want):
    1. E1 ⋈θE2 = σθ (E1 ⋈ E2 )
    2. Theta join is the basic operation implemented in RDBMS so the term "join" often means theta-join
  10. Intersection operator -  A ∩ B of two sets A and B is the set that contains all elements of A that also belong to B (or equivalently, all elements of B that also belong to A), but no other elements:
    1. if A = {1, 2, 3} and B = {2, 3, 4} then A ∩ B = {2, 3}
    2. Names that are both college name and student name:
      1.  sName Student) ∩ (П cName College)
    3. Expressing intersection via difference
      1. A ∩ B => A - ( A - B ) 
    4. Expressing intersection via natural join:
      1. A ∩ B => A ⋈ B
  11. Rename operator uses ρ (rho), it reassigns schema in the result of expression (relation). Above (in 7.2.1 and 10.2.1 we used operators on relations with different attribute names - different schemas, in practice RA doesn't allow that, so we need to use rename operator):
    1. General form:
      1. ρ R(A1,A2,...An) Relation E # call result of the relation E - R with attributes A1 to An
    2. Unify schemas for set operators:
      1. List of college and student names (7.2.1):
        1. ρ C1(name) (П cName College) ∪ ρ C2(name)  sName Student)
      2. Names that are both college name and student name (10.2.1):
        1. ρ C1(name) (П sName Student) ∩ ρ C2(name) (П cName College)
    3. for disambiguation in self-joins:
      1. pairs of colleges in same state:
        1. With cross-join
          1. σ s1=s2 ^ n1!=n2 ( ρ C1(n1,s1,e1) College x ρ C2(n2,s2,e2) College )
        2. With natural-join:
          1. σ n1!=n2 ( ρ C1(n1,s,e1) College x ρ C2(n2,s,e2) College )
  12. Select, Project, Cross-join, union, difference, rename are RA basic operators
  13. Natural join, theta join, intersection are not RA basic operators they can be expressed with use of basic operators, so there are actually are abbreviations

These materials were used to write this synopsis-post:

No comments:

Post a Comment