Thursday, July 25, 2019

DB Basics, Cross, Natural, Inner, Outer, Theta Join

In this blog-post I'll try to go from formal notions in Relational Algebra to the practical SQL using the same queries as in https://it-tuff.blogspot.com/2019/07/relational-algebra-db-basics-select.html.

Prerequisites for practical learning:
  1. install mysq or mariadb server
  2. RA Relation is table in SQL and tables are in database:
    1. CREATE DATABASE Test;
    2. USE Test;
    3. SHOW DATABASES;
  3. RA key is PRIMARY KEY  in SQL, RA Atribute is column in SQL and RA Tuple is row in SQL. To fill table we first must create it's schema:
    1. Data types:
      1. VARCHAR - used for storing alphabetic or mixed alpha-numeric data
      2. INTEGER - storing whole numbers from ~ -2billions to ~+2billions
      3. DECIMAL - storing whole and non-whole numbers, you must specify length of number and also length of the fractional part - DECIMAL(10,4) - number length is 10 digits with 4 digits after decimal-point
      4. after showing data type you must show probable maximal length of that data
    2. CREATE TABLE College (cName VARCHAR(255), PRIMARY KEY (cName) , state VARCHAR(10), enrollment INTEGER);
    3. SHOW TABLES;
    4. CREATE TABLE Student (sID INTEGER, PRIMARY KEY(sID), sName VARCHAR(255), GPA DECIMAL(4,2), sizeHS INTEGER); # HS = High School
    5. SHOW TABLES;
    6. CREATE TABLE Apply (sID INTEGER, PRIMARY KEY(sID), cName VARCHAR(255), major VARCHAR(255), decision VARCHAR(20));
    7. SHOW TABLES;
  4. Now fill tables with test data:
    1. INSERT INTO College (cName, state, enrollment) VALUES ("Amridge", "AL", 749),  ("Berkeley", "CA", 42159), ("Stanford", "CA", 43797), ("Wyoming", "WY", 2024), ("Harcum", "PA", 1425);
    2. INSERT INTO Student (sID, sName, GPA, sizeHS) VALUES (1001, "Nita Millwood", 3.2, 900), (1002, "Vincenzo Lyons", 3.8, 750), (1003, "Zachery Lefebvre", 2.9, 1500), (1004, "Wilbert Chan", 3.6, 1620), (1005, "Mirna Hamann", 3.9, 1000), (1006, "Delta Shutt", 2.5, 1300), (1007, "Ryan Lacefield", 3.1, 1460);
    3. INSERT INTO Apply (sID, cName, major, decision) VALUES (1001, "Amridge", "BA", "accept"), (1002, "Berkeley", "CS", "accept"), (1003, "Houston", "CE" ,"reject"), (1004, "Berkeley", "CS", "reject"), (1005, "Stanford", "CS", "accept");

Practicing SQL:
  1. In SQL RA Select and Project are combined into one operator SELECT:
    1. right after select we write Projection part (* means all columns/attributes)
    2. after Projection part we write FROM and then write table/relation name
    3. after table name we write WHERE with needed column/attribute parameters - this is condition of the Selection
    4. RA ^ (logical and) is AND in SQL
    5. students with GPA>3.7 :
      1. Select * FROM Student WHERE GPA > 3.7;
    6. Application for Stanford for CS major 
      1. SELECT * FROM Apply WHERE cName="Stanford" AND major="CS"
    7. ID and name of students with GPA>3.7: 
      1. SELECT sID,sName FROM Student WHERE GPA > 3.7
  2. In SQL RA Cross-Product is CROSS JOIN in MySQL CROSS JOIN and INNER JOIN are the same, in Oracle you can't specify ON clause for CROSS JOIN (only WHERE is allowed) and Oracle INNER JOIN allows ON clause. Also theta join is join using only WHERE condition and not using ON or USING:
    1. Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected: 
      1. To deeply understand this we'll compose this query step by step:
      2. First we'll find all students:
        1. SELECT * FROM Student ;
      3. Now we need to find applications of all students (cross-product):
        1. SELECT * FROM Student CROSS JOIN Apply ;
      4. Previous query must be filtered by the condition Student.sID=Apply.sID:
        1. SELECT * FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID ;
      5. Add sizeHS > 1000 condition:
        1. SELECT* FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID AND sizeHS>1000;
      6. Add two other conditions - major="CS" and decision="reject":
        1. SELECT * FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID AND sizeHS>1000 AND major="CS" AND decision="Reject" ;
      7. Now make projection to select only sName and GPA:
        1. SELECT sName, GPA FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID AND sizeHS>1000 AND major="CS" AND decision="Reject" ;
  3. RA Union in SQL is UNION - this operator is used to make composition of the results of two (or more)  select statements:
    1. List of college and student names:
      1. SELECT cName FROM College 
      2. UNION 
      3. SELECT sName FROM Student;
  4. RA Rename operator is AS in SQL:
    1. List of college and student names under the name Names:
      1. SELECT cName AS Names FROM College
      2. UNION
      3. SELECT sName FROM Student;
    2. for disambiguation in self-joins (when relation/table is joined with itself):
      1. pairs of colleges in same state (we name 1st call of College table C1, and the second - C2):
      2. Only renaming tables:
        1. SELECT * 
        2. FROM College AS C1 
        3. CROSS JOIN College AS C2 
        4. WHERE 
        5. C1.state=C2.state AND
        6. C1.cName != C2.cName;
      3. Renaming tables and columns:
        1. SELECT C1.cName AS C1, C2.cName AS C2, C1.State 
        2. FROM College AS C1 
        3. CROSS JOIN College AS C2 
        4. WHERE C1.state=C2.state AND
        5. C1.cName != C2.cName;
  5. Natural join operator performs cross-product operator and then enforces equality on all of the attributes with the same name (as in above cross-join example: Student.sID=Apply.sID) also natural join eliminates one copy of duplicate attributes:
    1. Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected:
      1. SELECT sName, GPA 
      2. FROM Student 
      3. NATURAL JOIN Apply 
      4. WHERE sizeHS>100 AND 
      5. major="CS" AND 
      6. decision="reject";
    2. The same with column and table renaming:
      1. SELECT St.sName, St.GPA 
      2. FROM Student AS St 
      3. NATURAL JOIN Apply AS Ap 
      4. WHERE St.sizeHS>1000 AND 
      5. Ap.major="CS" AND 
      6. Ap.decision="reject";
    3. Names and GPA's of students with HS>1000 who applied to CS and were rejected to colleges with the enrollment greater than 20000:
      1. Using table rename and two select statements:
        1. SELECT S.sName, S.GPA 
        2. FROM Student AS S 
        3. NATURAL JOIN
        4.  (SELECT * 
        5. FROM Apply AS A 
        6. NATURAL JOIN College AS C  
        7. WHERE C.enrollment>20000 AND 
        8. A.major="CS" AND
        9.  A.decision="reject") AS A 
        10. WHERE S.sizeHS>1000;
      2. Using several natural joins in one Select:
        1. SELECT sName, GPA
        2. FROM Student
        3. NATURAL JOIN Apply
        4. NATURAL JOIN College
        5. WHERE sizeHS>1000 AND
        6. major="CS" AND
        7. decision="reject";
  6. RA Difference operator can be simulated with LEFT JOIN in MySQL (left join adds found rows from the right side to the left side, if right side is empty then NULL values are used), here you must show which columns are used for selection:
    1. IDs of students who didn't apply anywhere:
    2. We can use ON Student.sID = Apply.sID:
      1. SELECT Student.sID 
      2. FROM Student
      3. LEFT JOIN Apply
      4. ON Student.sID=Apply.sID
      5. WHERE Apply.sID IS NULL;
    3. Also "ON Student.sID=Apply.sID" = USING(sID) - when both columns have the same name:
      1. SELECT Student.sID 
      2. FROM Student
      3. LEFT JOIN Apply
      4. USING(sID)
      5. WHERE Apply.sID IS NULL;
  7. MySQL RIGHT JOIN works similar to  LEFT JOIN, the difference is that RIGHT JOIN uses right relation as the main, and LEFT JOIN uses left relation as the main one.
  8. FULL JOIN is INNER JOIN + RIGHT JOIN + LEFT JOIN
  9. Intersection operator can be simulated in MySQL using join and DISTINCT (show only unique values):
    1. Names that are both college name and student name:
      1. SELECT DISTINCT(sName) 
      2. FROM Student
      3. INNER JOIN College
      4. ON sName=cName;
  10. Inner and Outer joins:
    1. Inner join show only data which is in both left and right relations (using ON or USING)
    2. Outer joins use on relation as the main and completes this relation with the data from the other one and all empty data filled with NULLs (LEFT and RIGHT joins are: LEFT OUTER JOIN and RIGHT OUTER JOIN)

No comments:

Post a Comment