Monday, July 29, 2019

CAP Theorem

CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
  1. Consistency: the data on every non-failing node in the distributed system is the same, so an update must reach all nodes before further reads are allowed.
  2. Availability: Availability can be used in two different meanings:
    1. Availability of a real service - measured as the percentage of time the service is working out of the total (working plus non-working) time
    2. Availability in the context of the CAP theorem - for a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. So data must be replicated between the nodes of the system, and a server is not allowed to ignore client requests.
  3. Partition tolerance: the system continues to operate even if any one part of the system is lost or fails. Partition tolerance doesn’t require every node to still be available to handle requests. It just means that partitions may occur. If you deploy on a typical IP network, partitions will occur; partition tolerance in these environments is not optional. So only a total network failure can cause a system to respond incorrectly.
So in practice every distributed system that uses a network must provide P, and thus we have two possible types of systems: AP or CP. For systems not using a network, we have AC, AP and CP models.
Conventional databases assume no partitioning - clusters were assumed to be small and local (CA).
NoSQL systems may sacrifice consistency. 

AP or CP:

  1. On systems that allow reads before updating all the nodes, we will get high Availability
  2. On systems that lock all the nodes before allowing reads, we will get Consistency

Description of the CAP theorem:
  1. Setup:
    1. we have a distributed system consisting of 2 servers - S1 and S2
    2. S1 and S2 are interconnected
    3. a client - C - connects to both S1 and S2
    4. C can query either of these servers (S1 or S2)
    5. S1 and S2 keep track of a variable v with initial value 0 (v=0)
    6. a write is done from C to S1 or S2 (write request and write response) and a read is done from C to S1 or S2 (read request and read response)
  2. Consistency:
    1. consistent system (S2 is updated before C gets the response):
      1. C write-request S1 => v=1
      2. S1 write => v=1
      3. S1 update S2
      4. S2 update => v=1
      5. S1 write-response C => v=1
    2. inconsistent system:
      1. C write-request S1 => v=1
      2. S1 write => v=1
      3. S1 write-response C => v=1 
      4. S2 is not updated and v on S2 is still v=0
  3. Partition:
    1. When a partition occurs, S1 and S2 are no longer interconnected
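The setup above can be played through in a few lines of Python. This is only a toy sketch, not a real replication protocol: the class name Server, the link_up flag and the two write methods are made up for illustration. It shows the choice a system has to make once S1 and S2 are partitioned: reject the write (CP) or accept it and let the replicas disagree (AP).

```python
class Server:
    """One replica holding the variable v (initially 0)."""

    def __init__(self, name):
        self.name = name
        self.v = 0
        self.peer = None
        self.link_up = True  # False simulates a network partition

    def write_cp(self, value):
        """CP behaviour: refuse the write unless the peer can be updated too."""
        if not self.link_up:
            return "error: peer unreachable"  # consistent, but not available
        self.v = value
        self.peer.v = value  # replicate before answering the client
        return "ok"

    def write_ap(self, value):
        """AP behaviour: always accept; replicate only if the link is up."""
        self.v = value
        if self.link_up:
            self.peer.v = value
        return "ok"  # available, but the replicas may now disagree

    def read(self):
        return self.v


def make_cluster():
    s1, s2 = Server("S1"), Server("S2")
    s1.peer, s2.peer = s2, s1
    return s1, s2


def partition(s1, s2):
    s1.link_up = s2.link_up = False


# CP: during a partition the write is rejected, both replicas still agree.
s1, s2 = make_cluster()
partition(s1, s2)
cp_result = s1.write_cp(1)
cp_values = (s1.read(), s2.read())

# AP: the write succeeds, but S2 keeps serving the old value.
s1, s2 = make_cluster()
partition(s1, s2)
ap_result = s1.write_ap(1)
ap_values = (s1.read(), s2.read())

print(cp_result, cp_values)
print(ap_result, ap_values)
```

In the CP run the client gets an error and both servers keep v=0; in the AP run the client gets "ok" but S1 holds v=1 while S2 still holds v=0.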

Thursday, July 25, 2019

DB Basics, Cross, Natural, Inner, Outer, Theta Join

In this blog-post I'll try to go from formal notions in Relational Algebra to the practical SQL using the same queries as in https://it-tuff.blogspot.com/2019/07/relational-algebra-db-basics-select.html.

Prerequisites for practical learning:
  1. install a MySQL or MariaDB server
  2. An RA relation is a table in SQL, and tables live in a database:
    1. CREATE DATABASE Test;
    2. USE Test;
    3. SHOW DATABASES;
  3. An RA key is a PRIMARY KEY in SQL, an RA attribute is a column in SQL and an RA tuple is a row in SQL. To fill a table we must first create its schema:
    1. Data types:
      1. VARCHAR - stores alphabetic or mixed alphanumeric data
      2. INTEGER - stores whole numbers from about -2 billion to about +2 billion
      3. DECIMAL - stores whole and fractional numbers; you must specify the total number of digits and the number of digits in the fractional part - DECIMAL(10,4) is a number of up to 10 digits, 4 of them after the decimal point
      4. after the data type you give the expected maximum length of the data in parentheses, for example VARCHAR(255)
    2. CREATE TABLE College (cName VARCHAR(255), PRIMARY KEY (cName) , state VARCHAR(10), enrollment INTEGER);
    3. SHOW TABLES;
    4. CREATE TABLE Student (sID INTEGER, PRIMARY KEY(sID), sName VARCHAR(255), GPA DECIMAL(4,2), sizeHS INTEGER); # HS = High School
    5. SHOW TABLES;
    6. CREATE TABLE Apply (sID INTEGER, PRIMARY KEY(sID), cName VARCHAR(255), major VARCHAR(255), decision VARCHAR(20));
    7. SHOW TABLES;
  4. Now fill tables with test data:
    1. INSERT INTO College (cName, state, enrollment) VALUES ("Amridge", "AL", 749),  ("Berkeley", "CA", 42159), ("Stanford", "CA", 43797), ("Wyoming", "WY", 2024), ("Harcum", "PA", 1425);
    2. INSERT INTO Student (sID, sName, GPA, sizeHS) VALUES (1001, "Nita Millwood", 3.2, 900), (1002, "Vincenzo Lyons", 3.8, 750), (1003, "Zachery Lefebvre", 2.9, 1500), (1004, "Wilbert Chan", 3.6, 1620), (1005, "Mirna Hamann", 3.9, 1000), (1006, "Delta Shutt", 2.5, 1300), (1007, "Ryan Lacefield", 3.1, 1460);
    3. INSERT INTO Apply (sID, cName, major, decision) VALUES (1001, "Amridge", "BA", "accept"), (1002, "Berkeley", "CS", "accept"), (1003, "Houston", "CE" ,"reject"), (1004, "Berkeley", "CS", "reject"), (1005, "Stanford", "CS", "accept");

Practicing SQL:
  1. In SQL RA Select and Project are combined into one operator SELECT:
    1. right after select we write Projection part (* means all columns/attributes)
    2. after Projection part we write FROM and then write table/relation name
    3. after table name we write WHERE with needed column/attribute parameters - this is condition of the Selection
    4. RA ^ (logical and) is AND in SQL
    5. students with GPA>3.7 :
      1. SELECT * FROM Student WHERE GPA > 3.7;
    6. Applications to Stanford for the CS major:
      1. SELECT * FROM Apply WHERE cName="Stanford" AND major="CS";
    7. ID and name of students with GPA>3.7:
      1. SELECT sID, sName FROM Student WHERE GPA > 3.7;
  2. The RA Cross-Product is CROSS JOIN in SQL. In MySQL, CROSS JOIN and INNER JOIN are the same; in Oracle you can't specify an ON clause for CROSS JOIN (only WHERE is allowed), while Oracle's INNER JOIN allows an ON clause. A theta join is a cross-product filtered by a condition; here it is written with CROSS JOIN plus a WHERE condition, without ON or USING:
    1. Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected: 
      1. To deeply understand this we'll compose this query step by step:
      2. First we'll find all students:
        1. SELECT * FROM Student ;
      3. Now we need to find applications of all students (cross-product):
        1. SELECT * FROM Student CROSS JOIN Apply ;
      4. Previous query must be filtered by the condition Student.sID=Apply.sID:
        1. SELECT * FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID ;
      5. Add sizeHS > 1000 condition:
        1. SELECT * FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID AND sizeHS>1000;
      6. Add two other conditions - major="CS" and decision="reject":
        1. SELECT * FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID AND sizeHS>1000 AND major="CS" AND decision="reject" ;
      7. Now make a projection to select only sName and GPA:
        1. SELECT sName, GPA FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID AND sizeHS>1000 AND major="CS" AND decision="reject" ;
  3. RA Union in SQL is UNION - this operator is used to make composition of the results of two (or more)  select statements:
    1. List of college and student names:
      1. SELECT cName FROM College 
      2. UNION 
      3. SELECT sName FROM Student;
  4. RA Rename operator is AS in SQL:
    1. List of college and student names under the name Names:
      1. SELECT cName AS Names FROM College
      2. UNION
      3. SELECT sName FROM Student;
    2. for disambiguation in self-joins (when relation/table is joined with itself):
      1. pairs of colleges in same state (we name 1st call of College table C1, and the second - C2):
      2. Only renaming tables:
        1. SELECT * 
        2. FROM College AS C1 
        3. CROSS JOIN College AS C2 
        4. WHERE 
        5. C1.state=C2.state AND
        6. C1.cName != C2.cName;
      3. Renaming tables and columns:
        1. SELECT C1.cName AS C1, C2.cName AS C2, C1.state 
        2. FROM College AS C1 
        3. CROSS JOIN College AS C2 
        4. WHERE C1.state=C2.state AND
        5. C1.cName != C2.cName;
  5. The Natural join operator performs a cross-product and then enforces equality on all attributes that have the same name in both relations (like Student.sID=Apply.sID in the cross-join example above). Natural join also eliminates one copy of each duplicate attribute:
    1. Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected:
      1. SELECT sName, GPA 
      2. FROM Student 
      3. NATURAL JOIN Apply 
      4. WHERE sizeHS>1000 AND 
      5. major="CS" AND 
      6. decision="reject";
    2. The same with column and table renaming:
      1. SELECT St.sName, St.GPA 
      2. FROM Student AS St 
      3. NATURAL JOIN Apply AS Ap 
      4. WHERE St.sizeHS>1000 AND 
      5. Ap.major="CS" AND 
      6. Ap.decision="reject";
    3. Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected by colleges with an enrollment greater than 20000:
      1. Using table rename and two select statements:
        1. SELECT S.sName, S.GPA 
        2. FROM Student AS S 
        3. NATURAL JOIN
        4.  (SELECT * 
        5. FROM Apply AS A 
        6. NATURAL JOIN College AS C  
        7. WHERE C.enrollment>20000 AND 
        8. A.major="CS" AND
        9.  A.decision="reject") AS A 
        10. WHERE S.sizeHS>1000;
      2. Using several natural joins in one Select:
        1. SELECT sName, GPA
        2. FROM Student
        3. NATURAL JOIN Apply
        4. NATURAL JOIN College
        5. WHERE sizeHS>1000 AND
        6. enrollment>20000 AND
        7. major="CS" AND
        8. decision="reject";
  6. The RA Difference operator can be simulated with LEFT JOIN in MySQL. A left join keeps every row of the left table and adds the matching rows from the right table; where there is no match, the right-side columns are filled with NULL. You must state which columns are used for matching:
    1. IDs of students who didn't apply anywhere:
    2. We can use ON Student.sID = Apply.sID:
      1. SELECT Student.sID 
      2. FROM Student
      3. LEFT JOIN Apply
      4. ON Student.sID=Apply.sID
      5. WHERE Apply.sID IS NULL;
    3. Also "ON Student.sID=Apply.sID" = USING(sID) - when both columns have the same name:
      1. SELECT Student.sID 
      2. FROM Student
      3. LEFT JOIN Apply
      4. USING(sID)
      5. WHERE Apply.sID IS NULL;
  7. MySQL RIGHT JOIN works like LEFT JOIN; the difference is that RIGHT JOIN uses the right relation as the main one, and LEFT JOIN uses the left relation as the main one.
  8. FULL JOIN returns the INNER JOIN rows plus the unmatched rows of both LEFT JOIN and RIGHT JOIN. MySQL has no FULL JOIN; it can be emulated with a UNION of a LEFT JOIN and a RIGHT JOIN.
  9. The Intersection operator can be simulated in MySQL using a join and DISTINCT (show only unique values):
    1. Names that are both college name and student name:
      1. SELECT DISTINCT sName 
      2. FROM Student
      3. INNER JOIN College
      4. ON sName=cName;
  10. Inner and Outer joins:
    1. An inner join shows only data that is in both the left and the right relation (matched using ON or USING)
    2. Outer joins use one relation as the main one and complete it with data from the other; all missing data is filled with NULLs (LEFT and RIGHT joins are LEFT OUTER JOIN and RIGHT OUTER JOIN)
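If you have no MySQL server at hand, the join queries above can be tried from Python with the built-in sqlite3 module. This is only a sketch under an assumption the post does not make: SQLite stands in for MySQL here, so the column types (TEXT, REAL) differ from the MySQL schema and string literals use single quotes. The table contents are the test data from the prerequisites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE College (cName TEXT PRIMARY KEY, state TEXT, enrollment INTEGER);
CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT, GPA REAL, sizeHS INTEGER);
CREATE TABLE Apply   (sID INTEGER PRIMARY KEY, cName TEXT, major TEXT, decision TEXT);
""")
conn.executemany("INSERT INTO College VALUES (?, ?, ?)", [
    ("Amridge", "AL", 749), ("Berkeley", "CA", 42159), ("Stanford", "CA", 43797),
    ("Wyoming", "WY", 2024), ("Harcum", "PA", 1425),
])
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
    (1001, "Nita Millwood", 3.2, 900), (1002, "Vincenzo Lyons", 3.8, 750),
    (1003, "Zachery Lefebvre", 2.9, 1500), (1004, "Wilbert Chan", 3.6, 1620),
    (1005, "Mirna Hamann", 3.9, 1000), (1006, "Delta Shutt", 2.5, 1300),
    (1007, "Ryan Lacefield", 3.1, 1460),
])
conn.executemany("INSERT INTO Apply VALUES (?, ?, ?, ?)", [
    (1001, "Amridge", "BA", "accept"), (1002, "Berkeley", "CS", "accept"),
    (1003, "Houston", "CE", "reject"), (1004, "Berkeley", "CS", "reject"),
    (1005, "Stanford", "CS", "accept"),
])


def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Cross-product filtered by a WHERE condition (theta join).
cross_rows = rows("""SELECT sName, GPA FROM Student CROSS JOIN Apply
    WHERE Student.sID = Apply.sID AND sizeHS > 1000
      AND major = 'CS' AND decision = 'reject'""")

# The same question with NATURAL JOIN (joins on the shared column sID).
natural_rows = rows("""SELECT sName, GPA FROM Student NATURAL JOIN Apply
    WHERE sizeHS > 1000 AND major = 'CS' AND decision = 'reject'""")

# Difference via LEFT JOIN: students without any application.
not_applied = rows("""SELECT Student.sID FROM Student
    LEFT JOIN Apply ON Student.sID = Apply.sID
    WHERE Apply.sID IS NULL ORDER BY Student.sID""")

# Intersection via INNER JOIN: names that are both a college and a student.
both_names = rows("""SELECT DISTINCT sName FROM Student
    INNER JOIN College ON sName = cName""")

print(cross_rows, natural_rows, not_applied, both_names)
```

With this test data the cross join and the natural join return the same single student, the LEFT JOIN returns the two students who have no row in Apply, and the intersection is empty.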


DHCP over Relay on Docker

DHCP (Dynamic Host Configuration Protocol) lets us address the hosts on our network dynamically. When a host is configured to get its IP address dynamically, it broadcasts a DHCP Discover message on the network, searching for a DHCP server. The DHCP server has to be in the same broadcast domain as the clients, since routers do not forward broadcast packets.
For a Docker container this means that we would have to connect the container to every subnet in our company's network. But we want to use just one interface on the container (in this post I'll use macvlan). The problem is:
When our DHCP client wants to get an IP address, it sends a DHCP Discover message, which is a broadcast message. As the router/gateway/firewall does not forward broadcast packets, this message will never reach the DHCP server (our Docker container).
To solve this issue we'll use a DHCP Relay Agent. This feature is activated on a network device that has interfaces in all subnets of the company's network:

  1. this device (router/gateway/firewall) forwards DHCP messages to the DHCP Server, and when the DHCP Server responds, this device forwards the replies to the Client.
  2. The DHCP Relay Agent fills in the giaddr (gateway IP address) field of the DHCP packet. This field contains the IP address of the Relay Agent interface that received the client's message, and it also helps the DHCP Server identify the pool from which it has to select an IP address.
  3. After identifying the pool, the DHCP Server replies with a DHCP Offer message addressed to the Relay Agent, and the Relay Agent forwards it to the DHCP Client.
  4. The DHCP Client replies with a DHCP Request message
  5. this message is also forwarded to the DHCP Server by the DHCP Relay Agent
  6. The DHCP Server replies with a DHCP Ack
  7. this message is forwarded to the DHCP Client by the DHCP Relay Agent
  8. finally the DHCP Client is assigned an IP address
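The role of giaddr in step 2 can be sketched in Python. This is not how dhcpd is implemented; it is a minimal illustration, and the names POOLS and select_pool are made up. The subnets and ranges are the ones used in this post: 10.10.6.1 is the relay interface facing the clients, 172.16.3.249 is the DHCP server itself.

```python
import ipaddress

# Subnet declarations as the DHCP server might hold them: subnet -> lease range.
# None means "declared but without a range", so nothing is leased from it.
POOLS = {
    ipaddress.ip_network("172.16.3.0/24"): None,
    ipaddress.ip_network("10.10.6.0/24"): ("10.10.6.2", "10.10.6.3"),
}


def select_pool(giaddr, receiving_ip):
    """Pick the subnet for a request.

    giaddr 0.0.0.0 means the request was not relayed, so the address of the
    interface that received it decides; otherwise giaddr decides.
    """
    key = ipaddress.ip_address(giaddr)
    if key == ipaddress.ip_address("0.0.0.0"):
        key = ipaddress.ip_address(receiving_ip)
    for subnet, lease_range in POOLS.items():
        if key in subnet:
            return str(subnet), lease_range
    return None  # no matching subnet declaration


relayed = select_pool("10.10.6.1", "172.16.3.249")
direct = select_pool("0.0.0.0", "172.16.3.249")
unknown = select_pool("192.168.50.1", "172.16.3.249")

print(relayed)
print(direct)
print(unknown)
```

A relayed request carrying giaddr 10.10.6.1 lands in the 10.10.6.0/24 pool even though the server itself only has an address in 172.16.3.0/24.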

If you want to use Cisco ISR as Relay Agent:

  1. Setup interface which will be used to interconnect DHCP Relay Agent and DHCP Server:
    1. conf term
    2. int fa0/1 # DHCP Server facing interface
    3. ip address 172.16.3.4 255.255.255.0
  2. Setup interface which will use DHCP Relay Agent and enable IP-helper (DHCP Server IP address) on that interface - all DHCP messages will be forwarded to that IP address:
    1. int fa0/0
    2. ip address 10.10.6.1 255.255.255.0
    3. ip helper-address 172.16.3.249
    4. do wr
  3. Check configuration:
    1. show ip int fa0/0
  4. Also we need to configure static route on the DHCP Server if DHCP Relay Agent is not default gateway for the DHCP Server:
    1. ip route add 10.10.6.0/24 via 172.16.3.4 # this setup is not persistent; to make it persistent, create a route file for the needed interface
Because we use macvlan for the Docker container, you need to enable IP forwarding on the Docker host:
echo 1 > /proc/sys/net/ipv4/ip_forward
This setting is not persistent; to make it persistent:


  1. sudo vi /etc/sysctl.conf and add net.ipv4.ip_forward = 1
  2. sudo sysctl -p



If you want to use CentOS 7 as Relay Agent:
  1. Setup interface which needs to use DHCP Relay Agent:
    1. vi ifcfg-eth0
      1. IPADDR=10.10.6.1 
      2. PREFIX=24
    2. vi ifcfg-eth1  # DHCP Server facing interface
      1. IPADDR=172.16.3.4
      2. PREFIX=24
    3. yum install dhcp # dhcp-relay is part of dhcp package
    4. cp /usr/lib/systemd/system/dhcrelay.service /etc/systemd/system
    5. vi /etc/systemd/system/dhcrelay.service
      1. under [Service]
      2. append IP address of the DHCP server to the ExecStart after --no-pid:
        1. ExecStart=/usr/sbin/dhcrelay -d --no-pid 172.16.3.249
        2. Also you can choose interfaces to activate DHCP Relay on them (by default all interfaces are used). You must use separate "-i" option for each additional interface:
          1. ExecStart=/usr/sbin/dhcrelay -d --no-pid 172.16.3.249 -i eth1 -i eth2.20
    6. systemctl --system daemon-reload
    7. systemctl start dhcrelay
    8. systemctl enable dhcrelay
    9. systemctl status dhcrelay
    10. Also we need to configure a static route on the DHCP Server if the DHCP Relay Agent is not the default gateway for the DHCP Server:
      1. ip route add 10.10.6.0/24 via 172.16.3.4 # this setup is not persistent; to make it persistent, create a route file for the needed interface
    If you want to use CentOS 6 as Relay Agent:
    1. Setup interface which needs to use DHCP Relay Agent:
      1. vi ifcfg-eth0
        1. IPADDR=10.10.6.1 
        2. NETMASK=255.255.255.0
      2. vi ifcfg-eth1  # DHCP Server facing interface
        1. IPADDR=172.16.3.4
        2. NETMASK=255.255.255.0
      3. yum install dhcp # dhcp-relay is part of dhcp package
      4. vi /etc/sysconfig/dhcrelay
        1. INTERFACES="eth1 eth2.20" # which interfaces must use the DHCP Relay Agent
        2. DHCPSERVERS="172.16.3.249" # DHCP server IP address
      5. service dhcrelay start
      6. chkconfig dhcrelay on
      7. service dhcrelay status
      8. Also we need to configure a static route on the DHCP Server if the DHCP Relay Agent is not the default gateway for the DHCP Server:
        1. ip route add 10.10.6.0/24 via 172.16.3.4 # this setup is not persistent; to make it persistent, create a route file for the needed interface

      Interface with DHCP relay must use static IP address (no DHCP is allowed).

      dhcp.conf
      # this server is the primary and authoritative server on that network
      authoritative;
      # dhcpd listens *only* on interfaces for which it finds subnet declaration in dhcpd.conf
      # empty declaration for local IP subnet to start listening on eth0 interface
      subnet 172.16.3.0 netmask 255.255.255.0 { }

      subnet 10.10.6.0 netmask 255.255.255.0 {
              range 10.10.6.2 10.10.6.3;
              option routers 10.10.6.1;
              #option domain-name-servers 8.8.8.8, 8.8.4.4;
          }

      to kill process on container:
      top > k > PID > Enter

      dhcpd -cf dhcp.conf

        Tuesday, July 23, 2019

        Docker Networking

        Normally, Docker creates a new network namespace for each container we run. As we attach the container to a network, we define an endpoint that connects the container network namespace with the actual network. This way, we have one container per network namespace. Docker provides an additional way to define the network namespace in which a container runs. When creating a new container, we can specify that it should be attached to or maybe we should say included in the network namespace of an existing container. With this technique, we can run multiple containers in a single network namespace.

        When you install docker it creates 3 networks automatically:
        1. bridge
          1. docker-host NIC goes to promiscuous mode (allows all L2 packets without checking the destination MAC; in other words, MAC filtering is disabled)
          2. actually docker bridge is a switch inside docker host, this switch interconnects docker-host and docker-container
          3. network used by default when you run a container
          4. containers in this network can communicate with each other
          5. containers assigned IP from 172.17.0.0/16 subnet
          6. to reach the outside world you must use port-mapping to the docker-host IP
          7. Overview:
            1. new network namespace created for container
            2. docker0 bridge is automatically created and attached to the docker-host NIC (docker-host namespace)
            3. veth (Virtual Ethernet) interface:
              1. automatically created
              2. attached to the docker0 bridge
              3. attached to the container NIC
              4. veth interface is like media/cable connecting docker0-bridge/switch port to the container NIC
        2. none
          1. to use container without network: --network=none
          2. the container is not attached to any network and cannot communicate with any other container
        3. host
          1. to use host network: --network=host
          2. in this case container uses the same IP as docker-host uses
          3. ports are shared between docker-host and all containers connected to the "host" network
          4. container has direct access to the docker-host's NIC
        To create a custom network:
        docker network create \
             --driver bridge \
             --subnet 192.168.190.0/24 \
             custom_isolated_network
        List all docker networks:
        docker network ls
        To view bridges only:
        brctl show


        Other types of networks supported by docker:
        1. macvlan (requires at least kernel 3.9 on the docker-host) - the docker-host NIC uses unicast filtering, so L2 frames with an unknown destination MAC are discarded (the exception is passthru mode, which uses promiscuous mode)
          1. this type allows you to assign several IP addresses to the same NIC.
          2. macvlan allows you to configure subinterfaces (slave devices) of a parent (master) device
          3. each subinterface has its own randomly generated MAC address and consequently its own IP address
          4. subinterfaces cannot interact directly with the parent interface
          5. to communicate with the parent interface, assign a macvlan subinterface to the docker-host
          6. macvlan subinterfaces are for example mac0@eth0 (this notation clearly identifies subinterface's parent)
          7. The macvlan is a trivial bridge that doesn’t need to do learning as it knows every MAC address it can receive, so it doesn’t need to implement learning or STP, which makes it simple, stupid and fast.
          8. Each subinterface can be in one of 4 modes that affect possible traffic flows (these are macvlan modes and not all of them are available in the macvlan docker driver - currently docker supports only macvlan bridge mode):
            1. Private - traffic goes only from the subinterfaces to the outside; subinterfaces on the same parent cannot communicate with each other. This is not a bridge.
            2. VEPA (Virtual Ethernet Port Aggregator) - this mode needs a VEPA-compatible switch. Subinterfaces of one parent can communicate with each other with the help of the VEPA hardware switch, which sends back all frames where both source and destination are local to the macvlan interface
            3. bridge - all subinterfaces on a parent interface are interconnected with a simple bridge. Frames from one subinterface to another are delivered directly (through the bridge) and not sent out. All MAC addresses are known, so macvlan bridge mode doesn't need STP or MAC learning
            4. passthru - allows a single VM to be connected directly to the physical interface. The advantage of this mode is that the VM is then able to change the MAC address and other interface parameters.
          9. docker network create --driver macvlan --subnet=10.0.0.0/24 --gateway=10.0.0.1  --opt parent=eth0 macvlanNetworkName
            1. gateway - external (not related to the docker-host) gateway
            2. parent - docker-host physical interface
            3. docker-host eth0 can be for example 10.0.0.2
          10. you can also use macvlan with VLAN interfaces. In this case the subinterfaces use different parent interfaces (e.g. eth0.10 and eth0.20) and can communicate with each other only through the gateway:
            1. create VLAN interfaces eth0.10 and eth0.20
            2. docker network create --driver macvlan --subnet=10.0.10.0/24 --gateway=10.0.10.1  --opt parent=eth0.10 macvlan10
            3. docker network create --driver macvlan --subnet=10.0.20.0/24 --gateway=10.0.20.1  --opt parent=eth0.20  macvlan20
            4. docker run --name='container0' --hostname='container0' --net=macvlan10 --ip=10.0.10.2 --detach=true centos
          11. To add an additional IP to a container:
            1. docker network connect --ip=10.0.20.3 macvlan20 container0
          12. How to connect from macvlan subinterface to the host:
            1. The --ip-range option tells Docker's IPAM to allocate container IP addresses from the given sub-range, and --aux-address prevents Docker from assigning the 192.168.1.223 address to a container: 
              1. docker network create -d macvlan -o parent=eno1 --subnet 192.168.1.0/24 --gateway 192.168.1.1 --ip-range 192.168.1.192/27 --aux-address 'host=192.168.1.223' mynet 
            2. Next, we create a new macvlan interface on the host. You can call it whatever you want: 
              1. ip link add mynet-aux link eno1 type macvlan mode bridge
            3. Now we need to configure the interface with the address we reserved and bring it up: 
              1. ip addr add 192.168.1.223/32 dev mynet-aux 
              2. ip link set mynet-aux up
            4. The last thing we need to do is to tell our host to use that interface when communicating with the containers. This is relatively easy because we have restricted our containers to a particular CIDR subset of the local network; we just add a route to that range like this: 
              1. ip route add 192.168.1.192/27 dev mynet-aux 
            5. With that route in place, your host will automatically use this mynet-aux interface when communicating with containers on the mynet network.
            6. the NIC-based settings above are not persistent and will be lost after a reboot, so add all related settings to the appropriate configuration files (NIC and route)
        2. ipvlan is similar to macvlan but uses the same MAC address for all endpoints (docker containers). It's useful when the switch to which the docker-host is connected restricts the maximum number of MAC addresses per physical port. ipvlan requires at least kernel 4.1 on the docker-host
          An IPAM (IP Address Management) driver lets you delegate IP lease management to an external component. This way you can coordinate IP use with other virtual or bare metal servers in your datacenter.
          Docker controls the IP address assignment for network and endpoint interfaces via the IPAM driver(s). Libnetwork has a default, built-in IPAM driver and allows third party IPAM drivers to be dynamically plugged. On network creation, the user can specify which IPAM driver libnetwork needs to use for the network’s IP address management. For the time being, there is no IPAM driver that would communicate with external DHCP server, so you need to rely on Docker’s default IPAM driver for container IP address and settings configuration. Containers use host’s DNS settings by default, so there is no need to configure DNS servers.
          IPAM driver ensures the container got an IPv4 and an IPv6 address from the subnets configured for the macvlan network.
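The address arithmetic behind the --ip-range and --aux-address options used in the macvlan example above can be checked with Python's ipaddress module. This is just a sanity-check sketch using the addresses from that example; it does not talk to Docker, and subnet_of needs Python 3.7 or newer.

```python
import ipaddress

subnet = ipaddress.ip_network("192.168.1.0/24")      # --subnet
ip_range = ipaddress.ip_network("192.168.1.192/27")  # --ip-range
host_aux = ipaddress.ip_address("192.168.1.223")     # --aux-address host=...

first, last = ip_range[0], ip_range[-1]

print("range size:", ip_range.num_addresses)
print("range:", first, "-", last)
print("range inside subnet:", ip_range.subnet_of(subnet))
print("host address inside range:", host_aux in ip_range)
```

The /27 range covers 32 addresses, 192.168.1.192 through 192.168.1.223, so the address given to the host's mynet-aux interface lies inside the range Docker allocates from. That is why it has to be reserved with --aux-address.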

          If you use Hyper-V:
          Macvlan uses a unique MAC address per Ethernet interface. By default, Hyper-V only allows traffic whose MAC address matches the one bound to the virtual switch port, so we need to "Enable MAC address spoofing" to prevent the virtual switch from dropping the macvlan traffic.

          Docker Images & Dockerfile

          1. creating a new image - can be done when you can't find the needed image on Docker Hub.
          2. Dockerfile:
            1. text file written in a specific format that docker can understand
            2. Every line starts with an instruction (FROM, RUN, COPY etc.) followed by arguments
            3. Each instruction instructs docker to do a specific action
              1. First line starts with a base OS or another image: FROM centos
              2. Then you install needed dependencies, for example:
                1. RUN yum update -y && yum install -y python python-pip
                2. RUN pip install flask flask-mysql
              3. Copy source files from docker-host to the docker-image:
                1. COPY . /opt/source-code
              4. Command to run when image is run as a container:
                1. ENTRYPOINT FLASK_APP=/opt/source-code/app.py flask run
            4. When building docker image, every line of Dockerfile creates layer of the docker image:
              1. For the above example layers are:
                1. layer 1: Base CentOS layer
                2. layer 2: changes in yum packages
                3. layer 3: changes in pip packages
                4. layer 4: source code
                5. layer 5: update entry-point with "flask" command
              2. docker build:
                1. docker build -t nameOfTheImage dockerFileDirectoryName
                2. docker build -t nameOfTheImage .
                3. docker build -f Dockerfile -t nameOfTheImage . # -f names the Dockerfile explicitly; the last argument is always the build context directory
              3. to view build process history: docker history imageName
              4. the layered build process helps with debugging and also lets a failed build resume from the needed layer (this is done automatically using the docker cache). The same is true when you add steps to the Dockerfile: the rebuild uses the cache, so only the affected layers are rebuilt
            5. CMD vs ENTRYPOINT:
              1. CMD defines the command and its parameters (if any) that run when the container starts:
                1. CMD ["mysqld"] or CMD mysqld
                2. CMD ["sleep", "5"] or CMD sleep 5
              2. ENTRYPOINT is like CMD, but any input added to "docker run" is appended to the end of the command as parameters. This works only with the exec (JSON array) form; the shell form "ENTRYPOINT sleep" ignores both CMD and the "docker run" arguments:
                1. ENTRYPOINT ["sleep"]
                2. docker run centos-sleeper 10 # 10 will be appended as a parameter to the "sleep" command
                3. when you add some input to "docker run" and the image uses CMD, the input is not appended to the CMD command - it replaces the CMD command entirely
              3. To use some value as the default parameter for the ENTRYPOINT:
                1. ENTRYPOINT ["sleep"]
                2. CMD ["10"]
                3. the two lines above give "sleep 10" by default, so CMD is appended to the ENTRYPOINT as a parameter
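The way CMD, ENTRYPOINT and the extra "docker run" arguments combine can be written down as a small Python function. This is a model of the rules described above, not Docker code: the function name container_argv is made up, it covers only the exec (JSON array) form, and it ignores the --entrypoint option.

```python
def container_argv(entrypoint, cmd, run_args):
    """Return the argument list a container starts with.

    entrypoint, cmd: exec-form lists from the Dockerfile (None when absent).
    run_args: extra words given after the image name on "docker run".
    """
    # Arguments given on "docker run" replace CMD completely.
    tail = run_args if run_args else (cmd or [])
    # ENTRYPOINT, when present, always stays in front.
    return list(entrypoint or []) + list(tail)


# CMD only: the default runs, or is replaced by the "docker run" input.
cmd_default = container_argv(None, ["sleep", "5"], [])
cmd_replaced = container_argv(None, ["sleep", "5"], ["sleep", "10"])

# ENTRYPOINT only: the "docker run" input is appended.
ep_appended = container_argv(["sleep"], None, ["10"])

# ENTRYPOINT plus CMD: CMD is the default parameter, overridable at run time.
ep_default = container_argv(["sleep"], ["10"], [])
ep_override = container_argv(["sleep"], ["10"], ["20"])

for argv in (cmd_default, cmd_replaced, ep_appended, ep_default, ep_override):
    print(argv)
```

The last two cases are the practical reason to combine both instructions: the image runs "sleep 10" by default, while "docker run centos-sleeper 20" runs "sleep 20".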

          DHCP on Docker

          DHCP (Dynamic Host Configuration Protocol) lets us address the hosts on our network dynamically. When a host is configured to get its IP address dynamically, it broadcasts a DHCP Discover message on the network, searching for a DHCP server. The DHCP server has to be in the same broadcast domain as the clients, since routers do not forward broadcast packets.
          1. create macvlan network:
            1. docker network create -d macvlan -o parent=enp3s0 --subnet 172.16.3.0/24 --gateway 172.16.3.4  --aux-address 'host=172.16.3.250' mynet
          2. add macvlan-aux to the docker-host (to ping directly from docker-host) - including ip link, route etc:
            1. ip link add mynet-aux link enp3s0 type macvlan mode bridge
            2. ip addr add 172.16.3.250/32 dev mynet-aux 
          3. run container with macvlan driver (assign static IP) and run /bin/bash:
            1. docker run --name='ctr0' --hostname='ctr0' --net=mynet --ip=172.16.3.249 -it centos /bin/bash
          4. Ping container IP from the Docker-host:
            1. ping 172.16.3.249
          5. on the container (all of this can be done with a Dockerfile):
            1. yum install net-tools -y
            2. yum install dhcp -y
            3. 10.10.6.1 is the DHCP relay IP address
            4. dhcpd listens *only* on interfaces for which it finds subnet declaration in dhcpd.conf
          vi dhcp.conf:
          # this server is the primary and thus the authoritative server on that network
          authoritative;
          subnet 172.16.3.0 netmask 255.255.255.0 {
                     range 172.16.3.1 172.16.3.3;
                     option routers 172.16.3.4;
                     option domain-name-servers 172.16.3.6;
          }     

            Run dhcp service with specified file: dhcpd -cf dhcp.conf

            to kill process on container:
            top > k > PID > Enter

            Tuesday, July 16, 2019

            YAML

            YAML Ain't Markup Language:
            1. YAML uses indentation to distinguish layers (same indentation - same layer) - use spaces, because tabs are not allowed.
            2. A YAML document starts with three dashes: ---
            3. A YAML dictionary is a set of key-value pairs in one of two forms:
              1. colon separated key (like Python dictionary):
                1. key: value
              2. indentation separated key:
                1. key:
                2.       value
              3. Also one key can contain nested dictionary:
                1. Method 1:
                  1. first_level_key:
                  2.    second_level_key_under_the_first_level: second_level_value
                2. Method 2:
                  1. first_level_key: {second_level_key_under_the_first_level: second_level_value}
            4. YAML uses dashes to represent a list of items (use lists when you want a key to have more than one value and the values are not keys themselves):
              1. Method 1:
                1. this_is_a_list:
                2.  - element_1
                3.  - element_2
              2. Method 2:
                1. this_is_a_list: [element_1, element_2]
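As a rough illustration of the notations above, here is the Python data that they describe. Method 1 (block style) and Method 2 (flow style) denote the same dictionary and the same list. No YAML parser is involved in this sketch; the variable names are made up, and the JSON text is printed only because it uses the same braces and brackets as the flow style.

```python
import json

# Data described by both the block-style and the flow-style notation.
nested = {
    "first_level_key": {
        "second_level_key_under_the_first_level": "second_level_value"
    }
}
items = {"this_is_a_list": ["element_1", "element_2"]}

# JSON uses the same {...} and [...] shapes as the flow style (Method 2).
nested_text = json.dumps(nested)
items_text = json.dumps(items)
print(nested_text)
print(items_text)
```

A key with a list value is read back as an ordinary Python list, so `items["this_is_a_list"][0]` is `element_1`.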

            Relational Algebra

            RA is a formal language that underlies implemented query languages such as SQL.
            RA operates on relations and produces relations as result.
            Query (expression) on set of relations produces relation as result. 
            Key is an attribute or a set of attributes whose value is guaranteed to be unique.
            Tuple  - row of a relation.
            Attribute  - column of a relation.
            Relation - table consisting of attributes (columns) and tuples (rows).
            Schema  - relation header (attribute names)
            We'll use a simple college admission database with three relations:
            1. 1st relation: College(schema: cName, state, enrollment)
            2. 2nd relation: Student(schema: sID, sName, GPA, sizeHS) # sizeHS = size of High School a student attended
            3. 3rd relation: Apply(schema: sID, cName, major, decision)
            The simplest query in RA is just a relation name: for example, Student is a valid expression (query) in RA, and it returns a copy of the Student relation.
            Use operators to filter, slice, combine relations:
            1. Select - picks certain rows out of a relation - σ (sigma):
              1. Students with GPA > 3.7 (GPA - Grade Point Average)
                1. σ GPA > 3.7 Student
              2. Students with GPA > 3.7 and sizeHS < 1000 (the caret ^ is the logical AND operator)
                1. σ GPA > 3.7 ^ sizeHS < 1000 Student
              3. Applications to Stanford for a CS major
                1. σ cName="Stanford" ^ major="CS" Apply
              4. Select operator general case:
                1. σ condition Relation
            2. Project operator - picks certain columns:
              1. Apply relation with sID and decision columns only
                1. П sID,decision Apply
              2. Project operator general case:
                1. П col1,col2,col3... Relation
            3. To Select and Project at the same time:
              1. ID and name of students with GPA>3.7:
                1. П sID,sName (σ GPA>3.7 Student)
            4. Duplicates:
              1. SQL is based on multisets/bags, so duplicates are also shown
              2. RA is based on sets, so duplicates are eliminated automatically
            5. Cross-product operator (a.k.a. cross join or Cartesian product) - combines two relations horizontally:
              1. Student x Apply gives a big relation with eight (8) attributes: four from Student plus four from Apply
              2. by convention, when both relations in a cross-product have an attribute with the same name, we prefix the attribute with the name of the relation it came from: Student.sID / Apply.sID
              3. cross-product gives us a relation in which every row of Student is paired with every row of Apply
              4. Names and GPAs of students with sizeHS>1000 who applied to CS and were rejected:
                1. П sName, GPA (σ Student.sID=Apply.sID ^ sizeHS>1000 ^ major="CS" ^ decision="Reject" (Student x Apply))
            6. Difference operator A\B or A-B returns the elements of A that are not in B:
              1. if A = {a, b, c} and B = {b, c, d} then A - B = {a}
              2. IDs of students who didn't apply anywhere:
                1. (П sID Student) - (П sID Apply)
              3. IDs and names of students who didn't apply anywhere (we can't just add sName to the projection of Student, because Apply does not have this attribute and the difference operator cannot be applied to relations with different schemas; instead we join the resulting IDs back to Student):
                1. П sID,sName ( ( (П sID Student) - (П sID Apply) ) ⋈ Student )
            7. Union operator (denoted by ∪) returns the set of all elements that belong to either operand. It is one of the fundamental operations through which sets can be combined and related to each other. Union combines vertically:
              1. A ∪ B means all members of A plus all members of B that are not in A (RA is set based and removes duplicates):
                1. if A = {a, b, c} and B = {b, c, d} then:
                  1. all members of A are {a, b, c}
                  2. all members of B that are not in A: B\A = {d}
                  3. combine both {a, b, c} and {d} => {a, b, c, d}
                  4. so: A ∪ B = {a, b, c, d}
              2. List of college and student names:
                1. (П cName College) ∪ (П sName Student)
            8. Natural join operator performs a cross-product and then enforces equality on all attributes with the same name (as in the example above: Student.sID=Apply.sID); it also eliminates one copy of each duplicated attribute:
              1. Names and GPAs of students with sizeHS>1000 who applied to CS and were rejected:
                1. П sName, GPA (σ sizeHS>1000 ^ major="CS" ^ decision="Reject" (Student ⋈ Apply))
              2. Names and GPAs of students with sizeHS>1000 who applied to CS and were rejected by colleges with enrollment greater than 20000:
                1. П sName, GPA (σ sizeHS>1000 ^ major="CS" ^ decision="Reject" ^ enrollment>20000 (Student ⋈ (Apply ⋈ College)))
              3. Relation between ⋈ and x:
                1. E1 ⋈ E2 => П schema(E1) ∪ schema(E2) (σ E1.a1=E2.a1 ^ E1.a2=E2.a2 ^ ... (E1 x E2))
            9. Theta join - operator takes two expressions/relations and combines them with the bow-tie operator (⋈) carrying a subscript theta (θ), which stands for a select condition (any select condition you want):
              1. E1 ⋈θ E2 = σθ (E1 x E2)
              2. Theta join is the basic operation implemented in RDBMSs, so the term "join" often means theta join
            10. Intersection operator -  A ∩ B of two sets A and B is the set that contains all elements of A that also belong to B (or equivalently, all elements of B that also belong to A), but no other elements:
              1. if A = {1, 2, 3} and B = {2, 3, 4} then A ∩ B = {2, 3}
              2. Names that are both college name and student name:
                1. (П sName Student) ∩ (П cName College)
              3. Expressing intersection via difference
                1. A ∩ B => A - ( A - B ) 
              4. Expressing intersection via natural join:
                1. A ∩ B => A ⋈ B (A and B have the same schema, so the join enforces equality on every attribute)
            11. Rename operator uses ρ (rho); it reassigns the schema of the result of an expression (relation). Above (in 7.2.1 and 10.2.1) we used set operators on relations with different attribute names, i.e. different schemas. In practice RA doesn't allow that, so we need the rename operator:
              1. General form:
                1. ρ R(A1,A2,...,An) (E) # name the result of expression E as R, with attributes A1 to An
              2. Unify schemas for set operators:
                1. List of college and student names (7.2.1):
                  1. ρ C1(name) (П cName College) ∪ ρ C2(name) (П sName Student)
                2. Names that are both college name and student name (10.2.1):
                  1. ρ C1(name) (П sName Student) ∩ ρ C2(name) (П cName College)
              3. for disambiguation in self-joins:
                1. pairs of colleges in same state:
                  1. With cross-join
                    1. σ s1=s2 ^ n1!=n2 ( ρ C1(n1,s1,e1) College x ρ C2(n2,s2,e2) College )
                  2. With natural join (both copies keep the shared attribute name s, so the join enforces the same state):
                    1. σ n1!=n2 ( ρ C1(n1,s,e1) College ⋈ ρ C2(n2,s,e2) College )
            12. Select, project, cross-product, union, difference and rename are the basic RA operators
            13. Natural join, theta join and intersection are not basic RA operators: they can be expressed using the basic operators, so they are actually abbreviations
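The operators above can be sketched in a few lines of Python. This is only an illustration under some assumptions: a relation is modeled as a list of dict rows, the sample students and applications are made up, and the helper names (select, project, natural_join, difference, dedupe) are hypothetical, not part of any library.

```python
from itertools import product

# Hypothetical sample data; a relation is a list of dict rows.
Student = [
    {"sID": 1, "sName": "Amy", "GPA": 3.9, "sizeHS": 1200},
    {"sID": 2, "sName": "Bob", "GPA": 3.6, "sizeHS": 1500},
    {"sID": 3, "sName": "Craig", "GPA": 3.5, "sizeHS": 500},
]
Apply = [
    {"sID": 1, "cName": "Stanford", "major": "CS", "decision": "Accept"},
    {"sID": 2, "cName": "Stanford", "major": "CS", "decision": "Reject"},
]

def dedupe(rows):
    """RA is set based: keep one copy of each distinct row."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def select(cond, rel):
    """sigma: keep the rows that satisfy cond."""
    return [row for row in rel if cond(row)]

def project(cols, rel):
    """Pi: keep only the named columns, then drop duplicates."""
    return dedupe([{c: row[c] for c in cols} for row in rel])

def natural_join(r1, r2):
    """Cross-product, equality on shared attribute names, one copy kept."""
    out = []
    for a, b in product(r1, r2):
        shared = set(a) & set(b)
        if all(a[k] == b[k] for k in shared):
            out.append({**a, **b})
    return dedupe(out)

def difference(r1, r2):
    """Rows of r1 that do not appear in r2 (same schema assumed)."""
    return [row for row in dedupe(r1) if row not in r2]

# Names and GPAs of students with sizeHS>1000 who applied to CS and were rejected.
rejected = project(
    ["sName", "GPA"],
    select(
        lambda t: t["sizeHS"] > 1000 and t["major"] == "CS" and t["decision"] == "Reject",
        natural_join(Student, Apply),
    ),
)

# IDs and names of students who didn't apply anywhere.
no_apps = project(
    ["sID", "sName"],
    natural_join(difference(project(["sID"], Student), project(["sID"], Apply)), Student),
)
print(rejected)
print(no_apps)
```

When the two inputs of natural_join share no attribute names, the equality test is vacuously true and the function degrades to a plain cross-product, which matches the relation between ⋈ and x given in item 8.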

            These materials were used to write this synopsis-post:

            Wednesday, July 10, 2019

            Hash Functions, Binary Tree, O(n)

            Hash Function

            A hash function is any function that can be used to map data of arbitrary size to data of fixed size (N). A simple example is h(x) = x mod N, where mod is the modulus, i.e. the remainder of division: 5 / 5 = 1 with no remainder, so 5 mod 5 = 0; 6 / 5 = 1 with remainder 1, so 6 mod 5 = 1. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. A hash function is also known as a hashing algorithm or message digest function. Hash functions are often used in combination with a hash table (which consists of a hash function h and an array, also called a table, of size N), a common data structure used in computer software for rapid data lookup. Hashing is used for indexing and locating items in databases because it is easier to find the shorter hash value than the longer string. Hashing is also used in cryptography, where hash functions are designed to be one-way: it should be infeasible to recover the input from the hash. With a hash table no sorting and no searching is required: once you compute the hash function, you know where to store the data and where to find it. The main idea is to store each key-element pair (k, e) at index h(k):
            Example:
            1. phone book with 5 numbers in it (N = 5)
            2. h(name) = (length of name) mod 5
            3. This function is OK if all names in the phone book have different lengths; if some lengths are the same, a collision occurs (a collision is when two different inputs to the hash function are mapped to the same hash value):
              1. h(John) = 4 mod 5 = 4
              2. h(Jack) = 4 mod 5 = 4
              3. So h(John) = h(Jack)
              4. because it produces many collisions, this is an example of a bad hash function
              5. collisions are possible in every hash function, but a hash function must be designed to minimize their probability. To do so, hash functions produce hash values that are long enough to make collisions unlikely, yet short enough to be computed quickly.
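The phone-book example can be sketched as a tiny hash table. The function h is the one from the example (length of the name mod 5). How collisions are handled is not stated in the text, so this sketch assumes separate chaining (each table slot holds a list of pairs); the phone numbers and the helper names put and get are made up.

```python
N = 5  # table size from the phone-book example

def h(name):
    """Toy hash from the example: length of the name modulo N."""
    return len(name) % N

# Separate chaining (an assumption, not stated in the text):
# each slot of the table holds a list of (key, element) pairs.
table = [[] for _ in range(N)]

def put(name, number):
    bucket = table[h(name)]
    for i, (k, _) in enumerate(bucket):
        if k == name:  # key already stored: overwrite it
            bucket[i] = (name, number)
            return
    bucket.append((name, number))

def get(name):
    for k, v in table[h(name)]:
        if k == name:
            return v
    return None

put("John", "555-0101")  # made-up numbers
put("Jack", "555-0102")
put("Alice", "555-0103")
print(h("John"), h("Jack"), h("Alice"))
print(get("Jack"))
```

Both "John" and "Jack" have four letters, so they land in the same slot, and a lookup has to walk that slot's short list. The more collisions a hash function produces, the longer those lists get and the slower lookups become.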

            Binary Tree

            In computer science, a binary tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child. The topmost node is called the root; it is at level zero (L-0), i.e. at depth 0. Each child node in a binary tree defines a subtree of which it is the root.
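A minimal sketch of such a tree in Python follows. The class and function names are illustrative, and it assumes the common convention that a tree consisting of a single node has height 0.

```python
class Node:
    """Binary tree node with at most two children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    """Edges on the longest downward path; a single node has height 0."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def levels(root):
    """Group node values by level, with the root at level 0."""
    result = []
    current = [root] if root is not None else []
    while current:
        result.append([n.value for n in current])
        # Collect the existing children, left before right.
        current = [c for n in current for c in (n.left, n.right) if c is not None]
    return result

# A is the root; B and C are its children; D is the left child of B.
tree = Node("A", Node("B", Node("D")), Node("C"))
print(levels(tree))
print(height(tree))
```

Calling height(tree.left) treats node B as the root of its own subtree, which is the sense in which every child defines a subtree.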

            Big O notation

            In computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows. The actual notation is O(f(n)), meaning: as the parameter n (the amount of input to the algorithm) increases, the running time of the algorithm grows no faster than some constant multiplied by f(n). How to find the big-O of some expression (for example 3n^2 + n^5 + 4n + 5 + 2^n + log8(n)):
            1. omit constants and constant multipliers (3n^2+ n^5 + 4n + 5 + 2^n + log8(n) => n^2+ n^5 + n + 2^n + log8(n))
            2. n^a grows faster than n^b for a > b. In other words if you have n^3 - omit n^2 (n^2+ n^5 + n + 2^n + log8(n) => n^5 + 2^n + log8(n) )
            3. any polynomial grows faster than any logarithm, so n or even sqrt(n), grows faster than log3(n)  ( n^5 + 2^n + log8(n) => n^5 + 2^n )
            4. any exponential grows faster than any polynomial, so 3^n grows faster than n^5 ( n^5 + 2^n  => 2^n )
            5. So 3n^2 + n^5 + 4n + 5 + 2^n + log8(n) = O(2^n)
            6. None of the above means that nobody cares about constants - in practice, making an algorithm twice as fast can be very hard but worthwhile; still, it's much more reasonable to find the approximate growth rate first
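A quick numeric check of the simplification steps: if 2^n really dominates, the whole expression divided by 2^n should approach 1 as n grows. The function names below are made up for this sketch.

```python
import math

def f(n):
    """The example expression: 3n^2 + n^5 + 4n + 5 + 2^n + log8(n)."""
    return 3 * n**2 + n**5 + 4 * n + 5 + 2**n + math.log(n, 8)

def ratio(n):
    """f(n) divided by the claimed dominant term 2^n."""
    return f(n) / 2**n

for n in (10, 30, 60, 100):
    print(n, ratio(n))
```

For small n the polynomial term n^5 is still the largest, so the ratio starts far above 1; it then falls towards 1 as the exponential takes over. This is why big O describes behavior for large inputs only.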

            Monday, July 8, 2019

            Draft: Relax-and-Recover

            Backup to USB and restore from USB

            sudo yum install git syslinux syslinux-extlinux kernel-devel
            git clone https://github.com/rear/rear.git
            cd rear/
            insert a USB stick into the computer being backed up
            lsblk # to find the device name of the USB flash drive
            umount /dev/sdb1 # unmount if the USB flash drive was mounted automatically
            sudo usr/sbin/rear format /dev/sdb
            type 'Yes' to format the USB flash drive
            rear will label the formatted drive REAR-000
            edit rear configuration:
            vi etc/rear/local.conf
            ### write the rescue initramfs to USB and update the USB bootloader
            OUTPUT=USB
            ### create a backup using the internal NETFS method, using 'tar'
            BACKUP=NETFS
            ### write both rescue image and backup to the device labeled REAR-000
            BACKUP_URL=usb:///dev/disk/by-label/REAR-000
            Create the rescue image (it contains no OS backup; it is used to boot and restore the OS after a failure) with verbose output:
            sudo usr/sbin/rear -v mkrescue
            Now reboot your system and try to boot from the USB device. If it boots, the rescue image is OK and you can back up the OS data along with creating the rescue media:
            sudo usr/sbin/rear -v mkbackup