IT Stuff: July 2019

Monday, July 29, 2019

CAP Theorem

CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

Consistency: every non-failing node in the distributed system holds the same data, so an update must be applied across the whole system before further reads are allowed.
Availability: the term is used in two different senses:

Availability of a real service: measured as the percentage of time the service is working out of the total (working plus non-working) time
Availability in the context of the CAP theorem: for a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. Data must therefore be replicated between the nodes of the system, and a server is not allowed to ignore client requests.

Partition tolerance: the system continues to operate even if any one part of the system is lost or fails. Partition tolerance doesn’t require every node still be available to handle requests. It just means that partitions may occur. If you deploy on a typical IP network, partitions will occur; partition tolerance in these environments is not optional. So only a total network failure can cause a system to respond incorrectly.

So in practice every distributed system that uses a network must tolerate partitions (P), which leaves two possible types of systems: AP or CP. For systems that do not use a network, the AC, AP and CP models are all possible.

Conventional databases assume no partitioning - clusters were assumed to be small and local (CA).

NoSQL systems may sacrifice consistency.

AP or CP:

Systems that allow reads before all the nodes are updated give high Availability.
Systems that lock all the nodes before allowing reads give Consistency.

Description of the CAP theorem:

Setup:

we have a distributed system consisting of 2 servers - S1 and S2
S1 and S2 are interconnected
client C connects to both S1 and S2
C can query either of these servers (S1 or S2)
S1 and S2 keep track of a variable v with initial value 0 (v=0)
a write goes from C to S1 or S2 (write request and write response), and a read goes from C to S1 or S2 (read request and read response)

Consistency:

consistent system:

C write-request S1 => v=1
S1 write => v=1
S1 write-response C => v=1
S1 update S2
S2 update => v=1

inconsistent system:

C write-request S1 => v=1
S1 write => v=1
S1 write-response C => v=1
S2 is not updated and v on S2 is still v=0

Partition:

When a partition occurs, S1 and S2 are no longer interconnected
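The setup above can be played through in a few lines of Python. This is only a toy model under stated assumptions: the `Server` class, the `write`/`read` helpers and the `link_up` and `mode` parameters are hypothetical names invented for illustration, and real data stores handle partitions in far more elaborate ways.

```python
class Server:
    """One replica that stores the variable v (toy model, not a real data store)."""
    def __init__(self, name):
        self.name = name
        self.v = 0          # initial value, as in the setup above
        self.peer = None


class Partitioned(Exception):
    """Raised by a CP system that refuses a write during a partition."""


def write(server, value, link_up, mode):
    """Client write to `server`; mode is "CP" or "AP" (assumed labels)."""
    if link_up:
        server.v = value
        server.peer.v = value   # the other server is updated before answering
        return "ok"
    if mode == "CP":
        # consistency first: refuse instead of letting replicas diverge
        raise Partitioned("peer unreachable, write rejected")
    server.v = value            # availability first: answer, peer stays stale
    return "ok"


def read(server):
    return server.v


s1, s2 = Server("S1"), Server("S2")
s1.peer, s2.peer = s2, s1

write(s1, 1, link_up=True, mode="CP")
healthy = (read(s1), read(s2))            # both servers agree

write(s1, 2, link_up=False, mode="AP")
ap_result = (read(s1), read(s2))          # S2 still returns the old value

try:
    write(s1, 3, link_up=False, mode="CP")
    cp_result = "accepted"
except Partitioned:
    cp_result = "rejected"

print(healthy, ap_result, cp_result)
```

During the partition the AP variant keeps answering but the two servers disagree, while the CP variant keeps the servers in agreement by refusing the write.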

Thursday, July 25, 2019

DB Basics, Cross, Natural, Inner, Outer, Theta Join

In this blog-post I'll try to go from formal notions in Relational Algebra to the practical SQL using the same queries as in https://it-tuff.blogspot.com/2019/07/relational-algebra-db-basics-select.html.

Prerequisites for practical learning:

install a mysql or mariadb server
An RA Relation is a table in SQL, and tables live in a database:

CREATE DATABASE Test;
USE Test;
SHOW DATABASES;

An RA key is a PRIMARY KEY in SQL, an RA Attribute is a column in SQL and an RA Tuple is a row in SQL. To fill a table we must first create its schema:

Data types:

VARCHAR - used for storing alphabetic or mixed alphanumeric data
INTEGER - stores whole numbers from about -2 billion to about +2 billion
DECIMAL - stores whole and non-whole numbers; you must specify the total length of the number and the length of the fractional part: DECIMAL(10,4) is a number 10 digits long with 4 digits after the decimal point
after the data type you must give the expected maximum length of the data

CREATE TABLE College (cName VARCHAR(255), PRIMARY KEY (cName) , state VARCHAR(10), enrollment INTEGER);
SHOW TABLES;
CREATE TABLE Student (sID INTEGER, PRIMARY KEY(sID), sName VARCHAR(255), GPA DECIMAL(4,2), sizeHS INTEGER); # HS = High School
SHOW TABLES;
CREATE TABLE Apply (sID INTEGER, PRIMARY KEY(sID), cName VARCHAR(255), major VARCHAR(255), decision VARCHAR(20));
SHOW TABLES;

Now fill tables with test data:

INSERT INTO College (cName, state, enrollment) VALUES ("Amridge", "AL", 749), ("Berkeley", "CA", 42159), ("Stanford", "CA", 43797), ("Wyoming", "WY", 2024), ("Harcum", "PA", 1425);
INSERT INTO Student (sID, sName, GPA, sizeHS) VALUES (1001, "Nita Millwood", 3.2, 900), (1002, "Vincenzo Lyons", 3.8, 750), (1003, "Zachery Lefebvre", 2.9, 1500), (1004, "Wilbert Chan", 3.6, 1620), (1005, "Mirna Hamann", 3.9, 1000), (1006, "Delta Shutt", 2.5, 1300), (1007, "Ryan Lacefield", 3.1, 1460);
INSERT INTO Apply (sID, cName, major, decision) VALUES (1001, "Amridge", "BA", "accept"), (1002, "Berkeley", "CS", "accept"), (1003, "Houston", "CE" ,"reject"), (1004, "Berkeley", "CS", "reject"), (1005, "Stanford", "CS", "accept");

Practicing SQL:

In SQL RA Select and Project are combined into one operator SELECT:

right after select we write Projection part (* means all columns/attributes)
after Projection part we write FROM and then write table/relation name
after table name we write WHERE with needed column/attribute parameters - this is condition of the Selection
RA ^ (logical and) is AND in SQL
Students with GPA>3.7:

SELECT * FROM Student WHERE GPA > 3.7;

Applications to Stanford for the CS major:

SELECT * FROM Apply WHERE cName="Stanford" AND major="CS";

ID and name of students with GPA>3.7:

SELECT sID,sName FROM Student WHERE GPA > 3.7;

In SQL the RA Cross-Product is CROSS JOIN. In MySQL, CROSS JOIN and INNER JOIN are the same; in Oracle you can't specify an ON clause for CROSS JOIN (only WHERE is allowed), while Oracle INNER JOIN allows an ON clause. A theta join can also be written as a join that uses only a WHERE condition instead of ON or USING:

Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected:

To deeply understand this we'll compose this query step by step:
First we'll find all students:

SELECT * FROM Student ;

Now we need to find applications of all students (cross-product):

SELECT * FROM Student CROSS JOIN Apply ;

Previous query must be filtered by the condition Student.sID=Apply.sID:

SELECT * FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID ;

Add sizeHS > 1000 condition:

SELECT * FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID AND sizeHS>1000;

Add two other conditions - major="CS" and decision="reject":

SELECT * FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID AND sizeHS>1000 AND major="CS" AND decision="reject" ;

Now make projection to select only sName and GPA:

SELECT sName, GPA FROM Student CROSS JOIN Apply WHERE Student.sID=Apply.sID AND sizeHS>1000 AND major="CS" AND decision="reject" ;
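The step-by-step narrowing above can be checked without a MySQL server. The sketch below uses Python's built-in sqlite3 module with the same test data. Treat it as an approximation: SQLite stands in for MySQL here, the column types are simplified to TEXT/REAL, string literals use single quotes, and the `count` helper is a name made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT, GPA REAL, sizeHS INTEGER);
CREATE TABLE Apply (sID INTEGER PRIMARY KEY, cName TEXT, major TEXT, decision TEXT);
INSERT INTO Student VALUES
  (1001, 'Nita Millwood', 3.2, 900), (1002, 'Vincenzo Lyons', 3.8, 750),
  (1003, 'Zachery Lefebvre', 2.9, 1500), (1004, 'Wilbert Chan', 3.6, 1620),
  (1005, 'Mirna Hamann', 3.9, 1000), (1006, 'Delta Shutt', 2.5, 1300),
  (1007, 'Ryan Lacefield', 3.1, 1460);
INSERT INTO Apply VALUES
  (1001, 'Amridge', 'BA', 'accept'), (1002, 'Berkeley', 'CS', 'accept'),
  (1003, 'Houston', 'CE', 'reject'), (1004, 'Berkeley', 'CS', 'reject'),
  (1005, 'Stanford', 'CS', 'accept');
""")


def count(where=""):
    """Number of rows of Student CROSS JOIN Apply that survive the filter."""
    sql = "SELECT COUNT(*) FROM Student CROSS JOIN Apply " + where
    return conn.execute(sql).fetchone()[0]


cross_rows = count()                                   # every Student/Apply pairing
matched_rows = count("WHERE Student.sID = Apply.sID")  # only real applications
big_hs_rows = count("WHERE Student.sID = Apply.sID AND sizeHS > 1000")

final = conn.execute("""
    SELECT sName, GPA FROM Student CROSS JOIN Apply
    WHERE Student.sID = Apply.sID AND sizeHS > 1000
      AND major = 'CS' AND decision = 'reject'
""").fetchall()

print(cross_rows, matched_rows, big_hs_rows, final)
```

Each added condition shrinks the result: the full product of 7 students and 5 applications, then the matching pairs, then the pairs from large high schools, and finally the rejected CS applications.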

RA Union in SQL is UNION - this operator is used to make composition of the results of two (or more) select statements:

List of college and student names:

SELECT cName FROM College
UNION
SELECT sName FROM Student;

RA Rename operator is AS in SQL:

List of college and student names under the name Names:

SELECT cName AS Names FROM College
UNION
SELECT sName FROM Student;

for disambiguation in self-joins (when relation/table is joined with itself):

pairs of colleges in same state (we name 1st call of College table C1, and the second - C2):
Only renaming tables:

SELECT *
FROM College AS C1
CROSS JOIN College AS C2
WHERE
C1.state=C2.state AND
C1.cName != C2.cName;

Renaming tables and columns:

SELECT C1.cName AS C1, C2.cName AS C2, C1.state
FROM College AS C1
CROSS JOIN College AS C2
WHERE C1.state=C2.state AND
C1.cName != C2.cName;

The natural join operator performs a cross-product and then enforces equality on all attributes with the same name (as in the cross-join example above: Student.sID=Apply.sID); it also eliminates one copy of the duplicate attributes:

Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected:

SELECT sName, GPA
FROM Student
NATURAL JOIN Apply
WHERE sizeHS>1000 AND
major="CS" AND
decision="reject";

The same with column and table renaming:

SELECT St.sName, St.GPA
FROM Student AS St
NATURAL JOIN Apply AS Ap
WHERE St.sizeHS>1000 AND
Ap.major="CS" AND
Ap.decision="reject";

Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected by colleges with an enrollment greater than 20000:

Using table rename and two select statements:

SELECT S.sName, S.GPA
FROM Student AS S
NATURAL JOIN
(SELECT *
FROM Apply AS A
NATURAL JOIN College AS C
WHERE C.enrollment>20000 AND
A.major="CS" AND
A.decision="reject") AS A
WHERE S.sizeHS>1000;

Using several natural joins in one Select:

SELECT sName, GPA
FROM Student
NATURAL JOIN Apply
NATURAL JOIN College
WHERE sizeHS>1000 AND
major="CS" AND
decision="reject" AND
enrollment>20000;
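As a rough check of the three-table natural join with the enrollment condition, here is a sketch using Python's built-in sqlite3 module and the test data from this post. SQLite stands in for MySQL, and the column types are simplified, so treat it as an illustration rather than the exact MySQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE College (cName TEXT PRIMARY KEY, state TEXT, enrollment INTEGER);
CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT, GPA REAL, sizeHS INTEGER);
CREATE TABLE Apply (sID INTEGER PRIMARY KEY, cName TEXT, major TEXT, decision TEXT);
INSERT INTO College VALUES
  ('Amridge', 'AL', 749), ('Berkeley', 'CA', 42159), ('Stanford', 'CA', 43797),
  ('Wyoming', 'WY', 2024), ('Harcum', 'PA', 1425);
INSERT INTO Student VALUES
  (1001, 'Nita Millwood', 3.2, 900), (1002, 'Vincenzo Lyons', 3.8, 750),
  (1003, 'Zachery Lefebvre', 2.9, 1500), (1004, 'Wilbert Chan', 3.6, 1620),
  (1005, 'Mirna Hamann', 3.9, 1000), (1006, 'Delta Shutt', 2.5, 1300),
  (1007, 'Ryan Lacefield', 3.1, 1460);
INSERT INTO Apply VALUES
  (1001, 'Amridge', 'BA', 'accept'), (1002, 'Berkeley', 'CS', 'accept'),
  (1003, 'Houston', 'CE', 'reject'), (1004, 'Berkeley', 'CS', 'reject'),
  (1005, 'Stanford', 'CS', 'accept');
""")

# Student and Apply share sID, so the natural join matches rows on sID
two_tables = conn.execute(
    "SELECT COUNT(*) FROM Student NATURAL JOIN Apply").fetchone()[0]

# Joining College as well matches on cName; the application to Houston
# drops out because Houston has no row in College
three_tables = conn.execute(
    "SELECT COUNT(*) FROM Student NATURAL JOIN Apply NATURAL JOIN College"
).fetchone()[0]

result = conn.execute("""
    SELECT sName, GPA
    FROM Student NATURAL JOIN Apply NATURAL JOIN College
    WHERE sizeHS > 1000 AND major = 'CS'
      AND decision = 'reject' AND enrollment > 20000
""").fetchall()

print(two_tables, three_tables, result)
```

Note that a natural join silently drops rows with no partner in the other table, which is why the row count falls when College is added.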

The RA Difference operator can be simulated with LEFT JOIN in MySQL (a left join adds the matching rows from the right side to the left side; if there is no match on the right side, NULL values are used). Here you must specify which columns are used for matching:

IDs of students who didn't apply anywhere:
We can use ON Student.sID = Apply.sID:

SELECT Student.sID
FROM Student
LEFT JOIN Apply
ON Student.sID=Apply.sID
WHERE Apply.sID IS NULL;

When both columns have the same name, "ON Student.sID=Apply.sID" can be written as USING(sID):

SELECT Student.sID
FROM Student
LEFT JOIN Apply
USING(sID)
WHERE Apply.sID IS NULL;
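To see that the LEFT JOIN trick really computes a set difference, the sketch below runs it next to an explicit difference. It uses Python's built-in sqlite3 module with cut-down versions of the tables (only the columns that matter here). The EXCEPT query is added purely for comparison and is an assumption of this example, not part of the post's MySQL recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT);
CREATE TABLE Apply (sID INTEGER PRIMARY KEY, cName TEXT);
INSERT INTO Student VALUES
  (1001, 'Nita Millwood'), (1002, 'Vincenzo Lyons'), (1003, 'Zachery Lefebvre'),
  (1004, 'Wilbert Chan'), (1005, 'Mirna Hamann'), (1006, 'Delta Shutt'),
  (1007, 'Ryan Lacefield');
INSERT INTO Apply VALUES
  (1001, 'Amridge'), (1002, 'Berkeley'), (1003, 'Houston'),
  (1004, 'Berkeley'), (1005, 'Stanford');
""")

# Difference simulated with LEFT JOIN: keep students with no matching Apply row
left_join_ids = [row[0] for row in conn.execute("""
    SELECT Student.sID
    FROM Student
    LEFT JOIN Apply ON Student.sID = Apply.sID
    WHERE Apply.sID IS NULL
    ORDER BY Student.sID
""")]

# The same difference written directly as a set operation
except_ids = [row[0] for row in conn.execute(
    "SELECT sID FROM Student EXCEPT SELECT sID FROM Apply ORDER BY sID")]

print(left_join_ids, except_ids)
```

Both queries return the two students who have no application.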

MySQL RIGHT JOIN works like LEFT JOIN; the difference is that RIGHT JOIN uses the right relation as the main one, while LEFT JOIN uses the left relation as the main one.
FULL JOIN is INNER JOIN + RIGHT JOIN + LEFT JOIN (MySQL has no FULL JOIN keyword; the same result can be obtained with a UNION of a LEFT JOIN and a RIGHT JOIN)
The intersection operator can be simulated in MySQL using a join and DISTINCT (show only unique values):

Names that are both college name and student name:

SELECT DISTINCT(sName)
FROM Student
INNER JOIN College
ON sName=cName;

Inner and Outer joins:

An inner join shows only the data that is present in both the left and the right relation (using ON or USING)
Outer joins use one relation as the main one and complete it with data from the other one; all missing data is filled with NULLs (LEFT and RIGHT joins are LEFT OUTER JOIN and RIGHT OUTER JOIN)

DHCP over Relay on Docker

DHCP (Dynamic Host Configuration Protocol) helps us assign addresses to the hosts on our network dynamically. When a host is configured to get its IP address dynamically, it broadcasts a DHCP Discover message on the network, searching for a DHCP server. The DHCP server has to be in the same broadcast domain as the clients, since routers do not forward broadcast packets.

For a Docker container this means that we must connect the container to each subnet in our company's network. But we want to use just one interface on the container (in this post I'll use macvlan). The problem is:

As our DHCP Client wants to get an IP address, it will send a DHCP Discover message which is a broadcast message. As the Router/Gateway/Firewall do not forward broadcast packets, this message will never reach the DHCP Server (our Docker Container).

To solve this issue we'll use DHCP Relay Agent. This feature is activated on a network device having interfaces in all subnets of the network of the company:

this device (router/gateway/firewall) forwards DHCP messages to the DHCP Server, and when the DHCP Server responds, this device forwards the replies to the Client.
The DHCP Relay Agent sets the giaddr (gateway interface address) field in the DHCP packet. This field contains the IP address of the DHCP Relay Agent interface that received the client's DHCP message, and it also helps the DHCP Server identify the pool from which it has to select an IP address.
After identifying the pool, the DHCP Server replies with a DHCP Offer message, and the DHCP Relay Agent forwards this message to the DHCP Client.
The DHCP Client replies with a DHCP Request message
this message is also forwarded to the DHCP Server by the DHCP Relay Agent
The DHCP Server replies with a DHCP Ack
this message is forwarded to the DHCP Client by the DHCP Relay Agent
finally the DHCP Client is assigned an IP address
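The role of giaddr in the steps above can be illustrated with a short Python sketch. It is only a simplified picture of the idea, not how dhcpd is implemented: the `POOLS` table and the `select_pool` function are hypothetical names, and the subnet and range mirror the relay example used later in this post.

```python
import ipaddress

# Subnets the server has address pools for (values assumed from this post's
# relay example: subnet 10.10.6.0/24 with the range 10.10.6.2 - 10.10.6.3)
POOLS = {
    ipaddress.ip_network("10.10.6.0/24"): ("10.10.6.2", "10.10.6.3"),
}


def select_pool(giaddr):
    """Return (subnet, pool) for the subnet that contains giaddr, else None."""
    relay_ip = ipaddress.ip_address(giaddr)
    for subnet, pool in POOLS.items():
        if relay_ip in subnet:
            return subnet, pool
    return None


# The relay interface 10.10.6.1 received the client's broadcast, so the
# server offers an address from the 10.10.6.0/24 pool
print(select_pool("10.10.6.1"))

# A relay address from a subnet with no pool gets no offer
print(select_pool("192.168.50.1"))
```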

If you want to use Cisco ISR as Relay Agent:

Setup interface which will be used to interconnect DHCP Relay Agent and DHCP Server:

conf term
! DHCP Server facing interface
int fa0/1
ip address 172.16.3.4 255.255.255.0

Setup interface which will use DHCP Relay Agent and enable IP-helper (DHCP Server IP address) on that interface - all DHCP messages will be forwarded to that IP address:

int fa 0/0
ip address 10.10.6.1 255.255.255.0
ip helper-address 172.16.3.249
do wr

Check configuration:

show ip int fa0/0

Also we need to configure static route on the DHCP Server if DHCP Relay Agent is not default gateway for the DHCP Server:

ip route add 10.10.6.0/24 via 172.16.3.4 # not persistent; to make it persistent, create a route file for the needed interface

Because macvlan is used for the Docker container, you need to enable IP forwarding on the Docker host:

echo 1 > /proc/sys/net/ipv4/ip_forward

The previous command is not persistent. To make it persistent:

sudo vi /etc/sysctl.conf and add net.ipv4.ip_forward = 1
sudo sysctl -p

If you want to use CentOS 7 as Relay Agent:

Setup interface which needs to use DHCP Relay Agent:

vi ifcfg-eth0

IPADDR=10.10.6.1
PREFIX=24

vi ifcfg-eth1 # DHCP Server facing interface

IPADDR=172.16.3.4
PREFIX=24

yum install dhcp # dhcp-relay is part of dhcp package
cp /usr/lib/systemd/system/dhcrelay.service /etc/systemd/system
vi /etc/systemd/system/dhcrelay.service

under [Service]
append IP address of the DHCP server to the ExecStart after --no-pid:

ExecStart=/usr/sbin/dhcrelay -d --no-pid 172.16.3.249
Also you can choose interfaces to activate DHCP Relay on them (by default all interfaces are used). You must use separate "-i" option for each additional interface:

ExecStart=/usr/sbin/dhcrelay -d --no-pid 172.16.3.249 -i eth1 -i eth2.20

systemctl --system daemon-reload
systemctl start dhcrelay
systemctl enable dhcrelay
systemctl status dhcrelay

Also we need to configure static route on the DHCP Server if DHCP Relay Agent is not default gateway for the DHCP Server:

ip route add 10.10.6.0/24 via 172.16.3.4 # not persistent; to make it persistent, create a route file for the needed interface

If you want to use CentOS 6 as Relay Agent:

Setup interface which needs to use DHCP Relay Agent:

vi ifcfg-eth0

IPADDR=10.10.6.1
NETMASK=255.255.255.0

vi ifcfg-eth1 # DHCP Server facing interface

IPADDR=172.16.3.4
NETMASK=255.255.255.0

yum install dhcp # dhcp-relay is part of dhcp package
vi /etc/sysconfig/dhcrelay

INTERFACES="eth1 eth2.20" # which interfaces must use DHCP Relay Agent
DHCPSERVERS="172.16.3.249" # DHCP server IP address

service dhcrelay start
chkconfig dhcrelay on
service dhcrelay status

Also we need to configure static route on the DHCP Server if DHCP Relay Agent is not default gateway for the DHCP Server:

ip route add 10.10.6.0/24 via 172.16.3.4 # not persistent; to make it persistent, create a route file for the needed interface

Interface with DHCP relay must use static IP address (no DHCP is allowed).

dhcp.conf

# this server is the primary and authoritative server on that network

authoritative;

# dhcpd listens *only* on interfaces for which it finds subnet declaration in dhcpd.conf

# empty declaration for local IP subnet to start listening on eth0 interface

subnet 172.16.3.0 netmask 255.255.255.0 { }

subnet 10.10.6.0 netmask 255.255.255.0 {

range 10.10.6.2 10.10.6.3;

option routers 10.10.6.1;

#option domain-name-servers 8.8.8.8, 8.8.4.4;

}

to kill process on container:

top > k > PID > Enter

dhcpd -cf dhcp.conf

Tuesday, July 23, 2019

Docker Networking

Normally, Docker creates a new network namespace for each container we run. As we attach the container to a network, we define an endpoint that connects the container network namespace with the actual network. This way, we have one container per network namespace. Docker provides an additional way to define the network namespace in which a container runs. When creating a new container, we can specify that it should be attached to or maybe we should say included in the network namespace of an existing container. With this technique, we can run multiple containers in a single network namespace.

When you install docker it creates 3 networks automatically:

bridge

docker-host NIC goes into promiscuous mode (it accepts all L2 packets without checking the destination MAC; in other words, MAC filtering is disabled)
the docker bridge is actually a switch inside the docker-host; this switch interconnects the docker-host and the docker-containers
this network is used by default when you run a container
containers in this network can communicate with each other
containers are assigned IPs from the 172.17.0.0/16 subnet
to make a container reachable from the outside world you must use port-mapping on the docker-host IP
Overview:

new network namespace created for container
docker0 bridge is automatically created and attached to the docker-host NIC (docker-host namespace)
veth (Virtual Ethernet) interface:

automatically created
attached to the docker0 bridge
attached to the container NIC
veth interface is like media/cable connecting docker0-bridge/switch port to the container NIC

none

to use container without network: --network=none
the container is not attached to any network and cannot communicate with any other container

host

to use host network: --network=host
in this case container uses the same IP as docker-host uses
ports are shared between docker-host and all containers connected to the "host" network
container has direct access to the docker-host's NIC

To create a custom network:

docker network create \
--driver bridge \
--subnet 192.168.190.0/24 \
custom_isolated_network

List all docker networks:

docker network ls

To view bridges only:

brctl show

Other types of networks supported by docker:

macvlan (requires at least kernel 3.9 on the docker-host) - the docker-host NIC uses unicast filtering, so L2 frames with an unknown destination MAC are discarded (the exception is passthru mode, which uses promiscuous mode)

this type allows you to assign several IP addresses to the same NIC.
MAC-VLAN allows you to configure subinterfaces (slave devices) of a parent (master) device
each subinterface has its own randomly generated MAC address and consequently its own IP address
subinterfaces cannot interact directly with parent interface
to communicate with parent interface - assign macvlan subinterface to the docker-host
macvlan subinterfaces are for example mac0@eth0 (this notation clearly identifies subinterface's parent)
The macvlan is a trivial bridge that doesn’t need to do learning as it knows every mac address it can receive, so it doesn’t need to implement learning or stp. Which makes it simple stupid and fast.
Each subinterface can be in one of 4 modes that affect possible traffic flows (these are macvlan modes, and not all of them are available in the macvlan docker driver - currently docker supports only macvlan bridge mode):

Private - traffic goes only from the subinterfaces to the outside; subinterfaces on the same parent cannot communicate with each other. This is not a bridge.
VEPA (Virtual Ethernet Port Aggregator) - this mode needs a VEPA-compatible switch. Subinterfaces of one parent can communicate with each other with the help of the VEPA hardware switch, which returns all frames where both source and destination are local to the macvlan interface
bridge - all subinterfaces on a parent interface are interconnected with a simple bridge. Frames from one subinterface to the other delivered directly (through bridge) and not sent out. All MAC addresses are known so macvlan-bridge doesn't need STP and MAC learning
passthru - allows a single VM to be connected directly to the physical interface. The advantage of this mode is that VM is then able to change MAC address and other interface parameters.

docker network create --driver macvlan --subnet=10.0.0.0/24 --gateway=10.0.0.1 --opt parent=eth0 macvlanNetworkName

gateway - external (not related to the docker-host) gateway
parent - docker-host physical interface
docker-host eth0 can be for example 10.0.0.2

also you can use macvlan with VLAN interfaces. In this case subinterfaces are using different parent interfaces (ex. eth0.10 and eth0.20) and can communicate with each other only over gateway:

create VLAN interface eth0.10 and eth0.20
docker network create --driver macvlan --subnet=10.0.10.0/24 --gateway=10.0.10.1 --opt parent=eth0.10 macvlan10
docker network create --driver macvlan --subnet=10.0.20.0/24 --gateway=10.0.20.1 --opt parent=eth0.20 macvlan20
docker run --name='container0' --hostname='container0' --net=macvlan10 --ip=10.0.10.2 --detach=true centos

To add additional IP to a container:

docker network connect --ip=10.0.20.3 macvlan20 container1

How to connect from macvlan subinterface to the host:

The --aux-address option prevents Docker from assigning the 192.168.1.223 address to a container, and the --ip-range option tells docker IPAM to allocate IP addresses from the given sub-range:

docker network create -d macvlan -o parent=eno1 --subnet 192.168.1.0/24 --gateway 192.168.1.1 --ip-range 192.168.1.192/27 --aux-address 'host=192.168.1.223' mynet

Next, we create a new macvlan interface on the host. You can call it whatever you want:

ip link add mynet-aux link eno1 type macvlan mode bridge

Now we need to configure the interface with the address we reserved and bring it up:

ip addr add 192.168.1.223/32 dev mynet-aux
ip link set mynet-aux up

The last thing we need to do is to tell our host to use that interface when communicating with the containers. This is relatively easy because we have restricted our containers to a particular CIDR subset of the local network; we just add a route to that range like this:

ip route add 192.168.1.192/27 dev mynet-aux

With that route in place, your host will automatically use this mynet-aux interface when communicating with containers on the mynet network.
The NIC-based settings above are not persistent and will be lost after a reboot, so add all related configuration to the appropriate configuration files (NIC and route).
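The address plan used in this walkthrough can be sanity-checked with Python's standard ipaddress module. The sketch only does the arithmetic for the values that appear in the commands above; the variable names are chosen for this example.

```python
import ipaddress

subnet = ipaddress.ip_network("192.168.1.0/24")      # --subnet
ip_range = ipaddress.ip_network("192.168.1.192/27")  # --ip-range
host_aux = ipaddress.ip_address("192.168.1.223")     # --aux-address host=...

# First and last address covered by the /27 range
first, last = ip_range[0], ip_range[-1]

# The container range must lie inside the macvlan subnet
range_inside_subnet = ip_range.subnet_of(subnet)

# The host's mynet-aux address falls inside the container range, which is
# why it has to be reserved so that IPAM never hands it to a container
aux_inside_range = host_aux in ip_range

print(first, last, ip_range.num_addresses, range_inside_subnet, aux_inside_range)
```

The /27 range spans 192.168.1.192 to 192.168.1.223, so the route added on the host covers exactly the addresses that containers can receive.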

ipvlan is similar to macvlan but uses the same MAC address for all endpoints (docker containers). It's useful when the switch the docker-host is connected to restricts the maximum number of MAC addresses per physical port. ipvlan requires at least kernel 4.1 on the docker-host.

An IPAM (IP Address Management) driver lets you delegate IP lease management to an external component. This way you can coordinate IP use with other virtual or bare metal servers in your datacenter.
Docker controls the IP address assignment for network and endpoint interfaces via the IPAM driver(s). Libnetwork has a default, built-in IPAM driver and allows third party IPAM drivers to be dynamically plugged. On network creation, the user can specify which IPAM driver libnetwork needs to use for the network’s IP address management. For the time being, there is no IPAM driver that would communicate with external DHCP server, so you need to rely on Docker’s default IPAM driver for container IP address and settings configuration. Containers use host’s DNS settings by default, so there is no need to configure DNS servers.
IPAM driver ensures the container got an IPv4 and an IPv6 address from the subnets configured for the macvlan network.

If you use Hyper-V:
Macvlan uses a unique MAC address per Ethernet interface. By default, Hyper-V only allows traffic whose MAC address matches the one bound to the virtual switch port, so we need to "Enable MAC address spoofing" to prevent the virtual switch from dropping the macvlan traffic.

Docker Images & Dockerfile

creating a new image - can be done when you can't find the needed container image on the docker-hub.
Dockerfile:

a text file written in a specific format that docker can understand
Every line starts with an instruction (FROM, RUN, COPY etc.) followed by an argument
Each instruction tells docker to do a specific action

First line starts with a base OS or another image: FROM centos
Then you install needed dependencies, for example:

RUN yum update -y && yum install -y python python-pip
RUN pip install flask flask-mysql

Copy source files from docker-host to the docker-image:

COPY . /opt/source-code

Command to run when image is run as a container:

ENTRYPOINT FLASK_APP=/opt/source-code/app.py flask run

When building a docker image, every instruction line of the Dockerfile creates a layer of the docker image:

For the above example layers are:

layer 1: Base CentOS layer
layer 2: changes in yum packages
layer 3: changes in pip packages
layer 4: source code
layer 5: update entry-point with "flask" command

docker build:

docker build -t nameOfTheImage dockerFileDirectoryName
docker build -t nameOfTheImage .
docker build -f Dockerfile -t nameOfTheImage . # -f names the Dockerfile explicitly; the last argument is the build context directory

to view build process history: docker history imageName
the layered build process helps with debugging and, in case of failure, lets the build restart from the needed layer (this is done automatically using the docker cache). The same is true when you add steps to the dockerfile: the rebuild uses the cache, so only the affected layers are rebuilt

CMD vs ENTRYPOINT:

CMD defines the command and its parameters (if any) that will run when the container starts:

CMD ["mysqld"] or CMD mysqld
CMD ["sleep", "5"] or CMD sleep 5

ENTRYPOINT is like CMD, but any input given after the image name in "docker run" is appended to the end of the command as parameters (this works with the exec form only; with the shell form, ENTRYPOINT sleep, the extra input is ignored):

ENTRYPOINT ["sleep"]
docker run centos-sleeper 10 # 10 will be appended as a parameter to the "sleep" command
when you append some input to "docker run" and the image uses CMD, the input is not appended to the CMD command - it replaces the CMD command entirely

To use some value as the default parameter for the ENTRYPOINT:

ENTRYPOINT ["sleep"]
CMD ["10"]
the two lines above run "sleep 10" by default, so CMD is appended to the ENTRYPOINT as a parameter (both must use the exec form for this to work)
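The interplay between ENTRYPOINT, CMD and the input given to "docker run" can be summarised as a small Python function. This is a simplified model for exec-form instructions only, not Docker's actual code; `container_argv` is a hypothetical name used for illustration.

```python
def container_argv(entrypoint, cmd, run_args):
    """Simplified model of the command a container starts with.

    entrypoint -- list from ENTRYPOINT [...] (empty list if not set)
    cmd        -- list from CMD [...] (empty list if not set)
    run_args   -- extra words given after the image name in "docker run"
    """
    # Input from "docker run" replaces CMD; otherwise CMD is the default
    tail = run_args if run_args else cmd
    # Whatever is chosen is appended to the ENTRYPOINT
    return list(entrypoint) + list(tail)


# ENTRYPOINT ["sleep"] + CMD ["10"], no extra input: default parameter is used
print(container_argv(["sleep"], ["10"], []))

# Same image, "docker run centos-sleeper 20": 20 replaces the CMD default
print(container_argv(["sleep"], ["10"], ["20"]))

# Image with only CMD ["sleep", "5"], run with "echo hi": CMD is replaced entirely
print(container_argv([], ["sleep", "5"], ["echo", "hi"]))
```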

DHCP on Docker

DHCP (Dynamic Host Configuration Protocol) helps us assign addresses to the hosts on our network dynamically. When a host is configured to get its IP address dynamically, it broadcasts a DHCP Discover message on the network, searching for a DHCP server. The DHCP server has to be in the same broadcast domain as the clients, since routers do not forward broadcast packets.

create macvlan network:

docker network create -d macvlan -o parent=enp3s0 --subnet 172.16.3.0/24 --gateway 172.16.3.4 --aux-address 'host=172.16.3.250' mynet

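add macvlan-aux to the docker-host (to ping directly from docker-host) - including ip link, route etc:

ip link add mynet-aux link enp3s0 type macvlan mode bridge
ip addr add 172.16.3.250/32 dev mynet-aux
ip link set mynet-aux up
ip route add 172.16.3.249/32 dev mynet-aux # assumed route to the container IP used below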

run container with macvlan driver (assign static IP) and run /bin/bash:

docker run --name='ctr0' --hostname='ctr0' --net=mynet --ip=172.16.3.249 -it centos /bin/bash

Ping container IP from the Docker-host:

ping 172.16.3.249

on container (all of these can be done with docker-file):

yum install net-tools -y
yum install dhcp -y
1.1.6.1 is DHCP relay IP address
dhcpd listens *only* on interfaces for which it finds subnet declaration in dhcpd.conf

vi dhcp.conf:

# this server is primary and thus the authoritative server on that network

authoritative;
subnet 172.16.3.0 netmask 255.255.255.0 {
range 172.16.3.1 172.16.3.3;
option routers 172.16.3.4;
option domain-name-servers 172.16.3.6;
}

Run dhcp service with specified file: dhcpd -cf dhcp.conf

to kill process on container:

top > k > PID > Enter

Tuesday, July 16, 2019

YAML

YAML Ain't Markup Language:

YAML uses indentation to distinguish layers (same indentation - same layer) - use spaces, because tabs are not allowed.
A YAML document starts with three dashes: ---
A YAML dictionary is a set of key-value pairs written in one of two forms:

colon separated key and value (like a Python dictionary):

key: value

value on the next line, indented under the key:

key:
  value

Also one key can contain nested dictionary:

Method 1:

first_level_key:
  second_level_key_under_the_first_level: second_level_value

Method 2:

first_level_key: {second_level_key_under_the_first_level: second_level_value}

YAML uses dashes to represent a list of items (use lists when you want a key to have more than one value and the values are not keys themselves):

Method 1:

this_is_a_list:
- element_1
- element_2

Method 2:

this_is_a_list: [element_1, element_2]
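Both methods describe the same data. As a rough illustration, these are the Python structures the snippets above would map to, assuming they were read with a YAML loader such as PyYAML's safe_load (not used here, to keep the sketch free of third-party packages). The json module is used only to print the one-line form.

```python
import json

# Nested dictionary: Method 1 and Method 2 both describe this structure
nested = {
    "first_level_key": {
        "second_level_key_under_the_first_level": "second_level_value"
    }
}

# List: the dash form and the bracket form both describe this structure
listing = {"this_is_a_list": ["element_1", "element_2"]}

# JSON output looks like the one-line (Method 2) notation with quotes added
flow_nested = json.dumps(nested)
flow_list = json.dumps(listing)

print(flow_nested)
print(flow_list)
```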

Relational Algebra

RA is a formal language that forms conventions used in implemented languages like SQL.
RA operates on relations and produces relations as result.

Query (expression) on set of relations produces relation as result.

Key is an attribute or a set of attributes whose value is guaranteed to be unique.
Tuple  - row of a relation.
Attribute  - column of a relation.
Relation - table consisting of attributes (columns) and tuples (rows).
Schema  - relation header (attribute names)

We'll use simple college admission database with three relations (keys are in bold):

1st relation: College(schema: cName, state, enrollment)
2nd relation: Student(schema: sID, sName, GPA, sizeHS) # sizeHS = size of High School a student attended
3rd relation: Apply(schema: sID, cName, major, decision)

The simplest query in RA is just a relation name; for example, Student is a valid expression (query) in RA, and it returns a copy of the Student relation.

Use operators to filter, slice, combine relations:

Select - picks certain rows out of a relation - σ (sigma):

Students with GPA > 3.7 (GPA - Grade Point Average)

σ_{GPA > 3.7} Student

Students with GPA > 3.7 and sizeHS < 1000 (the caret ^ is the logical and operator)

σ_{GPA > 3.7 ^ sizeHS < 1000} Student

Application for Stanford for CS major

σ_{cName="Stanford" ^ major="CS"} Apply

Select operator general case:

σ_condition Relation

Project operator - picks certain columns:

Apply relation with sID and decision columns only

П_{sID,decision} Apply

Project operator general case:

П_{col1,col2,col3...} Relation

To Select and Project at the same time:

ID and name of students with GPA>3.7:

П_{sID,sName} (σ_{GPA>3.7} Student)
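Select and project are easy to mimic in plain Python, which may help to see what the two operators do. This is a minimal sketch: the `select` and `project` functions are names chosen for this example, and the sample rows are a few students from the SQL post's test data.

```python
# A relation as a list of tuples (rows); each row is a dict attribute -> value
Student = [
    {"sID": 1001, "sName": "Nita Millwood", "GPA": 3.2, "sizeHS": 900},
    {"sID": 1002, "sName": "Vincenzo Lyons", "GPA": 3.8, "sizeHS": 750},
    {"sID": 1004, "sName": "Wilbert Chan", "GPA": 3.6, "sizeHS": 1620},
    {"sID": 1005, "sName": "Mirna Hamann", "GPA": 3.9, "sizeHS": 1000},
]


def select(relation, condition):
    """Sigma: keep the rows for which the condition holds."""
    return [row for row in relation if condition(row)]


def project(relation, attributes):
    """Pi: keep only the listed columns, dropping duplicate rows (set semantics)."""
    result = []
    for row in relation:
        reduced = tuple(row[a] for a in attributes)
        if reduced not in result:
            result.append(reduced)
    return result


# ID and name of students with GPA > 3.7: select first, then project
high_gpa = project(select(Student, lambda row: row["GPA"] > 3.7), ["sID", "sName"])
print(high_gpa)
```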

Duplicates:

SQL is based on multisets/bags, so duplicates are also shown
RA is based on sets, so duplicates are eliminated automatically

Cross-product operator (a.k.a. cross-join) - combine two relations (a.k.a. Cartesian product) horizontally:

Student x Apply gives a big relation with eight (8) attributes
by convention, when the cross-product yields attributes with the same name from both relations, we prefix each name with the name of the relation it came from: Student.sID / Apply.sID
cross-product gives us a relation where every row of the Student relation is paired with all rows of the Apply relation
Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected:

П_{sName, GPA} (σ_{Student.sID=Apply.sID ^ sizeHS>1000 ^ major="CS" ^ decision="reject"} (Student x Apply))

Difference operator A\B or A-B returns the elements of A that are not in B:

if A = {a, b, c} and B = {b, c, d} then A - B = {a}
IDs of students who didn't apply anywhere:

(П_{sID} Student) - (П_{sID} Apply)

IDs and names of students who didn't apply anywhere (we can't just add sName to the projection of the Student relation, because the Apply relation doesn't have this attribute and the difference operator can't be applied to relations with a different number of attributes):

П_{sID,sName} ( ( (П_{sID} Student) - (П_{sID} Apply) ) ⋈ Student )

The union operator (denoted by ∪) produces the set of all elements in a collection of sets. It is one of the fundamental operations through which sets can be combined and related to each other. Union combines vertically:

A ∪ B means all members of A and also all members of B that are not in A (RA is set based and removes duplicates):

if A = {a, b, c} and B = {b, c, d} then:

all members of A are {a, b, c}
all members of B that are not in A are B\A = {d}
combine both {a, b, c} and {d} => {a, b, c, d}
so: A ∪ B = {a, b, c, d}

List of college and student names:

(П_{cName} College) ∪ (П_{sName} Student)

The natural join operator performs a cross-product and then enforces equality on all attributes with the same name (as in the example above: Student.sID=Apply.sID); it also eliminates one copy of the duplicate attributes:

Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected:

П_{sName, GPA} (σ_{sizeHS>1000 ^ major="CS" ^ decision="reject"}(Student ⋈ Apply))

Names and GPA's of students with sizeHS>1000 who applied to CS and were rejected by colleges with an enrollment greater than 20000:

П_{sName, GPA} (σ_{sizeHS>1000 ^ major="CS" ^ decision="reject" ^ enrollment>20000}(Student ⋈ (Apply ⋈ College)))

Relation between ⋈ and x:

E1 ⋈ E2 => П_{schema(E1) U schema(E2)} (σ_{E1.a1= E2.a1 ^ E1.a2= E2.a2 ^ ...}(E1 x E2))
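The relation between ⋈ and x can be sketched in Python. This is a minimal illustration, not how a real RDBMS implements joins: the function name natural_join and the sample rows are made up, and relations are lists of dicts whose rows all share the same keys.

```python
from itertools import product

def natural_join(r1, r2):
    """Natural join built from cross-product + select + project."""
    if not r1 or not r2:
        return []
    shared = set(r1[0]) & set(r2[0])           # attributes with the same name
    return [
        {**a, **b}                             # project: one copy of each shared attribute
        for a, b in product(r1, r2)            # cross-product
        if all(a[k] == b[k] for k in shared)   # select: equality on shared attributes
    ]

# Hypothetical sample rows
student = [
    {"sID": 1, "sName": "Amy", "GPA": 3.9, "HS": 1200},
    {"sID": 2, "sName": "Bob", "GPA": 3.6, "HS": 1500},
]
apply_ = [
    {"sID": 1, "cName": "MIT", "major": "CS", "decision": "Reject"},
    {"sID": 2, "cName": "MIT", "major": "EE", "decision": "Reject"},
]

joined = natural_join(student, apply_)

# Same query as above, now without the explicit Student.sID=Apply.sID condition
result = {
    (row["sName"], row["GPA"])
    for row in joined
    if row["HS"] > 1000 and row["major"] == "CS" and row["decision"] == "Reject"
}
print(result)
```

Each joined row has a single sID attribute, because merging the two dicts keeps only one copy of the shared key.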

Theta join - operator takes two expressions/relations and combines them with the bow-tie operator (⋈) carrying a subscript theta (_θ), which stands for a select condition (any select condition you want):

E1 ⋈_θ E2 = σ_θ (E1 x E2)
Theta join is the basic operation implemented in RDBMSs, so the term "join" often means theta join

Intersection operator - A ∩ B of two sets A and B is the set that contains all elements of A that also belong to B (or equivalently, all elements of B that also belong to A), but no other elements:

if A = {1, 2, 3} and B = {2, 3, 4} then A ∩ B = {2, 3}
Names that are both a college name and a student name:

(П_sName Student) ∩ (П_cName College)

Expressing intersection via difference:

A ∩ B => A - ( A - B )

Expressing intersection via natural join (A and B have the same schema, so the join enforces equality on every attribute):

A ∩ B => A ⋈ B
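Both identities are easy to check in Python with single-attribute relations modelled as sets. The names below are hypothetical sample data, and the join is written as a plain comprehension rather than a real join operator.

```python
# Hypothetical sample data
student_names = {"Amy", "Bob", "Irene", "Stanford"}
college_names = {"Stanford", "MIT", "Irene"}

# Built-in set intersection
direct = student_names & college_names

# A - (A - B): remove from A everything that is only in A
via_difference = student_names - (student_names - college_names)

# Natural join on identical schemas: keep the pairs that agree on every attribute
via_join = {s for s in student_names for c in college_names if s == c}

print(direct, via_difference, via_join)
```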

Rename operator uses ρ (rho); it reassigns the schema of the result of an expression (relation). Above (in 7.2.1 and 10.2.1) we applied set operators to relations with different attribute names, i.e. different schemas. In practice RA doesn't allow that, so we need the rename operator:

General form:

ρ_{R(A1,A2,...An)} (E) # name the result of expression E as R, with attributes A1 to An

Unify schemas for set operators:

List of college and student names (7.2.1):

ρ_C1(name) (П_cName College) ∪ ρ_C2(name)(П_sName Student)

Names that are both college name and student name (10.2.1):

ρ_C1(name) (П_sName Student) ∩ ρ_C2(name) (П_cName College)

Disambiguation in self-joins:

Pairs of colleges in the same state:

With cross-product:

σ_{s1=s2 ^ n1!=n2}( ρ_C1(n1,s1,e1) College x ρ_C2(n2,s2,e2) College )

With natural join (both copies name the state attribute s, so the join itself enforces equality on it):

σ_n1!=n2( ρ_C1(n1,s,e1) College ⋈ ρ_C2(n2,s,e2) College )
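The self-join can be sketched in Python, where unpacking the two copies of College into differently named variables plays the role of ρ. The rows are hypothetical (cName, state, enrollment) tuples.

```python
from itertools import product

# Hypothetical sample rows: (cName, state, enrollment)
college = [
    ("Stanford", "CA", 15000),
    ("Berkeley", "CA", 36000),
    ("MIT", "MA", 10000),
]

# Two "renamed" copies of College: (n1, s1, e1) and (n2, s2, e2)
pairs = {
    (n1, n2)
    for (n1, s1, e1), (n2, s2, e2) in product(college, college)
    if s1 == s2 and n1 != n2   # same state, different colleges
}
print(pairs)
```

Each pair appears in both orders. Replacing n1 != n2 with n1 < n2 would keep one ordering per pair.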

Select, project, cross-product, union, difference and rename are the basic RA operators
Natural join, theta join and intersection are not basic RA operators: they can be expressed using the basic operators, so they are actually abbreviations

These materials were used to write this synopsis-post:

Wednesday, July 10, 2019

Hash Functions, Binary Tree, O(n)

Hash Function

A hash function is any function that can be used to map data of arbitrary size to data of fixed size (N). An example of a simple hash function is h(x) = x mod N (mod is the modulus, i.e. the remainder of division: 5 / 5 = 1 with no remainder => 5 mod 5 = 0; 6 / 5 = 1 with 1 in the remainder => 6 mod 5 = 1). The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. A hash function is also known as a hashing algorithm or message digest function.
Hash functions are often used in combination with a hash table (which consists of a hash function h and an array, also called a table, of size N), a common data structure used in computer software for rapid data lookup. Hashing is done for indexing and locating items in databases because it is easier to find the shorter hash value than the longer string. No sorting and no searching are required: once you compute the hash function, you know where to store the data and where to find it. Hashing is also used in cryptography. Hash functions are one-way: because many inputs map to the same value, a hash generally cannot be reversed to recover the input. The main idea is to store key-element pairs (k, e) at index h(k):
Example:

phone book with 5 numbers in it (N = 5)
h(name) = (length of name) mod 5
This function is OK if all names in the phone book have different lengths. If some lengths are the same, a collision occurs (a collision is when two different inputs to the hash function are mapped to the same hash value):

h(John) = 4 mod 5 = 4
h(Jack) = 4 mod 5 = 4
So h(John) = h(Jack)
Because it produces many collisions, this is an example of a bad hash function
In fact, collisions are possible in any hash function, but a hash function must be designed to minimize their likelihood. To do so, hash functions produce hash values that are long enough to make collisions rare, yet short enough to be computed quickly.
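The phone book example can be sketched as a tiny hash table in Python. The post does not say how collisions are handled, so the chaining used here (each table slot holds a list of pairs) is an assumption, and the helper names put and get and the phone numbers are made up.

```python
N = 5  # table size

def h(name):
    """The (deliberately bad) hash function from the example."""
    return len(name) % N

# One bucket (list of (key, element) pairs) per table slot
table = [[] for _ in range(N)]

def put(name, phone):
    """Store the pair at index h(name); overwrite if the name already exists."""
    bucket = table[h(name)]
    for i, (key, _) in enumerate(bucket):
        if key == name:
            bucket[i] = (name, phone)
            return
    bucket.append((name, phone))

def get(name):
    """Look only in the bucket at index h(name); no sorting, no full scan."""
    for key, phone in table[h(name)]:
        if key == name:
            return phone
    return None

put("John", "555-0101")
put("Jack", "555-0102")   # collides with John: both names have 4 letters
put("Alice", "555-0103")

print(h("John"), h("Jack"), h("Alice"))
print(get("Jack"))
```

With chaining, a collision does not lose data, but every colliding key makes lookups in that bucket slower, which is why a hash function that spreads keys evenly matters.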

Binary Tree

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. The topmost node is called the root; it sits at level 0 (L-0), and a tree consisting of only the root has a height of 0. Each child node in a binary tree is itself the root of a sub-tree.
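A minimal Python sketch of these definitions. The class and function names are illustrative, and height is counted in edges, so that a tree consisting of only a root has height 0.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None    # left child, if any
    right: Optional["Node"] = None   # right child, if any

def height(node):
    """Edges on the longest path from node down to a leaf; -1 for an empty tree."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def levels(root):
    """Group node values by level, with the root at level 0."""
    result = []
    current = [root] if root is not None else []
    while current:
        result.append([n.value for n in current])
        current = [c for n in current for c in (n.left, n.right) if c is not None]
    return result

#       1        level 0
#      / \
#     2   3      level 1
#    /
#   4            level 2
root = Node(1, Node(2, Node(4)), Node(3))

print(height(root))
print(levels(root))
```

Here root.left (the node with value 2) is itself the root of a sub-tree containing 2 and 4.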

Big O notation

In computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows. The actual formula is O(f(n)), meaning: as the parameter n (the size of the algorithm's input) increases, the running time of the algorithm grows no faster than some constant multiplied by f(n). How to find the big-O of an expression (for example 3n^2 + n^5 + 4n + 5 + 2^n + log8(n)):

omit constants and constant multipliers (3n^2+ n^5 + 4n + 5 + 2^n + log8(n) => n^2+ n^5 + n + 2^n + log8(n))
n^a grows faster than n^b for a > b, so keep only the highest power: if you have n^3, omit n^2 (n^2 + n^5 + n + 2^n + log8(n) => n^5 + 2^n + log8(n))
any polynomial grows faster than any logarithm, so n, or even sqrt(n), grows faster than log3(n) (n^5 + 2^n + log8(n) => n^5 + 2^n)
any exponential grows faster than any polynomial, so 3^n grows faster than n^5 (n^5 + 2^n => 2^n)
So O(3n^2 + n^5 + 4n + 5 + 2^n + log8(n)) = O(2^n)
None of this means that constants don't matter: in practice, making an algorithm twice as fast can be very hard yet worthwhile, but it is much more reasonable to find the approximate growth rate first
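A quick numeric check of these rules in Python: evaluate every term of the example expression for a few values of n and see which one is largest. The dictionary of terms and the helper name dominant are illustrative only.

```python
import math

# The individual terms of 3n^2 + n^5 + 4n + 5 + 2^n + log8(n)
terms = {
    "3n^2": lambda n: 3 * n**2,
    "n^5": lambda n: n**5,
    "4n": lambda n: 4 * n,
    "5": lambda n: 5,
    "2^n": lambda n: 2**n,
    "log8(n)": lambda n: math.log(n, 8),
}

def dominant(n):
    """Name of the term with the largest value for this n."""
    return max(terms, key=lambda name: terms[name](n))

for n in (2, 10, 20, 30, 100):
    print(n, dominant(n))
```

For small n the polynomial n^5 is still the largest term; the exponential 2^n overtakes it between n = 22 and n = 23 and dominates from then on. Big O describes this behaviour for large n, not for small inputs.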

Monday, July 8, 2019

Draft: Relax-and-Recover

Backup to USB and restore from USB

sudo yum install git syslinux syslinux-extlinux kernel-devel

git clone https://github.com/rear/rear.git

cd rear/

insert the USB stick into the computer being backed up

lsblk # find the device name of the USB flash drive

umount /dev/sdb1 # unmount it if the USB flash drive was mounted automatically

sudo usr/sbin/rear format /dev/sdb

type 'Yes' to format USB flash

rear will format the flash drive and label it REAR-000

edit rear configuration:

vi etc/rear/local.conf

### write the rescue initramfs to USB and update the USB bootloader

OUTPUT=USB

### create a backup using the internal NETFS method, using 'tar'

BACKUP=NETFS

### write both rescue image and backup to the device labeled REAR-000

BACKUP_URL=usb:///dev/disk/by-label/REAR-000

Create the rescue image with verbose output (it contains no OS backup; it is used to restore the OS in case of failure):

sudo usr/sbin/rear -v mkrescue
Now reboot your system and try to boot from the USB device. If it boots, the rescue image is OK and you can back up the OS data along with creating the rescue media:
sudo usr/sbin/rear -v mkbackup