Monday, August 19, 2019

Writing special symbols (Mathematics, Statistics) in HTML


x-bar = x̄ = x&#772; or x&#x0304; (hex)
x-hat = x̂ = x&#770; or x&#x0302; (hex)
x-arrow = x⃗ = x&#8407;
degree = ° = &deg;
left ceiling = ⌈ = &lceil;
right ceiling = ⌉ = &rceil;
left floor = ⌊ = &lfloor;
right floor = ⌋ = &rfloor;
real-numbers set = ℝ = &#8477;
greek uppercase a = Α = &Alpha;
function = ƒ = &fnof;
Sum = ∑ = &sum;
Hadamard product = ⊙ = &#8857;
dot = ⋅ = &sdot;
not equal = ≠ = &ne;
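A quick way to check which symbol an HTML code produces is to decode it with Python's standard `html` module. This is only a sketch: the entity codes in the `entities` dictionary are assumed to be the standard HTML codes for these symbols, and the dictionary name is chosen here for illustration.

```python
import html

# Assumed standard HTML codes for a few of the symbols listed above.
entities = {
    "x-bar": "x&#772;",       # x followed by a combining macron
    "x-hat": "x&#770;",       # x followed by a combining circumflex
    "degree": "&deg;",
    "left ceiling": "&lceil;",
    "sum": "&sum;",
    "not equal": "&ne;",
}

# html.unescape turns numeric and named character references into characters.
decoded = {name: html.unescape(code) for name, code in entities.items()}

for name, symbol in decoded.items():
    print(f"{name}: {symbol}")
```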

Linear Algebra 4. Matrix multiplication.

When you apply one linear transformation (LT) and then another (for example, a 90° clockwise rotation followed by a shear), the overall effect is itself an LT, called the composition of the two. This single LT captures the overall effect of applying both.
Applying several LTs to one vector is like chaining functions, using the output of one function as the input of the next:
ƒ(g(x)) where:

  1. g is the first LT with input "x"
  2. ƒ is the second LT with input from the previous LT
So, just as with functions, we apply LTs from right to left.
The composition of two LTs is the product (also called the matrix product or dot product) of their two matrices:
C = AB  where:

  1. A must have the same number of columns as B has rows, or mathematically:
    1. $A_{l \times m}$ and $B_{m \times n}$
    2. an easy way to check whether the product is possible:
      1. example: $A_{2 \times 3}$ and $B_{3 \times 4}$
      2. Write the dimensions of the matrices one after the other with an "=" sign between them:
        1. 2 x 3 = 3 x 4: the inner numbers match (3 = 3), so we can multiply these matrices
      3. example: $A_{2 \times 2}$ and $B_{3 \times 3}$
        1. 2 x 2 = 3 x 3: the inner numbers do not match (2 ≠ 3), so we can't multiply these matrices
  2. $C_{i,j} = \sum_{k=1}^{m} A_{i,k} B_{k,j}$
    this means: for k running from 1 to m, multiply each $A_{i,k}$ by $B_{k,j}$ and add the products up, for every i = {1,2,...,l} and j = {1,2,...,n}
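For example, suppose we have two matrices and want to find their product:
$A = \begin{bmatrix} 0 & 2 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 & -2 \\ 1 & 0 \end{bmatrix}$
$A_{2 \times 2}$ and $B_{2 \times 2}$: 2 x 2 = 2 x 2, the inner numbers match (2 = 2), so k runs from 1 to 2. This upper limit is how many products of elements of A and B we sum for each element of C.

Specialize the general formula for this case:
$C_{i,j} = \sum_{k=1}^{2} A_{i,k} B_{k,j}$
Then:
$C_{1,1} = \sum_{k=1}^{2} A_{1,k} B_{k,1} = A_{1,1} B_{1,1} + A_{1,2} B_{2,1} = 0 \cdot 1 + 2 \cdot 1 = 2$

$C_{1,2} = A_{1,1} B_{1,2} + A_{1,2} B_{2,2} = 0 \cdot (-2) + 2 \cdot 0 = 0$
$C_{2,1} = A_{2,1} B_{1,1} + A_{2,2} B_{2,1} = 1 \cdot 1 + 0 \cdot 1 = 1$
$C_{2,2} = A_{2,1} B_{1,2} + A_{2,2} B_{2,2} = 1 \cdot (-2) + 0 \cdot 0 = -2$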
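The summation formula can be checked with a few lines of plain Python. This is a minimal sketch, assuming matrices are stored as lists of rows; the function name `matmul` is chosen here for illustration.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    m = len(A[0])                      # number of columns of A
    if m != len(B):                    # must equal the number of rows of B
        raise ValueError("columns of A must equal rows of B")
    return [
        [sum(A[i][k] * B[k][j] for k in range(m)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

A = [[0, 2], [1, 0]]
B = [[1, -2], [1, 0]]
C = matmul(A, B)
print(C)  # [[2, 0], [1, -2]]
```

Passing a 2 x 2 matrix together with a 3 x 3 matrix raises `ValueError`, which mirrors the dimension check described above.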

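The simplest way to calculate a matrix product is to treat it as a series of matrix-vector products, one per column of the right matrix:
First we find where î of the right matrix (its first column) lands after applying the left matrix (LT):
$\begin{bmatrix} 0 & 2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$
Secondly we find where ĵ of the right matrix (its second column) lands after applying the left matrix (LT):
$\begin{bmatrix} 0 & 2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -2 \end{bmatrix}$
These two results are the first and second columns of the product.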
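The column-by-column view can be sketched in plain Python as follows. The helper names `mat_vec` and `matmul_by_columns` are hypothetical, and matrices are assumed to be lists of rows.

```python
def mat_vec(A, v):
    """Apply matrix A (list of rows) to vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul_by_columns(A, B):
    """Build the product AB by transforming each column of B with A."""
    columns = [mat_vec(A, [row[j] for row in B]) for j in range(len(B[0]))]
    # The transformed columns are the columns of the product; transpose to rows.
    return [list(row) for row in zip(*columns)]

A = [[0, 2], [1, 0]]
B = [[1, -2], [1, 0]]

i_hat_image = mat_vec(A, [1, 1])    # first column of B after applying A
j_hat_image = mat_vec(A, [-2, 0])   # second column of B after applying A
product = matmul_by_columns(A, B)
print(i_hat_image, j_hat_image, product)
```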

Matrix product properties:
A(B + C) = AB + AC distributive property
A(BC) = (AB)C associative property
But AB ≠ BA in general: matrix multiplication is not commutative (a matrix is an LT, an LT is like a function, and functions are applied right to left, so the order matters)
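These properties can be spot-checked numerically. The sketch below uses two small 2 x 2 matrices plus a third matrix `C` that is made up here purely for the check; one example does not prove the properties, it only illustrates them.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[0, 2], [1, 0]]
B = [[1, -2], [1, 0]]
C = [[1, 1], [0, 1]]  # hypothetical third matrix, chosen only for this check

AB = matmul(A, B)
BA = matmul(B, A)

distributive = matmul(A, matadd(B, C)) == matadd(matmul(A, B), matmul(A, C))
associative = matmul(A, matmul(B, C)) == matmul(matmul(A, B), C)
commutative = AB == BA

print(distributive, associative, commutative)
```

For these matrices the order of the factors changes the result, while the distributive and associative checks hold.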

An element-wise matrix product (the Hadamard product) also exists. It is defined only for matrices of the same shape:
C = A ⊙ B where $C_{i,j} = A_{i,j} B_{i,j}$
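A minimal sketch of the element-wise product in plain Python, assuming matrices are lists of rows; the function name `hadamard` is chosen here for illustration.

```python
def hadamard(A, B):
    """Element-wise (Hadamard) product of two matrices of the same shape."""
    same_shape = len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))
    if not same_shape:
        raise ValueError("matrices must have the same shape")
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[0, 2], [1, 0]]
B = [[1, -2], [1, 0]]
H = hadamard(A, B)
print(H)  # [[0, -4], [1, 0]]
```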

In 3-D space the basis vectors are î, ĵ and k̂, and a vector is their linear combination:

v = xî + yĵ + zk̂

These materials were used while preparing this blog-post:
  1. https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab
  2. https://www.deeplearningbook.org/
  3. NBGtLA by https://minireference.com/

Linear Algebra 3. Linear transformations and matrices, matrix operations.

A linear transformation (LT) is like a function that transforms (changes) a vector: ƒ(x) => L(v⃗)
So a linear transformation takes an input vector and produces an output vector.
A transformation is linear if:

  1. all lines of the coordinate system grid (horizontal, vertical and diagonal) stay straight after the transformation, without curving. In other words, grid lines remain parallel and evenly spaced
  2. the origin remains fixed in place
Example of an LT: a 90° clockwise rotation about the origin. How can we describe an LT numerically? We have an input vector with coordinates [x_in, y_in] and an output vector with coordinates [x_out, y_out]. We know that each vector is just a linear combination of the basis/unit vectors, so we can rewrite the coordinates as:

  1. [x_in, y_in] = x_in î + y_in ĵ
  2. the linear combination keeps the same coefficients after applying the LT, so we just use the transformed versions of î and ĵ => LT(î) and LT(ĵ)
  3. [x_out, y_out] = x_in LT(î) + y_in LT(ĵ)
Example of 90° clockwise rotation LT:

  1. Take a sheet of squared paper and draw the two unit vectors; for convenience, give each a length of 2 squares.
  2. If we apply the 90° clockwise rotation LT then:
    1. we move î 90° clockwise: now î points down the y axis, and the coordinates of LT(î) (in terms of the old grid, before the transformation) are [0, -1].
    2. we move ĵ 90° clockwise: now ĵ lies on the x axis, and the coordinates of LT(ĵ) (in terms of the old grid, before the transformation) are [1, 0].
  3. if we have some vector v with coordinates [3,2]:
    1. LT(v) = 3LT(î) + 2LT(ĵ) = 3[0, -1] + 2[1,0] = [0, -3] + [2,0] = [2, -3] in terms of the grid before the transformation
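The rotation example can be reproduced in a few lines of Python. This is a sketch for the 2D case only; the function name `apply_lt` and its argument names are chosen here for illustration.

```python
def apply_lt(lt_i_hat, lt_j_hat, v):
    """Transform 2D vector v given where the LT sends the basis vectors.

    The output is x * LT(i-hat) + y * LT(j-hat), the same linear
    combination as before the transformation.
    """
    x, y = v
    return [x * lt_i_hat[0] + y * lt_j_hat[0],
            x * lt_i_hat[1] + y * lt_j_hat[1]]

# 90 degree clockwise rotation: i-hat lands on [0, -1], j-hat lands on [1, 0].
rotated = apply_lt([0, -1], [1, 0], [3, 2])
print(rotated)  # [2, -3]
```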
We can describe a 2D linear transformation (on the Cartesian plane) with 4 numbers: 2 for the coordinates of the transformed î and 2 for the coordinates of the transformed ĵ. We can package these coordinates in a two-by-two grid of numbers, an array of numbers or, in terms of LA, a matrix. The matrix will have 2 columns and 2 rows:

  1. columns: the 1st holds the coordinates of the transformed î and the 2nd the coordinates of the transformed ĵ
  2. rows: the 1st holds the x axis coordinates of the transformed î and ĵ, and the 2nd their y axis coordinates
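$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = 3 \begin{bmatrix} 0 \\ -1 \end{bmatrix} + 2 \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \cdot 3 + 1 \cdot 2 \\ -1 \cdot 3 + 0 \cdot 2 \end{bmatrix} = \begin{bmatrix} 2 \\ -3 \end{bmatrix}$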

Above we rewrote our linear combination as matrix-vector multiplication.

By convention we denote a matrix in bold upper-case, like A, and its elements in non-bold upper-case with subscripts, like $A_{1,1}$.

$A_{m \times n}$ is a matrix with height m (rows) and width n (columns)
$A_{1,1}$ is the element at the intersection of row 1 and column 1; in the LT matrix example above, $A_{1,1} = 0$
To denote a real-valued matrix $A_{m \times n}$: $A \in \mathbb{R}^{m \times n}$
The colon symbol ":" stands for "all": all rows or all columns:
  1. All elements of the matrix in column i: $A_{:,i}$
    1. $A_{:,1}$ is [0, -1], the 1st column of A
  2. All elements of the matrix in row i: $A_{i,:}$
    1. $A_{2,:}$ is [-1, 0], the 2nd row of A
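The indexing notation maps onto plain Python lists as sketched below. The helper names are hypothetical, and the helpers take 1-based indices to match the notation in the text (Python itself counts from 0).

```python
A = [[0, 1],
     [-1, 0]]  # the 90 degree clockwise rotation matrix, stored as a list of rows

def element(M, i, j):
    """Element at row i, column j (1-based)."""
    return M[i - 1][j - 1]

def column(M, i):
    """All rows of column i (1-based), i.e. M[:, i] in the colon notation."""
    return [row[i - 1] for row in M]

def row(M, i):
    """All columns of row i (1-based), i.e. M[i, :] in the colon notation."""
    return list(M[i - 1])

print(element(A, 1, 1), column(A, 1), row(A, 2))
```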
If an array of numbers has more than 2 axes (a matrix has 2), we call it a tensor.

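We can add matrices of the same shape by adding their corresponding elements:
C = A + B   where   $C_{i,j} = A_{i,j} + B_{i,j}$

$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + \begin{bmatrix} -3 & 8 \\ -1 & -5 \end{bmatrix} = \begin{bmatrix} 0 + (-3) & 1 + 8 \\ -1 + (-1) & 0 + (-5) \end{bmatrix} = \begin{bmatrix} -3 & 9 \\ -2 & -5 \end{bmatrix}$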

To add a scalar to a matrix or to multiply a matrix by a scalar, we perform the addition or multiplication on each element of the matrix:
D = aB + c where $D_{i,j} = a B_{i,j} + c$
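Both element-wise operations can be sketched in plain Python. The function names and the scalar values a = 2 and c = 1 are chosen here only for illustration.

```python
def mat_add(A, B):
    """C[i][j] = A[i][j] + B[i][j] for matrices of the same shape."""
    same_shape = len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))
    if not same_shape:
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale_and_shift(a, B, c):
    """D[i][j] = a * B[i][j] + c."""
    return [[a * b + c for b in row] for row in B]

A = [[0, 1], [-1, 0]]
B = [[-3, 8], [-1, -5]]

C = mat_add(A, B)
D = scale_and_shift(2, B, 1)  # a = 2 and c = 1 are illustrative values
print(C, D)
```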



 These materials were used while preparing this blog-post:
  1. https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab
  2. https://www.deeplearningbook.org/
  3. NBGtLA by https://minireference.com/

Tuesday, August 6, 2019

Linear Algebra 2. Unit vectors, linear combinations, basis.

Each coordinate of a vector is a scalar that stretches or squishes a unit vector. Unit vectors start at the origin (like every vector), are orthogonal (perpendicular) to each other, and have a length of one unit along their axis. The unit can be anything you want: 1 centimeter, 1 meter, 1 millimeter, etc. So:
  1. the unit vector on the x axis is î (i-hat) with coordinates [1,0], meaning 1 of x, 0 of y
  2. the unit vector on the y axis is ĵ (j-hat) with coordinates [0,1], meaning 0 of x, 1 of y
  3. so first we write the x coordinate, then y, then (if any) z, etc.
So every vector is a sum of scaled unit vectors. We use vector-scalar multiplication and then vector addition. Thus we form a linear combination of î and ĵ (the result is again a vector, an arrow); here 3 and 2 are the scalars:
[3,2] = 3î + 2 ĵ = 3[1,0] + 2[0,1] = [3,0] + [0,2] = [3,2]

î and ĵ are also called the basis vectors of the x-y coordinate system.

We can choose different basis vectors (not necessarily unit vectors) and get a completely new coordinate system. So the numerical description of a vector depends on the choice of basis vectors.
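To see how the same pair of scalars describes different arrows under different bases, here is a small sketch. The alternative basis [2,0] and [1,1] is made up for illustration, and the function name `linear_combination` is hypothetical.

```python
def linear_combination(scalars, basis):
    """Sum of scalar * basis vector, computed component by component."""
    dimension = len(basis[0])
    return [sum(s * b[d] for s, b in zip(scalars, basis)) for d in range(dimension)]

standard_basis = [[1, 0], [0, 1]]   # i-hat and j-hat
other_basis = [[2, 0], [1, 1]]      # hypothetical non-unit basis vectors

in_standard = linear_combination([3, 2], standard_basis)
in_other = linear_combination([3, 2], other_basis)
print(in_standard, in_other)
```

With the standard basis the scalars 3 and 2 give back [3, 2]; with the other basis the same scalars land on a different point of the original grid.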



These materials were used while preparing this blog-post:
  1. https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab
  2. https://www.deeplearningbook.org/
  3. NBGtLA by https://minireference.com/

Monday, August 5, 2019

Docker firewalling

Docker containers are not host services. They rely on a virtual network on your host, and the host acts as a gateway for this network. So container traffic is routed traffic, and it passes through the FORWARD chain (of the filter table) rather than the INPUT chain.
In fact the Docker daemon creates several iptables chains to set up container connectivity, and the DOCKER chain controls access to the Docker containers. Traffic from the FORWARD chain is passed on to the DOCKER chain. You should not modify the rules Docker adds to your iptables policies. For manually added rules you must use the DOCKER-USER chain. Rules in the DOCKER-USER chain are evaluated before the DOCKER chain rules.

To restrict access to a container that uses the Docker bridge network (-I inserts the rule at the first position in the rules list):
add rule: iptables -I DOCKER-USER ruleHere -j [ACCEPT|DROP]
remove rule: iptables -D DOCKER-USER ruleNumberHere

For example:
list all rules in DOCKER-USER chain:
iptables -L DOCKER-USER
or more verbose with numeric ports:
iptables -L DOCKER-USER -vn
deny access to all containers from IP address 10.10.10.11:
iptables -I DOCKER-USER -s 10.10.10.11 -j DROP
deny access to the containers' TCP port 5000 (this is the container port, not the host port of the port mapping):
iptables -I DOCKER-USER -p tcp -m tcp --dport 5000 -j DROP
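If such rules are generated from a script, it can help to build the command line in one place. The sketch below only assembles the argument list and does not run anything; the function `docker_user_rule` and its parameters are hypothetical, and it uses only the iptables options that appear in the examples above.

```python
def docker_user_rule(action, source=None, protocol=None, dport=None):
    """Build the argv for inserting a rule at the top of the DOCKER-USER chain.

    Nothing is executed here; the caller decides how to run the command
    (it needs root privileges on the Docker host).
    """
    if action not in ("ACCEPT", "DROP"):
        raise ValueError("action must be ACCEPT or DROP")
    argv = ["iptables", "-I", "DOCKER-USER"]
    if source is not None:
        argv += ["-s", source]
    if protocol is not None:
        argv += ["-p", protocol]
    if dport is not None:
        if protocol is None:
            raise ValueError("a destination port needs a protocol")
        argv += ["-m", protocol, "--dport", str(dport)]
    argv += ["-j", action]
    return argv

deny_host = " ".join(docker_user_rule("DROP", source="10.10.10.11"))
deny_port = " ".join(docker_user_rule("DROP", protocol="tcp", dport=5000))
print(deny_host)
print(deny_port)
```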

macvlan driver

Everything below (to the end of this blog post) can be used for any container, not just those using the macvlan driver.

With network namespaces, you can have different and separate instances of network interfaces and routing tables that operate independently of each other.
By default, every Linux machine has just one namespace, the "default" or "global" namespace (the physical interfaces exist here).

From the docker-host:


Make directory for network namespaces linking (done on the container host only once):
mkdir -p /var/run/netns

Find PID of the container:
CPID=$(docker inspect --format='{{ .State.Pid }}' containerName)

Create the link:
LINK="/var/run/netns/$CPID"
ln -s "/proc/$CPID/ns/net" "$LINK"

The container's network namespace is exposed under:
/proc/$CPID/ns/net

Drop all packets in the network namespace of the container PID found above:
ip netns exec $CPID iptables -I INPUT -j DROP
ip netns exec $CPID iptables -I OUTPUT -j DROP

Allow only incoming and outgoing ICMP packets (these rules are inserted above the DROP rules):
ip netns exec $CPID iptables -I INPUT -p icmp -j ACCEPT
ip netns exec $CPID iptables -I OUTPUT -p icmp -j ACCEPT

Viewing all container iptables rules:
ip netns exec $CPID iptables -L

Remove the link when finished:
rm -f "$LINK"

From the container itself:


To use iptables inside the container itself, you must run the container with the NET_ADMIN capability:
docker run --cap-add=NET_ADMIN --name='ctr0' --hostname='ctr0' -it centos /bin/bash

From the container bash:
yum install net-tools
yum install iptables

Now you can restrict all access but ICMP:
iptables -I INPUT -j DROP
iptables -I OUTPUT -j DROP
iptables -I INPUT -p icmp  -j ACCEPT
iptables -I OUTPUT -p icmp  -j ACCEPT
iptables -L

Linear Algebra 1. What is vector and scalar.

There are 3 views on vectors:

  1. physics view - arrows pointing in space, having a length and a direction; you can move an arrow around and it is still the same vector
  2. computer science view - ordered lists of numbers (order matters), where the dimension is the length of that list
  3. mathematics view - generalizes both: a vector can be anything for which there is a sensible notion of adding 2 vectors and multiplying a vector by a number: v⃗+w⃗ and 2v⃗
Geometrically, a vector is an arrow inside a coordinate system, and its coordinates describe the move from the origin ([0,0] in the Cartesian coordinate system) to the tip of the vector.

Vector addition is like encoding the endpoint of a whole route as a group of vectors, one starting at each turn, each encoding the direction and length of that part of the route:

  1. whole way: go 1 to the right and 2 up, then 3 to the right and 1 down:
    1. here we have 2 parts of the whole way:
      1. 1 to the right and 2 up - we'll encode that v⃗  [1,2]
      2. 3 to the right and 1 down - we'll encode that w⃗ [3,-1]
  2. so we have 2 vectors - v⃗  [1,2] and w⃗ [3,-1], 
  3. then v⃗+w⃗ = [1+3 , 2 + (-1)] = [4,1]
  4. for better understanding:
    1. take a piece of squared paper
    2. draw the whole way using notebook squares to measure steps
    3. draw v and w vectors on the Cartesian plane
Multiplication by a number stretches or squishes a vector, or reverses its direction (when the number is negative):
if v⃗ is [1,2], then 2v⃗ = 2[1,2] = [2*1 , 2*2] = [2, 4]. This is also called scaling, and the numbers used to scale (stretch, squish, reverse direction) are called scalars. A scalar is just a single number.
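Both operations are one-liners when a vector is treated as an ordered list of numbers (the computer science view). A minimal sketch, with function names chosen here for illustration:

```python
def vec_add(v, w):
    """Add two vectors of the same length component by component."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same dimension")
    return [a + b for a, b in zip(v, w)]

def vec_scale(scalar, v):
    """Multiply every component of v by the scalar."""
    return [scalar * a for a in v]

v = [1, 2]
w = [3, -1]

total = vec_add(v, w)        # 1 right and 2 up, then 3 right and 1 down
doubled = vec_scale(2, v)    # stretch v to twice its length
flipped = vec_scale(-1, v)   # a negative scalar reverses the direction
print(total, doubled, flipped)
```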

We can identify each individual number in a vector by its index: for v⃗ = [1,3,5,7,9,2], $v_3 = 5$

By convention we write a vector in bold lowercase, or in non-bold lowercase with an arrow above (v or v⃗), and its elements in non-bold lowercase with a subscript.

If we want to index a set of elements of a vector, we define a set containing the indices and write this set as the subscript:

  1. x is [2,3,4,6,1,8,4] and we need the 1st, 4th and 5th elements ($x_1, x_4, x_5$)
  2. define the set S = {1,4,5}
  3. $x_S$
$x_{-1}$ means all elements but $x_1$
$x_{-S}$ means all elements but $x_1, x_4, x_5$
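The index-set notation can be mimicked in Python as sketched below. The helper names `take` and `drop` are hypothetical, and they use 1-based indices to match the notation in the text.

```python
def take(x, indices):
    """Elements of x at the given 1-based indices, in increasing index order."""
    return [x[i - 1] for i in sorted(indices)]

def drop(x, indices):
    """All elements of x except those at the given 1-based indices."""
    return [value for i, value in enumerate(x, start=1) if i not in indices]

x = [2, 3, 4, 6, 1, 8, 4]
S = {1, 4, 5}

x_S = take(x, S)              # the 1st, 4th and 5th elements
x_minus_1 = drop(x, {1})      # everything but the 1st element
x_minus_S = drop(x, S)        # everything but the 1st, 4th and 5th elements
print(x_S, x_minus_1, x_minus_S)
```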

These materials were used while preparing this blog-post:

Friday, August 2, 2019

Entropy

Entropy is a measure of uncertainty. High entropy means the data has high variance and thus contains a lot of information and/or noise. For instance, a constant function where f(x) = 4 for all x has no entropy: it is easily predictable, carries little information, has no noise and can be represented briefly. Similarly, f(x) ≈ 4 has some entropy, while f(x) = random_number has very high entropy due to noise.

Information entropy is a concept from information theory. It tells how much information there is in an event. In general, the more certain or deterministic the event is, the less information it contains. Put differently, the information an event carries equals the uncertainty that observing it removes. The concept of information entropy was created by the mathematician Claude Shannon.

Generally speaking, information entropy is the average amount of information conveyed by an event, when considering all possible outcomes.

Example:
we have 3 bags:

  • 1st with 4 red balls
  • 2nd with 3 red and 1 green balls
  • 3rd with 2 red and 2 green balls
Entropy and prior knowledge are opposites: the more we know in advance about which ball will be drawn, the lower the entropy. The more ways the balls can be arranged, the more entropy we get. So, in terms of the color probability when one ball is taken from a bag:

  • 1st bag has 100% probability of red, so this bag has the least entropy
  • 2nd bag has 75% probability of red and 25% probability of green, so it has medium entropy
  • 3rd bag has 50% probability of red and 50% probability of green, so it has the greatest entropy
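The ordering of the three bags can be confirmed with Shannon's formula, H = -Σ p·log2(p), which gives the entropy in bits. This is a minimal sketch; the function name `entropy` and the `bags` dictionary are chosen here for illustration.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Probabilities of drawing (red, green) from each bag.
bags = {
    "1st bag": [1.0, 0.0],
    "2nd bag": [0.75, 0.25],
    "3rd bag": [0.5, 0.5],
}

entropies = {name: entropy(p) for name, p in bags.items()}
for name, h in entropies.items():
    print(f"{name}: {h:.3f} bits")
```

The first bag comes out at 0 bits, the second at roughly 0.81 bits and the third at 1 bit, matching the least, medium and greatest entropy described above.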


Thursday, August 1, 2019

Tabular Data

Tabular data is contrasted with relational data, such as an SQL database. In tabular data, everything is arranged in columns and rows. Every row has the same number of columns; lacking or missing values are substituted by "N/A", because empty values (like the SQL NULL value) are not allowed in a tabular data structure. The first line of tabular data is usually a header describing the content of each column. The most used tabular format in data science is CSV (Comma-Separated Values). Columns are separated by a delimiter character (a tab, a comma, ...) that marks each column off from its neighbors.
It is best to think of tabular data as "organized by row", where each row corresponds to a unique identifier such as the time a measurement was made (in contrast to SQL, where keys are used as unique identifiers). For example, you can store a phone book as tabular data, where each row shows a person's name, surname and phone number. To find relations between rows in tabular data, you first need to load all the data into memory (example: finding all persons with numbers starting with +994, the country code of Azerbaijan). If this phone book is moved into a relational structure, the single phone-book table changes as follows:
  1. tabular data:
    • name;surname;address;zip;phone-number
    • name1;surname1;addressX;zipA;phone1,phone2
    • name2;surname2;addressY;zipB;phone1
    • name3;surname3;addressZ;zipA;phone1,phone2
  2. First Normal Form (1NF): no repeating groups ("phone" is a group: two columns like "phone1" and "phone2", or one column "phone" holding "phone1,phone2", are not allowed by 1NF). 1NF adds redundant/repeated values to the data:
    • name;surname;address;zipCode;phoneNumber
    • name1;surname1;addressX;zipA;phone1
    • name1;surname1;addressX;zipA;phone2
    • name2;surname2;addressY;zipB;phone1
    • name3;surname3;addressZ;zipA;phone1
    • name3;surname3;addressZ;zipA;phone2
  3. Second Normal Form (2NF): 1NF, plus all non-key columns depend on the table's primary key and the table serves a single purpose (each column must depend on the primary key and help describe what the primary key identifies; if not, move that column into another table). If we add a primary key rowID, it uniquely identifies each row, that is, each phone number of a person, but the person data does not describe what this key identifies, so we move all person-related data to another table. The main idea of 2NF is to reduce the amount of redundant/repeated data.
      1. We use one table to store all person-related data (name, surname, address, zip code):
        • personID;name;surname;address;zipCode
        • 100;name1;surname1;addressX;zipA
        • 200;name2;surname2;addressY;zipB
        • 300;name3;surname3;addressZ;zipA
      2. Now our phone-numbers table (we must add rowID to uniquely identify each row) is in 2NF:
        • rowID;personID;phoneNumber
        • 1;100;phone1
        • 2;100;phone2
        • 3;200;phone1
        • 4;300;phone1
        • 5;300;phone2
  4. Third Normal Form (3NF): 2NF, plus every column is non-transitively dependent on the primary key, that is, it depends on the key directly and not through another column. Dependence: age depends on birth date. Transitive dependency: take 3 columns PK, BMI (Body Mass Index) and oWtf (Over Weight True-False); PK determines both BMI and oWtf, but oWtf also depends on BMI (BMI > 25 is overweight), so oWtf depends on PK through BMI. So in 3NF all columns depend only on the primary key (2NF) and not on other columns:
    1. Our phone-number table is in 3NF:
      • rowID;personID;phoneNumber
      • 1;100;phone1
      • 2;100;phone2
      • 3;200;phone1
      • 4;300;phone1
      • 5;300;phone2
    2. But our person table is only in 2NF (each column relates to the primary key (PK)), not 3NF: we can use the PK to find a person's address and also their zip code, but the zip code is itself determined by the address, which is a transitive dependency:
      1. Move address and zip code to a separate table:
        • addrID;address;zipCode
        • 111;addressX;zipA
        • 222;addressY;zipB
        • 333;addressZ;zipA
      2. Now our person table will be:
        • personID;name;surname;addrID
        • 100;name1;surname1;111
        • 200;name2;surname2;222
        • 300;name3;surname3;333
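The step from the flat 1NF table to separate person and phone tables can be sketched in Python. This is an illustration only: the placeholder values come from the tables above, while the variable names and the rule for numbering persons (100, 200, 300, ...) are assumptions made here to reproduce the IDs used in the example.

```python
# Flat 1NF rows: name, surname, address, zip code, phone number.
rows_1nf = [
    ("name1", "surname1", "addressX", "zipA", "phone1"),
    ("name1", "surname1", "addressX", "zipA", "phone2"),
    ("name2", "surname2", "addressY", "zipB", "phone1"),
    ("name3", "surname3", "addressZ", "zipA", "phone1"),
    ("name3", "surname3", "addressZ", "zipA", "phone2"),
]

person_ids = {}  # person data -> personID, one entry per distinct person
phones = []      # (rowID, personID, phoneNumber)

for name, surname, address, zip_code, phone in rows_1nf:
    person = (name, surname, address, zip_code)
    if person not in person_ids:
        # Assumed numbering scheme that yields 100, 200, 300, ...
        person_ids[person] = 100 * (len(person_ids) + 1)
    phones.append((len(phones) + 1, person_ids[person], phone))

persons = [(pid, *person) for person, pid in person_ids.items()]
print(persons)
print(phones)
```

Each person now appears once in `persons`, and the repeated name and address values from the flat table are replaced by a personID reference in `phones`.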

So we can say that relational data structures:

  1. are like tabular data with normalization applied, so that one table of tabular data becomes several relational tables (relations)
  2. allow NULL values, while tabular data doesn't
  3. can be queried without loading everything into RAM first, while tabular data must be loaded entirely into RAM to be queried
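The phone-book query mentioned earlier (numbers starting with +994) can be sketched with Python's standard `csv` module. The sample rows and phone numbers below are made up for illustration, as is the function name `find_by_prefix`; the point is that the whole table is read into memory before any row can be compared.

```python
import csv
import io

# Made-up sample data in the semicolon-delimited layout used in this post.
PHONE_BOOK = """name;surname;phoneNumber
name1;surname1;+994501234567
name2;surname2;+15550100
name3;surname3;+994551112233
"""

def find_by_prefix(text, prefix):
    """Load every row into memory, then keep the rows whose number starts with prefix."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = list(reader)  # the whole table is held in RAM here
    return [row for row in rows if row["phoneNumber"].startswith(prefix)]

matches = find_by_prefix(PHONE_BOOK, "+994")
for row in matches:
    print(row["name"], row["surname"], row["phoneNumber"])
```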