Wednesday, January 25, 2017

OVS (Open vSwitch)

Overview
OVS concepts:
  1. A switch contains ports
  2. A port may have one or more interfaces (Bonding allows more than one interface per port)
  3. Packets are forwarded by flow; a flow can match on tunnel ID, IPv4/IPv6 source or destination address, input port, Ethernet frame type, VLAN ID (VID), TCP/UDP source or destination port, source or destination MAC address, ToS/DSCP, and ARP/ND source or destination address
Benefits of adding OVS to a KVM host:
  1. the host can reach its guests directly by IP address
  2. you can change the network configuration without rebooting the host (but you must reboot the host if the 'service network restart' command is issued)
  3. no need to create and configure VLAN interfaces on the host by hand (everything is done automatically once the 'ovs-vsctl set port port-name trunks=VID1,VID2,...,VIDn' command is issued)
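
The trunk command from item 3 is easy to generate from a list of VLAN IDs. Below is a minimal Python sketch. The helper name `trunk_command` and the port name `vnet0` are hypothetical; only the `ovs-vsctl set port ... trunks=...` form comes from the text above. The 0..4095 range check reflects the 12-bit VLAN ID field.

```python
def trunk_command(port_name, vlan_ids):
    """Build the 'ovs-vsctl set port' argument list that trunks the given VLAN IDs."""
    vids = sorted(set(vlan_ids))          # drop duplicates, keep a stable order
    for vid in vids:
        # A VLAN ID is a 12-bit value, so anything outside 0..4095 is a mistake
        if not 0 <= vid <= 4095:
            raise ValueError("invalid VLAN ID: %r" % (vid,))
    trunks = ",".join(str(vid) for vid in vids)
    return ["ovs-vsctl", "set", "port", port_name, "trunks=" + trunks]


if __name__ == "__main__":
    # 'vnet0' is a made-up port name used only for illustration
    print(" ".join(trunk_command("vnet0", [10, 20, 30])))
```

The function only builds the argument list; running it (for example with `subprocess.call`) is left to the caller so the command can be reviewed first.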

Tuesday, January 24, 2017

Python Ansible Part1

In one of my previous posts I wrote that I couldn't use Fabric with Cisco IOS the way I wanted, so I decided to try Ansible (the network I administer is heterogeneous, but mostly Linux and Cisco IOS).
Ansible is also used to automate management tasks, but it is more complex and more feature-rich than Fabric.

Installation and understanding:
[admin@localhost ~]$ sudo pip install ansible
# sshpass is required in order to use `ansible_ssh_pass`:
[admin@localhost ~]$ sudo yum install sshpass

[admin@localhost ~]$ ansible --version
ansible 2.2.1.0
config file =
configured module search path = Default w/o overrides

[admin@localhost ~]$ sudo mkdir  /etc/ansible
[admin@localhost ~]$ sudo vi /etc/ansible/hosts
[voip-srv] #host-group name
#Ansible will use `asterisk` as name for root@172.16.0.170:22
asterisk ansible_host=172.16.0.170 ansible_ssh_pass='123456'

[voip-srv:vars]
ansible_user=root
ansible_connection=ssh
ansible_port=22

[cisco-867]
c867-cc ansible_host=172.16.45.3 ansible_ssh_pass='123456'

[cisco-867:vars]
ansible_user=admin
ansible_connection=ssh
ansible_port=2200

When connecting to a server for the first time, this message can appear:
"msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this.  Please add this host's fingerprint to your known_hosts file to manage this host."
To avoid this message without connecting to the server manually via ssh and answering `yes` to `Are you sure you want to continue connecting`, disable host key checking:
[admin@localhost ~]$ export ANSIBLE_HOST_KEY_CHECKING=False

Ansible Pattern
In Ansible, a pattern is a string describing which hosts to manage. It can be a host or group name (asterisk), a wildcard (`all`, which is equivalent to `*`, or `aster*`), a regex prefixed with `~` (`~(voip|web)-srv`), etc. The module name is specified after `-m`. The `ping` module is not an ICMP ping: it logs in via SSH and verifies that a suitable Python version is installed:
[admin@localhost ~]$ ansible all   -m ping
asterisk | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
 [WARNING]: sftp transfer mechanism failed on [172.16.45.3]. Use ANSIBLE_DEBUG=1 to see detailed information

 [WARNING]: scp transfer mechanism failed on [172.16.45.3]. Use ANSIBLE_DEBUG=1 to see detailed information

c867-cc | FAILED! => {
    "failed": true,
    "msg": "failed to transfer file to \"` echo ~/.ansible/tmp/ansible-tmp-1485262768.12-203384291421785 `\" ) && sleep 0'\"/ping.py:\n\nAdministratively disabled.\n"
}
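
To make the pattern kinds more concrete, here is a simplified Python sketch of how a wildcard or a `~` regex could select hosts from the inventory shown earlier. This is not Ansible's own code, and the function name `match_hosts` is hypothetical; real patterns also support unions, exclusions and ranges, which are ignored here.

```python
import fnmatch
import re


def match_hosts(pattern, inventory):
    """Return host names selected by a simplified Ansible-style pattern.

    inventory maps a group name to the list of host names in that group.
    """
    if pattern in ("all", "*"):
        return sorted({h for hosts in inventory.values() for h in hosts})

    if pattern.startswith("~"):
        regex = re.compile(pattern[1:])      # everything after '~' is a regex

        def matches(name):
            return regex.search(name) is not None
    else:
        def matches(name):
            return fnmatch.fnmatchcase(name, pattern)   # shell-style wildcard

    selected = set()
    for group, hosts in inventory.items():
        if matches(group):
            selected.update(hosts)           # pattern matched a group name
        selected.update(h for h in hosts if matches(h))  # or a host name
    return sorted(selected)


# Groups and hosts taken from the /etc/ansible/hosts example above
inventory = {"voip-srv": ["asterisk"], "cisco-867": ["c867-cc"]}

print(match_hosts("aster*", inventory))            # ['asterisk']
print(match_hosts("~(voip|web)-srv", inventory))   # ['asterisk']
print(match_hosts("all", inventory))               # ['asterisk', 'c867-cc']
```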

To view all available Ansible modules:
[admin@localhost ~]$ ansible-doc -l
To learn how to use a module, add its name after `ansible-doc`:
[admin@localhost ~]$ ansible-doc ios_config
> IOS_CONFIG

  Cisco IOS configurations use a simple block indent file syntax for segmenting configuration into sections.  This
  module provides an implementation for working with IOS configuration sections in a deterministic way.

  * note: This module has a corresponding action plugin.

Ansible Ad-Hoc Commands
We can also start using Ansible without further configuration by using the `raw` module, which simply runs a command over SSH. Running a single module from the command line like this is called an ad-hoc command in Ansible:
[admin@localhost ~]$ ansible asterisk -m raw -a "asterisk -rx 'core show uptime'"
asterisk | SUCCESS | rc=0 >>
System uptime: 1 week, 1 day, 7 hours, 15 minutes, 21 seconds
Last reload: 1 week, 1 day, 7 hours, 15 minutes, 21 seconds
Shared connection to 172.16.0.170 closed.

To make sure a service is running on a host (`"changed": false` means it was already started):
[admin@localhost ~]$ ansible asterisk -m service -a "name=asterisk state=started"
asterisk | SUCCESS => {
    "changed": false,
    "name": "asterisk",
    "state": "started"
}

Ansible Configuration File
Ansible looks for its configuration in the order below and uses the first file found (the files are not merged):
  1. ANSIBLE_CONFIG (an environment variable)
  2. ansible.cfg (in the current directory)
  3. .ansible.cfg (in the home directory)
  4. /etc/ansible/ansible.cfg
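
The lookup order above can be sketched in a few lines of Python. This is only an illustration of the "first file found wins" rule, not Ansible's actual loader; the function name `find_ansible_config` and its parameters are my own, added so the search can be tried against arbitrary directories.

```python
import os


def find_ansible_config(environ=None, cwd=None, home=None):
    """Return the first existing config file in the documented order, or None."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    home = os.path.expanduser("~") if home is None else home

    candidates = [
        environ.get("ANSIBLE_CONFIG"),          # 1. environment variable
        os.path.join(cwd, "ansible.cfg"),       # 2. current directory
        os.path.join(home, ".ansible.cfg"),     # 3. home directory
        "/etc/ansible/ansible.cfg",             # 4. system-wide file
    ]
    for path in candidates:
        # Skip an unset variable and files that do not exist
        if path and os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_ansible_config())
```

Because the search stops at the first hit, a stray ansible.cfg in the current directory silently hides /etc/ansible/ansible.cfg; `ansible --version` shows which file was picked.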

When installing via pip, files like ansible.cfg and ansible/hosts are not created automatically (they are generated when installing via yum or apt). We can create an empty ansible.cfg manually or download a ready-made one:
[admin@localhost ~]$ sudo touch /etc/ansible/ansible.cfg
OR, as described at https://ansible-tips-and-tricks.readthedocs.io/en/latest/ansible/install/:
[admin@localhost ~]$ sudo wget -O /etc/ansible/ansible.cfg https://raw.githubusercontent.com/cyverse/ansible-tips-and-tricks/master/ansible.cfg


[admin@localhost ~]$ ansible --version
ansible 2.2.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides

Ansible supports these systems:
Linux, BSD, Windows (via WinRM and PowerShell) and various networking devices (but most Ansible modules are oriented towards desktop or server operating systems and won't work on networking devices; modules like `raw` still work).

Note as of 2017-04-27:
For network equipment, the `ansible_connection` variable must be set to `local`, not `ssh`.

Simple steps to troubleshoot network on Windows machine

A few days ago I faced an interesting problem: a client could open every site except one. Below are the steps I used to identify the source of the problem:
  1. Verify the settings on the NIC (Network Interface Card):
    ipconfig /all | findstr /ic:"adapter" /ic:"ip address" /ic:"mask" /ic:"gateway"
  2. Verify the IP routes, especially the route to the site in question:
    route print
  3. Verify the DNS response and check that both commands resolve to the same IP address (this matters because the site's IP address can be overridden in C:\WINDOWS\system32\drivers\etc\hosts: `nslookup` queries the DNS server, while `ping` also uses the hosts file):
    1. nslookup google.com
    2. ping google.com
  4. Verify the TCP connection from cmd:
    1. telnet google.com 80
    2. telnet google.com 443
    If the connection is successful, the output looks like:
    Trying 62.212.252.119...
    Connected to google.com.
    Escape character is '^]'.
  5. Ping the site. If an error occurs, lower the packet size until ping returns a normal answer:
    ping google.com -l 1500
In my case the problem was the MTU size on the router at the customer side.
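
Lowering the packet size by hand in step 5 is really a binary search for the largest payload that still gets through. Here is a small Python sketch of that search. The `probe` function is a placeholder supplied by the caller (on Windows it could wrap `ping -f -l <size>`, where `-f` sets the Don't Fragment flag); in the example it is simulated, so nothing is actually sent.

```python
def largest_payload(probe, low=0, high=1472):
    """Binary-search the largest ICMP payload size for which probe(size) is True.

    probe(size) is expected to send one ping with that payload size and
    fragmentation forbidden, returning True if a reply came back.
    Returns None if even the smallest payload fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2    # round up so the range always shrinks
        if probe(mid):
            low = mid                  # mid fits, look for something bigger
        else:
            high = mid - 1             # mid is too big
    return low


# Simulated path with a 1400-byte MTU (an assumed value for the demo):
# 20 bytes of IPv4 header + 8 bytes of ICMP header are added to the payload.
PATH_MTU = 1400


def fake_probe(size):
    return size + 28 <= PATH_MTU


payload = largest_payload(fake_probe)
print("largest payload: %d, path MTU: %d" % (payload, payload + 28))
```

With the simulated 1400-byte path this reports a payload of 1372 bytes, i.e. the MTU minus the 28 bytes of headers.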

Thursday, January 19, 2017

Python fabric Part2

Error handling:
command-running operations such as run and local return objects carrying their execution details: .failed, .return_code, etc.
settings - context manager that changes env variables only for the wrapped chunk of code
env.warn_only - setting that turns aborts into warnings, which allows flexible error handling
abort - aborts execution
confirm - function from fabric.contrib.console for simple `[Y/n]` prompts
prompt - interactively gets information from the fab user
validate - argument of prompt used as a user-input validator (here a regular expression); prompt keeps asking until the input is valid
execute - runs already defined fab tasks from within another task (simply calling a task function does not take decorators such as roles into account)

from fabric.api import settings, abort, run, prompt, execute, hosts, task
from fabric.contrib.console import confirm

@task
@hosts('localhost')  # without this, "fab deploy" fails with "No hosts found. Please specify (single) host string for connection", because execute uses the host list defined on the executed task itself
def dmidecode_test():
 with settings(warn_only=True):
  shell_command = r"sudo dmidecode -t 1 | grep -i 'serial number'"
  result = run(shell_command)
  if result.failed and not confirm("dmidecode is not installed. Do you need to install"):
   abort("Execution aborted. dmidecode is not installed")
  elif result.failed:
   answer =  prompt("Say `y/n` to yum?", validate="[yn]")
   run("sudo yum install dmidecode -%s" % answer)
  else:
   pass

@task
@hosts('localhost')
def dmidecode():
 run("sudo dmidecode -t 1 | grep -i 'serial number'")
 run("sudo dmidecode -t 16 | grep -i 'maximum capacity'")

@task
@hosts('10.1.1.1')
def deploy():
 execute(dmidecode_test)
 execute(dmidecode)

[admin@localhost ~]$ fab --list
Available commands:
    deploy
    dmidecode
    dmidecode_test
[admin@localhost ~]$

[admin@localhost ~]$ fab deploy
[10.1.1.1] Executing task 'deploy'
[localhost] Executing task 'dmidecode_test'
[localhost] run: sudo dmidecode -t 1 | grep -i 'serial number'
[localhost] Login password for 'admin':
[localhost] out: [sudo] password for admin:
[localhost] out: sudo: dmidecode: command not found
[localhost] out:
Warning: run() received nonzero return code 1 while executing 'sudo dmidecode -t 1 | grep -i 'serial number''!
dmidecode is not installed. Do you need to install [Y/n] n
Fatal error: Execution aborted. dmidecode is not installed
Aborting.
Disconnecting from localhost... done.
[admin@localhost ~]$


[admin@localhost ~]$ fab deploy
[10.1.1.1] Executing task 'deploy'
[localhost] Executing task 'dmidecode_test'
[localhost] run: sudo dmidecode -t 1 | grep -i 'serial number'
[localhost] Login password for 'admin':
[localhost] out: [sudo] password for admin:
[localhost] out: sudo: dmidecode: command not found
[localhost] out:
Warning: run() received nonzero return code 1 while executing 'sudo dmidecode -t 1 | grep -i 'serial number''!
dmidecode is not installed. Do you need to install [Y/n] y
Say `y/n` to yum? y
[localhost] run: sudo yum install dmidecode -y
...
[localhost] Executing task 'dmidecode'
[localhost] run: sudo dmidecode -t 1 | grep -i 'serial number'
[localhost] out: [sudo] password for admin:
[localhost] out:     Serial Number: NXV54ER0246021DE76
[localhost] out:
[localhost] run: sudo dmidecode -t 16 | grep -i 'maximum capacity'
[localhost] out: [sudo] password for admin:
[localhost] out:     Maximum Capacity: 16 GB
[localhost] out:
Done.
Disconnecting from localhost... done.
[admin@localhost ~]$

Linux useful one liners

File operations

Search a directory for large files (larger than 2 GiB) and show the 10 largest:
find Desktop/ -type f -size +2G -exec ls -l {} + | sort -k5,5nr | head -n 10
Prefixes and units for -size:
+   larger than
-   smaller than
c   bytes
k   kibibytes (1024 bytes)
M   mebibytes
G   gibibytes
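
For cases where the result is needed inside a script, the same search can be sketched in Python. The function name `largest_files` is hypothetical, and the directory `Desktop` is just the one used in the command above.

```python
import heapq
import os


def largest_files(root, min_size, limit=10):
    """Return up to `limit` (size, path) pairs for files larger than min_size bytes."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size   # lstat: do not follow symlinks
            except OSError:
                continue                        # file vanished or is unreadable
            if size > min_size:
                found.append((size, path))
    return heapq.nlargest(limit, found)         # biggest first


if __name__ == "__main__":
    for size, path in largest_files("Desktop", 2 * 1024 ** 3):
        print("%d %s" % (size, path))
```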

Empty log file:
truncate -s 0 messages

The truncate command sets the specified file to the given size; with a size (-s option) of zero, the file becomes empty.

Count how many times per day the specified Asterisk SIP extension was unreachable:
grep -Ei "9011.+unreachable" /var/log/asterisk/messages | awk '{split($1,month,"[");  printf " %s %s %s  %s\n", month[2], $2, $7, $10}' | sort -k 2 | uniq -c
      1   Jan 10 '9011'  UNREACHABLE!
     10  Jan 11 '9011'  UNREACHABLE!
      2   Jan 12 '9011'  UNREACHABLE!
      1   Jan 16 '9011'  UNREACHABLE!
     18  Jan 18 '9011'  UNREACHABLE!
      1  Dec 28 '9011'  UNREACHABLE!
      4  Dec 29 '9011'  UNREACHABLE!
      2  Dec 30 '9011'  UNREACHABLE!
      6  Jan 6 '9011'  UNREACHABLE!
      1  Jan 9 '9011'  UNREACHABLE!
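
The same per-day count can be sketched in Python, which is handy when the result has to be processed further. The sample log lines below are made up for the demo; their layout is an assumption inferred from the field positions the awk command relies on ($1 = "[Mon", $2 = day, $7 = peer, $10 = UNREACHABLE!).

```python
import re
from collections import Counter

# Assumed line layout: "[Mon DD HH:MM:SS] ... Peer 'EXT' is now UNREACHABLE! ..."
LINE_RE = re.compile(r"^\[(\w{3})\s+(\d+) .*?Peer '(\d+)' is now UNREACHABLE!")


def count_unreachable(lines, extension):
    """Count UNREACHABLE events per (month, day) for one SIP extension."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(3) == extension:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


# Made-up sample lines, not taken from a real log
sample = [
    "[Jan 10 09:15:01] NOTICE[2214] chan_sip.c: Peer '9011' is now UNREACHABLE!  Last qualify: 12",
    "[Jan 11 10:02:44] NOTICE[2214] chan_sip.c: Peer '9011' is now UNREACHABLE!  Last qualify: 15",
    "[Jan 11 10:07:12] NOTICE[2214] chan_sip.c: Peer '9011' is now Reachable. (14ms / 2000ms)",
    "[Jan 11 11:30:00] NOTICE[2214] chan_sip.c: Peer '9012' is now UNREACHABLE!  Last qualify: 9",
    "[Jan 11 12:45:09] NOTICE[2214] chan_sip.c: Peer '9011' is now UNREACHABLE!  Last qualify: 20",
]

for (month, day), n in sorted(count_unreachable(sample, "9011").items()):
    print("%d %s %d" % (n, month, day))
```

For the real log, pass `open('/var/log/asterisk/messages')` instead of `sample`.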

Create an Asterisk capture (dump) file
From the Asterisk server shell:
tcpdump -w - -p -n -s 0 udp > /tmp/cli-capture.pcap
-w - = write raw packets to stdout (redirected here to the pcap file)
-p = don't put the interface into promiscuous mode
-n = no name resolution
-s 0 = capture full frames, not only the first bytes
udp = capture only UDP packets

Then copy the pcap file to your workstation and analyze it with Wireshark.

Network Related

View NIC parameters
ethtool eth4
Settings for eth4:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: Unknown
    Supports Wake-on: pg
    Wake-on: d
    Current message level: 0x0000003f (63)
                   drv probe link timer ifdown ifup
    Link detected: yes

Find the needed NIC port
Sometimes it's hard to tell which physical Ethernet port of a rack server belongs to an interface; ethtool can make the port's LED blink for the specified number of seconds:
ethtool -p eth4 15

Getting Server Info

Vendor, Model, Serial:
dmidecode -t 1 | grep "Manuf\|Product\|Serial"

Processor Sockets & Installed Processor:
dmidecode -t 4 | grep "Processor\|Socket\|Version"

RAM slots & Installed RAM:
dmidecode -t 17 | grep "Device\|Size\|DIMM_"

HDD  info:
smartctl -a /dev/your_device_here | grep "Vendor\|Product\|Capacity"

Wednesday, January 18, 2017

Python fabric Part1

Fabric is a Python library and command-line tool that simplifies management tasks by automating them.
From personal experience:
Fabric is very good for automating Linux tasks, but I tried hard to automate Cisco IOS tasks with it and failed. It seems that Fabric starts a fresh session on the target system every time you call `run`, so after issuing `run('enable', shell=False)` the following `run('configure terminal', shell=False)` is no longer in `enable` mode and ends with an error. If you just want to auto-connect to the Cisco device and then enter commands manually, use `from fabric.api import open_shell` and call `open_shell()` in your script: you'll end up connected to the IOS command line, where you can type commands by hand.

[admin@localhost ~]$ pip install --upgrade pip
[admin@localhost ~]$ pip install fabric

In the desired directory:
[admin@localhost ~]$ touch fabfile.py

task decorator makes a command available to fab
run executes a command on the remote host (local is like run but executes only on the local host). To execute a command on a non-Linux machine, use `shell=False`:
@task
@hosts('cisco@10.3.3.1:2200')
def test1():
 run('sh ver', shell=False)
Otherwise you will get an error like this:
Line has invalid autocommand "/bin/bash -l -c "sh ver""
hosts decorator makes a command run on the specified hosts
roles decorator makes a command run on the hosts of the specified roledefs
runs_once decorator executes a command only once, even if a host list or role is specified for it
[admin@localhost ~]$ vi fabfile.py
from fabric.api import env, task, run, roles, runs_once, local, hosts

env.hosts = ['127.0.0.1', 'localhost']

env.roledefs = {
 'switch'  : ['10.1.1.1', '10.1.1.2'],
 'asterisk' : ['root@10.2.2.1']
}

@task
@runs_once
@roles('switch')
def list_roles():
 for role in env.roledefs:
  print(role)

@task
@roles('switch')
def hello(name='World'):
 print("Hello %s!" % name)

def ldr():
 run('ls -l')

@task
def test_exec():
 print('executed')

@task
@hosts('root@10.2.2.1:2200')
def test1():
 run('ls -l')

@task
def test2():
 local('ls -l')

[admin@localhost ~]$ fab --list
Available commands:
    hello
    list_roles
    test_exec
    test1
    test2

[admin@localhost ~]$ fab hello
[10.1.1.1] Executing task 'hello'
Hello World!
[10.1.1.2] Executing task 'hello'
Hello World!

[admin@localhost ~]$ fab hello:name='Fellow'
[127.0.0.1] Executing task 'hello'
Hello Fellow!
[localhost] Executing task 'hello'
Hello Fellow!

Per-task, command-line host lists (fab mytask:host=host1 or fab mytask:role=role1) override absolutely everything else:
[admin@localhost ~]$ fab hello:role=asterisk
[10.2.2.1] Executing task 'hello'
Hello World!

If a command has no hosts or roles specified with a decorator and env.hosts is set, the command is executed on all hosts in env.hosts:
[admin@localhost ~]$ fab test_exec
[127.0.0.1] Executing task 'test_exec'
executed
[localhost] Executing task 'test_exec'
executed
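
The precedence described in this post (command-line per-task hosts first, then decorators, then env.hosts) can be summarised with a tiny Python sketch. This is my own simplification, not Fabric's code; the function name `effective_hosts` is hypothetical and roles are assumed to be already expanded into host lists.

```python
def effective_hosts(cli_hosts=None, decorator_hosts=None, env_hosts=None):
    """Pick the host list for one task: command line > decorator > env.hosts."""
    for candidate in (cli_hosts, decorator_hosts, env_hosts):
        if candidate:                 # the first non-empty list wins
            return list(candidate)
    return []                         # nothing set: fab would ask for a host


# Values taken from the fabfile above
env_hosts = ['127.0.0.1', 'localhost']
switch_role = ['10.1.1.1', '10.1.1.2']
asterisk_role = ['root@10.2.2.1']

# fab hello                 -> the @roles('switch') decorator applies
print(effective_hosts(decorator_hosts=switch_role, env_hosts=env_hosts))
# fab hello:role=asterisk   -> the command line overrides the decorator
print(effective_hosts(cli_hosts=asterisk_role, decorator_hosts=switch_role,
                      env_hosts=env_hosts))
# fab test_exec             -> no decorator, falls back to env.hosts
print(effective_hosts(env_hosts=env_hosts))
```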

CentOS 6 Squid 'Your cache is running out of filedescriptors'

  1. [root@localhost ~]# grep 'Your cache is running out of filedescriptors' /var/log/squid/cache.log
    2017/01/18 11:41:42| client_side.cc(3070) okToAccept: WARNING! Your cache is running out of filedescriptors
  2. [root@localhost ~]# squidclient mgr:info | grep 'file descri'
        Maximum number of file descriptors:   1024
        Available number of file descriptors:  178
        Reserved number of file descriptors:   100
  3. [root@localhost ~]# ulimit -a | grep 'open files'
    open files                      (-n) 1024
  4. Update the system security limits in /etc/security/limits.conf (each line describes a limit in the form: <domain>  <type>  <item>  <value>):
     * - nofile 4096
  5. [root@localhost ~]#  ulimit -a | grep 'open files'
    open files                      (-n) 4096
  6. Append the line `max_filedesc 4096` to /etc/squid/squid.conf
  7. service squid restart
  8. [root@localhost ~]# squidclient mgr:info | grep 'file descri'
        Maximum number of file descriptors:   4096
        Available number of file descriptors: 3842
        Reserved number of file descriptors:   100
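
To watch how close Squid is to the limit, the `squidclient mgr:info` output can be parsed with a few lines of Python. A minimal sketch follows; the function name `fd_usage` is hypothetical, and treating "maximum minus available" as the number of descriptors in use is my simplification.

```python
import re


def fd_usage(mgr_info_text):
    """Extract file descriptor counters from 'squidclient mgr:info' output."""
    values = {}
    for key in ("Maximum", "Available", "Reserved"):
        m = re.search(r"%s number of file descriptors:\s*(\d+)" % key,
                      mgr_info_text)
        if m is None:
            raise ValueError("counter not found: %s" % key)
        values[key.lower()] = int(m.group(1))
    used = values["maximum"] - values["available"]
    values["used_percent"] = round(100.0 * used / values["maximum"], 1)
    return values


# Output copied from steps 2 and 8 above
before = """
        Maximum number of file descriptors:   1024
        Available number of file descriptors:  178
        Reserved number of file descriptors:   100
"""
after = """
        Maximum number of file descriptors:   4096
        Available number of file descriptors: 3842
        Reserved number of file descriptors:   100
"""

print(fd_usage(before)["used_percent"])   # 82.6
print(fd_usage(after)["used_percent"])    # 6.2
```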

Monday, January 16, 2017

OVS Installation on CentOS 6 or 7

  1. yum update -y
  2. yum install gcc make python-devel openssl-devel kernel-devel graphviz kernel-debug-devel autoconf automake rpm-build redhat-rpm-config libtool checkpolicy selinux-policy-devel python-six
  3. We can find OpenvSwitch in RDO repo (RDO is a community of people using and deploying OpenStack on CentOS, Fedora and RHEL):
    1. wget https://rdoproject.org/repos/rdo-release.rpm 
    2. rpm -i rdo-release.rpm
  4. yum install openvswitch
  5. Enable OVS daemon on startup and start it:
    1. On CentOS 6
      1. chkconfig openvswitch on
      2. service openvswitch start
    2. On CentOS 7:
      1. systemctl -l start openvswitch.service
      2. systemctl -l enable openvswitch.service
  6. To avoid possible networking conflicts, remove NetworkManager:
    1. yum remove NetworkManager
  7. To verify your OVS installation:
    1. ovs-vsctl -V
      ovs-vsctl (Open vSwitch) 2.5.0
      Compiled Mar 18 2016 15:00:11
      DB Schema 7.12.1
    2.  ovs-vsctl show
      1c490f2d-68c0-4dd0-a0de-cdcaa944711a
      ovs_version: "2.5.0"