Performance monitoring

This TIL will be a bit longer because it aims to list some of the most useful and most important performance monitoring tools available on Linux. Some are usually installed by default and some must be installed manually.

The tools are split into several categories.

This page started as a selection of information from: https://www.tecmint.com/command-line-tools-to-monitor-linux-performance/. I then extended it based on my personal preferences and usage.

General

This section covers tools that display general information about the system state and usage. When troubleshooting a Linux machine, these are the tools to reach for first.

top

The first tool to run: it lets you inspect the running processes on a system interactively.

top - 06:49:14 up 1 day, 23:04,  1 user,  load average: 0.52, 0.86, 1.12
Tasks: 220 total,   1 running, 219 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.9 us,  0.7 sy,  0.0 ni, 95.9 id,  0.0 wa,  0.3 hi,  0.1 si,  0.0 st
MiB Mem :   7879.9 total,    814.6 free,   2198.2 used,   4867.1 buff/cache
MiB Swap:   2294.0 total,   2271.5 free,     22.5 used.   5041.8 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1034 john      20   0 4007988 373052 120692 S   8.6   4.6   6:07.41 gnome-shell
10943 john      20   0 9240860 653416 143408 S   3.3   8.1   3:39.64 firefox
 1352 john      20   0  503356  43776  32716 S   1.0   0.5   0:06.27 gnome-terminal-
    1 root      20   0  220192   8968   6852 S   0.3   0.1   0:03.90 systemd
  563 root      20   0   57172   6336   5500 S   0.3   0.1   0:00.64 systemd-logind

vmstat

This lesser-known tool is also very useful for getting an overview of many system components, though it may need to be installed manually on some distributions. It provides several options, but the simplest use case is:

$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  23004 790600 341872 4640592    0    2   723   581  514 1102 22  5 73  0  0

Run without arguments, the command displays average values since the system boot. To understand all values, have a look at the man page of the tool. Below are some more examples of vmstat commands:

vmstat 3 10: print a report every 3 seconds, 10 times
vmstat -a: display active and inactive memory
vmstat -s: display a variety of event counters and statistics
vmstat -d: display disk statistics
vmstat -S M: display memory values in megabytes instead of kilobytes
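
As a small sketch of how this output can be post-processed, the function below reads vmstat output on standard input and prints the CPU busy percentage (100 minus the id column) for each sample line. The name vmstat_cpu_busy is my own invention, and the function assumes the default column layout shown above, with a header row starting with "r b".

```shell
# Hypothetical helper (not part of vmstat): print 100 - idle for each sample.
vmstat_cpu_busy() {
  awk '
    # Header row with the column names: remember the position of "id".
    $1 == "r" && $2 == "b" {
      for (i = 1; i <= NF; i++) if ($i == "id") idle_col = i
      next
    }
    # Data rows start with a number and only count once the header was seen.
    idle_col && $1 ~ /^[0-9]+$/ { print 100 - $idle_col }
  '
}

# Possible live usage (needs vmstat installed):
#   vmstat 3 10 | vmstat_cpu_busy
```

Looking up the id column by name rather than by position keeps the sketch working if a vmstat version adds or removes columns.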

nmon

A global performance and system utilization monitoring tool. It can be used interactively or store data in CSV files. It can collect a lot of information and present it nicely: CPU, Memory, Disk Usage, Network, Processes, NFS, Kernel, etc. The panels must be toggled inside the application by pressing the corresponding letter. You can toggle help with h.

If you always use the same panels, you can load them automatically by setting the NMON variable.

NMON=cmdnk nmon

Processes

htop

An advanced, interactive alternative to the top command. It is the replacement for top that I recommend.

progress

A monitoring tool that shows the progress of basic coreutils Linux commands such as cp, mv, dd, tar, gzip, …

The official documentation is located here: https://github.com/Xfennec/progress

A few examples from the official documentation:

Monitor all current and upcoming instances of coreutils commands in a simple window:

watch progress -q

See how your download is progressing:

watch progress -wc firefox

Look at your Web server activity:

progress -c httpd

File system and I/O

Some of the most useful tools for debugging file system and disk IO issues.

iotop

It provides live, real-time monitoring of disk I/O operations per process. It shows the number of bytes read from and written to the disk, the I/O capacity usage of the media, etc. Very useful to find out why your disk is slow or heavily used.

$ iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_gp]
    4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_par_gp]
    6 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H-kblockd]
    8 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [mm_percpu_wq]
    9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
   10 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_preempt]

lsof

This command lists all open files and the owning process. As everything is a file in Linux, this can display disk files, processes, pipes, mqueues, devices, sockets, …

The default command is very verbose. Look at the man page to get some insights on what the values mean. Some useful switches:

lsof -u geoffrey: list the files opened by the user geoffrey
lsof -i TCP:22: find running processes with a TCP connection on port 22
lsof -i 4: list IPv4 network files
lsof -i 6: list IPv6 network files
lsof -i TCP:1-100: list open connections on port range 1 to 100
lsof -p 1234: list open files belonging to process id 1234

$ lsof
COMMAND     PID   TID TASKCMD   USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
systemd     983                 john  cwd       DIR                8,2     4096          2 /
systemd     983                 john  rtd       DIR                8,2     4096          2 /
systemd     983                 john  txt       REG                8,2  1411208      14142 /usr/lib/systemd/systemd
systemd     983                 john  mem       REG                8,2   561040      12192 /usr/lib/libsystemd.so.0.23.0
systemd     983                 john  mem       REG                8,2   333728      41178 /usr/lib/libdbus-1.so.3.19.8
systemd     983                 john  mem       REG                8,2   133000      60752 /usr/lib/libnl-3.so.200.26.0

iostat

The iostat tool reports global CPU statistics and input/output statistics for the devices and partitions on the system.

$ iostat
Linux 4.18.6-arch1-1-ARCH (core-m)      09/14/2018      _x86_64_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          21.87    0.45    5.60    0.15    0.00   71.93

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.88        27.86        43.78    8422261   13232504
dm-0              0.72        23.94        40.29    7236249   12177052

df

This is the global disk usage analysis tool. It can be combined with -h for human-readable output. The result is immediate: the tool doesn’t scan the file system.

$ df -h
Filesystem             Size  Used Avail Use% Mounted on
dev                    3.9G     0  3.9G   0% /dev
run                    3.9G  1.3M  3.9G   1% /run
/dev/sda2               14G   11G  1.8G  86% /
tmpfs                  3.9G   32M  3.9G   1% /dev/shm
tmpfs                  3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                  3.9G  1.7M  3.9G   1% /tmp
/dev/sda3              2.0G  386M  1.5G  22% /opt
tmpfs                  788M   24K  788M   1% /run/user/120
/dev/mapper/_dev_sda4  100G   72G   23G  76% /data
tmpfs                  788M  2.6M  786M   1% /run/user/1000
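
A typical follow-up is to spot the file systems that are nearly full. The sketch below reads df output on standard input and prints the mount points whose usage is at or above a limit. The function name df_over_threshold and the default limit of 80 are my own assumptions, and the sketch assumes the percentage is in column 5 and that mount points contain no spaces.

```shell
# Hypothetical helper: list mount points whose Use% is >= the given limit.
# Usage: df -h | df_over_threshold [LIMIT]   (LIMIT defaults to 80)
df_over_threshold() {
  awk -v limit="${1:-80}" '
    NR > 1 {                      # skip the header line
      pct = $5
      sub(/%$/, "", pct)          # "86%" -> "86"
      if (pct + 0 >= limit + 0) print $6, $5
    }
  '
}

# Possible live usage:
#   df -h | df_over_threshold 90
```

With the output shown above and the default limit, only the / file system (86%) would be reported.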

du

This tool gives the size of individual files or directories. It scans the disk to compute the size, so it might be slow. My most common use is du -hs folder to get the total size of folder. Note that it also descends into file systems mounted below that folder; add -x to stay on a single file system.

$ du -hs source
396K    source

Network stats

The best tool to show which application is sending how much data is nethogs.

nethogs

The result is self-describing:

NetHogs version 0.8.1

  PID USER     PROGRAM                          DEV        SENT      RECEIVED
10364 john     /usr/share/spotify/spotify       tun0       3.707      40.496 KB/sec
 9162 john     /usr/lib/gvfs/gvfsd-http         tun0       0.454       2.342 KB/sec
 8608 john     firefox                          tun0       0.000       0.000 KB/sec
 9398 john     thunderbird                      tun0       0.000       0.000 KB/sec
 8220 john     dropbox                          tun0       0.000       0.000 KB/sec
10238 john     /usr/share/spotify/spotify       tun0       0.000       0.000 KB/sec
 8224 john     /usr/bin/owncloud                tun0       0.012       0.000 KB/sec
    ? root     unknown TCP                                 0.000       0.000 KB/sec

TOTAL                                                      4.173      42.838 KB/sec

iftop

A live, real-time network bandwidth visualization tool. It can be useful to check network speed and to see through which interface the traffic is going and to which endpoint.

An example use case to get the network traffic on interface eth0:

iftop -i eth0 -B

nload

nload is yet another traffic statistics tool that reports connection speed. It shows current, peak and average speed with a funky animation.

A typical usage example for getting the network traffic on interface eth0 and speed in MBytes/s:

nload -u M eth0

Network traffic analysis

tcpdump

Tcpdump could have its own dedicated TIL because it has so many options. But here are some useful simple commands:

tcpdump -i eth0: capture packets from the interface eth0
tcpdump -XX -i eth0: display packets and their data in HEX and ASCII format
tcpdump -w capture.pcap -i eth0: save packets to a capture.pcap file
tcpdump -r capture.pcap: read from a previously saved capture file
tcpdump -i eth0 src 172.26.10.10: capture packets coming from a specific source IP
tcpdump -i eth0 dst 172.26.10.10: capture packets going to a specific destination IP

netstat

Netstat can monitor open ports and incoming and outgoing traffic. Below are some useful commands:

netstat -plunt: list listening applications
netstat -a: list all TCP and UDP ports
netstat -s: statistics by protocol, can be combined with u (UDP) and t (TCP). Can show bad segments, retransmissions, failures, …
netstat -at: only TCP connections
netstat -au: only UDP connections
netstat -l: all listening connections, can be combined with u (UDP) and t (TCP)
netstat -lx: all listening UNIX sockets
netstat -tp: display PID and program name
netstat -r: display kernel routing table
netstat -ie: show kernel interface table
netstat -i: show network interface packet statistics
netstat -c 10: print continuously, every 10 seconds
netstat --statistics --raw: display a lot of network statistics like number of packets received, errors, …
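
When a machine has many connections, a per-state summary is easier to read than the raw list. The sketch below reads netstat -ant style output on standard input and counts TCP connections by state. The function name tcp_states is hypothetical, and it assumes the usual net-tools layout where the state is the sixth column of lines starting with tcp.

```shell
# Hypothetical helper: count TCP connections per state, most frequent first.
tcp_states() {
  # Only lines whose first field starts with "tcp" (tcp, tcp6) are counted,
  # which also skips the two header lines printed by netstat.
  awk '$1 ~ /^tcp/ { count[$6]++ } END { for (s in count) print count[s], s }' |
    sort -rn
}

# Possible live usage (needs netstat from net-tools):
#   netstat -ant | tcp_states
```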

iptraf

A more advanced alternative to iftop, which collects additional and more detailed information.

arpwatch

Monitors ARP address resolution happening on the network. Useful to debug ARP resolution problems and to detect ARP spoofing. It can even send e-mail alerts when addresses change.

Load average

Linux uses three “magic” numbers, shown by several of these tools, to describe the load average of the system. They can be quite confusing for beginners, but they are actually easy to understand. This is one example:

load average: 0.20, 1.05, 5.09

The numbers give the load average for the last 1 minute (0.20), the last 5 minutes (1.05) and the last 15 minutes (5.09).

Each number is the average count of processes that were running, waiting for a CPU, or in uninterruptible sleep (e.g. waiting for I/O). The numbers have to be compared to the number of CPUs on the system.

So if the example above comes from a system with 2 CPUs, the system is fully loaded when the value is equal to 2:
* over the last minute, the system was mostly idle
* over the last 5 minutes, it was about half used
* over the last 15 minutes, it was overloaded: there were many processes waiting for the CPU
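
The comparison with the CPU count can be sketched as a small helper that expresses each load average as a percentage of the available CPUs. The function name load_per_cpu is my own, and it simply divides each value by the CPU count passed as the fourth argument.

```shell
# Hypothetical helper: show each load average as a percentage of CPU capacity.
# Usage: load_per_cpu LOAD1 LOAD5 LOAD15 CPUS
load_per_cpu() {
  awk -v l1="$1" -v l5="$2" -v l15="$3" -v cpus="$4" 'BEGIN {
    if (cpus <= 0) exit 1          # refuse a missing or invalid CPU count
    printf "1min %.1f%%\n",  l1  / cpus * 100
    printf "5min %.1f%%\n",  l5  / cpus * 100
    printf "15min %.1f%%\n", l15 / cpus * 100
  }'
}

# The example from the text, on a system with 2 CPUs:
load_per_cpu 0.20 1.05 5.09 2

# Possible live usage on Linux (nproc comes from coreutils):
#   load_per_cpu $(cut -d' ' -f1-3 /proc/loadavg) "$(nproc)"
```

For the example values this prints 10.0%, 52.5% and 254.5%, which matches the reading above: mostly idle, about half used, then overloaded.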