Performance monitoring
This TIL will be a bit longer because it aims to list some of the most useful and most important performance monitoring tools available on Linux. Some are usually installed by default and some must be installed manually.
The tools are split into several categories.
The base of this page has been constructed by selecting information from: https://www.tecmint.com/command-line-tools-to-monitor-linux-performance/. It has then been enhanced based on my personal preference and uses.
General
This section shows tools that display general information about the system state and usage. When troubleshooting a Linux machine, these are the tools to reach for first.
top
The first tool to run: it lets you inspect the running processes on a system interactively.
top - 06:49:14 up 1 day, 23:04, 1 user, load average: 0.52, 0.86, 1.12
Tasks: 220 total, 1 running, 219 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.9 us, 0.7 sy, 0.0 ni, 95.9 id, 0.0 wa, 0.3 hi, 0.1 si, 0.0 st
MiB Mem : 7879.9 total, 814.6 free, 2198.2 used, 4867.1 buff/cache
MiB Swap: 2294.0 total, 2271.5 free, 22.5 used. 5041.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1034 john 20 0 4007988 373052 120692 S 8.6 4.6 6:07.41 gnome-shell
10943 john 20 0 9240860 653416 143408 S 3.3 8.1 3:39.64 firefox
1352 john 20 0 503356 43776 32716 S 1.0 0.5 0:06.27 gnome-terminal-
1 root 20 0 220192 8968 6852 S 0.3 0.1 0:03.90 systemd
563 root 20 0 57172 6336 5500 S 0.3 0.1 0:00.64 systemd-logind
vmstat
This lesser-known tool is also very useful to get an overview of many components of the system, but it may need to be installed manually on some systems. The tool provides several options, but the simplest use case is:
$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 23004 790600 341872 4640592 0 2 723 581 514 1102 22 5 73 0 0
The default command displays average values since system boot. To understand all the values, have a look at the tool's man page. Below are some more examples of vmstat commands:
vmstat 3 10: run vmstat every 3 seconds, 10 times
vmstat -a: display active and inactive memory
vmstat -s: display a variety of event counters and memory statistics
vmstat -d: display disk statistics
vmstat -S M: use megabytes instead of kilobytes as the unit
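When I only care about one value, the column layout shown above can be picked apart with awk. This is a minimal sketch, not something vmstat provides: vmstat_field is a hypothetical helper name, and it assumes the second line of the output holds the column names, as in the sample above.

```shell
# Hypothetical helper (not part of vmstat): print one named column,
# e.g. "free" or "wa", from the last data line of vmstat output on stdin.
vmstat_field() {
    awk -v name="$1" '
        # Line 2 holds the column names: remember where the wanted one is.
        NR == 2 { for (i = 1; i <= NF; i++) if ($i == name) idx = i }
        # Every later line is data: keep the most recent value.
        NR > 2 && idx { value = $idx }
        # Fail if the column name was not found in the header.
        END { if (idx) print value; else exit 1 }
    '
}

# Example: sample twice, one second apart, and print the current CPU idle %.
# vmstat 1 2 | vmstat_field id
```

Sampling twice is deliberate: the first line reports averages since boot, so the second line is the one that reflects the current state. The sketch is meant for short runs, since long runs repeat the header lines.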
nmon
A global performance and system utilization monitoring tool. It can be used interactively or store data in CSV files. It can collect a lot of information and presents it nicely: CPU, memory, disk usage, network, processes, NFS, kernel, etc. The panels must be toggled inside the application by pressing the corresponding letter. You can toggle the help with h.
If you always use the same panels, you can load them automatically by setting the NMON variable:
NMON=cmdnk nmon
Processes
htop
It is an advanced, interactive version of the top command. It is actually my recommended replacement for top.
progress
A monitoring tool that shows the progress of basic coreutils Linux commands such as cp, mv, dd, tar, gzip, …
The official documentation is located here: https://github.com/Xfennec/progress
A few examples from the official documentation:
Monitor all current and upcoming instances of coreutils commands in a simple window:
watch progress -q
See how your download is progressing:
watch progress -wc firefox
Look at your Web server activity:
progress -c httpd
File system and I/O
Some of the most useful tools for debugging file system and disk IO issues.
iotop
It provides live, real-time monitoring of disk I/O operations per process. It shows the number of bytes read from and written to the disk, how much of the medium's I/O capacity is being used, etc. Very useful to find out why your disk is slow or heavily used.
$ iotop
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
3 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_gp]
4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_par_gp]
6 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0H-kblockd]
8 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [mm_percpu_wq]
9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
10 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_preempt]
lsof
This command lists all open files and the owning process. As everything is a file in Linux, this can display disk files, processes, pipes, mqueues, devices, sockets, …
The default command is very verbose. Look at the man page to get some insight into what the values mean. Some useful switches:
lsof -u geoffrey: list the files opened by the user geoffrey
lsof -i TCP:22: find running processes with a connection on port 22
lsof -i 4: list IPv4 network files
lsof -i 6: list IPv6 network files
lsof -i TCP:1-100: list open connections on the port range 1 to 100
lsof -p 1234: list open files belonging to process ID 1234
$ lsof
COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME
systemd 983 john cwd DIR 8,2 4096 2 /
systemd 983 john rtd DIR 8,2 4096 2 /
systemd 983 john txt REG 8,2 1411208 14142 /usr/lib/systemd/systemd
systemd 983 john mem REG 8,2 561040 12192 /usr/lib/libsystemd.so.0.23.0
systemd 983 john mem REG 8,2 333728 41178 /usr/lib/libdbus-1.so.3.19.8
systemd 983 john mem REG 8,2 133000 60752 /usr/lib/libnl-3.so.200.26.0
iostat
The iostat tool reports global CPU statistics and input/output statistics for each device and partition on the system.
$ iostat
Linux 4.18.6-arch1-1-ARCH (core-m) 09/14/2018 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
21.87 0.45 5.60 0.15 0.00 71.93
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.88 27.86 43.78 8422261 13232504
dm-0 0.72 23.94 40.29 7236249 12177052
df
This is the global disk usage analysis tool. It can be combined with -h for human-readable output. The result is immediate because the tool doesn't scan the file system.
$ df -h
Filesystem Size Used Avail Use% Mounted on
dev 3.9G 0 3.9G 0% /dev
run 3.9G 1.3M 3.9G 1% /run
/dev/sda2 14G 11G 1.8G 86% /
tmpfs 3.9G 32M 3.9G 1% /dev/shm
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
tmpfs 3.9G 1.7M 3.9G 1% /tmp
/dev/sda3 2.0G 386M 1.5G 22% /opt
tmpfs 788M 24K 788M 1% /run/user/120
/dev/mapper/_dev_sda4 100G 72G 23G 76% /data
tmpfs 788M 2.6M 786M 1% /run/user/1000
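A listing like the one above gets long quickly, so here is a small sketch that keeps only the file systems above a usage threshold. full_filesystems is a hypothetical helper of mine, not a df option. It assumes the six-column layout shown above, with the usage percentage in column 5 and the mount point in column 6, so mount points containing spaces are not handled.

```shell
# Hypothetical helper: print "mountpoint usage" for every file system whose
# usage is at or above a threshold in percent (default: 80).
# Reads df output on stdin; the first line is assumed to be the header.
full_filesystems() {
    awk -v limit="${1:-80}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # "86%" -> "86"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example: df -P forces one line per file system, which keeps the columns stable.
# df -P | full_filesystems 85
```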
du
This tool gives the size of individual files or directories. It scans the disk to compute the size, so it might be slow. My favorite use case is du -hs folder to get the size of folder. Note that it also descends into file systems mounted under that folder; add -x to stay on a single file system.
$ du -hs source
396K source
Network stats
The best tool to show how much data each application is sending and receiving is nethogs:
nethogs
The result is self-explanatory:
NetHogs version 0.8.1
PID USER PROGRAM DEV SENT RECEIVED
10364 john /usr/share/spotify/spotify tun0 3.707 40.496 KB/sec
9162 john /usr/lib/gvfs/gvfsd-http tun0 0.454 2.342 KB/sec
8608 john firefox tun0 0.000 0.000 KB/sec
9398 john thunderbird tun0 0.000 0.000 KB/sec
8220 john dropbox tun0 0.000 0.000 KB/sec
10238 john /usr/share/spotify/spotify tun0 0.000 0.000 KB/sec
8224 john /usr/bin/owncloud tun0 0.012 0.000 KB/sec
? root unknown TCP 0.000 0.000 KB/sec
TOTAL 4.173 42.838 KB/sec
iftop
A live, real-time network bandwidth visualization tool. It can be useful to check network speed and to see through which interface the traffic is going and to which endpoint.
An example use case to get the network traffic on interface eth0:
iftop -i eth0 -B
nload
nload is yet another traffic statistics tool that reports connection speed. It shows the current, peak and average speed with a funky animation.
A typical usage example for getting the network traffic on interface eth0 and speed in MBytes/s:
nload -u M eth0
Network traffic analysis
tcpdump
tcpdump could have its own dedicated TIL because it has so many options, but here are some useful simple commands:
tcpdump -i eth0: capture packets from the interface eth0
tcpdump -XX -i eth0: display packets and their data in hex and ASCII format
tcpdump -w capture.pcap -i eth0: save packets to a capture.pcap file
tcpdump -r capture.pcap: read from a previously saved capture file
tcpdump -i eth0 src 172.26.10.10: capture packets coming from a specific IP
tcpdump -i eth0 dst 172.26.10.10: capture packets going to a specific destination IP
netstat
netstat can monitor open ports and incoming and outgoing traffic. Below are some useful commands:
netstat -plunt: list listening applications
netstat -a: list all TCP and UDP ports
netstat -s: statistics by protocol, can be combined with u (UDP) and t (TCP). Can show bad segments, retransmissions, failures, …
netstat -at: only TCP connections
netstat -au: only UDP connections
netstat -l: all listening connections, can be combined with u (UDP) and t (TCP)
netstat -lx: all listening UNIX sockets
netstat -tp: display PID and program name
netstat -r: display the kernel routing table
netstat -ie: show the kernel interface table
netstat -i: show network interface packet statistics
netstat -c 10: print continuously, every 10 seconds
netstat --statistics --raw: display a lot of network statistics like the number of packets received, errors, …
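The raw netstat listing is often too long to read, so I sometimes summarize it. The sketch below counts TCP sockets per state. tcp_states is a hypothetical helper name, and it assumes the usual Linux netstat layout where the protocol is in column 1 and the state in column 6.

```shell
# Hypothetical helper: count TCP sockets per state (LISTEN, ESTABLISHED, ...)
# from netstat output read on stdin, most frequent state first.
tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ }      # matches both tcp and tcp6 lines
         END { for (state in count) print count[state], state }' |
        sort -k1,1nr -k2
}

# Example: -n skips name resolution, which keeps the command fast.
# netstat -ant | tcp_states
```

A large number of sockets in a single state, for example TIME_WAIT, is usually the first hint worth following up with the more detailed commands above.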
iptraf
A more advanced alternative to iftop that collects additional, more detailed information.
arpwatch
Monitors ARP address resolution happening on the network. Useful to debug ARP resolution on the network and ARP spoofing. It can even send e-mail alerts when addresses change.
Load average
Linux uses three “magic” numbers, shown by several of these tools, to describe the load average of the system. They can be quite confusing for beginners, but they are actually easy to understand. Here is an example:
load average: 0.20, 1.05, 5.09
The numbers give the load average over the last minute (0.20), the last 5 minutes (1.05) and the last 15 minutes (5.09).
Each number is the average number of processes that are running, waiting for the CPU, or in uninterruptible sleep (e.g. waiting for I/O). It has to be compared to the number of CPUs on the system.
So if the example above comes from a system with 2 CPUs, the system is fully loaded when the value equals 2:
* over the last minute, the system was mostly idle
* over the last 5 minutes, it was half used
* over the last 15 minutes, it was overloaded: there were many processes waiting for the CPU
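The comparison with the CPU count can be sketched in a few lines of shell. load_state is a hypothetical helper, and the 0.5 cut-off between "mostly idle" and "busy" is my own arbitrary choice, not a standard threshold.

```shell
# Hypothetical helper: classify a load average against the number of CPUs.
# Usage: load_state LOAD NCPU
load_state() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        ratio = load / ncpu               # 1.0 means every CPU is fully used
        if (ratio > 1)         print "overloaded"
        else if (ratio >= 0.5) print "busy"         # 0.5 is an assumed cut-off
        else                   print "mostly idle"
    }'
}

# Example on a live system: the first field of /proc/loadavg is the
# 1-minute load average, and nproc prints the number of available CPUs.
# load_state "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

With the three example values on 2 CPUs, this gives "mostly idle" for 0.20, "busy" for 1.05 and "overloaded" for 5.09.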