How can I use the atop tool and the atopsar tool to get historical usage statistics for processes on my EC2 Linux instance?


I want to monitor historical resource usage (CPU, memory, and disk) on my Amazon Elastic Compute Cloud (Amazon EC2) instance.

Short description

The atop tool is a performance monitor that records historical resource usage for later analysis and also provides real-time reporting. You can retrieve CPU utilization, memory consumption, and disk I/O for each process and thread. The atop tool runs as a background service while it records statistics, which allows for long-term server analysis. By default, the data is stored for 28 days.

Note: The atop tool logs data only after it's installed. You can't retrieve historical performance data from before the date that you installed atop.

Resolution

Install atop

For installation instructions, see How do I configure the ATOP Monitoring and SAR monitoring tools for my EC2 instance running Amazon Linux, RHEL, CentOS, or Ubuntu?

Create atop historical report logs

The atop tool creates log files in /var/log/atop. These files are named in the following format: atop_ccyymmdd. For example, atop_20210902 is the recording for September 2, 2021.

To access the log file, run the following command:

atop -r /var/log/atop/atop_ccyymmdd

Replace atop_ccyymmdd with the date that you want to review.
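
The naming convention above can be wrapped in a small helper so that you don't have to type the date by hand. The following is a minimal sketch, assuming GNU date and the default /var/log/atop directory. The function name atop_log_path is hypothetical and isn't part of atop:

```shell
# Hypothetical helper (not part of atop): print the log path for a given day.
# Assumes GNU date and the default /var/log/atop directory.
atop_log_path() {
    # $1 is any date string that GNU date understands,
    # such as 2021-09-02 or "yesterday". Defaults to today.
    local day
    day=$(date -d "${1:-today}" +%Y%m%d) || return 1
    printf '/var/log/atop/atop_%s\n' "$day"
}

# Example: open yesterday's recording.
# atop -r "$(atop_log_path yesterday)"
```

The helper only builds the path. It doesn't check that a recording exists for that day.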

See the following example of the command and log file:

atop -r /var/log/atop/atop_20210902 
ATOP - ip-172-20-139-91                2021/09/02  17:03:44                ----------------                 3h33m7s elapsed
PRC |  sys    6.51s  |  user   7.85s  |  #proc    103  |  #tslpi    81 |  #tslpu     0  |  #zombie    0  |  #exit      0  |
CPU |  sys     0%  |  user      3%  |  irq       0%  |  idle    197% |  wait      0%  |  ipc notavail  |  curscal   ?%  |
cpu |  sys     0%  |  user      1%  |  irq       0%  |  idle     98% |  cpu000 w  0%  |  ipc notavail  |  curscal   ?%  |
cpu |  sys     0%  |  user      1%  |  irq       0%  |  idle     98% |  cpu001 w  0%  |  ipc notavail  |  curscal   ?%  |

In the preceding output example, the first recorded snapshot was at 2021/09/02 17:03:44. To move forward to the next snapshot, press the t key (lowercase). To return to the previous snapshot, press the T key (uppercase). To jump to a specific time, press the b key, and then enter the date and time at the Enter new time prompt, as shown in the following example:

NET |  lo      ----  |  pcki       2  |  pcko       2  |  sp    0 Mbps |  si    0 Kbps  |  so    0 Kbps  |  erro       0  |
Enter new time (format [YYYYMMDD]hhmm):
  PID              TID              RDDSK              WRDSK             WCANCL              DSK             CMD        1/4

To view different statistics, press the designated shortcut key. The following are example shortcut keys:

  • g: Generic info (default).
  • m: Memory details.
  • d: Disk details.
  • n: Network details. This key works only when the netatop kernel module is installed.
  • c: Full command line per process.

To sort the process list, use the following shortcut keys:

  • C: CPU activity.
  • M: Memory consumption.
  • D: Disk activity.
  • N: Network activity. This key works only when the netatop kernel module is installed.
  • A: The most active system resource (auto mode).

Press the h key to view the help documentation.

Create atop report logs for a certain time period

To access the log file and view only a certain time period of performance data, run the following command:

atop -r /var/log/atop/atop_ccyymmdd -b starttime -e endtime -M

Replace atop_ccyymmdd with the date that you want to review. Replace starttime with the start time and endtime with the end time of the performance period.

For example, the following command shows the performance data recorded on April 22, 2024 between 08:00 and 08:10, with processes sorted by memory consumption:

$ atop -r /var/log/atop/atop_20240422 -b 0800 -e 0810 -M

The example uses the following flags:

  • -r: The log file to read.
  • -b: The start time.
  • -e: The end time.
  • -M: Sort processes by memory consumption.
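
To reduce typing mistakes in the date and time arguments, the command can be assembled by a small function. The following is a sketch under stated assumptions: the function name atop_window_cmd is hypothetical, the check on the time format is deliberately loose, and the function only prints the command rather than running it:

```shell
# Hypothetical helper (not part of atop): print the atop command for a time window.
atop_window_cmd() {
    # $1 = date as ccyymmdd, $2 = start time as hhmm, $3 = end time as hhmm
    local t
    for t in "$2" "$3"; do
        case "$t" in
            [0-2][0-9][0-5][0-9]) ;;   # loose check: four digits that look like hhmm
            *) echo "time must be hhmm: $t" >&2; return 1 ;;
        esac
    done
    printf 'atop -r /var/log/atop/atop_%s -b %s -e %s -M\n' "$1" "$2" "$3"
}

# Print the command for review before you run it.
atop_window_cmd 20240422 0800 0810
```

Printing the command first lets you confirm the file name and time window before you open the recording.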

Generate system activity reports with the atopsar command

Use the atopsar command to generate system activity reports.

The -c flag generates a report about the system's current CPU utilization. The following example shows two samples of this report that are taken one second apart:

$ atopsar -c 1 2
ip-172-20-139-91  4.14.238-182.422.amzn2.x86_64  #1 SMP Tue Jul 20 20:35:54 UTC 2021  x86_64  2021/09/02

-------------------------- analysis date: 2021/09/02 --------------------------

18:50:16  cpu  %usr %nice %sys %irq %softirq  %steal %guest  %wait %idle  _cpu_
18:50:17  all     0     0    0    0        0       0      0      0   200
            0     0     0    0    0        0       0      0      0   100
            1     0     0    0    0        0       0      0      0   100
18:50:18  all     0     0    0    0        0       0      0      0   200
            0     0     0    0    0        0       0      0      0   100
            1     0     0    0    0        0       0      0      0   100

The atopsar command can also analyze data within a specified time frame. For example, to generate all reports (-A) for the current day that start at 13:00 (-b) and end at 13:35 (-e), run the following command:

atopsar -A -b 13:00 -e 13:35
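
The preceding command reads the current day's recording. To look at the same time window on earlier days, you can pass a specific log file to atopsar with the -r flag. The following sketch isn't from the original article: it assumes GNU date and the default /var/log/atop directory, the function name atopsar_cpu_cmds is hypothetical, and it only prints the commands so that you can review them first:

```shell
# Hypothetical helper (not part of atop): print one atopsar command per day
# for the last N days. Assumes GNU date and the default log directory.
atopsar_cpu_cmds() {
    # $1 = number of days, counting back from today (default 3)
    # $2 = start time as hh:mm, $3 = end time as hh:mm
    local days="${1:-3}" start="${2:-13:00}" end="${3:-13:35}" i day
    i=0
    while [ "$i" -lt "$days" ]; do
        day=$(date -d "$i days ago" +%Y%m%d)
        printf 'atopsar -c -r /var/log/atop/atop_%s -b %s -e %s\n' "$day" "$start" "$end"
        i=$((i + 1))
    done
}

# Print the CPU report commands for today and the two days before it.
atopsar_cpu_cmds 3
```

Recordings older than the retention period aren't available, so a printed command might point to a file that no longer exists.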

To retrieve multiple reports, combine the atopsar flags into a single command. The following example command reports CPU utilization (-c), processor load (-p), and processes and threads (-P):

$ atopsar -cpP

Example output:

ip-172-31-89-231 6.1.84-99.169.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Apr 8 19:19:48 UTC 2024 x86_64 2024/04/22

-------------------------- analysis date: 2024/04/22 --------------------------

07:59:27  cpu  %usr %nice %sys %irq %softirq  %steal %guest  %wait %idle  _cpu_
08:00:27  all     0     0    0    0        0       0      0      4    95
08:01:27  all     0     0    0    0        0       0      0      0   100
08:02:27  all     0     0    0    0        0       0      0      0   100
08:03:27  all     0     0    0    0        0       0      0      0   100

-------------------------- analysis date: 2024/04/22 --------------------------

07:59:27  pswch/s devintr/s  clones/s  loadavg1 loadavg5 loadavg15  _load_
08:00:27      203        70      1.07      0.13     0.29      0.14
08:01:27       53        31      0.07      0.05     0.23      0.13
08:02:27       59        31      0.87      0.02     0.19      0.12
08:03:27       68        35      0.22      0.00     0.15      0.10

-------------------------- analysis date: 2024/04/22 --------------------------

07:59:27  clones/s pexit/s  curproc curzomb  thrrun thrslpi thrslpu  _procthr_
08:00:27      1.07    1.07      114       0       1      83      58
08:01:27      0.07    0.07      114       0       1      83      58
08:02:27      0.87    0.88      109       0       1      83      53
08:03:27      0.22    0.28      105       0       1      76      52
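
Long CPU reports are easier to scan when you filter them. The following sketch uses awk to list the intervals where the idle value in the "all" row falls below a threshold. It assumes the column layout of the atopsar -c output shown in this article, with %idle as the last value in each data row. The function name low_idle is hypothetical:

```shell
# Hypothetical helper (not part of atop): list intervals where total idle CPU
# is below a threshold. Reads `atopsar -c` output on standard input.
low_idle() {
    # $1 = idle threshold (default 90)
    awk -v limit="${1:-90}" '
        # Keep only timestamped rows for all CPUs; header and per-CPU rows are skipped.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 == "all" && $NF + 0 < limit + 0 {
            print $1, "idle=" $NF "%"
        }'
}

# Sample rows copied from the CPU report in the example output.
low_idle 100 <<'EOF'
08:00:27 all 0 0 0 0 0 0 0 4 95
08:01:27 all 0 0 0 0 0 0 0 0 100
EOF
```

With live data, the equivalent call is atopsar -c | low_idle 90. On instances with more than one vCPU, the "all" row adds up the values for every CPU (for example, 200 for two idle vCPUs), so scale the threshold to match.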

For a detailed list of the flags and output values that atopsar retrieves and displays, see atopsar on the Linux website.

Related information

Why is my EC2 Linux instance becoming unresponsive due to over-utilization of resources?

A guide to atop command in Linux on the DigitalOcean website

AWS OFFICIAL
Updated a month ago