Friday, September 3, 2021

vCenter - File system `/storage/log` is low on storage space

After a recent VCSA reboot, I was seeing the infamous `no healthy upstream` error from vCenter.

The first place to check for issues like this is VMware's Virtual Appliance Management Interface (VAMI), reachable by default over HTTPS on port 5480. An administrator can log in to this interface as root with the appliance root password.

When reviewing this issue in the VAMI, I saw the following error: File system `/storage/log` is low on storage space.

Now, VCSA by design automatically rotates most logs on the appliance using the open-source tool logrotate, but nothing in this directory appears to be managed. Searching the logrotate specs for the path returns no matches:

root@vcenter [ / ]# grep \/storage\/log /etc/logrotate.d/*

I'd say this particular log partition is going to need some manual cleanup every now and then. To open up the CLI, SSH into vCenter and execute the following command:
Command> shell
Shell access is granted to root

First, let's get an idea of how full the disks are:
Note: The -m switch reports sizes in 1 MiB blocks (the 1M-blocks column)
root@vcenter [ ~ ]# df -m
Filesystem 1M-blocks Used Available Use% Mounted on
devtmpfs 5982 0 5982 0% /dev
tmpfs 5993 1 5992 1% /dev/shm
tmpfs 5993 2 5992 1% /run
tmpfs 5993 0 5993 0% /sys/fs/cgroup
/dev/sda3 46988 7199 37374 17% /
tmpfs 5993 5 5988 1% /tmp
/dev/mapper/dblog_vg-dblog 15047 185 14080 2% /storage/dblog
/dev/mapper/vtsdb_vg-vtsdb 10008 68 9412 1% /storage/vtsdb
/dev/mapper/vtsdblog_vg-vtsdblog 4968 36 4661 1% /storage/vtsdblog
/dev/sda2 120 30 82 27% /boot
/dev/mapper/log_vg-log 10008 9475 6 100% /storage/log
/dev/mapper/core_vg-core 25063 45 23723 1% /storage/core
/dev/mapper/db_vg-db 10008 507 8974 6% /storage/db
/dev/mapper/updatemgr_vg-updatemgr 100273 1953 93185 3% /storage/updatemgr
/dev/mapper/netdump_vg-netdump 985 3 915 1% /storage/netdump
/dev/mapper/lifecycle_vg-lifecycle 100273 3364 91775 4% /storage/lifecycle
/dev/mapper/autodeploy_vg-autodeploy 10008 37 9444 1% /storage/autodeploy
/dev/mapper/imagebuilder_vg-imagebuilder 10008 37 9444 1% /storage/imagebuilder
/dev/mapper/seat_vg-seat 10008 1185 8295 13% /storage/seat
/dev/mapper/archive_vg-archive 50133 16373 31185 35% /storage/archive
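
Scanning a table this long by eye is easy to get wrong. As a minimal sketch (the function name `full_filesystems` and the 90% threshold are my own assumptions, not anything VCSA ships), the same data can be filtered down to just the filesystems in trouble:

```shell
# Print "mountpoint use%" for every filesystem at or above a usage threshold.
# Reads df-style output on stdin so it also works on saved output.
# Assumes mount points contain no spaces.
full_filesystems() {
    threshold="${1:-90}"   # hypothetical default of 90%
    awk -v t="$threshold" '
        NR > 1 {                     # skip the header row
            use = $5
            sub(/%/, "", use)        # "100%" -> "100"
            if (use + 0 >= t) print $6, use "%"
        }'
}

# Usage on a live system (-P keeps each filesystem on a single line):
df -P | full_filesystems 90
```

Run against the table above, only `/storage/log` crosses the 90% line.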

The log partition is definitely full. To take an inventory of disk usage, we'll use the du utility with the s (summarize) and m (megabytes) switches, then pipe the output to sort with the n (numerical) and r (reverse) switches so the largest directories are listed first.
root@vcenter [ / ]# du -sm /storage/log/vmware/* | sort -n -r
2578 /storage/log/vmware/eam
2286 /storage/log/vmware/lookupsvc
785 /storage/log/vmware/sso
781 /storage/log/vmware/vsphere-ui
530 /storage/log/vmware/vmware-updatemgr

Examining these folders further, quite a few of the files in them are old and were never rotated out. VMware provides guidance on what is and isn't safe to remove. Generally, Linux does not cope well with log files being deleted out from under a running process (the space is not released until the process closes the file), so only logs that have obviously been rotated already can be safely removed. If this is a production system, I'd recommend calling VMware GSS instead of taking it upon yourself. The command above (du -sm * | sort -nr) can be run from any working directory to see what is consuming the most space. Here are a few examples of what I deleted to make room:
rm -rf /storage/log/vmware/eam/web/localhost-2020-*
rm -rf /storage/log/vmware/eam/web/localhost_access.2020*
rm -rf /storage/log/vmware/eam/web/catalina-2020*
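
Before pointing rm at a glob, it can help to preview exactly which files match and confirm they really are old. A minimal sketch, assuming GNU find is available on the appliance; `list_stale_logs` and the 30-day cutoff are hypothetical choices for illustration:

```shell
# List regular files directly inside a directory that match a name pattern
# and were last modified more than N days ago. Prints only; deletes nothing.
list_stale_logs() {
    dir="$1"
    pattern="$2"
    days="$3"
    find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime +"$days" -print
}

# Preview one of the globs used above (30 days is an assumed cutoff).
log_dir=/storage/log/vmware/eam/web
if [ -d "$log_dir" ]; then
    list_stale_logs "$log_dir" 'catalina-2020*' 30
fi
```

Anything the service is still writing to will have a recent modification time and drop out of the list, which is a cheap guard against removing a live log.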

From here, I like to verify that space is cleared:
root@vcenter [ / ]# df -m | grep \/storage\/log
/dev/mapper/log_vg-log 10008 5793 3688 62%

Catalina is the servlet container at the core of Apache Tomcat, so for our purposes the two names refer to the same thing. This software package receives inbound HTTP requests and hands them to specific applications, allowing many developers to build code without having to construct a soup-to-nuts HTTP server. Other similar (but more recent) projects include Python's Flask.

With HTTP proxies and servers, it is useful to keep comprehensive records of "who did what", both for security reasons ("whodunit") and for debugging. As a result, Tomcat is a serious log-hog wherever it exists, and it almost never reads its old logs back. This is why I evaluated the change as relatively safe.

If this were not an appliance, I would have added a logrotate spec to automatically delete old files from this directory, but altering VCSA in this way is not recommended.
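
For a regular (non-appliance) Tomcat host, here is a rough idea of what age-based cleanup could look like. This is a plain find-based stand-in rather than an actual logrotate spec, and everything named here (`prune_logs`, the `APPLY` switch, the example path and the 90-day retention) is hypothetical:

```shell
# List files older than a retention window; delete them only when APPLY=1.
# Dry-run is the default so a typo in the pattern cannot remove anything.
prune_logs() {
    dir="$1"
    pattern="$2"
    keep_days="$3"
    if [ "${APPLY:-0}" = "1" ]; then
        find "$dir" -type f -name "$pattern" -mtime +"$keep_days" -exec rm -f {} +
    else
        find "$dir" -type f -name "$pattern" -mtime +"$keep_days" -print
    fi
}

# Example (hypothetical path on a non-appliance host):
#   prune_logs /opt/tomcat/logs 'localhost_access*' 90           # dry run
#   APPLY=1 prune_logs /opt/tomcat/logs 'localhost_access*' 90   # delete
```

Scheduled from cron, something like this would keep a Tomcat log directory bounded, though a proper logrotate spec remains the more conventional tool for the job.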
