Sunday, December 20, 2020

vCenter Upgrade Error: `Exception Occurred in precheck phase`

Error presented by VAMI Interface





Troubleshooting

VCSA 7.0 has moved the upgrade process logging to a new location - the log itself is now at /storage/log/vmware/applmgmt/update_microservice.log (actual) or /var/log/vmware/applmgmt/update_microservice.log (symlink)

update_microservice

This appears to be a rough order of operations with this new update process:

  • Pre-Checks: First, the upgrade tries to identify the system being upgraded:
update_microservice::          precheckEventHandler: 148 -     INFO - Precheck event happens
update_b2b::                      precheck: 709 -    DEBUG - Running update prechecks
update_b2b::               b2bRequirements: 479 -    DEBUG - Running B2B Requirements hook and processing the results
update_b2b::                _runScriptHook: 330 -    DEBUG - Running B2B script with hook CollectRequirementsHook
update_b2b::                _runScriptHook: 339 -    DEBUG - update script output to file /var/log/vmware/applmgmt/upgrade_hook_CollectRequirementsHook
extensions::                _findExtension:  83 -    DEBUG - Found script hook <module 'update_script' from '/storage/core/software-update/updates/7.0.1.00200/scripts/update_script.py'>:CollectRequirementsHook'
update_utils::                     isGateway:  83 -    DEBUG - Not running on a VMC Gateway appliance.
update_utils::                  isB2BUpgrade:  72 -    DEBUG - Bundle will execute upgrade: False
update_script::           collectRequirements: 492 -    DEBUG - Checking verisons
update_script::           collectRequirements: 496 -    DEBUG - Source VCSA version = 7.0.1.00100
update_script::           collectRequirements: 500 -     INFO - Target VCSA version = 7.0.1.00200
update_utils::               getRPMBlacklist: 185 -    DEBUG - vCSA deployment Type: embedded
update_b2b::               b2bRequirements: 493 -    DEBUG - Getting packages excluding the ones in blacklist

From there, it picks up the scope for the upgrade, and verifies against common upgrade issues:

update_b2b::               b2bRequirements: 528 -    DEBUG - Calculated packages list 
update_b2b::                     checkDisk: 423 -    DEBUG - Checking for disk utilization
update_b2b::                     checkDisk: 467 -    DEBUG - CheckDisk completed, returning with selected disk partition /storage/updatemgr
update_b2b::                      precheck: 740 -    DEBUG - Estimating time to install..
update_b2b::                 estimate_time: 679 -    DEBUG - Estimating time required for rpm-update, services start-stop and reboot time if its required
update_b2b::                 estimate_time: 682 -    DEBUG - Calculating RPM installation time
update_b2b::              rpm_install_time: 587 -    DEBUG - Reading all rpms present in rpm-manifest.json
update_b2b::              rpm_install_time: 588 -    DEBUG - Estimating installation time for installed rpms and new rpms
update_b2b::       get_installed_rpms_list: 564 -    DEBUG - Getting the list of installed RPMs along with the time of install
update_b2b::       get_installed_rpms_list: 578 -    DEBUG - Completed getting the list of rpms, returning with the list: <class 'list'>
update_b2b::              rpm_install_time: 610 -    DEBUG - Installation time estimated successfully, returning with time for installation 23
update_b2b::                 estimate_time: 684 -    DEBUG - Calculating time to start and stop services
update_b2b::        estimate_time_services: 620 -    DEBUG - Estimating time for services-start and services-stop
update_b2b::        estimate_time_services: 640 -    DEBUG - Completed estimating time for starting and stopping services, returning with the required time: 2
task_manager::                        update:  80 -    DEBUG - UpdateTask: status=SUCCEEDED, progress=100, message={'id': 'com.vmware.appliance.update.prechecks_task_ok', 'default_message': 'Prechecks completed', 'args': []}

In this case, everything looks good. I'm not really sure why it needs the SSO Administrator password, and there isn't much on-line about this. We're seeing three errors after we hit go time:

update_b2b::                   resumeStage:3431 -    DEBUG - 'download' phase is 100% completed. checkAllRpmsArePresent
rpmfunctions::        checkAllRpmsArePresent: 308 -    ERROR - Empty Stage location passed. This cannot be empty.
update_b2b::                   resumeStage:3497 -    ERROR - Exception in resume stage. Exception : {Package discrepency error, Cannot resume!}
task_manager::                        update:  80 -    DEBUG - UpdateTask: status=FAILED, progress=0, message={'id': 'com.vmware.appliance.plain_message', 'default_message': '%s', 'args': ['Package discrepency error, Cannot resume!']}
dbfunctions::                       execute:  81 -    DEBUG - Executing {SELECT CASE WHEN count(*) == 0 THEN 0 ELSE 1 END as status FROM progress WHERE _stagekey = 'patch-state' AND _message = 'Stage successful'}
functions::              get_resume_state: 340 -    DEBUG - Resume needed in Stage phase
update_b2b::           install_with_resume:2477 -    DEBUG - Installing version 7.0.1.00200
update_functions::                  readJsonFile: 224 -    ERROR - Can't read JSON file /storage/core/software-update/stage/stageDir.json [Errno 2] No such file or directory: '/storage/core/software-update/stage/stageDir.json'
task_manager::                        update:  80 -    DEBUG - UpdateTask: status=FAILED, progress=0, message={'id': 'com.vmware.appliance.not_staged', 'default_message': 'The update is not staged', 'args': []}
update_b2b::              installPrechecks:2146 -    DEBUG - Exception occurred while checking for discrepancies Update not staged
task_manager::                        update:  80 -    DEBUG - UpdateTask: status=RESUMABLE, progress=0, message={'id': 'com.vmware.appliance.plain_message', 'default_message': '%s', 'args': ['Exception occurred in install precheck phase']}

This is pretty odd, because it's indicating a "resumable error" despite the fact that it cannot resume until a file lock is removed. Here are the errors I see:

  • Empty Stage Location: Unsure what this means, given the context. Odds are the upgrade script cannot find out where to stage RPMs (Red Hat Package Manager).
  • Package discrepancy error: It could be relating to the above, or it could be a failed checksum. No other logging is generated by the agent to indicate what's wrong.
  • Can't read JSON file /storage/core/software-update/stage/stageDir.json: This one's more actionable! It looks like there's no directory by this name.

Easter Egg: statsmoitor probably should be statsmonitor

Remediation

Allow the update to resume

VAMI saves the installation state as a file in /etc/applmgmt/appliance/software_update_state.conf:

{
    "state": "INSTALL_FAILED",
    "version": "7.0.1.00200",
    "latest_query_time": "2020-12-21T00:19:32Z",
    "operation_id": "/storage/core/software-update/install_operation"
}

VAMI will be stuck in a loop until you remove this file as root:

rm -rf /etc/applmgmt/appliance/software_update_state.conf

This will not necessarily resolve the issue that caused the failure, however, more work still needs to be done.

Install via ISO

We're going to try a fallback method, attaching the upgrade ISO. The following snippet is from the vSphere UI, modifying vCenter's VM Hardware:



From there, simply click "Check CD-ROM" and it will immediately appear.

This time, we know what directories to search, so I'm going to watch the logs:

tail -f  /var/log/vmware/applmgmt/update_microservice.log | grep -i err

Attempt via Command-line with ISO

VMware documents the following method to update via the command line https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vcenter.upgrade.doc/GUID-8466F019-C57C-4344-9E15-8CFF74A6E4C2.html

Stage Packages

We're going to try and clear the (empty) workspace and try fresh, auto-accepting EULAs:

Command> software-packages unstage
Command> software-packages stage --iso --acceptEulas
 [2020-12-20T17:49:54.355] : ISO mounted successfully
 [2020-12-20T17:49:54.355] : UpdateInfo: Using product version 7.0.1.00100 and build 17004997
 [2020-12-20T17:49:55.355] : Target VCSA version = 7.0.1.00200
 [2020-12-20 17:49:55,169] : Running requirements script.....
 [2020-12-20T17:50:12.355] : Evaluating packages to stage...
 [2020-12-20T17:50:12.355] : Verifying staging area
 [2020-12-20T17:50:12.355] : ISO unmounted successfully
 [2020-12-20T17:50:12.355] : Staging process completed successfully
 [2020-12-20T17:50:12.355] : Answers for following questions have to be provided to install phase:
        Question:
                ID: vmdir.password
                Text: Single Sign-On administrator password
                Description: For the first instance of the identity domain, this is the password given to the Administrator account.  Otherwise, this is the password of the Administrator account of the replication partner.
                Allowed values:
                Default value:

 [2020-12-20T17:50:12.355] : Execute software-packages validate to validate your input

Let's take a look at the update:

Command> software-packages list --staged
[2020-12-20T17:52:00.355] :
    category: Bugfix
    kb: https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-vcenter-server-70u1c-release-notes.html
    leaf_services: ['vmware-pod', 'vsphere-ui', 'wcp']
    vendor: VMware, Inc.
    name: VC-7.0U1c
    tags: []
    version_supported: []
    size in MB: 5107
    releasedate: December 17, 2020
    executeurl: https://my.vmware.com/group/vmware/get-download?downloadGroup=VC70U1C
    version: 7.0.1.00200
    updateversion: True
    allowedSourceVersions: [7.0.0.0,]
    buildnumber: 17327517
    rebootrequired: False
    productname: VMware vCenter Server
    type: Update
    summary: {'id': 'patch.summary', 'translatable': 'In-place upgrade for vCenter appliances.', 'localized': 'In-place upgrade for vCenter appliances.'}
    severity: Critical
    TPP_ISO: False
    thirdPartyInstallation: False
    timeToInstall: 0
    requiredDiskSpace: {'/storage/core': 6.286324043273925, '/storage/seat': 228.3861328125}
    eulaAcceptTime: 2020-12-20 17:50:12 AKST

Let's run it!

Command> software-packages install --staged
 [2020-12-20T17:53:52.355] : For the first instance of the identity domain, this is the password given to the Administrator account.  Otherwise, this is the password of the Administrator account of the replication partner.
Enter Single Sign-On administrator password:

 [2020-12-20T17:54:02.355] : Validating software update payload
 [2020-12-20T17:54:02.355] : UpdateInfo: Using product version 7.0.1.00100 and build 17004997
 [2020-12-20 17:54:02,095] : Running validate script.....
 [2020-12-20T17:54:09.355] : Validation successful
 [2020-12-20 17:54:09,125] : Copying software packages  [2020-12-20T17:54:09.355] : ISO mounted successfully
166/166
 [2020-12-20T17:57:31.355] : ISO unmounted successfully
 [2020-12-20 17:57:31,238] : Running system-prepare script.....
 [2020-12-20 17:57:40,289] : Running test transaction ....
 [2020-12-20 17:57:54,344] : Running prepatch script.....
 [2020-12-20 18:01:22,731] : Upgrading software packages ....
 [2020-12-20T18:07:39.355] : Setting appliance version to 7.0.1.00200 build 17327517
 [2020-12-20 18:07:39,538] : Running patch script.
....
 [2020-12-20 18:28:42,743] : Starting all services ....
 [2020-12-20T18:28:46.355] : Services started.
 [2020-12-20T18:28:46.355] : Installation process completed successfully
 [2020-12-20T18:28:46.355] : The following warnings have been found:
['\tWarning: \n\t\tsummary: Failed to start all services, will retry operation.\n']
Command> shutdown reboot -r "patch reboot"

Looks like the manual install worked for me - 7.0 U1c

TL;DR

rm -rf /etc/applmgmt/application/software_update_state
grep -i error /var/log/vmware/applmgmt/update_microservice.log
exit
software-packages unstage
software-packages stage --iso --acceptEulas
software-packages list --staged
software-packages install --staged
shutdown reboot -r "patch reboot"

2 comments:

  1. For 7.0.1.00300 patch this is the only method that worked for me! Thanks much!

    ReplyDelete
  2. Thanks for the steps! I had the exact same problems going from 7.0 U1d build 17491101 to 7.0 U2 build 17694817. I used your steps and that fixed my issue.

    ReplyDelete

PAN-OS IPv6 Error: bgp peer local address 0:0:0:0:0:0:0:0 does not belong to interface

  When encountering this error, please ensure that "Enable IPv6" is set under interfaces: Hope this helps! Happy IPv6ing!