Monday, July 5, 2021

NSX Advanced Load Balancer - NSX-T Service Engine Creation Failures: `CC_SE_CREATION_FAILURE` and `Transport Node Not Found to create service engine`

TL;DR

If you see either of these errors, run `grep 'ERROR' /opt/avi/log/cc_agent_go_{{ cloud }}` to look for the potential cause. In my case, Avi's Golang client (the one facing vCenter) did not correctly process the `/` character in a network name.

The Problem

When trying to configure NSX ALB + NSX-T in my home lab, I was presented with nothing but the following error:

CC_SE_CREATION_FAILURE

The Process

Avi Vantage appears to treat this as a retriable error: it attempts to deploy a service engine five times, and the attempts can be re-triggered with a controller restart.

Oddly enough, vCenter doesn't report any OVA deploy attempts. The next thing to check was the content library. So far, so good: vCenter knows where to deploy the image from.

Now here's a problem - Avi doesn't provide any documentation on how to troubleshoot this yet - so I did a bit of digging and found that you can elevate yourself to root with:

sudo su

Useful note: Avi Vantage is running bullseye/sid with only 821 packages listed under `dpkg -l | wc -l`. They did a pretty good job with pre-release cleanup, but there are still a few oddball packages in there. I'd give it a 9/10: I'd like to see X11 not installed, but I'm pleased to see only Python 3!

Avi's logs are located in:

/var/lib/avi/log
/opt/avi/log
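
With two log directories to cover, it can help to find out which files mention the event at all before reading any of them. A minimal sketch, assuming a shell with `grep -r` available; `logs_mentioning` is a hypothetical helper name, not something Avi ships:

```shell
# Hypothetical helper: list the log files that mention a fixed string.
# Usage: logs_mentioning PATTERN DIR [DIR...]
logs_mentioning() {
    pattern=$1
    shift
    # -r recurses into each directory, -l prints only the names of
    # matching files, -F treats the pattern as a literal string.
    grep -rlF -- "$pattern" "$@" 2>/dev/null | sort
}

# Example (as root), using the two directories listed above:
#   logs_mentioning 'CC_SE_CREATION_FAILURE' /var/lib/avi/log /opt/avi/log
```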

Here's what I found in alert_notifications_debug.log:

summary: "Syslog for System Events occured"
event_pages: "EVENT_PAGE_VS"
event_pages: "EVENT_PAGE_CNTLR"
event_pages: "EVENT_PAGE_ALL"
obj_name: "avi_-Avi-se-rctbp"
tenant_uuid: "admin"
related uuids ['avi_-Avi-se-rctbp']
[2021-04-09 20:06:30,923] INFO [alert_engine.processAlertInstance:225] [uuid: ""
alert_config_uuid: "alertconfig-938cf267-e20d-4d8e-a50a-21f0f5a5b633"
timestamp: 1617998694.0 obj_uuid: "avi_-Avi-se-rctbp" threshold: 0 events { report_timestamp: 1617998694 obj_type: SEVM event_id: CC_SE_CREATION_FAILURE module: CLOUD_CONNECTOR internal: EVENT_EXTERNAL context: EVENT_CONTEXT_SYSTEM obj_uuid: "avi_-Avi-se-rctbp" obj_name: "avi_-Avi-se-rctbp" event_details { cc_se_vm_details { cc_id: "cloud-022c7b90-f987-4b15-91bb-1f1405715580" se_vm_uuid: "avi_-Avi-se-rctbp" error_string: "Transport node not found to create serviceengine avi_-Avi-se-rctbp" } } event_description: "Service Engine creation failure" event_pages: "EVENT_PAGE_VS" event_pages: "EVENT_PAGE_CNTLR" event_pages: "EVENT_PAGE_ALL" tenant_name: "" tenant: "admin" } reason: "threshold_exceeded" state: ALERT_STATE_ON related_uuids: "avi_-Avi-se-rctbp" level: ALERT_LOW name: "Syslog-System-Events-avi_-Avi-se-rctbp-1617998694.0-1617998694-45824571" summary: "Syslog for System Events occured" event_pages: "EVENT_PAGE_VS" event_pages: "EVENT_PAGE_CNTLR" event_pages: "EVENT_PAGE_ALL" obj_name: "avi_-Avi-se-rctbp" tenant_uuid: "admin"
From the looks of things, Avi talks to NSX-T before vCenter to determine appropriate placement, which makes sense.
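
The alert entries are long, and the only part that matters here is the `error_string` field. A rough sketch for pulling just that field out of a log file, assuming the field is always written as `error_string: "..."` as in the excerpt above; `se_error_strings` is a hypothetical helper name:

```shell
# Hypothetical helper: print each distinct error_string found in a log file.
# Usage: se_error_strings FILE
se_error_strings() {
    # -o prints only the matched text rather than the whole (very long) line;
    # sort -u collapses the repeats produced by the retry attempts.
    grep -o 'error_string: "[^"]*"' "$1" | sort -u
}

# Example:
#   se_error_strings alert_notifications_debug.log
```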

Update and Root Cause

With the Avi 20.1.6 release, VMware has made a lot of improvements to logging! We're now seeing this error in the GUI (ensure that "Internal Events" is checked).

Let's take a look at the new logging. Avi's controller leverages a series of Go modules called "cloud connectors", each dedicated to a specific interface. Each one has its own log file in
/opt/avi/log/cc_
2021-07-04T20:20:42.801Z        ERROR   vcenterlib/vcenter_utils.go:606 [10.66.0.202][avi-mgt-vni-10.7.80.0/24] object references is empty
2021-07-04T20:20:42.819Z        ERROR   vcenterlib/vcenter_utils.go:578 [10.66.0.202][avi-mgt-vni-10.7.80.0/24] object references is empty
2021-07-04T20:20:42.822Z        ERROR   vcenterlib/vcenter_se_lifecycle.go:432  [10.66.0.202][QH] [10.66.0.202] Network 'avi-mgt-vni-10.7.80.0/24' matching not found in Vcenter
2021-07-04T20:20:42.822Z        ERROR   vcenterlib/vcenter_se_lifecycle.go:891  [10.66.0.202] [10.66.0.202] Network 'avi-mgt-vni-10.7.80.0/24' matching not found in Vcenter
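
To see at a glance which network names the connector could not match, the quoted name can be extracted from each "matching not found" line. This is a sketch that assumes the message format stays exactly as shown above; `missing_networks` is a hypothetical helper name:

```shell
# Hypothetical helper: list the distinct network names that the cloud
# connector reported as not found in vCenter.
# Usage: missing_networks FILE [FILE...]
missing_networks() {
    # Capture the text between the single quotes after "Network" and print
    # only the lines where the substitution matched.
    sed -n "s/.*Network '\([^']*\)' matching not found.*/\1/p" "$@" | sort -u
}

# Example, using the log path from the TL;DR with your own cloud name:
#   missing_networks /opt/avi/log/cc_agent_go_{{ cloud }}
```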
Now, this vn-segment does exist in vCenter, so I applied the "non-escaped shell character" instinct from years of Linux/Unix administration and renamed it to avi-mgt-vni-10.7.80.0_24.
Since we don't get a Redeploy button (please, VMware!), I restarted the controller, and all SE deployments succeeded after that.
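
If you name segments after their CIDR, the same substitution is easy to script. The sketch below only computes the new name; renaming the segment itself still has to happen in NSX-T, and `safe_segment_name` is a hypothetical helper name:

```shell
# Hypothetical helper: replace every "/" in a segment name with "_".
# Usage: safe_segment_name NAME
safe_segment_name() {
    printf '%s\n' "$1" | tr '/' '_'
}

# Example:
#   safe_segment_name 'avi-mgt-vni-10.7.80.0/24'
#   prints: avi-mgt-vni-10.7.80.0_24
```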
