Time to check the log…
Any time you open a ticket with VMware (or any vendor), the first thing they generally want you to do is pull the logs and send them over. They then use their great powers (of grep) to try to find the warning signs or the results of a known issue (or a new one!). This whole process can take quite some time, and frustratingly some issues roll out of the logs quickly, are buried in 10^14 lines of noise, or can only be found with an environment that is down and has not been rebooted. I recently had a conference call with a vendor where they told a customer that we would have to wait for one (or more!) complete crashes of their storage array before they would be able to get the logs to possibly find a solution.
This is where LogInsight comes to the rescue. With real-time indexing, graphs that do not require you to learn Ruby to make, and machine learning to auto-group similar messages, you can find out why your data center has crashed in 15 minutes instead of 15 days.
Recently, while deploying a POC, I had a customer who complained of intermittent performance issues on a VDI cluster that they couldn’t quite pin down. Internal teams were arguing (Storage blamed network, network blamed AD, Windows/AD blamed the VMware admin). A quick search for “error*,crit*,warn*” across all infrastructure on the farm (Firewall/Switch/Fabric/DiskArray/Blades/<infinite number of locations where View hides logs>) returned thousands of unrelated errors for internal certificates not being signed and other non-interesting events. LogInsight’s auto grouping allowed for quick filtering of the noise to uncover the smoking gun: a Fibre Channel connection inside a blade chassis was flapping (from a poorly seated HBA). It was not bad enough to trigger port warnings on the switches or an all paths down error, but it was enough to impact user experience randomly. This issue was a ghost that had been plaguing them for two weeks at this point. LogInsight found it in under 15 minutes of searching. It was great to have clear evidence so we could end the internal arguing as well as hold the vendor accountable so they couldn’t deflect blame to VMware or another product.
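If you want a feel for why grouping similar messages cuts through the noise, here is a rough Python sketch of the idea. To be clear, this is not how LogInsight does it (its grouping is machine-learning based and I am not describing its internals); it is just a toy that masks number-like tokens so near-identical lines collapse into one template. The function names, the regexes, and the sample log lines are all made up for illustration.

```python
import re
from collections import Counter

# Rough equivalent of the wildcard search "error*,crit*,warn*".
KEYWORDS = re.compile(r"\b(error|crit|warn)\w*", re.IGNORECASE)

# Tokens that look like variables (hex values, numbers, dotted or
# colon-separated numbers such as IP addresses) get masked.
VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:[.:]\d+)*)\b")


def template(message: str) -> str:
    """Replace variable-looking tokens with a placeholder."""
    return VARIABLE.sub("<*>", message)


def group_matches(lines):
    """Keep lines that hit a keyword, then count them per template."""
    counts = Counter(template(line) for line in lines if KEYWORDS.search(line))
    return counts.most_common()


# Hypothetical sample lines, invented purely for this example.
SAMPLE = [
    "warning: certificate for host 10.0.0.5 is not signed",
    "warning: certificate for host 10.0.0.6 is not signed",
    "warning: certificate for host 10.0.0.7 is not signed",
    "error: link down on port 3",
    "info: user login ok",
    "error: link down on port 3",
]

if __name__ == "__main__":
    for text, count in group_matches(SAMPLE):
        print(count, text)
```

The point is that thousands of certificate warnings shrink to a single row, which leaves the handful of less common groups (where something like a flapping link would show up) short enough to read by eye.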
I’d encourage everyone to download a free trial and post back in the comments what obscure errors or ghosts in the machine you end up finding.