vRealize Orchestrator – Resolving the ${message} blue screen issue

Here’s an issue that frustrated me for a while until I was able to finally resolve it. If you’re here reading this too, I feel your pain… Hopefully this helps you out as well!

I’m deploying vRealize Orchestrator (vRO) 7.3 in our lab for testing as I continue to build out our cloud environment. To help explain the issue we ran into, I’ll give a quick overview of our environment.

For our cloud, we have three separate environments:

  • Core
    • Management nodes (NSX mgr, AD, DNS, SQL, PSCs for vCenter, and vCenter)
  • Automation
    • vRealize suite (vRO, vRA, IaaS, SQL, PSC for Auto environment)
  • Networking
    • NSX load balancer, ESGs, DLRs

During the initial vRO configuration, you configure it as standalone and then choose your authentication method. We are using vSphere authentication, which authenticates via the PSC (Platform Services Controller) in the Auto environment. We have a single SSO domain with relationships set up between the Core PSC and the Auto PSC.
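As a quick aside, if you want to verify that the two PSCs actually see each other as replication partners within the SSO domain, the vdcrepadmin utility on the PSC appliance can list them. The hostname below is just an example; adjust the target PSC and account for your own environment:

/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartners -h psc-auto.domain.local -u administrator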

Now that I’ve set the premise, let’s talk about the issue at hand. During the vRO standalone config, if you are using a load balancer, you have to change the hostname to your LB VIP for vRO. Then on the next screen you configure your authentication source; we’re using vSphere authentication and pointed it to our Automation PSC. Once complete, you’re taken right into control center using the root account. If you log out at any point, you may encounter the following issue when trying to browse back to control center (https://vro1.domain.local:8283/vco-controlcenter).

vro-issue-sso.jpg

Here’s what I realized after seeing this issue and attempting various failed fixes: we had missed a step during our NSX load balancer configuration. Since the hostname was set to the vRO VIP and the authentication source was now set to our PSC, SSO was trying to authenticate via our VIP rather than the local node. This led us back to NSX, where we had to configure another virtual server for port 8283 as well as a pool containing our two vRO nodes.
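Before touching the load balancer, a couple of quick curl calls against the node and the VIP will show whether anything is actually answering on port 8283 (the VIP hostname below is made up; substitute your own):

curl -kI https://vro1.domain.local:8283/vco-controlcenter/
curl -kI https://vro-vip.domain.local:8283/vco-controlcenter/

If the node responds but the VIP call is refused or times out, the missing 8283 virtual server on the load balancer is the likely culprit.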

Here’s what we ended up configuring on the NSX end:

NSX Virtual Server on the Load Balancer

vro-nsxlb-virtualserver.jpg

NSX Pool on the Load Balancer

vro-nsxlb-pool.jpg
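For reference, the same pool and virtual server can also be created through the NSX-v REST API rather than the UI. The sketch below is only an approximation of what the screenshots above show, with made-up names, IP addresses, edge ID, and pool ID, so treat it as a starting point rather than an exact export of our config (you’ll be prompted for the NSX Manager password):

curl -k -u admin -H "Content-Type: application/xml" -X POST \
  https://nsxmgr.domain.local/api/4.0/edges/edge-1/loadbalancer/config/pools -d '
<pool>
  <name>pool-vro-cc-8283</name>
  <algorithm>round-robin</algorithm>
  <member><name>vro1</name><ipAddress>10.0.0.11</ipAddress><port>8283</port></member>
  <member><name>vro2</name><ipAddress>10.0.0.12</ipAddress><port>8283</port></member>
</pool>'

curl -k -u admin -H "Content-Type: application/xml" -X POST \
  https://nsxmgr.domain.local/api/4.0/edges/edge-1/loadbalancer/config/virtualservers -d '
<virtualServer>
  <name>vs-vro-cc-8283</name>
  <enabled>true</enabled>
  <ipAddress>10.0.0.10</ipAddress>
  <protocol>https</protocol>
  <port>8283</port>
  <defaultPoolId>pool-2</defaultPoolId>
</virtualServer>'

The ipAddress on the virtual server is the vRO VIP, and defaultPoolId is whatever pool ID the first call returns. Depending on how you handle SSL (pass-through vs. termination), you may also need to reference an application profile on the virtual server.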

Once that was in place, I was able to get to the vRO control center using the VIP address. I was also able to join the second node to the cluster and verify everything was good on that end after applying our SSL certificate!

vRO-cluster-configured.jpg

Celerra NAS pool maxed out – manually deleting a filesystem

I recently ran into an issue that I will share with you since I was unable to find a solution online and resolved the issue myself. 

Issue: NAS pool maxed out and replications halted

When trying to issue a nas_fs -delete for a certain file system on a destination system, I received the following error: “file system has backups in use.” You get this error either because the file system has a checkpoint schedule created on it or because it has replication checkpoints in use. In my case, it was the replication checkpoints preventing it from being deleted. Issue the following command to see the checkpoints associated with the file system:

fs_ckpt id=XX -list -all (where XX is the file system ID). Once you’ve identified the checkpoints that need to be deleted, issue the following command to delete them:

nas_fs -delete id=XX -o umount=yes -ALLOW_REP_INT_CKPT_OP (where XX is the checkpoint ID). Now you should be able to go back and delete the file system with the “nas_fs -delete” command. However, if you then go back to the source system and try to delete the replication, you will get an error that the destination side of the replication could not be found.
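Putting the destination-side steps together, the sequence looks roughly like this (the IDs are made up for illustration; substitute the file system ID and the checkpoint IDs returned by the list command):

[nasadmin@NS480 ~]$ fs_ckpt id=123 -list -all
[nasadmin@NS480 ~]$ nas_fs -delete id=456 -o umount=yes -ALLOW_REP_INT_CKPT_OP
[nasadmin@NS480 ~]$ nas_fs -delete id=123

Here is what that failed replication delete looked like on the source system: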

[nasadmin@NS480 ~]$ nas_task -i 648886
Task Id = 648886
Celerra Network Server = NS480
Task State = Failed
Movers =
Description = Delete Replication VNX5700_FS2 [ id=295_APM00110000_520_APM00130000].
Originator = nasadmin@cli.localhost
Start Time = Wed Jun 11 13:26:17 EDT 2014
End Time = Wed Jun 11 13:26:19 EDT 2014
Schedule = n/a
Response Statuses = Error 13160415862: The destination side of the replication session could not be found.

When deleting the replication session from the source side, add the “-mode source” flag and the delete should now succeed.
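For reference, I believe the full command ends up looking something like this, using the session name from the failed task above (double-check the exact session name on your own system):

[nasadmin@NS480 ~]$ nas_replicate -delete VNX5700_FS2 -mode source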