Introduction

Network ATC is an intent-based approach to network configuration that was introduced with Azure Stack HCI OS (21H2) and is the preferred network configuration method to use when deploying your Azure Stack HCI cluster.

Network ATC assists with the following:

  • Reducing host networking deployment time, complexity, and errors
  • Deploying Microsoft-validated and supported best practices
  • Ensuring configuration consistency across the cluster
  • Eliminating configuration drift

Whether you want to deploy a Non-Converged, Fully Converged, or Switchless network topology for your Azure Stack HCI cluster, Network ATC can be leveraged.

The following sections guide you through an Azure Stack HCI network deployment leveraging Network ATC and cover some of the parameters and configuration overrides you may need depending on your environment.

Prerequisites

Prior to working with Network ATC, the following prerequisites must be met:

  • All physical nodes to be used in the cluster must be Azure Stack HCI certified
  • Azure Stack HCI OS 21H2 or 22H2 must be deployed on the nodes you wish to create a cluster with
  • The latest network adapter drivers and firmware supplied by your OEM vendor are installed
  • The Network ATC Windows feature (and other dependent features) must be installed on your nodes
    • Install-WindowsFeature -ComputerName "MyServer" -Name "BitLocker", "Data-Center-Bridging", "Failover-Clustering", "FS-FileServer", "FS-Data-Deduplication", "Hyper-V", "Hyper-V-PowerShell", "RSAT-AD-Powershell", "RSAT-Clustering-PowerShell", "NetworkATC", "NetworkHUD", "Storage-Replica" -IncludeAllSubFeature -IncludeManagementTools
  • Each network adapter must use the same name across all nodes in the cluster
  • Nodes must be cabled according to your desired network topology (e.g. Non-Converged)
Non-Converged network topology leveraging two NIC ports

Verify your intended network adapters are in the Up state:
Get-NetAdapter | Sort-Object Name

Verifying that NIC1, NIC2, SLOT 3 Port 1, and SLOT 3 Port 2 are Up across all nodes we wish to create a cluster for
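
If you want to check this across all nodes at once, a quick sketch like the below can be run over PowerShell remoting (the node names are placeholders for my lab):

$nodes = "Node01","Node02","Node03","Node04"   # replace with your node names
Invoke-Command -ComputerName $nodes -ScriptBlock {Get-NetAdapter -Name "NIC1","NIC2","SLOT 3 Port 1","SLOT 3 Port 2" | Sort-Object Name} | Format-Table PSComputerName, Name, Status, LinkSpeed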

Configuration (Storage)

The sections below cover configuring a storage intent for a Non-Converged network topology.

By default, storage IP addressing is automatic unless an override is passed during intent creation. The default IP addressing scheme is as follows:

Adapter          IP Address Subnet   VLAN
SLOT 3 Port 1    10.71.1.X           711
SLOT 3 Port 2    10.71.2.X           712
Automatic storage IP addressing

To override the automatic storage IP addressing, run the following prior to creating the storage intent:

$storageOverrides = New-NetIntentStorageOverrides
$storageOverrides.EnableAutomaticIPGeneration = $false

You will need to manually configure the storage network adapter IP addresses if you override the automatic IP generation setting. Here is the output of the $storageOverrides variable for reference:

InstanceId                           ObjectVersion EnableAutomaticIPGeneration
----------                           ------------- ---------------------------
01aed5b5-4c8a-4f6c-a839-bdecce03c208 1.1.0.0                             False

In my environment, the adapters being leveraged for storage (SLOT 3 Port 1, SLOT 3 Port 2) are tagged for VLAN 204 (SLOT 3 Port 1) and VLAN 205 (SLOT 3 Port 2) on their respective Top-of-Rack (ToR) switches. To create the intent for my scenario, I will need to specify an additional parameter (-StorageVlans) to accommodate the tagged VLANs. In addition, I will use the -StorageOverrides parameter since I do not want automatic storage IP addressing.

The storage adapters on each node have been assigned IP addresses as follows (the last octet of the storage interface IP address will be node number X):

Adapter                  IP Address Subnet   VLAN
(Node X) SLOT 3 Port 1   172.16.1.X          204
(Node X) SLOT 3 Port 2   172.16.2.X          205
Storage adapter specifics for my environment

Add-NetIntent -Name "Storage" -Storage -AdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -StorageVlans 204,205 -StorageOverrides $storageOverrides
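
Since automatic IP generation was disabled via the storage override, the storage adapter IP addresses must be assigned manually on each node. A minimal sketch for node 1, following the table above and assuming a /24 subnet:

New-NetIPAddress -InterfaceAlias "SLOT 3 Port 1" -IPAddress 172.16.1.1 -PrefixLength 24
New-NetIPAddress -InterfaceAlias "SLOT 3 Port 2" -IPAddress 172.16.2.1 -PrefixLength 24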

Additional overrides available:

Global cluster overrides
Applied via the -GlobalClusterOverrides parameter for the Set-NetIntent cmdlet
$globalClusterOverrides = New-NetIntentGlobalClusterOverrides

Set-NetIntent -GlobalClusterOverrides $globalClusterOverrides

EnableNetworkNaming                               :
EnableLiveMigrationNetworkSelection               :
EnableVirtualMachineMigrationPerformanceSelection :
VirtualMachineMigrationPerformanceOption          :
MaximumVirtualMachineMigrations                   :
MaximumSMBMigrationBandwidthInGbps                :
InstanceId                                        : 3bae9ce0-ce51-4d77-a3f9-2eb34603d07b
ObjectVersion                                     : 1.1.0.0
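
As a quick sketch, a couple of these properties could be set before applying the override (assuming they accept boolean values; the selections here are illustrative, not recommendations):

$globalClusterOverrides = New-NetIntentGlobalClusterOverrides
$globalClusterOverrides.EnableNetworkNaming = $true                  # illustrative value
$globalClusterOverrides.EnableLiveMigrationNetworkSelection = $true  # illustrative value
Set-NetIntent -GlobalClusterOverrides $globalClusterOverrides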

QoS policy overrides
Applied via the -QoSPolicyOverrides parameter for the Add-NetIntent cmdlet
$qosOverrides = New-NetIntentQosPolicyOverrides

PriorityValue8021Action_SMB     :
PriorityValue8021Action_Cluster :
BandwidthPercentage_SMB         :
BandwidthPercentage_Cluster     :
NetDirectPortMatchCondition     :
InstanceId                      : 4e11c7df-29ec-4d38-86f8-d1f465b5fc85
ObjectVersion                   : 1.1.0.0
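
For example, if the storage intent were being created with a custom SMB bandwidth reservation, it might look like the below (the percentage is illustrative only):

$qosOverrides = New-NetIntentQosPolicyOverrides
$qosOverrides.BandwidthPercentage_SMB = 60   # illustrative value
Add-NetIntent -Name "Storage" -Storage -AdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -StorageVlans 204,205 -QosPolicyOverrides $qosOverrides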

Switch (VMswitch) configuration overrides
Applied via the -SwitchProperty parameter for the Add-NetIntent cmdlet
$switchConfigurationOverrides = New-NetIntentSwitchConfigurationOverrides

EnableSoftwareRsc                   :
DefaultQueueVrssMaxQueuePairs       :
DefaultQueueVrssMinQueuePairs       :
DefaultQueueVrssQueueSchedulingMode :
EnableIov                           :
EnableEmbeddedTeaming               :
LoadBalancingAlgorithm              :
InstanceId                          : 07b8f136-75ee-486b-9dd5-756174c9dd2e
ObjectVersion                       : 1.1.0.0
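
A sketch of creating the management/compute intent with software RSC disabled on the resulting VMswitch (assuming the property accepts a boolean):

$switchConfigurationOverrides = New-NetIntentSwitchConfigurationOverrides
$switchConfigurationOverrides.EnableSoftwareRsc = $false   # assumption: boolean property
Add-NetIntent -Name "Management_Compute" -Management -Compute -AdapterName "NIC1","NIC2" -SwitchProperty $switchConfigurationOverrides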

Site overrides
Applied via the -SiteOverrides parameter for the Add-NetIntent cmdlet
$siteOverrides = New-NetIntentSiteOverrides

Name          :
StorageVLAN   :
StretchVLAN   :
InstanceId    : 3d68a991-948d-48ba-81fa-90c6c62d24e9
ObjectVersion : 1.1.0.0
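
For a stretched cluster, the sketch below creates one override object per site; the site names and VLANs are hypothetical, and passing multiple objects to -SiteOverrides is an assumption on my part:

$site1 = New-NetIntentSiteOverrides
$site1.Name = "Site1"        # hypothetical site name
$site1.StorageVLAN = 204     # hypothetical VLAN
$site2 = New-NetIntentSiteOverrides
$site2.Name = "Site2"        # hypothetical site name
$site2.StorageVLAN = 304     # hypothetical VLAN
Add-NetIntent -Name "Storage" -Storage -AdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -SiteOverrides $site1,$site2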

Adapter property overrides
Applied via the -AdapterPropertyOverrides parameter for the Add-NetIntent cmdlet
$adapterPropertyOverrides = New-NetIntentAdapterPropertyOverrides

EncapOverhead                      :
EncapsulatedPacketTaskOffload      :
EncapsulatedPacketTaskOffloadNvgre :
EncapsulatedPacketTaskOffloadVxlan :
FlowControl                        :
InterruptModeration                :
IPChecksumOffloadIPv4              :
JumboPacket                        :
LsoV2IPv4                          :
LsoV2IPv6                          :
NetworkDirect                      :
NetworkDirectTechnology            :
NumaNodeId                         :
PacketDirect                       :
PriorityVLANTag                    :
PtpHardwareTimestamp               :
QOS                                :
QosOffload                         :
ReceiveBuffers                     :
RscIPv4                            :
RscIPv6                            :
RssOnHostVPorts                    :
Sriov                              :
TCPUDPChecksumOffloadIPv4          :
TCPUDPChecksumOffloadIPv6          :
UDPChecksumOffloadIPv4             :
UDPChecksumOffloadIPv6             :
TransmitBuffers                    :
UsoIPv4                            :
UsoIPv6                            :
VMQ                                :
VxlanUDPPortNumber                 :
VlanID                             :
NetAdapterName                     :
InstanceId                         : 957bcb28-3051-4282-984e-a646242dae20
ObjectVersion                      : 1.1.0.0
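
For example, RDMA could be disabled on the storage intent's adapters by setting NetworkDirect to 0 (a sketch only; verify the accepted values for your adapters and drivers):

$adapterPropertyOverrides = New-NetIntentAdapterPropertyOverrides
$adapterPropertyOverrides.NetworkDirect = 0   # 0 disables RDMA on the intent's adapters
Add-NetIntent -Name "Storage" -Storage -AdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -StorageVlans 204,205 -AdapterPropertyOverrides $adapterPropertyOverrides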

Adapter RSS overrides
Applied via the -AdapterRssPropertyOverrides parameter for the Add-NetIntent cmdlet
$adapterRssOverrides = New-NetIntentAdapterRssOverrides

RssEnabled          :
BaseProcessorGroup  :
BaseProcessorNumber :
MaxProcessorGroup   :
MaxProcessorNumber  :
MaxProcessors       :
Profile             :
InstanceId          : 9520eac9-d75a-4a83-aecf-748661f3d980
ObjectVersion       : 1.1.0.0
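
Similarly, a sketch of constraining RSS for the intent's adapters (the value is illustrative):

$adapterRssOverrides = New-NetIntentAdapterRssOverrides
$adapterRssOverrides.MaxProcessors = 8   # illustrative value
Add-NetIntent -Name "Storage" -Storage -AdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -StorageVlans 204,205 -AdapterRssPropertyOverrides $adapterRssOverrides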

Configuration (Management/Compute)

The sections below cover configuring a combined management and compute intent for a Non-Converged network topology.

In my environment, the management adapters are connected to switch interfaces configured with an access (native) VLAN, so I do not need to use the -ManagementVlan parameter for the Add-NetIntent cmdlet. In addition, the NIC1 adapter on each host has a management IP address assigned via a DHCP reservation, which provides initial management connectivity during the deployment of Azure Stack HCI. Alternatively, I could have assigned a static IP address to each NIC1 adapter instead of leveraging DHCP (see the sketch below). Once the VMswitch is created via Network ATC, a management VMnetworkadapter is created and inherits its IP address from NIC1.
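
For reference, a minimal sketch of assigning a static management address to NIC1 instead (the addresses, prefix length, and DNS server are placeholders for my lab):

New-NetIPAddress -InterfaceAlias "NIC1" -IPAddress 192.168.10.11 -PrefixLength 24 -DefaultGateway 192.168.10.1
Set-DnsClientServerAddress -InterfaceAlias "NIC1" -ServerAddresses 192.168.10.5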

Add-NetIntent -Name "Management_Compute" -Management -Compute -AdapterName "NIC1","NIC2"

Get-VMSwitch

Name                                SwitchType NetAdapterInterfaceDescription
----                                ---------- ------------------------------
ConvergedSwitch(management_compute) External   Teamed-Interface

Get-VMNetworkAdapter -ManagementOS

Name                            IsManagementOs VMName SwitchName
----                            -------------- ------ ----------
vManagement(management_compute) True                  ConvergedSwitch(management_compute)
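
To confirm the management VMnetworkadapter inherited the IP address from NIC1, something like the following can be run (the interface alias assumes the standard 'vEthernet (<vNIC name>)' naming convention):

Get-NetIPConfiguration -InterfaceAlias "vEthernet (vManagement(management_compute))"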

Validation

To monitor the progress of the intent provisioning, we can use the Get-NetIntentStatus cmdlet. This will return every intent across every node for the cluster specified. The -Name parameter can be used to return the status of the specified intent.

Get-NetIntentStatus -Name "management_compute"

IntentName            : management_compute
Host                  : a0640r33c02n01
IsComputeIntentSet    : True
IsManagementIntentSet : True
IsStorageIntentSet    : False
IsStretchIntentSet    : False
LastUpdated           : 12/15/2022 06:36:46
LastSuccess           : 12/15/2022 06:36:46
RetryCount            : 0
LastConfigApplied     : 1
Error                 :
Progress              : 1 of 1
ConfigurationStatus   : Success
ProvisioningStatus    : Completed

Get-NetIntentStatus -Name "storage"

IntentName            : storage
Host                  : a0640r33c02n01
IsComputeIntentSet    : False
IsManagementIntentSet : False
IsStorageIntentSet    : True
IsStretchIntentSet    : False
LastUpdated           : 12/15/2022 06:36:52
LastSuccess           : 12/15/2022 06:36:52
RetryCount            : 0
LastConfigApplied     : 1
Error                 :
Progress              : 1 of 1
ConfigurationStatus   : Success
ProvisioningStatus    : Completed
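
If you want to wait for provisioning to finish across all intents and nodes, a simple polling sketch like the below can be used (the interval and success check are my own choices, not part of Network ATC):

while (Get-NetIntentStatus | Where-Object { $_.ConfigurationStatus -ne "Success" }) {
    # Note: a failed intent will keep this loop running; check the Error field if it never completes
    Start-Sleep -Seconds 30
}
Get-NetIntentStatus | Sort-Object IntentName, Host | Format-Table IntentName, Host, ConfigurationStatus, ProvisioningStatus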

The Get-NetIntent cmdlet can be used to return a list of the configured network intents:

AdapterAdvancedParametersOverride : FabricManager.NetAdapterAdvancedConfiguration
RssConfigOverride                 : FabricManager.RssConfiguration
QosPolicyOverride                 : FabricManager.QoSPolicy
SwitchConfigOverride              : FabricManager.SwitchConfigurationOverride
IPOverride                        : FabricManager.NetAdapterStorageOverride
SiteOverrides                     :
NetAdapterNameCsv                 : NIC1#NIC2
StorageVLANs                      :
ManagementVLAN                    :
NetAdapterNamesAsList             : {NIC1, NIC2}
NetAdapterCommonProperties        : FabricManager.NetAdapterSymmetry
ResourceContentVersion            : 1
IntentName                        : management_compute
Scope                             : Cluster
IntentType                        : 10
IsComputeIntentSet                : True
IsStorageIntentSet                : False
IsOnlyStorage                     : False
IsManagementIntentSet             : True
IsStretchIntentSet                : False
IsOnlyStretch                     : False
IsNetworkIntentType               : True
InstanceId                        : 0a6d3824-9e93-4fbf-87c2-782260f6114f
ObjectVersion                     : 1.1.0.0

AdapterAdvancedParametersOverride : FabricManager.NetAdapterAdvancedConfiguration
RssConfigOverride                 : FabricManager.RssConfiguration
QosPolicyOverride                 : FabricManager.QoSPolicy
SwitchConfigOverride              : FabricManager.SwitchConfigurationOverride
IPOverride                        : FabricManager.NetAdapterStorageOverride
SiteOverrides                     :
NetAdapterNameCsv                 : SLOT 3 Port 1#SLOT 3 Port 2
StorageVLANs                      : {204, 205}
ManagementVLAN                    :
NetAdapterNamesAsList             : {SLOT 3 Port 1, SLOT 3 Port 2}
NetAdapterCommonProperties        : FabricManager.NetAdapterSymmetry
ResourceContentVersion            : 1
IntentName                        : storage
Scope                             : Cluster
IntentType                        : Storage
IsComputeIntentSet                : False
IsStorageIntentSet                : True
IsOnlyStorage                     : True
IsManagementIntentSet             : False
IsStretchIntentSet                : False
IsOnlyStretch                     : False
IsNetworkIntentType               : True
InstanceId                        : fc3702a0-5381-40c5-8a87-b551f582848e
ObjectVersion                     : 1.1.0.0

Lessons Learned

Setting SMB bandwidth limit
In order to use the Get-SMBBandwidthLimit and Set-SMBBandwidthLimit cmdlets, the SMB Bandwidth Limit feature must be installed on each of the nodes.

Install the feature on each node by running the following from any node in the cluster:
$nodes = (Get-ClusterNode).Name
foreach ($node in $nodes) {Invoke-Command -ComputerName $node -ScriptBlock {Install-WindowsFeature -Name FS-SMBBW}}


The SMB bandwidth limit for VM live migration is set automatically by Network ATC. The current value can be verified by running the following:
$nodes = (Get-ClusterNode).Name
foreach ($node in $nodes) {Invoke-Command -ComputerName $node -ScriptBlock {Get-SMBBandwidthLimit -Category LiveMigration}}

Category      Bytes Per Second PSComputerName
--------      ---------------- --------------
LiveMigration 2187500000       A0640R33C02N01
LiveMigration 2187500000       A0640R33C02N02
LiveMigration 2187500000       A0640R33C02N03
LiveMigration 2187500000       A0640R33C02N04

For more information on how traffic bandwidth is allocated, please see the below:
Traffic Bandwidth Allocation – Azure Stack HCI | Microsoft Learn

To set a custom SMB bandwidth limit for VM live migration, we can apply what we learned in the earlier section on overrides and set a global cluster override for the intent.

$globalClusterOverrides = New-NetIntentGlobalClusterOverrides
$globalClusterOverrides.MaximumSMBMigrationBandwidthInGbps = 6
$globalClusterOverrides.MaximumVirtualMachineMigrations = 2
Set-NetIntent -GlobalClusterOverrides $globalClusterOverrides

Previously, you would accomplish this with the following commands on a per-node basis:

Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 750MB
Set-VMHost -MaximumVirtualMachineMigrations 2


In my testing, if you change these values manually (i.e. without setting them via the Global Cluster Overrides for Network ATC), they may revert to Microsoft-defined best practice defaults as part of drift remediation.
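
To confirm the override has taken effect across the cluster (6 Gbps works out to roughly 750,000,000 bytes per second, so that is the value I would expect Get-SMBBandwidthLimit to report):

$nodes = (Get-ClusterNode).Name
foreach ($node in $nodes) {Invoke-Command -ComputerName $node -ScriptBlock {Get-SMBBandwidthLimit -Category LiveMigration; Get-VMHost | Select-Object ComputerName, MaximumVirtualMachineMigrations}}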

SMB Bandwidth Limit – Issues Encountered
According to Microsoft’s bandwidth allocation table, the following should apply to my environment:

The following Network ATC event log message indicates that the available SMB bandwidth for live migration is insufficient:

The selected live migration transport is not recommended when the available bandwidth for live migration is (6.25 Gbps). The recommended transport is (Compression).
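
If you want to follow that recommendation, the migration transport can presumably be steered through the same global cluster overrides shown earlier. A sketch, with the caveat that the accepted values for VirtualMachineMigrationPerformanceOption (such as "Compression") are an assumption on my part:

$globalClusterOverrides = New-NetIntentGlobalClusterOverrides
$globalClusterOverrides.VirtualMachineMigrationPerformanceOption = "Compression"   # assumption: check the accepted values in your build
Set-NetIntent -GlobalClusterOverrides $globalClusterOverrides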

Live Migration failures and RDMA inconsistencies
Through some of my early testing of Network ATC with Azure Stack HCI OS (22H2), I have seen some odd RDMA traffic behavior and live migration failures.

Let’s take a look at the current lab configuration where this issue was seen. This environment has the following intents created via Network ATC:

  • Management
    • 2 x 10Gb Intel X710 NICs (OCP)
  • Compute_Storage
    • 2 x 25Gb QLogic NICs that support RDMA (PCIe)

The following cluster networks are all in the Up state:

The following networks are selected for Live Migration:

When attempting to perform a live migration, I am faced with an issue:

Live migration of 'Virtual Machine vm-base-A0640R33C02N01-001' failed.

Failed to get the network address for the destination node 'A0640R33C02N02': A cluster network is not available for this operation.  (0x000013AB).

Now, if I deselect one of those storage networks for live migration, it begins working again:

Finally, if I add that storage network back for use with live migration, live migration continues working. Just an observation to be aware of. The behavior is similar to what I experienced in this post:
Live Migration Failing (NetworkATC) | Blue Flashy Lights
