Comparing WVD Auto-scaling solutions

Windows Virtual Desktop

Introduction

In this post, I’ll discuss a few solutions for auto-scaling Windows Virtual Desktop session hosts and how they compare. I won’t be looking at personal desktops.

We’ll explore five solutions:

  • Windows Virtual Desktop Scaling Tool (Microsoft)
  • Ciraltos Scale Host Pool (Ciraltos)
  • Jason Parker’s Scale Optimizer
  • Nerdio Manager for WVD (Nerdio)
  • Project MySmartScale (Sepago)

Before we dive in, let’s review the two load balancing options:

Depth-first fills up the first machine before allowing sessions on the next machine. Because it requires fewer machines at first, this option can be cost-effective and doesn’t waste resources (CPU time). However, it can degrade the user experience because users are packed onto fewer hosts. New sessions always go to the available host with the most sessions, so even as users log off, sessions stay consolidated on as few hosts as possible, optimizing for user density.

Breadth-first distributes sessions evenly across all available hosts. The user experience is usually better because each host is likely to have fewer users, so each user gets a larger share of the host’s resources. But because more machines are turned on, even when utilization is low, it can be more costly and harder to scale down.
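To make the difference concrete, here is a minimal Python sketch of how each option could choose the host for a new session. It is a simplified model, not the WVD broker’s actual code; `SessionHost` and `pick_host` are hypothetical names, and “available” stands in for a host that is running and not in drain mode.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionHost:
    name: str
    sessions: int
    max_sessions: int
    available: bool = True  # running and not in drain mode (simplified)


def pick_host(hosts: list[SessionHost], mode: str) -> Optional[SessionHost]:
    """Return the host a new session would land on, or None if every host is full."""
    candidates = [h for h in hosts if h.available and h.sessions < h.max_sessions]
    if not candidates:
        return None
    if mode == "depth-first":
        # Keep filling the busiest host that still has room.
        return max(candidates, key=lambda h: h.sessions)
    if mode == "breadth-first":
        # Spread the load by choosing the least busy host.
        return min(candidates, key=lambda h: h.sessions)
    raise ValueError(f"unknown load balancing mode: {mode}")


hosts = [
    SessionHost("host-0", sessions=3, max_sessions=4),
    SessionHost("host-1", sessions=1, max_sessions=4),
    SessionHost("host-2", sessions=4, max_sessions=4),  # full, so never chosen
]
print(pick_host(hosts, "depth-first").name)    # host-0
print(pick_host(hosts, "breadth-first").name)  # host-1
```

Depth-first leaves host-1 nearly idle until host-0 is full, which is why it needs fewer running machines; breadth-first does the opposite.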

WVD Scaling Tool (Microsoft)

The WVD Scaling Tool is a fully supported auto-scale solution maintained by Microsoft. During peak times, it evaluates whether additional hosts are required by comparing the current number of sessions with the capacity of the running hosts, as defined by SessionThresholdPerCPU. During off-peak times, it evaluates whether hosts can be shut down, keeping at least the number of hosts set in MinimumNumberOfRDSH.

  • Technology: The solution uses an Azure Automation account with a webhook, a PowerShell runbook, and an Azure Logic App. The Automation account uses a Run As account with limited ability to start/stop machines.
  • Modifies drain mode: Yes, during off-peak (optional).
  • Can create VMs: No
  • Evaluation Frequency: Recommended to be 15 minutes.
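The peak and off-peak checks described above can be approximated in a few lines of Python. This is a simplified reading of the documented behavior, not the tool’s PowerShell runbook: it assumes capacity is the running hosts’ CPU cores multiplied by SessionThresholdPerCPU, that every host has the same core count, and that results are rounded up to whole hosts. The function names are hypothetical.

```python
import math


def hosts_to_start_at_peak(
    sessions: int,
    running_cores: int,
    cores_per_host: int,
    session_threshold_per_cpu: float,
) -> int:
    """Peak time: extra hosts needed so sessions fit within the per-CPU threshold."""
    cores_needed = math.ceil(sessions / session_threshold_per_cpu)
    missing_cores = max(0, cores_needed - running_cores)
    return math.ceil(missing_cores / cores_per_host)


def hosts_to_stop_off_peak(running_hosts: int, minimum_number_of_rdsh: int) -> int:
    """Off-peak: hosts that are candidates for shutdown while keeping the minimum running."""
    return max(0, running_hosts - minimum_number_of_rdsh)


# Two running 8-core hosts with SessionThresholdPerCPU = 2 can hold 32 sessions.
print(hosts_to_start_at_peak(50, running_cores=16, cores_per_host=8, session_threshold_per_cpu=2))  # 2
print(hosts_to_stop_off_peak(running_hosts=5, minimum_number_of_rdsh=2))  # 3
```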

You can use the Azure tag defined in MaintenanceTagName to distinguish hosts managed by the tool from those you manage manually. The tool also uses MaxSessionLimit to calculate whether 90% of maximum capacity has been reached and, if so, starts an additional host. This happens during both peak and off-peak times.
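The 90% rule is simple arithmetic. The sketch below assumes maximum capacity is the number of running hosts multiplied by MaxSessionLimit and that “reached” means at or above 90%; the function name is hypothetical. Integer math avoids floating-point rounding at the boundary.

```python
def needs_extra_host(
    sessions: int,
    running_hosts: int,
    max_session_limit: int,
    trigger_percent: int = 90,
) -> bool:
    """True when sessions reach trigger_percent of the running hosts' maximum capacity."""
    max_capacity = running_hosts * max_session_limit
    # Same as sessions / max_capacity >= trigger_percent / 100, but without floats.
    return sessions * 100 >= trigger_percent * max_capacity


# Four running hosts with MaxSessionLimit = 10 give a capacity of 40 sessions.
print(needs_extra_host(36, running_hosts=4, max_session_limit=10))  # True
print(needs_extra_host(35, running_hosts=4, max_session_limit=10))  # False
```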

The main limitations of the tool are that it only supports pooled multi-session VMs, that scaling out based on SessionThresholdPerCPU is only done during peak times, and that scaling in is only done during off-peak times.

Learn more at Microsoft Docs and look at the code on GitHub in this repository.

Ciraltos

The @Ciraltos solution from Travis Roberts calculates scaling by comparing the running session host count to a target session host count. The target is calculated by adding a buffer of spare capacity (the threshold) to the number of active sessions and then dividing by the configured maximum sessions per host. Because of this threshold, the solution can start turning on a host (or hosts) before capacity runs out, minimizing the chance that there’s no capacity for the next set of users.

  • Technology: Azure Function app with Managed Identity with limited ability to start/stop machines.
  • Modifies drain mode: No
  • Can create VMs: No
  • Evaluation Frequency: Can go as low as 5 minutes. For anything lower, it’s possible to get false starts since it can take a few minutes for a VM to boot up.
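The target calculation described above can be sketched as follows. Rounding up to a whole host is an assumption here, as are the function names; the real solution is a PowerShell Azure Function, so treat this as an illustration of the arithmetic only.

```python
import math


def target_host_count(active_sessions: int, threshold: int, max_sessions_per_host: int) -> int:
    """Hosts needed to hold the active sessions plus a spare-capacity buffer."""
    return math.ceil((active_sessions + threshold) / max_sessions_per_host)


def hosts_to_start(
    running_hosts: int, active_sessions: int, threshold: int, max_sessions_per_host: int
) -> int:
    """Start hosts only when the running count is below the target."""
    target = target_host_count(active_sessions, threshold, max_sessions_per_host)
    return max(0, target - running_hosts)


# 18 sessions plus a buffer of 3, with a maximum of 10 sessions per host.
print(target_host_count(18, threshold=3, max_sessions_per_host=10))  # 3
print(hosts_to_start(2, 18, threshold=3, max_sessions_per_host=10))  # 1
```

With 18 sessions, two hosts could still hold everyone, but the buffer pushes the target to three hosts, so the third host starts booting before the pool is full.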

The solution has an option to change the load balancing type from depth-first to breadth-first during peak time. This is useful because it starts all available session hosts and distributes sessions across them. Once peak time ends, the solution reverts the change and shuts down session hosts as users log off. If hosts need to be started but none are available, the solution provides an error message, so it would be useful to monitor for this.
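The peak-time switch boils down to picking a load balancing type from the clock, plus noticing when there are no stopped hosts left to start. A minimal sketch, assuming a simple same-day peak window; the function names and the `"BreadthFirst"`/`"DepthFirst"` labels are illustrative, and actually applying the change to the host pool is left out.

```python
from datetime import datetime, time


def load_balancing_for(now: datetime, peak_start: time, peak_end: time) -> str:
    """Breadth-first inside the peak window, depth-first outside it."""
    in_peak = peak_start <= now.time() < peak_end
    return "BreadthFirst" if in_peak else "DepthFirst"


def start_shortfall(hosts_needed: int, stopped_hosts: int) -> int:
    """Hosts that could not be started because none were left: worth alerting on."""
    return max(0, hosts_needed - stopped_hosts)


print(load_balancing_for(datetime(2020, 6, 1, 9, 30), time(8), time(17)))  # BreadthFirst
print(load_balancing_for(datetime(2020, 6, 1, 18, 0), time(8), time(17)))  # DepthFirst
print(start_shortfall(hosts_needed=3, stopped_hosts=1))  # 2
```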

Note: The solution requires a maximum session limit to function. The maximum session limit is how it knows when to start distributing sessions to another host (when the max is reached).

Compared to the Microsoft tool, this solution uses depth-first to consolidate sessions and evaluates scaling based on session (a.k.a. user) count instead of sessions per CPU core. Unlike the Microsoft tool, it does not scale using breadth-first outside the optional peak-time switch; breadth-first should help with user experience but limits potential cost savings.

The main limitations of this solution are that it only supports pooled multi-session VMs, it does not manage drain mode, and it cannot force users to log off during off-peak times.

To learn more about this solution, check out the first video and the update video. Get the code on GitHub in this repository.

Jason Parker’s Scale Optimizer

Jason Parker, a fellow Microsoft employee, created an auto-scale solution that is tied to a specific deployment model and is particularly useful in very large deployments. It works by comparing the running session host count to a target session host count. These values are calculated based on the number of user sessions, and you can define the minimum and maximum thresholds. The solution determines the optimal state (for instance, 6 additional hosts are needed) and then automates towards that state instead of taking a one-by-one approach.

  • Technology: Azure Automation account and PowerShell runbooks.
  • Peak time supported: Yes (excludes weekends)
  • Modifies drain mode: Yes
  • Can create VMs: No
  • Evaluation Frequency: Recommended to be 15 minutes since it’s using an Azure Automation account
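The “optimal state” approach can be sketched as computing one target and a single signed delta, rather than adding or removing one host per run. The sketch assumes the minimum and maximum thresholds bound the host count and that capacity is a fixed number of sessions per host; those interpretations and the function names are assumptions, not taken from the project’s runbooks.

```python
import math


def desired_host_count(sessions: int, sessions_per_host: int, min_hosts: int, max_hosts: int) -> int:
    """Hosts needed for the current sessions, clamped to the configured bounds."""
    needed = math.ceil(sessions / sessions_per_host)
    return max(min_hosts, min(max_hosts, needed))


def scaling_delta(running: int, sessions: int, sessions_per_host: int, min_hosts: int, max_hosts: int) -> int:
    """Positive: hosts to add in one pass. Negative: hosts to remove. Zero: compliant."""
    return desired_host_count(sessions, sessions_per_host, min_hosts, max_hosts) - running


# 95 sessions at 10 per host need 10 hosts; with 4 running, add 6 in a single pass.
print(scaling_delta(running=4, sessions=95, sessions_per_host=10, min_hosts=2, max_hosts=20))  # 6
```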

The Scale Optimizer uses two PowerShell runbooks: one with the logic to check whether a host pool is compliant with its capacity settings and another that performs the actual scaling. The solution can take hosts out of drain mode to make them available and/or start additional hosts if needed. For scaling in, it can set hosts to drain mode (if there are active sessions) and then stop them once there are no more sessions.
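The scale-out and scale-in steps described above can be modeled as turning a host list into an ordered action plan. This is a minimal sketch with hypothetical names (`Host`, `scale_out_actions`, `scale_in_actions`); it assumes hosts in drain mode were drained by an earlier scaling run rather than for maintenance, and it only returns actions instead of calling Azure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Host:
    name: str
    running: bool
    drain_mode: bool
    sessions: int


def scale_out_actions(hosts: list[Host], count: int) -> list[tuple[str, str]]:
    """Add `count` hosts of capacity, preferring the cheapest option first."""
    actions: list[tuple[str, str]] = []
    # 1) Running hosts in drain mode only need to accept new sessions again.
    for h in hosts:
        if count > 0 and h.running and h.drain_mode:
            actions.append(("disable_drain", h.name))
            count -= 1
    # 2) Otherwise start stopped hosts, clearing drain mode if it is set.
    for h in hosts:
        if count > 0 and not h.running:
            actions.append(("start", h.name))
            if h.drain_mode:
                actions.append(("disable_drain", h.name))
            count -= 1
    return actions


def scale_in_actions(hosts: list[Host], count: int) -> list[tuple[str, str]]:
    """Remove `count` hosts of capacity without interrupting active users."""
    actions: list[tuple[str, str]] = []
    # Hosts drained by an earlier run can be stopped once they are empty.
    for h in hosts:
        if h.running and h.drain_mode and h.sessions == 0:
            actions.append(("stop", h.name))
    # Take the least busy available hosts out of rotation for this run.
    available = sorted((h for h in hosts if h.running and not h.drain_mode), key=lambda h: h.sessions)
    for h in available[:count]:
        if h.sessions == 0:
            actions.append(("stop", h.name))
        else:
            actions.append(("enable_drain", h.name))  # stop later, once empty
    return actions


hosts = [
    Host("h0", running=True, drain_mode=False, sessions=5),
    Host("h1", running=True, drain_mode=True, sessions=0),
    Host("h2", running=False, drain_mode=False, sessions=0),
    Host("h3", running=True, drain_mode=False, sessions=0),
]
print(scale_out_actions(hosts, 2))  # [('disable_drain', 'h1'), ('start', 'h2')]
print(scale_in_actions(hosts, 2))   # [('stop', 'h1'), ('stop', 'h3'), ('enable_drain', 'h0')]
```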

The Scale Optimizer solution logs everything to a Log Analytics workspace and sets a maintenance tag automatically if an operation did not complete successfully.

The main limitations of this solution are that it only supports pooled multi-session VMs and cannot force users to log off during off-peak times.

To learn more about this solution, check out the code and README on GitHub at this repository.

Nerdio Manager for WVD

Nerdio Manager for WVD is a fully supported solution offered through Azure Marketplace (not free) that uses several different auto-scale algorithms, including the ability to scale single-session desktops. While Nerdio makes other products, I will use the word “Nerdio” to refer to Nerdio Manager for WVD in this blog post.

For multi-session hosts, Nerdio has three triggers it can use for scaling: CPU usage, average active sessions, or available sessions.

The CPU usage trigger looks at all hosts in the host pool to determine whether scaling is necessary, based on minimum and maximum thresholds set by the administrator. When a threshold is met, Nerdio scales out (or scales in) hosts in increments specified by the administrator. You can also set a time period (such as 5pm to 8am) during which scaling in is permitted, effectively blocking scale-in during the workday.

The average active sessions trigger works similarly and has the same features, but it looks at the average active sessions against values specified by the administrator. This is especially important for scenarios using breadth-first, where user experience is more important than reducing cost.

The available sessions trigger has the same features but looks at the max sessions per host. Once this max is reached in the host pool, Nerdio starts scaling out by an amount specified by the administrator. This is similar to the Ciraltos solution above.

  • Technology: Azure Marketplace solution in your subscription
  • Peak time supported: Yes
  • Modifies drain mode: Yes
  • Can create VMs: Yes
  • Evaluation Frequency: Can be set as low as 5 minutes for triggers that use metrics.
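Here is a minimal sketch of how a CPU-usage trigger with a restricted scale-in window could work. The `>=`/`<=` comparisons, the parameter names, and the function names are assumptions for illustration, not Nerdio’s implementation; the one subtle part is that a window like 5pm to 8am wraps past midnight.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def cpu_scaling_decision(
    avg_cpu_percent: float,
    scale_out_at: float,
    scale_in_at: float,
    increment: int,
    now: time,
    scale_in_start: time,
    scale_in_end: time,
) -> int:
    """Signed number of hosts to add (+) or remove (-) for this evaluation."""
    if avg_cpu_percent >= scale_out_at:
        return increment
    if avg_cpu_percent <= scale_in_at and in_window(now, scale_in_start, scale_in_end):
        return -increment
    return 0


# Scale out at or above 65% CPU, scale in at or below 40%, but only between 5pm and 8am.
print(cpu_scaling_decision(80, 65, 40, 1, time(10, 0), time(17, 0), time(8, 0)))  # 1
print(cpu_scaling_decision(30, 65, 40, 1, time(10, 0), time(17, 0), time(8, 0)))  # 0
print(cpu_scaling_decision(30, 65, 40, 1, time(22, 0), time(17, 0), time(8, 0)))  # -1
```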

Like the other solutions presented in this post, Nerdio runs in your Azure environment using native Azure services such as App Service, Azure Automation, Key Vault, SQL Database, and others. This means that your WVD environment is not dependent on Nerdio; it can be completely removed with no impact to your environment. However, Nerdio brings some value-added features beyond a nice web-based GUI for managing auto-scale settings. For instance, it introduces the concept of “Dynamic” host pools, which were created to address some auto-scaling challenges.

Dynamic Host Pools are host pools that can be scaled in or out not just by turning existing hosts on and off but also by creating and deleting hosts. This enables burst capacity beyond the initially created machines. Once scaling determines there are no more machines to turn on, Nerdio can create new ones automatically using your naming convention, and once those hosts are no longer needed, they can be deleted.
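The burst behavior of Dynamic Host Pools can be sketched as: start stopped hosts first, then create new hosts only for the remaining shortfall. The `prefix-N` naming convention and the function name are hypothetical stand-ins for whatever naming convention you configure.

```python
from __future__ import annotations


def plan_burst(
    hosts_needed: int,
    stopped_hosts: list[str],
    existing_names: set[str],
    prefix: str,
) -> list[tuple[str, str]]:
    """Return ("start", name) and ("create", name) actions covering hosts_needed."""
    # Existing capacity is cheaper than new VMs, so start stopped hosts first.
    actions = [("start", name) for name in stopped_hosts[:hosts_needed]]
    remaining = hosts_needed - len(actions)
    taken = set(existing_names)
    index = 0
    while remaining > 0:
        name = f"{prefix}-{index}"
        if name not in taken:  # fill gaps in the numbering, skip names in use
            actions.append(("create", name))
            taken.add(name)
            remaining -= 1
        index += 1
    return actions


print(plan_burst(3, ["wvd-1"], {"wvd-0", "wvd-1"}, "wvd"))
# [('start', 'wvd-1'), ('create', 'wvd-2'), ('create', 'wvd-3')]
```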

In addition, Nerdio can pre-stage hosts on specified days and times (for instance, every workday at 8am) to ensure there is available capacity. The administrator sets the number of active hosts needed, and Nerdio uses its scale-out features to reach that number. Finally, Nerdio can auto-heal hosts that have a problem by restarting or recreating them.
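Pre-staging and auto-heal can both be expressed as small rules. In this sketch, the pre-stage window ending at a fixed time and the “restart first, recreate if restarts don’t help” escalation are assumptions, and all names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, time


def min_active_hosts(
    now: datetime,
    base_minimum: int,
    prestage_hosts: int,
    prestage_days: set[int],  # Monday = 0 ... Sunday = 6
    prestage_at: time,
    prestage_until: time,
) -> int:
    """Raise the floor of running hosts during the pre-stage window."""
    if now.weekday() in prestage_days and prestage_at <= now.time() < prestage_until:
        return max(base_minimum, prestage_hosts)
    return base_minimum


def heal_action(restarts_attempted: int, max_restarts: int = 1) -> str:
    """Try restarting an unhealthy host first; recreate it if restarts were not enough."""
    return "restart" if restarts_attempted < max_restarts else "recreate"


workdays = {0, 1, 2, 3, 4}
print(min_active_hosts(datetime(2020, 6, 1, 8, 0), 1, 5, workdays, time(8), time(17)))  # 5 (Monday)
print(min_active_hosts(datetime(2020, 6, 6, 8, 0), 1, 5, workdays, time(8), time(17)))  # 1 (Saturday)
print(heal_action(0), heal_action(1))  # restart recreate
```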

While there is a license cost for Nerdio Manager for WVD, it’s worth estimating the cost savings from the advanced auto-scale and other features (like Ephemeral Disks) to check whether they offset the license cost.
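A quick way to run that check is to compare the monthly license cost with the compute hours that auto-scaling saves. Apart from the $4 per user per month listed in the summary table, the inputs below are hypothetical placeholders, not real Azure prices; plug in your own VM rates and measured savings. The function name is illustrative.

```python
def monthly_net_savings(
    users: int,
    license_per_user: float,
    hosts: int,
    vm_hourly_rate: float,
    hours_saved_per_host: float,
) -> float:
    """Compute savings from deallocated VM hours minus the per-user license cost."""
    license_cost = users * license_per_user
    compute_savings = hosts * vm_hourly_rate * hours_saved_per_host
    return compute_savings - license_cost


# Hypothetical: 100 users, 10 hosts at $0.50/hour, each deallocated 200 extra hours a month.
print(monthly_net_savings(100, 4.0, 10, 0.50, 200))  # 600.0
```

A positive result means the extra savings cover the license; a negative one means they don’t.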

The main limitations of this solution are that there are no auto-scale options for static host pools (therefore, you must use Dynamic Host Pools) and that it cannot be customized beyond what the vendor permits.

To learn more about this solution, visit the vendor’s website at https://www.getnerdio.com where you can access a free trial.

Project MySmartScale

Project MySmartScale was developed by Microsoft MVP Marcel Meurer and is available as a free community version and as a fully supported version from Sepago. One unique feature of this solution is its ability to predict the number of session hosts needed before users log on, preventing resource constraints during logon storms. It learns user behavior and logs off sessions at the right time so it can intelligently deallocate and start session hosts.

  • Technology: Azure App Service, Log Analytics, and Azure SQL Database, using a Managed Identity with limited privileges to start/stop machines (VM Contributor) and to read groups/users in Azure AD.
  • Modifies drain mode: No
  • Can create VMs: No
  • Evaluation Frequency: Can go as low as 5 minutes. For anything lower, it’s possible to get false starts since it can take a few minutes for a VM to boot up.
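This post doesn’t describe the prediction model, so the sketch below swaps in a deliberately simple technique: average the session counts seen in the same weekday-and-hour slot in past weeks, add optional headroom, and convert that to hosts. It is not MySmartScale’s algorithm; the function name and parameters are hypothetical.

```python
from __future__ import annotations

import math
from datetime import datetime


def predict_hosts(
    history: list[tuple[datetime, int]],
    at: datetime,
    max_sessions_per_host: int,
    headroom: int = 0,
) -> int:
    """Estimate hosts needed at `at` from past session counts in the same weekday/hour slot."""
    same_slot = [
        count for ts, count in history
        if ts.weekday() == at.weekday() and ts.hour == at.hour
    ]
    if not same_slot:
        return 0  # no data yet; a real system would fall back to a configured minimum
    expected = sum(same_slot) / len(same_slot)
    return math.ceil((expected + headroom) / max_sessions_per_host)


history = [
    (datetime(2020, 6, 1, 8, 0), 18),  # Monday 8am
    (datetime(2020, 6, 8, 8, 0), 22),  # Monday 8am
    (datetime(2020, 6, 2, 8, 0), 50),  # Tuesday 8am, a different slot
]
print(predict_hosts(history, datetime(2020, 6, 15, 8, 0), max_sessions_per_host=10))  # 2
print(predict_hosts(history, datetime(2020, 6, 15, 8, 0), max_sessions_per_host=10, headroom=5))  # 3
```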

The solution uses a custom agent that is optional (for the WVD Spring release) but provides additional features, such as logging off idle users and collecting data on connected/disconnected sessions. Unlike configuring logoff via Group Policy, Project MySmartScale lets you define idle times in 5-minute intervals, which is useful to prevent logoff when you’re only idle for a short time (driving home from work or taking your lunch break). You can also set conditions based on days of the week and time of day (to define working hours).
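The idle logoff rules could be modeled as a list of policies, each scoped to days and hours, with idle times restricted to 5-minute steps. This is a sketch under those assumptions with hypothetical names; it is not the agent’s configuration format.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class IdlePolicy:
    days: frozenset[int]  # Monday = 0 ... Sunday = 6
    start: time
    end: time
    idle_minutes: int

    def __post_init__(self) -> None:
        if self.idle_minutes <= 0 or self.idle_minutes % 5 != 0:
            raise ValueError("idle time must be a positive multiple of 5 minutes")


def should_log_off(idle_minutes: int, now: datetime, policies: list[IdlePolicy]) -> bool:
    """Apply the first policy whose days and hours match; no match means no logoff."""
    for policy in policies:
        if now.weekday() in policy.days and policy.start <= now.time() < policy.end:
            return idle_minutes >= policy.idle_minutes
    return False


working_hours = IdlePolicy(frozenset(range(5)), time(8), time(18), idle_minutes=60)
print(should_log_off(45, datetime(2020, 6, 1, 12, 0), [working_hours]))  # False: short break
print(should_log_off(60, datetime(2020, 6, 1, 12, 0), [working_hours]))  # True
```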

Another great feature of this solution is the web-based user interface (UI). The UI (admin portal) is hosted in your own subscription and you can perform your configuration through it. The admin portal dashboard shows high-level information such as session count, active session hosts, and average sessions per host. It’s easy to update the admin portal and you can manage additional host pools by adding them to the UI. The solution logs everything to a Log Analytics workspace.

The main limitations of this solution are that some features require a custom agent and that it does not manage drain mode.

To learn more about this solution, check out the code and README at this repository on GitHub (Community version limited to 5 session hosts).

Closing

There are several solutions to choose from, and I’m sure I didn’t capture all of them, so if you know of others, please let me know. Also, if I have made mistakes in assessing any of the solutions, I’d be happy to correct them. Most information comes from the provided documentation and limited testing.

Tip: Remember to apply a policy that logs off idle or disconnected sessions (Computer Configuration > Policies > Administrative Templates > Windows Components > Remote Desktop Services > Remote Desktop Session Host > Session Time Limits).

Here’s a table that attempts to summarize the options presented here:

| Feature | WVD Scaling Tool | Ciraltos Scale Host Pool | Scale Optimizer | Nerdio | MySmartScale |
|---|---|---|---|---|---|
| Technology | Azure Automation/Logic App | Azure Functions App | Azure Automation | Azure Automation/App Service | App Service/Azure SQL |
| Support model | Full, Microsoft | Community | Community | Full, Nerdio | Full, Sepago |
| Cost | Free | Free | Free | $4 per user per month | Community version free; contact Sepago for supported version |
| Drain mode | Yes (off-peak) | No | Yes | Yes | No |
| Scaling method | Sessions per CPU | Sessions | Sessions | CPU usage, average active sessions, or available sessions | Sessions |
| Evaluation frequency | 15 minutes | 5 minutes | 15 minutes | 5 minutes | 5 minutes |
| Scaling in | Off-peak only | Off-peak only | Peak & off-peak | Peak & off-peak | Peak & off-peak |
| Create hosts | No | No | No | Yes | No |
| Heal / repair hosts | No | No | No | Yes | No |
| Logging | Logs available in Azure Automation runbook | Logs available via Function app, if enabled | All logs sent to Log Analytics | Full audit logs via website | All logs sent to Log Analytics |
Summary comparison of auto-scaling solutions for Windows Virtual Desktop
