SD-WAN is a potential game-changer for wide area networking—on the same level as server virtualization, which transformed data centers over the last 10 years. SD-WAN combines the use of multiple active branch links, intelligent direction of traffic across those links, and centralized, policy-driven management of the WAN as a whole. The ability to leverage multiple lower-cost services (including Internet and 4G wireless) as well as traditional services like MPLS holds the promise of transforming IT’s relationship to the WAN and the WAN’s relationship to the business.
Transformational potential is not enough. IT has to build a compelling business case for making the transition. The base of the case must be cost. Nemertes has developed and validated an SD-WAN cost model that enables enterprise users to build that business case. The short version? SD-WAN deployments can cut millions from large WAN service bills. But connectivity is not the only avenue by which SD-WAN can drive savings; by making redundant links cheaper and failover among them transparent and automatic, SD-WAN can reduce branch WAN outages and troubleshooting costs by 90%.
For IT and networking professionals the message is clear: now is the time to take a close look at your WAN architecture, with the aim of identifying locations that could benefit from higher bandwidth, lower rates, increased reliability, or all three. Model the cost of sticking with the current architecture and compare that against at least two SD-WAN solutions. If the SD-WAN numbers show significant potential savings over time, build a business case based on them, as well as other operational savings and any business value assigned by the business lines to faster branch turn-up.
IT staff should:
- Assess the amount of failover-only bandwidth they are paying for now
- Assess their demand curve for WAN and Internet bandwidth: determine how the connectivity profile for typical locations is likely to evolve in the next few years based on existing IT strategies for UC, collaboration, etc.
- Model the cost of using the current architecture for three to five more years.
- Evaluate and model costs for at least two in-net or overlay SD-WAN solutions
- If the SD-WAN numbers show significant potential savings over time, build a business case on them, without leaving out any other operational improvements they expect to realize.
- Look for quantification of the business value of agility in starting new branches or delivering new services more quickly; business units may have built a significant portion of the business case.
In the classic engineer’s formulation, “You can have it cheaper, faster, or better…pick two.” From time to time new technology comes along and, by changing the basic assumptions underlying existing solutions, manages to be cheaper and faster and better all at once.
SD-WAN promises to hit the trifecta. By changing the underlying assumptions about how IT connects a branch to the WAN (and, indeed, what constitutes a branch) it offers the chance of improving agility (i.e. being faster) and performance and reliability (i.e. being better) while also reducing costs.
Building a business case for deploying SD-WAN invokes all three benefits but rests mostly on the strength of savings, whether in the form of expected cost increases avoided, or as actual cost decreases.
Let’s start first with definitions. Software-Defined WAN, or SD-WAN, incorporates several key concepts:
- Abstracting edge connectivity: Making all the connections into a location useful as a single pool of capacity available to all services.
- WAN virtualization: Overlaying one or more logical WANs on the pool of connectivity, with behavior and topology for each overlay WAN defined to suit the needs of specific types of network services, locations, or users. (Please see Figure 1.)
- Policy-driven, centralized management: Key to an SD-WAN is the ability to define behaviors for an overlay WAN and have them implemented across the entire infrastructure without requiring device-by-device configuration.
- Flexible traffic management for performance and security: SD-WANs can optimize traffic in many ways; foremost, they can selectively route traffic across links based on criteria such as link performance.
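The last concept, selectively routing traffic based on link performance, can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Link` and `AppPolicy` types, the thresholds, and the relative cost figures are stand-ins for whatever a given SD-WAN product measures and exposes, and the selection rule (cheapest link that meets the application's needs) is only one plausible policy, not any vendor's algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    name: str
    latency_ms: float      # measured latency (assumed metric)
    loss_pct: float        # measured packet loss, in percent
    cost_per_mbps: float   # illustrative relative cost, not a real price
    up: bool = True


@dataclass(frozen=True)
class AppPolicy:
    name: str
    max_latency_ms: float
    max_loss_pct: float


def select_link(links: list[Link], policy: AppPolicy) -> Optional[Link]:
    """Return the cheapest link that is up and meets the policy, or None."""
    eligible = [
        link for link in links
        if link.up
        and link.latency_ms <= policy.max_latency_ms
        and link.loss_pct <= policy.max_loss_pct
    ]
    return min(eligible, key=lambda link: link.cost_per_mbps, default=None)


# Hypothetical branch with one MPLS link and one broadband link.
links = [
    Link("mpls", latency_ms=30, loss_pct=0.1, cost_per_mbps=10.0),
    Link("broadband", latency_ms=60, loss_pct=1.0, cost_per_mbps=1.0),
]
voice = AppPolicy("voice", max_latency_ms=50, max_loss_pct=0.5)
backup = AppPolicy("file-backup", max_latency_ms=500, max_loss_pct=2.0)

print(select_link(links, voice).name)   # mpls
print(select_link(links, backup).name)  # broadband
```

In this sketch the real-time policy lands on MPLS because broadband misses its thresholds, while the latency-insensitive policy falls through to the cheaper link.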
There are two key ways to provide these services in a WAN. Nemertes calls these overlay and in-net SD-WAN.
In an overlay SD-WAN, the new SD-WAN appliances are deployed on an existing routed network, either behind the routers or replacing them as the branch connection to the WAN. SD-WAN appliances can also collapse the typical branch stack by replacing other branch WAN appliances such as optimizers and firewalls.
More than a dozen companies sell SD-WAN appliances, both physical and virtual (which allow extension of the SD-WAN into public cloud spaces such as Amazon EC2, Microsoft Azure Compute, or Google Compute Engine). Some are intended to replace routers, some to ride behind them, others can fill either role, and enterprise IT staff need to carefully evaluate each against their specific needs. For example, those with an aging router plant but mostly MPLS and Carrier Ethernet or broadband links may find router replacement very attractive. Those with a lot of older T1 or T3 connections that can’t or won’t be replaced with Ethernet may want to keep their existing routers in place, to terminate the older connectivity, while using the SD-WAN solution to supplement it with wired or 3G/4G broadband.
In the overlay scenario, SD-WAN appliances comprise a layer of enterprise infrastructure distinct from the WAN connectivity they manage, allowing IT to easily add and remove network service providers and link types. This gives the enterprise maximum flexibility on connectivity services, but incurs the burden of managing the solution itself. This is typically less trouble to manage than the old-school router plant, and can even help make router management easier where routers stay in the picture, but is still a significant operational responsibility for IT.
In contrast, in-net SD-WAN ties the SD-WAN functionality to the connectivity services. These functions may all be provided in the service provider’s edge and core infrastructure, with the branch using a traditional router to connect to the provider’s nearest point of presence. Or, some or all functions may be provided on-premises via appliances under service provider management; this pushes work out of the service provider’s infrastructure and also allows optimization of last-mile connectivity via compression.
In-net SD-WAN can be tied to Network Functions Virtualization (NFV), with the various functions provided by separate, cooperating Virtual Network Functions (VNFs) dynamically downloaded to the on-premises device (where there is one) or chained into the traffic path in the carrier infrastructure. This opens the possibility of the on-premises device being white-box/generic rather than bespoke for the service, decreasing vendor lock-in somewhat.
The trade-off for handing off the management burden for the SD-WAN is the loss of autonomy with respect to connectivity. In the in-net scenario, you can’t necessarily mix and match links from different vendors freely. The new level of WAN functionality is tied to the in-net SD-WAN provider, after all. If you have trouble getting connectivity to all your sites from a single provider, that becomes an issue. The same applies if you want provider diversity for your branch connectivity, as well as path and link-type diversity: that is, if you want each branch to have links from at least two different providers, e.g. one for MPLS and a different one for Internet. The in-net SD-WAN provider has to allow for (and potentially partner with) the other providers you want to use in order for you to fold in links from those other vendors. This sharply limits enterprise choice in the matter.
First and foremost in the business case most SD-WAN users will build is cost savings, and the main source of hard-dollar cost savings in SD-WAN is the substitution of lower-cost connectivity in place of more expensive kinds.
The organization might be looking for immediate savings. In that case, the goal will be to decrease absolute spending on connectivity. This can be accomplished by replacing MPLS or other relatively expensive connectivity (at least as reckoned on a cost-per-Mbps basis) in favor of a less expensive option: replacing some MPLS links with business Internet services, or even consumer-grade broadband.
Or, the organization might be looking for savings over a longer timeframe—looking to “bend the cost curve” for their WAN as they project current growth trends into the future. In this case, they may change little or nothing in their current use of MPLS, for example, but shift all growth to other media.
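The difference between the two goals can be made concrete with a rough projection. The sketch below is illustrative only: it assumes cost scales linearly with bandwidth, and the growth rate and the `growth_cost_ratio` parameter (the cost of a unit of added bandwidth relative to the existing media) are hypothetical inputs, not Nemertes figures.

```python
def project_costs(base_cost: float, growth_rate: float, years: int,
                  growth_cost_ratio: float) -> list[float]:
    """Yearly WAN cost when bandwidth demand grows by `growth_rate` per year.

    Existing capacity keeps costing `base_cost`. Added capacity is priced at
    `growth_cost_ratio` times the existing per-unit cost (1.0 means all growth
    stays on the current media; a smaller value means growth shifts to
    cheaper media).
    """
    costs = []
    for year in range(1, years + 1):
        # Cost of the added capacity if it were bought on the current media.
        added_capacity_cost = base_cost * ((1 + growth_rate) ** year - 1)
        costs.append(base_cost + added_capacity_cost * growth_cost_ratio)
    return costs


# Hypothetical: cost index of 100 today, 50% bandwidth growth per year.
status_quo = project_costs(100.0, 0.5, 2, growth_cost_ratio=1.0)
shifted = project_costs(100.0, 0.5, 2, growth_cost_ratio=0.25)

print(status_quo)  # [150.0, 225.0]
print(shifted)     # [112.5, 131.25]
```

Under these assumptions nothing changes for the existing MPLS spend, yet the curve flattens because only the growth is priced at the cheaper rate.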
Fully 78% of organizations deploying SD-WAN have no plan to completely drop MPLS from their WAN. However, most intend to reduce and restrict their use of it, if not immediately then over the next few years.
Speed has value in business. For the growing number of businesses adopting a “get closer to the customer” approach to their physical storefronts, that speed can be measured in part by how many days it takes to turn up a new branch. SD-WAN can radically alter that number. Most solutions allow free mixture of different kinds of connectivity. Consequently, a new location can be brought up with whatever form of connectivity is most readily available, be it cable or DSL or even 4G/LTE, and can come online in under a week, even within a day of receiving its endpoint equipment. Contrast that with the more typical 30 to 90 or more days to connect a new branch using traditional approaches.
Another form of agility that the SD-WAN approach lends itself to is rapid deployment of new WAN-based services. Centralized, policy-based management of the WAN as a whole allows rapid reconfiguration to support the addition of new services as well as changes in the prioritization of the application portfolio overall.
The business lines responsible for new branch operations can likely put a dollar value on every additional week or even day of operations for a new location. IT should be reaching out to them for that information in constructing the business case. Likewise, they will have put a value on the benefits of delivering the new services they are pursuing, and IT should reach out to get that information for any initiatives planned for the near term.
That rapid deployment and integration of new services is in turn the cornerstone of another level of value to consider in a business case: support for strategic innovations and especially Digital Transformation (DT) efforts. Many DT initiatives revolve around new uses of real-time communications to interact with customers and prospects. Others revolve around inserting new technologies into the environment that generate streams of data flowing back to the data center or out to the cloud, such as sensors, digital signage, and location tracking devices. In either case, the WAN becomes the channel by which DT data flows to and from branches, and SD-WAN provides the ability to swiftly add new flows to the mix without hurting performance for what is already there, as well as to easily meet new bandwidth demands using more affordable connectivity.
SD-WAN solutions can also contribute to the security of an organization. Although they make it possible to more easily send traffic directly to the Internet from the branch, avoiding backhauls through the data center, most build firewall functionality around that, and all allow for careful selection of which traffic is allowed to flow direct. For example, policy can allow traffic to and from Office 365 or Salesforce to go direct, while other web-bound traffic is not.
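A selective direct-to-Internet policy of this kind can be sketched as a simple allowlist check. The suffix list and the `egress_for` function are hypothetical, chosen only to mirror the Office 365 and Salesforce example; real products typically identify applications with richer signatures than hostnames.

```python
# Hypothetical allowlist of SaaS domain suffixes permitted to go direct.
DIRECT_SUFFIXES = ("office365.com", "salesforce.com")


def egress_for(hostname: str) -> str:
    """Return 'direct' for allowlisted SaaS hosts, otherwise 'backhaul'."""
    host = hostname.lower().rstrip(".")
    for suffix in DIRECT_SUFFIXES:
        # Match the domain itself or any subdomain, but not look-alikes
        # such as "notsalesforce.com".
        if host == suffix or host.endswith("." + suffix):
            return "direct"
    return "backhaul"


print(egress_for("outlook.office365.com"))  # direct
print(egress_for("example.com"))            # backhaul
```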
And, on another front, creating a holistically managed WAN using provider endpoints allows the organization to easily and reliably keep the endpoints current on all security-related updates and patches. Most organizations are reluctant to apply patches and updates to all their WAN routers too frequently, since they have to invest significant staff hours in pushing out patches branch by branch, and doing so usually involves an interruption in services. Too many organizations apply patches and updates only when they have no other option, rather than whenever one is available that will tighten up security. A system intended to allow no-down-time, comprehensive updating changes this dynamic entirely, and improves the overall security posture of the organization.
SD-WAN can be a key enabler of simplified global operations. SD-WAN can make it easier for the organization to spin up new branches anywhere they need to, globally, by delivering a consistent set of services while taking advantage of whatever local connectivity options are available. And, for new and existing branches both, securely delivering greater consistency and better performance to both in-house and cloud applications can boost productivity globally.
In-net SD-WAN can enjoy a particular advantage in this scenario by using an optimized backbone to deliver “middle-mile” optimizations independent of locale. Assuming a broad enough distribution of provider points of presence, this can eliminate most of the unpredictability of multicontinental Internet performance, a huge boon when the data centers (whether the enterprise’s own or its cloud providers’) are a world away from the branch.
The Nemertes model incorporates three key cost components of the WAN and of SD-WAN solutions: connectivity, capital, and operations. It is built to support multiple decision points in regards to each.
In assessing costs for any WAN architecture, circuit and service costs represent the lion’s share. And, as noted, the largest piece of cost savings from SD-WAN comes from changes in circuit and service costs. Whether overlay or in-net, a fundamental concept behind SD-WAN is to use any available network routes that deliver an application’s required quality of service; where big cheap Internet links are available, a lot of traffic will shift onto them off more expensive MPLS links, which can shrink or go away. This provides IT with a range of options for adding bandwidth, and lets network professionals take advantage of the full range of options to deliver their particular mix of services, site types, and use cases.
Depending on the organization and its applications, that may mean:
- Routing unified communications and other real-time traffic over MPLS while shifting other application traffic, file transfers, and other latency-insensitive applications to business or consumer Internet services (which cost up to 10 times less than comparable MPLS services)
- Routing all applications across MPLS where available, and using 4G wireless as backup or for overflow traffic
- Shifting all applications from MPLS to business or consumer Internet services to maximize cost savings, with two or more providers per branch both for resilience and to allow the solution to take advantage of whichever one of them provides the best performance for services the enterprise uses.
So at the core of our cost model is the “circuit costs” component, which includes all services that an enterprise has in the “before SD-WAN” state and those it will have after deploying SD-WAN, including:
- MPLS circuits: Traditional MPLS services with SLA and possibly multiple levels of QoS
- Business Internet: Internet services provided with an SLA and symmetrical service, i.e. the same bandwidth up to the Internet and down from it
- Consumer Internet: Consumer-grade Internet services (although also typically provided for smaller branch offices) which don’t have an SLA and may, if based on cable or DSL, be asymmetrical, with lower bandwidth for traffic going up to the Internet than for traffic coming down from it
- 4G or LTE wireless: Broadband wireless services usually used as initial connectivity in a new branch, or as backup or overflow capacity for an established branch with other connectivity available.
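As a rough illustration of how a “circuit costs” component might be computed from these service types, consider the sketch below. It is not the Nemertes model: the `MONTHLY_PRICE` table holds placeholder prices invented for the example, and the profile format (circuits per site by service type) is an assumption.

```python
# Placeholder monthly prices per circuit, in dollars. These are NOT Nemertes
# benchmark figures; substitute carrier quotes or your own paid costs.
MONTHLY_PRICE = {
    "mpls": 1000.0,
    "business_internet": 300.0,
    "consumer_internet": 100.0,
    "lte": 50.0,
}


def annual_circuit_cost(profile: dict[str, int], sites: int) -> float:
    """Annual connectivity cost for `sites` locations sharing one profile.

    `profile` maps a service type to the number of circuits of that type
    at each site.
    """
    per_site_monthly = sum(
        MONTHLY_PRICE[service] * count for service, count in profile.items()
    )
    return per_site_monthly * 12 * sites


# Hypothetical "before" and "after" profiles for 100 similar branches.
before = {"mpls": 1, "business_internet": 1}
after = {"business_internet": 1, "consumer_internet": 1, "lte": 1}

before_cost = annual_circuit_cost(before, sites=100)
after_cost = annual_circuit_cost(after, sites=100)
print(f"before: ${before_cost:,.0f}  after: ${after_cost:,.0f}")
# before: $1,560,000  after: $540,000
```

Keeping the price table separate from the site profiles makes it easy to rerun the same profiles against different price sources.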
Given how large, comparatively, the spend on connectivity is, with a long enough replacement cycle (five to seven years, although costs are usually amortized over three to five years) the cost of capital equipment can seem insignificant. Even as the branch stack has grown from just a router to include optimization and firewalls as well, this can still look true. That is, it can seem insignificant if you have easy access to capital funds. However, many organizations find capital funds increasingly pinched. That, coupled with an accelerating pace of technology change, makes a big upfront investment in a long replacement cycle untenable for now. So the impetus is to reduce capital spend by consolidating the stack into a single box, or to shift costs from capital to operating expenses.
SD-WAN appliances, especially the newest generation ones used by carriers and service providers in their in-net solutions, are intended to be able to replace routers and firewalls and some functions of WAN optimizers, whether via integral functions of a unified appliance, or, in the NFV scenario, via router, firewall, or optimization VNFs run alongside the core SD-WAN VNF.
In other words, an apples-to-apples before-and-after comparison of capital equipment might include:
Or many other combinations. The model accommodates selecting how many sites have a separate firewall before the transition, and how many after; likewise WAN optimizers. We bundle both software licensing costs and amortized hardware into a single line item.
Although they feel keenly the fact that they have too much to do and too little time in which to do it, network professionals usually don’t know exactly how much time they (and their teams) spend in troubleshooting and resolving WAN problems. That’s because teams typically wear multiple hats, and outages and issues occur relatively infrequently in most WANs. Over the course of a year, a network engineer might estimate she spends 75% of her time on upgrades and new installations; 10% of her time doing architecture and planning; and the remainder on troubleshooting. But unless the company she works for is exceptionally obsessive about time-tracking, there’s no way she knows this. And when sites do experience significant connectivity issues, solving the problem is paramount and time-tracking what goes into it is not; resolution pushes aside normal work and often involves after-hours and weekend work that is rarely tracked and accounted for accurately.
What we found in research for the cost model, as well as in the 2016 Cloud and Data Center Benchmark research, is that regardless of how much time network engineers invest in troubleshooting and problem resolution, that number decreased by roughly 90% with deployment of SD-WAN. That may seem counter-intuitive, given that with SD-WAN network architects are in theory putting less-reliable Internet links in the role of primary connectivity beside (or in place of) more reliable MPLS links. However, in practice, most use cases involve moving from single MPLS connections to pools consisting of MPLS-plus-Internet or multiple-Internet connections—and a consequence of moving to multiple connections with transparent failover is to reduce or eliminate the impact of any single link having problems. The SD-WAN technology happily reroutes traffic over the good link(s), and simply resumes using the link that went down as soon as it is back up.
When there’s a service outage with a single MPLS circuit, network engineers need to drop everything and deal with the outage until the site is back up. But when a circuit goes down and other circuits take its place, it’s not really an outage; it’s merely service degradation, and not an emergency. And given that such outages are usually temporary and self-correcting, often no action by IT is required.
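The distinction between an outage and mere degradation can be expressed as a small classification rule. The function and the status labels below are hypothetical, intended only to show why a second active link changes how urgently IT has to respond.

```python
def site_status(links_up: dict[str, bool]) -> str:
    """Classify a site from the up/down state of its WAN links."""
    up_count = sum(links_up.values())
    if up_count == 0:
        return "outage"      # nothing left to carry traffic: an emergency
    if up_count < len(links_up):
        return "degraded"    # traffic rerouted over the surviving link(s)
    return "healthy"


# Single-MPLS branch: losing the only link is a full outage.
print(site_status({"mpls": False}))                    # outage
# Dual-link SD-WAN branch: the same failure is only degradation.
print(site_status({"mpls": False, "internet": True}))  # degraded
```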
For a cost model to apply to any given environment, users need to be able to customize it to reflect their current environment and planned changes. This ability is key to conducting “what-if” analyses: determining which options make the most sense for a given deployment scenario.
To enable customization, Nemertes focused on a few key variables. (Please see Figure 2.) First and foremost: the WAN size (number of sites) and the percentage of the WAN converted to SD-WAN, because SD-WAN doesn’t have to be all or nothing. Users can input both, and see how the results change.
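The interaction of these two inputs can be sketched as a blended cost. The function below is an assumption about how such a calculation might look, not the tool's actual formula. The per-site figures in the example are hypothetical: they come from dividing this paper's 100-site example totals ($1.88M before, $1.33M after, discussed later) evenly across sites.

```python
def blended_annual_cost(sites: int, pct_converted: float,
                        legacy_cost_per_site: float,
                        sdwan_cost_per_site: float) -> float:
    """Annual connectivity cost when only part of the WAN moves to SD-WAN."""
    if not 0 <= pct_converted <= 100:
        raise ValueError("pct_converted must be between 0 and 100")
    converted = round(sites * pct_converted / 100)
    legacy = sites - converted
    return converted * sdwan_cost_per_site + legacy * legacy_cost_per_site


# Hypothetical per-site annual costs: $18,800 legacy, $13,300 with SD-WAN.
for pct in (0, 50, 100):
    print(pct, blended_annual_cost(100, pct, 18_800, 13_300))
# 0 1880000
# 50 1605000
# 100 1330000
```

Sweeping `pct_converted` this way is one simple form of the what-if analysis described above.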
Carrier Service Options
The next most important variable in the cost equation is, as noted above, the cost of connectivity services. This comprises multiple, separate variables: Which provider is delivering services, and which services—MPLS, business Internet, consumer Internet, and LTE—are in use, and at how many sites.
The model allows users to select “before” and “after” options for service types, and to define connectivity profiles for a few common branch scenarios (see below). The cost for those services will draw from one of three sources:
- Specific carrier costs. Network professionals who work with a specific carrier, or who are considering selecting that carrier, can select that provider’s costs for the options
- Specific enterprise costs. Network professionals who know their own costs for services can plug those in, and have the model compare configurations based on the actual costs paid for services
- Generic costs. Network professionals who don’t know their own costs and aren’t focusing on a specific carrier can leverage an average of benchmark and survey data collected by Nemertes. These are paid costs, not list prices, so they provide a realistic sense of actual market costs.
We also enable users to indicate before and after scenarios for capital equipment. These include:
- Router replacement. As indicated above, some solutions allow (and even encourage) router replacement. At least one may require it (i.e. for in-router SD-WAN requiring a new enough router to support it). Removing a branch router reduces capital, management, and maintenance costs
- Branch firewalls, pre- and post-transition. A significant appeal of SD-WAN is the ability to send cloud-bound traffic directly to the cloud rather than routing it back through a data center; deploying more Direct Internet Access (DIA) in branches means deploying more firewalls to secure those connection points. Some SD-WAN solutions provide strong firewall functionality, others don’t, and in some cases IT will want to deploy a standalone no matter what, as a matter of policy
- WAN optimizers, pre- and post-transition. Between increases in usable bandwidth (with consequent decrease in contention for capacity) and the ability of SD-WAN appliances to supply crucial WAN optimization functions such as prioritization and route optimization, enterprises often have no ongoing need for a separate optimization appliance in an SD-WAN site.
Although the type of SD-WAN appliance doesn’t affect the cost of a deployment dramatically, we let users select the SD-WAN appliances they are considering as part of the modeling. This is a particularly useful capability when it comes to comparing overlay SD-WAN (for which users must purchase their own SD-WAN appliances) with in-net SD-WAN (in which providers deliver, and manage, the appliance as part of the service).
Lastly, the Nemertes tool allows the user to describe the organization’s most common site types in terms of their current connectivity profile and the profile they would like to shift to via SD-WAN. (Please see Figure 3.) Site types can range from a large headquarters or data center to typical midsize branch offices to small branches or even kiosks or other unstaffed network sites (e.g. an ATM or a Red Box or similar network-connected vending machine).
The model’s goal is to determine not only whether SD-WAN can deliver cost benefits, but particularly what sort of SD-WAN is optimal: overlay or in-net.
As outputs, the model compares current costs with SD-WAN costs, modeling both an overlay and an in-net transition. (Please see Figure 4.)
This provides network professionals with the opportunity to gain two pieces of insight. First, how much (if any) will converting to SD-WAN save? And second, which type of SD-WAN—overlay or in-net—saves most?
Which solution generates greater savings depends on the transition scenarios envisioned. Currently, users will be most likely to see in-net SD-WAN generating greater savings in scenarios where MPLS connectivity is left intact and no consumer broadband is added to the mix. When consumer services come into play and MPLS use is scaled back, overlay usually takes the lead.
It is important, though, to keep in mind that the attraction of outsourcing a big part of SD-WAN management via an in-net solution may outweigh small differences in savings. Some organizations would think the prospect of saving 20% over current spending levels and offloading management more attractive than saving 30% and keeping it; offloading the work frees staff up to add value in other ways.
Most WAN-connected branches of significant importance have a primary link (typically MPLS) and a backup link (usually an IP-VPN running across an Internet link). Under normal circumstances, they use only the primary link. If, and only if, that primary link fails will they use the backup link, and they will use that only until service on the primary is restored. Usually, the failover between primary and secondary is slow enough to break all network sessions currently running to or from the branch, booting people out of conferences and hanging up voice or video calls, terminating sessions on core applications. In all too many cases, it will be manual and require WAN staff time to execute. The whole drama is replayed when the primary comes back up and services are moved back to it, unless the WAN staff wait until “after hours” to make the swap back—typically still penalizing staff with poorer WAN performance (and penalizing themselves with after-hours work).
The presence of unused backup links is one of the chief avenues by which SD-WAN solutions can provide value quickly. Using Nemertes’ SD-WAN TCO Tool to model various scenarios, it is easy to see that even someone making the most conservative choices about connectivity (e.g. keeping existing MPLS links in place and at current speeds, and using only business Internet) can, by making active/active use of existing IP-VPN links to double available bandwidth, avoid the big spending increases associated with big bandwidth increases. For example, consider a 100-site WAN spending $1.88M a year on MPLS and backup Internet. Doubling the speed to the branches results in a 35% cost increase, to $2.54M, using the conventional primary-plus-failover architecture. (Please see Figure 5.) Switching to hot/hot use of both original links via SD-WAN instead, doubling effective bandwidth without actually increasing link speeds, avoids that huge added cost.
Decreasing MPLS port speeds (but retaining MPLS as a core technology) and shifting some smaller locations off it entirely can easily decrease connectivity costs by nearly 30%, to $1.33M. (Please see Figure 6.) More radical (and consequently riskier) shifts off MPLS can drive significantly deeper savings.
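The arithmetic behind these two scenarios is easy to check. The snippet below uses only the figures quoted in the text for the 100-site example; the variable names are arbitrary.

```python
baseline = 1.88              # $M per year on MPLS plus backup Internet
conventional_growth = 0.35   # cost increase from doubling link speeds
reduced_mpls = 1.33          # $M per year after shrinking MPLS

doubled = baseline * (1 + conventional_growth)
avoided = doubled - baseline
savings_pct = (baseline - reduced_mpls) / baseline * 100

print(f"conventional cost after doubling: ${doubled:.2f}M")  # $2.54M
print(f"increase avoided via active/active: ${avoided:.2f}M")  # $0.66M
print(f"savings from reduced MPLS: {savings_pct:.1f}%")        # 29.3%
```

So the active/active scenario avoids roughly $0.66M a year in added spend, and the reduced-MPLS scenario cuts about 29.3% from the baseline, consistent with “nearly 30%.”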
In addition to providing lower cost for more connectivity for branches with dual links already, fully leveraging Internet links via SD-WAN gives many other branches something they never could afford before: resilience. Many small and midsize branches have only a single MPLS link and no backup, or a single Internet VPN link. For such branches, the cost of a second link useful only when the first failed was seen as unjustifiable when compared to the cost of downtime. But by fully exploiting a second Internet link as soon as it is available, SD-WAN makes investing in the second link part of a growth and performance strategy at the same time that it provides business continuity. SD-WAN lowers the barriers to investing in redundancy and improves enterprise uptime even further as a result.
And of course, when a branch has multiple active links and intelligence in how they are used, difficulties on any one link have less impact. Branches experience less downtime: about a 90% reduction, according to Nemertes’ 2016 Cloud and Data Center Benchmark data. This can represent enormous improvements in productivity for branches with poor connectivity currently. Such improvements, which most businesses acknowledge exist even though they have a hard time quantifying them, should be mentioned as ancillary benefits in any SD-WAN business case, even though they are generally not enough to drive approval of a deployment in and of themselves.
Similarly, an SD-WAN business case should mention IT time savings, as well. When link problems don’t have discernible impact on users, the urgency of troubleshooting the issues decreases. Given that most such problems are transitory, IT currently engages in a lot of troubleshooting on WAN issues that eventually just resolve themselves. By making most link issues non-events for the users and the business, as well as by providing intelligence on the exact nature and timing of the problems, SD-WAN can drive as much as 90% reduction in WAN troubleshooting time, according to 2016 Cloud and Data Center Benchmark data.
It’s important to track another “soft-cost” improvement of SD-WAN: business agility. For WANs, this aspect of “faster” boils down to one thing: branch lead time, the length of time it takes to light up a new network site. For MPLS networks, IT executives bemoan lengthening lead times, which for many of them have crept up from 30 to 60 days eight years ago to 90 to 120 now. By contrast they can often provision wired Internet service in a week or two; LTE, in a day or two. With business agility on many minds, this is no small improvement. You can’t build the business case on it, usually, but every business case should mention it. And, if there is an explicit corporate strategy built around a nimbler branch strategy, the business may have done the work of quantifying the value of each day shaved off the lead time for lighting up a new branch, and IT should lean heavily on that in building the SD-WAN business case.
SD-WAN combines active use of multiple branch links, intelligent direction of traffic across those links to provide better performance, security, and reliability, and centralized, policy-driven management of the WAN as a whole. It holds the promise of transforming IT’s relationship to the WAN by simplifying management of complex behaviors, promoting resilience and continuity of service, empowering more nimble branch strategies, and radically decreasing the cost of meeting rising bandwidth and performance needs. As always, IT has to build a compelling business case for making a transition like this, especially where an up-front investment will be required.
The base of the case must be cost, and, based on Nemertes’ SD-WAN cost model, savings should be easy to come by. The biggest cost component in the enterprise WAN is connectivity, and SD-WAN can drive major savings on connectivity in two ways: by preventing the major cost increases associated with major bandwidth increases, through making all links to a site usable simultaneously; and by allowing actual spending reductions, through substituting less-expensive Internet bandwidth for some or all of an enterprise’s more-expensive MPLS.
Note, though, that connectivity is not the only avenue by which SD-WAN can drive savings. By making redundant live links cheaper to deploy and making failover among links transparent to end users, SD-WAN can reduce both WAN outages and WAN troubleshooting costs by 90%.
IT staff should:
- Assess the amount of backup bandwidth you are paying for now—the links only available as failover connectivity in the event an MPLS link fails.
- Assess your demand curve for WAN and Internet bandwidth: determine how the connectivity profile for typical locations is likely to evolve in the next few years based on existing IT strategies and roadmaps for UC, collaboration, and other application or service rollouts.
- Model the cost of sticking with the current architecture, going out at least three years.
- Evaluate at least two SD-WAN solutions, overlay or service based, and model the cost of switching to them.
- If the SD-WAN numbers show significant potential savings over time, build a business case on them—but don’t leave out any other operational improvements you expect to realize.
- Look for quantification of the business value of agility in starting new branches; business units may have built a significant portion of the business case for you.