As a blackjack player, even if you’re playing “basic strategy” to maximize your odds on each hand, the house edge, while seemingly small, becomes significant when enough players play for an extended duration. A player may judge their success – or lack thereof – from moment to moment based on an (un)lucky shoe or dealer. But even though that player’s disadvantage is less than half a percent, the house edge still matters.
When we talk about scale, zeros matter. And when we think about live streaming at scale, we need to reframe our metrics. But first, we need to understand the consumption characteristics of live and the factors that make it unpredictable, both in nature and in practice.
As an industry, we’ve had many years of experience with video on demand (VOD), delivering billions of hours of premium content to over a billion viewers across a multitude of devices. At the NAB Streaming Summit, one of the panel topics focused on best practices for managing and operating live events: what can companies leverage from delivering VOD, and conversely, what do we need to rethink?
If we were to think about video consumption as cars, VOD behavior is a Toyota Camry. It has decent speed, acceleration, turning, and braking – it’s predictable. For one customer that provides a premium SVOD service for millions of subscribers across dozens of devices, we created a viewer consumption model. Using historical consumption and factoring in recent trend patterns and expectations related to new content drops, marketing, and other factors that could affect usage, we could predict video traffic with a reasonable degree of confidence. This approach was helpful for modeling costs and infrastructure provisioning day-to-day, month-to-month, and year-over-year.
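To make the idea concrete, here is a minimal sketch (in Python) of how a historical baseline might be adjusted by known demand drivers such as content drops and marketing pushes. The numbers, factor names, and function are entirely hypothetical – a real model would produce a per-hour curve with confidence intervals, not a single point estimate.

```python
# Minimal sketch of a VOD consumption forecast: a historical baseline
# adjusted by multiplicative factors for known demand drivers.
# All numbers and factor names here are hypothetical.

def forecast_concurrent_viewers(baseline, factors):
    """Scale a historical baseline by the product of known demand factors."""
    estimate = baseline
    for _name, multiplier in factors.items():
        estimate *= multiplier
    return round(estimate)

# Baseline: average concurrent viewers for this day-of-week/hour, from history.
baseline_viewers = 120_000

# Hypothetical adjustments: a new season drop, an active marketing campaign,
# and a mild seasonal dip.
demand_factors = {
    "new_content_drop": 1.35,
    "marketing_push":   1.10,
    "seasonality":      0.95,
}

print(forecast_concurrent_viewers(baseline_viewers, demand_factors))
```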
We could theoretically take this approach and apply it to live. However, we’re no longer dealing with a Toyota Camry.
Live is a Bugatti Chiron. But why? Live is fundamentally a transient, event- and moment-centric experience. Though linear simulcast of pre-recorded content – e.g., primetime entertainment dramas and comedies – will have more in common with traditional VOD, truly live content – e.g., sports, news – can turn on a dime.
With live streaming, concurrent viewers can grow or shrink by thousands – or tens of thousands – in a matter of minutes. For example:
- Viewers start watching an Olympic event, but within a minute, the athlete falls and is no longer in contention for a medal. Consequently, viewers are no longer interested in watching the remainder of the event and – with a click, swipe, or a whispered “Alexa…” – turn their digital eyeballs elsewhere.
- On the other hand, if you’re a fan of the Pats or Crimson Tide, you may have given up on them during the previous Super Bowl or CFP title game at halftime. But what happens when you get texts from friends that there’s a hint of a comeback?
- Or, if you’re a baseball fan and it’s 10-0 in the sixth, you’re probably not interested in the game if you’re not a Giants or ‘Stros fan or if there aren’t any players from your fantasy lineup. But when it’s the ninth inning and news breaks that Matt Cain has a perfect game on the line, the score is meaningless; this is about history.
Given the demand variance of live streaming consumption, there are considerable limitations when relying on legacy data centers and independently provisioned hardware. From both cost and operational perspectives, companies need to utilize a scalable service that can react to the ebb and flow of live streaming. Cloud infrastructure – whether public or private – becomes a requirement.
The fundamental value proposition of public cloud infrastructure has long been a cost-effective means of scaling resources against demand. But with live, if you correlate infrastructure allocation directly with demand, by the time you react to a spike in audience and the consequent demand for resources, spinning up more instances may take minutes – which is too late. And audiences can shrink just as quickly as they grow, leaving provisioned resources underutilized.
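To make that lag concrete, here is a toy simulation of a purely reactive scaling policy. The spike shape, per-instance capacity, and boot time are all assumed numbers for illustration only; the point is simply that capacity added in response to observed demand arrives minutes after the viewers do.

```python
# Toy simulation of reactive autoscaling during a live spike.
# Assumptions (all hypothetical): each instance serves 10,000 viewers,
# and a new instance takes 3 minutes from scaling decision to serving traffic.

INSTANCE_CAPACITY = 10_000
BOOT_MINUTES = 3

# Concurrent viewers per minute: a fast ramp, then a plateau.
demand = [20_000, 40_000, 80_000, 160_000, 200_000, 200_000, 180_000]

running = 2        # instances already serving traffic
pending = []       # (ready_at_minute, count) for instances still booting

for minute, viewers in enumerate(demand):
    # Instances that were requested earlier finish booting now.
    running += sum(count for ready_at, count in pending if ready_at == minute)
    pending = [(r, c) for r, c in pending if r != minute]

    capacity = running * INSTANCE_CAPACITY
    shortfall = max(0, viewers - capacity)

    # Reactive policy: only after observing a shortfall do we request more.
    if shortfall:
        needed = -(-shortfall // INSTANCE_CAPACITY)  # ceiling division
        pending.append((minute + BOOT_MINUTES, needed))

    print(f"min {minute}: demand={viewers:>7} capacity={capacity:>7} "
          f"unserved={shortfall:>6}")
```

Running it shows unserved demand piling up for several minutes before the newly requested instances come online – exactly the window in which viewers experience buffering or failures.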
There is as much art as there is science in deciding when to scale – or pre-scale – resources and when to terminate them. From an architecture and cost perspective, allocation should take into account resource types (e.g., in the context of AWS, on-demand vs. reserved vs. spot), regional resource capabilities (e.g., public cloud infrastructure regions, capacity of a single or multiple CDNs, etc.), and most importantly, the content itself. There are many ways to model resource allocation and address the bias-variance tradeoff, but we should not ignore the value of having eyes on glass watching the event and informing the resource allocation model as to the potential shape of the demand traffic. No one could predict Spieth’s 2016 Masters meltdown, but the combination of informed data – via machine learning or similar analysis – and eyes on glass – with a sprinkle of human intuition – is a reasonable compromise.
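As one illustration of that compromise, the sketch below blends a model forecast with an operator-supplied adjustment and a headroom margin, then splits the resulting capacity across reserved, on-demand, and spot-style resource tiers. Every number, name, and split ratio here is hypothetical; it is not a prescription for any particular cloud or pricing model.

```python
# Sketch of a pre-scaling decision: blend a model forecast with an
# operator ("eyes on glass") adjustment, add headroom, then split the
# capacity across resource tiers. All values are hypothetical.

INSTANCE_CAPACITY = 10_000   # viewers per instance (assumed)
HEADROOM = 1.25              # provision 25% above the expected peak

def plan_capacity(forecast_peak, operator_multiplier, reserved_instances):
    """Return an instance plan for an upcoming live window."""
    expected_peak = forecast_peak * operator_multiplier
    target = int(expected_peak * HEADROOM // INSTANCE_CAPACITY) + 1

    plan = {"reserved": min(target, reserved_instances)}
    remaining = target - plan["reserved"]
    # Cover the predictable bulk with on-demand capacity and keep a
    # spot-style buffer for the tail we are least certain about.
    spot_buffer = int(remaining * 0.2)
    plan["on_demand"] = remaining - spot_buffer
    plan["spot"] = spot_buffer
    return plan

# The model says ~300k peak; the operator watching the event senses a
# comeback narrative building and nudges the forecast up by 20%.
print(plan_capacity(forecast_peak=300_000,
                    operator_multiplier=1.2,
                    reserved_instances=20))
```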
None of us has a crystal ball, but we need to look at the holistic end-to-end workflow for live and understand the balance between user consumption behavior and the effort and cost to deliver a solution that scales accordingly.
In our next post, we will discuss how Quality of Service (QoS) plays into the need for proper instrumentation and telemetry to understand the overall health of the workflow and how to plan for when things don’t go according to plan.