Over the years, we’ve worked with many partners who first went down the ‘build’ path to develop a real-time streaming solution for their Unreal Engine or Unity-based experiences, before ultimately working with us instead. Unreal and Unity both provide solid documentation that makes it comparatively easy to get your game streaming locally. Epic even provides documentation for setting up an instance in AWS so you can stream from the cloud. Achieving these quick wins can give the impression that going from that single AWS instance to running at scale is relatively straightforward. However, neither Epic nor Unity provides documentation for streaming at scale, and that’s not surprising considering what’s involved.
In this post I’ll walk through just a few of the technical considerations we’ve had to work through over time while building our platform for streaming experiences in real time at scale. Even if you ultimately decide that the ‘build’ path is the right one for you, I hope this provides a bit of a roadmap to your destination.
In the interest of focus, I’ll be examining a VM-based Unreal Pixel Streaming solution built in AWS. While this represents only one provider type in the overall PureWeb platform, it’s the easiest place to start your own ‘build’ journey.
The small considerations
Even before we get to global scale distribution, there are several aspects to consider in ensuring the reliable functioning of streaming with even a few instances. Let’s start with some of these.
Stop and go
Oddly, this is something that often gets taken for granted, but any scalable streaming solution needs a mechanism for game lifecycle management: something that starts your game when a stream is requested, stops it when the stream ends, and cleans up afterwards so the next user gets a fresh instance.
This is important from both a usability and a security perspective. Most real-time experiences are built with an introduction or preamble, which can include welcome messaging or other instructions. Without a way to start a fresh copy of your experience for every user, you’re going to be reusing streams for different users. Beyond being jarring for your users, this also presents security issues if users are expected to log in or provide other inputs into the game.
On the flip side, one of the most frustrating problems any builder of a streaming solution will face is knowing when a user is really done with a stream and it’s safe to shut the game down. Several variables feed into this decision, and none of them are 100% reliable. End the session too early? You’ve just given your user a bad experience. End it too late? You’ve just burnt (potentially a lot of) cloud compute time. At PureWeb, our lifecycle solution was arrived at through years of experimentation and optimization, and we’re always looking for ways to improve it further.
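A minimal sketch of what such a lifecycle manager might look like, using an idle timeout as the (imperfect) end-of-session signal. The game command, the timeout value, and the idea that your signalling layer calls `heartbeat()` on user activity are all illustrative assumptions, not how any particular platform does it:

```python
import subprocess
import threading

class GameSession:
    """Start a fresh game process per stream request, keep it alive while
    heartbeats arrive, and tear it down after an idle timeout."""

    def __init__(self, command, idle_timeout=60.0):
        self.command = command            # e.g. ["MyGame.exe", "-PixelStreamingURL=..."]
        self.idle_timeout = idle_timeout  # seconds without a heartbeat before teardown
        self.process = None
        self._timer = None
        self._lock = threading.Lock()

    def start(self):
        # Launch a fresh copy of the game for this user's session.
        self.process = subprocess.Popen(self.command)
        self.heartbeat()

    def heartbeat(self):
        # Called whenever the signalling layer sees activity from the user;
        # postpones the idle shutdown by restarting the timer.
        with self._lock:
            if self._timer:
                self._timer.cancel()
            self._timer = threading.Timer(self.idle_timeout, self.stop)
            self._timer.start()

    def stop(self):
        # Kill the game and clean up so the next user never sees a reused stream.
        with self._lock:
            if self._timer:
                self._timer.cancel()
        if self.process and self.process.poll() is None:
            self.process.terminate()
            self.process.wait(timeout=10)
```

Real solutions layer several signals on top of this (WebRTC disconnects, input activity, explicit end-session events), precisely because no single one is reliable.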
I still remember the satisfaction I felt when our team built the very first incarnation of our VM scaling system years ago. You’d think this would be fairly straightforward, but the trick here is that auto-scaling groups only get you part of the way.
Nearly all modern cloud services are built around a model of stateless workloads: workloads like web requests and database queries, where it doesn’t matter which infrastructure host handles which request. Virtually all auto-scaling mechanisms are built around this type of workload. You configure your auto-scaler to watch host utilization for a metric it knows about, like memory or CPU, and scale up when it hits a certain threshold. This, however, doesn’t work for real-time workloads.
Unreal Engine and Unity games can use wildly different levels of CPU and memory at runtime. Just because a user is in a level or scene that requires a bit more compute doesn’t mean we should be turning on another server. This is where you’ll have to build a stateful scaling system that talks in the units we care about – streams – and wire it up to your auto-scaler, so that servers are turned on based on how many streams are currently running on your infrastructure. It’s also important to build this system so that when scaling down, you don’t inadvertently boot active users off by terminating servers that are in use.
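As a sketch, the two halves of that decision might look like the following. The streams-per-server and warm-spare knobs are illustrative tuning parameters, not a recommendation:

```python
import math

def desired_capacity(active_streams: int, streams_per_server: int, warm_spares: int) -> int:
    """Scale on the unit we actually care about -- streams -- rather than
    CPU or memory. Keep a few warm spares so new users don't wait on boot."""
    needed = math.ceil(active_streams / streams_per_server)
    return needed + warm_spares

def safe_to_terminate(server_streams: dict) -> list:
    """Only ever scale down servers with zero active streams, so scaling
    in never boots an active user off their session.
    server_streams maps server id -> number of streams running on it."""
    return [sid for sid, count in server_streams.items() if count == 0]
```

The output of `desired_capacity` is what you feed to the cloud auto-scaler, and `safe_to_terminate` is the guard rail that keeps scale-in from touching in-use hosts.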
The large considerations
While we have just scratched the surface of small-scale considerations, let’s get into some of the key things to think about when taking your small-scale solution and making it global.
So you’ve got a stable regional system. It scales up and down, and you can provide users a simple URL to access streams from that region. Is it just a matter of ctrl-c / ctrl-v to replicate your infrastructure to other regions? Unfortunately not.
As of this writing, AWS operates 31 global regions (the PureWeb platform covers most of them). The main challenge when you replicate your infrastructure across regions is routing users to your experience. If you know exactly where your users are, you can possibly side-step the routing question by simply providing different URLs to different groups of users (e.g. European users use link A, North American users use link B). However, in our experience this is rarely the case. So you’ll need one link for all users, which means you need to know where your users are in the world, what their network latency is to each of your regional deployments, and then route their sessions accordingly.
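A minimal sketch of the latency-measurement side, assuming each region exposes some lightweight endpoint the client can round-trip against (the probe mechanism, e.g. a small HTTPS GET to a ping route, is an assumption here and is passed in as a callable):

```python
import time

def probe(endpoint, request_fn, samples=3):
    """Take a few round-trip samples against a regional endpoint and keep
    the best one; the minimum is the least noisy latency estimate.
    request_fn performs one round trip to the given endpoint."""
    best = float("inf")
    for _ in range(samples):
        start = time.perf_counter()
        request_fn(endpoint)
        best = min(best, time.perf_counter() - start)
    return best

def closest_region(endpoints, request_fn):
    """Return the name of the region whose endpoint answered fastest.
    endpoints maps region name -> whatever request_fn needs to reach it."""
    return min(endpoints, key=lambda region: probe(endpoints[region], request_fn))
```

In practice you would run these probes just-in-time from the user’s browser, since geography is only a rough proxy for actual network latency.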
Now, hopefully you’ve already built a queuing system into your regional infrastructure; otherwise, you either have to over-provision your regions or just hope there is free compute in the region you just routed your user to (otherwise they’re going to get an error instead of a stream).
When we set out to solve this problem, we wanted to ensure that users on the PureWeb platform would always get the best possible stream with the smallest time (usually zero) spent in the queue. To address that we’ve built a global routing system that not only takes just-in-time latency measurements for every end user to our various streaming providers, but we also feed model availability, cluster utilization, and queue-length data back to the platform so we can make the best possible decision about routing users.
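A toy version of such a routing decision might look like this. The weights and the queue cutoff are made up for illustration and are not the actual PureWeb scoring model:

```python
MAX_QUEUE = 10  # illustrative cutoff: beyond this, don't send more users

def route_user(regions):
    """Pick the best region for a user. regions maps region name to a dict
    of just-in-time measurements: latency_ms (from the user's probes),
    queue_length, and utilization (0.0 - 1.0)."""
    def score(name):
        m = regions[name]
        if m["queue_length"] > MAX_QUEUE:
            return float("inf")  # don't route somewhere the user will be stuck
        # Lower is better: latency dominates; queueing and load break ties.
        return m["latency_ms"] + 50 * m["queue_length"] + 20 * m["utilization"]
    return min(regions, key=score)
```

The key point is that latency alone isn’t enough: a slightly more distant region with free capacity usually beats the nearest region with a long queue.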
Think big, like really big
If you ever watch conference keynotes from the big cloud providers, you’ll hear them claim their clouds have more compute than you’ll ever need. If you’ve ever built large-scale infrastructure on different cloud providers, you’ll know this is more an aspirational claim than a reality.
We’ve been building streaming solutions on AWS since 2013 when they introduced the first G2 instances. Since then, AWS has tripled the number of regions, and massively increased the quantity and quality of GPUs available in those regions. In spite of that, over the years, our partners and customers have regularly tested the pan-regional limits of AWS’s GPU compute capacity within our platform.
Given that many of our partners, and perhaps even you, are looking to push their streaming experiences out to thousands, or even tens of thousands, of concurrent users, we knew we had to expand beyond AWS, because supporting very high-scale experiences is often simply not possible on a single cloud. That’s why in 2021 we undertook a major architectural overhaul to make it far easier to leverage compute from other cloud providers. The first additional provider we expanded to was CoreWeave.
Today, we’ve built up fantastic partnerships (and friendships) with the teams at AWS and CoreWeave, and our platform has evolved to seamlessly dispatch load between these clouds in a way that’s completely transparent to our users. As we look to the future, we will certainly expand to additional clouds, as we’re constantly being pushed to achieve higher levels of scale.
Two more for the road
Before you set out on your journey, here are a couple final things that are important to keep in mind.
One of the more persistent challenges partners face when building out their real-time content, in parallel with building a platform for distributing that content, is ensuring that their latest content changes deploy continuously to all the various servers set up around the globe.
In my experience, there is a maturity path when it comes to getting your game content onto your infrastructure. In the early stages, teams often start by baking their content into their VM images (e.g. AMIs). This can lead to a pretty quick win, as you’ll be able to rapidly copy your image to different regions and start scaling away. It’s great in the early days, but as the pace of change in your content increases, it can very rapidly feel like you’re wading through quicksand, as the build / bake / copy process can take hours.
The next step down the path usually involves some sort of dynamic download of game content when an instance starts up. This is a big jump in ease of use: as long as you deploy your latest game build to a known, stable location, all it takes to roll out a change is to cycle your servers. Not great for availability, but definitely an improvement in deployability.
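A sketch of that startup download step, with an integrity check so a half-written build never gets launched. The manifest shape, the single-file build, and the injected fetch functions (stand-ins for S3 or whatever transfer mechanism you use) are all illustrative assumptions:

```python
import hashlib
import json
from pathlib import Path

def sync_game_build(local_dir, fetch_manifest, fetch_build):
    """On instance start-up, compare the locally cached build against the
    latest manifest in a known, stable location, and only download when
    the version changed. Returns True if a download happened."""
    local_dir = Path(local_dir)
    manifest = fetch_manifest()  # e.g. {"version": "1.4.2", "sha256": "..."}
    stamp = local_dir / "manifest.json"
    if stamp.exists() and json.loads(stamp.read_text())["version"] == manifest["version"]:
        return False  # already current, skip the download

    build = fetch_build(manifest["version"])
    # Verify the download before launching anything from it.
    if hashlib.sha256(build).hexdigest() != manifest["sha256"]:
        raise RuntimeError("build download failed integrity check")
    local_dir.mkdir(parents=True, exist_ok=True)
    (local_dir / "game.pak").write_bytes(build)  # illustrative single-file build
    stamp.write_text(json.dumps(manifest))
    return True
```

Run this from the instance’s startup hook, and cycling a server is all it takes to pick up the latest build.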
The next big jump comes when you can dynamically deploy and update your game files on your infrastructure in real time, without impacting your active users. This can be further simplified if your storage target is a network attached disk like AWS EFS. A lot of how this is done depends on what tools you’re using for your CI/CD system and the capabilities of your lifecycle management service (see “Stop and go” above).
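One common way to get that last property, sketched below, is to install each build into its own versioned directory and atomically repoint a ‘current’ symlink: new sessions launch from the new version, while in-flight sessions keep reading the old files until they end. This assumes a POSIX filesystem with atomic rename, and isn’t how any particular CI/CD tool does it:

```python
import os
from pathlib import Path

def deploy_version(releases_dir, current_link, version, install_fn):
    """Install a new build next to the old one, then atomically swap the
    'current' symlink. Active sessions holding open files in the old
    version's directory are untouched; only new launches see the change."""
    target = Path(releases_dir) / version
    install_fn(target)  # write the new build into its own directory

    # Build the new link under a temp name, then rename over the old one;
    # rename over an existing entry is atomic on POSIX filesystems.
    tmp = Path(str(current_link) + ".tmp")
    if tmp.exists() or tmp.is_symlink():
        tmp.unlink()
    tmp.symlink_to(target)
    os.replace(tmp, current_link)
```

Your lifecycle service then simply launches games from the `current` path, and old version directories can be garbage-collected once their last session ends.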
With the PureWeb platform, we handle all content deployments automatically across all our underlying infrastructure providers. This is made even simpler with our CLI, which can be wired into whatever system you use for building your Unreal Engine or Unity games, making it effortless to go from changing your game to seeing that change live on your PureWeb deployment.
Secure all the things
I want to avoid sowing FUD when it comes to information security; it’s sufficient to say it’s both very important and very hard to get right. Between knowing your attack surfaces and threat actors, identifying and addressing system vulnerabilities, and continually monitoring your infrastructure, cloud configurations, and data, it can get overwhelming.
If you’re trying to improve your security posture in AWS, you can get decent mileage out of built-in tools like AWS Trusted Advisor, Inspector, and GuardDuty. GCP and Azure have similar tools for their clouds. The real trick, however, is to have a culture of shifting left on security and to build automation that ensures that when issues are identified, they’re addressed rapidly and with minimal interruption to the rest of your development efforts.
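As a tiny illustration of shifting left, a check like the following can run in CI against your infrastructure-as-code output and fail the build before a misconfiguration ever reaches the cloud. The input shape and the “only 443 may face the internet” policy are simplifying assumptions, not a real rule set:

```python
def find_open_ingress(security_groups):
    """Flag ingress rules open to the whole internet on anything other
    than HTTPS. security_groups is a simplified, pre-parsed form of what
    a Terraform plan or cloud config export would give you."""
    findings = []
    for sg in security_groups:
        for rule in sg.get("ingress", []):
            if rule.get("cidr") == "0.0.0.0/0" and rule.get("port") != 443:
                findings.append((sg["name"], rule["port"]))
    return findings
```

Wiring a check like this into the pipeline means the fix happens at review time, with minimal interruption, rather than after an incident.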
Given our roots in remote imaging for healthcare applications, I feel very proud of our culture of security. In the early days of PureWeb Reality, we saw a lot of success based on the merits of our early investments in platform security. This year, we’re getting our SOC 2 audit, which will provide an industry-standard assurance that we take the security of customer data very seriously.
If you’ve stuck with this post to this point, congratulations, achievement unlocked. While I’ve only touched on a handful of the considerations and learning opportunities we’ve experienced over the years of building our platform, I hope you’ve found it useful and have a better sense of what the ‘build’ path might look like.
But if you’re thinking that the ‘build’ path might not be right for you, please reach out. We’re always excited to meet new partners who are trying to create amazing immersive experiences they want to distribute.
Get in touch here.