# K8s in 2020: The Storage Admin Is Dead, Long Live the Kubernetes SRE
The past couple of decades will likely go down in history as transformative years for Infrastructure Engineering and its operators. Those operators have come a long way, from the humble beginnings of the baby boomer System Administrators with long beards to the highly curious, energetic, and enthusiastic millennial Kubernetes SREs of today.
Kubernetes SREs, otherwise known as platform engineers, have gained broad acceptance in almost all enterprises. These engineers are tasked with answering a couple of important questions, as Kelsey Hightower puts it:
Over the past few years, Kubernetes has changed the way platform engineers implement solutions to achieve the above - from building in-house tools to using a standardized set of open source tools like Kubernetes and its add-ons for networking, security, storage, and so on. In this post, we'll explore how the role of storage administrators is being replaced by Kubernetes SREs and the factors that have influenced this change.
Looking back, some of the notable developments that led to this transformation could be summarized as follows:
For the world of data management, which saw a 16x or 32x increase in available storage capacity on the individual nodes, software-defined storage along with hyper-converged infrastructure led many to forecast the death of storage administrators. While most advancements and futuristic ideas tend to come with doomsday predictions and often self-fulfilling prophecies, the predicted death didn’t really arrive until Kubernetes itself expanded into data management. Of course, there is still a lot of ground to cover - but the role of Storage Administrators has clearly changed.
At first, Kubernetes didn’t embrace data with open arms. In fact, the early entrants (Startups) that tried to add data to their containers and Kubernetes were too early and unfortunately had to shut down.
Like a stateful workload itself, which is persistent, startups also had to be persistent to fend off the forces that tried to keep stateful workloads out of Kubernetes. It took efforts from several storage companies to fight the marketing buzz around stateless workloads and functions as a service and to establish the fact that there really is no such thing as stateless. An app’s state must be stored somewhere. If it is not stored in Kubernetes, it ends up in a cloud service behind proprietary APIs, which causes lock-in and data gravity and forces teams to use those proprietary APIs to move data into and out of the environment.
However, with Kubernetes itself being such a great product with a vibrant community, it was not easy to keep statefulness at bay. The last two years have been very promising for Storage Startups such as MayaData, Upbound, and Heptio, all of which are working to ease the data management challenges in Kubernetes with Open Source projects. Over the last couple of years, Kubernetes has expanded into running Data Pipelines, CI/CD, Observability Tools (Grafana, Prometheus, Elastic, ...), and so forth.
This shift can be largely attributed to the efforts of many community members from different organizations. Significant enhancements to Kubernetes Data Layer in 2019 include:
Today’s Dev Teams are required to operate with greater agility in delivering applications. The greatest bottlenecks hindering agility typically revolve around people and processes surrounding the infrastructure. Using an expensive infrastructure meant asking for permissions, planning in advance for the budgeting season, planning for types of workloads to avoid noisy neighbor issues on shared infrastructure, etc. Now, even a developer laptop can be turned into a production-grade Kubernetes environment with storage included, thanks to virtualization, Vagrant, Docker, and KinD.
Teams that have embraced DevOps typically find that using Hyper-Converged Infrastructure also leads to hyper-converged operations, giving developers exactly what they need, when they need it. Developers can now focus on what they do best, working on the business logic, while IT operations/DevOps teams can focus on what they do best.
Dev Teams can now provision a decent Kubernetes cluster with persistent storage using tools like OpenEBS without waiting for a Storage Administrator to provision, plan and permit storage infrastructure.
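As a rough illustration of what self-service storage can look like, here is a minimal Python sketch that builds a PersistentVolumeClaim manifest. The helper name `build_pvc`, the claim name, and the size are hypothetical, and the default storage class `openebs-hostpath` is an assumption about how OpenEBS is installed on your cluster; substitute whatever storage class your platform team exposes.

```python
import json


def build_pvc(name, size_gi, storage_class="openebs-hostpath"):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    The storage class default is an assumption; use the class
    that actually exists in your cluster.
    """
    if size_gi <= 0:
        raise ValueError("size_gi must be positive")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl accepts alongside YAML.
    print(json.dumps(build_pvc("demo-data", 5), indent=2))
```

If this is saved as a script, the printed JSON could be piped to `kubectl apply -f -`, so a developer requests storage by stating intent (name, size, class) rather than filing a ticket.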
It is common these days to have a single SRE managing up to 50 developers and dozens of services by relying on one of the Cloud Providers. In fact, we are a big GKE shop ourselves!
Just as the roles of Developers, QA, and Ops have blended to give rise to high-velocity, functional/domain-based small teams, the role of storage administrators has merged with that of the Platform Engineers who are building fully automated, YAML/intent-driven data infrastructure.
Data teams have become largely disaggregated or distributed, resulting in silos. This mirrors microservice patterns, applied to people. Enterprises are internalizing the concepts of Data as a Product and Data Mesh/Data Ops! The typical team structure now looks as follows, as wonderfully articulated in Zhamak Dehghani’s article on Data Mesh. This article is a must-read for anyone looking to achieve Cloud-Native Data Management.
The ideal platform engineers develop a platform where developers can self-provision a source repository for their application. The self-service portal will go ahead and set up the required Docker and Kubernetes manifests and even enable CI/CD, all with just a few inputs.
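To make the "few inputs" idea concrete, the sketch below shows one way a portal backend might turn an app name, image, and port into Kubernetes manifests. The function `render_manifests` and the example values are hypothetical, not part of any particular portal; a real platform would likely add a Dockerfile template, CI/CD configuration, and policy checks on top.

```python
import json


def render_manifests(app, image, port, replicas=2):
    """Build a Deployment and a Service for one app from a few inputs."""
    labels = {"app": app}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": app,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": app},
        "spec": {
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return [deployment, service]


if __name__ == "__main__":
    # Wrap both objects in a v1 List so they can be applied together.
    bundle = {
        "apiVersion": "v1",
        "kind": "List",
        "items": render_manifests("orders", "example.com/orders:1.0", 8080),
    }
    print(json.dumps(bundle, indent=2))
```

Keeping the generator a pure function of its inputs makes it easy to unit test and to review the manifests it emits before anything reaches a cluster.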
It is evident that small enterprises really don’t need Storage Administrators anymore, and they can opt to either use a managed Kubernetes cluster from one of the several cloud providers or even run their own.
However, as a business, its applications, and its development teams scale, the isolation can introduce chaos and make it harder for developers to move across teams. Platform Engineers, aka Kubernetes SREs, must then come to the rescue.
Platform Engineers help restore calm from chaos by setting up self-service portals for Dev teams to provision their own clusters or use a namespace within a cluster. Some platform teams have also extended these capabilities to setting up code repositories with Docker and Kubernetes manifests and integrating them into CI/CD pipelines that also include Chaos Engineering.
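For the namespace-per-team option, a portal might hand out a namespace together with a quota so that one team cannot starve its neighbors. The following Python sketch assumes a hypothetical helper `team_namespace` and made-up default limits; the right quota values depend entirely on your cluster and workloads.

```python
import json


def team_namespace(team, cpu="4", memory="8Gi", storage="50Gi"):
    """Return a Namespace and a ResourceQuota scoped to it.

    The default limits are illustrative assumptions only.
    """
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team, "labels": {"team": team}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{team}-quota", "namespace": team},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                # Caps the total storage requested by all PVCs in the namespace.
                "requests.storage": storage,
            }
        },
    }
    return [namespace, quota]


if __name__ == "__main__":
    for manifest in team_namespace("payments"):
        print(json.dumps(manifest, indent=2))
```

The `requests.storage` entry is where the old storage-admin duty of capacity planning shows up in the SRE's toolbox: it becomes a declarative limit rather than a manual approval step.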
The role of the Kubernetes SREs has evolved, and they must now embrace responsibilities that were typically handled by the very storage administrators everyone declares extinct! These responsibilities include:
The Kubernetes SREs will also have to work with the overall community to enable their enterprise to adopt best-in-class solutions that will ensure they remain competitive as the IT sector moves towards ARM-based Infrastructures, 5G Implementations, and proliferation of the Edge!
A final, but still important, aspect that Kubernetes SREs will be tasked with is ensuring their platforms are environmentally friendly. This could range from using more optimized deployments to migrating to infrastructures or cloud providers that have implemented next-generation sustainable data centers.
2020 looks bright and promising for Kubernetes SREs. Personally, I do everything I can to keep up with developments in data management and Kubernetes via the following resources:
Open Source Project communities
Conferences
Blogs/Links
Podcasts/Webinars
Another great way to engage with the community is through your local CNCF meetups or related Data Management meetups like Data-Driven and Data Council.
Do you know of another resource that is not listed above? Please share it with me via comments or on Twitter.