June 27, 2017

Death of the SysAdmin – 3Pillar CTO Jonathan Rivers’ Talk at ‘Rise of the DevOps’

3Pillar CTO Jonathan Rivers spoke at the recent 'Rise of the DevOps' conference in Cluj-Napoca, Romania about the impending "Death of the SysAdmin."

In the presentation, Jonathan talks about why DevOps is near and dear to his heart and then ventures into why the SysAdmin profession is dying out. He lays out his career journey, from the world of tech support for Windows to becoming a CTO. Jonathan’s presentation outlines what DevOps is and what essential skills are required for pursuing DevOps as a career choice. At the end of the presentation, a brief Q&A session was held.

Jonathan Rivers is the Chief Technology Officer at 3Pillar Global. In this role, he leads 3Pillar’s Product and Engineering organizations. This includes more than 600 software engineers, product consultants, product managers, quality assurance, and user experience professionals. He has 20+ years in System Administration, Data Center Operations, Cloud Infrastructure, Orchestration, and Automation.

You can watch the full video of Jonathan’s presentation via the Vimeo embed below.

About Rise of the DevOps

'Rise of the DevOps' was an event aimed at the growth of the DevOps community in Romania. This event was hosted by 3Pillar at the Impact Hub in Cluj-Napoca on June 10, and another 'Rise of the DevOps' event was held in Timisoara on June 17. Videos of each presentation are being published here on the 3Pillar site, so stay tuned for more! 

Read the Transcript

If you'd rather read the full transcript than watch the video, you can find a transcript of Jonathan's talk below.

Jonathan: Thank you all for coming today. I’m going to try and do this two-handed with a microphone and a clicker. So bear with me. I’m Jonathan Rivers, Chief Technology Officer of 3Pillar Global, and I’d like to thank all of you guys for coming out today. We’ve got really great speakers coming up after me. I’m really excited. I have the distinct pleasure of getting to work for all of them. They are really at the top of their game, building very innovative products for us and really pushing the boundaries of DevOps, which is something sort of near and dear to my heart.

When we talk about DevOps and the rise of DevOps, one of the things that I’ve become very, very concerned about as a CTO and somebody who builds web scale products is the fact that the sys admin profession is dying out. There’s still some of us around, but unfortunately our beards are getting gray and/or we’re moving into management and don’t actually get to work for a living anymore. So it’s a problem. We need new ranks of people coming forward to take up that mantle. So talking about sys admins and DevOps and why all of this matters, I wanted to talk briefly about my career history because I think it might frame part of where the problem was.

So my career is largely divided into two parts in my mind. I have the early phase of my career where I was honing my craft, and the second part of my career where I got to actually work a little less and a little less and then just talk at events like this a little bit more. So I started my career a long time ago doing tech support. So phone technical support for Compaq Computers. That probably tells you how long ago that was, actually answering phone calls, telling people how to use Windows 95. If you ever want to learn the hardest job on the planet, it’s trying to walk somebody through doing an operation where you can’t actually see their screen. It requires you to memorize everything and do that, but I started in the Windows world. I started really in an IT mindset in the Windows world, really not sort of on the networking side. I moved from help desk to IT support, fixing people’s printers, fixing people’s computers, and then into IT consulting, so doing Windows networking, LANs, WANs and things like that.

In about 1999 I met a 16 year old kid who was running the networks for a high school, and this kid turned me onto Linux. It completely changed my world. I immediately fell in love with Linux as an operating system and dropped out of the Windows world completely. It was spectacular. I had named a mail server “bloatware”, a Windows mail server, “bloatware” for the computer name, and didn’t realize that was going to be in all of the mail headers, and my boss was a little less than amused. So I decided to become a Linux admin instead, and I think that’s when things really opened up. I went to an ISP, a hosting provider, where I learned Linux, Apache, networking, how to really build and support web servers, and that gets us to the second half of my career, when I had a very good background in everything that it takes to run a large scale network or a large scale set of systems.

I was building a SaaS platform called AdJuggler. It’s an online ad serving application. It was a Java servlet. We built it with our own custom Java servlet engine because at the time Tomcat was just not fast enough for us. So we custom wrote a Java servlet application that had only what was barely needed to serve ads. We spent a lot of time tuning the TCP/IP stack just to make it faster. So you start building this huge depth of knowledge about Apache, about Tomcat, about TCP/IP, about Linux in order to really improve the performance. I went from there to go on to build the cloud infrastructure for BPS. Sylvan and Audi, who both are going to speak a little later, helped me with all of that, and then to The Telegraph where I rebuilt their IT department, and here to 3Pillar Global where I don’t actually get to do anything except help other people do that. So that’s a career trajectory, right. So you start in IT. You actually move to Linux. You start building web scale products. You have to have some fundamental skills in order to do all of that and then ultimately you age out.

So we talk about why the sys admin’s dying out. There’s no entry path anymore. When I got started in the mid to late ‘90s, we had all of the small ISPs. The internet was new and so you had dial up. You had early cable connections. There were a large number of companies getting into the game. They needed new staff, and they were willing to train. From there everything has advanced. The cloud has become more predominant, and you’re finding that companies like Amazon, Rackspace, all of the large cloud providers, run all of the data centers now. There aren’t small ISPs. There aren’t small data centers. Most of the Linux administrators, most of the people who actually know how to run things at scale, have gotten hired by these companies or they’re comfortable. They’ve already got jobs with large product companies where they’re working.

Then talking about there being no entry path, the difference between the late ‘90s and now: in the late ‘90s IT was the hot profession, right. You wanted to go into IT. You wanted to either be in corporate IT or become a system administrator. Now programming is the hotness, right. Development is the new hotness. Everybody wants to grow up. They want to be a developer. They want to build all of these products, and system administration is just not as glamorous. It’s not really taught; it’s almost viewed as a support profession and not part of the prime time.

Then finally the only entry point now is really the Microsoft game. Microsoft and their IT world have done a fantastic job of building a machine to pump new talent into the system; web scale products don’t have a machine built that really pumps new blood into the profession. It’s ones and twos; it’s people who are interested in it finding their way into doing it. There’s no real corporate mechanism for people to get there.

So why is this a problem? Well, most web scale products are built on Linux and open source software, and without those good sets of fundamentals to actually go and build them, you aren’t going to be able to operate at scale. It’s really easy to prototype an application. It’s really easy to get things up on the internet. It’s really hard to run them at scale. I always say one computer is easy; ten computers, not that hard. A thousand is unbelievably hard. You have to have top tier automation skills to be able to manage that many, as well as understand how they all work to know what’s going wrong and when.

So when we talk about DevOps, I like to think of DevOps as a continuum. Sort of on the left hand side you have the scariest thing on the planet to anybody who has to operate anything in production, and that’s a developer with root access, right. It’s a very, very bad thing. In the middle you really have those sys admins who have been attached to a development team. They’re either just supporting them, taking tickets, fixing things in production for them but haven’t gone all the way. Then on the far side of the continuum sort of the dream state is really Google and what they’ve done with their site reliability engineers. These are engineers who have mastered not only operating systems, TCP/IP. They are also fluent in multiple programming languages, and they spend their time not only doing automation but actually taking development tickets and developing their applications at the same time. I think that’s the state that we all want to get to, but you have to have all of the requisite skills to get there.

Now I get asked a lot by our clients. They’re like we need DevOps and I’m like, ooh. What do you actually want, because when most people say, I need DevOps, it’s because they hate their IT department, and they can’t actually get anything into production because we all know IT administrators want up-time, right. They want to make sure it works. Have you tested it? Of course I did. It worked on my machine. It’s fine. They stand in the way of getting things into production, and so I always have to clarify for folks what they really are thinking about when they want DevOps, right. Is it build chain automation? Is it production support or is it infrastructure automation and orchestration? It’s usually a combination of all three, but you want to make sure that you talk them through it and that they’re clear about what they want as they’re going through all of that.

Now I talked about those fundamental skills that are going to be required to really do this, and there are sort of four sets that are absolutely essential to ascend into the DevOps arena with any real measure of success. The first is operating systems, right. I’d say you can master Windows and get into DevOps. I’m not sure I think that’s really going to happen, but Linux or any of the other Unix variants are what you’re going to find in most web scale applications. The web stack: it didn’t occur to me until mid or later in my career how many admins there are out there who don’t really understand Apache, Tomcat, Varnish, NGINX and how to handle flow control at the web layer level. TCP/IP, again, absolutely mandatory; understanding how networks work is the only way you’re going to build scalable products, and it’s the only way you’re going to be able to support a scalable product when it blows up at 3 o’clock in the morning and you’re unbelievably tired.

Then finally programming and scripting; back when I got started it was enough to be able to just script. Now you actually have to have programming chops. The game has been upped. There are more skills required to get all of this done. With that in mind, Linux, again near and dear to my heart; I’m a Linux junkie and an open source zealot. Understanding file permissions is absolutely key, right; if you can’t get the security right, your stuff’s going to get busted into time and time again. Instead of working in production you’re going to be getting hacked and cleaning up your file systems and trying to do all of that all the time.
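
As a minimal sketch of the file-permission point, assuming Python 3 on a Linux or other Unix host: the snippet below reads a file's mode bits and flags files that any user on the box could modify. The helper names is_world_writable and describe are illustrative and not from the talk.

```python
import os
import stat
import tempfile


def is_world_writable(path: str) -> bool:
    """Return True if users outside the owner and group may write to path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)


def describe(path: str) -> str:
    """Render the mode the way `ls -l` does, for example '-rw-r--r--'."""
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    # Use a throwaway file so the demo never touches real configuration.
    with tempfile.NamedTemporaryFile() as tmp:
        os.chmod(tmp.name, 0o644)  # owner read/write, everyone else read-only
        print(describe(tmp.name), is_world_writable(tmp.name))
        os.chmod(tmp.name, 0o666)  # anyone can write: usually a mistake
        print(describe(tmp.name), is_world_writable(tmp.name))
```

Running a check like this across web roots and configuration directories is one way to catch loose permissions before someone else does.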

The basic tools, Vi(M); Vi is going to be on every server everywhere. It’s really easy. We all have sort of IDEs of our choice or file editors of our choice, but learn Vi. It’s going to be everywhere where you need it, and if you know all of the commands you’ll be able to work quickly and get what you need.

File descriptors; I can’t stress this enough. It’s a big thing when you start to realize that Linux is just a series of text files being written to and read from. Your network sockets, your files, all of it is text being written into, and if you understand how file descriptors work, how many of them you have, you’re going to be able to get better at the operating system. Then finally logs and where to go, right; you’re going to live half of your life in /var/log unless you’re smart enough to export your logs to a centralized server. Then you won’t have to, but /etc is where your configuration files are. Just having the basics of the operating system file structure so that when you run into something unfamiliar you know where to go and how to get what you need.
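
To make the file-descriptor point concrete, here is a small sketch, assuming Python 3 on Linux. It shows that a regular file and a network socket are both handed to the process as plain integer descriptors, and that the number a process may hold open is capped. The function names are illustrative, and count_open_fds relies on the Linux-specific /proc filesystem.

```python
import os
import resource
import socket
import tempfile


def open_fd_demo():
    """Open a regular file and a TCP socket and return their descriptor numbers."""
    with tempfile.TemporaryFile() as f, \
            socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Both objects are backed by small integers handed out by the kernel.
        return f.fileno(), s.fileno()


def fd_limits():
    """Return the (soft, hard) per-process limit on open descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds():
    """Count descriptors currently open in this process (Linux only)."""
    return len(os.listdir("/proc/self/fd"))


if __name__ == "__main__":
    file_fd, sock_fd = open_fd_demo()
    print("file fd:", file_fd, "socket fd:", sock_fd)
    print("limits (soft, hard):", fd_limits())
    print("open right now:", count_open_fds())
```

A busy web server that runs into the soft limit starts refusing connections, which is why knowing how many descriptors you have matters in production.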

The web layer, again, I can’t stress how important this is, and the first two I really want to invite everyone to master: mod_rewrite and mod_proxy are two of the greatest tools you will ever own. They will allow you to control all traffic. Anybody can type whatever they want into a browser, but you can control where they go. When you realize that your end users want to go to a certain page and you can direct them to the server you want with that, the world opens up. Now there’s a lot of RegEx that you’re going to have to learn with that, and that’s always fun because then you can trick your buddies by writing RegEx they’ll never ever be able to read, but it is important.
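
The idea behind regex-driven rewriting and proxying can be sketched in a few lines of Python. This is not Apache configuration and not how mod_rewrite is implemented; it only illustrates the pattern of matching a requested path against ordered regular expressions and mapping it to a backend. The rules and the backend hostnames (blog-backend.internal and so on) are made up for the example.

```python
import re

# Hypothetical rules: (pattern matched against the request path, target template).
# The first matching rule wins, so order matters.
RULES = [
    (re.compile(r"^/blog/(\d{4})/(.+)$"),
     r"http://blog-backend.internal/archive/\1/\2"),
    (re.compile(r"^/api/(.*)$"),
     r"http://api-backend.internal/\1"),
]
DEFAULT_BACKEND = "http://web-backend.internal"


def route(path: str) -> str:
    """Return the backend URL that a request for `path` would be sent to."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            # Substitute the captured groups (\1, \2) into the target template.
            return match.expand(template)
    return DEFAULT_BACKEND + path


if __name__ == "__main__":
    for p in ("/blog/2017/death-of-the-sysadmin", "/api/v1/ads", "/about"):
        print(p, "->", route(p))
```

The user only ever sees the path they typed; which server answers is decided entirely by the rule table.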

The second is page caching, right; page caching is how you get to scale. If you’re not caching your pages, your applications will never hit internet scale. You’re not going to be able to handle every single request, and the better you get at page caching the more scalable your applications are going to be and the better you’re going to get at coming up after an outage. If you’ve ever had a thundering herd problem, bringing servers back up under load is really difficult, and the more caching and hot caching that you have, the better it’s going to be.
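
A minimal sketch of page caching, assuming Python 3: rendered pages are kept in memory for a fixed time-to-live so that repeat requests never reach the application. The PageCache class and its parameters are illustrative; real deployments would more likely use something like Varnish or an HTTP cache in front of the app.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Serve rendered pages from memory until they are older than ttl_seconds."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render      # the expensive call into the application
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]        # fresh copy: the application is not touched
        self.misses += 1
        body = self._render(path)  # missing or expired: render and store
        self._store[path] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    cache = PageCache(lambda p: "<html>" + p + "</html>", ttl_seconds=60)
    cache.get("/")
    cache.get("/")
    print("hits:", cache.hits, "misses:", cache.misses)
```

This sketch does nothing about many simultaneous misses for the same page, which is exactly the thundering herd situation; handling that needs locking or request coalescing on top.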

Then finally HTTP codes, right; there’s a difference between the 301 and 302. It’s going to affect your SEO rankings. You need to learn that. Four hundreds, five hundreds they’re both errors, but they tell you really different things about the state of the application. You want to get in there and really understand what your apps are telling you so that way when you’re going to solve a problem you’re solving the right one.
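
For reference, the distinctions mentioned here can be looked up with Python's standard http module. This is a small sketch; the helper names status_class and explain are illustrative.

```python
from http import HTTPStatus

# The first digit of a status code gives its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of an HTTP status code based on its first digit."""
    return CLASSES[code // 100]


def explain(code: int) -> str:
    """Return a one-line description such as '301 Moved Permanently (redirection)'."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"


if __name__ == "__main__":
    for code in (301, 302, 404, 503):
        print(explain(code))
```

A 301 tells clients and search engines the move is permanent, while a 302 says it is temporary. A 4xx points at the request, and a 5xx points at the server, so they send you looking in different places.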

Networking, TCP/IP again learning all of the basics of it; I really can’t stress enough having a good fluency in how to configure your networks, configure your servers. Subnet masks, right, there are about 65 thousand reasons why you should know what a /16 is. You really need to understand that as you go to set up things in Amazon. As you’re setting up VPC you’re configuring your networks. You’re figuring out how your applications are going to be segmented and how your dataflow goes. Routing, gateways and then finally DNS, understanding how DNS works is core; most of the applications we’re going to build are going to use IP addresses but a lot of things are still going to operate over DNS. You want to make sure that you know what happens when you type something into a browser window.
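
The "65 thousand reasons" can be checked with Python's standard ipaddress module. The 10.0.0.0/16 range below is only an example private network, not one named in the talk, and the helper names are illustrative.

```python
import ipaddress


def summarize(cidr: str) -> dict:
    """Return the netmask and total address count for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return {"netmask": str(net.netmask), "addresses": net.num_addresses}


def split(cidr: str, new_prefix: int) -> list:
    """Carve a block into equal subnets with the given prefix length."""
    net = ipaddress.ip_network(cidr)
    return [str(sub) for sub in net.subnets(new_prefix=new_prefix)]


if __name__ == "__main__":
    print(summarize("10.0.0.0/16"))            # a /16 holds 2**16 addresses
    parts = split("10.0.0.0/16", 24)
    print(len(parts), "subnets, first:", parts[0])
    addr = ipaddress.ip_address("10.0.42.7")
    print(addr in ipaddress.ip_network("10.0.0.0/16"))
```

Splitting a /16 into /24 blocks is the kind of arithmetic that comes up when segmenting a network into per-tier or per-zone subnets.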

Finally the programming skills: Bash and Perl are going to be essential. I have long said if I can’t do it in Bash I’m largely uninterested in doing it. This is why folks like Sylvan and Audi don’t actually let me work in production anymore. Now Python and Ruby for handling a lot of the orchestration and the automation, Chef, Puppet, a lot of the things we’re using in DevOps now. Then finally really understanding JSON and XML for your configuration files; being able to handle white space and get everything set up that you need with a minimum of frustration.

So from there I think really when you have those skills the next couple of things to sort of focus on is theory of operation, and I can’t stress this enough. If you understand how they work you can solve any problems. You can get to the details. So there’s a famous interview question that Google does, which is when you type www.google.com into a browser window what happens, and there are a lot of ways to answer that question. Your skill level really can be shown by how you answer that question, right. You could nerd out completely and start talking about the OSI seven layer model and how it’s passed to a piece of hardware. It hits a DNS server; it gets a result. It’s routed, but the more depth that you can answer in, the more your understanding of how all of these things work is going to be apparent.
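
One way to walk through that question is to do each step by hand. The sketch below, assuming Python 3, parses a URL, resolves the host, opens a TCP connection, sends a raw HTTP request and reads the reply. To stay runnable without internet access it talks to a throwaway server on 127.0.0.1 rather than www.google.com, and it skips everything a real browser adds (TLS, caching, redirects, rendering). All names are illustrative.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def fetch_status_line(url: str) -> str:
    """Fetch url over plain HTTP by hand and return the response status line."""
    parts = urlsplit(url)                                   # 1. parse the URL
    host, port, path = parts.hostname, parts.port or 80, parts.path or "/"
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]            # 2. resolve the name
    with socket.socket(family, socktype, proto) as conn:
        conn.settimeout(5)
        conn.connect(sockaddr)                              # 3. TCP handshake
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\nConnection: close\r\n\r\n")
        conn.sendall(request.encode("ascii"))               # 4. send the request
        response = b""
        while True:                                         # 5. read until close
            chunk = conn.recv(4096)
            if not chunk:
                break
            response += chunk
    return response.split(b"\r\n", 1)[0].decode("ascii")


class _Hello(BaseHTTPRequestHandler):
    """Tiny local server standing in for the remote site."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def demo() -> str:
    """Start the local server on a free port, fetch from it, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), _Hello)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_status_line(f"http://127.0.0.1:{server.server_port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Each numbered comment is a place where a production request can fail, which is what makes the order of events worth memorizing.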

The second is a Django app: how does a web page get returned from a Django app? You’ve got a request to a web server that’s going to a database. How’s that database query made? How is it returned? How is it returned to the client browser? What are the steps involved and what are the individual pieces that are touched as that request is made? Finally, even more confusingly, we all love micro-services, right. Micro-services are great, but how does a web front end construct a call from five micro-services or ten micro-services to actually display a page? Can you go through the path? How are the individual requests handled? How are they cached? How are they stored? How are they responded to? The more you start thinking about these things and making sure that you have a fundamental understanding of how all of those bits work, the more successful you’re going to be.
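
As a rough sketch of how a front end might assemble one page from several micro-services, assuming Python 3: the calls are issued in parallel and the page is built from whatever comes back, with a failed service degrading to an empty section instead of failing the whole page. The three fetch functions are hypothetical stand-ins for real network calls and are not from the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


# Hypothetical stand-ins for remote micro-services.
def fetch_profile(user_id: int) -> dict:
    return {"name": f"user-{user_id}"}


def fetch_ads(user_id: int) -> dict:
    return {"slots": 2}


def fetch_recommendations(user_id: int) -> dict:
    raise RuntimeError("recommendations service is down")


SERVICES: Dict[str, Callable[[int], dict]] = {
    "profile": fetch_profile,
    "ads": fetch_ads,
    "recommendations": fetch_recommendations,
}


def build_page(user_id: int,
               services: Dict[str, Callable[[int], dict]] = SERVICES
               ) -> Dict[str, Optional[dict]]:
    """Call every service in parallel and collect one section per service."""
    page: Dict[str, Optional[dict]] = {}
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = {name: pool.submit(fn, user_id)
                   for name, fn in services.items()}
        for name, future in futures.items():
            try:
                page[name] = future.result()
            except Exception:
                page[name] = None  # one failing service must not sink the page
    return page


if __name__ == "__main__":
    print(build_page(7))
```

Tracing a single page through a structure like this, call by call, is a practical way to answer the questions above about where each request goes and where it could be cached.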

Again, knowing how your browser interacts with an operating system is absolutely essential. How your browser interacts with remote servers, again essential, and then finally how the application servers are talking to all of their supporting services. Is it just a database call? Is it a roll up call of multiple micro-services? Understanding the interactions between each of these services, where the data flows. What’s the cost of the dataflow? One of the things that you start getting at web scale, which is really scary, is you become an accountant. You start to have to think about the cost of an individual call. Is it staying within your network? Is it being routed over your network? Are you going to have to pay a carrier for it, right? If it’s on Amazon, if you go outside your network, you’re paying a lot for those calls, and so making sure you know how your networks are set up, that you’re routing all of your traffic within them so that you’re paying the lowest rates, is going to be essential. It’s going to be faster; it’s going to be cheaper. You’re going to want to do it that way.

So all of that goes to say as DevOps becomes the new hotness, and it is, I’ve got developers coming to me, QA coming to me asking me how do I get into DevOps? How do I begin to do this? It’s a new glamorous form of system administration that I think is starting to take the market by storm. Programming just isn’t enough. Like I said, a developer with root access is really, really scary. You have to have a good in-depth understanding of how those elements in the ecosystem interact, and part of that means having those basic skills to get all of that working.

Finally theory of operation; I harped on it a little bit. It’s absolutely key. If you know the order that events happen in and the components that are involved, you can solve any problem. Everything else is just going to be a detail, right. If you can tell what broke, where in the chain, we all have the internet; we can look things up. We can figure it out, but if it’s three in the morning, and you get a call and you don’t understand those things, your job is going to be a lot harder. So when you’ve got that good broad understanding, it makes getting things working that much easier.

So with that in mind, I will quit yapping at you because again, we’ve got other speakers who are going to talk about very, very real things, and I will turn it over to them unless we’re doing anything else. I don’t know.


Jonathan Rivers: Questions anyone?


Male Speaker 2: Can you elaborate more on how DevOps – people are trying to get information from me [inaudible] first slide. So how should be the organization [inaudible] that people answering calls at 3 a.m. should be able to solve problems?

Jonathan Rivers: That’s great. So I talk a lot about segmentation of duties and separating how you do things. There are a couple of different configurations, and it’s going to depend on how an organization wants to work. I’m a huge proponent of having a very strong web operations team, sort of an L2 and L3 level team that handles the on-call duties, and then your DevOps are considered L4 or tier 4 in responding, because what you want to do is push the troubleshooting and the on-call work down to the least expensive or the most junior resource that can actually get it done, because that first attempt is going to be triage. So you want them to be able to take a call, triage it quickly and then know if they need to wake somebody up, because part of the problem with that call at three a.m. is that a three a.m. call wrecks the next sprint if your DevOps guys are actually taking tickets.

So making sure that you have multiple layers of support; if you don’t, then it’s making sure that you use something like PagerDuty or some other on-call rotation to spread that load around to make sure that your same guys aren’t getting woken up. Then the other thing is don’t ever sit on those problems. When production events happen, make sure that the fixes go into the very next sprint. I think all too often companies are willing to let their sys admins or their DevOps people just continually get woken up night after night after night, and it’s on us to make sure that we find the time and that we fix it almost immediately. Does that answer the question or are you looking for something more specific?

Male Speaker 2: Also, now that I think about it, how do we make sure that the organization is structured such that people also have the knowledge to fix problems?

Jonathan Rivers: That one’s tough and it’s one of the reasons that I wanted to give this particular talk, because people aren’t getting those essential skills. They’re not hiring for those essential skills. I think because of the problem that there aren’t a lot of new admins coming in, or the few Linux admins that I see coming in are coming out of IT infrastructures where they’re supporting IT Linux applications. It’s really required for the support manager to build a program to ensure that they’re getting those skills. It’s just got to be part of the hiring process, and I think for me the one thing that I will tell you that I think is essential is when we’re hiring our DevOps people and we’re hiring sys admins, hire builders not protectors, right. Old school IT mentality is about protecting and stopping anything from happening. The only way you’re going to innovate or grow is sometimes by putting yourself at risk and getting things into production quickly, but you want all of your IT organization, all of your operations, and most importantly all of your DevOps wanting to build things, right. They’ve got to be excited about building things. They’ve got to be excited about the thought of how do I scale a thousand application servers without actually doing anything, right. Can I just click a button and then a thousand servers appear. It’s finding people with the right belief set and then ensuring that they have the proper skillset, but belief first, skill second.