Keep calm & be a system engineer
Let me start with the following quote: “A programmer would be nothing without a place to run his or her code.”
This quote works for both on-premises and cloud-based software. Every developer needs a reliable, performant and solid infrastructure to run his or her applications.
Now here is where we, system engineers, come into play to provide this infrastructure to our developers.
We operate mostly in the background, as our infrastructure sits in a state-of-the-art datacentre hidden deep in the mountains and guarded by a dragon (figure of speech, of course).
Even so, we provide this foundation to our developers, so they can focus entirely on writing the applications that make it possible for our business to run.
About the function
I’m a system engineer working for a financial corporation in Belgium. Here we have a couple of hundred system engineers working together, all operating in different fields of IT. For example, we have dedicated system engineers for storage, networking, operating systems, hardware, virtualization, big data, … and many more.
I’m part of the compute team which manages all the physical servers which are used to provide compute capacity for server operating systems and the virtualization layer that is installed on top of this hardware. Virtualization means that we configure and run multiple operating systems on a single server. This allows us to use our compute capacity as efficiently as possible without letting resources go to waste.
We are running an environment of around 350 physical servers hosting more than 5000 virtual servers. As you can imagine this environment doesn’t run itself.
A day as a system engineer
So how do we fill our days?
Here is a bit of my daily schedule. While all our developers are still sleeping, we get in early and check our capacity reports.
If some clusters need additional resources, we add some from our spare capacity to the production clusters, make sure the correct storage datastores are connected, and add the resources to the cluster so that our virtual machines can use them.
As our clusters are dynamically built, the cluster will balance its virtual servers nicely over the new resources. Coffee time (yes, system engineers also run on coffee).
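To give an idea of what such a morning capacity check could look like in code, here is a minimal Python sketch. It is purely illustrative and not our real tooling: the `Cluster` class, the GHz figures, the 80% threshold and the `plan_expansion` function are all assumptions made up for this example. It only calculates how many spare hosts each busy cluster would need; actually adding them is left out.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical summary of one cluster taken from a capacity report."""
    name: str
    used_ghz: float
    total_ghz: float

    @property
    def utilisation(self) -> float:
        return self.used_ghz / self.total_ghz


def plan_expansion(clusters, spare_hosts, host_ghz, threshold=0.8):
    """Return {cluster name: number of spare hosts to add}.

    Clusters above the (assumed) utilisation threshold get spare hosts,
    busiest cluster first, until they drop to the threshold or below,
    or until the spare pool is empty.
    """
    plan = {}
    remaining = spare_hosts
    for cluster in sorted(clusters, key=lambda c: c.utilisation, reverse=True):
        total = cluster.total_ghz
        added = 0
        while cluster.used_ghz / total > threshold and remaining > 0:
            total += host_ghz      # capacity after adding one more host
            added += 1
            remaining -= 1
        if added:
            plan[cluster.name] = added
    return plan


if __name__ == "__main__":
    report = [
        Cluster("prod-a", used_ghz=90, total_ghz=100),
        Cluster("prod-b", used_ghz=50, total_ghz=100),
        Cluster("prod-c", used_ghz=190, total_ghz=200),
    ]
    print(plan_expansion(report, spare_hosts=3, host_ghz=25))
```

In this made-up report, `prod-b` is left alone because it is well below the threshold, while the two busier clusters share the three spare hosts.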
Then the chats start coming in from application owners, like “We had an issue at 4:30 last night, could you investigate what happened?” Then the hunt begins to find where the issue came from or what caused it, and this can be literally anything: networking issues, storage issues, another VM clogging up all the resources, a bad piece of hardware, a faulty driver, … you name it. But we know our stuff and we solve it before the end of the day, assuring the application owners it will never happen again (probably).
Maintaining our hardware and virtualisation layer also means frequent updates to stay compliant with the latest security patches, which is a requirement with a big red exclamation mark next to it. We are constantly fixing and patching security issues in our hardware and virtualisation layers so that we are protected at all times. You wouldn’t want a backdoor into your bank account, would you?
I once heard a phrase from an old teacher: a good system engineer is a lazy system engineer. I didn’t get it at the time, because how can you be good at your job if you are lazy? But then we were introduced to “automation”. Imagine you are running a small environment with 5 servers. You can click every server, change some settings and move on to the next one, and you are done in 5 minutes. Do you want to do that on 500 servers as well? That is why a big part of our day goes into creating automated scripts that do the work for us. Some examples:
A new server is installed fully automatically and placed in our virtual environment without any intervention.
All configuration after installation is done automagically (yes, we use this word).
Architectural changes are scripted so they can be rolled out at once without manually clicking anything.
Patching of our hosts (firmware and drivers) is fully automated by scripting every step.
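The "5 servers versus 500 servers" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `roll_out` and `fake_change` are hypothetical names, the host names are invented, and a real script would call the management API of your platform instead of the stand-in function. The point is the pattern: one action, many hosts, and a list of failures to look at afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def roll_out(hosts, action, workers=20):
    """Run `action(host)` on every host in parallel.

    `action` is any callable that returns True on success. A host
    counts as failed if the action returns a falsy value or raises.
    Returns the sorted list of failed hosts.
    """
    def attempt(host):
        try:
            return host, bool(action(host))
        except Exception:
            # One broken host should not stop the rest of the roll-out.
            return host, False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(pool.map(attempt, hosts))
    return sorted(host for host, ok in results.items() if not ok)


def fake_change(host):
    """Stand-in for a real configuration change (hypothetical)."""
    if host == "host-042":
        raise ConnectionError("host unreachable")
    return True


if __name__ == "__main__":
    hosts = [f"host-{i:03d}" for i in range(1, 501)]   # 500 invented hosts
    failed = roll_out(hosts, fake_change)
    print(f"{len(hosts) - len(failed)} hosts updated, failed: {failed}")
```

Whether you change 5 or 500 servers, the script is the same; only the list of hosts gets longer.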
And then there are the projects, the interesting stuff.
At the moment we are implementing a VMware vSAN cluster in our environment, on which we will place very specific workloads. This is a new technology that is being used for the first time in our environment, so it takes some time before we are confident enough to put it into production. We have to create a fully functional design, which basically means agreeing on how we are going to build the environment.
It then needs to be set up properly, automating as much as possible along the way for future builds. Last but definitely not least, we need to document everything we do, which means writing page after page of boring Word documents explaining every step of the process and stating why we did it the way we did. Boring, right? But I can assure you, in 6 months you will not remember why you flipped the switch on that particular setting.
So that is a bit on how I spend my days.
Each day we make sure that everything runs smoothly and that our business is operating at its maximum capacity by providing the resources it needs!