Data Boost For Engineers & Engineering Managers
After giving a talk on this topic at TOA Berlin, Chetan sums up his ideas about improving development habits and using readily available data to extract findings from code.
Last month, the Relink team got a chance to visit ever-so-beautiful Berlin, where Bjarne and I talked about data and developers at a TOA satellite event. For those of you who couldn’t join us, here is a sneak peek at the things we are experimenting with at the Relink office.
Our team here in Copenhagen tries to use data and science to reduce the time and money we spend building products. While I’m sure a lot of companies do that too, we often neglect the contribution that some very common insights can make to developer productivity.
Make no mistake, this is not one of those boost-your-productivity self-help articles. Before investing our time in any action, we like to ask how useful the outcome would be to us. While working with our toolchain and products, we have broken these insights down into two major categories: improving your software, and improving your development habits.
As we discuss these two, we will go into more detail about the indicators that can help us find out where and why we waste time, and how we can find bottlenecks in our software.
Improving Your Development Habits
The best place to draw data about your development habits is your version control system. Let’s take GitHub as an example, since it is probably the best-known cloud-based version control host.
Clicking the Insights tab of a GitHub repository unlocks a ton of useful graphs and indicators.
Here, for example, is a simple graph that shows the development effort put into a code base over a period of time.
While the above may not be the most useful chart for engineers or engineering managers, the following could be immensely helpful.
This illustration, called a punch card, is probably the easiest way to see how often and at what times people contribute to a particular code base.
As you can see in the above screenshot, somebody happens to be making commits at 1 AM on a Tuesday morning.
I think we can all agree that 1 AM on a Tuesday may not be the most productive time for software development on a team that otherwise works during the day. Unless your team is distributed across time zones, this visual is extremely useful for finding your development team’s most productive time intervals.
Taking a look at the same graph over a few code bases can give you an even better insight into the practices and habits of your team, and help you tailor a better plan for communication and collaboration.
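If you want the same punch-card view outside GitHub, the counting is simple enough to sketch yourself. The Python snippet below is only an illustration, not how GitHub builds its chart: it assumes you have exported one author timestamp per commit with `git log --pretty=format:%ad --date=iso-strict`, and the `punch_card` function and the sample timestamps are made up for this example.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def punch_card(iso_timestamps):
    """Count commits per (weekday, hour) in each author's local time."""
    counts = Counter()
    for raw in iso_timestamps:
        raw = raw.strip()
        if not raw:
            continue
        # Some git versions print UTC as "Z"; normalize it for older Pythons.
        if raw.endswith("Z"):
            raw = raw[:-1] + "+00:00"
        when = datetime.fromisoformat(raw)
        counts[(DAYS[when.weekday()], when.hour)] += 1
    return counts

# Hypothetical sample data standing in for real `git log` output.
sample = [
    "2017-07-04T01:12:33+02:00",
    "2017-07-04T01:48:02+02:00",
    "2017-07-05T14:03:10+02:00",
]

card = punch_card(sample)
for (day, hour), n in card.most_common():
    print(f"{day} {hour:02d}:00  {n} commit(s)")
```

Running this over several repositories and comparing the busiest slots gives roughly the same picture as the punch card, in a form you can feed into your own reports.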
How Often Are You Reinventing The Wheel?
While it is good to know the common habits of your development team, it is even more helpful to find the code bases where you keep rewriting everything. The above chart (also extracted from GitHub) shows the additions (green spikes) and deletions (red spikes) in your repository. When the red and green spikes are about the same size, as on the left side of the chart, it is an indicator that you may be rewriting your entire code base again. Moving to the right, you can see the code base become slightly more stable as we start adding more features without removing too many things.
Visuals are often more helpful than we give them credit for, and GitHub (and other version control tools like it) spare us from having to figure out how to extract them from our daily workflow.
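The additions-versus-deletions signal can also be turned into a number. The sketch below assumes you already have weekly totals of added and deleted lines (for instance summed from `git log --numstat`). The `churn_ratio` and `flag_rewrite_weeks` helpers, the 0.8 threshold, and the sample figures are all assumptions chosen for illustration, not values we have validated.

```python
def churn_ratio(additions, deletions):
    """Deleted lines per added line for one period."""
    if additions == 0:
        # Only deletions (or nothing at all) happened in this period.
        return float("inf") if deletions else 0.0
    return deletions / additions

def flag_rewrite_weeks(weekly_totals, threshold=0.8):
    """Return the weeks where deletions nearly match additions."""
    return [
        week
        for week, added, deleted in weekly_totals
        if churn_ratio(added, deleted) >= threshold
    ]

# Hypothetical (week, additions, deletions) totals.
weekly = [
    ("2017-W01", 1200, 1100),
    ("2017-W02", 900, 850),
    ("2017-W03", 700, 120),
    ("2017-W04", 650, 90),
]

flagged = flag_rewrite_weeks(weekly)
print(flagged)
```

A ratio close to 1 means you are deleting almost as much as you add, which is the "red and green of similar size" pattern. A low ratio corresponds to the more stable, feature-adding phase.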
Improving Your Software
While it is fun to look at beautiful graphs and fancy metrics, it can be very hard to turn the data from many applications running in the cloud into something you can actually use. Startups that decide to dig deeper into the performance of their running code often face two major problems: extracting metrics out of the code, and deciding what exactly to extract. Let’s look at one problem at a time.
Extracting Metrics Out of Code
While this is something we are still experimenting with, we feel that the software stack we use here at Relink gives us an immense advantage. Most of our code is written in Scala, and our software runs on Kubernetes as microservices. This lets us take advantage of Scala’s metaprogramming framework to build tools that let developers keep any additional functionality out of their code base, and let operations collect useful information from those applications while they run portably.
We use something similar to what you see in the picture above: a developer only needs to mark a piece of code with something like an annotation, and our benchmarking library takes over and evaluates the performance of that code when it runs, without the developer having to worry about it.
Even so, it is a very common mistake to benchmark your code for the wrong things. Most code written today is asynchronous, so it is important to remember that a benchmark has to cover the entire period from input to output, not just the time it takes to kick off the work.
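To make the asynchronous point concrete, here is a minimal Python analog of the annotation idea. It is not our Scala library: the `benchmark` decorator, the `TIMINGS` store, and `fetch_profile` are hypothetical names invented for this sketch. The timer stops only after the awaited result is available, so it measures input to output rather than just the creation of the coroutine.

```python
import asyncio
import functools
import time

# Hypothetical in-memory store: function name -> list of durations (seconds).
TIMINGS = {}

def benchmark(func):
    """Mark an async function so every call is timed from input to output."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            # Awaiting here is the important part: the clock keeps running
            # until the asynchronous work has actually finished.
            return await func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            TIMINGS.setdefault(func.__name__, []).append(elapsed)
    return wrapper

@benchmark
async def fetch_profile(user_id):
    # Stand-in for a real network or database call.
    await asyncio.sleep(0.05)
    return {"id": user_id}

result = asyncio.run(fetch_profile(42))
print(result, TIMINGS["fetch_profile"])
```

The developer only adds the one-line marker. Where the timings end up (here a plain dictionary) is a separate decision that an operations team could swap for a real metrics backend.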
What Metrics To Extract
As I mentioned at the start, while we are working towards pretty graphs and fancy metrics, it is important to remind ourselves of the usefulness of the end result. We believe that there are three important characteristics of useful visual data about your software’s performance:
Comparison in performance between different versions of your code
Traceability of the bottlenecks (slow components of an end-to-end action call)
Change in the impact of the above over a period of time
If we can compare the performance between different changes that we make to our code, and stack up different API calls between applications to complete a user action, we get a pretty good idea of the parts of our application that require attention.
Things get more interesting when we evaluate the above two over time for long-running applications. Long-term monitoring not only gives us an idea of our user traffic, but also helps us with cost estimates, anomaly detection, and alerts for potential cyber attacks.
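As a rough illustration of the first two characteristics, the sketch below compares per-component timings between two versions and picks out the slowest step of a user action. The component names, the numbers, and the 10% tolerance are invented for the example. In practice the durations would come from whatever your benchmarking setup records.

```python
from statistics import median

def slowest_component(spans):
    """Return (component, median duration in ms) of the slowest component."""
    medians = ((name, median(durations)) for name, durations in spans.items())
    return max(medians, key=lambda pair: pair[1])

def regressions(old, new, tolerance=1.10):
    """Components whose median duration grew by more than the tolerance."""
    shared = old.keys() & new.keys()
    return sorted(
        name for name in shared
        if median(new[name]) > tolerance * median(old[name])
    )

# Hypothetical durations in milliseconds for one user action, per version.
v1 = {"auth": [12, 14, 13], "search": [80, 95, 90], "render": [30, 28, 33]}
v2 = {"auth": [12, 13, 13], "search": [140, 150, 135], "render": [31, 29, 30]}

print(regressions(v1, v2))
print(slowest_component(v2))
```

Recording the same two results for every release, and plotting them over time, gives a simple version of the third characteristic as well.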
Traceability Of Your Code
The last and arguably most important component of application monitoring is the logs. Application logs give us the most comprehensive view of what our application is doing.
Given a microservices architecture, we have many applications calling many other applications. Therefore, it is important that we can track how different components of our stack interact with each other to fulfil a user request.
Every time a user makes a request to our servers, a first responder API handles the call, which is then propagated to many other applications. This first responder API is responsible for generating a unique identifier for that request that is then cascaded over every internal call and logged with the anonymised and obfuscated data for that call.
This gives us a bird’s eye view of what happens with our application. We are currently working on making this unique identifier accessible to our customers so that when they come back to us with a bug or a problem, we can pinpoint the exact request and find out what went wrong.
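The request identifier scheme described above can be sketched in a few lines. This is a simplified, single-process Python illustration rather than our production setup: the `X-Request-ID` header name, the service functions, and the in-memory `LOG` list are all assumptions made for the example, and real services would pass the header over the network.

```python
import uuid

HEADER = "X-Request-ID"  # hypothetical header name for the identifier
LOG = []                 # stand-in for a central log store

def log(service, headers, message):
    """Record a log entry tagged with the request's identifier."""
    LOG.append({
        "request_id": headers[HEADER],
        "service": service,
        "message": message,
    })

def inventory_service(headers):
    log("inventory", headers, "checked stock")

def pricing_service(headers):
    log("pricing", headers, "computed price")
    # The identifier is passed along unchanged on every internal call.
    inventory_service(headers)

def first_responder(incoming_headers):
    """Entry point: generate the identifier once, then cascade it."""
    headers = dict(incoming_headers)
    headers[HEADER] = str(uuid.uuid4())
    log("gateway", headers, "request received")
    pricing_service(headers)
    return headers[HEADER]

request_id = first_responder({})
trace = [e["service"] for e in LOG if e["request_id"] == request_id]
print(request_id, trace)
```

Filtering the log store by a single identifier reconstructs the path of one request through every service, which is what makes it possible to pinpoint a customer-reported problem.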
Where to go from here?
While the road from your company’s first line of code to what we discussed above is long and tedious, the best way to get there is to keep asking yourself and your team which things could help the most.
Here we described the things that have been most helpful for us, and they may not be the same for everyone. There is no better way to solve a data problem than by asking questions and iterating on feedback. That is something I have learned the hard way, and something I would recommend to any software engineer who wishes to use data for real productivity boosts.