Bug Police Officer

By Bruno Morais

Bruno is a Senior Software Engineer at AUTO1 Group.

Sense of Security
1. Why does feeling safe matter?
2. Strategic actions
3. A test as a Cop
Treat the injured first
1. Prioritizing what matters
2. The appropriate Mental Model
3. Ask before trying to find out
Trust your tools
1. When less is more
2. Efficient distribution of efforts
3. Confidence that avoids shame
Communication via Radio
1. Q-code
2. The Language and the Domain
Bug is in jail

What I present today is how to use police techniques and their mindset to detect and solve bugs in software, so that you can be the best software developer that society can provide.

For some years, I worked as a police officer in Brazil and I can say that it is a challenging career. The first step was to pass a national test, where knowledge of various laws was required. Criminal laws, traffic laws, and even the laws of physics.

The second step was to pass a battery of medical exams. Almost perfect health was necessary to be able to practice the profession, in addition to having most of the parts of the human body.

The third step was the psychological examination. It was necessary to have a clear mind and not have impulses different from those expected from a person who must protect other people.

As a fourth part, there was also the fitness test, a test in which career candidates had to prove they had sufficient strength and speed to carry out daily police tasks. Running, pull-ups, sit-ups and long jump are examples of what was required.

The last stage was the Police Academy, where for three months those from now on known as students would learn and prove the learning of several new skills: first aid, use of firearms (pistols, carbines, submachine guns, etc.), non-lethal weapons, martial arts, public policy, and many other disciplines.

All this so that society receives the best police officer that its people can provide. So, what about using some of the following principles and techniques to become the best software developer that society can provide?

Sense of Security

Why does feeling safe matter?

Without needing to be a security expert, you already know that it is impossible to have a completely safe city. Even if we reached the point of having perfect police officers, which in itself is impossible, we would not have a police officer for every citizen. In general, the recommendation is 300 police officers for every 100 thousand inhabitants.

Aware of this limitation, that it is not possible to guarantee complete security for all citizens, police agencies work with the concept of a sense of security. Through strategic actions to combat crime, citizens' feeling that they are safe is increased. This mental condition gives them peace of mind when working, studying, and carrying out their daily tasks.

Strategic actions

This sense of security also affects criminals. The common visibility of police officers carrying out their duties in strategic places, as well as effective actions to combat crime, make criminals stay away from that place.

Notice that ending 100% of crime is impossible; what we can do well is limit its reach. From here we can extract our first lesson when dealing with software: perfect code, totally bug-free, is impossible.

Despite this conclusion, the fact that it is impossible does not mean that we should simply ignore the problem. On the contrary, we can, like the police, create strategic actions to combat Bugs, to the point that the Bugs themselves will run away from our code.

A test as a Cop

Test-Driven Development (TDD) is a strategic action that provides a feeling of security in your code. Before the crime occurs, you position police officers in a specific region. Before the Bug occurs, you position the test in a specific place in your code.

Robert C. Martin, in his book "Clean Code: A Handbook of Agile Software Craftsmanship", teaches us that writing unit tests before writing production code is just the tip of the iceberg in defining TDD. There must be a continuous cycle of improvement, in which there is always a test covering the code you are developing. Knowing that the code is protected is what allows you to play with it and rework it. After all, whatever problem may happen, there is a police-test there to protect you.
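As a minimal sketch of positioning the officer before the crime, imagine a hypothetical shipping rule: orders of 100.00 or more ship for free, everything else pays a flat fee. The class names, the threshold, and the fee are all invented for illustration, and plain Java checks stand in for a test framework such as JUnit. The checks in `ShippingFeeTest` are the part you would write first; `ShippingFee.forOrderTotal` is written afterwards, just enough to make them pass.

```java
// Hypothetical example: names, threshold and fee are assumptions for illustration.
final class ShippingFee {
    static final long FREE_SHIPPING_THRESHOLD_CENTS = 10_000;
    static final long FLAT_FEE_CENTS = 499;

    // Production code, written only after the checks below existed and failed.
    static long forOrderTotal(long orderTotalCents) {
        if (orderTotalCents < 0) {
            throw new IllegalArgumentException("order total must not be negative");
        }
        return orderTotalCents >= FREE_SHIPPING_THRESHOLD_CENTS ? 0 : FLAT_FEE_CENTS;
    }
}

final class ShippingFeeTest {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // The "police officers": each check guards one spot where a Bug could appear.
    static void runAll() {
        check(ShippingFee.forOrderTotal(9_999) == 499, "just below the threshold pays the flat fee");
        check(ShippingFee.forOrderTotal(10_000) == 0, "exactly at the threshold ships for free");
        check(ShippingFee.forOrderTotal(25_000) == 0, "above the threshold ships for free");
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("all checks passed");
    }
}
```

With these checks in place, you can refactor `forOrderTotal` freely: if the boundary at the threshold ever slips, the test reports it before a customer does.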

Treat the injured first

Prioritizing what matters

One of the functions of Highway Police Officers is to provide first aid in the event of traffic accidents. There are often serious injuries that must be treated as urgently as possible, as even 1 minute can make the difference between life and death.

The first step when arriving at a traffic accident site is to mark the location appropriately to prevent other accidents. We must use lighting devices and other appliances to make it clear to other vehicles that there is a danger there.

The police officer must then check the condition of all the injured to prioritize them according to the severity of each situation.

Once the treatment of each injured person has been completed and, if necessary, they have been taken to the hospital, it is time to survey the accident site. At this stage, the police officer collects all the information relating to the accident, so that a decision can be made on its cause and, when applicable, on any culprits.

The appropriate Mental Model

This is the mental model that you and I must adopt when we are faced with a Bug. Although several procedures have been adopted to prevent it from occurring, it is a misfortune that can still happen. It's like when the highway was well designed and the vehicles were up to date with maintenance, but even so, due to carelessness or misunderstanding on the part of the driver, an accident happens.

As in the case of traffic accidents, the first step to take is to secure the environment so that the Bug does not cause other problems. If possible, we can revert the application version to one in which the Bug did not appear. We can use a feature flag to temporarily disable a feature. Regardless of the method used to stop the bleeding, the important thing is that we must focus on treating the injured first rather than looking for culprits.
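To make the feature-flag option concrete, here is a minimal sketch. The `FeatureFlags` and `CheckoutService` classes, the flag name, and the 10% discount are all hypothetical; a real project would more likely read flags from configuration or a dedicated flag service. The point is only that the risky code path sits behind a switch that can be turned off without a new deployment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory flag store, for illustration only.
final class FeatureFlags {
    private final Map<String, Boolean> flags = new ConcurrentHashMap<>();

    void set(String name, boolean enabled) {
        flags.put(name, enabled);
    }

    // Unknown flags count as disabled, the cautious default during an incident.
    boolean isEnabled(String name) {
        return flags.getOrDefault(name, false);
    }
}

// Hypothetical service with a new feature guarded by a flag.
final class CheckoutService {
    static final String NEW_DISCOUNT_ENGINE = "new-discount-engine";

    private final FeatureFlags flags;

    CheckoutService(FeatureFlags flags) {
        this.flags = flags;
    }

    long totalCents(long subtotalCents) {
        if (flags.isEnabled(NEW_DISCOUNT_ENGINE)) {
            return discountedTotal(subtotalCents);
        }
        return subtotalCents; // previous behaviour, used while the flag is off
    }

    // The new code path under suspicion: takes 10% off the subtotal.
    private long discountedTotal(long subtotalCents) {
        return subtotalCents - subtotalCents / 10;
    }
}
```

If the new discount path starts misbehaving, calling `flags.set(CheckoutService.NEW_DISCOUNT_ENGINE, false)` stops the bleeding immediately, and the investigation can happen afterwards.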

We have to agree with David Thomas and Andrew Hunt, in their work "The Pragmatic Programmer: Your Journey to Mastery", that facing Bugs is a sensitive subject for most developers. Instead of treating the issue as a problem to be solved, some approach it with denial, finger-pointing, lame excuses, or simply apathy. It's exactly the opposite of what is expected!

Instead of looking for the culprit in a traffic accident, we must first assist the victims and stop the bleeding. In software, instead of finding out who inserted the Bug, we must solve it.

Ask before trying to find out

When treating an injured person in an emergency, one approach is to use the Glasgow Coma Scale to determine the patient's level of consciousness. On this scale, one of the components is the verbal response.

The injured person is asked simple questions, and by analyzing the responses, we can assess the level of consciousness and consequently the level of urgency of their care.

You and I must carefully read every error message that is displayed when the Bug occurs. It may contain vital information for resolving the problem.

You may be thinking: but what if there is no error message? What if we are just faced with an incorrect result? Well, this also happens in traffic accidents. It is not uncommon for there to be an unconscious victim who cannot answer questions.

When we discussed the sense of security, we talked about the importance of tests in our application. Go and build a test that can reproduce that Bug, so you can extract the information you need to solve it.

Think of these tests like the physical tests that highway patrol officers perform on unconscious victims to check their level of consciousness: do the eyes open in response to pain? Is there a pupillary response to light stimulation? Mentally translate this into software language: does the Bug repeat itself when I enter this and that parameter? And if I insert this new argument, does it happen too?

To decide on a Bug, data is needed. If you don't receive this data, you must extract it.
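Here is a small sketch of what "extracting the data" can look like when the victim is unconscious, that is, when there is no error message and only a wrong result. The `splitEvenly` method and its Bug are invented for this example: it is meant to split a bill into equal shares. The probe feeds it several inputs and records which ones give shares that do not add back up to the total.

```java
import java.util.ArrayList;
import java.util.List;

final class BugProbe {
    // Suspect under examination (hypothetical): splits a bill into equal shares.
    static long[] splitEvenly(long totalCents, int parts) {
        long[] shares = new long[parts];
        for (int i = 0; i < parts; i++) {
            shares[i] = totalCents / parts; // integer division drops the remainder
        }
        return shares;
    }

    // Each case is {totalCents, parts}. Returns the inputs whose shares
    // do not sum back to the original total, formatted as "total/parts".
    static List<String> failingInputs(long[][] cases) {
        List<String> failures = new ArrayList<>();
        for (long[] c : cases) {
            long total = c[0];
            int parts = (int) c[1];
            long sum = 0;
            for (long share : splitEvenly(total, parts)) {
                sum += share;
            }
            if (sum != total) {
                failures.add(total + "/" + parts);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        long[][] cases = { {100, 2}, {100, 3}, {100, 4}, {101, 2} };
        System.out.println(failingInputs(cases)); // prints [100/3, 101/2]
    }
}
```

The pattern in the failures is the diagnosis: the Bug only appears when the total is not divisible by the number of parts, which points straight at the lost remainder.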

Trust your tools

When less is more

Police work is very complex, and the day-to-day job involves the fight for life and the protection of society's most important assets. There would be no point in having a modern and efficient selection and training process for police officers if they were not given the appropriate tools for their work.

And when I say appropriate, I don't mean tools that are merely capable of performing the task, as the result delivered is directly related to the quality and safety of the tools. For an excellent result, not only the police officers but also their tools must be excellent.

In the police agency I worked for, one of the work tools used was an Austrian pistol known worldwide for its safety and efficiency. The Glock 17 pistol was developed to meet the strict criteria of the Austrian army who wanted to replace the model used during the Second World War.

In addition to requirements on durability, efficiency, and reliability, safety against accidental firing was one of the pivots of the new model.

But how did a knife factory manage to deliver the pistol model that would prove to be the safest? Well, it can be said that, in addition to intense testing, the secret to reliability was simplicity and maintainability. The person responsible for the project designed the equipment with as few parts as possible, minimizing complexity. Today the Glock pistol is made with an average of 35 parts, which is a much smaller quantity than other pistols on the market.

With a simpler design, so-called first-level maintenance is much easier. Equipment users can easily perform maintenance procedures, making the equipment's efficiency last for decades.

Of course, you must already remember the principle with the acronym KISS, which stands for Keep it Simple (omitting the last word because I don't want to offend the reader). The secret of that renowned pistol model was not in the materials used, but in being simple. The secret of maintaining your software in good shape is to keep it simple, having a specific component for each task. When a problem arises, you will know exactly which component to look at, as each component has a single responsibility. This is the Single Responsibility Principle.

Because of all this, when a police officer is at a shooting range practicing his aim, he knows that if he is not hitting the target, most likely the error is in his procedure and not in the equipment.

Allan Kelly, in one of his contributions to the book "97 Things Every Programmer Should Know", corroborates this idea that we should trust our tools when dealing with Bugs. Considering our tools are widely used, mature, and in use by multiple companies, we have little reason to doubt their quality.

Of course, a prototype, version 0.1, is much more fragile than a library that has been on the market for several years. We are also not saying that it is impossible to have bugs in compilers.

But considering how rare compiler bugs are, what we are saying is that in the search for a Bug solution, we should trust our tools and look at our piece of code first.

Efficient distribution of efforts

Imagine it as an efficient distribution of efforts. The library or compiler code has already been reviewed and tested for many years by countless people. As for that little piece of code of yours, only God knows who besides you has looked at it on your team.

One of these days, I managed to solve an unpleasant bug that occurred in one of our services. It was a non-deterministic error in our test suite, that is, the Bug only showed its head when no one was looking.

This type of non-deterministic error generally has one of three origins: multithreading, time sensitivity, or a mismatch between the setup and the cleanup of test data.

Analyzing the body of the failed test, after several hours, of course, I began to strongly distrust the way we created timestamps with a native Java library. After all, the test seemed perfect to me. If there was an error, it must have been somewhere else.

Confidence that avoids shame

But, aware of this lesson that we should trust our tools, especially since the component in question had been in Java for 9 years, I ignored this distrust. This saved me a lot of time and embarrassment because, as I am about to share, the error was elsewhere.

Our test suite ran all tests in a single thread, so it wouldn't be a concurrency issue. But it also ran each test in a deterministic but unpredictable order. Therefore, the source of the intermittent problem must be in some specific order in which the tests were performed.

After several hours of analyzing the logs, I was able to identify two tests that failed when run in sequence. It turned out that, although our tests performed a test data cleaning routine after each execution, a simple typo had silently overridden the cleaning method in one of the tests, which left inconsistent data behind whenever those tests were executed in a specific order.
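The sketch below is not the original code, only a hypothetical reconstruction of this kind of failure in plain Java, without a test framework. A shared `STORE` stands in for the test database, `BaseTest.cleanUp()` is the cleaning routine, and `AuditLogTest` accidentally declares a method with the same name, which replaces the cleanup for that test alone. All names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class OrderDependentTests {
    // Shared test data, standing in for a database used by the whole suite.
    static final List<String> STORE = new ArrayList<>();

    static abstract class BaseTest {
        abstract boolean body();

        // Cleaning routine that is supposed to run after every test.
        void cleanUp() {
            STORE.clear();
        }

        final boolean run() {
            try {
                return body();
            } finally {
                cleanUp();
            }
        }
    }

    static final class CountsOneUserTest extends BaseTest {
        boolean body() {
            STORE.add("user:alice");
            return STORE.size() == 1; // silently assumes an empty store at the start
        }
    }

    static final class AuditLogTest extends BaseTest {
        boolean body() {
            STORE.add("audit:login");
            return STORE.contains("audit:login");
        }

        // Intended as an extra helper under a different name. The clashing name
        // overrides BaseTest.cleanUp(), so STORE is never cleared after this test.
        void cleanUp() {
            // resets nothing in STORE
        }
    }

    public static void main(String[] args) {
        boolean countFirst = new CountsOneUserTest().run();
        boolean auditSecond = new AuditLogTest().run();
        System.out.println(countFirst + " " + auditSecond); // prints true true

        STORE.clear();
        boolean auditFirst = new AuditLogTest().run();
        boolean countSecond = new CountsOneUserTest().run();
        System.out.println(auditFirst + " " + countSecond); // prints true false
    }
}
```

Each test passes on its own and in one of the two orders, which is exactly why such a Bug hides so well. The library producing the data was never at fault.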

Communication via Radio

Q-code

I remember that one of the first systems I developed when I worked in the police was one to replace the sending of statistics through radio communication. Many years ago, each police station was supposed to send daily work statistics via radio. Data such as number of people and vehicles inspected, car accidents, etc.

This took a lot of time as the audio quality of radio communications was not always ideal. The interlocutors spent a lot of time transmitting and confirming each piece of information, all of this using Q-Code.

The Q-Code is a standardized collection of three-letter codes used in radio communication. The use of Q-code even in voice transmissions makes communication safer and more efficient. Each set of three letters, always starting with the letter Q, has a specific meaning that is known to the interlocutors.

I could see that two of those codes were widely used among police officers in radio communication: QAP and QSL.

QAP was used when you wanted to transmit a message. Let's say someone wanted to talk to me, so he said on the radio: "Bruno, QAP?". This could be translated as "Bruno, are you ready to hear a message?"

My answer could be QAP or, for example, QRM. If I responded with QAP, I was signaling that yes, I was ready to receive the message. If I responded with QRM, I was informing the sender that at that moment the communication was suffering interference.

During communication, the second most used Q-Code appears, QSL. At the end of each message, the sender ends it with the QSL. This means that he is asking the receiver to confirm whether they received the message. It would be something like: "There is a suspicious vehicle heading towards your team, QSL?" If I had understood the entire message, I should respond with a simple "QSL". In the absence of a response, the sender repeats the message until receiving confirmation from the receiver.

Each message is concluded with a QSL, and its reception is confirmed with a QSL. Does this remind you of any transmission control protocol out there, used on the internet?
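The QSL exchange can be sketched as a send-and-repeat loop. This is a toy model, not a description of how TCP is implemented: real TCP adds sequence numbers, timers, and much more. The `LossyChannel` class, which drops a fixed number of transmissions so the example stays deterministic, and the `maxAttempts` limit are assumptions added for illustration.

```java
final class QslRadio {
    // Hypothetical radio channel that loses the first `dropCount` transmissions.
    static final class LossyChannel {
        private int remainingDrops;

        LossyChannel(int dropCount) {
            this.remainingDrops = dropCount;
        }

        // Returns true when the receiver heard the message and answered "QSL".
        boolean transmit(String message) {
            if (remainingDrops > 0) {
                remainingDrops--;
                return false; // interference: no confirmation came back
            }
            return true;
        }
    }

    // Repeats the message until it is confirmed or the attempts run out.
    // Returns the number of attempts used, or -1 if no confirmation arrived.
    static int sendUntilAcknowledged(LossyChannel channel, String message, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (channel.transmit(message + ", QSL?")) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LossyChannel channel = new LossyChannel(2);
        int attempts = sendUntilAcknowledged(
                channel, "There is a suspicious vehicle heading towards your team", 5);
        System.out.println("confirmed after " + attempts + " attempts"); // confirmed after 3 attempts
    }
}
```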

The main thing here is that a large amount of knowledge is condensed into three letters, in a language that provides interlocutors with efficiency and security. Eric Evans, the author of "Domain-Driven Design: Tackling Complexity in the Heart of Software", would agree with me that there is a principle here that should also be adopted in software development.

The Language and the Domain

A language specific to the domain, which considers possible interference from the environment, a ubiquitous language that is known by all interlocutors, can be the difference between life and death, between success and damnation of a project.

When a project does not have its own language for that domain, this creates the need for numerous translations between interlocutors. Developers have to translate for domain experts. These, in turn, have to translate their demands to developers. Even among developers, there may be a need for translations.

All these translations make concepts confusing and the consequences of a lack of mutual understanding can be disastrous.

The importance of this in trying to arrest a Bug goes much further than compilation errors or exceptions. It reaches into the swampy territory where the software may be logically working almost perfectly, yet it is not delivering what the stakeholder expected.

Let's look at the following pieces of code:

if (order.status != completed) {
     // (...)
}

if (order.isFullyRefunded()) {
     // (...)
}

Here we have a small example of two ways of expressing the same situation. In the second piece of code, a common language between developers and domain experts was used. Here, possible logical errors can be easily noticed, without the need to keep in mind what the possible states of the Order are and what the expected state is for that case.

See the following code:

Payment newPayment = payment.add(10);

What do you expect from this code? Does it add ten dollars to the payment? Ten transactions, inbound or outbound? Will the method create a new payment object or will it just change the value of the payment variable and create a new reference called newPayment?

A lesson also taken from the book "Clean Code: A Handbook of Agile Software Craftsmanship" shows that clear communication must also take place between software developers. Functions must communicate exactly what they are doing.

If you have to look at the function's implementation to know what it's actually doing, that's a big sign that you need to rename it.
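As one possible answer to the questions raised by `payment.add(10)`, here is a hedged sketch of a `Payment` whose method name states both the unit and the effect. The field, the choice of cents, and the method name are assumptions made for this example, not the actual domain model.

```java
// Hypothetical immutable Payment, for illustration only.
final class Payment {
    private final long amountCents;

    Payment(long amountCents) {
        this.amountCents = amountCents;
    }

    long amountCents() {
        return amountCents;
    }

    // The name says what is added (an amount, in cents) and the "with" prefix
    // signals that a new Payment is returned while this one stays unchanged.
    Payment withAmountIncreasedByCents(long extraCents) {
        return new Payment(amountCents + extraCents);
    }
}
```

A call such as `Payment newPayment = payment.withAmountIncreasedByCents(1_000);` no longer needs a trip into the implementation: it adds ten dollars, expressed in cents, and leaves `payment` as it was.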

Bug is in jail

I'm very grateful that you've made it this far. You've certainly learned a lot about police activity and how some of its principles can be applied to combat and remove Bugs from your software.

By applying this knowledge you will be able to work more confidently in your code, giving you peace in your daily life and also allowing you to experiment and discover new things.

Your aim will be improved and you will get straight to the point, avoiding damage instead of first looking for the culprits. You become bolder and go further because you know you can trust your tools. This leaves you more time to refine your code and seek excellence.

Finally, not only will your code be safe and efficient, but its result will also be in line with what your customers expect because through effective communication you can accurately deliver what was demanded from you.

The result of all this is that the Bug is in jail, lives are no longer in danger, and you can go home at the end of your shift to enjoy the rest of the day with your family, who are looking forward to welcoming their hero back.