Date: 2020-02-11

Schematic representation of how Muzahid’s deep learning algorithm works. The algorithm is ready to detect anomalies once it has first been trained on counter data from a healthy version of a program. Photo credit: Texas A&M Engineering

We’ve all shared the frustration: software updates that are meant to make our applications run faster inadvertently end up doing the opposite. These bugs, known in computer science as performance regressions, are time-consuming to fix because locating them usually requires considerable human intervention.

To overcome this obstacle, researchers at Texas A&M University, in collaboration with computer scientists from Intel Labs, have developed a fully automated method for pinpointing the source of performance regressions introduced by software updates. Their algorithm, which is based on a form of machine learning called deep learning, is not only turnkey but also fast, finding performance bugs in a matter of hours instead of days.

“Upgrading software can sometimes cause problems when errors creep in and slow things down. This problem is even greater for companies that use large-scale software systems that are constantly evolving,” said Dr. Abdullah Muzahid, assistant professor in the Department of Computer Science and Engineering. “We have developed a practical tool for diagnosing performance regressions that is compatible with a wide range of software and programming languages, which greatly expands its usefulness.”

The researchers described their findings in the 32nd edition of Advances in Neural Information Processing Systems, the proceedings of the Neural Information Processing Systems conference held in December.

To pinpoint the source of software errors, debuggers often check the status of performance counters in the central processing unit. These counters monitor how a program executes on the computer hardware, for example how it uses memory. While the software runs, the counters record, among other things, how often it accesses certain memory locations, how long it stays there, and when it leaves. When the software misbehaves, the counters are consulted again to diagnose the problem.

“Counters provide an overview of the program’s execution status,” said Muzahid. “So, if a program doesn’t run as intended, these counters usually show the telltale signs of abnormal behavior.”

However, newer desktops and servers have hundreds of counters, making it practically impossible to track all of their states manually and then look for the abnormal patterns that indicate a performance problem. This is where Muzahid’s machine learning approach comes in.

Deep learning enabled the researchers to monitor data from a large number of counters simultaneously by reducing the dimensionality of the data. This is similar to compressing a high-resolution image to a fraction of its original size by changing its format. In the lower-dimensional data, their algorithm could then look for patterns that deviate from the norm.
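
The compress-then-compare idea can be illustrated with a small Python sketch. It is a stand-in under stated assumptions, not the researchers’ model: instead of a deep network, it compresses each vector of counter readings to a single number using one principal component (found by power iteration), expands it back, and treats a large reconstruction error as a deviation from the norm. The counter values, the `fit`/`is_anomalous` helpers, and the `margin` threshold rule are all hypothetical.

```python
import math

# Hypothetical sketch of "compress, then look for deviations".
# NOTE: linear compression (one principal component via power iteration),
# NOT the deep learning model used in the study.

def mean_vector(samples):
    """Per-counter average over all samples."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def principal_direction(samples, mean, iters=200):
    """Unit vector along which the centered samples vary most (power iteration)."""
    centered = [[x - m for x, m in zip(s, mean)] for s in samples]
    dim = len(mean)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # w = (X^T X) v, accumulated one centered sample at a time
        w = [0.0] * dim
        for row in centered:
            dot = sum(r * vi for r, vi in zip(row, v))
            for j in range(dim):
                w[j] += dot * row[j]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def reconstruction_error(sample, mean, direction):
    """Compress a sample to one number, expand it back, and measure what was lost."""
    code = sum((x - m) * d for x, m, d in zip(sample, mean, direction))  # compress
    rebuilt = [m + code * d for m, d in zip(mean, direction)]            # decompress
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(sample, rebuilt)))

def fit(healthy_samples, margin=3.0):
    """Learn 'normal' from healthy runs; threshold = margin x worst healthy error."""
    mean = mean_vector(healthy_samples)
    direction = principal_direction(healthy_samples, mean)
    worst = max(reconstruction_error(s, mean, direction) for s in healthy_samples)
    return mean, direction, margin * worst

def is_anomalous(sample, model):
    mean, direction, threshold = model
    return reconstruction_error(sample, mean, direction) > threshold

# Hypothetical readings of three counters per sample from a healthy build.
healthy = [
    [1.0, 2.0, 1.0],
    [2.0, 4.1, 2.0],
    [3.0, 5.9, 3.1],
    [4.0, 8.0, 3.9],
    [5.0, 10.1, 5.0],
]
model = fit(healthy)
print(is_anomalous([2.5, 5.0, 2.5], model))  # → False (follows the learned pattern)
print(is_anomalous([3.0, 6.0, 9.0], model))  # → True (third counter spikes)
```

A deep model replaces the single linear direction with learned nonlinear compression, but the decision rule sketched here (flag what cannot be reconstructed well from the compressed form) is the same kind of test.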

Once their algorithm was ready, the researchers tested whether it could detect and diagnose a performance bug in commercially available data management software that companies use to keep track of their numbers and figures. First, they trained the algorithm to recognize normal counter data by running an older, bug-free version of the data management software. Next, they ran the algorithm on an updated version of the software that contained the performance regression. The algorithm localized and diagnosed the bug within a few hours. Muzahid said this type of analysis could take much longer if done manually.
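
The train-on-the-old-version, test-on-the-new-version workflow can also be sketched in Python. To stay short, this hypothetical sketch swaps the deep learning model for a simple per-counter z-score: it learns the mean and spread of each counter for each code region from runs of the healthy version, then reports which region of the updated version deviates and which counter deviates most. The region names (`parse_query`, `write_log`), counter names, values, and the `z_limit` cutoff are made up for illustration.

```python
import statistics

# Hypothetical workflow sketch: per-region counter baselines from the old
# (healthy) version, checked against one run of the updated version.
# A simple z-score stands in for the deep learning model used in the study.

def train_baseline(healthy):
    """healthy: {region: [{counter: value}, ...]} recorded on the old version."""
    baseline = {}
    for region, samples in healthy.items():
        stats = {}
        for name in samples[0]:
            values = [s[name] for s in samples]
            spread = statistics.pstdev(values) or 1e-9  # guard against zero spread
            stats[name] = (statistics.mean(values), spread)
        baseline[region] = stats
    return baseline

def diagnose(baseline, updated, z_limit=4.0):
    """Return {region: (counter, z)} for regions whose worst counter exceeds z_limit."""
    report = {}
    for region, sample in updated.items():
        scores = {name: abs(sample[name] - mu) / spread
                  for name, (mu, spread) in baseline[region].items()}
        worst = max(scores, key=scores.get)
        if scores[worst] > z_limit:
            report[region] = (worst, scores[worst])
    return report

# Hypothetical counter readings per code region.
healthy = {
    "parse_query": [{"cache_misses": 100, "instructions": 5000},
                    {"cache_misses": 104, "instructions": 5100},
                    {"cache_misses": 98, "instructions": 4950}],
    "write_log": [{"cache_misses": 40, "instructions": 2000},
                  {"cache_misses": 42, "instructions": 2050},
                  {"cache_misses": 41, "instructions": 1980}],
}
updated = {
    "parse_query": {"cache_misses": 101, "instructions": 5020},
    "write_log": {"cache_misses": 400, "instructions": 2010},
}
report = diagnose(train_baseline(healthy), updated)
print(sorted(report))  # → ['write_log']
```

Reporting the worst counter per region is one simple way to go from “the update is slower” to “this region now misses the cache far more often than before”, i.e. from detection to a localized diagnosis.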

Muzahid noted that, beyond diagnosing software performance regressions, the deep learning algorithm has potential uses in other research areas, such as developing the technology required for autonomous driving.

“The basic idea is once again the same, namely to detect an anomalous pattern,” said Muzahid. “Self-driving cars need to be able to detect whether there is a car or a person in front of them and then act accordingly. So again, it’s a form of anomaly detection, and the good news is that our algorithm is already designed for that.”

Other authors of the study are Dr. Mejbah Alam, Dr. Justin Gottschlich, Dr. Nesime Tatbul, Dr. Javier Turek and Dr. Timothy Mattson from Intel Labs.

