It’s too early to put your faith in Microsoft’s GitHub Copilot to fix your program code automatically. Microsoft has acknowledged that the tool, a $10-a-month add-on to GitHub, is not perfect: it may use insecure coding patterns, contain bugs, or reference out-of-date APIs and idioms.
The idealistic vision of automation, however, holds that someday artificial intelligence will foresee a programming error that could break functionality or bring down systems, alert the developer before the code goes into production, and explain how to change the code to fix the issue. Better still, AI might reach into the application code and correct bugs automatically, saving programmers a great deal of time and effort.
Today’s observability and DevOps tools are the beginnings of such a future. Dynatrace, a maker of DevOps tools, has for some time been developing “causal AI” and “predictive AI” to figure out why programs crash and to foresee how they will fail.
The next step is to combine generative AI with those observability tools to give coders recommendations on where their code may run into trouble and how to fix it.
“The typical request from a CIO is, please fix my system before it actually fails,” Bernd Greifeneder, chief technology officer and co-founder of Dynatrace, said in an interview. The company sells application lifecycle management tools in the DevOps and observability market.
Consider a common failure: running out of disk space in Amazon’s AWS.
It is ironic to the core, said Greifeneder: even in these ultra-high-tech times, cloud disks at AWS still run out of space and have to be resized via API calls. We want to optimize what we use, he explained, because resizing the disks upfront is expensive. But usage patterns can change with the number of customers in the clusters, among other factors.
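For concreteness, here is a minimal sketch, using AWS’s boto3 SDK, of the kind of resize call involved. The volume ID and target size are placeholders, and this is an illustration, not Dynatrace’s code:

```python
# Minimal sketch: growing an EBS volume through the AWS API with boto3.
# The volume ID and target size below are placeholders.
import boto3

ec2 = boto3.client("ec2")

def grow_volume(volume_id: str, new_size_gib: int) -> None:
    """Request a larger size for an existing EBS volume."""
    vol = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    if vol["Size"] >= new_size_gib:
        return  # already large enough
    ec2.modify_volume(VolumeId=volume_id, Size=new_size_gib)
    # After the API call succeeds, the filesystem on the attached
    # instance still has to be extended (e.g., growpart/resize2fs).

grow_volume("vol-0123456789abcdef0", 200)
```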
With a combination of causal and predictive AI, the company first pinpoints the “root cause” of a disk failure in order to address it. Neither tool is built on large language models or other generative AI. Instead, both rely on older, more established forms of AI that can be trusted to deliver accurate, consistent results.
For causal AI, the software uses a number of algorithms, including quantile regression, density estimation, and a model referred to as the random surfing model. Unlike neural nets, which are trained on a static set of data to detect correlations between data points, the causal programs navigate a graph representing the components of an organization’s IT system and the relationships between them.
Traditional statistical models, or learning models based on neural networks, do not broadly apply to dynamic IT systems because the variables change too rapidly, according to Greifeneder. A customer may run tens of thousands to a hundred thousand pods, many of them interconnected, and the picture changes as traffic is routed, things scale up and down, and different versions are deployed.
The causal AI programs build an “in-memory, real-time model” of a customer’s entire IT system, using a “multi-dimensional model that has the causal, directed dependency, sort of like a multidimensional graph” of all the entities: the type of cloud service, the Kubernetes version, the application being used, and so on. When a system problem raises an alarm, that model, called Smartscape, is consulted to determine the root cause by iteratively traversing the graph.
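The description suggests a search over a dependency graph for the deepest unhealthy component. Here is a hypothetical Python sketch of what such a traversal might look like; the topology, component names, and health flags are invented for illustration and do not reflect Smartscape’s internals:

```python
# Hypothetical root-cause search over a dependency graph. The graph,
# component names, and health states are invented for illustration.
from collections import deque

# Edges point from a component to the components it depends on.
depends_on = {
    "checkout-service": ["payment-service", "postgres"],
    "payment-service": ["postgres"],
    "postgres": ["ebs-volume"],
    "ebs-volume": [],
}

unhealthy = {"checkout-service", "payment-service", "postgres", "ebs-volume"}

def root_causes(alerting):
    """Walk from the alerting component toward its dependencies and keep
    the unhealthy nodes whose own dependencies are all healthy."""
    roots, seen = set(), set()
    queue = deque([alerting])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        bad_deps = [d for d in depends_on[node] if d in unhealthy]
        if node in unhealthy and not bad_deps:
            roots.add(node)  # nothing it relies on explains the failure
        queue.extend(bad_deps)
    return roots

print(root_causes("checkout-service"))  # {'ebs-volume'}
```

The point of the iteration is that the alarm rarely fires on the component that is actually broken; the traversal walks past the symptomatic services until it reaches a failure that nothing further upstream explains.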
That causal model, however, won’t predict the business’s variations. Although “it knows the root cause” of issues, Greifeneder said, “what it does not know is what is your business pattern,” such as “Monday morning at 8:00 AM, you have a big spike in usage for whatever reason.”
Accounting for such patterns, according to Greifeneder, requires some form of learning from history.
A predictive AI component achieves that historical learning with another set of tools, such as the autoregressive integrated moving average (ARIMA), an algorithm especially adept at piecing together patterns that occur in data over time.
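As a flavor of what an ARIMA forecast looks like in practice, here is an illustrative sketch using the statsmodels library; the usage figures are made up, and nothing here is Dynatrace’s implementation:

```python
# Illustrative only: forecasting daily disk usage with an ARIMA model.
# The usage numbers below are invented.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Four weeks of daily disk usage in GiB, trending upward.
usage = np.array([112, 114, 113, 117, 119, 122, 121,
                  125, 127, 126, 130, 133, 135, 134,
                  138, 141, 140, 144, 147, 149, 148,
                  152, 155, 154, 158, 161, 163, 162], dtype=float)

model = ARIMA(usage, order=(1, 1, 1))  # (p, d, q): AR, differencing, MA
fit = model.fit()

forecast = fit.forecast(steps=7)  # projected usage for the next week
print(forecast)
```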
It is crucial to note that predictive AI does not just examine server-based back-end systems. It also receives signals from network endpoints, such as whether a user is experiencing lag or service interruptions.
Greifeneder said that focusing only on server-side systems is insufficient. Understanding the dependencies involves many different factors, such as real user monitoring and API service monitoring.
Even though a CIO is most concerned with systems, user issues can arise even when servers are functioning normally. As a result, both the back-end and user experience need to be measured and compared.
He said: “Occasionally, we run into the IT-only person who only cares about their servers, ‘Oh, my server is up,’ but in reality, users are frustrated.” Conversely, just because one of those CPUs malfunctions doesn’t necessarily mean the end user is affected.
Returning to the disk space example, the causal and predictive AI together can predict a coming disk issue. We can extrapolate from the cluster’s usage over the previous days and weeks, according to Greifeneder, to determine whether we might run out of disk space within a week.
That forecast is the motivation for preventive action: triggering a workflow from Dynatrace’s automation engine that calls an API into AWS to resize the disk, automatically heading off the kind of outage that has happened in the past.
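A toy version of that decision logic might look like the following; the capacity, headroom threshold, and resize policy are all invented for illustration:

```python
# Hypothetical glue between forecast and remediation: if projected usage
# crosses a headroom threshold, request a bigger volume ahead of time.
CAPACITY_GIB = 170
HEADROOM = 0.9  # act once projected usage exceeds 90% of capacity

def plan_resize(forecast_gib, capacity=CAPACITY_GIB, headroom=HEADROOM):
    """Return a new volume size in GiB if the forecast breaches the
    threshold, otherwise None."""
    peak = max(forecast_gib)
    if peak <= capacity * headroom:
        return None
    return int(peak * 1.5)  # naive policy: provision 50% above the peak

# E.g., a seven-day forecast like the one from the ARIMA sketch above.
new_size = plan_resize([165.0, 168.0, 171.0, 174.0, 177.0, 180.0, 183.0])
if new_size is not None:
    print(f"resize volume to {new_size} GiB")  # i.e., call the AWS API
```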
At this point, the process loops in generative AI. This year, Dynatrace added Davis CoPilot, a module that sits on top of the causal and predictive systems, to its umbrella program, Davis AI.
A user can instruct the CoPilot: “Create for me an automation that proactively stops this [disk outage].” To find out which disks that prompt refers to, the CoPilot sends a query to the causal and predictive AI. In response, the Davis program uses the Smartscape model and the predictive data to compose a prompt containing all the contextual information needed to understand the IT system as it stands right now.
That prompt is then sent to the CoPilot, which “will return the template of the workflow to automate” the disk resizing, according to Greifeneder. “It will allow you, the user, to review and say, OK, this is roughly right. Thank you, you helped me get 90% there,” which can save the systems engineer time compared with creating a workflow from scratch.
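The key idea, grounding the generative model in retrieved facts, can be illustrated with a toy prompt builder; the context fields and wording here are invented, not Dynatrace’s actual format:

```python
# Toy illustration of grounding a generative prompt with facts from the
# causal/predictive layer. The fields and wording are invented.
context = {
    "entity": "ebs-volume vol-0123456789abcdef0",
    "attached_to": "postgres on cluster prod-eu-1",
    "current_usage_gib": 162,
    "capacity_gib": 170,
    "forecast": "projected to exceed 90% of capacity within 4 days",
}

facts = "\n".join(f"- {key}: {value}" for key, value in context.items())
prompt = (
    "You are drafting an automation workflow.\n"
    "Known system state (from monitoring; treat as ground truth):\n"
    f"{facts}\n\n"
    "Task: draft a workflow template that resizes the disk before the "
    "projected breach, to be reviewed by a human."
)
print(prompt)
```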
The next task for the Davis AI program is to relay all of these observations to the programmer as they code the application. The holy grail of application development is to avoid introducing errors in the first place rather than having to fix them afterward.
One strategy Dynatrace uses is a “guardian.” Before an application goes into production, a DevOps person can ask the CoPilot, in natural language, to create a guardian that keeps an eye on a specific performance goal; the company calls this “defining a quality objective in the code.” The causal and predictive components then determine whether the code will meet the defined objectives.
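A guardian boils down to a machine-checkable release gate. Here is a hypothetical sketch of what evaluating such quality objectives could look like; the objective format, metric names, and thresholds are invented:

```python
# Hypothetical "guardian": declared quality objectives checked against
# observed metrics before code ships. All names and numbers are invented.
import operator

OPS = {"<=": operator.le, ">=": operator.ge}

objectives = [
    {"metric": "p95_latency_ms", "op": "<=", "target": 300},
    {"metric": "error_rate_pct", "op": "<=", "target": 0.5},
]

observed = {"p95_latency_ms": 270, "error_rate_pct": 0.8}

def evaluate(objectives, observed):
    """Return (passed, failures) for a set of quality objectives."""
    failures = [
        o for o in objectives
        if not OPS[o["op"]](observed[o["metric"]], o["target"])
    ]
    return (not failures, failures)

ok, failed = evaluate(objectives, observed)
print("release gate:", "pass" if ok else f"fail: {failed}")
```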
Of course, if the Davis AI identifies potentially dangerous code, the question becomes how to correct it. Although the field is still young, the Davis CoPilot can be made to advise the programmer on potential code fixes.
Dynatrace is considering having the Davis CoPilot offer recommendations tailored to a customer’s technology stack: when a vulnerability is found in production, CoPilot would present recommendations that the programmer should check out to fix in their code, Greifeneder said.
The use of generative AI for these kinds of code-fix recommendations, according to Greifeneder, is still in its infancy. While the causal-predictive AI is designed to be trustworthy, the generative algorithms still suffer from the “hallucination” phenomenon, in which the program confidently asserts false information.
Information produced by the causal AI is trustworthy, he said, because it accurately represents the system state: we are certain of what is present. But because the advice on how to change the code comes from the open GPT-4 models, it is not trustworthy.
As a result, code suggestions for remediation may start from a sound foundation but run into the same problem as GitHub Copilot: no solid grounding in what code is appropriate. To better ground generative AI’s recommendations, large language models will have to be integrated with tools such as those offered by Dynatrace and others.
Formal studies of GPT-4 and its ilk at identifying and resolving code vulnerabilities have produced very mixed results. The technical paper OpenAI published alongside GPT-4’s launch in March warned against relying on the software. According to the report, GPT-4 […] had trouble creating exploits for the discovered vulnerabilities.
A February study of GPT-4’s predecessor, GPT-3, by University of Pennsylvania researcher Chris Koch was more positive. On a set of GitHub repository files carefully chosen for their known vulnerabilities, GPT-3 found 213 vulnerabilities, substantially more than the 99 found by the well-known evaluation tool Snyk, a form of “static application security testing” (SAST) frequently used to check software for vulnerabilities.
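The basic setup in such studies is simple to sketch: hand the model a source file and ask it to list vulnerabilities. The following is an illustration using OpenAI’s Python SDK, not the papers’ actual harness; the prompt wording and file name are invented:

```python
# Rough sketch of an LLM-based vulnerability scan: send a source file to
# the model and ask for findings. Prompt and file name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def scan_for_vulnerabilities(path: str) -> str:
    with open(path) as f:
        source = f.read()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a security reviewer. List any "
                        "vulnerabilities in this code, with line numbers."},
            {"role": "user", "content": source},
        ],
    )
    return response.choices[0].message.content

print(scan_for_vulnerabilities("vulnerable_example.c"))
```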
Even so, Koch noted, both GPT-3 and Snyk missed numerous vulnerabilities; both frequently produced false negatives.
A later study, by cybersecurity firm PeopleTec, extended Koch’s work using an updated GPT-4 released in August. It found that GPT-4 discovered four times as many vulnerabilities in the same files.
However, both studies tested the models on files totaling just over 2,000 lines of code. That pales in comparison with complete production applications, which can run from tens of thousands to millions of lines spread across dozens or even hundreds of linked files. It is unclear whether successes on the GitHub files’ toy problems will scale to that complexity.
The race is on to strengthen language models for that bigger task. In addition to Dynatrace’s efforts, privately held Snyk Ltd. of the UK, maker of the Snyk scanning tool mentioned above, sells a tool called DeepCode AI. According to the company, DeepCode AI avoids the flaws of generative AI by combining it with other tools: a hybrid approach that secures applications by drawing on a variety of models and security-specific training sets.
Clearly, generative AI still has a long way to go before it can handle even basic program debugging and fixing, let alone the complexity of a real-world production IT environment. The great AI shift has not yet arrived.
What is on the horizon, with Davis CoPilot and initiatives like it, is generative AI as a new interface that helps coders examine their own code more aggressively, both before and after they ship it.