The performance of Marmaragan with GPT-4o on the benchmark is promising, with correct annotations having been generated for 50.7% of the benchmark cases. The results establish a foundation for future work on combining the power of LLMs with the reliability of formal software verification.
That’s basically a coin flip. Even at a lower success rate, a user would have better odds just guessing whether an annotation is true or false.