Exciting news: DeepMind’s AI has recently made a genuine mathematical breakthrough, and a built-in fact-checker helps ensure the accuracy of its findings.
Artificial intelligence firm Google DeepMind says it has found a way to train chatbots to solve mathematical problems with little human intervention. By building a fact-checker that weeds out useless outputs and leaves only trustworthy answers to computational or mathematical questions, the company claims to have made the first genuine scientific discovery with an AI chatbot.
DeepMind’s past successes, such as its weather-prediction and protein-structure models, were built on systems trained on precise, task-specific data. Large language models (LLMs) like GPT-4 and Google’s Gemini take the opposite approach, training on diverse datasets to build broad capabilities. But that strategy leaves them prone to “hallucination,” the term researchers use for confidently delivered but erroneous output.
Despite its recent launch, Gemini has already shown signs of hallucination, getting this year’s Oscar winners wrong along with other seemingly easy facts. Even the debut ads for Google’s AI-powered search engine contained inaccuracies.
A typical solution is to add a layer above the AI that checks its outputs for correctness before passing them to the user. But the sheer variety of questions people can pose to a chatbot makes building an all-encompassing safety net a daunting, perhaps impossible, task.
Alhussein Fawzi of Google DeepMind and his colleagues have built FunSearch, a system based on Google’s PaLM 2 model with a fact-checking layer they call an “evaluator”. The model is confined to generating computer code that solves problems in mathematics and computer science, which DeepMind argues is a far more manageable task because candidate solutions in these fields are intrinsically and immediately verifiable.
The underlying AI can still hallucinate and produce false or misleading results, but the evaluator filters out the erroneous outputs, leaving only trustworthy and potentially useful ideas.
“We think that perhaps 90 percent of what the LLM outputs is not going to be useful,” says Fawzi. “Given a candidate solution, it’s straightforward for me to tell you whether this is a correct solution and to evaluate the solution, but coming up with a solution is hard. And so mathematics and computer science match extremely well.”
DeepMind believes the model can produce fresh scientific information and ideas — something LLMs haven’t done previously.
To start with, FunSearch is given a problem and a very simple solution in source code as input; it then builds a database of new solutions that the evaluator checks for correctness. The best of the trustworthy answers are fed back to the LLM as inputs, with a prompt asking it to improve on them. DeepMind says the system generates millions of candidate solutions, which gradually converge on an efficient result, sometimes surpassing the best-known option.
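The generate, evaluate, and feed-back cycle described above can be sketched in a few lines of Python. This is a deliberately simplified toy, not DeepMind’s implementation: the real system prompts an LLM for candidate programs, whereas here a random `mutate` function stands in for the LLM and candidates are plain lists of integers rather than code. The objective and bounds are invented for illustration.

```python
import random

def evaluator(candidate):
    """Score a candidate, or return None to filter it out as invalid.
    This plays the role of FunSearch's fact-checking layer."""
    if any(x < 0 or x > 10 for x in candidate):
        return None  # the equivalent of a hallucinated, unusable output
    return sum(candidate)  # toy objective: bigger is better

def mutate(best):
    """Stand-in for the LLM: propose a small variation on the best candidate."""
    cand = list(best)
    i = random.randrange(len(cand))
    cand[i] += random.choice([-2, -1, 1, 2])
    return cand

def funsearch_loop(seed, rounds=200):
    """Keep only evaluator-approved candidates and feed the best one back in."""
    best, best_score = seed, evaluator(seed)
    for _ in range(rounds):
        cand = mutate(best)
        score = evaluator(cand)
        if score is not None and score > best_score:
            best, best_score = cand, score
    return best, best_score
```

The key design point is that the evaluator never needs to generate anything; it only has to verify, which, as Fawzi notes, is the easy direction in mathematics and computer science.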
For mathematical problems, the system writes computer programs that can find solutions rather than trying to solve the problem directly.
Fawzi and his colleagues challenged FunSearch to tackle the cap set problem, which involves finding sets of points in which no three lie on a straight line. The task grows increasingly computationally costly as the number of points rises. The AI found a set of 512 points in eight dimensions, larger than any previously known.
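To see why this problem suits an evaluator, note that in the standard formulation a cap set lives in the space Z₃ⁿ (points whose coordinates are 0, 1 or 2), where three distinct points are collinear exactly when their coordinate-wise sum is 0 modulo 3. Checking a candidate set is therefore mechanical, as this small sketch shows (the function is illustrative, not FunSearch’s actual evaluator):

```python
from itertools import combinations

def is_cap_set(points, dim):
    """Verify that no three distinct points in (Z_3)^dim lie on a line.
    In Z_3^n, distinct a, b, c are collinear exactly when
    a + b + c == 0 (mod 3) in every coordinate."""
    pts = [tuple(p) for p in points]
    if len(set(pts)) != len(pts):
        return False  # duplicates are not a valid set
    for a, b, c in combinations(pts, 3):
        if all((a[i] + b[i] + c[i]) % 3 == 0 for i in range(dim)):
            return False  # found three collinear points
    return True
```

Generating large cap sets is hard; verifying one is a simple loop like this, which is precisely the asymmetry FunSearch exploits.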
When challenged with the bin-packing problem, where the goal is to pack items of varying sizes into as few containers as possible, FunSearch produced solutions that outperform commonly used algorithms, a result with direct applications for transport and logistics companies. DeepMind says FunSearch could lead to advances in many more mathematical and computational problems.
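For context, one of the standard baselines FunSearch’s evolved heuristics were compared against is the classic first-fit rule, shown below. This is the textbook algorithm, not DeepMind’s discovered heuristic, which instead learned a more sophisticated scoring rule for choosing bins:

```python
def first_fit(items, capacity):
    """Classic first-fit heuristic: place each item in the first bin
    that has room for it, opening a new bin when none does."""
    bins = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])  # no existing bin fits: open a new one
    return bins
```

First-fit is fast but can leave bins with awkward leftover space; a heuristic that scores candidate bins more cleverly, as FunSearch’s evolved programs do, can pack the same items into fewer containers.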
Mark Lee of the University of Birmingham, UK, argues the next advancements in AI won’t come from scaling up LLMs to ever-larger capacities, but from adding layers that assure accuracy, as DeepMind has done with FunSearch.
“The strength of a language model is its ability to imagine things, but the problem is hallucinations,” adds Lee. “And this research is breaking that problem: it’s reining it in, or fact-checking. It’s a good idea.”
Lee argues AIs shouldn’t be blamed for creating enormous volumes of wrong or worthless outputs since this is not dissimilar to the way that human mathematicians and scientists operate: brainstorming ideas, testing them, and following up on the best ones while eliminating the worst.