interview Don’t trust; verify. According to AI researcher Vishal Sikka, LLMs alone are limited by computational boundaries and will start to hallucinate when they push those boundaries. One solution? Companion bots that check their work.
“To expect that a model that has been trained on a certain amount of data will be able to do an arbitrarily large number of calculations which are reliable is a wrong assumption. This is the point of the paper,” said Sikka, CEO of Vianai Systems, during a call this week to discuss that research.
Sikka is a towering figure in AI. He has a PhD in the subject from Stanford, where his student advisor was John McCarthy, the man who in 1955 coined the term “artificial intelligence.” Lessons Sikka learned from McCarthy inspired him to team up with his son and write a study, “Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models,” which was published in July. The former CTO of SAP and ex-CEO of Infosys, Sikka set out to study the efficacy of LLMs and AI agents last year.
“We have an example my son came up with of two prompts that have identical tokens and when you run them, the exact same number of operations get performed independent of what the tokens are,” he said. “Therein is the entire point, that whether the prompt is expressing the user’s desire to perform a particular calculation or the prompt is expressing a user’s desire to write a piece of text on something, it does exactly the same number of calculations.”
Attempting to push an LLM beyond that limit gives rise to the hallucinations that bedevil the model’s output.
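Sikka's point about fixed computation can be illustrated with a rough sketch. The cost formula below is a common back-of-the-envelope estimate for a transformer layer, and the dimensions, the whitespace "tokenizer," and the function names are all assumptions made for illustration; none of it is taken from the paper.

```python
def forward_pass_ops(num_tokens, d_model=64, num_layers=2):
    """Rough multiply-add count for one pass through a toy transformer.

    Illustrative estimate only: the count depends on how many tokens
    there are and on the model's size, never on what the tokens say.
    """
    # Attention: four projections (Q, K, V, output) plus the
    # token-to-token score and weighted-sum steps.
    attention = 4 * num_tokens * d_model**2 + 2 * num_tokens**2 * d_model
    # Feed-forward block with a hidden layer four times the model width.
    feed_forward = 8 * num_tokens * d_model**2
    return num_layers * (attention + feed_forward)


def ops_for_prompt(prompt):
    # Hypothetical stand-in tokenizer: one token per whitespace-separated word.
    return forward_pass_ops(len(prompt.split()))


# Two prompts with the same number of tokens but very different intent.
arithmetic_prompt = "multiply these two large matrices exactly"
prose_prompt = "write a short poem about rain"

same_cost = ops_for_prompt(arithmetic_prompt) == ops_for_prompt(prose_prompt)
print(same_cost)
```

Under these assumptions, the request for an exact calculation and the request for a poem get the same computational budget, which is the situation Sikka describes.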
“When we say, ‘Go book a ticket for me and then charge my credit card or deduct the amount from my bank and then send a post to my financial app,’ which is what all these agent vendors are kind of saying, you are asking the agents to perform an action which holds a meaning to you, which holds a particular semantic to you, and if it is a pure LLM underneath there, no matter how that LLM works, it has a bounded ability to carry out these kinds of tasks,” he said. “So with agentic use of pure LLMs, you have to perform extreme caution when you do these kinds of things.”
But Sikka, who founded Vianai in 2019, said that when LLMs are supported by systems that can verify their work, and the foundation model is used only for its computational power, the output becomes more accurate. He said Vianai’s Hila can perform mission-critical tasks this way, such as cutting financial reporting from 20 days of human labor to five minutes.
“For certain domains, when you surround the LLM with guardrails, with reliable approaches that are proven, then you are able to provide reliability in the overall system,” he said. “It’s not only us. A lot of systems out there work like that where they pair the LLM with another system which is able to ensure that the LLM has correctness. So we do that in our product Hila. We combine the LLM with a knowledge model for a particular domain and then, after that, Hila does not make mistakes.”
Sikka compared it to the structure Google uses to identify proteins that could be used to make medicines. Google’s AlphaFold has a custom LLM called Evoformer that creates candidate proteins, which are then fed into another “non imaginative” system that checks each configuration for flaws.
“And so anything that comes out of that has a much higher likelihood of being an actual protein, and then it repeats this cycle three times, and the outcome of that is pretty much guaranteed to be a protein for a particular situation,” Sikka said. “They have produced, I think 250,000 proteins that way, which, producing one protein used to take teams of scientists years to do that.”
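The general pattern Sikka describes, an imaginative generator paired with a strict checker and run for a few rounds, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Hila or AlphaFold is implemented: the ledger, the proposed totals standing in for model output, and every function name here are hypothetical.

```python
from typing import Callable, Iterable, Optional


def generate_and_verify(
    candidates: Iterable[int],
    verify: Callable[[int], bool],
    max_rounds: int = 3,
) -> Optional[int]:
    """Return the first candidate the verifier accepts, or None.

    At most max_rounds candidates are examined, so an unverified
    answer is never passed along as if it were correct.
    """
    for round_number, candidate in enumerate(candidates, start=1):
        if round_number > max_rounds:
            break
        if verify(candidate):
            return candidate
    return None


# Hypothetical domain data the checker can compute on exactly.
ledger = [120, 45, 310]

# Stand-in for an unreliable model's successive answers; the first is wrong.
proposed_totals = [470, 475, 480]


def total_is_correct(total: int) -> bool:
    # Deterministic check that does not rely on the generator at all.
    return total == sum(ledger)


result = generate_and_verify(proposed_totals, total_is_correct)
print(result)
```

The design choice that matters is that the verifier is independent of the generator: when no candidate passes within the allowed rounds, the sketch returns nothing rather than a guess.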
He continued, “As to ‘why?’ as a scientist you always have to try and