Can open source defeat Google and OpenAI in the AI arms race?
A leaked Google document revealed the technology giant’s concern that it could be losing the generative AI arms race to open source researchers
Imagine a world where developing and producing an iPhone, or any other advanced smartphone, doesn't require an expensive team of engineers and designers, a complex global supply chain with factories in several countries to manufacture screens, chips and sensors, or tens of thousands of workers to assemble the devices and ship them around the world.
Imagine a world where almost everyone can produce their own smartphone, one that is at least as advanced as those of Samsung and Google, and customized to their needs. The development of this device won’t be carried out by huge companies worth trillions of dollars but will be conducted by a small startup or, more likely, an open source researcher. Sounds far-fetched, right? But when it comes to the most advanced and discussed technology today - generative artificial intelligence - you don't have to imagine at all. This is not the reality of tomorrow, but of today.
This was confirmed in recent days with the publication of what is claimed to be a leaked internal Google document, entitled, "We have no moat, and neither does OpenAI." A moat in the context of technological developments refers to a limit or threshold that makes it difficult for competitors to catch up with a leading company in the field. This can be a financial, logistical, technological or other limitation. Smartphone manufacturers, for example, have a logistical and financial moat that makes it difficult for new players in the field to generate significant competition. But when it comes to generative AI, and in particular large language models like ChatGPT, the document claims that this moat does not exist at all.
The document was posted on a public Discord server, and Dylan Patel, chief analyst at research firm SemiAnalysis, confirmed that it was written by a Google employee. However, even if it wasn’t, the document offers some important insights, primarily that players like Google and OpenAI are not positioned to win the AI race. "While we’ve been squabbling, a third faction has been quietly eating our lunch," it reads. "I’m talking, of course, about open source. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today." Among the examples the writer cites are LLMs running on a phone and scalable personal AI.
According to the document, the advancements made by the open source community began in March when Meta's LLaMA model was leaked. This was the first time that the open source community gained direct access to a truly capable foundation model. "A tremendous outpouring of innovation followed," it reads, "anyone can tinker. Many of the new ideas are from ordinary people. The barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop."
According to the author, although Google and OpenAI models have a slight quality advantage, the gap is closing quickly. "Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months."
“We have no secret sauce,” it continues. “People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality.”
"For now, for daily commercial applications, solutions developed by the open source community are good enough," Matty Mariansky, an artist and entrepreneur in the field of AI, lecturer on Machine Learning for Designers at Bezalel Academy, and founder of the Rise of the Machines community on Facebook, told Calcalist. "They also don’t require you to give all of your private data to the owners of the model, who may choose to censor or limit the specific case that your startup wants to work on.
"The author of the document basically says that the notion that building a model would be such an expensive business that everyone would have to go to OpenAI or Google, was a mistake. The open source community, which has countless engineers and university researchers at its disposal, is schooling the big companies.” According to him, the open source community’s approach is smarter. “Instead of training a model that knows everything about everything, it's better to let it specialize. If you need a bot to answer customer service, why do you need it to know how to quote the works of James Joyce? You need to collect the data that already exists and make your little GPT a great expert in a tiny field."
This situation, Mariansky adds, creates quite a few difficulties for those who seek to regulate the field: "In terms of regulation, this is very bad news. It is very difficult to monitor models that run in-house.” It's all well and good when the model is owned by a law-abiding company, but bad actors can easily enter the fray and possess enormous power without responsibility. “If the EU thought it would set some rules for four or five big companies and solve the problem, now that idea has been shattered. Everyone will have their own AI, trained on god knows what, with the ability to do all kinds of things, and on a private server that is difficult to monitor. This also means that none of the big companies will stop. If before it was 'if we stop, the Chinese will continue and overtake us,’ now it's 'everyone will overtake us.’”
However, researchers in the field believe that even if the gaps are not relevant for existing applications, they will always remain, and that open source models will always lag behind the larger and more expensive models. Andrej Karpathy, one of the founders of OpenAI and former senior director of AI at Tesla, said that the recent boom in open source was made possible mainly thanks to the leak of Meta's model. "Pretraining LLM base models remains very expensive. Think: supercomputer + months," he tweeted. "But fine tuning LLMs is turning out to be very cheap and effective. Think: few GPUs + day, even for very large models.”
As far as the practical applications of generative AI are concerned, the large OpenAI and Google models do not have a substantial advantage that justifies choosing them over free open source models, especially when it comes to narrow and well-defined tasks, such as customer service, summarizing business documents or generating medical reports. But this is a short-term view. When you look at the future, ambitious applications such as making long-term investments in the stock market, managing complex international logistical systems, even dealing with and solving political problems and crises, could result in the creation of a moat by the bigger and better-funded players that the open source community has no chance of overcoming.