AI pioneer Cerebras opens up generative AI where OpenAI goes dark
The world of artificial intelligence, especially the wildly popular corner of it known as “generative AI,” which creates writing and images automatically, is at risk of closing its horizons because of the chilling effect of companies deciding not to publish the details of their research.
But the turn to secrecy may have prompted some participants in the AI world to step in and fill the void of disclosure.
On Tuesday, AI pioneer Cerebras Systems, maker of a dedicated AI computer and of the world’s largest computer chip, published as open source several versions of generative AI programs for use without restriction.
The programs are “trained” by Cerebras, meaning, brought to optimal performance using the company’s powerful supercomputer, thereby reducing some of the work that outside researchers have to do.
“Companies are making different decisions than they made a year or two ago, and we disagree with those decisions,” said Cerebras co-founder and CEO Andrew Feldman in an interview with ZDNET, alluding to the decision by OpenAI, the creator of ChatGPT, not to publish technical details when it disclosed its latest generative AI program this month, GPT-4, a move that was widely criticized in the AI research world.
Also: With GPT-4, OpenAI opts for secrecy versus disclosure
“We believe an open, vibrant community — not just of researchers, and not just of three or four or five or eight LLM guys, but a vibrant community in which startups, mid-size companies, and enterprises are training large language models — is good for us, and it’s good for others,” said Feldman.
The term large language model refers to AI programs based on machine learning principles in which a neural network captures the statistical distribution of words in sample data. That process allows a large language model to predict the next word in sequence. That ability underlies popular generative AI programs such as ChatGPT.
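To make the idea of next-word prediction concrete, here is a minimal sketch that counts which word follows which in a sample sentence and predicts the most frequent follower. It is a toy word-pair counter, not a neural network, and it is not how Cerebras or OpenAI build their programs; the function names and the sample text are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the sample text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical sample data, chosen only to illustrate the idea.
sample = "the cat sat on the mat and the cat slept"
counts = train_bigram_counts(sample)
print(predict_next(counts, "the"))  # prints "cat"
```

A large language model replaces the simple counts with billions of learned neural weights and considers far more context than one preceding word, but the task is the same: given what came before, estimate what word is likely to come next.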
The same kind of machine learning approach pertains to generative AI in other fields, such as OpenAI’s DALL-E, which generates images based on a suggested phrase.
Also: The best AI art generators: DALL-E2 and other fun alternatives to try
Cerebras posted seven large language models that are in the same style as OpenAI’s GPT program, which began the generative AI craze back in 2018. The code is available on the Web site of AI startup Hugging Face and on GitHub.
The programs vary in size, from 111 million parameters, or neural weights, to thirteen billion. More parameters make an AI program more powerful, generally speaking, so that the Cerebras code affords a range of performance.
The company posted not just the programs’ source, in Python and TensorFlow format, under the open-source Apache 2.0 license, but also the details of the training regimen by which the programs were brought to a developed state of functionality.
That disclosure allows researchers to examine and reproduce the Cerebras work.
The Cerebras release, said Feldman, is the first time a GPT-style program has been made public “using state-of-the-art training efficiency techniques.”
Other published AI training work has either concealed technical data, as with OpenAI’s GPT-4, or has produced programs that were not optimized in their development, meaning that the amount of data fed to the program was not adjusted to the size of the program, as explained in a Cerebras technical blog post.
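As a rough illustration of what adjusting the data to the size of the program can mean, the sketch below scales the number of training tokens in proportion to the parameter count. The ratio of 20 tokens per parameter is an assumption here, a rule of thumb associated with DeepMind’s published scaling work; the article itself does not state the ratio Cerebras used, and the function name is hypothetical.

```python
# Assumed rule of thumb (not stated in the article): about 20 training
# tokens for every model parameter.
TOKENS_PER_PARAMETER = 20


def compute_optimal_tokens(num_parameters, ratio=TOKENS_PER_PARAMETER):
    """Estimate how many training tokens suit a model of the given size."""
    return num_parameters * ratio


# The smallest and largest parameter counts mentioned in the article.
for name, params in [("smallest", 111_000_000), ("largest", 13_000_000_000)]:
    tokens = compute_optimal_tokens(params)
    print(f"{name}: {params:,} parameters -> {tokens:,} training tokens")
```

Under that assumed ratio, the 111-million-parameter program would see about 2.2 billion tokens and the 13-billion-parameter program about 260 billion.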
Such large language models are notoriously compute-intensive. The Cerebras work released Tuesday was developed on a cluster of sixteen of its CS-2 computers, machines the size of dormitory refrigerators that are tuned specially for AI-style programs. The cluster, previously disclosed by the company, is known as its Andromeda supercomputer, which can dramatically cut the work of training LLMs relative to using thousands of Nvidia’s GPU chips.
Also: ChatGPT’s success could prompt a damaging swing to secrecy in AI, says AI pioneer Bengio
As part of Tuesday’s release, Cerebras offered what it said was the first open-source scaling law, a benchmark rule, based on open-source data, for how the accuracy of such programs increases with their size. The data set used is the open-source The Pile, an 825-gigabyte collection of texts, mostly professional and academic texts, introduced in 2020 by non-profit lab Eleuther.
Prior scaling laws from OpenAI and Google’s DeepMind used training data that was not open-source.
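A scaling law of this kind is usually expressed as a curve fitted to measurements taken at several model sizes. The sketch below assumes a simple power-law form, loss = a × size^(−b), which is common in published scaling-law work but is not quoted in the article. The data points are synthetic, generated from a known curve purely so the fit can be checked; they are not Cerebras results.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-b) in log-log space.

    Returns the pair (a, b).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic points generated from a = 10, b = 0.1 (illustrative only).
sizes = [1e8, 1e9, 1e10]
losses = [10 * s ** -0.1 for s in sizes]

a, b = fit_power_law(sizes, losses)
print(f"fitted a = {a:.3f}, b = {b:.3f}")
```

With real measurements, a fitted curve like this lets researchers estimate how well a larger program should perform before paying to train it, which is why publishing both the law and the data behind it matters.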
Cerebras has in the past made the case for the efficiency advantages of its systems. The ability to efficiently train the demanding natural language programs goes to the heart of the issues of open publishing, said Feldman.
“If you can achieve efficiencies, you can afford to put things in the open source community,” said Feldman. “The efficiency enables us to do this quickly and easily and to do our share for the community.”
A primary reason that OpenAI, and others, are starting to close their work off to the rest of the world is that they must guard the source of profit in the face of AI’s rising cost to train, he said.
Also: GPT-4: A new capacity for offering illicit advice and displaying ‘risky emergent behaviors’
“It’s so expensive, they have decided it’s a strategic asset, and they have decided to withhold it from the community because it’s strategic to them,” he said. “And I think that’s a very reasonable strategy.
“It’s a reasonable strategy if a company wishes to invest a great deal of time and effort and money and not share the results with the rest of the world,” added Feldman.
However, “We think that makes for a less interesting ecosystem, and, in the long run, it limits the rising tide” of research, he said.
Companies can “stockpile” resources, such as data sets, or model expertise, by hoarding them, observed Feldman.
Also: AI challenger Cerebras assembles modular supercomputer ‘Andromeda’ to speed up large language models
“The question is, how do these resources get used strategically in the landscape,” he said. “It’s our belief we can help by putting forward models that are open, using data that everyone can see.”
Asked what may come of the open-source release, Feldman remarked, “Hundreds of distinct institutions may do work with these GPT models that might otherwise not have been able to, and solve problems that might otherwise have been set aside.”