The large language model revolution is shifting state-of-the-art AI research out of the reach of ordinary AI laboratories. This concentrates the technology's power in a few big players, forces the research community to rely on their APIs, and removes the ability of external researchers to probe these models for safety concerns. It also represents an impractical road to scaling, one that leads to an innovation wall.
Dylan Patel of the consultancy SemiAnalysis says, “We won’t be able to make models bigger forever. There comes a point where even with hardware improvements, given the pace that we’re increasing the model size, we just can’t.” And so the study and development of smaller models now matter more than ever.
Last year, DeepMind showed (and researchers at Meta, Nvidia, and Stanford confirmed) that training smaller models on far more data can significantly boost performance. Patel also points to the promising “mixture of experts” technique, which trains smaller, specialized sub-models for different tasks rather than using one large, general model. He and Sara Hooker, research leader at Cohere For AI, also discuss exploiting the sparsity of models to compress them, by finding ways to remove parameters that carry little or no information.
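To make the sparsity idea concrete, here is a minimal sketch of magnitude pruning, one common way to zero out low-value parameters so a model can be stored and run more cheaply. It assumes PyTorch and a hypothetical toy network; it is illustrative only, not a description of any specific system Patel or Hooker work on.

```python
# Illustrative sketch of magnitude pruning (assumes PyTorch; the model,
# threshold, and pruning fraction are hypothetical placeholders).
import torch
import torch.nn as nn

def prune_by_magnitude(model: nn.Module, amount: float = 0.5) -> nn.Module:
    """Zero out the smallest-magnitude weights in each linear layer."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            weights = module.weight.data
            # Find the magnitude below which `amount` of the weights fall.
            threshold = torch.quantile(weights.abs().flatten(), amount)
            # Zero those weights; sparse storage or structured pruning would
            # then be needed to actually shrink the memory footprint.
            weights[weights.abs() < threshold] = 0.0
    return model

# Example: prune half the weights of a toy two-layer network.
toy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
pruned = prune_by_magnitude(toy, amount=0.5)
zeros = sum((p == 0).sum().item() for p in pruned.parameters() if p.dim() > 1)
total = sum(p.numel() for p in pruned.parameters() if p.dim() > 1)
print(f"Fraction of zeroed weights: {zeros / total:.2f}")
```

In practice, research systems combine pruning with retraining or more structured sparsity patterns; this sketch only shows the core intuition of discarding near-zero parameters.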
Still, Patel concedes that the large-model path has its necessary place in research and development. “The max size is going to continue to grow, and the quality at small sizes is going to continue to grow,” he says. “I think there’s two divergent paths, and you’re kind of following both.”