How your company could be tomorrow’s surprise GenAI leader

Generative AI may have only recently hit the mainstream, but the economics of GenAI already point to a shift in the industry’s balance of power, away from the dominant tech giants. Companies like OpenAI, Alphabet's Google, and Meta continue to plough resources into generalist models of extraordinary power and size (so-called “foundation models”), and they will remain at the forefront of technological innovation in GenAI. Yet the driving economic force of the B2B GenAI industry is set to move “downstream,” towards smaller, more cost-efficient models tailored to specific business purposes. The impetus for this shift is the growing demand for high-performing GenAI systems that are cheaper to use than the large language and multimodal models (LLMs and LMMs) of today, such as OpenAI’s GPT-4 or Google’s Gemini.

What many business leaders don’t fully appreciate is that this shift will open up tremendous opportunity even for companies that, today, are not tech players at all—provided they have the right data. That’s why industry leaders of all types should be asking themselves whether their data might put them in a position to become influential players in the GenAI industry, rather than mere consumers of the technology.

The ‘cost of inference’ problem

Demand for GenAI models has exploded over the last year, so much so that OpenAI’s ChatGPT, in addition to being “the fastest-growing consumer application in history,” is now estimated to have reached over $100 million in run-rate revenue from its enterprise service. According to a recent BCG survey of more than 1,400 C-suite executives worldwide, 85% of business leaders plan to increase spending on AI, including GenAI, in 2024. But as more companies find helpful applications and make the requisite investments, overall demand for GenAI services runs into a serious constraint: the cost of using the technology.

Today’s generalist foundation models, like GPT-4, Gemini, and Anthropic’s Claude, carry substantial marginal costs for every query (unlike, for example, search engines). Very roughly speaking, for each additional word of response from an LLM, the entire cumulative conversation needs to be passed through the entire model. The cost of this inference—which is how an LLM or LMM generates its output—is therefore roughly proportional to the number of parameters in the model multiplied by the number of words in the conversation.

GPT-3.5, for instance, was built using more than 175 billion parameters, and that number is believed to have grown to 1.75 trillion for GPT-4. At such scale, the cost of using these generalist models—driven by energy use and the amortized fixed cost of operating cloud facilities—can rapidly become astronomical, especially if the model is not used strategically. How costly are we talking? In working with our clients, we have seen that, depending on the user’s skill with prompt engineering, a chat can easily accumulate tens of thousands of tokens (or word-parts), costing from a few cents to a dollar or more per query.
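To make that arithmetic concrete, here is a minimal back-of-envelope sketch in Python of how a long chat session’s inference cost compounds. The message sizes and per-token prices are illustrative assumptions, not any provider’s published rates.

```python
# Back-of-envelope estimate of a chat session's inference cost.
# All numbers below are illustrative assumptions, not real prices.

def turn_cost(prompt_tokens: int, response_tokens: int,
              usd_per_1k_prompt: float, usd_per_1k_response: float) -> float:
    """Cost of a single chat turn, priced per 1,000 tokens."""
    return (prompt_tokens / 1000) * usd_per_1k_prompt + \
           (response_tokens / 1000) * usd_per_1k_response

history = 0        # tokens of accumulated conversation so far
total_cost = 0.0
total_tokens = 0
for turn in range(12):                    # a dozen user exchanges
    prompt = history + 200                # full history resent + new message
    response = 300                        # assumed reply length
    total_cost += turn_cost(prompt, response, 0.01, 0.03)  # assumed rates
    history = prompt + response           # the history grows every turn
    total_tokens += prompt + response

print(f"tokens processed: {total_tokens:,}, estimated cost: ${total_cost:.2f}")
```

Because the full history is resent on every turn, token volume grows roughly quadratically with the length of the session: here a dozen exchanges already consume tens of thousands of tokens and cost about half a dollar under the assumed rates.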

Even if the cost of computation continues to decline, as it has consistently done so far, the volume of inference demanded by users will grow as companies become better acquainted with, and more comfortable deploying and scaling, GenAI solutions. And that demand will truly explode as autonomous agents begin to deliver on their promise of automating entire end-to-end workflows. As a result, even with declining cost per compute, actual spend on model inference is likely to rise. In fact, compute spend is soon set to outrun personnel costs at large tech companies (and there is speculation this may already be the case at Google). That’s why the cost of inference is poised to become (if it isn’t already) the binding constraint on large-scale adoption of GenAI.

The possibilities of modularity

The good news is that GenAI is a modular technology. In the earliest days of a new, transformative technology, the premium is on vertical integration; as the architecture matures, innovation shifts to its evolving component parts, or modules. By virtue of what economists call the “mirroring hypothesis,” this evolution will ultimately lead to the modularization of the GenAI industry as a whole. The economic consequence of modularity is that it redistributes the industry profit pools currently clustered around the foundation models and the tech companies that made them, creating multiple loci of innovation along the value chain.

We saw this dynamic play out in the evolution from the monolithic mainframe computer to the modular PC. Once IBM’s System/360 series introduced a modular architecture for mainframes, along with Application Programming Interfaces (APIs) that enabled interoperability among modules, the industry fragmented, with different companies focusing on different modules of the technology. IBM was unable to stop others from exploiting those APIs to create what began as “IBM-compatible” products and later evolved into PCs assembled entirely from modules built by competitors. The winners were companies focused on modules with inherent scale economies, most obviously Microsoft and Intel. Over time, innovation spurred by modularization enabled massive industry growth and huge improvements in cost and performance.

A similar pattern is currently taking hold in the GenAI industry, and some of the major players are already anticipating what comes next. OpenAI, for instance, has announced an app store of its own where new models will be marketed and purchased. By positioning itself as a platform for others to develop specialized apps based on its foundation models, OpenAI is acknowledging the prospect of fragmented value creation downstream from its state-of-the-art foundation LLMs. It is also attempting to secure a portion of the redistributed profit pools that result from an increasingly modular industry.

GenAI goes smaller but mightier

Growing demand for GenAI use will create mounting pressure to bring down the cost of inference. And because the industry is becoming increasingly modular, this pressure creates significant opportunity for companies able to develop small, high-performing specialist models. These smaller models can, collectively, decouple performance from the cost of inference, unleashing GenAI adoption at scale.

One method for attaining high-performing specialist models is to build them from scratch, designing them to be small compared to the gargantuan generalist LLMs and LMMs that have captured the public imagination over the last year. One way to do this is by reducing the number of parameters, often by means of distillation (a technique whereby the small model is trained via automated, focused interactions with a larger model). For example, Chinese startup 01.AI recently released a “small” LLM that outperformed peers with more than five times as many parameters. Microsoft has taken the same approach with Phi, its self-described “suite of small language models,” several of which also outperform much larger peers (despite having as few as 1.3 billion parameters). And Bloomberg leveraged the open-source LLM BLOOM to build BloombergGPT with less than one-third as many parameters.
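For readers curious what distillation looks like in practice, here is a minimal sketch in PyTorch. The toy teacher and student networks, the random inputs, the temperature, and the training loop are all illustrative assumptions; a real deployment would distil an actual large model on real prompts.

```python
# Minimal sketch of knowledge distillation: a small "student" model is
# trained to match the output distribution of a larger, frozen "teacher".
# Toy networks and random data stand in for real models and corpora.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, D_IN, T = 100, 32, 2.0   # T is the softmax temperature

# Large teacher (frozen) and much smaller student -- toy stand-ins.
teacher = nn.Sequential(nn.Linear(D_IN, 1024), nn.ReLU(), nn.Linear(1024, VOCAB))
student = nn.Sequential(nn.Linear(D_IN, 64), nn.ReLU(), nn.Linear(64, VOCAB))
teacher.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(200):
    x = torch.randn(64, D_IN)             # stand-in for real prompts
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=-1)
    log_probs = F.log_softmax(student(x) / T, dim=-1)
    # KL divergence between teacher and student distributions,
    # scaled by T^2 as in the classic distillation recipe.
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T**2
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distillation loss: {loss.item():.4f}")
```

Note that the student never sees ground-truth labels here: it learns only to reproduce the teacher’s output distribution, which is what allows a far smaller network to approximate the larger model’s behavior on the targeted tasks.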

A second approach is fine-tuning, which is sometimes combined with distillation. Fine-tuning is the process of retraining a foundation model (whether an LLM or LMM) on specialized data, adjusting its weights or adding new layers to the model. Retraining parts of the model on specialized data leads to improved performance on a specific set of tasks. Fine-tuning has the advantage of often being relatively inexpensive and fast. Vicuna, a model trained by fine-tuning Meta’s LLaMA, reportedly achieves 90% of the quality of ChatGPT and Google Bard with “just” 13 billion parameters and a total retraining cost of $300. Microsoft fine-tuned LLaVA to create LLaVA-Med, a conversational assistant for biomedical image processing, in just a single day. Many companies, such as Intuit, have already turned to fine-tuning to deploy GenAI solutions.
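As a rough illustration of why fine-tuning can be so cheap, here is a minimal PyTorch sketch of parameter-efficient fine-tuning in the spirit of low-rank adaptation (LoRA). The frozen layer, adapter rank, and random “specialized” data are illustrative assumptions, not the recipe behind any of the models named above.

```python
# Minimal sketch of fine-tuning with a low-rank adapter: the pretrained
# "foundation" weights stay frozen, and only a small added layer is
# trained on specialized data. Toy dimensions and random data are used.
import torch
import torch.nn as nn

torch.manual_seed(0)
D, RANK, VOCAB = 256, 8, 100

base = nn.Linear(D, VOCAB)          # stand-in for a pretrained layer
base.requires_grad_(False)          # foundation weights stay frozen

# Low-rank adapter: only D*RANK + RANK*VOCAB weights are trainable,
# a small fraction of the D*VOCAB weights in the frozen base layer.
lora_a = nn.Linear(D, RANK, bias=False)
lora_b = nn.Linear(RANK, VOCAB, bias=False)
nn.init.zeros_(lora_b.weight)       # adapter starts as a no-op

def forward(x: torch.Tensor) -> torch.Tensor:
    return base(x) + lora_b(lora_a(x))

params = list(lora_a.parameters()) + list(lora_b.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for step in range(200):
    x = torch.randn(64, D)                  # "specialized" inputs
    y = torch.randint(0, VOCAB, (64,))      # domain-specific labels
    loss = loss_fn(forward(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"adapter-only fine-tuning loss: {loss.item():.4f}")
```

Because only the adapter weights receive gradients, the trainable parameter count (and hence the compute bill) is a small fraction of the full model’s, which is how a project like Vicuna can retrain a capable model for a few hundred dollars.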

Fine-tuning tends to reduce the cost of inference by making it possible to rely on smaller ‘seed’ models than the most powerful LLMs and LMMs, while matching or exceeding their performance for specific business purposes. However, fine-tuning does require access to specialized data that is typically proprietary—data that the tech giants for the most part don’t have access to.

As a result, economic activity will tend to shift downstream in the GenAI value chain, drawing in an expanding set of players—those with the best data in a specific domain—to either partner with tech firms or directly fine-tune models themselves. Businesses must therefore examine what data they should be collecting (and aren’t) in order to fine-tune models that they can not only use themselves but also monetize as services for third parties. Especially with the rise of multimodal GenAI (encompassing text as well as images, video, and even sensor data from machines), the range of valuable data is far greater than many leaders realize.

The new power dynamics of the GenAI industry

Generalist foundation models will remain in the hands of a handful of very large and powerful tech players because of their extraordinary scale and cost. At the same time, demand for smaller, specialized applications will unleash the innovative potential of GenAI’s modular architecture. Consumer companies can leverage Internet of Things (IoT) datasets to build specialized models for product design; businesses with complex supply chains can capitalize on their logistics data to develop solutions for third parties. The company that, say, makes your dishwasher or manufactures your car could be the next big thing in GenAI. In short, the best specialized GenAI models may come from companies that are not “tech” businesses at all today.

The Big Tech companies are, of course, aware of all this and are aggressively pursuing downstream alliances to access the proprietary data they lack. The companies that own that data are, however, in a superior bargaining position—though they may not know it. These downstream companies can play junior partner in an alliance with a big tech firm, or they can seize the business opportunity themselves, taking the lead in teaming up with smaller and open-source developers. This second, more intrepid approach requires a company to dramatically increase its expertise in smaller GenAI models while investing seriously in internal data science and engineering capabilities.

We are, therefore, at an inflection point in the economics of generative AI. What has so far been a massively scale-driven, upstream story, in which the typical corporation is a mere bystander, is becoming something far more downstream and decentralized, upending the current balance of power in the GenAI industry.

Read other Fortune columns by François Candelon.

François Candelon is a managing director and senior partner of Boston Consulting Group and the global director of the BCG Henderson Institute (BHI).

Philip Evans is a senior advisor at the BCG Henderson Institute. He was the founding managing director and partner of BCG’s media and internet sectors.

Leonid Zhukov is the director of the BCG Global A.I. Institute and is based in BCG’s New York office.

David Zuluaga Martínez is a partner at BCG and an Ambassador at the BCG Henderson Institute.

Some of the companies featured in this column are past or current clients of BCG.

This story was originally featured on Fortune.com