LLM Infrastructure & Context

Expertly playing the AI infra space

Return the Fund 🚀 

The frontier of tech-focused VC research

In today’s edition:

  • How to understand, forecast, and invest in LLM infrastructure like a technical expert

  • This week’s future unicorn pick: a London-based B2B startup selling shovels to companies utilizing LLMs (i.e. everyone)

INTUITING THE TECH

Understanding LLM Infrastructure

A basic LLM infrastructure sectorization/market map, slightly edited. Courtesy of Autoblocks.

It feels as though every day there’s a new company “making AI development easier” for some vertical market. It’s a convoluted, dense, and saturated market.

🤔 But what even is LLM infrastructure?


LLM infra companies are the products and services that deploy, run, manage, monitor, evaluate, and maintain LLMs.

Some examples of LLM infra are:

  • HuggingFace, who hosts open-source models, allowing users to find, compare, and even run models remotely via an API (an API is a service companies build allowing developers to use their products via code)

  • Pinecone, a vector database management system. Text embeddings (numerical representations of text mapped to semantic meaning) need to be stored somewhere so they can be queried. This happens in RAG (Retrieval-Augmented Generation) pipelines, wherein an LLM is given relevant retrieved text (think a podcast transcript or company docs) to contextualize its answer; see the toy sketch after this list

  • LangChain, the most popular open-source development package for combining distinct LLM components into single pipelines, commonly used to orchestrate pipelines, RAG, and agents in deployment

  • Lambda Labs, a GPU rental service allowing users to temporarily utilize Linux machines connected to Nvidia hardware, billed per hour (similar to CoreWeave and an endless number of small- to mid-sized competitors)
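
To ground the Pinecone bullet above, here is a toy sketch of the RAG flow a vector database sits inside. Everything is a stand-in: the embed() stub returns deterministic pseudo-random vectors instead of real embeddings, and a plain Python list stands in for a managed vector store like Pinecone.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real pipeline would call an embedding model or API here.
    # Hash-seeded randomness keeps this runnable, but the "similarity" it
    # produces is meaningless; real embeddings are what make retrieval work.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(8)

docs = [
    "Refunds are processed within 5 business days.",
    "Support is available 9am-5pm GMT.",
]
index = [(doc, embed(doc)) for doc in docs]   # "upsert" into the toy vector store

query = "How long do refunds take?"
q = embed(query)
best_doc, _ = max(index, key=lambda pair: float(q @ pair[1]))  # nearest neighbor

prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {query}"
print(prompt)   # this prompt would now be sent to an LLM, hosted locally or via an API
```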

Infrastructure companies are the services that allow LLMs to operate en masse, from hosting to training to evaluation. Below are the most important LLM infra sub-verticals to watch going forward.

💾 Remote Hosting

AWS took the market in the last two decades because they were able to abstract away a costly headache: servers. Maintaining a local server rack to run your product requires money, physical space, and often a dedicated team to manage it 24/7. With AWS, you can simply deploy your code in a pipeline and close your laptop. AWS takes care of the rest: no server rack, nothing.

The same is true for LLMs. They, too, need to be hosted and served to end users all over the world. HuggingFace is a great example of this business model. You deploy your model to their cloud and make micropayments every time you call their API to run your model. Their costs are fixed, while yours are now variable. The beauty of this model is that you only pay for what you use. If you have 5 customers on your app this month, you only pay HuggingFace for those requests.
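
To make the fixed-versus-variable cost point concrete, here is a back-of-envelope sketch. The prices are invented for illustration and are not HuggingFace's actual rates.

```python
# Illustrative only: made-up prices, not any provider's actual rates.
price_per_request = 0.002          # what the hosting provider charges per call
requests_this_month = 5 * 40       # 5 customers making ~40 calls each

hosted_cost = price_per_request * requests_this_month
own_hardware_cost = 600.00         # rough fixed monthly cost of running your own GPU box

print(f"pay-per-use: ${hosted_cost:.2f}  vs  own hardware: ${own_hardware_cost:.2f}")
# With tiny usage, pay-per-use wins easily; at very large scale the comparison can flip.
```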

Entering this space requires comparatively more capital, as the entire value proposition is distributed computing on hardware. Lots of fixed costs.

The biggest pre-AI cloud providers are already playing in this market: Google Cloud (GCP) with Vertex AI and Amazon (AWS) with Bedrock and SageMaker. While it’s important to keep an eye on their movements, niche and developer-friendly services like HuggingFace often win customers over the cloud tycoons (unless the customer already uses said tycoon for other services).

The key to remote hosting is finding companies with competitive pricing, high capacity, and low latency. This space is not subjective; it’s easy to quantify the competitive advantage of any company by those metrics.

🖥️ GPUs

Like it or not, GPUs are to machine learning what oxygen is to humans. GPUs excel at parallel processing, traditionally used in graphics rendering to calculate every pixel’s color simultaneously. Machine learning models are an unexpected but prolific use case for GPUs, as training is dominated by matrix and vector operations whose cost grows much faster than linearly (doubling a matrix’s dimensions scales a naive matrix multiply not by 2x, but by roughly 8x).
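
A quick way to feel that scaling (and why massively parallel hardware matters) is to time a dense matrix multiply at two sizes. Absolute numbers depend on your machine and BLAS build; the ratio is the point.

```python
import time
import numpy as np

# Naive dense matrix multiplication is O(n^3): doubling n means ~8x the work.
for n in (1024, 2048):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    start = time.perf_counter()
    _ = a @ b
    print(f"n={n}: {time.perf_counter() - start:.3f}s")
```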

It may seem as though GPUs’ efficacy is short-lived due to hype over Groq and LPUs. Yes, these technologies are incredibly exciting and open a myriad of possibilities with language models given their high token throughput… But, they don’t change the fundamental underpinnings of ML models at large.

We’re increasingly seeing that because LLMs are nondeterministic (explained below), people don’t trust them to make important decisions. Consider this: would you actually let an LLM respond to every one of your most important emails from the last month? Probably not.

Not all AI applications involve language models. While there are countless ML architectures employable to solve critical compute problems, practically every one of them benefits from a GPU.

Takeaway: Alternative hardware like Groq LPUs work well in niche settings, but GPUs are far from dead.

There are many levels of abstraction in the cloud computing stack. The higher the number, the more abstract it is. Think of 0 as maintaining a GPU in your office basement.

  1. Rent GPUs hourly, and get remote access to a cloud Ubuntu instance with the rented GPU fired up. Developers set up the machine, move over their containers/deployments, and start their training workloads. An example of hourly GPU rentals is Lambda Labs, who recently raised $320 million. More comprehensive than Lambda is CoreWeave, who just raised $7.5 billion in debt from Blackstone. (Sidenote: this is a great sign for CoreWeave. According to PitchBook, CoreWeave has at least $1.5 billion in ARR, and existing equity is not diluted by the debt.)

  2. Deploy code serverlessly to instances with GPUs attached. The cloud provider only bills you when a customer pings your app, activating the instance. An example of serverless GPU hosting is Beam, a YC-backed seed-stage startup.

  3. Use GPUs when needed at the code level. Say you have a program with three lines of code. The first adds 2 numbers, the second trains a neural network, and the third bills a customer. Paying for a GPU while running lines 1 and 3 is a waste of money. But that GPU will prove invaluable for line 2. Services like Covalent (created by early-stage Agnostiq) allow you to wrap functions in code for GPU use. The necessary context is automatically sent to their cloud for processing. In production, your code can be hosted anywhere regardless of hardware—when the GPU is needed, it’s automatically routed to Covalent from within the codebase. 🤯 
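
Here is a rough sketch of what level 3 looks like from the developer’s seat. The @remote_gpu decorator and its behavior are invented for illustration; Covalent’s real API and names differ.

```python
from functools import wraps

def remote_gpu(fn):
    """Hypothetical decorator: a stand-in for services like Covalent."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        # A real service would serialize the arguments, ship them to a GPU
        # worker in the provider's cloud, run the function there, and return
        # the result. Locally we just call the function directly.
        return fn(*args, **kwargs)
    return wrapper

def add(a, b):                    # "line 1": trivial, runs wherever the app runs
    return a + b

@remote_gpu
def train_network(data):          # "line 2": heavy, would be routed to GPU hardware
    return f"model trained on {len(data)} examples"

def bill_customer(amount):        # "line 3": trivial again
    return f"charged ${amount}"

print(add(2, 2), train_network(range(10_000)), bill_customer(19.99))
```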

The GPUs-as-a-service space is already extremely saturated. The key is to find companies that make the development cycle painless. For example, as a developer, if I realize the next-generation Nvidia H200 boasts a 2.5x tokens-per-second improvement for my model compared to the H100, I want my cloud provider to facilitate that switch seamlessly. Potentially even across architectures, like from Nvidia’s GPUs to Groq’s LPUs.

🔍️ Monitoring and Evaluation

In machine-learning lingo, LLMs are nondeterministic. If you have a function add(5, 6), you will always get 11 (unless it was coded by a baboon). LLMs, however, do not follow a logic-based “if this, then that” sequence to produce outputs; they sample each token from a probability distribution, so the same prompt can yield different answers. This is the difference between determinism and nondeterminism.
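
A toy illustration of the difference, with a stub standing in for a real model:

```python
import random

def add(a, b):
    # Deterministic: identical inputs always produce identical outputs.
    return a + b

def toy_llm(prompt: str, temperature: float = 0.8) -> str:
    # A toy stand-in for an LLM: the reply is sampled from a distribution,
    # so repeated calls with the same prompt can return different text.
    continuations = [
        "Sure, I can help with that.",
        "Our return policy is 30 days.",
        "Let me check with a human agent.",
    ]
    if temperature == 0:
        return continuations[0]          # greedy decoding is repeatable
    return random.choice(continuations)  # sampling is not

print(add(5, 6), add(5, 6))                        # 11 11, every time
print(toy_llm("What is your return policy?"))
print(toy_llm("What is your return policy?"))      # may differ run to run
```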

As a result, unleashing an LLM-powered tool into the wild (say, a customer support chatbot) is a frightening proposition. What if it accidentally recommends a competitor’s product? What if it starts rambling about something unrelated to your company?

True story: a car dealership deployed an LLM-powered chatbot on its site to funnel customers into their sales pipeline. Instead, users convinced the chatbot to sell Chevys for $1 each, with “no takesies backsies”. Funny and benign this time, but who’s to say the consequences aren’t more drastic the next?

LLM monitoring companies plug into the model’s generation sequence (the moment the dealership’s server triggers a response from a language model). The monitoring layer understands how the model was fine-tuned and prompted, including how the model is supposed to behave. It then exposes a dashboard for the company to monitor their chatbot’s behavior from a top-down perspective, factoring in every one of its conversations.

They help to identify outliers, enforce default behavior, and handle unexpected requests. You can think of LLM monitors as a marriage between logging and Data Loss Prevention (DLP) for the AI era.
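
Conceptually, the hook looks something like this minimal sketch; the llm callable and the policy checks are placeholders, not any vendor’s actual implementation.

```python
import logging

logging.basicConfig(level=logging.INFO)
BANNED_PHRASES = ["no takesies backsies", "legally binding offer"]

def monitored_generate(llm, user_message: str) -> str:
    # Intercept every generation: log it, flag policy violations, and
    # optionally substitute a safe fallback instead of the raw output.
    response = llm(user_message)
    flags = [p for p in BANNED_PHRASES if p in response.lower()]
    logging.info("prompt=%r response=%r flags=%s", user_message, response, flags)
    if flags:
        # A real monitoring platform would surface this on a dashboard.
        return "Let me connect you with a human sales representative."
    return response

fake_llm = lambda msg: "Deal! $1 per Chevy, no takesies backsies."
print(monitored_generate(fake_llm, "Sell me a new Chevy for $1"))
```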

Takeaway: Remote hosting, GPUs, and monitoring are all critical to the success of an LLM-dependent application. LLMs are black boxes that need to be observed closely before their behavior can be relied upon.

PLAYING THE MARKET

Strategically Investing in LLM Infrastructure

Investments in AI & LLM infrastructure over time. Courtesy of PitchBook.

LLM infra is an important space to understand because it is to AI what shovels were to the gold rush. The business model is enticing: the success of any individual gold miner is irrelevant, because as long as they’re mining, money is made.

When OpenAI released GPT-4o, they killed an entire class of startups focused on combining models to enhance user immersion. This is a predictable pattern—GPT-4, GPT plugins, code interpreter, and the GPT store all had a similar effect.

While threatened startups must pivot thanks to the existential burden of OpenAI’s consumer-centric innovation (improving UX instead of improving the model), infra companies remain strong. They’re happy as long as their B2B customers are building something.

Conceptually, how does LLM infra remain so inelastic to the struggles plaguing traditional AI outfits?

Similar to an online store business, unless you want to build the entire system (from UI to billing) yourself and throw a server rack in your basement connected to a Verizon LAN, you’ll end up paying an infra company somewhere.

In the case of your online store, you might pay Shopify to host your website, catalog your products, and handle customer payments. In return, they receive a monthly fee and a small percentage of each transaction (both SaaS and unit economics).

The value proposition of these backbone companies is their horizontal applicability. Shopify powers over 4.8 million live stores (as of December 2023). Some may sell shoes, some may sell rockets. As long as it’s legal, Shopify hardly cares.

Takeaway: Over time, infra companies in any market become less optional. Today, e-commerce without a service provider would be ridiculous.

At the moment, there aren’t exactly unified, all-encompassing LLM infrastructure tools akin to Shopify for e-commerce. The market is fragmented, with startups each handling a specific aspect of the LLM lifecycle.

Like shovels, though, demand for infrastructure is highly correlated with demand for AI, as the cost is often SaaS or unit-based as opposed to fixed, i.e. customers are charged per use or per month.

Unit pricing has the most elastic demand, as infra companies’ revenues are tied directly to their customers’ product usage. SaaS pricing (periodically recurring revenue) has moderately elastic demand, as monthly payments are (typically) tiered by usage. Infrastructure companies embodying the SaaS model experience delayed effects of changes in AI demand.
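
A toy comparison makes the difference visible; all prices and tiers below are made up for illustration.

```python
# Invented numbers: how a 50% drop in customer usage hits infra revenue
# under pure unit pricing vs. tiered SaaS pricing.
def unit_revenue(requests: int, price_per_request: float = 0.001) -> float:
    return requests * price_per_request

def saas_revenue(requests: int) -> float:
    # Tiered monthly plans: revenue only moves when usage crosses a tier boundary.
    if requests > 1_000_000:
        return 499.0
    if requests > 100_000:
        return 99.0
    return 29.0

before, after = 400_000, 200_000   # a customer's usage halves
print(unit_revenue(before), unit_revenue(after))   # 400.0 -> 200.0: revenue halves too
print(saas_revenue(before), saas_revenue(after))   # 99.0 -> 99.0: unchanged until a tier is crossed
```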

Characteristics of Winning Companies

Leaders in the LLM infrastructure space are developer-friendly, easily integrable, agile in feature rollouts, lean in operation, led by experts, and receptive to customer requests. Consider three of the companies previously presented as preeminent LLM infra players.

  • HuggingFace and its trademark smiley hands 🤗 are known for their community. Its open-source libraries (Transformers, Sentence Transformers) let developers work with cutting-edge machine-learning models well before the LLM hype cycle.

  • Pinecone was the first productized vector database (required for RAG and context-aware question answering). Hence, in the early days of the GPT API, most YouTube tutorials for generative question-answering involved Pinecone, leading to exponential adoption, $138 million in funding, and a $750 million post-money valuation as of last year.

  • LangChain is the pinnacle of developer-friendliness. They’re the most popular open-source package for users to instantly connect LLMs into complex pipelines. With a whopping 2,747 unique contributors, they arguably catalyzed the LLM hype cycle as highly involved apps became buildable by mere hobbyists. Now, they’re monetizing with a deeply integrated paid monitoring system.

Takeaway: When evaluating a potential LLM infra prospect, look for developer-friendliness, ease of integration, agility in feature rollouts, operational leanness, expertise of leaders, and receptiveness to customer requests.

Consider these characteristics a ranking system for AI infra prospects as we (finally) dive into our unicorn pick. 👇️ 

PRESENTING AN UNDERCOVER UNICORN

Meet Context

Co-Founder & CEO Henry Scott-Green (right) and Co-Founder & CTO Alex Gamble (left).

As a Monitoring and Evaluation infra company, Context builds a platform designed to ensure LLM deployments behave as expected.

In August of 2023, they raised $3.5 million from Theory Ventures (Tomasz Tunguz), Google Ventures, and a host of prominent angels including Harry Stebbings and Milos Rusic (Co-Founder & CEO of deepset, another infra company and creator of Haystack, a LangChain alternative).

How it Works

The company has created packages allowing developers to easily connect their LLM pipelines to the Context platform. These could be customer support chatbots referencing internal documentation, agents conducting real-time data analysis, etc.

They’ve even created a LangChain integration so pipelines built on the popular framework are linkable with two lines of code.
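
As an illustration of what a two-line hookup can look like, here is a hedged sketch built on LangChain’s standard callback mechanism. The MonitoringHandler class is an invented stand-in for the handler a vendor SDK (like Context’s) would ship; it is not Context’s actual API.

```python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import ChatOpenAI   # needs OPENAI_API_KEY set to actually run

class MonitoringHandler(BaseCallbackHandler):
    """Invented stand-in for a vendor-provided monitoring handler."""

    def on_llm_start(self, serialized, prompts, **kwargs):
        print("captured prompt:", prompts[0][:80])

    def on_llm_end(self, response, **kwargs):
        # A real SDK would ship this to the vendor's API instead of printing it.
        print("captured response:", response.generations[0][0].text[:80])

llm = ChatOpenAI(callbacks=[MonitoringHandler()])   # the "two line" hookup
print(llm.invoke("What is your refund policy?").content)
```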

Once the connection is made, Context will automatically start tracking the model’s inputs and outputs. It’ll run high-level sentiment analysis and provide overview metrics outlining the general performance, trends, topics, speed, and costs of the system.

A platform graphic from the Context homepage.

Recently, Context has been improving the LLM fine-tuning process by enabling users to intuitively and quantitatively compare newer and older model versions. This is conceptually akin to git (the version-control standard), but for language models—a brilliant vision for the future. 🧠 

Target Market

Context’s demand, in typical infra-company fashion, is horizontal. Any company utilizing LLMs in their product, on their website, inside internal tools, etc. is a potential customer.

The time and money saved by abstracting LLM monitoring to Context are hard to quantify precisely, much as they are when companies abstract compute to AWS. The key opportunity is that most companies are only now starting to realize the importance of LLM monitoring.

Ensuring the stability, performance, cost-effectiveness, security, safety, reliability, and modernity of the foundational models you employ is a diligent and responsible necessity, just as logging and monitoring are critical for a cloud-deployed product.

Google Cloud’s logging page contains information a system engineer needs in order to monitor a product deployment.

Cloud logging is imperative for traditional deployment as it shows companies where their product is failing and how their customers are using their services. Context is to LLM integrators as logging is to cloud deployers. This logical derivation leads to a market opportunity for a mission-critical service that hasn’t yet been realized.

The market is finally shifting from racing toward the next best model and MVP to acquiring users for a stable product at scale. This is Context’s prime opportunity to slot perfectly into the needs of companies everywhere.

Concerns

Extracting accurate quantitative insight from an LLM with inherently unstructured outputs is no simple feat (see the above explanation of nondeterminism).

That said, it’s not impossible. Still, the most successful AI companies have moats beyond technical advancement on the industry’s natural trajectory.

Companies that live on the general innovation trajectory are ultimately swallowed by the market. OpenAI has exemplified this phenomenon countless times: GPT plugins for tool-use apps, GPT store for mission-oriented chatbots, GPT document-retrieval for RAG services, open-source for lightweight agent frameworks, etc.

The question now is whether Context’s technology exists on the general innovation trajectory or outside of it, and whether companies will build the functionality themselves or outsource it to Context, the same way companies abstract the monstrous challenge of compute to AWS.

Unlike RAG, whose implementation is thoroughly documented, LLM evaluation is a deep science riddled with subjectivity. Yet the technology is horizontally applicable, and the market is burgeoning. To build AWS, one must be an expert in networking, security, load balancing, traffic management… the list doesn’t end. Comprehensive LLM monitoring demands a similarly rare blend of expertise.

To Context’s benefit, LLM monitoring exists outside of the general innovation trajectory as it’s wholly unnecessary for taking a product to market. It only becomes critical when scaling a product and ensuring reliability.

Takeaway: While monitoring is crucial for the scaled deployment of an LLM-involved service, it is not a necessary step in general innovation. Because of the technical and scientific barriers, and the plug-and-play nature of Context’s interface, we believe companies will outsource the monitoring and evaluation process.

Evaluating the Prospect

Let’s use our earlier-derived ranking system to evaluate Context as a prospect in the LLM infrastructure space.

Developer Friendly and Easily Integrable: Having created a standalone package and a LangChain integration early in their journey, Context clearly cares for developers. It’s hard to justify adding a new layer to a product when integrating it requires breaking changes. But when implementation is two lines of code with no external effect…? No brainer.

Agile in Feature Rollouts: Context releases a product update every month highlighting new features and changes. While their vision is focused specifically on monitoring and evaluation, they move fast and cover their bases. This is a positive sign in the LLM market, as end users’ needs are quickly evolving.

Lean in Operation: The company has made all of its technical achievements thus far with a team of under 10. ‘Nuff said.

Led by Experts: Co-Founder & CEO Henry Scott-Green graduated from UCL with a CS degree, worked as a PM at Google creating monetization tools for 7 years, and holds 4 patents in AI and anti-abuse. His co-founder and CTO Alex Gamble left Google as an L5 senior software engineer. Context’s founding software engineers are highly experienced and capable builders as well. Suddenly, it’s less surprising how much they’ve accomplished with so few people.

Receptive to Customer Requests: Henry and Alex are easily accessible to customers. They frequently engage in customer calls to hear how their platform is useful—and where it falls short. Importantly, when they discover a shortcoming or growth area, they quickly roll out patches and new features. This is an incredibly positive sign in a horizontal and fragmented market.

So why are they a future unicorn?

Infra company markets tend to be winner-take-all, with the winners offering the best price, product, community, ease of use, and support. These companies (Shopify, for instance) emerge early and grip their customers just as those customers realize their dependence on the product.

Context beautifully fits this profile, along with the LLM-specific infra prospect criteria outlined above. They have only $3.5 million in funding to date and remain a small name in the AI space, despite name-brand investors and a growing customer community.

MORE ABOUT RTF

1. Serving You Companies

It’s hard to stray from the hype cycles and topics of the hour in tech/VC. But at Return the Fund, we’re relentless in our pursuit of novelty and actionable insights. We dig where others don’t and leverage access to industry savants, private data rooms, and research platforms out of reach for most (think PitchBook and the Bloomberg Terminal).

We check our hypotheses and take nothing at face value (Sequoia investing in FTX doesn’t mean we can skimp on due diligence).

You’ll find innovative sleeper startups you can partner with, invest in, or leverage as a product. More often than not, we can warmly introduce you to companies’ leadership.

Further, we’ll cut through the noise and showcase tangible, niche market opportunities to help you find your edge in the startup world. Whether you’re an investor constructing a startup accelerator class or a founder searching for your next big venture, we’re confident we’ll inspire you with ideas.

2. Giving You a Technical Edge

Our team is not only deeply embedded in the startup-operator and VC-investor communities, but in AI and research as well. This market is moving fast. It’s hard enough to understand new tech, and harder still to understand what it means and why it’s important.

Okay, so Mistral just dropped an 8×22B Mixture of Experts (MoE) model. What does that even mean, and how can you use that knowledge to make decisions? What is underhyped and what is overhyped? How does this change the market, if at all? What powers the tech, and how can that knowledge be used to make expert investing decisions?

We’ll explain technical concepts simply and intuitively so you can critically analyze trends without needing a computer science PhD or a Mensa membership.

Thanks for reading today’s RTF. Let us know what you thought of this edition, and feel free to reach out to us at [email protected]. 🤝

Psst: Nothing in this edition was sponsored. All research and opinions are completely held by the Return the Fund team.
