SiliconANGLE News
Tensormesh Inc. has raised $20 million in funding.
Tensormesh uses KV caching to eliminate redundant LLM computations on GPUs.
KEY POINTS
Nvidia, AMD, and CoreWeave invested in Tensormesh, contributing to a $20 million funding round.
Some Tensormesh Inference customers have achieved cache hit rates above 70%, according to the company.
The software offers developers direct control over cache storage and cost tracking via a dashboard.
Tensormesh’s approach treats intermediate LLM data as a new asset class to extend AI capabilities.
Tensormesh Inc. has hit upon a way to make artificial intelligence inference more efficient by eliminating the need for redundant computations, and its technology is so convincing that several AI infrastructure giants are backing it with $20 million in funding.
Today’s round saw the participation of Nvidia Corp., Advanced Micro Devices Inc. and CoreWeave Inc., as well as the venture capital firms Valley Capital Partners and Laude Ventures. It brings Tensormesh’s total amount raised so far to $24.5 million, and it coincides with the launch of its flagship software-as-a-service offering, Tensormesh Inference.
Tensormesh’s technology is designed to tackle one of the most glaring inefficiencies of graphics processing units, which have to reprocess the same data over and over again given their limited memory caches. It’s a design challenge that stems from the way large language models work. Typically, LLM deployments treat each new request or prompt they receive as a brand new task. So even if an AI chatbot is engaged in a long-winded conversation with someone, or analyzing a document it has seen before, the GPU will need to reprocess the entire context window from scratch.
The startup aims to fix this with a technique known as key-value, or KV, caching, which stores the intermediate data an LLM generates while processing a prompt.
Because the model can reuse those stored computations, Tensormesh makes it possible to skip the reprocessing each time a new prompt arrives, so the model responds more quickly. For developers building agentic models that need to crunch their way through multiple steps to perform a task or solve a problem, the company says this can result in a 10-fold reduction in latency and GPU spending.
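To make the idea concrete, here is a minimal toy sketch of prefix-based KV reuse in Python. It is not Tensormesh's or LMCache's implementation: the `PrefixKVCache` class, its methods and the stand-in `_compute_kv` function are all hypothetical, and a real system would store attention key and value tensors rather than small tuples. The sketch only shows the principle that a follow-up prompt sharing a prefix with an earlier one needs fresh computation for the new tokens alone.

```python
from typing import Dict, List, Tuple


class PrefixKVCache:
    """Toy prefix cache (hypothetical): maps a token prefix to simulated KV entries."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, ...], List[Tuple[int, str]]] = {}
        self.tokens_computed = 0
        self.tokens_reused = 0

    def _compute_kv(self, position: int, token: str) -> Tuple[int, str]:
        # Stand-in for the expensive per-token key/value computation on a GPU.
        self.tokens_computed += 1
        return (position, token)

    def process(self, tokens: List[str]) -> List[Tuple[int, str]]:
        key = tuple(tokens)
        # Find the longest previously seen prefix of this prompt.
        best = 0
        for n in range(len(key), 0, -1):
            if key[:n] in self._store:
                best = n
                break
        kv = list(self._store[key[:best]]) if best else []
        self.tokens_reused += best
        # Only the tokens after the cached prefix are computed from scratch.
        for pos in range(best, len(key)):
            kv.append(self._compute_kv(pos, key[pos]))
        self._store[key] = kv
        return kv

    def hit_rate(self) -> float:
        total = self.tokens_computed + self.tokens_reused
        return self.tokens_reused / total if total else 0.0


cache = PrefixKVCache()
turn1 = "you are a helpful assistant . summarize the report".split()
turn2 = turn1 + "now list the risks".split()

cache.process(turn1)  # first turn: every token is computed
cache.process(turn2)  # second turn: the shared prefix is reused
print(cache.tokens_computed, cache.tokens_reused, round(cache.hit_rate(), 2))
```

In this toy run the second turn recomputes only its four new tokens, while the nine tokens it shares with the first turn come from the cache.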
Tensormesh Inference, based on the open-source LMCache project, includes a cost savings dashboard that allows developers to track cache hit rates and convert them into tangible dollar figures. Moreover, it gives developers direct control over how much storage they allocate to the cache, so they can fine-tune their infrastructure to maximize efficiency based on the size of their LLM deployment and usage rates. According to the startup, some customers have achieved cache hit rates of more than 70%, meaning that more than two-thirds of all prompts are retrieved from the cache instead of being recomputed.
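As a rough illustration of how a hit rate might translate into a dollar figure, the sketch below uses a deliberately simple model: every token served from the cache avoids its full processing cost. The function name, the token volume and the price per million tokens are all assumptions chosen for the example. The article does not describe the formula the Tensormesh dashboard uses, and a real calculation would also account for cache storage costs.

```python
def estimate_savings(prompt_tokens: int, hit_rate: float,
                     cost_per_million_tokens: float) -> dict:
    """Hypothetical model: cached tokens cost nothing to process again."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Cost if every prompt token were processed from scratch.
    baseline = prompt_tokens / 1_000_000 * cost_per_million_tokens
    # Portion of that cost avoided because the tokens were served from cache.
    saved = baseline * hit_rate
    return {"baseline": baseline, "saved": saved, "remaining": baseline - saved}


# Assumed inputs: 1 billion prompt tokens a month, a 70% hit rate,
# and an illustrative price of $0.50 per million tokens processed.
report = estimate_savings(1_000_000_000, 0.70, 0.50)
print(f"baseline ${report['baseline']:.2f}, saved ${report['saved']:.2f}")
```

Under these assumed inputs, a 70% hit rate would cut a $500 monthly processing bill by about $350.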
Deployment is flexible, with three options available. Developers can use a serverless application programming interface that’s fully compatible with OpenAI Group PBC’s standards, enabling it to be dropped into existing workflows. Alternatively, for customers running more intensive workloads, the company offers on-demand deployment on dedicated GPU resources, or reserved deployments for enterprises that need custom service-level agreements.
Founder and Chief Executive Junchen Jiang said he’s not surprised that Nvidia, AMD and CoreWeave were among the first to understand the implications of his company’s technology. “Tensormesh offers a new vision on the significance of the intermediate data that LLMs generate when processing a prompt,” he said. “Behind the term KV cache is a whole concept of AI interpretation of the question it is asked. It’s a whole new class of data.”
Therein lies the potential of Tensormesh’s technology. It’s transforming “intermediate AI data” into an entirely new asset class, and this could become extremely valuable as AI agents become more complex. The more capable an AI agent is, the greater the context window required. By extending those context windows, Tensormesh could well emerge as a key piece of the agentic AI stack.
The money from today’s round will be used to expand Tensormesh’s hardware integrations with AMD’s, Nvidia’s and CoreWeave’s infrastructure and accelerate product development. The company also remains committed to the underlying, open-source LMCache project, which will be the main beneficiary of many of its planned upcoming innovations.