# MLCommons releases MLPerf 4.0 inference benchmarks

*genai, semiconductor · news · 2024-03-27 · VentureBeat*

## Key points

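- MLPerf 4.0 now benchmarks Llama 2 70B for question answering and adds a Stable Diffusion image-generation test.
- Nvidia's H100 GPU, running TensorRT-LLM, nearly tripled its text summarization inference performance in six months.
- Nvidia's new H200 GPU delivers up to 45% faster inference than the H100 on Llama 2 workloads.
- Intel's 5th Gen Xeon processors are up to 1.9 times faster than the previous generation on GPT-J inference.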

MLCommons is out today with its MLPerf 4.0 inference benchmarks, once again showing the relentless pace of software and hardware improvements. As generative AI continues to develop and gain adoption, there is a clear need for a vendor-neutral set of performance benchmarks, which is what MLCommons provides with MLPerf. There are multiple MLPerf benchmarks, with training and inference among the most useful.

The new MLPerf 4.0 Inference results are the first update to the inference benchmarks since the MLPerf 3.1 results were released in September 2023. A lot has happened in the AI world over those six months, and the big hardware vendors, including Nvidia and Intel, have been busy improving both hardware and software to further optimize inference. The MLPerf 4.0 inference results show marked improvements for both Nvidia's and Intel's technologies.

The MLPerf inference benchmark itself has also changed. In MLPerf 3.1, large language models (LLMs) were represented by the 6-billion-parameter GPT-J model performing text summarization. MLPerf 4.0 adds the popular 70-billion-parameter open Llama 2 model, benchmarked for question answering (Q&A). MLPerf 4.0 also includes, for the first time, a generative AI image-generation benchmark using Stable Diffusion.

“MLPerf is really sort of the industry standard benchmark for helping to improve speed, efficiency and accuracy for AI,” MLCommons Founder and Executive Director David Kanter said in a press briefing.

## Why AI benchmarks matter

MLCommons' latest benchmark round contains more than 8,500 performance results, testing all manner of combinations and permutations of hardware, software and AI inference use cases. Kanter emphasized that there is a real purpose to the MLPerf benchmarking process. “To remind people of the principle behind benchmarks, really the goal is to set up good metrics for the performance of AI,” he said. “The whole point is that once we can measure these things, we can start improving them.”

Another goal of MLCommons is to help align the whole industry. All benchmark results are produced using tests with similar datasets and configuration parameters across different hardware and software. The results are visible to all submitters to a given test, so any questions raised by a different submitter can be addressed.

Ultimately, the standardized approach to measuring AI performance is about enabling enterprises to make informed decisions. “This is helping to inform buyers, helping them make decisions and understand how systems, whether they’re on premises systems, cloud systems or embedded systems, perform on relevant workloads,” Kanter said. “If you’re looking to buy a system to run large language model inference, you can use benchmarks to help guide you, for what those systems should look like.”

## Nvidia triples AI inference performance with the same hardware

Once again, Nvidia dominates the MLPerf benchmarks with a series of impressive results. New hardware would be expected to yield better performance, but Nvidia is also able to get more performance out of its existing hardware.

Using its open-source TensorRT-LLM inference software, Nvidia nearly tripled inference performance for text summarization with the GPT-J LLM on its H100 Hopper GPU. In a briefing with press and analysts, Dave Salvator, director of accelerated computing products at Nvidia, emphasized that the performance boost came in only six months.

“We’ve gone in and been able to triple the amount of performance that we’re seeing and we’re very, very pleased with this result,” Salvator said. “Our engineering team just continues to do great work to find ways to extract more performance from the Hopper architecture.”

Nvidia announced its newest-generation Blackwell GPU, the successor to the Hopper architecture, last week at GTC. In response to a question from VentureBeat, Salvator said he wasn’t sure exactly when Blackwell-based GPUs would be benchmarked for MLPerf, but he hoped it would be as soon as possible.

Even before Blackwell is benchmarked, the MLPerf 4.0 results mark the debut of the H200 GPU, which further improves on the H100’s inference capabilities: the H200 results are up to 45% faster than the H100 when evaluated using Llama 2 for inference.

## Intel reminds industry that CPUs still matter for inference too

Intel is also a very active participant in the MLPerf 4.0 benchmarks, with both its Habana Gaudi AI accelerator and Xeon CPU technologies. With Gaudi, Intel’s raw performance trails the Nvidia H100, though the company claims it offers better price-performance.

Perhaps more interesting are the impressive inference gains from the 5th Gen Intel Xeon processor. In a briefing with press and analysts, Ronak Shah, AI product director for Xeon at Intel, said the 5th Gen Intel Xeon was 1.42 times faster for inference than the previous 4th Gen Intel Xeon across a range of MLPerf categories. For the GPT-J LLM text summarization use case alone, the 5th Gen Xeon was up to 1.9 times faster.

“We recognize that for many enterprise customers that are deploying their AI solutions, they’re going to be doing it in a mixed general purpose and AI environment,” Shah said. “So we designed CPUs that mesh together strong general purpose capabilities with strong AI capabilities with our AMX engine.”

**Companies:** Nvidia, Intel, MLCommons
**Countries:** United States

[Read the full story on VentureBeat](https://venturebeat.com/ai/nvidia-triples-and-intel-doubles-generative-ai-inference-performance-on-new-mlperf-benchmark/)

---

Canonical: https://newsio.io/zh-TW/n/c186cacb-dd44-44d3-abdb-0f65baa863de/mlcommons-mlperf-4-0-2023-9
Summarized by Newsio from VentureBeat. https://newsio.io/how-it-works