# M3 processes up to 1 million tokens at once - 5x more than its predecessor.

*genai · news · 2026-06-01 · Sputnik News*

## Key points

- M3 can process up to 1 million tokens at once, 5x more than its predecessor.
- M3 achieved a 59% score on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
- M3's Sparse Attention architecture reduces compute needs by up to 95% (to as little as 1/20th of previous levels) and cuts costs by over 90%.
- In one benchmark, M3 autonomously optimized software for NVIDIA Hopper chips, raising hardware utilization from 7.6% to 71.3%.

M3 processes up to 1 million tokens at once, 5x more than its predecessor, enabling it to handle massive codebases. The model scored 59% on SWE-Bench Pro, outperforming OpenAI’s GPT-5.5 and Google’s Gemini 3.1 Pro in real-world software engineering tests. Its new Sparse Attention architecture cuts computing requirements to as little as 1/20th of previous levels, reducing costs by over 90% while enhancing speed. In one benchmark, M3 autonomously optimized software for NVIDIA Hopper chips, boosting hardware utilization from 7.6% to 71.3%.
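The headline figures can be cross-checked with simple arithmetic. The sketch below is only an illustration: it assumes "5x more" means five times the predecessor's context size, and the function and variable names are hypothetical, not taken from the source.

```python
def implied_figures():
    """Derive the figures implied by the reported numbers (illustrative only)."""
    # Reported: 1 million tokens, described as 5x the predecessor.
    # Assumption: "5x more" is read as "five times as many".
    context_tokens = 1_000_000
    context_multiple = 5
    predecessor_tokens = context_tokens / context_multiple

    # Reported: compute cut to as little as 1/20th of previous levels.
    compute_fraction = 1 / 20
    compute_reduction_pct = (1 - compute_fraction) * 100

    # Reported: Hopper utilization raised from 7.6% to 71.3%.
    util_before, util_after = 7.6, 71.3
    utilization_gain = util_after / util_before

    return predecessor_tokens, compute_reduction_pct, utilization_gain


if __name__ == "__main__":
    tokens, reduction, gain = implied_figures()
    print(f"Implied predecessor context: {tokens:,.0f} tokens")
    print(f"Implied compute reduction: {reduction:.0f}%")
    print(f"Utilization improvement: {gain:.1f}x")
```

Under that reading, the 1/20th figure matches the "up to 95%" compute reduction in the key points, and the utilization change works out to roughly a 9.4x improvement.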

**Companies:** NVIDIA

[Read the full story on Sputnik News](https://sputnikglobe.com/20260601/how-chinas-new-ai-model-beating-openai--google---1124235849.html)

---

Canonical: https://newsio.io/n/33a0b9ba-c5f4-4072-b56e-c5eed9fc2555/m3-processes-up-to-1-million-tokens-at-once-5x-more-than-its-predecessor-new-spa
Summarized by Newsio from Sputnik News. https://newsio.io/how-it-works