Competition in the artificial intelligence market shows no signs of slowing down. Google DeepMind has officially unveiled DiffusionGemma, a new open AI model designed for very fast content generation. Shortly after the launch, NVIDIA announced full support for the model on its RTX and DGX platforms. The manufacturer claims that, with the right optimisations, users can expect significantly higher performance and can deploy the model locally without relying on cloud services.
DiffusionGemma is set to accelerate text generation
The new model developed by Google is based on the Gemma 4 architecture and takes a different approach from classical autoregressive models. Instead of generating tokens one at a time, DiffusionGemma can produce larger batches of them simultaneously, which in practice significantly shortens the time needed to create a response. The model has over 25 billion parameters, but only a portion of them is active during operation, which improves computational efficiency. Google also emphasises the open nature of the project: DiffusionGemma has been released under the Apache 2.0 license, so developers and companies can freely use it and build their own projects on top of it. The model supports both text and images, with a maximum context of 256 thousand tokens, which makes it suitable for advanced applications in data analysis, content creation, and building AI agents. According to its creators, the greatest advantage remains speed: in certain scenarios, the model is said to be up to four times faster than traditional solutions based on sequential generation.
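To make the difference between sequential and batched generation more concrete, the toy sketch below simply counts model passes for each approach. It is not DiffusionGemma's actual algorithm: the function names, the block size of 32 and the 8 refinement passes per block are hypothetical values chosen for illustration, and the sketch assumes every pass costs the same, which real hardware does not guarantee.

```python
import math


def autoregressive_steps(num_tokens: int) -> int:
    """Sequential generation: one model pass per generated token."""
    return num_tokens


def block_parallel_steps(num_tokens: int, block_size: int, refine_steps: int) -> int:
    """Batched generation (toy model): tokens are produced in blocks, and
    each block takes a fixed number of refinement passes.
    block_size and refine_steps are hypothetical, not published figures."""
    blocks = math.ceil(num_tokens / block_size)
    return blocks * refine_steps


if __name__ == "__main__":
    tokens = 512
    sequential = autoregressive_steps(tokens)
    batched = block_parallel_steps(tokens, block_size=32, refine_steps=8)
    # Compare the number of passes each approach needs for the same output length.
    print(f"sequential passes: {sequential}")
    print(f"batched passes:    {batched}")
    print(f"ratio:             {sequential / batched:.1f}x")
```

With these illustrative numbers the batched approach needs a quarter of the passes. The real speed-up depends on how many refinement passes the model actually needs and on how expensive each pass is.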
NVIDIA prepared support on release day
NVIDIA quickly capitalised on the launch of the new model by showcasing ready-made environments for running it on its own hardware. Support covers GeForce RTX cards for home users, the professional RTX PRO platforms, and AI computers from the DGX family. The company claims that Tensor cores and CUDA technology make it possible to achieve very high performance without additional configuration. The results reported for DGX systems stand out in particular: according to the manufacturer's data, the DGX Spark can generate about 150 tokens per second, while more advanced configurations can reach several hundred tokens per second when running the model locally. NVIDIA also emphasises that users do not need to use cloud services or pay for each query, because the entire system can run directly on a computer equipped with the appropriate hardware. This is an important argument for those working on artificial intelligence, who are increasingly looking for local solutions that give them greater control over their data. DiffusionGemma can already be run on hardware including the GeForce RTX 5090 card and DGX platforms equipped with the latest NVIDIA chips.
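As a rough way to read the throughput figure, the short sketch below converts tokens per second into the time needed for a response of a given length. The 150 tokens per second is the manufacturer's reported figure for the DGX Spark; the 600-token response length and the function name are assumptions made for this example, and the sketch ignores prompt processing time.

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate how long generating num_tokens takes at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # 150 tokens/s is the reported DGX Spark figure; 600 tokens is a
    # hypothetical response length used only for illustration.
    seconds = generation_time_seconds(600, 150.0)
    print(f"{seconds:.1f} s")  # prints "4.0 s"
```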
DiffusionGemma is a new open AI model from Google DeepMind that focuses on very fast content generation and local operation. NVIDIA has supported it on its RTX graphics cards and DGX systems from day one, offering additional optimisations to improve performance. The new model may well become an interesting alternative to the popular solutions currently used by developers and artificial intelligence enthusiasts.
Source: wccftech
Choose TV Editorial Team












