Earlier this month, NVIDIA announced a beta release of Omniverse, a platform on which developers and creators can build metaverse applications. With it, the company has aligned its future with the metaverse vision: the new platform lets users create "digital twins" that simulate the real world.
One step toward realizing that vision is Magic3D, which helps users generate a high-resolution 3D model from a 2D image or a text prompt.

Magic3D, recently released by NVIDIA researchers, is a text-to-3D synthesis model that creates high-quality 3D mesh models.
The model is a response to Google's DreamFusion, in which the team used a pre-trained text-to-image diffusion model to optimize neural radiance fields (NeRF), sidestepping the lack of large-scale labeled 3D datasets. Magic3D addresses two limitations of DreamFusion: extremely slow NeRF optimization and low-resolution image-space supervision of the NeRF.
The model follows a coarse-to-fine strategy that uses both low- and high-resolution diffusion priors to learn the 3D representation of the target content. As a result, the method can create high-quality 3D mesh models in 40 minutes, which is on average twice as fast as DreamFusion, while providing eight times higher-resolution supervision.
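To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above; treating the eight-fold resolution gain as applying to each image side (and so squaring it for the pixel count) is an assumption made for illustration, not something the article states.

```python
# Figures quoted in the article.
magic3d_minutes = 40      # reported Magic3D generation time
speedup = 2               # "on average twice as fast as DreamFusion"
resolution_factor = 8     # "eight times higher-resolution supervision"

# Implied DreamFusion time if Magic3D is twice as fast.
implied_baseline_minutes = magic3d_minutes * speedup

# ASSUMPTION: the factor applies per image side, so pixels grow with its square.
pixel_factor = resolution_factor ** 2

print(f"Implied DreamFusion time: {implied_baseline_minutes} minutes")
print(f"Supervision pixels per rendered image: {pixel_factor}x more")
```

Under that assumption, each rendered view is supervised with 64 times as many pixels while the whole run takes roughly half the time.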
NVIDIA uses a two-stage optimization framework to achieve fast, high-quality text-to-3D generation.
In the first stage, a coarse model is obtained using a low-resolution diffusion prior by optimizing neural field representations (color, density and normal fields). In the second stage, a textured 3D mesh is differentiably extracted from the density and color fields of the coarse model.
The mesh is then fine-tuned using a high-resolution latent diffusion model; after optimization, the result is a high-quality 3D mesh with detailed textures.

The model also supports editing. Given a coarse model generated from a base text prompt, parts of the prompt can be modified, and the NeRF and 3D mesh models are then fine-tuned to produce an edited high-resolution 3D mesh.
Magic3D also leaves room for other editing features. Given an input image, the diffusion model can be fine-tuned with DreamBooth and the 3D model optimized with the given prompts, so that the object in the resulting 3D model stays as faithful as possible to the object in the input image.
Using the style transfer capabilities of eDiffi, NVIDIA's text-to-image model, the style of an input image can also be transferred to the output 3D model.
NVIDIA, known for its hardware prowess, has firmly established itself on the generative AI front, even amid relentless competition from major technology companies such as Microsoft, Google and Meta, which are actively integrating advanced AI technologies into their platforms.