New AI tool generates high-quality images faster than state-of-the-art approaches

Researchers fuse the best of two popular methods to create an image generator that uses less energy and can run locally on a laptop or smartphone.

2025-03-20
(Press-News.org) CAMBRIDGE, MA – The ability to generate high-quality images quickly is crucial for producing realistic simulated environments that can be used to train self-driving cars to avoid unpredictable hazards, making them safer on real streets.

But the generative AI techniques increasingly being used to produce such images have drawbacks. One popular type of model, called a diffusion model, can create stunningly realistic images but is too slow and computationally intensive for many applications. On the other hand, the autoregressive models that power LLMs like ChatGPT are much faster, but they produce poorer-quality images that are often riddled with errors.

Researchers from MIT and NVIDIA developed a new approach that brings together the best of both methods. Their hybrid image-generation tool uses an autoregressive model to quickly capture the big picture and then a small diffusion model to refine the details of the image.

Their tool, known as HART (short for Hybrid Autoregressive Transformer), can generate images that match or exceed the quality of state-of-the-art diffusion models, and it does so about nine times faster.

The generation process consumes fewer computational resources than typical diffusion models, enabling HART to run locally on a commercial laptop or smartphone. A user only needs to enter one natural language prompt into the HART interface to generate an image.

HART could have a wide range of applications, such as helping researchers train robots to complete complex real-world tasks and aiding designers in producing striking scenes for video games.

“If you are painting a landscape, and you just paint the entire canvas once, it might not look very good. But if you paint the big picture and then refine the image with smaller brush strokes, your painting could look a lot better. That is the basic idea with HART,” says Haotian Tang PhD ’25, co-lead author of a new paper on HART.

He is joined by co-lead author Yecheng Wu, an undergraduate student at Tsinghua University; senior author Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and a distinguished scientist of NVIDIA; as well as others at MIT, Tsinghua University, and NVIDIA. The research will be presented at the International Conference on Learning Representations.

The best of both worlds

Popular diffusion models, such as Stable Diffusion and DALL-E, are known to produce highly detailed images. These models generate images through an iterative process where they predict some amount of random noise on each pixel, subtract the noise, then repeat the process of predicting and “de-noising” multiple times until they generate a new image that is completely free of noise.

Because the diffusion model de-noises all pixels in an image at each step, and there may be 30 or more steps, the process is slow and computationally expensive. But because the model has multiple chances to correct details it got wrong, the images are high-quality.
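The iterative loop described above can be sketched in a few lines of Python. This is a toy illustration, not the code of any real diffusion model: `predict_noise` is a hypothetical stand-in for a trained network and cheats by looking at the clean target, and the four-value "image", the `strength` of 0.5, and the fixed seed are assumptions chosen for readability. What the sketch does show is that every step touches every pixel, so the work grows with the number of steps.

```python
import random


def predict_noise(noisy, clean, strength=0.5):
    # Hypothetical stand-in for a trained network. A real model never sees
    # the clean image; this toy version uses it to estimate part of the noise.
    return [strength * (n - c) for n, c in zip(noisy, clean)]


def denoise(clean, steps, seed=0):
    rng = random.Random(seed)
    # Start from pure random noise, one value per "pixel".
    image = [rng.gauss(0.0, 1.0) for _ in clean]
    pixel_updates = 0
    for _ in range(steps):
        noise = predict_noise(image, clean)
        # Subtract the predicted noise from every pixel, then repeat.
        image = [p - e for p, e in zip(image, noise)]
        pixel_updates += len(image)
    return image, pixel_updates


clean = [0.1, 0.5, 0.9, 0.3]
image, updates = denoise(clean, steps=30)
max_error = max(abs(p - c) for p, c in zip(image, clean))
print(f"pixel updates: {updates}, largest remaining error: {max_error:.2e}")
```

With four pixels and 30 steps the loop performs 120 pixel updates; cutting the step count cuts that work proportionally, which is the lever a faster method would want to pull.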

Autoregressive models, commonly used for predicting text, can generate images by predicting patches of an image sequentially, a few pixels at a time. They can’t go back and correct their mistakes, but the sequential prediction process is much faster than diffusion.

These models use representations known as tokens to make predictions. An autoregressive model utilizes an autoencoder to compress raw image pixels into discrete tokens as well as reconstruct the image from predicted tokens. While this boosts the model’s speed, the information loss that occurs during compression causes errors when the model generates a new image.

With HART, the researchers developed a hybrid approach that uses an autoregressive model to predict compressed, discrete image tokens, then a small diffusion model to predict residual tokens. Residual tokens compensate for the model’s information loss by capturing details left out by discrete tokens.
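The role of residual tokens can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not HART's tokenizer: it rounds each value to the nearest of four evenly spaced levels (standing in for discrete tokens) and computes the residual directly from the original values, whereas HART uses a small diffusion model to predict it. The names `quantize`, `dequantize`, and `LEVELS` are hypothetical.

```python
def quantize(values, levels):
    # Map each value in [0, 1] to the index of the nearest evenly spaced level.
    return [min(levels - 1, max(0, round(v * (levels - 1)))) for v in values]


def dequantize(tokens, levels):
    # Turn each discrete token back into its representative value.
    return [t / (levels - 1) for t in tokens]


LEVELS = 4
values = [0.03, 0.48, 0.52, 0.97]

tokens = quantize(values, LEVELS)            # discrete tokens (lossy)
coarse = dequantize(tokens, LEVELS)          # reconstruction from tokens alone
residual = [v - c for v, c in zip(values, coarse)]   # detail the tokens lost
refined = [c + r for c, r in zip(coarse, residual)]  # tokens plus residual

coarse_error = max(abs(v - c) for v, c in zip(values, coarse))
refined_error = max(abs(v - r) for v, r in zip(values, refined))
print(tokens, f"coarse error {coarse_error:.3f}", f"refined error {refined_error:.1e}")
```

The discrete tokens alone leave a visible error, while adding the residual back recovers the original values up to floating-point rounding. In this toy the residual is exact; a learned model would only approximate it.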

“We can achieve a huge boost in terms of reconstruction quality. Our residual tokens learn high-frequency details, like edges of an object, or a person’s hair, eyes, or mouth. These are places where discrete tokens can make mistakes,” says Tang.

Because the diffusion model only predicts the remaining details after the autoregressive model has done its job, it can accomplish the task in eight steps, instead of the usual 30 or more a standard diffusion model requires to generate an entire image. This minimal overhead of the additional diffusion model allows HART to retain the speed advantage of the autoregressive model while significantly enhancing its ability to generate intricate image details.

“The diffusion model has an easier job to do, which leads to more efficiency,” he adds.

Outperforming larger models

During the development of HART, the researchers encountered challenges in effectively integrating the diffusion model to enhance the autoregressive model. They found that incorporating the diffusion model in the early stages of the autoregressive process resulted in an accumulation of errors. Their final design, which applies the diffusion model only as the last step to predict residual tokens, significantly improved generation quality.

Their method, which uses a combination of an autoregressive transformer model with 700 million parameters and a lightweight diffusion model with 37 million parameters, can generate images of the same quality as those created by a diffusion model with 2 billion parameters, but it does so about nine times faster. It uses about 31 percent less computation than state-of-the-art models.
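For a sense of scale, the parameter counts quoted above can be compared directly. The short calculation below uses only the three figures given in the text; the variable names are illustrative. Note that parameter count is not the same thing as computation, so the ratio here is a size comparison only and is separate from the reported 31 percent reduction in computation.

```python
AUTOREGRESSIVE_PARAMS = 700_000_000   # autoregressive transformer
DIFFUSION_PARAMS = 37_000_000         # lightweight diffusion model
BASELINE_PARAMS = 2_000_000_000       # diffusion model used for comparison

hart_total = AUTOREGRESSIVE_PARAMS + DIFFUSION_PARAMS
size_ratio = hart_total / BASELINE_PARAMS
diffusion_share = DIFFUSION_PARAMS / hart_total

print(f"HART total parameters: {hart_total:,}")
print(f"Size relative to the 2-billion-parameter model: {size_ratio:.2f}")
print(f"Share of HART's parameters in the diffusion model: {diffusion_share:.3f}")
```

By these figures HART has 737 million parameters in total, a bit more than a third of the 2-billion-parameter model, and the diffusion component accounts for roughly 5 percent of them.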

Moreover, because HART uses an autoregressive model to do the bulk of the work, the same type of model that powers LLMs, it is easier to integrate with the new class of unified vision-language generative models. In the future, one could interact with a unified vision-language generative model, perhaps by asking it to show the intermediate steps required to assemble a piece of furniture.

“LLMs are a good interface for all sorts of models, like multimodal models and models that can reason. This is a way to push the intelligence to a new frontier. An efficient image-generation model would unlock a lot of possibilities,” Tang says.

In the future, the researchers want to go down this path and build vision-language models on top of the HART architecture. Since HART is scalable and generalizable to multiple modalities, they also want to apply it to video generation and audio prediction tasks.

###

This research was funded, in part, by the MIT-IBM Watson AI Lab, the MIT and Amazon Science Hub, the MIT AI Hardware Program, and the National Science Foundation. The GPU infrastructure for training this model was donated by NVIDIA.
