Cambridge, MA -- Determining the least expensive path for a new subway line underneath a metropolis like New York City is a colossal planning challenge, involving thousands of potential routes through hundreds of city blocks, each with uncertain construction costs. Conventional wisdom suggests extensive field studies across many locations would be needed to determine the costs associated with digging below certain city blocks.
Because these studies are costly to conduct, a city planner would want to perform as few as possible while still gathering the most useful data for making an optimal decision.
With almost countless possibilities, how would they know where to start?
A new algorithmic method developed by MIT researchers could help. Their mathematical framework provably identifies the smallest dataset that guarantees finding the optimal solution to a problem, often requiring fewer measurements than traditional approaches suggest.
In the case of the subway route, this method considers the structure of the problem (the network of city blocks, construction constraints, and budget limits) and the uncertainty surrounding costs. The algorithm then identifies the minimum set of locations where field studies would guarantee finding the least expensive route. The method also identifies how to use this strategically collected data to find the optimal decision.
This framework applies to a broad class of structured decision-making problems under uncertainty, such as supply chain management or electricity network optimization.
“Data are one of the most important aspects of the AI economy. Models are trained on more and more data, consuming enormous computational resources. But most real-world problems have structure that can be exploited. We’ve shown that with careful selection, you can guarantee optimal solutions with a small dataset, and we provide a method to identify exactly which data you need,” says Asu Ozdaglar, MathWorks Professor and head of the MIT Department of Electrical Engineering and Computer Science (EECS), deputy dean of the MIT Schwarzman College of Computing, and a principal investigator in the Laboratory for Information and Decision Systems (LIDS).
Ozdaglar, co-senior author of a paper on this research, is joined by co-lead authors Omar Bennouna, an EECS graduate student, and his brother Amine Bennouna, a former MIT postdoc who is now an assistant professor at Northwestern University, as well as by co-senior author Saurabh Amin, co-director of the Operations Research Center, a professor in the MIT Department of Civil and Environmental Engineering, and a principal investigator in LIDS. The research will be presented at the Conference on Neural Information Processing Systems.
An optimality guarantee
Much of the recent work in operations research focuses on how to best use data to make decisions, but this assumes these data already exist.
The MIT researchers started by asking a different question — what are the minimum data needed to optimally solve a problem? With this knowledge, one could collect far fewer data to find the best solution, spending less time, money, and energy conducting experiments and training AI models.
The researchers first developed a precise geometric and mathematical characterization of what it means for a dataset to be sufficient. Every possible set of costs (travel times, construction expenses, energy prices) makes some particular decision optimal. Grouping together all the cost scenarios that lead to the same optimal decision yields “optimality regions,” which partition the space of possible costs. A dataset is sufficient if it can determine which region contains the true cost.
This characterization provides the foundation for the practical algorithm they developed, which identifies datasets that guarantee finding the optimal solution.
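One way to write this characterization down is sketched here. The notation is assumed for illustration and is not taken from the paper: a finite set of decisions \mathcal{X}, a set \mathcal{C} of possible cost vectors, a linear objective c^\top x, and a dataset made of noiseless linear measurements q_i^\top c of the true cost.

```latex
% Optimality region of decision x: all cost vectors under which x is optimal.
\[
R(x) = \{\, c \in \mathcal{C} : c^\top x \le c^\top x' \ \text{for all } x' \in \mathcal{X} \,\},
\qquad \bigcup_{x \in \mathcal{X}} R(x) = \mathcal{C}.
\]
% A dataset is sufficient if every cost vector consistent with the
% observed measurement values lies inside one common region.
\[
\{q_1,\dots,q_m\} \text{ is sufficient}
\iff
\forall c \in \mathcal{C}\ \ \exists x \in \mathcal{X}:\
\{\, c' \in \mathcal{C} : q_i^\top c' = q_i^\top c,\ i = 1,\dots,m \,\} \subseteq R(x).
\]
```

Under these assumptions, regions overlap only where two decisions tie, and sufficiency does not require pinning down c itself, only the region it falls in.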
Their theoretical exploration revealed that a small, carefully selected dataset is often all one needs.
“When we say a dataset is sufficient, we mean that it contains exactly the information needed to solve the problem. You don’t need to estimate all the parameters accurately; you just need data that can discriminate between competing optimal solutions,” says Amine Bennouna.
Building on these mathematical foundations, the researchers developed an algorithm that finds the smallest sufficient dataset.
Capturing the right data
To use this tool, a user inputs the structure of the task, such as the objective and constraints, along with whatever is already known about the problem.
For instance, in supply chain management, the task might be to reduce operational costs across a network of dozens of potential routes. The company may already know that some shipment routes are especially costly, but lack complete information on others.
The researchers’ iterative algorithm works by repeatedly asking, “Is there any scenario that would change the optimal decision in a way my current data can’t detect?” If yes, it adds a measurement that captures that difference. If no, the dataset is provably sufficient.
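The loop described above can be illustrated on a toy problem. The following is a minimal sketch under strong simplifying assumptions, not the researchers' algorithm: the uncertainty is a finite list of cost scenarios, each measurement reveals the exact cost of one segment, and the greedy choice of which segment to add does not guarantee the smallest sufficient dataset. All names (`optimal_set`, `undetected_conflict`, `build_sufficient_dataset`, `routes`, `scenarios`) are hypothetical.

```python
from itertools import product

def optimal_set(cost, decisions):
    """Indices of every decision with minimum total cost under `cost`."""
    totals = [sum(cost[i] for i in d) for d in decisions]
    best = min(totals)
    return {k for k, t in enumerate(totals) if t == best}

def undetected_conflict(scenarios, decisions, measured):
    """Return a group of scenarios that look identical on the measured
    segments yet share no common optimal decision, or None if none exists."""
    groups = {}
    order = sorted(measured)
    for c in scenarios:
        groups.setdefault(tuple(c[i] for i in order), []).append(c)
    for group in groups.values():
        common = set.intersection(*(optimal_set(c, decisions) for c in group))
        if not common:
            return group
    return None

def build_sufficient_dataset(scenarios, decisions):
    """Add measurements until no undetectable change of optimum remains."""
    measured = set()
    while True:
        group = undetected_conflict(scenarios, decisions, measured)
        if group is None:
            return measured  # sufficient for this finite scenario list
        # Greedy step (an assumption): measure the first unmeasured
        # segment on which the conflicting scenarios disagree.
        for i in range(len(group[0])):
            if i not in measured and len({c[i] for c in group}) > 1:
                measured.add(i)
                break

# Two candidate routes over four segments; segments 1 and 3 are uncertain.
routes = [(0, 1), (2, 3)]
scenarios = list(product([1], [1, 5], [2], [2, 3]))
dataset = build_sufficient_dataset(scenarios, routes)
print(dataset)  # {1}
```

In this toy case segment 3 is uncertain but never changes which route is cheaper, so only segment 1 needs a field study.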
This algorithm pinpoints the subset of locations that need to be explored to guarantee finding the minimum-cost solution.
Then, after collecting those data, the user can feed them to a second algorithm the researchers developed, which finds that optimal solution. In this case, that would be the shipment routes to include in a cost-optimal supply chain.
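The decision step can be sketched in the same toy setting. This is an illustrative stand-in under assumed simplifications (a finite scenario list and exact per-segment measurements), not the researchers' second algorithm; the function `decide` and the example data are hypothetical.

```python
from itertools import product

def decide(observed, scenarios, decisions):
    """Return the index of a decision that is optimal in every scenario
    consistent with `observed` (segment -> measured cost), or None if the
    observations do not single out such a decision."""
    consistent = [c for c in scenarios
                  if all(c[i] == v for i, v in observed.items())]
    common = None
    for c in consistent:
        totals = [sum(c[i] for i in d) for d in decisions]
        best = min(totals)
        winners = {k for k, t in enumerate(totals) if t == best}
        common = winners if common is None else common & winners
    return min(common) if common else None

# Same toy problem: two routes over four segments, segments 1 and 3 uncertain.
routes = [(0, 1), (2, 3)]
scenarios = list(product([1], [1, 5], [2], [2, 3]))

print(decide({1: 5}, scenarios, routes))  # 1
print(decide({1: 1}, scenarios, routes))  # 0
print(decide({}, scenarios, routes))      # None
```

With no measurements the function returns None, because the consistent scenarios disagree on the best route; measuring segment 1 alone resolves the choice in either outcome.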
“The algorithm guarantees that, for whatever scenario could occur within your uncertainty, you’ll identify the best decision,” Omar Bennouna says.
The researchers’ evaluations revealed that, using this method, it is possible to guarantee an optimal decision with a much smaller dataset than would typically be collected.
“We challenge this misconception that small data means approximate solutions. These are exact sufficiency results with mathematical proofs. We’ve identified when you’re guaranteed to get the optimal solution with very little data — not probably, but with certainty,” Amin says.
In the future, the researchers want to extend their framework to other types of problems and more complex situations. They also want to study how noisy observations could affect dataset optimality.
END
Bigger datasets aren’t always better
MIT researchers developed a way to identify the smallest dataset that guarantees optimal solutions to complex problems.
2025-11-18