PRESS-NEWS.org - Press Release Distribution

A statistical solution to processing very large datasets efficiently with memory limit

Scientists develop a statistical randomness-based framework to optimally classify extremely large datasets without needing large memories

2021-04-01
(Press-News.org) Ishikawa, Japan - Any high-performance computing system should be able to handle a vast amount of data in a short amount of time -- a capability on which entire fields (data science, Big Data) are based. Usually, the first step in managing a large amount of data is either to classify it based on well-defined attributes or--as is typical in machine learning--to "cluster" it into groups such that data points in the same group are more similar to one another than to those in another group. However, for an extremely large dataset, which can have trillions of sample points, even grouping data points into a single cluster is difficult without a very large memory.

"The problem can be formulated as follows: Suppose we have a clustering tool that can process up to lmax samples. The tool classifies l (input) samples into M(l) groups (as output) based on some attributes. Let the actual number of samples be L and G = M(L) be the total number of attributes we want to find. The problem is that if L is much larger than lmax, we cannot determine G owing to limitations in memory capacity," explains Professor Ryo Maezono from the Japan Advanced Institute of Science and Technology (JAIST), who specializes in computational condensed matter theory.

Interestingly enough, very large sample sizes are common in materials science, where calculations involving atomic substitutions in a crystal structure often involve trillions of possibilities. However, a mathematical theorem called "Polya's theorem," which utilizes the symmetry of the crystal, often simplifies the calculations to a great extent. Unfortunately, Polya's theorem only works for problems with symmetry and is, therefore, of limited scope.

In a recent study published in Advanced Theory and Simulations, a team of scientists led by Prof. Maezono and his colleague Keishu Utimula, first author of the study, who earned a PhD in materials science from JAIST in 2020, proposed an approach based on statistical randomness to identify G for sample sizes L much larger than lmax (on the order of a trillion). The idea, essentially, is to pick a sample of size l that is much smaller than L, identify M(l) using machine learning "clustering," and repeat the process while varying l. As l increases, the estimated M(l) converges to M(L), that is, to G, provided G is considerably smaller than lmax (a condition that is almost always satisfied). However, this is still a computationally expensive strategy, because it is tricky to know exactly when convergence has been achieved.
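The subsampling idea can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the authors' implementation: each sample is represented only by a hidden group id, all groups are assumed equally likely, and the hypothetical function cluster_count stands in for a real clustering tool by simply counting distinct ids. The names draw_sample, mean_group_count and G_TRUE are illustrative only.

```python
import random

def draw_sample(l, num_groups, rng):
    """Draw l samples from a (conceptually huge) dataset.

    Assumption: each sample is represented by its hidden group id,
    and all groups are equally likely. Nothing of size L is stored.
    """
    return rng.choices(range(num_groups), k=l)

def cluster_count(sample):
    """Stand-in for a clustering tool: returns M(l), the number of groups found."""
    return len(set(sample))

def mean_group_count(l, num_groups, rng, repeats=20):
    """Average M(l) over several independent random samples of size l."""
    counts = [cluster_count(draw_sample(l, num_groups, rng)) for _ in range(repeats)]
    return sum(counts) / repeats

rng = random.Random(0)
G_TRUE = 50  # hypothetical true number of groups, unknown in a real problem

# Estimate M(l) for increasing sample sizes l.
curve = {l: mean_group_count(l, G_TRUE, rng) for l in (10, 50, 200, 1000)}
for l, m in curve.items():
    print(f"l = {l:5d}  mean M(l) = {m:.2f}")
```

In this toy setting the estimated M(l) rises with l and levels off near the true number of groups, but nothing in the curve itself signals when it has levelled off, which is the difficulty the text describes.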

To address this issue, the scientists implemented another ingenious strategy: they made use of the "variance," or the degree of spread, in M(l). From simple mathematical reasoning, they showed that the variance of M(l), or V[M(l)], should peak at a sample size of approximately G. In other words, the sample size at which V[M(l)] reaches its maximum is approximately G. Furthermore, numerical simulations revealed that the peak variance itself scaled as 0.1 times G, and thus also provided a good estimate of G.
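The variance-based estimate can be sketched in the same spirit. The Python snippet below is an illustration under assumptions, not the procedure from the study: groups are equally likely, the "clustering" step is replaced by counting distinct hidden group ids, and the names group_count, variance_of_group_count and G_TRUE are hypothetical. It repeats the sampling many times for each sample size l, computes the sample variance of M(l), and reports where that variance is largest.

```python
import random
import statistics

def group_count(l, num_groups, rng):
    """M(l) for one random sample of size l.

    Assumption: counting distinct hidden group ids stands in for a
    real clustering tool, and all groups are equally likely.
    """
    return len(set(rng.choices(range(num_groups), k=l)))

def variance_of_group_count(l, num_groups, rng, repeats=400):
    """Sample variance V[M(l)] over repeated independent draws of size l."""
    counts = [group_count(l, num_groups, rng) for _ in range(repeats)]
    return statistics.variance(counts)

rng = random.Random(1)
G_TRUE = 100  # hypothetical true number of groups, unknown in practice

# Scan sample sizes and locate the maximum of the variance curve.
sizes = list(range(20, 401, 20))
variances = {l: variance_of_group_count(l, G_TRUE, rng) for l in sizes}
l_peak = max(variances, key=variances.get)
v_peak = variances[l_peak]

print(f"variance peaks at l = {l_peak}, peak variance = {v_peak:.2f}")
print(f"estimate of G from peak variance / 0.1: {v_peak / 0.1:.0f}")
```

Because the variance is itself estimated from a finite number of repeats, the peak location is noisy; in this toy model it tends to fall at a sample size of the same order as G, and the peak variance divided by 0.1 gives a rough second estimate. How well this carries over to unequal group sizes or real clustering tools is not addressed by the sketch.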

While the results are yet to be mathematically verified, the technique shows promise of finding applications in high-performance computing and machine learning. "The method described in our work has much wider applicability than Polya's theorem and can, therefore, handle a broader category of problems. Moreover, it only requires a machine learning clustering tool for sorting the data and does not require a large memory or whole sampling. This can make AI recognition technology feasible for larger data sizes even with small-scale recognition tools, which can improve their convenience and availability in the future," comments Prof. Maezono excitedly.

Sometimes, statistics is nothing short of magic, and this study proves that!

INFORMATION:

About Japan Advanced Institute of Science and Technology, Japan

Founded in 1990 in Ishikawa prefecture, the Japan Advanced Institute of Science and Technology (JAIST) was the first independent national graduate school in Japan. Now, after 30 years of steady progress, JAIST has become one of Japan's top-ranking universities. JAIST has multiple satellite campuses and strives to foster capable leaders with a state-of-the-art education system where diversity is key; about 40% of its alumni are international students. The university has a unique style of graduate education based on a carefully designed coursework-oriented curriculum to ensure that its students have a solid foundation on which to carry out cutting-edge research. JAIST also works closely with both local and overseas communities by promoting industry-academia collaborative research.

About Professor Ryo Maezono from Japan Advanced Institute of Science and Technology, Japan

Dr. Ryo Maezono has been a Professor at the School of Information Science at Japan Advanced Institute of Science and Technology (JAIST) since 2017. He received his Ph.D. degree from the University of Tokyo in 2000 and worked as a researcher at the National Institute for Materials Science, Ibaraki, Japan from 2001 to 2007. His research interests include materials informatics and condensed matter theory using high-performance computing. A senior researcher and professor, he has 166 publications with over 1700 citations to his credit.

Funding information

This study was funded by JAIST Research Grant (Fundamental Research) 2019, FLAGSHIP2020 (project numbers hp190169 and hp190167 at K-computer), KAKENHI grants (grant numbers 17K17762 and 19K05029), Grant-in-Aid for Scientific Research on Innovative Areas (16H06439 and 19H05169), PRESTO (JPMJPR16NA), Support Program for Starting Up Innovation Hub from Japan Science and Technology Agency (JST), MEXT-KAKENHI (grant numbers 19H04692 and 16KK0097), Toyota Motor Corporation, I-O DATA Foundation, Air Force Office of Scientific Research (AFOSR-AOARD/FA2386-17-1-4049 and FA2386-19-1-4015), and JSPS Bilateral Joint Projects (with India DST).




