
A statistical solution to processing very large datasets efficiently with memory limit

Scientists develop a statistical randomness-based framework to optimally classify extremely large datasets without needing large memories

2021-04-01
(Press-News.org) Ishikawa, Japan - Any high-performance computing system should be able to handle a vast amount of data in a short amount of time, a requirement on which entire fields (data science, Big Data) are based. Usually, the first step in managing a large amount of data is either to classify it based on well-defined attributes or, as is typical in machine learning, to "cluster" the data points into groups such that points in the same group are more similar to one another than to those in another group. However, for an extremely large dataset, which can have trillions of sample points, even assigning the data points to clusters is difficult without a huge amount of memory.

"The problem can be formulated as follows: Suppose we have a clustering tool that can process up to lmax samples. The tool classifies l (input) samples into M(l) groups (as output) based on some attributes. Let the actual number of samples be L and G = M(L) be the total number of attributes we want to find. The problem is that if L is much larger than lmax, we cannot determine G owing to limitations in memory capacity," explains Professor Ryo Maezono from the Japan Advanced Institute of Science and Technology (JAIST), who specializes in computational condensed matter theory.

Interestingly enough, very large sample sizes are common in materials science, where calculations involving atomic substitutions in a crystal structure often involve possibilities numbering in the trillions. However, a mathematical theorem called "Polya's theorem," which utilizes the symmetry of the crystal, often simplifies the calculations to a great extent. Unfortunately, Polya's theorem only works for problems with symmetry and is, therefore, of limited scope.

In a recent study published in Advanced Theory and Simulations, a team of scientists led by Prof. Maezono and his colleague Keishu Utimula, first author of the study, who received his PhD in materials science from JAIST in 2020, proposed an approach based on statistical randomness to identify G for sample sizes much larger (on the order of a trillion) than lmax. The idea, essentially, is to pick a sample of size l that is much smaller than L, identify M(l) using machine learning "clustering," and repeat the process by varying l. As l increases, the estimated M(l) converges to M(L) or G, provided G is considerably smaller than lmax (which is almost always satisfied). However, this is still a computationally expensive strategy, because it is tricky to know exactly when convergence has been achieved.
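
The sub-sampling loop described above can be sketched in a few lines of Python. This is a toy illustration and not the authors' code: the hidden group labels, the evenly sized groups, the constants `G_TRUE` and `L_MAX`, and the function names `draw_sample`, `count_groups` and `estimate_curve` are all assumptions, and the "clustering tool" is replaced by simply counting distinct labels.

```python
import random

G_TRUE = 50      # hypothetical true number of groups (unknown in practice)
L_MAX = 2000     # hypothetical largest sample the clustering tool can hold


def draw_sample(l, rng):
    """Draw l samples; each carries a hidden group label in 0..G_TRUE-1.

    Assumption: the full dataset is so large, and its groups so evenly
    sized, that each draw is an independent uniform pick of a group.
    """
    return [rng.randrange(G_TRUE) for _ in range(l)]


def count_groups(sample):
    """Stand-in for the clustering tool: returns M(l) for one sample."""
    return len(set(sample))


def estimate_curve(sizes, rng):
    """Return [(l, M(l))] for each sample size l, never exceeding L_MAX."""
    return [(l, count_groups(draw_sample(l, rng))) for l in sizes if l <= L_MAX]


rng = random.Random(0)
curve = estimate_curve([10, 50, 200, 1000, 2000], rng)
for l, m in curve:
    print(f"l = {l:5d}  M(l) = {m}")
```

In this toy setting M(l) can never exceed either l or the true number of groups, and it levels off once l is many times larger than the number of groups. Deciding from such a curve alone when it has levelled off is the difficulty noted above.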

To address this issue, the scientists implemented another ingenious strategy: they made use of the "variance", or the degree of spread, in M(l). From simple mathematical reasoning, they showed that the variance of M(l), or V[M(l)], should have a peak for a sample size ~ G. In other words, the sample size corresponding to a maximum in V[M(l)] is approximately G! Furthermore, numerical simulations revealed that the peak variance itself scaled as 0.1 times G, and could thus also be used to estimate G.
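
One way to see where such a peak could come from is the classical occupancy (balls-in-bins) model. The sketch below assumes that every sample falls into one of G equally likely groups; this is a simplifying assumption made here for illustration and not necessarily the setting analysed in the study, and the function names are hypothetical. Under that assumption the variance of the number of distinct groups seen has a closed form, and the code scans l to locate its maximum.

```python
def variance_of_group_count(l, g):
    """Exact Var[M(l)] when l samples fall uniformly into g groups.

    With E = number of groups not seen, M = g - E, so Var[M] = Var[E]:
      E[E]   = g * (1 - 1/g)**l
      E[E^2] = g * (1 - 1/g)**l + g * (g - 1) * (1 - 2/g)**l
    """
    p_one_missing = (1.0 - 1.0 / g) ** l
    p_two_missing = (1.0 - 2.0 / g) ** l
    mean_empty = g * p_one_missing
    second_moment = mean_empty + g * (g - 1) * p_two_missing
    return second_moment - mean_empty ** 2


def find_variance_peak(g, l_limit):
    """Scan l = 1..l_limit and return (l_peak, peak_variance)."""
    return max(
        ((l, variance_of_group_count(l, g)) for l in range(1, l_limit + 1)),
        key=lambda pair: pair[1],
    )


for g in (100, 1000):
    l_peak, v_peak = find_variance_peak(g, l_limit=5 * g)
    print(f"G = {g}: peak at l = {l_peak} (l/G = {l_peak / g:.2f}), "
          f"peak variance / G = {v_peak / g:.3f}")
```

In this idealized model the peak sits at a sample size of the same order as G, and its height is roughly 0.1 times G, which is consistent with the scaling quoted above. For unevenly sized groups the numbers would differ.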

While the results are yet to be mathematically verified, the technique shows promise of finding applications in high-performance computing and machine learning. "The method described in our work has much wider applicability than Polya's theorem and can, therefore, handle a broader category of problems. Moreover, it only requires a machine learning clustering tool for sorting the data and does not require a large memory or whole sampling. This can make AI recognition technology feasible for larger data sizes even with small-scale recognition tools, which can improve their convenience and availability in the future," comments Prof. Maezono excitedly.

Sometimes, statistics is nothing short of magic, and this study proves that!

INFORMATION:

About Japan Advanced Institute of Science and Technology, Japan

Founded in 1990 in Ishikawa prefecture, the Japan Advanced Institute of Science and Technology (JAIST) was the first independent national graduate school in Japan. Now, after 30 years of steady progress, JAIST has become one of Japan's top-ranking universities. JAIST has multiple satellite campuses and strives to foster capable leaders with a state-of-the-art education system where diversity is key; about 40% of its alumni are international students. The university has a unique style of graduate education based on a carefully designed coursework-oriented curriculum to ensure that its students have a solid foundation on which to carry out cutting-edge research. JAIST also works closely with both local and overseas communities by promoting industry-academia collaborative research.

About Professor Ryo Maezono from Japan Advanced Institute of Science and Technology, Japan

Dr. Ryo Maezono has been a Professor at the School of Information Science at the Japan Advanced Institute of Science and Technology (JAIST) since 2017. He received his Ph.D. degree from the University of Tokyo in 2000 and worked as a researcher at the National Institute for Materials Science, Ibaraki, Japan from 2001 to 2007. His research interests comprise materials informatics and condensed matter theory using high-performance computing. A senior researcher and professor, he has 166 publications with over 1700 citations to his credit.

Funding information

This study was funded by JAIST Research Grant (Fundamental Research) 2019, FLAGSHIP2020 (project numbers hp190169 and hp190167 at K-computer), KAKENHI grant (grant numbers 17K17762 and 19K05029), Grant-in-Aid for Scientific Research on Innovative Areas (16H06439 and 19H05169), PRESTO (JPMJPR16NA), Support Program for Starting Up Innovation Hub from Japan Science and Technology Agency (JST), MEXT-KAKENHI (grant numbers 19H04692 and 16KK0097), Toyota Motor Corporation, I-O DATA Foundation, Air Force Office of Scientific Research (AFOSR-AOARD/FA2386-17-1-4049 and FA2386-19-1-4015), and JSPS Bilateral Joint Projects (with India DST).

