PRESS-NEWS.org - Press Release Distribution

Bombarded by explosive waves of information, scientists review new ways to process and analyze Big Data

2014-08-26
(Press-News.org) Big Data presents scientists with unfolding opportunities, including, for instance, the possibility of discovering heterogeneous characteristics in the population, leading to the development of personalized treatments and highly individualized services. But ever-expanding data sets introduce new challenges in terms of statistical analysis, sampling bias, computational costs, noise accumulation, spurious correlations, and measurement errors.

The era of Big Data – marked by a Big Bang-like explosion of information about everything from patterns of use of the World Wide Web to individual genomes – is being propelled by massive amounts of very high-dimensional or unstructured data, continuously produced and stored at a decreasing cost.

"In genomics we have seen a dramatic drop in price for whole genome sequencing," state Jianqing Fan and Han Liu, scientists at Princeton University, and Fang Han at Johns Hopkins. "This is also true in other areas such as social media analysis, biomedical imaging, high-frequency finance, analysis of surveillance videos and retail sales," they point out in a paper titled "Challenges of Big Data analysis" published in the Beijing-based journal National Science Review.

With the quickening pace of data collection and analysis, they add, "scientific advances are becoming more and more data-driven and researchers will more and more think of themselves as consumers of data."

Increasingly complex data sets are emerging across the sciences. In the field of genomics, more than 500 000 microarrays are now publicly available, with each array containing tens of thousands of expression values of molecules; in biomedical engineering, tens of thousands of terabytes of functional magnetic resonance images have been produced, with each image containing more than 50 000 voxel values. Massive and high-dimensional data is also being gathered from social media, e-commerce, and surveillance videos.

Expanding streams of social network data are being channeled and collected by Twitter, Facebook, LinkedIn and YouTube. This data, in turn, is being used to predict influenza epidemics, stock market trends, and box-office revenues for particular movies.

Social media and the Internet contain burgeoning information on consumer preferences, leading economic indicators, business cycles, and the economic and social states of a society.

"It is anticipated that social network data will continue to explode and be exploited for many new applications," predict the co-authors of the study. New applications include ultra-individualized services.

And in the area of Internet security, they add, "When a network-based attack takes place, historical data on network traffic may allow us to efficiently identify the source and targets of the attack."

With Big Data emerging from many frontiers of scientific research and technological advances, researchers have focused on the development of new computational infrastructure and data-storage methods, and of fast algorithms that are scalable to massive data with high dimensionality.

"This forges cross-fertilization among different fields including statistics, optimization and applied mathematics," the scientists add.

The massive sample sizes giving rise to Big Data fundamentally challenge the traditional computing infrastructure.

"In many applications, we need to analyze Internet-scale data containing billions or even trillions of data points, which makes even a linear pass of the whole dataset unaffordable," the researchers point out.

The basic approach to storing and processing such data is to divide and conquer. The idea is to partition a large problem into more tractable and independent sub-problems. Each sub-problem is tackled in parallel by a different processing unit. On a small scale, this divide-and-conquer strategy can be implemented by either multi-core computing or grid computing.
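As a rough illustration of the divide-and-conquer idea (not code from the paper), the Python sketch below splits a dataset into chunks, computes a partial count and sum for each chunk independently, and combines the partial results into a mean. The function names `partition`, `partial_stats` and `parallel_mean` are hypothetical, and a thread pool stands in for the separate processing units; a real multi-core run in CPython would more likely use a process pool.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_stats(chunk):
    """Independent sub-problem: the count and sum of a single chunk."""
    return len(chunk), sum(chunk)


def parallel_mean(data, n_parts=4):
    """Divide the data, solve each part concurrently, then combine."""
    chunks = partition(data, n_parts)
    # Assumption: threads stand in for separate cores or grid nodes.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(partial_stats, chunks))
    total_count = sum(count for count, _ in results)
    total_sum = sum(subtotal for _, subtotal in results)
    return total_sum / total_count
```

The combine step only needs each chunk's count and sum, so no worker ever has to see the whole dataset.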

On a larger scale, handling enormous arrays of data requires a new computing infrastructure that supports massively parallel data storage and processing.

The researchers present Hadoop as an example of a basic software and programming infrastructure for Big Data processing. Alongside Hadoop's distributed file system, they review MapReduce (a programming model for processing large datasets in parallel), cloud computing, convex optimization, and random projection algorithms, which are specifically designed to meet Big Data's computational challenges.

Hadoop is a Java-based software framework for distributed data management and processing. It contains a set of open source libraries for distributed computing using the MapReduce programming model and its own distributed file system, called HDFS. Hadoop automatically facilitates scalability and takes care of detecting and handling failures.
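To make the MapReduce programming model concrete, here is a minimal single-machine sketch in Python of the classic word-count example. It does not use Hadoop or its API: the functions `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical names chosen for illustration, and the grouping step that a real framework performs across machines is simulated with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    # Each document could be mapped on a different machine; here the
    # map calls simply run one after another.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call sees one document and each reduce call sees one key, both phases can in principle be spread across many workers without coordination.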

HDFS is designed to host and provide high-throughput access to large datasets that are redundantly stored across multiple machines. It ensures Big Data's survivability and high availability for parallel applications.

In terms of statistical analysis, Big Data presents another set of new challenges. Researchers tend to collect as many features of the samples as possible; as a result, these samples are commonly heterogeneous and high dimensional.

High dimensionality brings new problems, including noise accumulation, spurious correlation, and incidental endogeneity. Spurious correlation is one example: in studying the association between cancers and certain genomic and clinical factors, prostate cancer might appear highly correlated with an unrelated gene. Such a high correlation could be explained by high dimensionality alone: in studies that include so many features, ranging from genomic information to height, weight and gender to favorite foods and sports, some high correlations emerge merely by chance.
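The effect can be reproduced with a small simulation. The Python sketch below is an illustration under stated assumptions, not the authors' experiment: it draws a response and many features as independent standard normal samples, so every true correlation is zero, and reports the largest absolute sample correlation found. The names `pearson` and `max_spurious_correlation`, the sample sizes and the seed are all hypothetical choices.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a response and unrelated features."""
    rng = random.Random(seed)
    # The response and every feature are drawn independently, so any
    # correlation observed in the sample arises purely by chance.
    response = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(response, feature)))
    return best
```

Calling `max_spurious_correlation(50, 10)` and then `max_spurious_correlation(50, 1000)` compares a low-dimensional and a high-dimensional study with the same number of samples. With a fixed seed, the second call examines the same first ten features plus 990 more, so its maximum can only be equal or larger.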

INFORMATION:

This research was supported by the National Science Foundation [DMS-1206464 to Jianqing Fan, III-1116730 and III-1332109 to Han Liu] and the National Institutes of Health [R01-GM100474 and R01-GM072611 to Jianqing Fan].

See the article: Jianqing Fan, Fang Han, and Han Liu. "Challenges of Big Data analysis." Natl Sci Rev (June 2014) 1 (2): 293-314 http://nsr.oxfordjournals.org/content/1/2/293.full

The National Science Review is the first comprehensive scholarly journal published in English in China that is aimed at linking the country's rapidly advancing community of scientists with the global frontiers of science and technology. The journal also aims to shine a worldwide spotlight on scientific research advances across China.
