PRESS-NEWS.org - Press Release Distribution

New data science platform speeds up Python queries

2021-07-01
(Press-News.org) PROVIDENCE, R.I. [Brown University] -- Researchers from Brown University and MIT have developed a new data science framework that allows users to process data with the programming language Python -- without paying the "performance tax" normally associated with a user-friendly language.

The new framework, called Tuplex, can process data queries written in Python up to 90 times faster than industry-standard data systems such as Apache Spark or Dask. The research team presented the system at SIGMOD 2021, a premier data processing conference, and has made the software freely available to all.

"Python is the primary programming language used by people doing data science," said Malte Schwarzkopf, an assistant professor of computer science at Brown and one of the developers of Tuplex. "That makes a lot of sense. Python is widely taught in universities, and it's an easy language to get started with. But when it comes to data science, there's a huge performance tax associated with Python because platforms can't process Python efficiently on the back end."

Platforms like Spark perform data analytics by distributing tasks across multiple processor cores or machines in a data center. That parallel processing lets users work with giant data sets that would choke a single computer. Users interact with these platforms by submitting queries that contain custom logic written as "user-defined functions," or UDFs. A UDF might, for example, extract the number of bedrooms from the text of a real estate listing, as part of a query that searches every real estate listing in the U.S. and selects those with three bedrooms.
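
As a rough illustration of what such a UDF looks like, here is a plain-Python sketch of the bedroom example. The function name `extract_bedrooms`, the regular expression, and the sample listings are all hypothetical; on a platform like Spark or Tuplex the same kind of function would be handed to the platform to run over each record, rather than applied in a list comprehension as it is here.

```python
import re

# Hypothetical sample records standing in for real estate listings.
listings = [
    {"id": 1, "text": "Charming 3 bedroom colonial near downtown"},
    {"id": 2, "text": "Spacious 4 bedroom home with garden"},
    {"id": 3, "text": "Cozy 3 bedroom condo, newly renovated"},
    {"id": 4, "text": "Studio apartment, no bedroom count given"},
]


def extract_bedrooms(text):
    """Hypothetical UDF: pull the bedroom count out of free-form listing text.

    Returns None when no '<digits> bedroom' pattern is found.
    """
    match = re.search(r"(\d+)\s*bedroom", text)
    return int(match.group(1)) if match else None


# The query: apply the UDF to every listing and keep the three-bedroom ones.
three_bedroom = [
    row["id"] for row in listings if extract_bedrooms(row["text"]) == 3
]
print(three_bedroom)  # [1, 3]
```

The platform's job is to run a function like this over millions of records in parallel; the per-record cost of executing the Python function is what the rest of this article is about.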

Because of its simplicity, Python is the language of choice for creating UDFs in the data science community. In fact, the Tuplex team cites a recent poll showing that 66% of data platform users utilize Python as their primary language. The problem is that analytics platforms have trouble dealing with those bits of Python code efficiently.

Data platforms are typically written in languages that are compiled before running. A compiler is a program that translates source code into machine code, the set of instructions a computer processor can execute directly and quickly. Python code, by contrast, is normally not compiled to machine code ahead of time. Instead, an interpreter executes it while the program runs, which can mean far slower performance.

"These frameworks have to break out of their efficient execution of compiled code and jump into a Python interpreter to execute Python UDFs," Schwarzkopf said. "That process can be a factor of 100 less efficient than executing compiled code."

If Python code could be compiled, it would speed things up greatly. But researchers have tried for years to develop a general-purpose Python compiler, with little success, Schwarzkopf said. So instead of trying to build a general Python compiler, the researchers designed Tuplex to compile a highly specialized program for the specific query and the common-case input data. Uncommon input data, which make up only a small percentage of records, are separated out and handed to an interpreter.

"We refer to this process as dual-case processing, as it splits that data into two cases," said Leonhard Spiegelberg, co-author of the research describing Tuplex. "This allows us to simplify the compilation problem as we only need to care about a single set of data types and common-case assumptions. This way, you get the best of two worlds: high productivity and fast execution speed."
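
A minimal sketch of the dual-case idea follows. It is an illustration under stated assumptions, not Tuplex's code or API: the function names `bedrooms_fast`, `bedrooms_general`, and `dual_case`, the "string of digits" common case, and the spelled-out-number table are all hypothetical. The fast function makes a single narrow assumption about its input and raises as soon as a value violates it; those values are routed to a general function that copes with any shape. Both paths are ordinary Python here, so the sketch shows how records are split between the two cases, not where Tuplex's speedup comes from.

```python
# Hypothetical illustration of dual-case processing in plain Python.
# In Tuplex the common-case path is compiled; here both paths are ordinary
# Python, so this shows only the control flow, not the speedup.

WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def bedrooms_fast(value):
    """Common case: assumes the field is a string of decimal digits.

    Stands in for the specialized code path. It makes one narrow type
    assumption and raises instead of handling anything else.
    """
    if type(value) is not str or not value.isdecimal():
        raise TypeError("input violates the common-case assumption")
    return int(value)


def bedrooms_general(value):
    """General case: slower, fully dynamic logic for uncommon inputs."""
    if isinstance(value, int):
        return value
    text = str(value).strip().lower()
    if text.isdecimal():
        return int(text)
    return WORDS[text]  # raises KeyError for unrecognized values


def dual_case(values):
    """Try the fast path per value; route violations to the general path."""
    results, fallbacks = [], 0
    for value in values:
        try:
            results.append(bedrooms_fast(value))
        except TypeError:
            fallbacks += 1
            results.append(bedrooms_general(value))
    return results, fallbacks


results, fallbacks = dual_case(["3", "4", "three", "2", 5])
print(results, fallbacks)  # [3, 4, 3, 2, 5] 2
```

Because the fast path only ever sees digit strings, it can be specialized for one data type; the two records that break that assumption ("three" and the integer 5) take the general path.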

And the runtime benefit can be substantial.

"We show in our research that a wait time of 10 minutes for an output can be reduced to a second," Schwarzkopf said. "So it really is a substantial improvement in performance."

In addition to speeding things up, Tuplex also has an innovative way of dealing with anomalous data, the researchers say. Large datasets are often messy, full of corrupted records or data fields that don't follow convention. In real estate data, for example, the number of bedrooms might be written either as a numeral or as a spelled-out word. Inconsistencies like that can be enough to crash some data platforms. Tuplex instead extracts those anomalies and sets them aside to avoid a crash. Once the program has run, the user has the option of repairing them.
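
The set-aside-and-repair workflow can be sketched in plain Python as follows. This illustrates the idea only and is not Tuplex's interface; `parse_bedrooms`, `run_with_exceptions`, `repair`, and the sample values are hypothetical. Instead of letting the first malformed record abort the whole run, each failure is recorded along with its exception type, and a user-supplied repair function is applied to the set-aside records afterwards.

```python
def parse_bedrooms(value):
    """Hypothetical UDF: accepts numerals only; int() raises on anything else."""
    return int(value)


def run_with_exceptions(rows, udf):
    """Apply udf to every row, setting failing rows aside instead of crashing."""
    ok, failed = [], []
    for row in rows:
        try:
            ok.append(udf(row))
        except Exception as exc:  # broad on purpose: record any anomaly, keep going
            failed.append((row, type(exc).__name__))
    return ok, failed


# Hypothetical repair the user writes after inspecting the failed records.
SPELLED = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def repair(value):
    return SPELLED[value.strip().lower()]


rows = ["3", "2", "three", "4", "N/A"]
ok, failed = run_with_exceptions(rows, parse_bedrooms)
print(ok)      # [3, 2, 4]
print(failed)  # [('three', 'ValueError'), ('N/A', 'ValueError')]

# Second pass: apply the repair only to the records that were set aside.
repaired, still_bad = run_with_exceptions([r for r, _ in failed], repair)
print(repaired, still_bad)  # [3] [('N/A', 'KeyError')]
```

Records that even the repair cannot handle (here "N/A") stay on the failed list for inspection rather than bringing down the job.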

"We think this could have a major productivity impact for data scientists," Schwarzkopf said. "To not have to run out to get a cup of coffee while waiting for an output, and to not have a program run for an hour only to crash before it's done would be a really big deal."

INFORMATION:

The research was supported by the National Science Foundation (DGE-2039354, IIS-1453171) and the U.S. Air Force (FA8750-19-2-1000).