PRESS-NEWS.org - Press Release Distribution

New data science platform speeds up Python queries

2021-07-01
(Press-News.org) PROVIDENCE, R.I. [Brown University] -- Researchers from Brown University and MIT have developed a new data science framework that allows users to process data with the programming language Python -- without paying the "performance tax" normally associated with a user-friendly language.

The new framework, called Tuplex, is able to process data queries written in Python up to 90 times faster than industry-standard data systems like Apache Spark or Dask. The research team unveiled the system at SIGMOD 2021, a premier data processing conference, and has made the software freely available to all.

"Python is the primary programming language used by people doing data science," said Malte Schwarzkopf, an assistant professor of computer science at Brown and one of the developers of Tuplex. "That makes a lot of sense. Python is widely taught in universities, and it's an easy language to get started with. But when it comes to data science, there's a huge performance tax associated with Python because platforms can't process Python efficiently on the back end."

Platforms like Spark perform data analytics by distributing tasks across multiple processor cores or machines in a data center. That parallel processing allows users to deal with giant data sets that would overwhelm a single computer. Users interact with these platforms by submitting their own queries, which contain custom logic written as "user-defined functions," or UDFs. A UDF might, for example, extract the number of bedrooms from the text of a real estate listing, as part of a query that searches all of the real estate listings in the U.S. and selects the ones with three bedrooms.
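To make the idea concrete, a UDF of the kind described above might look like the following minimal sketch in plain Python. It is an illustration only and does not use the API of Tuplex, Spark or Dask; the sample listings, the regular expression and the function names are all assumptions invented for the example.

```python
import re

# Hypothetical sample data; the listing text and field names are invented.
LISTINGS = [
    {"id": 1, "text": "Sunny 3 bedroom colonial near downtown"},
    {"id": 2, "text": "Cozy studio, recently renovated"},
    {"id": 3, "text": "Spacious 4 bedrooms, 2 baths"},
    {"id": 4, "text": "Updated 3 bedrooms with garage"},
]

# Assumed convention: the count appears as digits followed by "bedroom(s)".
BEDROOM_PATTERN = re.compile(r"(\d+)\s+bedrooms?", re.IGNORECASE)


def extract_bedrooms(text):
    """UDF: return the bedroom count found in the listing text, or None."""
    match = BEDROOM_PATTERN.search(text)
    return int(match.group(1)) if match else None


def three_bedroom_ids(listings):
    """Query: apply the UDF to every listing and keep those with three bedrooms."""
    return [row["id"] for row in listings if extract_bedrooms(row["text"]) == 3]


print(three_bedroom_ids(LISTINGS))  # prints [1, 4]
```

On a real platform the same function would be handed to the system, which would apply it to each record in parallel across many cores or machines.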

Because of its simplicity, Python is the language of choice for creating UDFs in the data science community. In fact, the Tuplex team cites a recent poll showing that 66% of data platform users utilize Python as their primary language. The problem is that analytics platforms have trouble dealing with those bits of Python code efficiently.

Data platforms are written in high-level computer languages that are compiled before running. Compilers are programs that translate code written in a programming language into machine code -- sets of instructions that a computer processor can quickly execute. Python, however, is not compiled beforehand. Instead, an interpreter executes Python code line by line while the program runs, which can mean far slower performance.

"These frameworks have to break out of their efficient execution of compiled code and jump into a Python interpreter to execute Python UDFs," Schwarzkopf said. "That process can be a factor of 100 less efficient than executing compiled code."

If Python code could be compiled, it would speed things up greatly. But researchers have tried for years to develop a general-purpose Python compiler, Schwarzkopf says, with little success. So instead of trying to make a general Python compiler, the researchers designed Tuplex to compile a highly specialized program for the specific query and common-case input data. Uncommon input data, which account for only a small percentage of instances, are separated out and referred to an interpreter.

"We refer to this process as dual-case processing, as it splits that data into two cases," said Leonhard Spiegelberg, co-author of the research describing Tuplex. "This allows us to simplify the compilation problem as we only need to care about a single set of data types and common-case assumptions. This way, you get the best of two worlds: high productivity and fast execution speed."
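The split into two cases can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Tuplex's implementation: the real system generates compiled code for the common case, whereas here an ordinary function with a strict type check (the hypothetical `fast_path`) stands in for that compiled code, and a more permissive function (the hypothetical `general_path`) stands in for the interpreter.

```python
def fast_path(bedrooms):
    """Stand-in for specialized code that assumes the common case:
    the bedroom count is already an int. Anything else is rejected."""
    if type(bedrooms) is not int:
        raise TypeError("input does not match the common case")
    return bedrooms == 3


def general_path(bedrooms):
    """Stand-in for the slower general-purpose path: accepts any value
    that int() can convert, such as the string "3" or the float 3.0."""
    return int(bedrooms) == 3


def dual_case_map(rows):
    """Run every row through the fast path, then send only the rows that
    violated its assumptions through the general path."""
    results = [None] * len(rows)
    uncommon = []
    for i, value in enumerate(rows):
        try:
            results[i] = fast_path(value)
        except TypeError:
            uncommon.append(i)  # remember the row, keep processing the rest
    for i in uncommon:
        results[i] = general_path(rows[i])
    return results, len(uncommon)


rows = [3, 2, "3", 4, 3.0]  # mostly ints, with two uncommon-case rows
results, uncommon_count = dual_case_map(rows)
print(results)         # prints [True, False, True, False, True]
print(uncommon_count)  # prints 2
```

The design point is that the fast path only has to be correct for one set of types, which is what makes it simple enough to compile; correctness for everything else is delegated to the fallback.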

And the runtime benefit can be substantial.

"We show in our research that a wait time of 10 minutes for an output can be reduced to a second," Schwarzkopf said. "So it really is a substantial improvement in performance."

In addition to speeding things up, Tuplex also has an innovative way of dealing with anomalous data, the researchers say. Large datasets are often messy, full of corrupted records or data fields that don't follow convention. In real estate data, for example, the number of bedrooms could either be a numeral or a spelled-out number. Inconsistencies like that can be enough to crash some data platforms. But Tuplex extracts those anomalies and sets them aside to avoid a crash. Once the program has run, the user then has the option of repairing those anomalies.
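The behavior described above, setting anomalies aside instead of crashing and letting the user repair them afterward, can be sketched as follows. This is a plain-Python illustration and not Tuplex's actual interface; `run_pipeline`, `repair` and the sample data are hypothetical names and values chosen for the example.

```python
# Assumed lookup table for spelled-out numbers; deliberately small.
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_bedrooms(field):
    """UDF written for the clean case: the field holds digits."""
    return int(field)  # raises ValueError for input such as "three"


def run_pipeline(fields, udf):
    """Apply the UDF to every field. Rows that raise are collected
    with their position instead of aborting the whole run."""
    results, anomalies = [], []
    for index, field in enumerate(fields):
        try:
            results.append((index, udf(field)))
        except (ValueError, TypeError) as exc:
            anomalies.append((index, field, type(exc).__name__))
    return results, anomalies


def repair(field):
    """User-supplied fix, applied after the run, for spelled-out numbers."""
    return WORDS[field.strip().lower()]


fields = ["3", "2", "three", "4", "Two"]
results, anomalies = run_pipeline(fields, parse_bedrooms)
print(anomalies)  # prints [(2, 'three', 'ValueError'), (4, 'Two', 'ValueError')]

# Repair the set-aside rows and merge them back in their original order.
repaired = [(index, repair(field)) for index, field, _ in anomalies]
merged = sorted(results + repaired)
print(merged)  # prints [(0, 3), (1, 2), (2, 3), (3, 4), (4, 2)]
```

Because the failed rows are kept with their positions and the error that caused them, the user can inspect them once the run has finished and decide whether a repair is worth writing.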

"We think this could have a major productivity impact for data scientists," Schwarzkopf said. "To not have to run out to get a cup of coffee while waiting for an output, and to not have a program run for an hour only to crash before it's done would be a really big deal."

INFORMATION:

The research was supported by the National Science Foundation (DGE-2039354, IIS-1453171) and the U.S. Air Force (FA8750-19-2-1000).


