PRESS-NEWS.org - Press Release Distribution

Now, every biologist can use machine learning

New automated machine learning platform enables easy, all-in-one analysis, design, and interpretation of biological sequences with minimal coding

2023-06-21
(Press-News.org)

By Lindsay Brownell

(BOSTON) — The amount of data generated by scientists today is massive, thanks to the falling costs of sequencing technology and the increasing amount of available computing power. But parsing through all that data to uncover useful information is like searching for a molecular needle in a haystack. Machine learning (ML) and other artificial intelligence (AI) tools can dramatically speed up the process of data analysis, but most ML tools are difficult for non-ML experts to access and use. Recently, automated machine learning (AutoML) methods have been developed that can automate the design and deployment of ML tools, but they are often very complex and require a facility with ML that few scientists outside of the AI field have.

A group of scientists at the Wyss Institute for Biologically Inspired Engineering at Harvard University and MIT has now filled that unmet need by building a new, comprehensive AutoML platform designed for biologists with little to no ML experience. Their platform, called BioAutoMATED, can use sequences of nucleic acids, peptides, or glycans as input data, and its performance is comparable to other AutoML platforms while requiring minimal user input. The platform is described in a new paper published in Cell Systems and is available to download from GitHub.

“Our tool is for folks who don't have the ability to build their own custom ML models, who find themselves asking questions like, ‘I have this cool data set, will ML even work for it? How do I get it into an ML model? The complexity of ML is what's stopping me from going further with this data set, so how do I overcome that?’” said co-first author Jackie Valeri, a graduate student in the lab of Wyss Core Faculty member Jim Collins, Ph.D. “We wanted to make it easy for biologists and experts in other domains to use the power of ML and AutoML to answer fundamental questions and help uncover biology that means something.”

AutoML for all

Like many great ideas, the seed that would become BioAutoMATED was planted not in the lab, but over lunch. Valeri and co-first authors Luis Soenksen, Ph.D. and Katie Collins were eating together at one of the Wyss Institute’s dining tables when they realized that despite the Institute’s reputation as a world-class destination for biological research, only a handful of the top experts working there were capable of building and training ML models that could greatly benefit their work. 

“We decided that we needed to do something about that, because we wanted the Wyss to be at the forefront of the AI biotech revolution, and we also wanted the development of these tools to be driven by biologists, for biologists,” said Soenksen, a Postdoctoral Fellow at the Wyss Institute who is also a serial entrepreneur in the science and technology space. “Now, everyone agrees that AI is the future, but four years ago when we got this idea, it wasn’t that obvious, particularly for biological research. So, it started as a tool that we wanted to build to serve ourselves and our Wyss colleagues, but now we know that it can serve much more.”

While various AutoML systems have already been developed to simplify the process of generating ML models from datasets, they typically have drawbacks. One is that each AutoML tool is designed to look at only one type of model (e.g., neural networks) when searching for an optimal solution. This limits the resulting model to a narrow set of possibilities when, in reality, a different type of model altogether may perform better. Another issue is that most AutoML tools aren’t designed specifically to take biological sequences as their input data. Some tools have been developed that use language models for analyzing biological sequences, but these lack automation features and are difficult to use.

To build a robust all-in-one AutoML for biology, the team modified three existing AutoML tools that each use a different approach for generating models: AutoKeras, which searches for optimal neural networks; DeepSwarm, which uses swarm-based algorithms to search for convolutional neural networks; and TPOT, which searches non-neural networks using a variety of methods including genetic programming and self-learning. BioAutoMATED then produces standardized output results for all three tools, so that the user can easily compare them and determine which type produces the most useful insights from their data.
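
The idea of standardized, side-by-side output can be illustrated with a minimal sketch: score the best candidate from each search with one shared metric and rank the results. This is not BioAutoMATED's actual interface or output format. The metric (R²), the function names, the labels `search_a`, `search_b`, `search_c`, and the toy data are all assumptions made for illustration.

```python
from statistics import mean

def r_squared(y_true, y_pred):
    """Coefficient of determination, used here as the single shared metric."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def compare_candidates(candidates, x_test, y_test):
    """Score every candidate on the same held-out data and rank best-first.

    `candidates` maps a label to any callable that turns one sequence
    into one numeric prediction.
    """
    scores = {
        name: r_squared(y_test, [predict(x) for x in x_test])
        for name, predict in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def gc_fraction(seq):
    """Fraction of G or C bases in a sequence (toy feature)."""
    return sum(base in "GC" for base in seq) / len(seq)

# Hypothetical stand-ins for the best model found by each search;
# these are NOT real AutoKeras, DeepSwarm, or TPOT models.
candidates = {
    "search_a": lambda s: gc_fraction(s),
    "search_b": lambda s: 0.5,
    "search_c": lambda s: 1.0 - gc_fraction(s),
}

# Toy held-out data in which the label equals the GC fraction.
x_test = ["GGCC", "ATAT", "GCAT", "GGGA"]
y_test = [1.0, 0.0, 0.5, 0.75]

ranking = compare_candidates(candidates, x_test, y_test)
for name, score in ranking:
    print(f"{name}: R^2 = {score:.3f}")
```

Because every candidate is scored on the same held-out data with the same metric, the ranking is comparable across model families, which is the property the standardized output is meant to provide.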

The team built BioAutoMATED to be able to take as inputs DNA, RNA, amino acid, and glycan (sugar molecules found on the surfaces of cells) sequences of any length, type, or biological function. BioAutoMATED automatically pre-processes the input data, then generates models that can predict biological functions from the sequence information alone.
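
One common way to pre-process sequences of different lengths for an ML model is one-hot encoding with zero padding. The sketch below shows that general technique only; the release does not say how BioAutoMATED encodes its inputs, and the function name `one_hot_encode` and its parameters are assumptions. It handles single-letter alphabets such as nucleotides or amino acids; glycans would need a token-based alphabet instead.

```python
def one_hot_encode(sequences, alphabet, pad_to=None):
    """Turn each sequence into a list of one-hot rows, zero-padded to equal length.

    sequences: list of strings over `alphabet` (case-insensitive)
    alphabet:  string of allowed uppercase symbols, e.g. "ACGU"
    pad_to:    target length; defaults to the longest input sequence
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    length = pad_to or max(len(seq) for seq in sequences)
    width = len(alphabet)

    encoded = []
    for seq in sequences:
        if len(seq) > length:
            raise ValueError(f"sequence longer than pad_to={length}: {seq!r}")
        rows = []
        for symbol in seq.upper():
            if symbol not in index:
                raise ValueError(f"symbol {symbol!r} is not in the alphabet")
            row = [0] * width
            row[index[symbol]] = 1
            rows.append(row)
        # Pad short sequences with all-zero rows so every output has the same shape.
        rows.extend([0] * width for _ in range(length - len(seq)))
        encoded.append(rows)
    return encoded

# Two RNA sequences of different lengths become two 4 x 4 matrices.
encoded = one_hot_encode(["ACGU", "GG"], alphabet="ACGU")
for matrix in encoded:
    print(matrix)
```

Padding to a common length gives every sequence the same shape, which is what most model architectures expect as input.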

The platform also has a number of features that help users determine whether they need to gather additional data to improve the quality of the output, learn which features of a sequence the models “paid attention” to most (and thus may be of more biological interest), and design new sequences for future experiments.
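
One generic way to estimate which positions a model relies on is in silico mutagenesis: substitute each position in turn and measure how much the prediction changes. The sketch below illustrates that technique and is not necessarily the method BioAutoMATED uses. The function `position_importance` and the motif-counting `toy_model` are hypothetical stand-ins for a trained model.

```python
def position_importance(model, sequence, alphabet="ACGU"):
    """Mean absolute change in prediction when each position is substituted.

    model:    any callable mapping a sequence string to a number
    sequence: the reference sequence to probe
    alphabet: symbols to substitute at each position
    """
    baseline = model(sequence)
    importance = []
    for i, original in enumerate(sequence):
        deltas = []
        for symbol in alphabet:
            if symbol == original:
                continue  # skip the unmutated case
            mutant = sequence[:i] + symbol + sequence[i + 1:]
            deltas.append(abs(model(mutant) - baseline))
        importance.append(sum(deltas) / len(deltas))
    return importance

def toy_model(seq, motif="GGAG", offset=2):
    """Hypothetical model: fraction of motif symbols matched at a fixed offset."""
    window = seq[offset:offset + len(motif)]
    return sum(a == b for a, b in zip(window, motif)) / len(motif)

scores = position_importance(toy_model, "AUGGAGCU")
for position, score in enumerate(scores):
    print(position, round(score, 3))
```

In this toy case only the four positions covered by the motif receive a nonzero score. With a real model, high-scoring positions would be candidates for closer biological study or for targeted changes when designing new sequences.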

Nucleotides and peptides and glycans, oh my!

To test-drive their new framework, the team first used it to explore how changing the sequence of a stretch of RNA called the ribosome binding site (RBS) affected the efficiency with which a ribosome could bind to the RNA and translate it into protein in E. coli bacteria. They fed their sequence data into BioAutoMATED, which identified a model generated by the DeepSwarm algorithm that could accurately predict translation efficiency. This model performed as well as models created by a professional ML expert, but was generated in just 26.5 minutes and only required ten lines of input code from the user (other models can require more than 750). They also used BioAutoMATED to identify which areas of the sequence seemed to be the most important in determining translation efficiency, and to design new sequences that could be tested experimentally.

They then moved on to trials of feeding peptide and glycan sequence data into BioAutoMATED and using the results to answer specific questions about those sequences. The system generated highly accurate information about which amino acids in a peptide sequence are most important in determining an antibody’s ability to bind to the drug ranibizumab (Lucentis), and also classified different types of glycans into immunogenic and non-immunogenic groups based on their sequences. The team also used it to optimize the sequences of RNA-based toehold switches, informing the design of new toehold switches for experimental testing with minimal input coding from the user. 

“Ultimately, we were able to show that BioAutoMATED helps people 1) recognize patterns in biological data, 2) ask better questions about that data, and 3) answer those questions quickly, all within a single framework - without having to become an ML expert themselves,” said Katie Collins, who is currently a graduate student at the University of Cambridge and worked on the project while an undergraduate at MIT. 

As with any other ML tool, predictions from models built with BioAutoMATED need to be experimentally validated in the lab whenever possible. But the team is hopeful that BioAutoMATED could be further integrated into the ever-growing set of AutoML tools, one day extending its function beyond biological sequences to any sequence-like object, such as fingerprints.

“Machine learning and artificial intelligence tools have been around for a while now, but it’s only with the recent development of user-friendly interfaces that they’ve exploded in popularity, as in the case of ChatGPT,” said Jim Collins, who is also the Termeer Professor of Medical Engineering & Science at MIT. “We hope that BioAutoMATED can enable the next generation of biologists to faster and more easily discover the underpinnings of life.”

“Enabling non-experts to use these platforms is critical for being able to harness ML techniques’ full potential to solve long-standing problems in biology, and beyond. This advance by the Collins team is a major step forward for making AI a key collaborator for biologists and bioengineers,” said Wyss Founding Director Don Ingber, M.D., Ph.D., who is also the Judah Folkman Professor of Vascular Biology at Harvard Medical School and Boston Children’s Hospital, and the Hansjörg Wyss Professor of Bioinspired Engineering at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS).

Additional authors of the paper include George Cai from the Wyss Institute and Harvard Medical School; former Wyss Institute members Pradeep Ramesh, Rani Powers, Nicolaas Angenent-Mari, and Diogo Camacho; and Felix Wong and Timothy Lu from MIT. 

This research was supported by the Defense Threat Reduction Agency (grant HDTRA-12210032), the DARPA SD2 program, the Paul G. Allen Frontiers Group, the Wyss Institute for Biologically Inspired Engineering, an MIT-Takeda Fellowship, CONACyT grant 342369/408970, and an MIT-TATA Center fellowship (2748460).

PRESS CONTACT 

Wyss Institute for Biologically Inspired Engineering at Harvard University 
Lindsay Brownell, lindsay.brownell@wyss.harvard.edu, +1 617-432-8266 

MULTIMEDIA AVAILABLE 

### 

The Wyss Institute for Biologically Inspired Engineering at Harvard University is a research and development engine for disruptive innovation powered by biologically inspired engineering with visionary people at its heart. Our mission is to transform healthcare and the environment by developing ground-breaking technologies that emulate the way Nature builds and accelerate their translation into commercial products through formation of startups and corporate partnerships to bring about positive near-term impact in the world. We accomplish this by breaking down the traditional silos of academia and barriers with industry, enabling our world-leading faculty to collaborate creatively across our focus areas of diagnostics, therapeutics, medtech, and sustainability. 

 

 
