Researchers explore the human immune system by looking at its active components, namely the various genes and cells involved. But there is a broad range of these, and observations necessarily produce vast amounts of data. For the first time, researchers including those from the University of Tokyo have built a software tool that uses artificial intelligence not only to analyze these cells more consistently and quickly, but also to categorize them, with the aim of spotting novel patterns people have not yet seen.
Our immune system is important; it’s impossible to imagine complex life existing without it. This system, comprising different kinds of cells that each play a different role, helps to identify things that threaten our health and takes action to defend us. It is very effective but also far from perfect, hence the existence of diseases such as the notorious acquired immunodeficiency syndrome, or AIDS. And recent earth-shattering events, such as the coronavirus pandemic, highlight the importance of research into this intricate yet powerful system.
One key branch of research in immunology involves identifying immune system components and ascertaining their functions. Doing this through manual observation alone would be impractical because of the time it would take. Some automated tools exist, but they have limitations in accuracy, consistency or flexibility. To address this, a team of researchers led by Professor Tatsuhiko Tsunoda from the University of Tokyo’s Department of Biological Sciences developed a system to boost immunology research.
“We present scHDeepInsight, an AI-based framework for rapidly and consistently identifying immune cells from the RNA of cells. Instead of viewing all cell types as unrelated, the system reflects the natural hierarchy of the immune system,” said lead researcher Shangru Jia. “By turning cellular genetic profiles into images and applying a hierarchy-aware AI, known as a convolutional neural network, or CNN, it can distinguish both broad immune cell types and finer subtypes, and it can do so more consistently than previous attempts. In our benchmark, labeling about 10,000 cells only took a few minutes, whereas manual marker-based annotation can take many hours to days. In comparison with other automated methods, run time is in a similar range. The main advantages are the consistency of predictions across the hierarchy and the improved accuracy gained from incorporating hierarchical labels, rather than raw speed alone.”
There are three main aspects to scHDeepInsight. The first is hierarchical learning: the model mirrors the immune system’s ‘family tree,’ so it can distinguish both broad immune categories and finer subtypes. The second is image-based representation, which transforms gene data into 2D images so the CNN can capture subtle relationships between genes more effectively than it could from tables of raw data. The third is built-in analytics that highlight which genes contribute most to a given classification; these can be checked against known markers to see how they align with past observations.
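As a rough illustration of the image-based representation, the sketch below places each gene's expression value on a small pixel grid. It assumes every gene has already been assigned a 2D coordinate so that related genes sit close together. The coordinates, the expression values and the function name `genes_to_image` are illustrative assumptions and are not taken from the scHDeepInsight package.

```python
from typing import Dict, List, Tuple


def genes_to_image(
    expression: Dict[str, float],
    coords: Dict[str, Tuple[float, float]],
    size: int,
) -> List[List[float]]:
    """Rasterize per-gene expression onto a size x size grid.

    `coords` is an assumed, precomputed 2D position for each gene.
    Genes that land on the same pixel are averaged.
    """
    xs = [xy[0] for xy in coords.values()]
    ys = [xy[1] for xy in coords.values()]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero
    span_y = (max(ys) - min_y) or 1.0

    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for gene, value in expression.items():
        if gene not in coords:
            continue  # genes without a position are skipped
        x, y = coords[gene]
        col = min(size - 1, int((x - min_x) / span_x * size))
        row = min(size - 1, int((y - min_y) / span_y * size))
        sums[row][col] += value
        counts[row][col] += 1

    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(size)]
        for r in range(size)
    ]


# Made-up positions: two T cell markers close together, two B cell markers close together.
coords = {"CD3E": (0.0, 0.0), "CD3D": (0.1, 0.05), "CD19": (1.0, 1.0), "MS4A1": (0.9, 1.0)}
# Made-up expression values for a single cell.
expression = {"CD3E": 4.0, "CD3D": 2.0, "CD19": 0.0, "MS4A1": 1.0}

image = genes_to_image(expression, coords, size=2)
# image == [[3.0, 0.0], [0.0, 0.5]]
```

In this toy cell, the two T cell markers share one pixel and the two B cell markers share another, so the resulting image has a bright corner that an image-recognition model could pick up as a pattern.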
“A spreadsheet of gene numbers misses how genes relate to each other. When we map genes to pixels in an image so that related genes are placed nearby, the result is an image with meaningful structure. Image-recognition models such as CNNs are very good at detecting such patterns, allowing them to capture complex relationships between genes that are hard to learn from raw tables,” said Jia. “The main challenge was balancing performance across both broad cell types and detailed subtypes, especially for rare cell populations. We addressed this by adapting the training process, so the model paid more attention to the categories that were harder to distinguish, reducing the risk of overlooking small but important subtypes.”
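The release does not say exactly how the training process was adapted. One widely used way to make a classifier pay more attention to hard examples is focal loss, which scales the usual cross-entropy loss down for cells the model already classifies confidently. The sketch below uses it purely as an illustration; the choice of focal loss and the `gamma` value are assumptions, not a description of the team's method.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Standard loss for one cell, given the probability assigned to its true label."""
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Cross-entropy scaled by (1 - p_true) ** gamma.

    Confidently classified cells (p_true near 1) contribute very little,
    so hard-to-distinguish cells dominate the training signal.
    gamma = 0 recovers plain cross-entropy.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    return ((1.0 - p_true) ** gamma) * -math.log(p_true)


easy, hard = 0.9, 0.3  # made-up probabilities for an easy and a hard cell

# How much more the hard cell counts than the easy one under each loss.
ce_ratio = cross_entropy(hard) / cross_entropy(easy)
focal_ratio = focal_loss(hard) / focal_loss(easy)
print(f"cross-entropy ratio: {ce_ratio:.1f}, focal ratio: {focal_ratio:.1f}")
```

With these made-up numbers, the hard cell outweighs the easy one far more under focal loss than under plain cross-entropy, which is the general effect the quote describes for rare or easily confused subtypes.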
scHDeepInsight is primarily a research tool rather than a full diagnostic system, partly because it is still new, but mainly because the model is trained only on healthy cells. By applying it to patients’ samples, researchers can see where those samples deviate from a healthy baseline. Such deviations may provide clues for further study, but medical interpretation requires additional validation. So this development will aid fundamental research throughout the field of immunology, but it might take time before descendants of scHDeepInsight find their way into diagnostic systems.
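One simple form such a comparison could take is to look at how the share of each predicted cell type differs between a patient sample and a healthy reference. The sketch below assumes the cell labels have already been predicted; the function names and the label lists are hypothetical, and the release does not state which comparison the researchers actually use.

```python
from collections import Counter
from typing import Dict, List


def label_proportions(labels: List[str]) -> Dict[str, float]:
    """Fraction of cells carrying each predicted label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def proportion_shift(sample: List[str], baseline: List[str]) -> Dict[str, float]:
    """Sample proportion minus baseline proportion, per cell type."""
    sample_p = label_proportions(sample)
    baseline_p = label_proportions(baseline)
    all_labels = sorted(set(sample_p) | set(baseline_p))
    return {
        label: sample_p.get(label, 0.0) - baseline_p.get(label, 0.0)
        for label in all_labels
    }


# Made-up predicted labels for ten cells each.
healthy = ["T cell"] * 6 + ["B cell"] * 3 + ["NK cell"] * 1
patient = ["T cell"] * 3 + ["B cell"] * 3 + ["NK cell"] * 4

shift = proportion_shift(patient, healthy)
for label, delta in shift.items():
    print(f"{label}: {delta:+.2f}")
```

A shift like this only flags where a sample differs from the reference; as the article notes, any medical interpretation would need separate validation.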
“Studies where immune changes are important, including cancer immunology, infections and autoimmune conditions, can benefit from more reliable cell labels. Since our model is trained on healthy immune cells, its immediate value is in providing a consistent healthy baseline for comparison. Disease-related shifts can then be measured relative to this baseline, but clinical interpretation requires validation in each context,” said Jia. “Generalization and validation are key. Clinical samples are diverse, so the model must be tested across varied trials and protocols. Integration into clinical workflows, regulatory requirements for transparency and reproducibility are also essential before routine use. For research use today, scHDeepInsight is already available as a downloadable package — researchers can readily apply it in their own analyses. Broader validation and clinical integration remain goals for the future.”
Work on scHDeepInsight has not finished. The team aims to improve its abilities and features, taking it beyond immune system-related cellular identification and into other biological domains. Ultimately, they hope to validate the system for use as a tool for clinical research by using precise immune system profiling to support studies of disease. And there’s also the matter of its capacity to spot novel cell types.
“For each cell, the model outputs probabilities at both the broad type and subtype levels. If confidence is high for the broad lineage but low for all known subtypes within that lineage, the cell may represent a potentially novel state. In test analyses of brain immune datasets, this probability pattern helped highlight regions that were rich in specialized microglia cells residing in the central nervous system,” said Jia. “AI models reflect their training data. If a reference atlas is incomplete, some rare or context-specific populations can be misclassified or underrepresented. Predictions must therefore be interpreted with caution and validated experimentally. Our design emphasizes transparency to support careful, evidence-based use.”
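The probability pattern described in the quote can be sketched as a simple rule: flag a cell when the model is confident about the broad lineage but not about any subtype within it. The thresholds, the small hierarchy and the function name below are hypothetical choices made for illustration and are not taken from scHDeepInsight.

```python
from typing import Dict, List, Tuple


def flag_potentially_novel(
    broad_probs: Dict[str, float],
    subtype_probs: Dict[str, float],
    hierarchy: Dict[str, List[str]],
    broad_min: float = 0.9,    # assumed threshold for "high" lineage confidence
    subtype_max: float = 0.5,  # assumed threshold for "low" subtype confidence
) -> Tuple[str, bool]:
    """Return the predicted broad lineage and whether the cell looks potentially novel."""
    broad_label = max(broad_probs, key=broad_probs.get)
    broad_conf = broad_probs[broad_label]
    # Best confidence among the known subtypes of the predicted lineage only.
    best_subtype_conf = max(
        (subtype_probs.get(s, 0.0) for s in hierarchy.get(broad_label, [])),
        default=0.0,
    )
    is_candidate = broad_conf >= broad_min and best_subtype_conf < subtype_max
    return broad_label, is_candidate


# Made-up two-level hierarchy and model outputs for two cells.
hierarchy = {"T cell": ["CD4 T", "CD8 T"], "B cell": ["Naive B", "Memory B"]}

ambiguous = flag_potentially_novel(
    {"T cell": 0.97, "B cell": 0.03},
    {"CD4 T": 0.35, "CD8 T": 0.33, "Naive B": 0.02, "Memory B": 0.30},
    hierarchy,
)
typical = flag_potentially_novel(
    {"T cell": 0.97, "B cell": 0.03},
    {"CD4 T": 0.92, "CD8 T": 0.05, "Naive B": 0.02, "Memory B": 0.01},
    hierarchy,
)
# ambiguous == ("T cell", True); typical == ("T cell", False)
```

A flagged cell is only a candidate for follow-up. As Jia notes, such predictions need to be interpreted with caution and validated experimentally.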
###
Journal: Shangru Jia, Artem Lysenko, Keith A Boroevich, Alok Sharma, Tatsuhiko Tsunoda, “scHDeepInsight: A Hierarchical Deep Learning Framework for Precise Immune Cell Annotation in Single-Cell RNA-seq Data”, Briefings in Bioinformatics, DOI: 10.1093/bib/bbaf523
Funding: This work was partly funded by JSPS KAKENHI Grant Numbers 24K15175, 25KJ1104, JP20H03240 and JP25K02261, Japan and JST CREST Grant Number JPMJCR2231.
Useful links:
Graduate School of Science - https://www.s.u-tokyo.ac.jp/en/
Department of Biological Sciences - https://www.bs.s.u-tokyo.ac.jp/english/
Research Contacts:
Professor Tatsuhiko Tsunoda
Department of Biological Sciences, The University of Tokyo,
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, JAPAN
tsunoda@bs.s.u-tokyo.ac.jp
Press contact:
Mr. Rohan Mehra
Public Relations Group, The University of Tokyo,
7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
press-releases.adm@gs.mail.u-tokyo.ac.jp
About The University of Tokyo:
The University of Tokyo is Japan's leading university and one of the world's top research universities. The vast research output of some 6,000 researchers is published in the world's top journals across the arts and sciences. Our vibrant student body of around 15,000 undergraduate and 15,000 graduate students includes over 5,000 international students. Find out more at www.u-tokyo.ac.jp/en/ or follow us on X (formerly Twitter) at @UTokyo_News_en.
END