(Press-News.org)
Medical databases are expanding rapidly. The number of observed values and variable types keeps growing, so individual data files are becoming both longer (more rows) and wider (more columns). For instance, the chartevents file in the MIMIC 3.0 database holds hundreds of millions of records, and the numeric file in the Amsterdam Critical Care Database version 1.0.2 is similarly large. In contrast, the core file of the first wave (Wave 1) of the ELSA dataset contains only 12,099 records but includes up to 4,484 variables.
Querying, cleaning, and processing such large-scale databases present numerous challenges. Common data access methods, including SQL queries, conversion to distributed storage systems, and specialized data platforms, each have drawbacks. SQL is a versatile, standard method, but users must master the language, the learning curve is steep, and complex queries can be inefficient. Distributed storage systems such as Hadoop scale well but carry high deployment and maintenance costs and require specialized technical teams, which puts them out of reach for most clinical researchers working independently.
To address these challenges, this paper proposes a "slicing + dictionary" data processing strategy. It builds on the theory of data decomposition and restructuring and extends that theory to specific scenarios in medical big data. The method reduces the amount of data processed in a single operation, builds an efficient index through preprocessing, and preserves clinical relevance when data are decomposed and reorganized.
The strategy comprises two core components: data slicing and dictionary construction. Data slicing works along several dimensions so that it can adapt to different clinical research needs: clinical dimension slicing (dividing by parameter type, such as vital signs), event dimension slicing (building data views around key clinical events such as surgery), and hybrid dimension slicing (creating composite slices by combining multiple features). Slicing granularity can be adjusted flexibly. The approach draws on sharding in distributed databases while adding clinical semantic considerations. Slicing is classified as vertical (for row-dominated data) or horizontal (for column-heavy data), and both can be applied together to extremely large datasets.
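The slicing idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, field names, and toy records below are all hypothetical, and the sketch only shows the general pattern of splitting a long table by a clinical key and a wide table by groups of columns.

```python
from collections import defaultdict


def slice_long_by_key(records, key):
    """Split a long (row-dominated) table into slices, one per value of `key`."""
    slices = defaultdict(list)
    for row in records:
        slices[row[key]].append(row)
    return dict(slices)


def slice_wide_by_columns(records, id_field, column_groups):
    """Split a wide (column-heavy) table into narrower slices.

    Each slice keeps the identifier plus one named group of columns.
    """
    return {
        name: [
            {id_field: row[id_field], **{col: row[col] for col in cols}}
            for row in records
        ]
        for name, cols in column_groups.items()
    }


# Hypothetical toy data standing in for a long chart-events style file.
chart_rows = [
    {"patient_id": 1, "category": "vital_signs", "item": "heart_rate", "value": 82},
    {"patient_id": 1, "category": "labs", "item": "lactate", "value": 1.4},
    {"patient_id": 2, "category": "vital_signs", "item": "heart_rate", "value": 97},
]

# Hypothetical toy data standing in for a wide survey-style file.
survey_rows = [
    {"person_id": 10, "age": 67, "sex": "F", "bmi": 24.1, "smoker": 0},
    {"person_id": 11, "age": 72, "sex": "M", "bmi": 27.8, "smoker": 1},
]

# Clinical dimension slicing: one slice per parameter category.
long_slices = slice_long_by_key(chart_rows, "category")

# Column-group slicing: each slice holds a small subset of variables.
wide_slices = slice_wide_by_columns(
    survey_rows,
    "person_id",
    {"demographics": ["age", "sex"], "health": ["bmi", "smoker"]},
)
```

In a real setting each slice would presumably be written to its own file during preprocessing, so that a later query loads only the slice it needs rather than the full table.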
Dictionary construction bridges user query intentions and data slices. The dictionary features an encoding-description-location-attribute structure, a multi-level classification system, synonym mapping, and cross-database compatibility. It draws on the experience of the Unified Medical Language System (UMLS) in integrating biomedical terminology, so researchers can retrieve data using standardized clinical terms.
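As a rough illustration of the encoding-description-location-attribute structure with synonym mapping, the following Python sketch resolves a clinical term to the slice that holds its data. The class names, codes, file paths, and synonyms are hypothetical and are not taken from the paper or from UMLS.

```python
from dataclasses import dataclass, field


@dataclass
class DictEntry:
    code: str                 # encoding (hypothetical short code)
    description: str          # human-readable description
    location: str             # slice that holds the data (hypothetical path)
    attributes: dict = field(default_factory=dict)  # e.g. unit, data type
    synonyms: tuple = ()      # alternative terms that map to this entry


class SliceDictionary:
    """Maps codes, descriptions, and synonyms to dictionary entries."""

    def __init__(self, entries):
        self._by_term = {}
        for entry in entries:
            for term in (entry.code, entry.description, *entry.synonyms):
                self._by_term[term.strip().lower()] = entry

    def lookup(self, term):
        """Return the entry for a term (case-insensitive), or None."""
        return self._by_term.get(term.strip().lower())


# Hypothetical entries for illustration only.
entries = [
    DictEntry("HR", "Heart rate", "slices/vital_signs.csv",
              {"unit": "beats/min"}, ("pulse",)),
    DictEntry("LAC", "Lactate", "slices/labs.csv",
              {"unit": "mmol/L"}, ("lactic acid",)),
]

dictionary = SliceDictionary(entries)
hit = dictionary.lookup("Pulse")
print(hit.location if hit else "term not found")
```

A flat term-to-entry map keeps lookups simple. The multi-level classification and cross-database compatibility that the release describes would need additional fields, which this sketch leaves out.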
The core advantages of the method are lower resource requirements, better query efficiency, greater flexibility, and applicability across databases. These advantages directly address the limitations of traditional analysis methods and are supported by relevant research.
The method also has limitations: trade-offs in slice design, update and maintenance costs, weaker support for non-standard queries, and the initial setup effort, although long-term benefits may offset that initial investment. Future research will focus on automated slicing optimization, dictionary self-learning, cloud-based deployment models, and integration with AI/ML workflows, and the method is expected to become more intelligent and easier to use as the technology advances.
In conclusion, the "slicing + dictionary" method offers a new paradigm for accessing large-scale medical databases. It lowers technical barriers and resource requirements, improves efficiency and flexibility, and lets ordinary researchers work with these data on their own. It holds promise for advancing medical research, promoting the democratization of medical big data, and optimizing medical resource allocation. Future work will focus on practical implementation, performance validation, and optimization for different scenarios.
END