(Press-News.org)
Reinforcement learning (RL) is a machine learning technique that trains software by mimicking the trial-and-error learning process of humans. It has demonstrated considerable success in many areas that involve sequential decision-making. However, training RL models with real-world online tests is often undesirable as it can be risky, time-consuming and, importantly, unethical. Thus, using offline datasets that are naturally collected through past operations is becoming increasingly popular for training and evaluating RL and bandit policies.
In practical applications, Off-Policy Evaluation (OPE) is first used to shortlist the most promising candidate policies, called “top-k policies,” using an offline logged dataset; more reliable real-world tests, called online A/B tests, are then used to choose the final policy. To evaluate the effectiveness of different OPE estimators, researchers have primarily focused on metrics such as the mean-squared error (MSE), RankCorr and Regret. However, these metrics focus solely on the accuracy of OPE estimators and fail to evaluate the risk-return tradeoff during online policy deployment. Specifically, MSE and RankCorr cannot distinguish between underestimating near-optimal policies and overestimating poor-performing ones, while Regret considers only the best policy and overlooks the harm that sub-optimal policies can cause to the system during online A/B tests.
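The blind spots described above can be illustrated with a small numerical sketch. The policy values, the two estimators and the function names below are all hypothetical and chosen only for illustration; they do not come from the study. Estimator X underestimates the best policy, estimator Y overestimates the worst one, and both end up with the same MSE.

```python
# Toy illustration (hypothetical numbers, not data from the study).

def mse(true_values, estimates):
    """Mean-squared error between true and estimated policy values."""
    return sum((e - t) ** 2 for t, e in zip(true_values, estimates)) / len(true_values)

def top_k(estimates, k):
    """Indices of the k policies with the highest estimated value."""
    order = sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)
    return order[:k]

def regret_at_k(true_values, estimates, k):
    """Gap between the best policy overall and the best policy among the top-k."""
    chosen = top_k(estimates, k)
    return max(true_values) - max(true_values[i] for i in chosen)

# True values of four candidate policies (index 0 is the worst, index 3 the best).
true_values = [2.0, 5.0, 8.0, 10.0]

# Estimator X underestimates the best policy by 7.
estimates_x = [2.0, 5.0, 8.0, 3.0]
# Estimator Y overestimates the worst policy by 7.
estimates_y = [9.0, 5.0, 8.0, 10.0]

for name, est in [("X", estimates_x), ("Y", estimates_y)]:
    print(name, "MSE:", mse(true_values, est),
          "top-2:", top_k(est, 2),
          "Regret@2:", regret_at_k(true_values, est, 2))
```

Both estimators have an MSE of 12.25, so MSE cannot tell them apart. Regret@2 rates Y as perfect (0.0) because its top-2 contains the best policy, even though Y also sends the worst policy into the online A/B test.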
Addressing this issue, a team of researchers from Japan, led by Professor Kazuhide Nakata from Tokyo Institute of Technology, developed a new evaluation metric for OPE estimators. “Risk-return measurement is crucial in ensuring safety in risk-sensitive scenarios such as finance. Inspired by the design principle of the financial risk assessment metric, Sharpe ratio, we developed SharpeRatio@k, which measures both potential risk and return in top-k policy selection,” explains Prof. Nakata. The study was published in the Proceedings of the ICLR 2024 Conference.
SharpeRatio@k treats the top-k policies selected by an OPE estimator as a policy portfolio, similar to financial portfolios, and measures the risk, return and efficiency of the estimator based on the statistics of the portfolio. In this method, a policy portfolio is considered efficient when it contains policies that greatly improve performance (high return) without including poorly performing policies that negatively affect learning in online A/B tests (low risk). The metric thus rewards high return and penalises high risk, helping to identify the safest and most efficient estimator.
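The portfolio idea can be sketched in a few lines of code. This press release does not give the formula, so the sketch assumes a Sharpe-ratio-style form: the return is the best true value in the top-k portfolio minus the value of the behaviour policy, and the risk is the population standard deviation of the true values in the portfolio. The function names, the behaviour-policy value and the numbers are hypothetical, and this is not the SCOPE-RL implementation.

```python
import statistics

def top_k(estimates, k):
    """Indices of the k policies with the highest estimated value."""
    order = sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)
    return order[:k]

def sharpe_ratio_at_k(true_values, estimates, behavior_value, k):
    """Assumed form: (best@k - behaviour policy value) / std@k."""
    portfolio = [true_values[i] for i in top_k(estimates, k)]
    ret = max(portfolio) - behavior_value   # return over the behaviour policy
    risk = statistics.pstdev(portfolio)     # spread of true values in the portfolio
    if risk == 0:
        # Happens for k = 1 or when all selected policies perform identically.
        raise ValueError("ratio is undefined when the portfolio has zero spread")
    return ret / risk

# Hypothetical true values of four candidate policies and of the behaviour policy.
true_values = [2.0, 5.0, 8.0, 10.0]
behavior_value = 5.0

# Estimator X misses the best policy but selects only reasonable ones.
estimates_x = [2.0, 5.0, 8.0, 3.0]
# Estimator Y finds the best policy but also selects the worst one.
estimates_y = [9.0, 5.0, 8.0, 10.0]

print("X:", sharpe_ratio_at_k(true_values, estimates_x, behavior_value, 2))  # 2.0
print("Y:", sharpe_ratio_at_k(true_values, estimates_y, behavior_value, 2))  # 1.25
```

Under these assumptions, X scores higher than Y even though only Y's portfolio contains the best policy, because Y's portfolio also carries a policy that performs far below the behaviour policy.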
The researchers demonstrated the capabilities of this novel metric through example scenarios and benchmark tests and compared it with existing metrics. Testing revealed that SharpeRatio@k effectively measures the risk, return and overall efficiency of different estimators under varying online evaluation budgets, while existing metrics fail to do so. It also accounts for the overestimation and underestimation of policies. Interestingly, they found that while SharpeRatio@k aligns with existing metrics in some scenarios, a better value of these metrics does not always result in a better SharpeRatio@k value.
Through these benchmarks, the researchers also suggested several future research directions for OPE estimators, including the need to use SharpeRatio@k for efficiency assessment of OPE estimators and the need for new estimators and estimator selection methods that account for risk-return tradeoffs. They also implemented the metric in open-source software (SCOPE-RL, linked below) for quick, accurate and insightful evaluation of OPE.
Highlighting the importance of the study, Prof. Nakata concludes, “Our study shows that SharpeRatio@k can identify the appropriate estimator to use in terms of its efficiency under different behaviour policies, providing useful insight for a more appropriate estimator evaluation and selection in both research and practice.”
Overall, this study enhances policy selection through OPE, paving the way for improved reinforcement learning.
###
Related links:
SCOPE-RL document
SCOPE-RL open link
###
About Tokyo Institute of Technology
Tokyo Tech stands at the forefront of research and higher education as the leading university for science and technology in Japan. Tokyo Tech researchers excel in fields ranging from materials science to biology, computer science, and physics. Founded in 1881, Tokyo Tech hosts over 10,000 undergraduate and graduate students per year, who develop into scientific leaders and some of the most sought-after engineers in industry. Embodying the Japanese philosophy of “monotsukuri,” meaning “technical ingenuity and innovation,” the Tokyo Tech community strives to contribute to society through high-impact research.
https://www.titech.ac.jp/english/
END