Scientists studied optimal multi-impulse linear rendezvous via reinforcement learning

2023-09-12

(Press-News.org)

Multi-impulse orbital rendezvous is a classical spacecraft trajectory optimization problem that has been studied extensively. Numerical optimization methods, deep learning (DL) methods, and reinforcement learning (RL) methods have all been proposed. However, numerical optimization methods require long computation times and are usually not applicable to rendezvous with many impulses under impulse magnitude constraints. Among machine learning (ML) methods, DL requires large amounts of data, while RL suffers from low training efficiency. Nevertheless, ML gives more accurate predictions over short horizons, whereas RL performs better over longer horizons. By combining the advantages of both, a policy can be pretrained on data and then refined through RL. In a research paper recently published in Space: Science & Technology, researchers from Harbin Institute of Technology proposed a reinforcement learning-based approach to designing multi-impulse rendezvous trajectories in linear relative motion, enabling rapid trajectory generation through offline training and on-board deployment.

First, the authors present the mathematical model of the multi-impulse linear rendezvous problem, the RL algorithms used, and the RL-based approach to rendezvous design. In the multi-impulse linear rendezvous problem, the relative motion is typically described by the linearized two-body relative motion equations. Based on these linear equations, the multi-impulse rendezvous problem is solved as a constrained optimization problem whose variables are the impulse vectors and the impulse times. The objective function is the total velocity increment for fuel-optimal rendezvous and the rendezvous time for time-optimal rendezvous. Impulse magnitude constraints, time constraints, and a terminal distance constraint are also formulated. In RL, the goal is to train a policy π(a|s) that maps states s to actions a so as to maximize the reward signal ℛ(s,a) received by an agent interacting with its environment. To this end, the multi-impulse rendezvous problem is treated as a fully observable Markov decision process (MDP). An actor-critic (AC) architecture is adopted for its state-of-the-art performance on a wide variety of complex control problems. Moreover, the advantage-weighted actor-critic (AWAC) algorithm is used to accelerate RL with a relatively small expert dataset. AWAC approaches expert performance faster than soft actor-critic (SAC) for every dataset size tested, and it reaches better performance than imitation learning (IL) with smaller expert datasets. In summary, assuming that the spacecraft's maneuvers depend only on its current state (the Markov property), the rendezvous design is formulated as an RL problem. The state vector s reflects the state of the spacecraft and the relevant problem variables, and the policy network π(a|s) outputs an action based on that state. The action vector a contains the impulse and the coasting period. The reward ℛ, i.e., the instantaneous reward of the MDP at each timestep, is then defined for the fuel-optimal rendezvous problem. In addition, to reduce the terminal distance further, the RL approach is combined with a semi-analytical method. The overall algorithmic scheme is shown in Fig. 2.
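To make the MDP formulation more concrete, here is a minimal Python sketch of a rendezvous environment in which one step applies an impulse and then coasts. It rests on stated assumptions rather than on the paper's model: the Clohessy-Wiltshire (CW) equations, which assume a circular target orbit, stand in for the elliptical linear relative-motion equations used in the study, and the class `RendezvousEnv`, its observation layout, and its reward (negative impulse magnitude plus a hypothetical terminal-distance penalty weighted by `dist_weight`) are illustrative. Only the 6-impulse limit and the 5 m/s magnitude limit come from the test scenarios reported for this study.

```python
import numpy as np


def cw_stm(n: float, t: float) -> np.ndarray:
    """Clohessy-Wiltshire state transition matrix for [x, y, z, vx, vy, vz].

    x is radial, y along-track, z cross-track (m, m/s); n is the target's
    mean motion in rad/s. Assumption: valid only for a circular target orbit.
    """
    c, s = np.cos(n * t), np.sin(n * t)
    return np.array([
        [4 - 3 * c, 0, 0, s / n, 2 * (1 - c) / n, 0],
        [6 * (s - n * t), 1, 0, -2 * (1 - c) / n, (4 * s - 3 * n * t) / n, 0],
        [0, 0, c, 0, 0, s / n],
        [3 * n * s, 0, 0, c, 2 * s, 0],
        [-6 * n * (1 - c), 0, 0, -2 * s, 4 * c - 3, 0],
        [0, 0, -n * s, 0, 0, c],
    ])


class RendezvousEnv:
    """Hypothetical MDP: each step = one impulse followed by one coast."""

    def __init__(self, n, x0, max_impulses=6, dv_max=5.0, dist_weight=1e-3):
        self.n = n
        self.x0 = np.asarray(x0, dtype=float)
        self.max_impulses = max_impulses
        self.dv_max = dv_max            # impulse magnitude limit, m/s
        self.dist_weight = dist_weight  # hypothetical terminal-penalty weight
        self.reset()

    def reset(self):
        self.x = self.x0.copy()
        self.k = 0  # impulses applied so far
        return self._observe()

    def _observe(self):
        # Relative state plus the fraction of impulses still available.
        remaining = (self.max_impulses - self.k) / self.max_impulses
        return np.append(self.x, remaining)

    def step(self, dv, coast):
        """Action = impulse vector dv (m/s) plus coasting period (s)."""
        dv = np.asarray(dv, dtype=float)
        mag = float(np.linalg.norm(dv))
        if mag > self.dv_max:
            dv = dv * (self.dv_max / mag)  # scale down to respect the limit
            mag = self.dv_max
        self.x[3:] += dv                   # impulsive velocity change
        coast = max(float(coast), 0.0)     # coasting time cannot be negative
        self.x = cw_stm(self.n, coast) @ self.x
        self.k += 1
        done = self.k >= self.max_impulses
        reward = -mag                      # fuel cost of this impulse
        if done:
            # Hypothetical penalty on the remaining relative distance.
            reward -= self.dist_weight * float(np.linalg.norm(self.x[:3]))
        return self._observe(), reward, done
```

A time-optimal variant would instead penalize the coasting durations. Clipping the impulse inside `step` is just one simple way to respect the magnitude constraint; the paper's own constraint handling may differ.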

Then, the authors evaluate the proposed method on rendezvous missions in four scenarios.

For fuel-optimal rendezvous from random initial states, the eccentricity of the target orbit is drawn from a uniform distribution on [0.65, 0.75], and the perigee altitude of the target is set to 500 km. The maximum number of maneuvers is 6, and the impulse magnitude limit is 5 m/s. An expert dataset of 1,000 trajectories generated by differential evolution (DE) is used to speed up RL training, and the results show that the algorithm converges in fewer timesteps with the expert dataset. One hundred experiments from random initial states with different maximum distances are performed to evaluate the RL-based approach. Compared with the DE algorithm, the RL-based approach uses about 10% more fuel, but its computation time is less than 0.1% of that of DE.

For fuel-optimal rendezvous from a fixed initial state, a special case is used in which both the target and the chaser move near a geostationary transfer orbit. Two cases are considered: (1) a 6-impulse rendezvous, where the RL policy's solution is compared with that of DE, and (2) a 20-impulse rendezvous, where the control variables generated by the RL policy are used as initial values for sequential quadratic programming (SQP) for further optimization. Figure 8 illustrates the magnitude of each impulse in the two solutions. The last 3 impulses of the SQP solution are almost zero, and the fuel-optimal rendezvous is achieved with 16 impulses. In comparison, the impulse magnitudes in the RL-based solution vary more uniformly.

For time-optimal rendezvous from random initial states, the scenario parameters are the same as in the fuel-optimal case. The RL-based approach obtains a feasible solution in only 0.02% of the computation time of numerical optimization, with a reward only 15% lower.

For time-optimal rendezvous from a fixed initial state, the 6-impulse and 20-impulse rendezvous problems are again used for evaluation. Table 5 lists the coasting time and velocity increment of each maneuver for both approaches. Because the policy network tends to learn general laws, the control variables in the RL-based solution vary more uniformly.
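The expert dataset of DE trajectories enters training through AWAC's actor update, which fits the policy to stored actions by weighted maximum likelihood, weighting each action by exp(A(s,a)/λ) so that actions with a higher estimated advantage count more. The NumPy sketch below computes only that loss. It assumes a diagonal Gaussian policy over a four-dimensional action (a 3-D impulse plus a coasting period) and takes critic estimates of Q(s,a) and V(s) as given; the function names and the temperature `lam` are illustrative, not taken from the paper.

```python
import numpy as np


def gaussian_log_prob(actions, mean, log_std):
    """Log-density of a diagonal Gaussian policy, summed over action dims."""
    z = (actions - mean) / np.exp(log_std)
    return np.sum(-0.5 * z**2 - log_std - 0.5 * np.log(2.0 * np.pi), axis=-1)


def awac_actor_loss(actions, mean, log_std, q_values, v_values, lam=1.0):
    """Advantage-weighted actor loss on a batch of (e.g. expert) transitions.

    actions:       (B, A) actions stored in the batch
    mean, log_std: (B, A) Gaussian policy outputs for the batch states
    q_values:      (B,) critic estimates of Q(s, a) for the stored actions
    v_values:      (B,) estimates of V(s), e.g. Q(s, a') with a' ~ policy
    """
    advantages = q_values - v_values
    # exp(A / lambda): in an autodiff framework these are treated as constants.
    weights = np.exp(advantages / lam)
    log_probs = gaussian_log_prob(actions, mean, log_std)
    # Minimizing this maximizes the weighted log-likelihood of stored actions.
    return -np.mean(weights * log_probs), weights
```

Because the weights are exponential in the advantage, practical implementations often clip or normalize them for numerical stability.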

Finally, the authors draw their conclusions. In this study, separate reward functions are designed for the fuel-optimal and time-optimal objectives. The numerical results show that the trained agents can design optimal multi-impulse rendezvous maneuvers for different objectives from random initial states. The proposed approach is effective for arbitrary multi-impulse rendezvous near elliptical orbits, especially when the number of impulses is large. It can quickly produce feasible solutions that are only slightly worse than those of global optimization methods, making it attractive in time-sensitive situations. The rendezvous trajectory generated by the trained agent can also serve as the initial value for further optimization. Because of its short computation time, the offline-trained agent can be deployed on board spacecraft.

END

