High-temperature fusion plasma experiments conducted in the Large Helical Device (LHD) of the National Institute for Fusion Science (NIFS) set a new world record for the amount of data acquired, 0.92 terabytes (TB) per experiment, in February 2022, using a full range of state-of-the-art plasma diagnostic devices*1. The International Thermonuclear Experimental Reactor (ITER), which is currently under construction in France through the international collaboration of seven parties, is expected to generate approximately 1 TB of data per experiment in ten years, and LHD is currently the only experiment in the world that produces data on a scale comparable to that expected from ITER.
The promotion of “Open Science,” in which large-scale research data assets are utilized and shared across society, was adopted in a joint statement at the G7 meeting held in Sendai, Japan in 2023. NIFS started full-fledged efforts toward Open Science by establishing the “Open Access Policy” in February 2022 and the “Research Data Policy” in October 2022. Since 2023, all data obtained from LHD experiments have been open to the public immediately after acquisition and analysis are completed. The source code of all data analysis programs is also openly available.
In Open Science, the FAIR Principles are regarded as an important indicator*2. NIFS considers fulfilling the FAIR requirements for diagnostic raw and analyzed data, i.e., the valuable digital assets of the LHD project, to be an important goal of the LHD Academic Research Platform and continues its efforts toward it.
Although LHD experiment data has become one of the world’s largest data assets and is widely used by domestic and international fusion plasma researchers, it has seldom been used for other purposes, such as in different research fields or in industry. This may be due to (1) the difficulty of finding the data of interest among a wide variety of experiment data, and (2) the enormous number and huge size of the individual data, which make it difficult to start data analysis easily and quickly.
To solve these problems, two things are needed: (1)’ a comprehensive, bird’s-eye view of the huge amount of experiment data, and (2)’ a data-analysis environment that can be prepared easily so that analyses can start instantly, with computing resources that can be increased or decreased as necessary.
Research Achievements
LHD experiment data is a large-scale digital asset. To promote its use by researchers in different fields, industry, and the general public, a computing environment that anyone can use easily is necessary. “Cloud services” technology offers a promising way to provide one. Cloud services provide an environment in which data analyses can be started immediately, enabling researchers, industry, and even citizen users to make effective use of the data. NIFS has now been selected for the “Amazon Web Services (AWS) Open Data Sponsorship Program*3” and has completed the transfer of about 2 petabytes of LHD experiment data*4 to AWS’s cloud storage, Amazon Simple Storage Service (Amazon S3)*3, making the data freely accessible to anyone on the Internet (Figure 1).
A computing environment capable of running a suite of data analysis programs is also indispensable for the utilization of vast open data. LHD data replicated entirely on AWS’s cloud storage can now be accessed directly from AWS cloud computers for high-performance, massive data analyses at any time. It is also a major advantage for the promotion of Open Science that Amazon S3 enables us to provide a reliable, nonstop data service, independent of the NIFS system and network capabilities.
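As a rough illustration of what direct access to open data on Amazon S3 can look like, the following Python sketch lists objects in a public bucket over plain HTTPS using the standard S3 ListObjectsV2 REST request, without credentials. The bucket name, region, and prefix are placeholders invented for this example and do not come from this release; the actual location of the LHD data should be taken from the official dataset information. The sketch also assumes that the bucket permits anonymous listing.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical placeholders: NOT the real LHD bucket or region.
BUCKET = "example-open-data-bucket"
REGION = "ap-northeast-1"

# XML namespace used by S3 REST API responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket, region, prefix="", max_keys=10):
    """Build a ListObjectsV2 URL in virtual-hosted style."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def list_objects(bucket, region, prefix="", max_keys=10):
    """Anonymous listing; succeeds only if the bucket allows public listing."""
    url = build_list_url(bucket, region, prefix, max_keys)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_listing(response.read())
```

Calling `list_objects(BUCKET, REGION, prefix="some/prefix/")` with real values would return the first few object keys and their sizes. In practice an AWS SDK or the AWS command-line tools would usually be more convenient; the standard library is used here only to keep the sketch self-contained.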
Unlike other research fields, such as global environmental, meteorological, and astronomical observations, where international research data sharing has been taking place for several decades, there has been little international data collaboration or sharing in fusion energy research and development, especially in the experimental field. This is because experimental results often differ from one device to another, making it difficult to simply compare and evaluate them. The LHD open data represents the world’s first major step towards interdisciplinarity and universalization of fusion energy research.
The results will be presented orally at the 14th IAEA Technical Meeting on Control Systems, Data Acquisition, Data Management and Remote Participation in Fusion Research to be held in São Paulo, Brazil, July 15-19, 2024.
Significance of Achievements and Future Developments
The LHD diagnostic raw and analyzed database, which is the world’s largest accumulation of fusion energy research data, is a very valuable digital research asset. By making all of it available as open data on the AWS cloud, it is expected that the database will not only be used for research purposes within and outside fusion research, but will also attract participation from the general public and new entrants from other countries and industries that wish to start new fusion energy research and development. The barriers to first entry are expected to be lowered significantly. In addition, the database is expected to become a major digital platform for the exchange of research knowledge and for human exchange and development, not only in Japan but also elsewhere in the world. For this purpose, NIFS intensively promotes this large data repository under the name of the “Plasma and Fusion Cloud*5”, using the NII RDC, the research data cloud platform of the National Institute of Informatics.
To further advance Open Science principles, we have just started assigning a global persistent identifier, the DOI (Digital Object Identifier)*6, to about 40 million LHD data entities to improve their findability and accessibility. Registration may take three to four years to complete because of the extremely large number of data entities. Once all the data is registered, however, it is expected to be the largest collection of publicly available research data DOIs in the world, exceeding the current world leaders such as Geoscience Australia (approximately 7 million DOIs), CERN (approx. 6.7 million), and the Interdisciplinary Earth Data Alliance (IEDA) in the USA (approx. 5 million).
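To give a sense of the scale involved, the short Python sketch below estimates the average registration rate implied by these figures. It assumes, purely for illustration, that the roughly 40 million DOIs are registered at a uniform pace every day of a 365-day year; the actual registration schedule is not described in this release.

```python
TOTAL_DOIS = 40_000_000   # "about 40 million" data entities
DAYS_PER_YEAR = 365       # simplification: ignores leap days


def required_daily_rate(total, years):
    """Average registrations per day needed to finish in the given years."""
    return total / (years * DAYS_PER_YEAR)


for years in (3, 4):
    rate = required_daily_rate(TOTAL_DOIS, years)
    print(f"{years} years: about {rate:,.0f} DOIs per day")
```

Under these assumptions the required pace is on the order of 27,000 to 37,000 registrations per day, which helps explain why the work is expected to take several years.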
Comments from Amazon Web Services Japan
The following comment is given by Ushio Usami, the country leader for AWS worldwide public sector in Japan.
“We are very pleased to be able to contribute to the utilization of fusion energy in collaboration with the National Institute for Fusion Science. I hope that this open data will be utilized not only in the academic research field in Japan, but also by industries around the world to promote technological innovation in various scientific fields.”
For more information, refer to the following article on the AWS blog.
https://aws.amazon.com/jp/blogs/news/25years-huge-fusion-experiment-data-fully-open-on-s3-via-odp-2024/
Comments from National Institute of Informatics
The following comment is given by Dr. Keiichi Nakano, Chief Researcher for Cyber Science Infrastructure of Research Center for Open Science and Data Platform, National Institute of Informatics (NII), and also the Program Manager of the “Developing a Research Data Ecosystem for the Promotion of Data-Driven Science”.
“In this achievement, the research data infrastructure (NII Research Data Cloud: NII RDC) that we have built was used as a function for utilizing huge amounts of data. We are delighted that the NII RDC was able to contribute to the practical implementation of Open Science, which will have a global impact. We hope to continue to deepen our collaboration with NIFS and contribute to the development of global Open Science through this research data.”
[Glossary]
*1 LHD experiment …
A physics experiment in which fully superconducting coils generate a helical magnetic field to confine high-temperature plasma. Because the magnetic field can be maintained steadily, the device can conduct short-pulse experiments very frequently and can also sustain experiments over long periods. Acquired data tends to grow in size. (cf. Appendix fig.)
*2 FAIR Principles ...
A set of principles to make research data "Findable," "Accessible," "Interoperable," and "Reusable." It is a common international indicator to show how far Open Science requirements are met. (cf. https://doi.org/10.18908/a.2019112601 )
*3 Amazon Web Services (AWS) Open Data Sponsorship Program ...
An AWS program providing free storage space on Amazon S3 for open data of any scientific field. Amazon S3 is an AWS cloud storage service, an object storage service that provides high scalability, data availability, security and performance.
*4 Terabyte, Petabyte ...
Units of data size. A petabyte is 10 to the 15th power (1 × 10^15) bytes, where one byte corresponds to one alphanumeric character.
1 PB (petabyte) = 1,000 TB (terabyte) = 1,000,000 GB (gigabyte).
Storing 2 petabytes of data would require approximately 40,000 Blu-ray discs of 50 GB each.
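The disc count above follows from simple decimal-unit arithmetic, sketched here in Python. The variable names are illustrative only.

```python
GB = 10**9    # gigabyte in bytes (decimal units)
TB = 10**12   # terabyte in bytes
PB = 10**15   # petabyte in bytes

archive_bytes = 2 * PB          # about 2 PB of LHD experiment data
disc_capacity_bytes = 50 * GB   # one 50 GB Blu-ray disc

# Integer division is exact here because 2 PB is a whole multiple of 50 GB.
discs_needed = archive_bytes // disc_capacity_bytes
print(discs_needed)  # 40000
```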
*5 Plasma and Fusion Cloud ...
A next-generation interdisciplinary research data ecosystem proposed and promoted by NIFS. It aims to integrate experimental data, theoretical model calculations, high-performance supercomputers, and computational programs into a single “digital system”, based on the Research Data Cloud (RDC) framework and infrastructure provided by the National Institute of Informatics (NII).
*6 DOI (Digital Object Identifier) ...
A persistent digital identifier attached to research papers and data objects. DOIs are used to identify, search for, and cite them, similar to the ISBN for books and the ISSN for journals and magazines.
Research Support:
This research is supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) under the contract with the National Institute of Informatics (NII), which is implementing the “Research Data Ecosystem Development Project to Promote the Use of AIs” and has been selected as use case creation project No.2023-6, “Construction and Development of ‘Plasma and Fusion Cloud’, an Open Utilization Platform for Fusion Research Data”. The “Plasma and Fusion Cloud” is being constructed using the framework and various services of the Research Data Cloud (RDC) infrastructure, promoted by the National Institute of Informatics (NII).
END