Fujitsu and research team debut Fugaku-LLM, a large language model trained on the supercomputer “Fugaku”
- A large language model with enhanced Japanese language ability, developed using Japanese supercomputing technology
- Distributed parallel training that maximizes the performance of the supercomputer “Fugaku”
- Enhanced Japanese language ability for use in research and business
- Commercial use is permitted, which will lead to innovative research and business applications such as AI for Science
A team of researchers in Japan has released Fugaku-LLM, a large language model [1] with enhanced Japanese language capability, trained using the RIKEN supercomputer Fugaku. The team is led by Professor Rio Yokota of Tokyo Institute of Technology, Associate Professor Keisuke Sakaguchi of Tohoku University, Koichi Shirahata of Fujitsu Limited, Team Leader Mohamed Wahib of RIKEN, Associate Professor Koji Nishiguchi of Nagoya University, Shota Sasaki of CyberAgent, Inc., and Noriyuki Kojima of Kotoba Technologies Inc.
To train large language models on Fugaku, the researchers developed distributed training methods, which included porting the deep learning framework Megatron-DeepSpeed to Fugaku to optimize the performance of Transformers there. They accelerated the dense matrix multiplication library used by Transformers, optimized communication performance by combining three types of parallelization techniques, and accelerated the collective communication library on the Tofu interconnect D.
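As an illustration only (not the team's published configuration), the sketch below shows how a fixed node count might be divided across the three parallelism axes, data, tensor, and pipeline parallelism, that frameworks such as Megatron-DeepSpeed combine; all degrees shown are hypothetical.

```python
# Illustrative sketch only: splitting a node count across the three parallelism
# axes (data, tensor, pipeline) that frameworks such as Megatron-DeepSpeed combine.
# The degrees below are hypothetical and are not Fugaku-LLM's actual training setup.

def data_parallel_degree(total_nodes: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data-parallel degree implied by the tensor and pipeline degrees."""
    model_parallel = tensor_parallel * pipeline_parallel
    if total_nodes % model_parallel != 0:
        raise ValueError("total_nodes must be divisible by tensor_parallel * pipeline_parallel")
    return total_nodes // model_parallel


if __name__ == "__main__":
    # Hypothetical split: 13,824 nodes, tensor-parallel degree 2, pipeline-parallel degree 4
    replicas = data_parallel_degree(13_824, tensor_parallel=2, pipeline_parallel=4)
    print(f"data-parallel replicas: {replicas}")  # -> 1728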
Fugaku-LLM has 13 billion parameters [2], larger than the 7-billion-parameter models that have been widely developed in Japan. It shows enhanced Japanese capability, with an average score of 5.5 on the Japanese MT-Bench [3], the highest performance among open models trained on original data produced in Japan. In particular, it reached a remarkably high score of 9.18 on humanities and social sciences tasks.
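For a rough sense of what a figure like 13 billion parameters corresponds to, the sketch below applies the common estimate of about 12 × hidden_size² weights per decoder layer plus an embedding table; the hidden size, layer count, and vocabulary size used here are placeholders, not Fugaku-LLM's published configuration.

```python
# Rough, illustrative arithmetic for how decoder-only Transformer parameter counts
# scale: roughly 12 * hidden^2 weights per layer (attention ~4h^2, feed-forward ~8h^2)
# plus the token embedding table. All sizes below are placeholders, not Fugaku-LLM's
# published architecture.

def approx_params(hidden: int, layers: int, vocab: int) -> int:
    per_layer = 12 * hidden * hidden   # attention + feed-forward weights per layer
    embeddings = vocab * hidden        # token embedding table
    return layers * per_layer + embeddings


if __name__ == "__main__":
    total = approx_params(hidden=5120, layers=40, vocab=50_000)
    print(f"~{total / 1e9:.1f}B parameters")  # -> ~12.8B with these placeholder sizes
```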
Fugaku-LLM was trained on proprietary Japanese data collected by CyberAgent, along with English and other data. The source code of Fugaku-LLM is available on GitHub [4] and the model is available on Hugging Face [5]. Fugaku-LLM can be used for research and commercial purposes as long as users comply with the license.
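A minimal usage sketch with the Hugging Face transformers library is shown below; the repository ID is an assumption based on the project name, so check the model card on Hugging Face for the exact repository name and license terms before use.

```python
# Minimal sketch: loading the published checkpoint with Hugging Face transformers.
# The repository ID is an assumption based on the project name, not confirmed here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Fugaku-LLM/Fugaku-LLM-13B"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "スーパーコンピュータ「富岳」とは"  # "What is the supercomputer Fugaku?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```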
In the future, as more researchers and engineers participate in improving the model and its applications, training efficiency will improve further, leading to next-generation innovative research and business applications such as linking scientific simulation with generative AI and running social simulations of virtual communities with thousands of AI agents.
Acknowledgement
This research was supported by the Fugaku policy-supporting proposal "Development of Distributed Parallel Training for Large Language Models Using Fugaku" (proposal number: hp230254).
[1] Large language model: A model of the probability with which text appears, capable of predicting the text (response) that follows a given context (query).
[2] Parameter: A measure of the size of a neural network. The more parameters, the higher the potential performance of the model, but the more data is required for training.
[3] Japanese MT-Bench: A benchmark test provided by Stability AI.
[4] GitHub: https://github.com/ - A platform used to publish open-source software.
[5] Hugging Face: https://huggingface.co/ - A platform used to publish AI models and datasets.
[6] ChatGPT: A large language model developed by OpenAI that brought about major social change, surpassing 100 million users within about two months of its release.
[7] GPU: Originally developed as an accelerator for graphics; now also widely used to accelerate deep learning.
[8] Continual learning: A method for performing additional training on a large language model that has already been trained, used to adapt language models to different languages or domains.