At the end of June, Yandex released YaLM 100B, a neural network with 100 billion parameters, to the public. It is the largest GPT-like neural network in the public domain. The announcement tells how the model was trained and shows the best examples of what the neural network can do. But is it any good in practice, and can it be run at home? The article is silent about this; moreover, it is not easy to run and test, since about 200 GB of GPU RAM are required. This comment on Habré
describes the situation most accurately.
Allegedly, everyone at Yandex is so smart, yet they did not even post a proper how-to. There is no API for the large model, and no ready-made stripped-down medium or small model for ordinary people (say, for Google Colab). No example is given of how to set the model up or how to generate text; the article just notes a couple of nuances for nerds and that's it. It would have been enough to look closely at how the bank with the letter "C" did it and do the same. I got the impression that this model is just one of the failed experiments that they felt sorry to throw in the trash, so it was posted as open source to show what great models Yandex creates, and it's open source to boot!
The Internet is full of questions about how to run YaLM, or even just try it online, but there are no answers. I was among the users asking those questions, and I set about figuring it out, since I really needed a way to generate texts for financial robots: so that they could not only predict values but also comment on them in text, based on financial reports. In essence, it is the same thing financial analysts do, only with artificial intelligence. There are two ways to run YaLM.
Rent a server in the cloud with 200+ GB of GPU RAM, or modify the code and run it with DeepSpeed ZeRO offload (where the GPU sequentially processes part of the neural network while the rest is stored in CPU RAM or on NVMe). The first is very expensive: about 2500 rubles per hour, or 1.7 million per month. The second is uncharted territory, because the code for it is not provided in the repository, only
hints in the repository's issues, though following them is not difficult. Let's start simple.
- YaLM 100B Launch Instructions
- 1. We rent 200 GB of GPU RAM, for example here.
- 2. Clone the repository with YaLM
- 3. Download checkpoints (basic model training information)
- 4. Install nvidia-docker2
- 5. Building a container for YaLM
- 6. Prepare content
- 6.1 Checkpoints
- 6.2 Video cards
- 7. Run the docker container
- 8. Run the example from YaLM 100B
- 9. Results of the work
- How to run YaLM without 200 GB of GPU RAM?
- Summing up
YaLM 100B Launch Instructions
1. We rent 200 GB of GPU RAM, for example here.
You need at least 200 GB of total video memory. 8 × 40 = 320 GB, so only this configuration fits. Less than 200 is impossible; more is fine. The arrow indicates the CPU RAM; we do not look at it, it can be any size.
We specify a disk of about 300 GB, to have a reserve, and preferably a fast one, because tens of gigabytes of data will be transferred to and from it.
When creating the server, select Ubuntu ML (Machine Learning) as the source image. This is mandatory so that the video cards come pre-configured and nothing needs to be installed additionally.
After the server is activated (it may take 5-10 minutes), connect to it via ssh, or directly in the web console on the server page, and run the command
nvidia-smi
The result should be a table with the video cards, the driver version and the CUDA version, approximately like this.
The header shows the driver and CUDA versions; the left column lists the device numbers, and the center column shows each device's memory size. If you do not see this information, you built the server from the wrong image: Ubuntu ML (Machine Learning) is required, as described above.
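Whether a given configuration reaches the required total can be sanity-checked with simple arithmetic. A minimal sketch, where the eight hard-coded 40960 MiB values stand in for the per-device memory sizes that nvidia-smi reports for an 8 × A100 40 GB server:

```shell
# Sketch: sum per-GPU memory (in MiB) and check it reaches the 200 GB
# that YaLM 100B needs. The eight values below are an example for an
# 8 x A100 40 GB server.
total_mib=$(printf '%s\n' 40960 40960 40960 40960 40960 40960 40960 40960 \
  | awk '{s += $1} END {print s}')
total_gb=$((total_mib / 1024))
echo "Total GPU memory: ${total_gb} GB"
if [ "$total_gb" -ge 200 ]; then echo "enough for YaLM 100B"; fi
```

On a live server you can replace the hard-coded numbers with the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`.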
2. Clone the repository with YaLM
sudo git clone https://github.com/yandex/YaLM-100B/ yalm
Clone into your home folder so you don't have to edit the docker config afterwards. If you cloned somewhere else, then
go here and add the path to where you cloned.
3. Download checkpoints (basic model training information)
sudo chmod +x ./download/download.sh
sudo bash ./download/download.sh
This will take about an hour. So as not to waste time, open a new ssh connection and start building the docker container in parallel.
4. Install nvidia-docker2
Regular docker is not suitable;
nvidia-docker2 is needed.
5. Building a container for YaLM
sudo chmod +x ./docker/*
sudo bash ./docker/build.sh
This also takes about an hour.
Life hack: you can download the checkpoints, install docker and build the container on a cheap server with a single video card. It takes the same time, so you can save a little. After the build finishes on the cheap server, delete it and create a combat server using the cheap server's disk. Then you will not overpay for the time spent waiting for the build and downloading the checkpoints.
6. Prepare content
6.1 Checkpoints
After the checkpoints have finished downloading, you need to plug them into the configs. There are two ways: either fix the parameters or move the checkpoints. Everywhere the checkpoints are expected to be in the project's main directory, so what has been downloaded must be moved up out of the download folder. From the yalm folder, execute
mv ./download/yalm100b_checkpoint ./
Or change the paths to the files in the example scripts.
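If you prefer to leave the checkpoints under the download folder, the path rewrite can be scripted. A sketch, demonstrated on a throwaway file rather than the real scripts in ./examples (the CHECKPOINT_PATH variable name inside the demo file is invented, not taken from the repository):

```shell
# Sketch: rewrite checkpoint paths in place instead of moving the files.
# Demonstrated on a throwaway file; on the server you would run the same
# sed over the scripts in ./examples. CHECKPOINT_PATH is an invented name.
mkdir -p demo
printf 'CHECKPOINT_PATH=./yalm100b_checkpoint\n' > demo/run.sh
sed -i 's|\./yalm100b_checkpoint|./download/yalm100b_checkpoint|g' demo/run.sh
cat demo/run.sh
```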
6.2 Video cards
We check that the video cards are set correctly. If you have eight video cards, nothing needs to be changed. If the number is different, we change these lines.
The second line holds the numbers of the devices used (you can look them up in nvidia-smi, which you have already run); the fourth holds their count.
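The lines themselves are not reproduced here, so as a hedged illustration only (the variable names below are assumptions, not necessarily the ones in the YaLM scripts), a four-card setup might look like this:

```shell
# Illustrative only: variable names are assumptions, check the actual
# YaLM example scripts before editing.
CUDA_VISIBLE_DEVICES=0,1,2,3   # the device numbers you actually have
NUM_GPUS=4                     # their count; must match the list above
# Quick consistency check between the two settings:
n=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
[ "$n" -eq "$NUM_GPUS" ] && echo "device list and count agree"
```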
7. Run the docker container
Being in the yalm folder, execute the command
sudo bash ./docker/run.sh
If everything is OK, you will find yourself inside a container, in which you need to go to the yalm folder in your home directory.
8. Run the example from YaLM 100B
We are ready to launch one of the examples, which live in the examples folder of the repository.
chmod +x ./examples/generate_interactive.sh
bash ./examples/generate_interactive.sh
Be patient: it takes another 10-15 minutes for the GPT model to be built and the weights to be loaded from the checkpoints.
When the build finishes, MegatronML will prompt you to enter a context for text generation. Be careful when you type: under certain circumstances an error occurs, the program crashes, and you have to start the build again. It is therefore better to use the examples that take their text from a file.
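One way to keep a typo from crashing a 15-minute build is to prepare and sanitize the context in a file first. A sketch (the control-character stripping is my own precaution, not something from the repository):

```shell
# Sketch: keep the prompt in a file and strip control characters that
# could upset the interactive prompt. The \007 below simulates a stray
# keystroke (a BEL character) sneaking into the text.
printf 'Financial results for Q2:\007 revenue up 12%%\n' > prompt.raw
tr -d '\000-\010\013\014\016-\037' < prompt.raw > prompt.txt
cat prompt.txt
```

The cleaned prompt.txt can then be pasted into the MegatronML prompt in one go, or fed to one of the file-based example scripts.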
9. Results of the work
It looks interesting, though these are, of course, hand-picked good examples. I ran the test on different samples; as expected, the better the context, the more meaningful the generated text. The full set of experimental generations can be viewed at the links:
Price-wise, it cost me about 9 thousand rubles to rent servers of various capacities, from preparation through generation. A particular disappointment was that you cannot generate everything instantly: startup takes a very long time, and text is not generated as quickly as you would like, given the server's hourly cost.
How to run YaLM without 200 GB of GPU RAM?
You need to add DeepSpeed ZeRO offload to the config. For those who know what this is about, it will be very easy to do; for everyone else it is not a trivial task at all. It is important to know that offload can go either to CPU RAM or to NVMe. You can forget about NVMe for now, because a very large amount of data is processed and the disk cannot keep up. ZeRO offload to CPU RAM is more realistic. True, for this you need 200+ GB of CPU RAM available, which is also not cheap. And one text takes about 20-40 minutes to generate, since I have not yet managed to parallelize it across two video cards. As you can see in the screenshot below, only one video card took part in the generation, and only a quarter of its memory at that. Why all 24 GB are not used remains to be seen.
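For reference, a minimal sketch of what a DeepSpeed config with ZeRO stage 3 and CPU offload of parameters looks like. These keys are standard DeepSpeed options, but wiring the config into YaLM's Megatron launch scripts is exactly the non-trivial part mentioned above, so treat this as a starting point only:

```shell
# Sketch: write a standard DeepSpeed ZeRO-3 config that offloads
# parameters to CPU RAM. Integrating it into the YaLM launch scripts
# is left to the hints in the repository's issues.
cat > ds_zero_offload.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": 1,
  "zero_optimization": {
    "stage": 3,
    "offload_param": { "device": "cpu", "pin_memory": true }
  }
}
EOF
echo "wrote ds_zero_offload.json"
```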
In conclusion, I will say that it is possible to run it even on a single RTX 3070 Ti. But there is no particular point in doing so, because NVMe will not let you quickly process the 150 GB of data in swap that come on top of the 96 GB of RAM.
Of course, I will keep looking for optimal ways to launch it. But so far I have concluded that YaLM 100B is too expensive and too slow for my tasks. For the same money, people will write much more and much better. But I think that is temporary; we'll see. If you need help launching or setting up YaLM, or want to see results on your own context examples, write to me by mail or telegram.