The YaLM 100B neural network in practice.

At the end of June, Yandex released YaLM 100B, a neural network with 100 billion parameters, to the public. It is the largest GPT-like neural network in the public domain. The announcement describes how the model was trained and shows the best examples of what it can do. But is it really that good in practice, and can it be used at home? The article is silent on that. Moreover, the model is not easy to run and test, because it needs about 200 GB of GPU RAM. This comment on Habr describes the situation most accurately.

Supposedly everyone at Yandex is so smart, and yet they did not even publish a normal how-to. There is no API for the big model, and no ready-made stripped-down or small version for ordinary people (in Google Colab). No example is given of how to set up the model or how to generate text. The article just mentions a couple of nuances for nerds, and that is it. It would have been enough to look closely at how the bank with the letter “C” did it and do the same. My impression is that this model is just one of the failed experiments that it was a pity to throw in the trash, so it was released as open source to show what great models Yandex makes, and, moreover, it is open source!

There are many questions on the internet about how to run YaLM or even try it online, but no answers. I was one of the users asking those questions, and I set out to figure it out, because I really needed a way to generate texts for financial robots, so that they could not only predict values but also comment on them in writing, based on financial reports. In essence, it is the same work financial analysts do, only done with artificial intelligence. There are two ways to run YaLM.
Rent a cloud server with 200+ GB of GPU RAM, or modify the code and run it with DeepSpeed ZeRO offload (where the GPU processes one part of the neural network at a time and the rest is kept in CPU RAM or on NVMe). The first is very expensive: about 2,500 per hour, or 1.7 million per month. The second is uncharted territory, because the repository does not provide code for it, only hints in the repository's issues that it is not hard to do. Let's start with the simple option.
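
As a rough sanity check of the rental figures above, the hourly price can be converted into a monthly one. This is only a sketch: the function name `monthly_cost` and the assumption of round-the-clock use over a 28- or 30-day month are illustrative and do not come from the article.

```shell
#!/usr/bin/env bash
# Monthly cost of a server billed by the hour.
# $1 = price per hour, $2 = days in the month (assumed 30 if omitted).
monthly_cost() {
    local per_hour="$1"
    local days="${2:-30}"
    echo "$((per_hour * 24 * days))"
}

monthly_cost 2500 28   # 28-day month
monthly_cost 2500 30   # 30-day month
```

Under these assumptions the result lands between 1.68 and 1.8 million per month, which is in line with the "about 1.7 million" quoted above.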

YaLM 100B launch instructions

1. Rent 200 GB of GPU RAM, for example here.

You need at least 200 GB of video memory in total. 8×40 = 320 GB, so only this configuration fits. Anything below 200 GB will not work; anything above is fine. The arrow points to CPU RAM, which we do not look at. It can be any size.

Specify a disk of about 300 GB, to leave some margin, and preferably a fast one, because tens of gigabytes of data will be written to and read from it.

When creating the server, choose Ubuntu ML (Machine Learning) as the source image. This is mandatory: it ensures the video cards are already configured and nothing needs to be installed additionally.

When creating the server, there is a nuance with quotas: it may look as if the hardware is unavailable, when in fact you only need to raise the quotas in the settings. Once the server is up (this can take 5-10 minutes), connect to it via ssh or directly through the web console on the server's page and run the command.

nvidia-smi

The result should be a table listing the video cards, the driver version and the CUDA version. The driver and CUDA versions are in the header; the device numbers are on the left and each device's memory size is in the center. If you do not see this information, the server was built from the wrong image. Ubuntu ML (Machine Learning) is required, as described above.
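
Instead of adding up the memory column of the table by eye, the total can be computed from `nvidia-smi`'s machine-readable output. This is a minimal sketch, not part of the original instructions: the function names `total_gpu_mem_gib` and `enough_gpu_mem` are illustrative, and the 200 threshold is the article's approximate figure applied to GiB (nvidia-smi reports MiB).

```shell
#!/usr/bin/env bash
# Sum per-GPU memory sizes (MiB, one value per line on stdin)
# and print the total in whole GiB.
total_gpu_mem_gib() {
    awk '{ sum += $1 } END { printf "%d\n", sum / 1024 }'
}

# Read the same input, report the total, and succeed only if it
# reaches the required amount ($1, 200 GiB if omitted).
enough_gpu_mem() {
    local required_gib="${1:-200}"
    local total
    total="$(total_gpu_mem_gib)"
    echo "Total GPU memory: ${total} GiB (need ${required_gib} GiB)"
    [ "$total" -ge "$required_gib" ]
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
        | enough_gpu_mem 200 || echo "Not enough GPU memory for YaLM 100B"
fi
```

On a server with eight 40 GB cards this should report a total comfortably above the threshold; on a single consumer card the check fails.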

2. Clone the YaLM repository

sudo git clone https://github.com/yandex/YaLM-100B/ yalm
cd yalm

Clone into your home folder so that you do not have to edit the docker config afterwards. If you cloned it somewhere else, go here and add the path to where you cloned it.

3. Download the checkpoints (the model's core training data)

sudo chmod +x ./download/download.sh
sudo bash ./download/download.sh

This takes about an hour. To avoid wasting time, open a new ssh connection and start building the docker container in parallel.

4. Install nvidia-docker2

Regular docker is not suitable; nvidia-docker2 is required.
https://docs.nvidia.com/datacenter/cloud-native/

5. Build the container for YaLM

cd yalm
sudo chmod +x ./docker/*
sudo bash ./docker/build.sh

This also takes about an hour.

Life hack. You can download the checkpoints, install docker and build the container on a cheap server with a single video card. It takes the same amount of time, so you save a little money. After the build finishes on the cheap server, delete it and create the production server using the disk from the cheap one. That way you do not pay the high rate for the long wait while the container builds and the checkpoints download.

6. Prepare the content

6.1 Checkpoints

Once the checkpoints have finished downloading, they need to be hooked into the configs. There are two ways: fix the parameters or move the checkpoints. Every script expects the checkpoints to be in the project's main directory, so the downloaded files must be moved up out of the download folder. From the yalm folder, run

mv ./download/yalm100b_checkpoint ./

Or change the paths to the files in the example scripts
https://github.com/yandex/YaLM-100B/blob/c91b7d7fe8dbf39c9e307d6d324446d0df136a23/examples/generate_interactive.sh#L8-L9

6.2 Video cards

Check that the video cards are set correctly. If you have eight video cards, nothing needs to change. If the number is different, change these lines. The second line holds the numbers of the devices to use (you can see them in the nvidia-smi output you ran earlier). The fourth holds their count.
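
When the server has a number of cards other than eight, both values (the device list and the device count) can be derived from a single number. This is a minimal sketch, assuming the script receives the list through the standard `CUDA_VISIBLE_DEVICES` environment variable. The function name `gpu_device_list` and the variables `GPU_COUNT` and `MP_SIZE` are illustrative and should be matched against the actual lines in the script you are editing.

```shell
#!/usr/bin/env bash
# Build a comma-separated device list "0,1,...,N-1" for N GPUs.
gpu_device_list() {
    local count="$1"
    local list=""
    local i
    for ((i = 0; i < count; i++)); do
        # Prepend a comma to every entry except the first.
        list+="${list:+,}${i}"
    done
    echo "$list"
}

GPU_COUNT=4   # hypothetical value; take the real one from nvidia-smi

# Device numbers to use, and how many of them there are.
export CUDA_VISIBLE_DEVICES="$(gpu_device_list "$GPU_COUNT")"
MP_SIZE="$GPU_COUNT"

echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES} MP_SIZE=${MP_SIZE}"
```

Keeping the list and the count tied to one variable avoids the easy mistake of changing one line and forgetting the other.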

7. Run the docker container

From the yalm folder, run the command

sudo bash ./docker/run.sh

If everything is fine, you will be dropped into the container, where you need to go to the yalm folder in your home directory.

cd ~/yalm

8. Run an example from YaLM 100B

We are ready to launch one of the examples. They are described here.

chmod +x ./examples/generate_interactive.sh
./examples/generate_interactive.sh

Be patient: there is another 10-15 minutes to wait while the GPT model is created and the weights are loaded from the checkpoints.

When the build finishes, Megatron-LM prompts you to enter a context to generate text from. Be careful when typing. In some circumstances an error occurs, the program crashes, and you have to start the build all over again. For that reason it is better to use the examples that read the text from a file.

9. Results

It looks interesting. Of course, these are only the best examples. I ran the test on a variety of samples. As expected, the better the context, the more meaningful the generated text. The full set of trial generations can be viewed at the links:

As for the price, renting servers of various capacities cost me about 9 thousand in total, from practice runs and preparation through to generation. The biggest disappointment was that you cannot generate anything instantly. Startup takes a very long time, and text is not generated as quickly as one would like given the hourly cost of the server.

How do you run YaLM without 200 GB of GPU RAM?

You need to add DeepSpeed ZeRO offload to the config. For those who know what that means, it is very easy to do. For everyone else, it is far from a trivial task. It is important to know that the offload target can be either CPU RAM or NVMe. You can forget about NVMe at this point, because a very large amount of data is being processed and the disk cannot keep up. ZeRO offload to CPU is more realistic. True, it requires 200+ GB of CPU RAM, which is not cheap either. And a single text takes about 20-40 minutes to generate, because I have not yet managed to parallelize it across two video cards. As the screenshot below shows, only one video card took part in generation, and it used only a quarter of its memory. It remains to be seen why the full 24 GB is not used.
Finally, I will say that it is possible to run the model even on a single RTX 3070 Ti. But there is little point in doing so, because NVMe cannot quickly process the 150 GB of data that ends up in swap on top of the 96 GB of RAM.
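
For readers who have not seen ZeRO offload before, the fragment below sketches what a DeepSpeed config with parameter offload to CPU RAM could look like. It is an assumption-laden illustration, not the configuration used for the runs above: the function name `print_zero_offload_config` and the file name `ds_config.json` are made up, and wiring the file into the YaLM launch scripts is a separate step that is not shown here.

```shell
#!/usr/bin/env bash
# Print a minimal DeepSpeed config that keeps model parameters
# in CPU RAM (ZeRO stage 3 with parameter offload).
print_zero_offload_config() {
    cat <<'EOF'
{
  "train_micro_batch_size_per_gpu": 1,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
EOF
}

# Usage (hypothetical file name):
#   print_zero_offload_config > ds_config.json
print_zero_offload_config
```

Stage 3 is the ZeRO stage that partitions the parameters themselves, which is what makes offloading them possible. Switching `device` to `nvme` would also require an `nvme_path` entry, but as noted above the disk becomes the bottleneck.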

Summary

Of course, I will keep looking for the optimal way to launch the model. But for now I have concluded that YaLM 100B is too expensive and too slow for my tasks. For the same money, people will write far more text, and far better. I think that is temporary, though; we will see. If you need help launching or setting up YaLM, or want to see results on your own context examples, write to me by email or Telegram.

pskucherov

  1. Olha

    An article on a mega-relevant topic! Thank you.

  2. Danila

    Great article! Thanks to the author!

  3. Dmitry

    THANK YOU!!!
    I spent three days looking for this information.
    Is there anything similar about RuGPT3 and Porfirich?