The YaLM 100B neural network in practice


At the end of June, Yandex released to the public a neural network with 100 billion parameters called YaLM 100B. It is the largest GPT-like neural network that is freely available. The announcement describes how it was trained, shows the best examples, and explains what the network is capable of. But is it really that good in practice, and can it be used at home? The article is silent on this. Moreover, it is not that easy to run and check, because about 200 GB of GPU RAM is required. This comment on Habr describes the situation most accurately.

"Supposedly everyone at Yandex is so smart, yet they did not even post a proper how-to. There is no API for the large model, and no ready-made trimmed-down medium or small model for ordinary people (in Google Colab). There is not a single example of how to set up the model or how to generate text. The article just lists a couple of nuances for nerds, and that is it. It is enough to look closely at how the bank with the letter "C" did it and do the same. I got the impression that this model is just one of the failed experiments that it was a pity to throw in the trash, so it was posted as Open Source to show what great models Yandex creates, and, what is more, it is open source!"

There are a lot of questions on the internet about how to run YaLM or even try it online, but there are no answers. I was one of the users asking these questions, and I got fired up to figure it out, because I really needed a way to generate texts for financial robots. The idea is that they could not only forecast values but also comment on them in text, based on financial reports. In essence, it would be the same thing financial analysts do, only using artificial intelligence. There are two ways to run YaLM.
Rent a cloud server with 200+ GB of GPU RAM, or modify the code and run it with DeepSpeed ZeRO offload (where the GPU processes parts of the neural network sequentially, and the rest is stored in CPU RAM or on NVMe). The first is very expensive, about 2,500 rubles per hour or 1.7 million per month. The second is uncharted, because the code is not provided in the repository, only hints in the repository's issues that it is not hard to do. Let's start with the simple one.
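To put the rental price in perspective, here is a small back-of-the-envelope sketch. The rate of about 2,500 rubles per hour comes from the text; the number of days in the month is my assumption, since the exact basis of the 1.7 million figure is not stated. The `rental_cost` helper is a name made up for this example.

```shell
# Rough rental cost: hourly rate times 24 hours times the number of days.
# The hourly rate is from the text; the day counts below are assumptions.
rental_cost() {
  local rate_per_hour=$1 days=$2
  echo $(( rate_per_hour * 24 * days ))
}

rental_cost 2500 28   # prints 1680000
rental_cost 2500 30   # prints 1800000
```

Both results land in the same range as the quoted 1.7 million rubles per month.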

Instructions for running YaLM 100B

1. Rent 200 GB of GPU RAM, for example here.

[Screenshot: server configuration selection]

You need at least 200 GB of total video memory. 8×40 = 320 GB. Only this configuration fits. Less than 200 GB will not work; more is fine. The arrow points to CPU RAM; ignore it. It can be any size.
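The 200 GB figure can be sanity-checked with simple arithmetic. This is a sketch under one assumption that the text does not state: that the weights take 2 bytes per parameter (fp16). It counts weights only and ignores activations and other overhead. The helper names are made up for this example.

```shell
# Weights-only estimate: parameters (in billions) times bytes per parameter
# gives gigabytes. 2 bytes per parameter (fp16) is an assumption.
weights_gb() {
  echo $(( $1 * $2 ))
}

# Succeeds when gpu_count cards of gb_per_gpu each can hold need_gb in total.
fits_on_gpus() {
  local gpu_count=$1 gb_per_gpu=$2 need_gb=$3
  [ $(( gpu_count * gb_per_gpu )) -ge "$need_gb" ]
}

need=$(weights_gb 100 2)
if fits_on_gpus 8 40 "$need"; then
  echo "8 x 40 GB is enough for about ${need} GB of weights"
fi
```

With these numbers, eight 40 GB cards pass the check and four do not.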

Specify a disk of about 300 GB, to leave headroom, and preferably a fast one, because tens of gigabytes of data will be transferred to and from it.
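Once the server is up, it is worth confirming that the disk really has the space before starting long downloads. A minimal sketch, assuming a POSIX `df`; the `has_free_gb` helper and the 300 GB threshold used in the example call are illustrative.

```shell
# Succeeds when the filesystem holding PATH has at least NEED_GB free.
# Uses POSIX `df -Pk`, which reports available space in 1024-byte blocks.
has_free_gb() {
  local path=$1 need_gb=$2 avail_kb
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

if has_free_gb "${HOME:-/}" 300; then
  echo "enough free space for the checkpoints and the container"
else
  echo "less than 300 GB free"
fi
```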

When creating the server, choose Ubuntu ML (Machine Learning) as the source image. This is mandatory, so that the video cards are already configured and nothing needs to be installed additionally.

When creating a server, there are nuances with quotas: you may get the impression that the hardware is unavailable, but in fact you just need to increase the quotas in the settings. Once the server is activated (this can take 5-10 minutes), connect to it via ssh or directly in the web console on the server's page and run the command

nvidia-smi

The result should be a table with the video cards, the driver version, and the CUDA version. Roughly like this.
[Screenshot: nvidia-smi output]
The header shows the driver and CUDA versions. The device numbers are on the left, and the device memory size is in the center. If you do not see this information, you built the server from the wrong source image. Ubuntu ML (Machine Learning) is required, as described above.
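Instead of adding up the memory column by eye, the total can be computed from the query mode of nvidia-smi. A minimal sketch: the `sum_gpu_mib` helper is a name made up for this example, and the result is in MiB, so compare it against roughly 200 × 1024.

```shell
# Sum per-GPU memory (MiB, one value per line on stdin) and print the total.
sum_gpu_mib() {
  awk '{ total += $1 } END { print total + 0 }'
}

# Real query, run only if nvidia-smi is present on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | sum_gpu_mib
fi
```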

2. Clone the repository with YaLM

sudo git clone https://github.com/yandex/YaLM-100B/ yalm
cd yalm

Clone it into your home folder so that you do not have to edit the docker config afterwards. If you cloned it somewhere else, then go here and add the path to where you cloned it.

3. Download the checkpoints (the main model training data)

sudo chmod +x ./download/download.sh
sudo bash ./download/download.sh

This will take about an hour. To avoid wasting time, open a new ssh connection and start building the docker container in parallel.
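Since the download runs for about an hour, a dropped ssh session can interrupt it. One way around this, which is my own suggestion and not part of the original steps, is to start long commands in the background with nohup and follow the log. The `run_logged` helper and the log file name are hypothetical.

```shell
# Run a command in the background, immune to hangups, with output in LOG.
# Prints the PID of the background job.
run_logged() {
  local log=$1
  shift
  nohup "$@" >"$log" 2>&1 &
  echo $!
}

# Hypothetical usage from the yalm directory (assumes sudo does not
# prompt for a password, as is common on cloud images):
#   run_logged download.log sudo bash ./download/download.sh
#   tail -f download.log
```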

4. Install nvidia-docker2

Regular docker will not do; nvidia-docker2 is required.
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit
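After installation, one quick check is whether docker reports an nvidia runtime. This is a sketch, assuming the `docker info` output contains a "Runtimes:" line; the `has_nvidia_runtime` helper is a name made up for this example.

```shell
# Succeeds when `docker info` text on stdin lists an "nvidia" runtime.
has_nvidia_runtime() {
  grep -i 'runtimes' | grep -qw 'nvidia'
}

# Usage on the server:
#   sudo docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime found"
```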

5. Build the container for YaLM

cd yalm
sudo chmod +x ./docker/*
sudo bash ./docker/build.sh

This also takes about an hour.

Life hack. You can download the checkpoints, install docker, and build the container on a cheap server with a single video card. It takes the same amount of time, so you can save a little. After the build on the cheap server, delete it and create the production server using the disk from the cheap server. That way you will not overpay for the time spent waiting for the build and the checkpoint download.

6. Prepare the content

6.1 Checkpoints

Once the checkpoint download has finished, you need to plug the checkpoints into the configs. There are two ways: adjust the parameters or move the checkpoints. Everywhere it is expected that the checkpoints are in the project's main directory, so what was downloaded needs to be moved up out of the download folder. From the yalm folder, run

mv ./download/yalm100b_checkpoint ./

Or change the file paths in the example files
https://github.com/yandex/YaLM-100B/blob/c91b7d7fe8dbf39c9e307d6d324446d0df136a23/examples/generate_interactive.sh#L8-L9
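Whichever of the two ways you pick, it helps to confirm that the checkpoint directory is where the scripts expect it before launching anything slow. A minimal sketch: `dir_has_files` is a helper made up for this example, and the directory name is taken from the mv command above.

```shell
# Succeeds when DIR exists and contains at least one entry.
dir_has_files() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Run from the yalm folder.
if dir_has_files ./yalm100b_checkpoint; then
  echo "checkpoints are in place"
else
  echo "checkpoints not found in the project root; move them or edit the paths"
fi
```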

6.2 Video cards

Check that the video cards are set correctly. If you have eight video cards, nothing needs to be changed. If the number is different, change these lines
[Screenshot: the lines to change in the example script]
The second line contains the numbers of the devices used (you can look them up in nvidia-smi, which you have already run). The fourth contains their count.
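The two values have to agree: the count must match the number of entries in the device list. A small sketch of deriving one from the other. The variable names are assumptions, since the screenshot is not reproduced here: CUDA_VISIBLE_DEVICES is the standard CUDA variable, MP_SIZE stands in for the device-count setting in the example script, and `count_devices` is a helper made up for this example.

```shell
# Count the entries in a comma-separated device list such as "0,1,2,3".
count_devices() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# Assumed variable names; check them against the actual example script.
CUDA_VISIBLE_DEVICES=0,1,2,3
MP_SIZE=$(count_devices "$CUDA_VISIBLE_DEVICES")
echo "devices: $CUDA_VISIBLE_DEVICES, count: $MP_SIZE"
```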

7. Run the docker container

From the yalm folder, run the command

sudo bash ./docker/run.sh

If everything is OK, you will be taken into a container, where you need to go to the yalm folder in your home directory.

cd ~/yalm

8. Run an example from YaLM 100B

We are ready to launch one of the examples. They are described here.

chmod +x ./examples/generate_interactive.sh
./examples/generate_interactive.sh

Be patient: you still have to wait another 10-15 minutes while the GPT model is created and the weights from the checkpoints are loaded.

When the build finishes, MegatronML will prompt you to enter a context for text generation. Be careful when typing. Under certain circumstances an error occurs, the program crashes, and you have to start the build all over again. So it is better to use the examples that take text from a file.

9. Results

[Screenshots: sample generations]
Looks interesting. Of course, these are just the successful examples. I ran the test on different samples. As expected, the better the context, the more meaningful the generated text. The full set of experimental generations can be viewed at the links:

As for the price, it cost me about 9 thousand rubles to rent servers of different capacity, from training and preparation through to generation. A particular disappointment was that you cannot generate everything instantly. It takes a very long time to start, and the text is not generated as quickly as one would like, given the hourly cost of the server.

How to run YaLM without 200 GB of GPU RAM?

You need to add DeepSpeed ZeRO offload to the config. For those who know what this is about, it will be very easy to do. For everyone else, it is not a trivial task at all. It is important to know that offload can go either to CPU RAM or to NVMe. You can forget about NVMe for now, because a very large amount of data is processed and the disk cannot keep up. ZeRO offload to CPU is more realistic. True, for this you need 200+ GB of CPU RAM in reserve, which is not cheap either. And a single text will take about 20-40 minutes to generate, since I have not yet managed to parallelize it across two video cards. As the screenshot below shows, only one video card took part in the generation, and even then only a quarter of its memory was used. It remains to be seen why all 24 GB are not used.
[Screenshot: GPU load during generation]
Finally, I will say that it is even possible to run it on a single RTX 3070 TI. But there is no particular point in doing so, because NVMe will not let you quickly process the 150 GB of data in swap that sits on top of 96 GB of RAM.
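For readers who have not seen it, the CPU offload option described in this section is switched on in a DeepSpeed JSON config. The sketch below writes a minimal one. It is illustrative only: the YaLM repository does not ship such a config, the batch size and fp16 settings are my assumptions, and wiring the file into the launch scripts is not shown. The `write_offload_config` helper and the file name are made up for this example.

```shell
# Write a minimal DeepSpeed config that keeps ZeRO stage 3 parameters
# in CPU RAM. Values other than the offload section are assumptions.
write_offload_config() {
  cat >"$1" <<'EOF'
{
  "train_micro_batch_size_per_gpu": 1,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
EOF
}

write_offload_config "${TMPDIR:-/tmp}/ds_zero_offload.json"
```

Changing "device" to "nvme" and adding an "nvme_path" entry would select the NVMe variant, which, as noted above, is too slow for this amount of data.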

Summing up

Of course, I will keep looking for optimal ways to run it. But so far I have come to the conclusion that YaLM 100b is too expensive / too slow for my tasks. For the same money, people will write far more and of much better quality. But I think this is temporary; we will see. If you need help with launching or configuring YaLM, or want to see results on your own context examples, write to me by email or Telegram.

pskucherov