Let's pretrain a 3B LLM from scratch: on 16+ H100 GPUs, no detail skipped.
William Falcon • February 13, 2024
