
A Hackers' Guide to Language Models


Introduction to Language Models

When I say a hacker's guide, what we're going to be looking at is a code-first approach to understanding how to use language models in practice. 00:09 → 00:19

So before we get started, we should probably talk about what a language model is. I would say that this is going to make more sense if you know the basics of deep learning. If you don't, I think you'll still get plenty out of it, and there'll be plenty of things you can do. But if you do have a chance, I would recommend checking out course.fast.ai, which is a free course — specifically, if you could at least watch, if not work through, the first five lessons, that would get you to a point where you understand all the basic fundamentals of deep learning, which will make this lesson-tutorial make even more sense. 00:19 → 01:08

What is a Language Model

So let's start by talking about what a language model is. As you might have heard before, a language model is something that knows how to predict the next word of a sentence, or knows how to fill in the missing words of a sentence. We can look at an example of one: OpenAI has a language model, text-davinci-003, and we can play with it by passing in some words and asking it to predict what the next words might be. So if we pass in "When I arrived back at the panda breeding facility after the extraordinary rain of live frogs, I couldn't believe what I saw" — I just came up with that yesterday, and I thought, what might happen next? So it's kind of fun for creative brainstorming. There's a nice site called nat.dev that lets us play with a variety of language models. Here I've selected text-davinci-003, and I'll hit submit, and it starts printing stuff out: "The pandas were happily playing and eating the frogs that had fallen from the sky. It was an amazing sight to see these animals taking advantage of such a unique opportunity." 01:28 → 02:36
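To make "predict the next word" concrete, here is a deliberately tiny sketch: a counts-based bigram model built in pure Python. Real language models like text-davinci-003 use large neural networks, not lookup tables — this toy only illustrates the prediction task itself.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(model, word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = model.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

model = train_bigram("the pandas were eating the frogs and the pandas were happy")
print(predict_next(model, "pandas"))  # → "were"
```

The same idea — given the words so far, output a probability over what comes next — is what the neural network versions learn, just over far richer context than a single preceding word.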

Language Model Training

The basic idea of what ChatGPT, GPT-4, Bard, etc. are doing comes from a paper describing an algorithm that I created back in 2017 called ULMFiT. Sebastian Ruder and I wrote a paper describing the ULMFiT approach, which was the one that basically laid out what everybody's doing, how this system works. The system has three steps. Step one is language model training — you'll see this is actually from the paper; we actually described it as pre-training. What language model pre-training does is this: it's the thing which predicts the next word of a sentence. In the original ULMFiT paper — the algorithm I developed in 2017, which Sebastian Ruder and I wrote up in early 2018 — what I originally did was train this language model on Wikipedia. What that meant is I took a neural network — and a neural network, if you don't know what it is, is just a mathematical function that's extremely flexible and has lots and lots of parameters. Initially it can't do anything, but using stochastic gradient descent, or SGD, you can teach it to do almost anything if you give it examples. So I gave it lots of examples of sentences from Wikipedia. For example, from the Wikipedia article for The Birds: "The Birds is a 1963 American natural horror-thriller film produced and directed by Alfred..." — and then it would stop, and the model would have to guess what the next word is. If it guessed "Hitchcock" it would be rewarded, and if it guessed something else it would be penalized. Effectively, it's trying to maximize those rewards: it's trying to find a set of weights for this function that makes it more likely that it would predict "Hitchcock". 06:30 → 08:26
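The "reward if right, penalize if wrong, adjust the weights" loop above is stochastic gradient descent. Here is a minimal one-parameter illustration in pure Python — fitting a single weight to example data by repeatedly stepping against the gradient of the error. The numbers (learning rate, examples) are made up for illustration; real models do this over billions of weights.

```python
# Toy SGD: fit a single weight w so that f(x) = w * x matches the examples.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: y = 2x

w = 0.0    # initially the "model" can't do anything useful
lr = 0.05  # learning rate: how big each correction step is
for _ in range(200):
    for x, y in examples:
        pred = w * x
        error = pred - y         # the penalty signal: how wrong we were
        w -= lr * 2 * error * x  # gradient of (pred - y)**2 with respect to w
print(round(w, 3))  # converges close to 2.0
```

A language model does the same thing, except the "prediction" is a probability distribution over the next word and the error measures how much probability it failed to put on the correct word (e.g. "Hitchcock").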

Language Model Fine-Tuning

The second step is something called language model fine-tuning. In language model fine-tuning we are no longer just giving it all of Wikipedia — or nowadays we don't just give it all of Wikipedia; in fact, a large chunk of the internet is fed into pre-training these models. In the fine-tuning stage we feed it a set of documents a lot closer to the final task that we want the model to do, but it's still the same basic idea: it's still trying to predict the next word of a sentence. After that we then do a final classifier fine-tuning, and the classifier fine-tuning is the kind of end task we're trying to get it to do. 11:52 → 12:39

Using the OpenAI API

So far we've used ChatGPT, which costs 20 bucks a month, and there's no per-token cost or anything. But if you want to use the API from Python or whatever, you have to pay per token, which is approximately per word — maybe it's about one and a third tokens per word on average. 37:53 → 38:12
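The "one and a third tokens per word" rule of thumb makes back-of-the-envelope cost estimates easy. A rough sketch — note the per-1k-token price below is a placeholder I made up for illustration; check OpenAI's pricing page for real numbers, and a tokenizer like tiktoken for exact token counts:

```python
def estimate_cost(words, tokens_per_word=4 / 3, price_per_1k_tokens=0.002):
    """Rough API cost estimate from a word count.
    price_per_1k_tokens is a hypothetical placeholder, not a quoted price."""
    tokens = words * tokens_per_word
    return tokens / 1000 * price_per_1k_tokens

# A 1,500-word document is roughly 2,000 tokens
print(round(estimate_cost(1500), 4))  # → 0.004
```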

So those are some options for free or low cost. You can also, of course, go to one of the many GPU server providers — they change all the time as to what's good and what's not. RunPod is one example. 55:28 → 55:52

Building Your Own Code Interpreter

Well, to use a language model on your own computer, you're going to need to use a GPU. So I guess the first thing to think about is: does it make sense to do stuff on your own computer? What are the benefits? There are not any open source models that are as good yet as GPT-4, and I would have to say, actually, OpenAI's pricing is really pretty good — so it's not immediately obvious that you definitely want to go in-house. But there are lots of reasons you might want to, and we'll look at some examples of them today. One example of why you might want to go in-house is that you want to be able to ask questions about your proprietary documents, or about information after September 2021, the knowledge cutoff. Or you might want to create your own model that's particularly good at solving the kinds of problems that you need to solve, using fine-tuning. These are all things that you absolutely can get better-than-GPT-4 performance at, at work or at home, without too much money or trouble. So these are the situations in which you might want to go down this path. And you don't necessarily have to buy a GPU: on Kaggle they will give you a notebook with two quite old GPUs attached and very little RAM, but it's something; or you can use Colab, and on Colab you can get much better GPUs than Kaggle has, and more RAM, particularly if you pay a monthly subscription fee. 53:36 → 55:27

GPU Rental Options

And you can see, if you want the biggest and best machine, you're talking $34 an hour — so it gets pretty expensive — but you can certainly get things a lot cheaper, like 80 cents an hour. Lambda Labs is often pretty good. 55:52 → 56:14

vast.ai, which basically lets you use other people's computers when they're not using them; and as you can see, they tend to be much cheaper than other folks, and they tend to have better availability as well. 56:36 → 56:58

Buying a GPU

If you can, it's worth buying something, and definitely the one to buy at the moment is a used RTX 3090 — you can generally get them from eBay for like 700 bucks or so. A 4090 isn't really better for language models, even though it's a newer GPU. The reason for that is that language models are all about memory speed — how quickly you can get stuff in and out of memory — rather than how fast the processor is, and that hasn't really improved a whole lot. 57:06 → 57:37

So that's the two thousand bucks. The other thing, as well as memory speed, is memory size: 24 gigs doesn't quite cut it for a lot of things, so you'd probably want to get two of these GPUs — so you're talking like fifteen hundred dollars or so. 57:37 → 57:53

Alternative to GPUs

Or, funnily enough, you could just get a Mac with a lot of RAM, particularly if you get an M2 Ultra. So it's not a terrible option, particularly if you're not training models and you're just wanting to use existing trained models. 58:28 → 58:50

Most people who do this stuff seriously — almost everybody — have NVIDIA cards. So what we're going to be using is a library called Transformers from Hugging Face, and the reason for that is that basically people upload lots of pre-trained models, or fine-tuned models, to the Hugging Face Hub. 58:52 → 59:14

Using Pre-trained Models

These metrics are not particularly well aligned with real-life usage, for all kinds of reasons. And also sometimes you get something called leakage, which means that some of the questions from these benchmarks actually leak through into some of the training sets. So you can get a rule of thumb from here about what to use, but you should always try things. 59:41 → 01:00:07

Choosing the Right Model

So generally speaking, for the kinds of GPUs we're talking about, you'll be wanting no bigger than 13B, and quite often 7B. So let's see if we can find, here, the 13B model, for example. 01:00:18 → 01:00:37

All right, so you can find models to try out from things like this leaderboard. And there's also a really great leaderboard called FastEval, which I like a lot because it focuses on some more sophisticated evaluation methods, such as this chain-of-thought evaluation method — so I kind of trust these a little bit more. And GSM8K here is a difficult math benchmark. 01:00:37 → 01:01:06

Improving Model Performance

Generally speaking, we use 16-bit floating point numbers nowadays. But if you think about it, 16 bits is two bytes, so 7B times two is going to be 14 gigabytes just to load in the weights — so you've got to have a decent GPU to be able to do that. Perhaps surprisingly, you can actually just cast it to 8-bit and it still works pretty well, thanks to something called quantization. 01:02:39 → 01:03:13
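The arithmetic behind that 14 GB figure generalizes to any parameter count and precision. A quick sketch of the calculation (weights only — activations, the KV cache, and framework overhead add more on top):

```python
def weight_memory_gb(n_params, bits_per_weight):
    """GB needed just to hold the weights: params * bits / 8 bits-per-byte."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
# 16-bit: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB
```

This is why quantizing to 8-bit or 4-bit matters so much: it's the difference between a model fitting on a 24 GB card or not.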

There's a different kind of quantization called GPTQ, where a model is carefully optimized to work with 4-bit, 8-bit, or other lower-precision data automatically. And this particular person, known as TheBloke, is fantastic at taking popular models, running that optimization process, and then uploading the results back to Hugging Face. 01:05:34 → 01:06:06

And that's because there's a lot less memory moving around. 01:06:34 → 01:06:38

Retrieval Augmented Generation

To do this we can use something called retrieval augmented generation. What happens with retrieval augmented generation is: when we take the question we've been asked, like "Who is Jeremy Howard?", we say, okay, let's try to search for documents that may help us answer that question. So obviously we would expect, for example, Wikipedia to be useful. 01:11:41 → 01:12:11
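The shape of retrieval augmented generation can be sketched in a few lines: retrieve the most relevant document, then paste it into the prompt as context before the question. Real systems use embedding similarity for the retrieval step; this sketch uses crude word overlap just to show the flow, and the documents are made-up examples.

```python
def score(question, document):
    """Crude relevance score: how many question words appear in the document."""
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words)

def build_prompt(question, documents):
    """Pick the best-matching document and use it as context for the model."""
    best = max(documents, key=lambda doc: score(question, doc))
    return (f"Answer the question using this context.\n\n"
            f"Context: {best}\n\nQuestion: {question}")

docs = [
    "Jeremy Howard is a data scientist and a co-founder of fast.ai.",
    "The Birds is a 1963 film directed by Alfred Hitchcock.",
]
print(build_prompt("Who is Jeremy Howard?", docs))
```

The resulting prompt is what gets fed to the language model, which then answers from the retrieved context rather than from its (possibly stale) training data.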

So we can create a Stable Beluga model. Now, something really important that I keep forgetting — everybody keeps forgetting — is that during the instruction tuning process, the instructions that are passed in don't just appear like this: they actually always are in a particular format. And the format, believe it or not, changes quite a bit from fine-tune to fine-tune, so you have to go to the web page for the model and scroll down to find out what the prompt format is. 01:07:52 → 01:08:42

So here's the prompt format. I generally just copy it and then paste it into Python, which I did here, and created a function called make_prompt that uses the exact same format that it said to use. 01:08:47 → 01:09:07
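A make_prompt function of this kind is just string formatting. The template below follows the "### System / ### User / ### Assistant" style shown on the Stable Beluga model card at the time — but, as the talk stresses, always check the model's own page, since the exact format varies between fine-tunes:

```python
def make_prompt(user_question,
                system="You are a helpful AI assistant."):
    # Template modelled on the Stable Beluga card's format; verify against
    # the model page before relying on it — formats differ per fine-tune.
    return (f"### System:\n{system}\n\n"
            f"### User:\n{user_question}\n\n"
            f"### Assistant:\n")

print(make_prompt("Who is Jeremy Howard?"))
```

Getting this template wrong usually doesn't crash anything — the model just answers noticeably worse, which is why it's such an easy mistake to miss.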

What we want to do now is fine-tune a model. We can do that in a notebook from scratch — it takes, I don't know, 100 or so lines of code, it's not too much — but given the time constraints here, and also, I thought, why not just use something that's ready to go? So, for example, there's something called Axolotl, which is quite nice in my opinion. 01:23:07 → 01:23:33

Training Your Own Model

I trained it, and then I thought, okay, let's create our own one. So we're going to have this context and this question — "Get the count of competition hosts by theme" — and I'm not going to pass it an answer, so I'll just ignore that. So again, I found out what prompt they were using and created a sql_prompt function. 01:24:57 → 01:25:30
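A sql_prompt function like the one described would again be plain string formatting — schema as context, then the natural-language question, then a cue for the model to emit SQL. The template and schema below are illustrative guesses, not the exact ones used in the talk:

```python
def sql_prompt(context, question):
    # Hypothetical text-to-SQL template; the real fine-tune's format may differ.
    return (f"Use the following database schema to answer the question "
            f"by writing a SQL query.\n\n"
            f"Schema: {context}\n\nQuestion: {question}\n\nSQL: ")

schema = "CREATE TABLE competition_hosts (host TEXT, theme TEXT)"
print(sql_prompt(schema, "Get the count of competition hosts by theme."))
```

The fine-tuned model completes the prompt after "SQL: ", so at inference time you generate from this string and read off the query.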

So I think that's pretty remarkable. We have just built it — it also took me like an hour to figure out how to do it and then an hour to actually do the training — and at the end of that we've actually got something which is converting prose into SQL based on a schema. So I think that's a really exciting idea. 01:26:02 → 01:26:28

Running Models on Mac

If you've got a Mac, there are a couple of really good options: MLC and llama.cpp. Currently MLC in particular, I think, is kind of underappreciated — it's a really nice project where you can run language models on literally iPhone, Android, web browsers, everything. It's really cool. 01:26:35 → 01:27:09

And so I'm now actually on my Mac here, and I've got a tiny little Python program called chat. It's going to import the chat module, it's going to load a quantized 7B, and it's going to ask the question "What is the meaning of life?". 01:27:12 → 01:27:40
