# [ML News] GPT-4 Rumors | AI Mind Reading | Neuron Interaction Solved | AI Theorem Proving

## Metadata

- **Channel:** Yannic Kilcher
- **YouTube:** https://www.youtube.com/watch?v=r8wiBA3ZaQE
- **Date:** 27.11.2022
- **Duration:** 41:55
- **Views:** 108,191
- **Source:** https://ekstraktznaniy.ru/video/12544

## Description

#ai #mlnews #gpt4

Your weekly news from the AI & Machine Learning world.

OUTLINE:
0:00 - Introduction
0:25 - AI reads brain signals to predict what you're thinking
3:00 - Closed-form solution for neuron interactions
4:15 - GPT-4 rumors
6:50 - Cerebras supercomputer
7:45 - Meta releases metagenomics atlas
9:15 - AI advances in theorem proving
10:40 - Better diffusion models with expert denoisers
12:00 - BLOOMZ & mT0
13:05 - ICLR reviewers going mad
21:40 - Scaling Transformer inference
22:10 - Infinite nature flythrough generation
23:55 - Blazing fast denoising
24:45 - Large-scale AI training with MultiRay
25:30 - arXiv to include Hugging Face spaces
26:10 - Multilingual Diffusion
26:30 - Music source separation
26:50 - Multilingual CLIP
27:20 - Drug response prediction
27:50 - Helpful Things

ERRATA:
HF did not acquire spaces, they launched spaces themselves and supported Gradio from the start. They later acquired Gradio.

References:
AI reads brain signals to predict what you're thinking

## Transcript

### Introduction [0:00]

Rumors of GPT-4 are in the air, neuron transmissions are now solved in closed form, and mind reading is a thing now. It's Monday, and welcome to ML News. Hello, this is your regular update of what's going on in the machine learning and AI world. Our first story is the most interesting one.

### AI reads brain signals to predict what you're thinking [0:25]

Brain reading is more and more becoming a thing. There is a paper called Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding. In this paper, the authors give a visual stimulus to a subject, a real human, and then look at their brain waves. This is non-invasive, these are fMRI brain scans, and from that fMRI reading they're able to decode what the person is seeing. You can see right here: on the top you have the visual stimuli, and on the bottom you have the reconstructed images. What you'll notice is that the pixels don't exactly match; however, the semantic content is very often the same. This is done by aligning the latent spaces of the encoders for the brain data and the encoders for images, and this has been a long-standing problem, because the training data that exists to map what people are seeing from their brain waves to the image space is just super sparse. The authors get around that by pre-training on unlabeled fMRI data: first they get a very good autoencoder going on that data, then the latent space can be determined and compressed, and from that latent space they learn a conditional image diffusion decoder that maps the encoding of the brain waves to the visual stimuli. So the paradigm we see in deep learning, where you do unsupervised pre-training first because you have much more unlabeled data, and only then include the task-specific data and learn that on top of the unsupervised pre-trained model, also holds in the field of brain-computer interfaces, apparently. It's pretty cool that we're more and more getting the chance to peek into people's brains. Now, this isn't yet a full thought reader or anything like that; essentially they disambiguate between, I believe, some 100 different classes of labels. But it's still very cool that you can reconstruct, just from reading brain waves, what kind of image the person is seeing and what is in the image.

In a related article, neurosciencenews.com writes that a brain-machine interface device predicts internal speech. This is a little bit different in that it's actually invasive, so this is an interface directly to the brain, but it is able to predict internal speech, meaning speech that you just internally think to yourself. It is not able to decode arbitrary speech; I believe they go up to about eight words or something like this, so it's not yet super accurate, but we are making big progress on that front. All right.
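The two-stage recipe described above (unsupervised pre-training on plentiful unlabeled data, then supervised decoding on the sparse paired data) can be sketched in miniature. This is my own toy illustration, not the paper's code: the real system uses a masked autoencoder on fMRI and a conditional diffusion decoder, while here a per-feature normalization "encoder" and a nearest-centroid "decoder" stand in for both.

```python
from statistics import mean, stdev

# Stage 1: "pre-train" an encoder on lots of unlabeled recordings.
# Toy stand-in: learn per-feature normalization statistics.
def pretrain_encoder(unlabeled):
    dims = list(zip(*unlabeled))
    mus = [mean(d) for d in dims]
    sds = [stdev(d) or 1.0 for d in dims]  # guard against zero spread
    def encode(x):
        return [(xi - m) / s for xi, m, s in zip(x, mus, sds)]
    return encode

# Stage 2: fit a decoder on the few labeled (recording, image-class) pairs.
# Toy stand-in: nearest centroid in the pre-trained latent space.
def fit_decoder(encode, paired):
    centroids = {}
    for x, label in paired:
        centroids.setdefault(label, []).append(encode(x))
    for label, zs in centroids.items():
        centroids[label] = [mean(c) for c in zip(*zs)]
    def decode(x):
        z = encode(x)
        return min(centroids,
                   key=lambda l: sum((a - b) ** 2 for a, b in zip(z, centroids[l])))
    return decode

unlabeled = [[0.1, 9.0], [0.2, 11.0], [1.9, 10.0], [2.1, 10.5], [1.0, 10.2], [0.0, 9.8]]
paired = [([0.1, 9.5], "cat"), ([2.0, 10.4], "house")]
decode = fit_decoder(pretrain_encoder(unlabeled), paired)
print(decode([1.8, 10.3]))  # a recording near the "house" examples
```

The point of the split is the same as in the paper: stage 1 only needs unlabeled recordings, so the scarce paired data is reserved for the comparatively easy stage-2 mapping.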

### Closed-form solution for neuron interactions [3:00]

Next news: Ramin Hasani writes that they've published a new article in Nature Machine Intelligence and solved a differential equation that has long stood without a closed-form solution. We now have that closed-form solution, and it concerns the interactions between neurons. This is a major benefit for people who want to implement biologically inspired, biologically plausible neural networks, because previously you'd have to have some sort of an ODE solver in order to even model that connection properly, and now that there's a closed-form solution, you can essentially just forward- and backprop through that formula. The absolute coolest thing is that they have implemented this in both PyTorch and TensorFlow, so you can technically build this directly into your architectures today. Now, it's not guaranteed to be a lot better than what we currently have in terms of neuron connections, but that's not the point; the point is to get to a place where we can simulate biologically plausible neural networks as well as possible, and from those potentially learn something about the brain. We might even get some inspiration for how to improve our artificial neural network architectures from this. So check out the paper and the repository in case you're interested.
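To see why a closed form matters here, a minimal illustration with a single linear "neuron" (my own toy, much simpler than the paper's actual liquid-time-constant dynamics): the ODE dx/dt = -a·x + b can either be integrated step by step with an Euler solver, or evaluated exactly in one shot, and differentiating through the latter needs no solver in the training loop.

```python
import math

a, b, x0, t = 2.0, 1.0, 0.0, 3.0

# Numerical route: forward-Euler ODE solver, many small steps.
def euler(x0, t, steps):
    x, dt = x0, t / steps
    for _ in range(steps):
        x += dt * (-a * x + b)
    return x

# Closed-form route: exact solution in a single evaluation.
def closed_form(x0, t):
    return b / a + (x0 - b / a) * math.exp(-a * t)

print(euler(x0, t, 10000))   # approaches the exact value as steps grow
print(closed_form(x0, t))    # 0.5 - 0.5*exp(-6) ≈ 0.49876
```

The solver needs thousands of sequential updates to match what the closed form gives in one expression; that is the speedup the paper's closed-form neuron model exploits, just with a far richer nonlinear system.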

### GPT-4 rumors [4:15]

Alberto Romero on Substack has an article called GPT-4 Rumors From Silicon Valley. This is a summary of things that people, whatever "people" means, currently talk about around GPT-4. OpenAI has been announcing tiny bits of the next iteration of their language models here and there. There used to be an interview with Sam Altman where he said GPT-4 isn't really going to be that much bigger than GPT-3, that it's probably still going to be in the text domain, and that it's probably going to be a bit more aligned to humans, a bit more learning from human feedback, and so on. People were a tiny bit disappointed, I guess, because it's not "we're going to build the next giant thing." But now more and more rumors are coming out that GPT-4 might very well be what they claim: colossal. So another scale-up of two orders of magnitude or something like this in terms of number of parameters, or even three orders of magnitude, although some rumors claim that it is going to be sparse, so there's not really a one-to-one comparison. On the other hand, there are also a lot of rumors claiming that GPT-4 is going to be multimodal after all, so including text, images, videos, and basically anything they can get their fingers on. We'll see which of these turns out to be true. It's very well possible that they first aimed at just improving GPT-3, and then, with recent developments around diffusion models and so on, they've gone in the direction of "let's just do another giant leap." People who have apparently tried the new model, or a precursor to the new GPT-4, say that GPT-4 will be just as much an improvement over GPT-3 as GPT-3 was over GPT-2, and in case you remember, GPT-3 was a giant improvement over GPT-2. Now, is this going to be AGI and solve all our problems? Probably not. But in case this is true, in case it really is the same size of step from GPT-3 to the new GPT-4 as it was from GPT-2 to GPT-3, then I think we're in for pretty amazing times. In any case, rumors be rumors, and I guess we'll only know when we actually see it. The new model is rumored to be released sometime between December and February, so the wait isn't going to be that long. Related to this, OpenAI is also rumored

### Cerebras supercomputer [6:50]

to collaborate with Cerebras, and Cerebras in turn has just released their biggest supercomputer to date, which is called Andromeda and has 13.5 million cores. Cerebras is a company that builds extremely large chips; they want to do as much as they can on a single chip, and that's why their chips are, I think, about yay big, I'm not exactly sure. But this absolute supercomputer is comprised of just 16 Cerebras CS-2 systems, so that should already give you an indication of just how big their individual systems are; connecting them makes for a ginormous supercomputer. On the website it says "get demo," but I guess for most of you it's not really going to be an option to go into business at this kind of scale; for some of you it might be, though, and you might very well want to click that button. The Meta research blog announces the ESM

### Meta releases metagenomics atlas [7:45]

Metagenomic Atlas, the first view of the "dark matter" of the protein universe. A lot of protein folding work has been done recently with AlphaFold and ESMFold; now Meta releases a database of what's called metagenomics. Essentially, if you just go outside and pick up a piece of dirt, there are going to be a ton of microbes, a ton of bacteria, a ton of organic material in there, and all of that genomic material isn't necessarily something you'd find in the Human Genome Project or something like this, yet it's still very important, for example for ecology, for medicine, but also for human well-being. This metagenomic atlas is the first database that reveals the structures of the metagenomic world at the scale of hundreds of millions of proteins, and you can explore it; there is a link to the atlas right here. If you're anywhere near this world of protein folding, I guess this is a very exciting time, and I'm also excited for the progress we make on frontiers other than just scaling up and producing more stories about unicorns. For all the criticism that these big models get, and the pressure to just scale and scale, every now and then they deliver us something like this, something that's absolutely, undeniably useful for some natural science out there. And as we get better with our core research, even if that's on pictures of cats, I strongly believe that this will greatly benefit adjacent fields such as biology, mathematics, physics, chemistry, and more

### AI advances in theorem proving [9:15]

of the other sciences. Also on the Meta AI blog, they released a blog post called Teaching AI Advanced Mathematical Reasoning. I've dealt before with some of the papers that Meta had in this regard, where they tried to come up with systems that use a prover. There are these things called prover systems, or proof assistants, which essentially formalize your whole mathematics input: you spell out everything super formally, super descriptively, super detailed, and then you can use the system to search for new proofs by applying proof strategies here and there. So you can say, "I now want to do a contraposition of these two things," and so on. However, as you'll quickly discover, the number of strategies that you can apply to a given statement to search for a proof is really huge, and that leaves you essentially with a search problem. This paper uses a variant of Monte Carlo tree search, the same thing that AlphaGo uses to determine the next moves in a Go game, in order to find the next proof strategy, the next proof step that should be applied in order to reach a given target statement. Again, very cool that what initially dealt with a bunch of games, and was really flashy because we could beat Go and chess much better, has developed into something that is of actual use in an adjacent field, in this case mathematics. Very cool; check out the paper if you are interested. NVIDIA has released a paper called eDiff-I,
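The "proof search as tree search" idea can be illustrated with a toy best-first search. This is a deliberately simplified stand-in for the paper's learned MCTS-style search: states are intermediate "statements" (here just integers), tactics are moves, and a hand-written heuristic plays the role of the learned policy/value model that decides which branch to expand next.

```python
import heapq

# Toy domain: a "statement" is an integer, the "goal" is to reach the target,
# and "tactics" are rewrites we may apply.
tactics = {
    "double": lambda n: n * 2,
    "increment": lambda n: n + 1,
}

def prove(start, target, max_nodes=10_000):
    """Best-first search over tactic applications; returns a tactic sequence."""
    frontier = [(abs(target - start), start, [])]
    seen = {start}
    while frontier and max_nodes:
        max_nodes -= 1
        _, state, path = heapq.heappop(frontier)
        if state == target:
            return path
        for name, apply_tactic in tactics.items():
            nxt = apply_tactic(state)
            if nxt not in seen and nxt <= 4 * target:  # prune runaway branches
                seen.add(nxt)
                heapq.heappush(frontier, (abs(target - nxt), nxt, path + [name]))
    return None

print(prove(1, 10))  # a valid tactic sequence from 1 to 10
```

The real system replaces the `abs(target - n)` heuristic with learned models, and the "statements" with formal proof states in a proof assistant, but the search skeleton is the same.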

### Better diffusion models with expert denoisers [10:40]

eDiff-I is a text-to-image diffusion model with an ensemble of expert denoisers. This is, I would say, a typical NVIDIA paper: they don't reinvent the world, but they take what exists and apply a strong engineering mindset to it; they improve upon it, and it results in very high-quality output. In this case, they take the idea of text-to-image diffusion models, but on top of that they have an ensemble of expert denoisers. So they don't just have one denoiser, like we'd be used to in a diffusion model; they have an ensemble of denoisers, which means that different models can take care of different phases of the denoising process. They also stage the image production in multiple steps. This has been done before, but it is a very viable strategy: you essentially have one model produce a low-resolution version of the image, and then you successively scale it up. As you can see right here, all in all this results in super-high-quality images that can be generated either from a text description, or, as you can see here, from text plus some kind of map or mask that you draw, or, over here, you can also input some sort of style-reference image into the system. So again, it's just amazing how people are able to push forward the state of the art in such a short time.
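The core trick is easy to sketch: instead of one denoiser for all noise levels, each sampling step is routed to an expert for that phase. This is my own schematic, not NVIDIA's code; the three stand-in "experts" and the phase boundaries are invented for illustration, and they just tag which expert would handle each step.

```python
# Schematic expert routing for a diffusion sampler: t runs from 1.0 (pure
# noise) down to 0.0 (clean image), and each phase gets its own denoiser.
def expert_for(t):
    if t > 0.66:
        return "layout-expert"      # early steps: global structure
    if t > 0.33:
        return "content-expert"     # middle steps: objects and composition
    return "detail-expert"          # late steps: textures and fine detail

def sample(num_steps=9):
    schedule = [1.0 - (i + 1) / num_steps for i in range(num_steps)]
    return [expert_for(t) for t in schedule]

print(sample())
```

In the real model each "expert" is a full denoising network specialized by training on its own range of noise levels, which is why one ensemble can afford to spend capacity on global layout early and fine detail late.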

### BLOOMZ & mT0 [12:00]

BigScience has released two new models, one called BLOOMZ and the other mT0. These are evolutions of their previous models, and they're mainly concerned with multitask prompted fine-tuning. We've dealt with prompted fine-tuning before, in the Galactica paper; it essentially means that after you pre-train your model, you fine-tune it on prompted samples. So, just like you would ask GPT-3 with a prompt to do some kind of task, you go ahead and actually fine-tune on the prompt, the input, and the output of that task, to make the model learn to respond to such prompts in an appropriate fashion. And if you do that for multiple tasks, you also gain the ability to generalize to new tasks, because that will carry over from the pre-training. Specifically, these new models deal with this exact setting, but on non-English data: cross-lingual generalization, doing this in multiple languages and potentially also generalizing across languages. The models are on Hugging Face if you want to check them out. ICLR 2023 reviews are out on OpenReview,
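Mechanically, multitask prompted fine-tuning just means flattening many supervised tasks into plain (prompt, target) text pairs before ordinary fine-tuning. A rough sketch of that preprocessing step (the templates here are invented for illustration, not the actual xP3 templates):

```python
# Each task contributes a prompt template; labeled examples become plain
# (input_text, target_text) pairs that a language model is fine-tuned on.
TEMPLATES = {
    "sentiment": "Review: {x}\nIs this review positive or negative?",
    "translation": "Translate to French: {x}",
}

def to_prompted_pairs(task, examples):
    return [(TEMPLATES[task].format(x=x), y) for x, y in examples]

dataset = (
    to_prompted_pairs("sentiment", [("Great movie!", "positive")])
    + to_prompted_pairs("translation", [("cat", "chat")])
)
for prompt, target in dataset:
    print(repr(prompt), "->", repr(target))
```

Because every task now looks like "text in, text out," a single model can be trained on the mixture, and unseen tasks phrased as new prompts often work zero-shot.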

### ICLR reviewers going mad [13:05]

and there are quite a few surprises in the negative direction. Robert Tang here tweets out an example where the authors respond to a reviewer with: "Responding to you is a waste of time. I hope you can respect the authors' work and give constructive comments instead of taking a few minutes to give a trivial suggestion. I recommend that you complete a university, maybe kindergarten, course before giving your review comments." That's just lovely. Somehow still believing in the good of human beings: maybe this person just had an absolutely terrible day, they really need this paper, and the review actually is very bad, like it actually does make a super trivial dunk on the paper. I'm not sure what happened right here, but if you're ever inclined to write a rebuttal like this, just don't. Just sleep, go to sleep, wake up the next day, breathe, and realize that it's kind of useless, even if it's probably true.

Another worrying issue, tweeted out by Stella Biderman, is the following: one reviewer criticized a paper on the grounds that it is not acceptable to only compare with publicly available models, meaning that the paper should also have compared with non-publicly available models. Now, there is of course a debate to be had here. In order to properly compare to someone's model, you need to have access to it. On the other hand, there has been a long history of science where people just hadn't been putting stuff out into open source, and you'd essentially have to take the numbers from the tables of their paper, put those into your paper, and just believe what they said. It's possible that the reviewer here is of the stance that, look, you can just take the numbers they claim and put them there. On the other hand, it's also entirely fair to say: well, I don't have access to their model, I can't verify their numbers, and therefore I'm not going to put them into my paper. The crux is obviously that leaving away the things that aren't public also makes your method appear a lot better in comparison, because the only actual competitors to your method are closed source and exist only as some numbers in some paper. I don't know what the correct answer is here, but it's certainly worth having a discussion about.

And lastly, and you might actually have heard of this one, there is a paper called Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top. People do get creative with titles these days. But the problem that one reviewer had is with the word "Byzantines," which the reviewer claimed to be disparaging of whoever considers themselves Byzantine. Now, "Byzantine" is a term that has long been used in various fields of analysis: security, cryptography, I believe game theory. So the term is very well known and is an established technical term. However, the reviewer is of the strong opinion that this is a term that contains prejudice, is derogatory, and denounces the ethno-religious practice of some people. The reviewer bases this opinion strongly on the fact that the ICLR Code of Ethics says you must respect the cultural heritage of others, and repeatedly claims that the usage of the term "Byzantine" in this work is a violation of the ICLR Code of Ethics, whereas the authors claim this is a technical term, it's been used for a long time, and it is disparaging to absolutely no one. The conversation goes on and on; I believe there are over 36 comments in this thread, including some other people coming in and saying, hey, I'm actually considered Byzantine and I don't have a problem with the term, so don't defend, you know, us. The reviewer did make some suggestions for other terms, such as "deviant," but the authors pointed out that none of these suggestions capture the term in its full meaning or in how people actually use it. As the debate goes on, you'll see the reviewer shifting their stance a little bit, from it just not being appropriate to use the term, to the paper also not being technically correct. But I strongly believe that the reviewer only introduced that point after the discussion had been going on for a while and they realized they needed to make another, stronger case on scientific grounds. Now, the problem is that on OpenReview, I believe, you can't see the modifications, so we have no idea whether these comments were all changed around; even the original comment has been changed around to include some other feedback and so on, so the timeline here is a little bit murky. The authors also point out that this point, that the word "Byzantine" is inappropriate, was apparently initially the only criticism of that reviewer, or the only real criticism, but the reviewer gave the paper a really low score. And if you know anything about conferences, most meta-reviewers just look at whether there is one bad score, and then the paper already has very poor chances; or they look at the average, which would obviously be decreased strongly by one bad score. So essentially the reviewer held the paper hostage a little bit and wanted the authors to change the wording. The authors even agreed to abbreviate the word "Byzantine" to "Byz," the short form, because they just didn't agree that any of the other terms would do the technical nature justice. The reviewer disagreed that that would solve the problem, and essentially said that even if they were to change the term, they would now expect not only that the term not be used, but also that the paper contain a discussion of why the word "Byzantine" is not appropriate, or at least some moral struggle of the authors, bringing up why this is problematic. The reviewer again repeatedly and insistently claims that it violates the ICLR Code of Ethics, and holds that like a stick to hit the authors with: code of ethics, this is against the code of ethics.

What's interesting is that at some point the program chairs commented on this as well, saying that the program chair committee and ethics chair have been following this thread closely; that upon preliminary investigation, the ethics chair finds that the use of the B-word (it's not "the B-word," is it?) is a possibly emerging issue, but not yet a major ethics issue that could justify rejecting research; that there seems to be no widespread agreement that the B-word is offensive; that this discussion between reviewers and authors is still valuable to the community, as it raises awareness of this potentially emerging issue; and that they appreciate the thoughts from the reviewers. With that, they said this is essentially now resolved: you know, reviewer, you made your point, but we don't agree with the point. The reviewer responded again, lengthily, and pointed out that this violates the ICLR Code of Ethics. Now, in the end, you could say it's all good: the program chairs came in and essentially squashed the reviewer, and said okay, the paper is fine, you can use the word "Byzantine," it's not problematic, all good. But I actually strongly believe that this is a big win for this reviewer, because of the ethics chair's response. The appropriate response would have been: shut up, you're an embarrassment to the scientific institution, and you're barred from reviewing any more papers for any other conferences; this is a joke, shut up. But they didn't do that. They essentially said yes to the reviewer: it's a possibly emerging issue, because they've seen that there was quite a bit of uproar in the community over the use of what is essentially a technical term, one that no one, absolutely no one except this reviewer, feels is inappropriate. The ethics chair said yes, it's possibly emerging. So this is like groundwork for the future; this is how these things slip in. I have full conviction that people who write these codes of ethics do so with the best intentions, at least most of them; I do believe some of them predict exactly this, and this is how you again and again slip these things in. One person makes a fuss, you take the temperature of the community, it's like, ah, not yet ready, but we now have precedent, right? So at the next conference, the same reviewer can make a fuss again, and they can point back and say, well, other people, you don't know it's the same reviewer, other people have said this before, so actually this might be problematic. And the ethics chair here seems bound by the fact that someone would say "this is ridiculous, shut up"; yet they respond in the most lenient way, in a way that guarantees that in the future this will actually become a problem. So, in my opinion: big win for the reviewer right here, big win for the complainers, and I don't like it.

### Scaling Transformer inference [21:40]

Google has a new paper called Efficiently Scaling Transformer Inference, on how they scale their big PaLM models on TPUs. It's not going to be very applicable for most of you, but in case you care how they enable things like 32× larger context lengths, super-duper FLOPs, and super-duper hardware utilization during large-batch processing, give this paper a read. Also from Google, the Google research blog has an entry called Infinite Nature:

### Infinite nature flythrough generation [22:10]

Generating 3D Fly-Throughs from Still Photos. This is on top of a paper that they published at ECCV, which generates infinite views, or infinite fly-throughs, as the title says. The cool thing is this happens from still images: you can give it a single image, and it will generate a fly-through from that image. They use various techniques for that, but the base idea is that you take an image and predict its depth map, so how far away all the stuff is, and then you use that to render the image from a slightly different view. If you know how far away all the things are, you can position your camera slightly differently and still determine where the pixels go. Now, this will leave some pixels undetermined, because you can now see behind things that you couldn't see before, so you have another model in this refine step that essentially fills in those missing pixels. And then you repeat: you compute the depth map, you adjust your camera position a tiny bit, and then you fill in the pixels that are missing. Training this isn't exactly super easy, but there are various techniques, such as cycle consistency, or, as they do right here, an adversarial setup: they have a discriminator to determine whether, after a number of steps, the image still looks like it's been generated from a real nature image, and if you backpropagate that error, you can generate very long, very high-quality fly-throughs through nature. Here you can see a bunch of examples. What I do find interesting is that they also added a specific sky model in order to make you feel like the sky is more real; I suspect that in their original work the sky was often the problem and looked unrealistic, so now everything sky-related is actually produced by a separate model, as far as I can tell.
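The render-refine loop described above can be mimicked in one dimension. This is my own toy (a 1-D "image" as a list of pixels): each step shifts the view by one pixel, which exposes a hole, and a stand-in "inpainting" rule fills the hole from its neighbor. The real system predicts depth, warps with a camera model, and uses a learned network for the refinement.

```python
# One step of the fly-through loop: move the camera (here: shift one pixel),
# mark newly exposed pixels as holes, then "inpaint" the holes.
def reproject(image):
    return image[1:] + [None]        # shift view; last pixel is now unknown

def inpaint(image):
    return [p if p is not None else image[i - 1] for i, p in enumerate(image)]

def flythrough(image, steps):
    frames = [image]
    for _ in range(steps):
        frames.append(inpaint(reproject(frames[-1])))
    return frames

frames = flythrough([3, 5, 7, 9], steps=2)
print(frames)
```

The key property the toy preserves is that each frame is produced from the previous frame, so errors compound over steps; that is exactly why the paper needs the adversarial loss over long rollouts.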

### Blazing fast denoising [23:55]

Paella, I hope that's how you pronounce it, is a new paper that also does text-to-image, however this one is speed-optimized. In order to do diffusion, you usually have to take some bit of noise and then run it through the diffusion process step after step. There are various techniques to speed this up, and Paella supercharges them, managing to do the whole diffusion process in only 10 steps, which amounts to only 500 milliseconds. So within only 500 milliseconds you have a high-quality image from a given piece of text. Again, amazing progress in a field that is super young. Check out Paella; the corresponding paper is called Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces. Now, if you enjoyed the previous paper on

### Large-scale AI training with MultiRay [24:45]

how to scale up PaLM, then you might also enjoy MultiRay, which is by Meta; the blog post is called Optimizing Efficiency for Large-Scale AI Models, and it describes the system called MultiRay. I've read the blog post, and I have to say it's kind of wishy-washy: you have to guess a lot of the stuff, they just kind of describe in words what it does, and they link to various things that they've done, but I can't exactly tell what precisely they're doing. Still, if you need some inspiration for how a system like this would work, or some hints of how this is really done in practice at scale, give this blog post a read. arXiv pairs up with Hugging Face, so

### arXiv to include Hugging Face spaces [25:30]

previously Hugging Face acquired Hugging Face Spaces from Gradio, which allows you to make little demos out of your Hugging Face repositories, and now arXiv includes those Spaces. So if you upload a paper to arXiv, you can attach a demo from a Hugging Face Space, and people can try out your model, if you have one, or your technique, directly on arXiv, and do so interactively. This is very cool, and obviously I'm a big fan of integrating interactive things into our very old format of eight-page PDFs. Okay, we got a bunch of new models this

### Multilingual Diffusion [26:10]

week. The first one is AltDiffusion by FlagAI, which is a multilingual diffusion model. This is essentially Stable Diffusion, but multilingual, as you can see right here: English, Chinese, Spanish, French, Russian, Japanese, Korean, Arabic, and Italian. Next is Demucs by Meta, which is a music source separation model,

### Music source separation [26:30]

so you can put a song in there and it will separate the sources, meaning it will separate things like drums and vocals and isolate those. Perfect for practicing something, doing karaoke, and whatever you want to do with it. The paper is called Hybrid Transformers for Music Source Separation, and it's on arXiv. There's a new

### Multilingual CLIP [26:50]

multilingual CLIP available from LAION, trained on their own dataset, LAION-5B, and it reaches 77% zero-shot on ImageNet in English, and around 55% for Italian, Japanese, and Chinese, and supports over 100 languages. The cool thing is that the training is very efficient, because it uses locked-image tuning, which we've discussed previously in a video. So check out the model, and check out locked-image tuning if you haven't seen it yet; it is a really cool paper and a cool, simple technique.
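The idea behind locked-image tuning is simple to sketch: keep a pre-trained image tower frozen, and only update the text tower so that it aligns with the fixed image embedding space. Below is my own schematic toy, with scalar "encoders" and a squared-error pull standing in for the real contrastive training; the point it demonstrates is just which parameters move.

```python
# Schematic locked-image tuning: the image tower's parameters are frozen,
# and only the text tower moves toward the image embeddings it must match.
image_params = {"w": 2.0}            # pre-trained and locked
text_params = {"w": 0.5}             # trainable

def image_embed(x):
    return image_params["w"] * x

def text_embed(x):
    return text_params["w"] * x

def train_step(pairs, lr=0.1):
    # Toy objective: pull each text embedding toward its paired image
    # embedding (a stand-in for the contrastive loss).
    grad = sum((text_embed(t) - image_embed(i)) * t for i, t in pairs) / len(pairs)
    text_params["w"] -= lr * grad    # image_params is never touched

pairs = [(1.0, 1.0), (2.0, 2.0)]     # (image input, matching text input)
for _ in range(200):
    train_step(pairs)
print(image_params["w"], round(text_params["w"], 3))
```

Because only the text tower trains, the expensive image tower never needs gradients, which is a large part of why this recipe is cheap enough to bolt many languages onto one fixed vision model.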

### Drug response prediction [27:20]

In other news, a research group at the City University of New York has released a model that can accurately predict the human response to novel drug compounds. They're certainly not the first people to release such a model, this has obviously been going on for as long as data science has existed, but it's cool to see that on this front too, the drug discovery front, giant progress is being made on the back of what started out as cat image research. All right, some helpful things for this week; we have quite a lot to get through,

### Helpful Things [27:50]

so let's get into it this is a pixel art Sprite sheet generator if you're into old games into Sprite animations and so on this is a stable diffusion based model that will create the Sprites for you given a description look at this I typed in Fat Joe prompt extend is a model that will extend your prompts so here is an example you type in psychedelic liquids space and it will append what it thinks that stable diffusion needs to give you what you want so this is like a little bit of a translator between human input and whatever a very competent human using stable diffusion could do with all the modifiers such as concept or sharp Focus illustration Unreal Engine and so on there's a new blog post on hugging face telling you how to fine-tune whisper or multilingual ASR but you can fine tune whisper for whatever you want this blog post is your point of entry dream texture is a plugin to make blender interact with stable diffusion so here's a demo person types into blender whatever they want as a texture in terms of text and then boom apply and it's now in the texture absolutely great the YouTube channel Mutual information has a series on reinforcement learning that I can highly recommend they spend a lot of time on this and I hope it is helpful to anyone who's looking to get into RL lovely tensors solves a problem we all have had in the past so if I just want to print some tensor I'm gonna get this and it's absolutely not helpful at all as soon as your tensors go beyond like four or five values it's useless to just look at them so all you do is you import lovely tensors you monkey patch that stuff in and all of a sudden if you print a tensor a numpy array a torch tensor whatever it will give you the shape the amount of elements statistics means the standard deviations and so on this is a much better way to look at tensors now if the tensor is small enough it will actually show you the values but as soon as it's bigger than that it will give you much more useful 
information so here it warns you that there is Infinities there's nands in the tensors and so on and even here it tells you well this one is actually all zeros you can still get back to the original tensor using sort of property access here you have verbose access that will give you the values even if it's large and here you get the just the plain old way if you really want that there are various helper methods around this also to show images to show statistics to show channels and to show things such as different filters in a stack of convolutional filters I'll leave you to explore all of that yourself but if you work with tensors a lot in an experimental sense this is surely worth it GPT index is a technique to build an index out of files using GPT so this uses GPT to essentially take a bunch of files and then for example recursively summarize them so that you essentially have a structure where you have a summary on top of a bunch of stuff and then if you like one of them you go into it and then you have summaries of the sub stuff that is there you go into that it's kind of an experimental I want to say this is a bit of a new way of thinking about what we could do with these models in order to organize information now that we have generative capabilities and I like that people think out of the box so if you're also interested check out this repository there's a new upscaler for stable diffusion made by Riverside Wings The Notebook is by and Shepard and compute has been sponsored by stability AI The Notebook here runs you through the whole process of up sampling and it gives really cool results I've previously talked about dags Hub Dax Hub is like a bit of GitHub for machine learning and I know a lot of places claim this nowadays but axub really believes in the open source Paradigm and now they release something they call Direct data access and essentially a technique to stream down and upload version data to some place they essentially connects a DVC which you 
might know as a data versioning tool, with a transparent approach where you don't need to pull the whole data at once or stream it in some custom way. You can just treat the data as if it already existed, and magically the library in the background will pull it down as you need it, in a streamed fashion. So no long waiting on data to arrive; you can just simply go train, and even if you don't have space for the whole dataset, it will still work. Now I don't exactly have time here to explain all of the things that you can do with it, but the install is really simple: you essentially install their hooks and everything works transparently and magically. So if you're interested, check it out, and also check out their blog, which is regularly updated; for example, here is how to build an end-to-end active learning pipeline with fully open tools.

genv is a GPU environment management tool. It lets you easily control, configure and monitor the GPU resources that you are using, and it is intended to ease the process of GPU allocation for data scientists without code changes. So in case you're in some lab and you share GPUs with others, this tool is a must-have; I wish this had existed during my PhD. It manages local GPUs, remote GPUs, cluster GPUs and so on. You can reserve GPUs, free up GPUs, essentially whatever you want to do. It even has a VS Code plugin. So if you're at all using GPUs, and especially if you're sharing them, consider this tool.

MBXP is a multilingual benchmark for code completion in 10-plus programming languages.

tsai is an open source package intended for applying deep learning to time series, built on top of PyTorch and fastai.

Colossal-AI has released two blog posts. Both pertain to better and faster and cheaper training of models: the first one is about what they call AIGC, AI-generated content, which essentially means image generation models, and the second one is about structure prediction of protein monomers and multimers, and both times they're able to speed up
these models by a lot. Now the code is openly available, so do go and check it out, and the performance gains here are not only during inference, like we saw before. This in fact provides, for example for Stable Diffusion, 6.5 times faster training and pre-training cost savings, so the hardware cost of fine-tuning can be almost seven times cheaper than if you were to do it in the vanilla way.

TAP-Vid is a benchmark for tracking any point in a video.

SuperGradients is an awesome library to build, train and fine-tune production-ready, state-of-the-art deep learning vision models. Now we've seen a lot of libraries that claim to just make stuff better, but if you're into vision tasks such as semantic segmentation, bounding box prediction or even image classification, I believe it really pays off to have a library that's dedicated to your field, especially in something like vision where we have a lot of custom techniques that make these models so much more efficient and better. But not only that, SuperGradients also provides a lot of pre-trained checkpoints, so even if you're just into using some models, this library might be good for you.

Shumai is a network-connected differentiable tensor library for TypeScript and JavaScript. As you can see in this demo, what you can do is define neural networks in TypeScript and then distribute them over multiple places, over multiple machines, and you can use the async/await syntax from JavaScript in order to ship data to some other machine or call some function on another machine, and the library handles everything for you, from forward propagation to backpropagation and training. It's really cool, and the API for this looks quite clean.

Safetensors by Hugging Face is a new format to store and load tensors safely. I've previously done a video where I showed how you can smuggle a remote code execution into the Hugging Face Hub, because the models essentially
use the PyTorch loading function, and PyTorch in turn uses the pickle module of Python, which executes arbitrary code. Safetensors is supposed to alleviate that by defining a safe, fixed and simple format to store tensors. Now obviously the trade-off here is that you can't store arbitrary things anymore; if you want to store arbitrary things, you have to allow arbitrary code to be executed. So while I expect that a lot of architectures might switch to something like Safetensors, it is not a full solution to the problem. For better or worse, research will come up with new things, new ways of doing things, and if you constrain yourself to a particular way of doing things, then that will always not be enough. However, it's mostly gonna be enough.

VeLO is a learned optimizer, and the cool thing here is that it really seems to be better than, or at least on par with, very hand-tuned optimizers. So you might know optimizers like stochastic gradient descent or Adam, but it is possible to learn an optimizer, that is, to learn a system that controls the optimization behavior of a training run of another system. These people have taken a lot of different ML problems and networks, have run optimization problems on them, and have essentially learned an optimizer that optimizes all of these different problems well. So that's what we call a learned optimizer, and this one really seems to work well out of the box for many problems, especially mainstream problems. So without you having to tune, you know, the beta2 parameters and the learning rate and stuff like this, you just apply it in its default configuration and it does a pretty good job. This is super important if you want to do rapid prototyping, rapid exploration of some new ideas, without doing a giant grid search over all the parameters.

The Merlin Dataloader is a data loader specifically for recommender systems, which have a few special requirements: namely, there's often quite little data, I want to say,
compared to something like an image classifier: the data points are mostly tabular and there are not as many, so loading from disk, and loading pairs and such from disk, can often become the bottleneck. So a data loader is super important here, and the Merlin Dataloader promises to be over 10 times faster than native framework data loaders. If you're into recommender systems, try this out.

LODA is an assembly language, a computational model, and a distributed tool for mining programs. This topic is very far away from me, but some of you might actually be interested. So if you're into integer sequences, there are these online encyclopedias of integer sequences; so there are sequences of integers, like 1, 2, 3, 4, 5 and so on, and the question is always: what's the program behind them? Like, can I come up with a piece of code that produces that integer sequence into perpetuity? And you know, 1, 2, 3, 4, 5 is quite simple, but it gets complicated very quickly, and especially teaching machines to come up with the rules behind a sequence is a very challenging problem. So LODA is a system that allows you to mine such programs: essentially you can run it and it will crank away and intelligently search for these programs. But not only that, it is also a distributed tool for doing that, so you can partake in the distributed mining of such programs, and much more. So as I understand it, this is what a LODA program looks like, or what it searches for: here you can see one of these sequences, and this is apparently the program it comes up with. It looks pretty interesting; if you're interested, check LODA out.

Next is a library for geometric algebra in JAX and NumPy. If you're into geometric algebra, here's the example of a rigid body physics engine with a constraint solver; this library might be for you.

MTEB is a benchmark for text embeddings. This is from similar authors as the BEIR benchmark, which is a retrieval benchmark, but this goes further: this is a benchmark that
covers eight embedding tasks over 56 datasets and 112 languages, and the paper already evaluates 33 models on that benchmark. So the goal here is to find the one unified text embedding that covers all downstream tasks, and the status so far is that this universal embedding hasn't been found yet: the leaderboard shows that some models are good at some tasks, others at other tasks. So the holy grail of text embedding is still somewhere out there, and this benchmark might prove whether you have found it.

Okay, the last cool thing I want to show you is natbot, and this is already a little bit older; Nat Friedman tweeted this out in September. Essentially, he managed to connect GPT-3 to a web browser and just let it interact with the web browser by prompting it in an appropriate way, given the website's HTML structure. Apparently the original idea comes from Sharif Shameem, and natbot has a repository on GitHub. Look, it's just one Python file. I know half of you are super cringing right now, but you know, research is research, and if you want to figure out how it's done and what works, and if you want to give it a shot yourself, it might be really cool to do so. Please do.

Alright, that was all from ML News. This was a big chunk. Thank you so much for being here, thank you for supporting the channel. Come to Discord if you're not already on it, link is in the description. We have fantastic paper discussions every week and we talk general machine learning every day. With that being said, stay hydrated. Bye.
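The pickle risk behind the Safetensors item above can be demonstrated in a few lines of pure Python. This is a hedged sketch of the general mechanism, not code from the video or the Safetensors repository; the `Payload` class and its deliberately harmless `eval` payload are my own illustration.

```python
import pickle

# Why loading untrusted checkpoints via pickle is dangerous -- the problem
# Safetensors is designed to avoid. The __reduce__ protocol lets an object
# tell pickle "reconstruct me by calling this function", so simply loading
# a blob can execute arbitrary code. This payload only evaluates a harmless
# arithmetic expression, but it could just as well call os.system.
class Payload:
    def __reduce__(self):
        # pickle will call eval("6 * 7") at load time
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # arbitrary code runs here, during "loading"
print(result)  # → 42, and not a Payload object at all
```

Safetensors sidesteps this by storing only raw tensor bytes plus a small header, so loading never involves deserializing executable objects.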

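Going back to the Lovely Tensors item: the kind of summary it prints (shape and statistics instead of a raw value dump) can be sketched in plain Python. The `lovely` function below is my own hypothetical illustration over flat lists of floats, not the library's actual API or output format.

```python
import math

def lovely(xs):
    """Summarize a flat list of floats roughly the way lovely-tensors
    summarizes a tensor: element count, mean, std, and NaN/Inf warnings
    instead of printing every value. (Hand-rolled sketch only.)"""
    finite = [x for x in xs if math.isfinite(x)]
    mu = sum(finite) / len(finite)
    sd = (sum((x - mu) ** 2 for x in finite) / len(finite)) ** 0.5
    flags = []
    if any(math.isnan(x) for x in xs):
        flags.append("NaN!")
    if any(math.isinf(x) for x in xs):
        flags.append("Inf!")
    return " ".join([f"n={len(xs)} μ={mu:.3f} σ={sd:.3f}", *flags]).rstrip()

print(lovely([1.0, 2.0, 3.0, float("nan")]))  # → n=4 μ=2.000 σ=0.816 NaN!
```

The real library goes much further (monkey-patching `repr`, handling torch tensors and NumPy arrays, showing images and channels), but the core idea is the same: a compact, honest summary beats a wall of numbers.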