# The Hardware Lottery (Paper Explained)

## Metadata

- **Channel:** Yannic Kilcher
- **YouTube:** https://www.youtube.com/watch?v=MQ89be_685o
- **Date:** 18.09.2020
- **Duration:** 52:11
- **Views:** 11,015
- **Source:** https://ekstraktznaniy.ru/video/13313

## Description

#ai #research #hardware

We like to think that ideas in research succeed because of their merit, but this story is likely incomplete. The term "hardware lottery" describes the fact that certain algorithmic ideas are successful because they happen to be well suited to the prevalent hardware, whereas other ideas, which would be equally viable, are left behind because no accelerators for them exist. This paper is part history, part opinion, and gives lots of input to think about.

OUTLINE:
0:00 - Intro & Overview
1:15 - The Hardware Lottery
8:30 - Sections Overview
11:30 - Why ML researchers are disconnected from hardware
16:50 - Historic Examples of Hardware Lotteries
29:05 - Are we in a Hardware Lottery right now?
39:55 - GPT-3 as an Example
43:40 - Comparing Scaling Neural Networks to Human Brains
46:00 - The Way Forward
49:25 - Conclusion & Comments

Paper: https://arxiv.org/abs/2009.06489
Website: https://hardwarelottery.github.io/

Abstract:
Hardware, systems and algorithms research…

## Transcript

### Intro & Overview [0:00]

Hi there. Are you interested in winning the lottery? Then let me tell you: this video is not about winning the lottery. I've done enough videos with "lottery" in the title only for people to be mad at me for not telling them how to win the lottery. This is about computer science research, and unfortunately the author of this paper has decided to put this word in the title. So if you're here because you want to win the lottery, this is not for you; it's something completely different. For everyone else: today we're looking at "The Hardware Lottery" by Sara Hooker of Google Brain. This paper is kind of a mix: it's part a historic look back at hardware and software developments in machine learning, part an analysis of the current situation, and part an outlook and opinion piece on the way forward, on how hardware and software should mix and what we should focus on in the future. The basic principle of the paper is quite simple.

### The Hardware Lottery [1:15]

It introduces the term "hardware lottery": "This essay introduces the term hardware lottery to describe when a research idea wins because it is compatible with available software and hardware, and not because the idea is superior to alternative research directions." Right off the bat, I think this is a statement almost everyone will agree with to a certain degree, and probably to a high degree. We are all aware that the hardware we have is very inflexible, it's expensive to develop, and so on, so any software or algorithmic development may simply succeed because it is suited to the hardware we have. That was my first reaction when I read this paper: a gut feeling of "yes, of course this is the case." The historic analysis is also nice, but I was wondering whether there is a deeper reason to go into this, and we are going to see some pros and cons in this paper where I'm not entirely sure what specific point it's trying to make. The overarching point I completely agree with: the hardware that exists is important and may lead to certain ideas succeeding. I have trouble with the narrower points, and I'm going to try to illustrate that while also telling you what the paper says.

First of all, the term is "hardware lottery," but right off the bat you can see that it says a research idea wins because it is compatible with available software and hardware. So the hardware lottery, as defined, also includes the software; technically it's the hardware-and-software lottery. And the bigger question I would have for someone arguing that the hardware lottery is an important concept is: what distinguishes the hardware lottery from any lottery? Why can't I define an "X lottery," where X is any circumstance that surrounds a research idea? You have idea one, idea two, idea three; they all depend on many circumstances; X is one of those circumstances, and it just so happens that the circumstance in the world favors idea two, while a different circumstance would actually favor idea one. What's so special about hardware, other than it being more expensive than software?

To illustrate this further, say you have hardware and you say, well, hardware is expensive. You can build a hierarchy: down here are ideas, which depend on software, like the software frameworks we have, such as TensorFlow and PyTorch; these again depend on particular hardware. You can say the hardware is much more expensive, so we're not as flexible, and ideas might succeed just because of the hardware. But then you can go a step further and say that up here sits the consumer — or, if you don't like the market term, society, the end user, and so on — because hardware is ultimately directed towards what humans in society need, and that changes over time as well. And it's way more expensive to change the needs of human society than to change the hardware. So I can also claim that X is society: one particular research idea down here might win simply because it is more suited to current societal needs, and that carries down the hierarchy.

And you might say: doesn't that make it a good idea, preferable to idea two or idea three over here, which would just optimize for a different society? Which leads us to the question of what it means to win. The paper just says "a research idea wins"; it's not clearly defined, but maybe winning means that a lot of researchers actually research in that direction. The other question concerns the phrase "and not because the idea is superior to alternative research directions." What does superior mean? What does it mean for an idea to be superior? As I said, if an idea is more congruent with current societal needs, you might claim it's superior, and someone else might say, well, if societal needs were different, then a different research idea might be suited better. The same way someone could say, if the hardware were different, then a different research idea might be better; maybe you can say that if the hardware were different, a different research idea might be better suited to the current needs of society, but then I'm pretty sure I can go two, three, four levels up here again. So these terms are a bit vague.

The initial sentiment when reading this is absolutely in favor. I absolutely agree, and I don't want to trash this paper; I just try to think a bit deeper about what is actually said here, and this is where my troubles start. So let's dig into the historic part. I think the point the paper is trying to make is that there are specific hardware choices that were made at particular points, and because it's so expensive to change hardware, a lot of researchers simply go along with whatever ideas work on the hardware that's available, and other research ideas are neglected simply because the hardware isn't there — which, again, is a sentiment I think we can all agree with. So let's look at how the paper lays out its first part.

### Sections Overview [8:30]

The paper lays out its sections as follows, and this is a bit important to keep in mind as a red thread, because I feel one can get lost in the details of the paper. First, though, we need to read this: "This essay begins by acknowledging a crucial paradox: machine learning researchers mostly ignore hardware despite the role it plays in determining what ideas succeed." So the argument is that we develop ideas independently of hardware, but also we don't; it's a kind of double point. It says we think we just think about ideas, but the ideas we might think about may be shaped by the hardware that's available, and if we're not aware of that, we might not see other ideas as viable. Section two asks what has incentivized the development of software, hardware and machine learning research in isolation — where does it come from that we don't think about the hardware underneath? Section three "considers the ramifications of this siloed evaluation with examples of early hardware and software lotteries"; this is the historical look back. Then: "Today the hardware landscape is increasingly heterogeneous. This essay posits that the hardware lottery has not gone away, and the gap between the winners and the losers will grow increasingly larger." This is a central point the paper makes: the hardware lottery has not gone away, we are in one right now, specifically in that chips like GPUs and TPUs and even more specialized chips are optimized for neural networks, which is why the whole world over-focuses on neural networks and discards other research ideas. And "the gap between the winners and the losers will grow increasingly larger" means that the research ideas seen as inviable now will become more and more inaccessible to the community if we develop even more hardware in the direction of neural networks. Lastly, sections four and five unpack these arguments, the ones we've just seen, and section six concludes with some thoughts on what it will take to avoid future hardware lotteries.

All right. Section two is the historic look back, and it starts from the idea of separate tribes.

### Why ML researchers are disconnected from hardware [11:30]

The point is that something has made it such that the communities — the software community, the hardware community, and let's say the idea community, the researchers in AI algorithms, call them the algorithm people — don't think that much about each other. The paper makes the case that early machines were super specialized: early machines were single-use and were not expected to be repurposed for new tasks, because of the cost of the electronics and the lack of cross-purpose software. So early computing machines were just single purpose. But that all changed when the whole world focused on general-purpose CPUs that could execute any instructions, in the spirit of Turing machines or von Neumann architectures. The point the paper makes is that at some point a shift happened: "The general purpose computer era crystallized in 1969, when an opinion piece by a young engineer called Gordon Moore appeared in Electronics magazine with the apt title 'Cramming more components onto circuit boards.'" That's a cool title. This famously gave rise to Moore's law, which predicted that you could double the number of transistors on an integrated circuit every two years, and that held true for a long time. People stopped building special-purpose hardware and invested more and more into building these general-purpose chips, the CPUs. The reason they stopped making specialized hardware is that any specialized hardware you build will simply be surpassed by the next generation of CPUs: even if you make special-purpose hardware for some problem, you just have to wait one or two of these cycles and ordinary general-purpose CPUs will overtake it. And since CPUs are general purpose, the market for them is naturally huge. So what was mainly developed was general-purpose CPUs. I think the paper wants to make the point — though I'm not exactly sure — that even though these CPUs are called general purpose, they aren't really: they have their specific advantages and disadvantages, and that's going to hurt, for example, neural networks in the years that follow.

In conclusion to this chapter, the paper says: "In the absence of any lever with which to influence hardware development, machine learning researchers rationally began to treat hardware as a sunk cost to work around, rather than something fluid that could be shaped. However, just because we have abstracted away hardware does not mean it has ceased to exist. Early computer science history tells us there are many hardware lotteries, where the choice of hardware and software has determined which ideas succeeded and which failed." The example is Charles Babbage's Analytical Engine, which Babbage designed something like fifty years or so before its parts could even be manufactured precisely enough for the idea to succeed, and we know many stories of people like this being ahead of their time. There is this interesting quote, I think from Silicon Valley: "being too early is the same as being wrong." This paper, of course, focuses on hardware. The conclusion of the chapter is that because of this general-purpose era, because the entire focus was on building general-purpose CPUs, people did not really think of hardware, software and algorithms in an integrated way, but treated hardware as a black box that can execute any instruction, with the algorithm sitting on top of something we can't really change: we just have the hardware we have. And again, I'm not sure. Sure, I agree that the entire world focusing on general-purpose CPUs has some influence, but hardware is simply expensive to make, so you could argue that even if this hadn't happened, a machine learning researcher wouldn't necessarily think about the hardware; they would at least have a choice if there were a selection of hardware. Okay, so that was section two.

### Historic Examples of Hardware Lotteries [16:50]

In section three we really get to the historic evidence. There is early evidence like Charles Babbage's machine, the Analytical Engine he invented in 1837 — and it wasn't a matter of a decade or two; the ideas only really resurfaced around World War II. "In the first part of the 20th century, electronic vacuum tubes were heavily used" — were heavily used, for, heavily used; I've noticed a number of typos in the paper, I realize it's a preprint, and if the author is listening I can also make a list, but this one just popped out — "for radio communication and radar. During World War II, these vacuum tubes were repurposed to provide the compute power necessary to break the German Enigma code." So it would be long after Charles Babbage invented this machine, and even after he died, that people would take up and in some parts reinvent his ideas to build modern computers.

The big example the paper makes, though, is what it calls the lost decades, and this is the story of neural networks, coupled with an AI winter and a focus on expert systems — and maybe also, though that's not really mentioned here, a focus on things like SVMs. I think it's widely known that the main ingredients for neural networks are very, very old. The paper gives some examples: backpropagation, invented in '63 and reinvented again later, and deep convolutional networks paired with backpropagation by Yann LeCun. It says: "However, it was only three decades later that deep neural networks were widely accepted as a promising research direction." The "three decades later" here probably refers to around 2010; shortly after that, of course, AlexNet wins ImageNet and so on, but even a bit earlier people were doing heavy research into neural networks. This is paired with the earlier dates, say 1970 or 1980, when these ideas were invented and presented, but computers back then were simply unsuited to running neural networks. The paper says: "The gap between these algorithmic advances and empirical success is in large part due to incompatible hardware. During the general purpose computing era, hardware like CPUs was heavily favored and widely available. CPUs were good at executing any set of complex instructions but incur high memory costs because of the need to cache intermediate results and process one instruction at a time. This is known as the von Neumann bottleneck: the available compute is restricted by the lone channel between CPU and memory, along which data has to travel sequentially." The paper goes on to say there were some efforts into specialized hardware for neural networks, but the funding wasn't there, and other specialized hardware went more into the direction of the popular ideas of the time, like Prolog and LISP, which could do expert systems and not necessarily neural networks. And then: "It would take a hardware fluke in the early 2000s, a full four decades after the first paper about backpropagation was published, for the insight about massive parallelism to be operationalized in a useful way for connectionist deep neural networks. A graphical processing unit was originally introduced in the 1970s as a specialized accelerator for video games and developing graphics," and so on; GPUs were repurposed for an entirely unimagined use case, to train deep neural networks, and had one critical advantage over CPUs: they were far better at parallelizing a set of simple decomposable instructions, such as matrix multiplications — "matrix multiples," multiplications, multiples, I don't know, another typo.
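As an illustration of "simple decomposable instructions" (not from the video or the paper; a minimal sketch assuming NumPy): a matrix multiply breaks into many independent dot products, which is exactly the kind of work that parallel hardware like a GPU soaks up, while a sequential instruction stream has to grind through them one multiply-add at a time.

```python
# Minimal sketch (illustration only): the same matrix multiply done as a
# sequential loop versus as a call into a parallel, vectorized BLAS kernel.
import time
import numpy as np

def naive_matmul(a, b):
    """One multiply-accumulate at a time, the way a single sequential
    instruction stream would naively compute it."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for t in range(k):
                out[i, j] += a[i, t] * b[t, j]
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((128, 128))
b = rng.standard_normal((128, 128))

start = time.perf_counter()
slow = naive_matmul(a, b)
print(f"sequential loop:  {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
fast = a @ b  # dispatches to a vectorized, multi-threaded BLAS routine
print(f"vectorized BLAS:  {time.perf_counter() - start:.5f}s")
print("max abs difference:", np.abs(slow - fast).max())
```

On a typical machine the vectorized call is orders of magnitude faster, and on a GPU the gap grows further; that gap is the "critical advantage" the quote refers to.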
So the point here is that the ideas were around for a long time, but it took GPUs to make them work. The image the paper builds up, I think, is that you're here doing research and you have a decision to make: which hardware do I build for the future? There are two directions, direction one and direction two, and let's say for whatever reason direction one is chosen. Because it's so expensive to build different hardware, the world largely goes with direction one and builds on top of it. That also means that all the research ideas that profit from direction one will appear much more effective than research ideas that would have profited from direction two. The paper is basically saying that neural networks were over here, and the other systems — let's call them expert systems, and other types of ideas — were over there, and those appeared to work really well until progress stopped; and then, by accident, this other road got traveled via GPUs. It was not obvious, but by accident it was developed anyway, and then neural networks could flourish. If it weren't for that fluke — video games, basically, or animation — we would never have known that neural networks work as well as they do. That's the point the paper makes, and I think we can all agree with that particular point.

But I want to build up a slightly different picture here, because I feel hardware is weighted a bit much. I think you can make the general case that at any junction you have several things you can choose, and once you choose a thing, everything goes in that direction: new ideas will be more in that direction, and also new hardware, because a lot of people research on it; the paper also makes the point that there is this kind of feedback loop. Now, in a half-formulated way, the paper basically says that had we invested in matrix multipliers, in GPUs, instead of CPUs in those early years, neural networks would have succeeded as an idea at that time. I'm not entirely convinced of this. First of all, GPUs were actually around in the 1970s, so the hardware was available. It's not like it was super easy in 2010 for those early researchers to turn their code into GPU-compatible code; that was certainly hard, especially if you read the papers, but it would have been hard in 1970 as well, and I don't think it would have been significantly harder. So I'm not sure the picture is really like this; maybe the picture is rather that, along the CPU direction, neural networks sit somewhere up here, and the fact is we actually needed the good CPUs in order to make use of the GPUs, to then enable neural networks on the GPUs. It certainly helped a lot that CPUs were built: computers built only on GPUs would be sad computers, while computers built on CPUs are cool — they can do multiprocessing, they can do the internet, they can do most of a video game except display the graphics. And very arguably, without the heavy focus on CPUs we would not have neural networks today, even if we had invested all of that effort into building GPUs, because society has just advanced so much because of CPUs. So I'm tempted to challenge the notion that it's just the happenstance that CPUs were the advanced thing at that time that kept neural networks from having their breakthrough back then; I think we needed both. That being said, I do agree with the paper that we might never have realized that neural networks work if it weren't for the fact that specialized hardware was around.

So those would be my points on this. The paper makes this case about hardware lotteries, and now it also introduces software lotteries, even though it said at the beginning that hardware lotteries include software; I'm going to guess the general concept of a lottery is simply being introduced here. Again, I don't see exactly what's so special about hardware, because I can make the same case for software, just on a shorter time frame, and I can make the same case for theory. Right now neural tangent kernels are the hit; everyone's like, wow, NTKs. Who knows — some big names announced this, some theory has been done in this direction, and because there is already big momentum, lots of people publish in it. Who knows if that's a good idea, or whether there were other ideas that would flourish right now had we done the fundamental work on them. Again, I agree with the sentiment; I just don't see why hardware is such a special case.

### Are we in a Hardware Lottery right now? [29:05]

The next thing the paper looks at is the current day: it tries to make the point that we might be in a hardware lottery right now. Again, the intuition is of course yes: the hardware we have is difficult to change, especially since hardware builds upon hardware. Take the tree I drew before and draw it again: literally every decision you make in the tree — and this doesn't only need to be hardware — means that pretty much all of the previous choices are now fixed and ingrained. We build upon inventions of the past, and it's impossible to go back and do all of these things again. And notice something curious here, which we're coming to later: suppose here is a good idea, my super duper booper idea, and it simply didn't make the cut for that choice; someone chose a different hardware direction, a different software library direction, whatever, it wasn't in vogue, and my idea was unpopular. If one choice is made, it's hard to go back; if two choices are made that build upon each other, it's even harder to go back. As time goes on, it gets harder and harder to go back, which is the point the paper makes at the end: the difference between the winners and the losers is getting bigger and bigger. The idea that once was a curiosity that could be investigated becomes a very costly investigation, because we would need to reinvent and re-engineer a whole bunch of decisions, and as time goes on it is simply forgotten, because there is so much we have built past it.

However, this is the story for the loser. For the winner, I disagree. Say there is a super cool idea that would beat the crap out of neural networks, whatever the latest Schmidhuber paper is; that idea would lose, neural networks are the winner, everyone is doing neural networks, and Schmidhuber's idea is just forgotten about. To say that neural networks are the winner and that the winners will increase and increase is correct, but it forgets that within the winner there is this whole branching again. Within neural networks, what kinds of neural networks were completely forgotten? MLPs — no, MLPs are maybe still a thing — but, say, early neural networks with tanh non-linearities, or nine-by-nine filters in convolutions, things like this. Nine-by-nine filters are technically in the class of neural networks, but as time progresses, the three-by-three filters massively out-compete the nine-by-nine filters, so the nine-by-nine filters are forgotten. And because of the three-by-three filters, we now have specialized hardware and kernels heavily optimized for three-by-three filters, so we go further down that route, and there might have been some other super idea down here that only works when we have really big filters, and now we never know that it existed. So to say that the difference between the winners and the losers gets bigger and bigger misjudges the fact that the winners themselves will be fractionated again and again, and every push in one direction comes with costs to the other directions within that winner branch.
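To put rough numbers on the three-by-three versus nine-by-nine example (my own back-of-the-envelope arithmetic, not from the paper or the video): a stack of four 3x3 convolutions covers the same 9x9 receptive field with far fewer weights, which is part of why libraries and accelerators ended up tuned for small filters.

```python
# Minimal sketch (illustration only): weight counts for one 9x9 convolution
# versus four stacked 3x3 convolutions with the same receptive field.
def conv_params(kernel, channels):
    """Weights in one conv layer with `channels` in and out (no bias)."""
    return kernel * kernel * channels * channels

channels = 64

single_9x9 = conv_params(9, channels)

# Each extra 3x3 layer grows the receptive field by 2: 3, 5, 7, 9.
stacked_3x3 = 4 * conv_params(3, channels)

print(f"9x9 conv weights:     {single_9x9:,}")    # 331,776
print(f"4 x 3x3 conv weights: {stacked_3x3:,}")   # 147,456
print(f"ratio: {single_9x9 / stacked_3x3:.2f}x more weights for the 9x9 filter")
```

Once the small-filter design wins, kernels and chips get tuned for it, and ideas that need big filters look even worse than they intrinsically are — which is exactly the branching effect described above.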
But ultimately you have a choice: do I want to go back and take that other direction, or do I want to add something here? It might just be worth more to society to keep pushing up here. The paper is going to argue at the end that we should keep funding alternative directions in hardware, which I think is always a good thing, to not lock in on particular ideas; but you also have to strike a balance, because researching the things that already work and making them better is a crucial part as well, since that's how you discard the sub-ideas that don't make any sense.

The paper then gives some examples of current hardware lottery winners: "To improve efficiency, there is a shift from task-agnostic hardware like CPUs to domain-specialized hardware that tailors the design to make certain tasks more efficient." The first examples of domain-specific hardware released over the last few years — TPUs, and it also mentions edge TPUs, the Arm Cortex-M55, and Facebook's Big Sur, which I think is just a box with eight GPUs and some InfiniBand in it — "optimize explicitly for costly operations common to deep neural networks, like matrix multiplies." Here again there is this double meaning: it calls CPUs task-agnostic hardware, but at the same time it argues that CPUs are particularly bad at matrix multiplies, so they're not really task agnostic, they're just focused on different tasks. But I see what the paper means: we do build hardware that makes matrix multiplies faster, and that benefits neural network research. "Closer collaboration between hardware and research communities will undoubtedly continue to make the training and deployment of deep neural networks more efficient. For example, unstructured pruning and weight quantization are very successful compression techniques but are incompatible with current hardware and compilation kernels." I don't know exactly what that means, but they're incompatible with current hardware, and the paper argues that because we see these ideas are good, there will be specialized hardware for them. I think the point the paper is trying to make is: see, another win for neural networks. Because we went down the neural network road, people focus on neural networks, focus on how to prune them and so on, hardware will be developed for that, and this will lock us further into neural networks. The paper is basically saying: look, because we went this road, we're going to go down it a lot more. But what you then have to see is that if, from this road, we go here — because we want to do weight quantization in this particular way — we are also going to neglect that over there, whatever other thing we could have done instead. So in each decision there is a branching; undoubtedly the paper is correct that the branching decides the future, but the focus here on hardware, and on neural networks versus non-neural networks, is very specific to this one thing.
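Since unstructured pruning and weight quantization come up without much explanation, here is a minimal sketch (assuming NumPy; my own illustration, not the paper's code) of what the two techniques do, and why a pruned weight matrix doesn't automatically run faster on hardware built for dense matrix multiplies.

```python
# Minimal sketch (illustration only): magnitude pruning and uniform
# 8-bit quantization of a weight matrix.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((512, 512)).astype(np.float32)

# Unstructured pruning: zero out the 90% of weights with smallest magnitude.
threshold = np.quantile(np.abs(weights), 0.9)
mask = np.abs(weights) >= threshold
pruned = weights * mask
print(f"sparsity: {(pruned == 0).mean():.1%}")  # ~90% zeros, same dense shape

# Uniform 8-bit quantization: map floats to int8 with a single scale factor.
scale = np.abs(weights).max() / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale
print(f"max quantization error: {np.abs(weights - dequantized).max():.4f}")

# On an accelerator tuned for dense matmuls, `pruned @ x` does the same work
# as `weights @ x`: the zeros are multiplied like any other value.
x = rng.standard_normal((512, 64)).astype(np.float32)
_ = pruned @ x
```

The zeros still sit in the same dense array, so a dense-matmul accelerator does exactly the same work; only hardware or kernels that understand sparsity would turn that 90% sparsity into speed, which is the lock-in the paper is pointing at.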
The paper then makes the point of why this matters. In 2019 a paper was published called "Machine Learning is Stuck in a Rut." The authors consider the difficulty of training a new type of computer vision architecture called capsule networks, and they realize that capsule networks aren't really suited to current hardware. The paper says: "Whether or not you agree that capsule networks are the future of computer vision, the authors say something interesting about the difficulty of trying to train a new type of image classification architecture on domain-specialized hardware. Design has prioritized delivering on commercial use cases, while built-in flexibility to accommodate the next generation of research ideas remains a distant secondary consideration." Which is true, though I would also say that CPUs and GPUs combined are extremely general. Okay, GPUs are good at matrix multiplies, but CPUs are good at a lot of other things, so I would say the CPU-GPU combo is a very flexible, general-purpose hardware design that doesn't lock you in too much. And maybe it's just that capsule networks are, by algorithmic design, way harder to implement: building specialized hardware for capsule networks, and speeding them up to the degree that CNNs are sped up by GPUs, might not even be possible, simply because of the algorithmic nature of capsule networks. I've done videos on capsule networks; they sound pretty cool, but they also sound like implementing the thing in hardware is going to be quite tough, even if you build specialized hardware. The paper also goes into GPT-3.

### GPT-3 as an Example [39:55]

The paper claims that because we are locked into this neural network paradigm and this kind of hardware, "several major research labs are making this bet, engaging in a bigger-is-better race in the number of model parameters and collecting ever more expansive datasets. However, it is unclear whether this is sustainable. An algorithm's scalability is often thought of as the performance gradient relative to the available resources." Given more resources, how does the performance increase? The paper gives examples where scaling up the parameters gives you less and less of a gain, so diminishing returns over time, and it brings up GPT-3, which I find interesting, because GPT-3 showed — okay, it was in log space — a fairly linear decrease in perplexity given more parameters, which goes a bit against the narrative of the paper, also in terms of this definition of a performance gradient relative to resources. The paper points to the fact that it cost, I want to say 12 billion — sorry, 12 million — dollars to train GPT-3. On the other hand, I would ask: what's the cost of building specialized hardware to research alternative research directions? We have no idea which alternative research directions work, so the only thing we could do is fund hardware for all of them, then select the ones that are promising, then invest more, and so on; 12 million dollars would get us nowhere there, which I think is a point the paper is trying to make. But from an efficiency perspective, given where we are now, it's actually more viable to build GPT-3 — which, again, I think the paper agrees with. At the same time it tries to make the point that look, we are investing more and more and getting less and less out of it; maybe it's time to go a different route in terms of hardware, but that's going to get more and more expensive the more we go in this neural network direction. I'm not sure about this. If you think of the tree again, the paper basically argues that what GPT-3 does is a push up here, pushing the frontier on the path we have gone down for a while, and the paper is saying that had we, imaginarily, gone a different path down here, an equally hard push in that direction might yield a better result. Yes, maybe — but the question is at what point it becomes viable to abandon this entire direction and essentially start over down there, because we would need to build the whole tree again, and then within that tree the same logic applies.
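For what "a fairly linear decrease in log space" amounts to, here is a sketch with made-up numbers (not GPT-3's actual figures): if loss follows a power law in parameter count, then log-loss is linear in log-parameters, and the fitted slope is exactly the "performance gradient relative to the available resources" the quote mentions.

```python
# Minimal sketch with hypothetical numbers (illustration only): fitting a
# power law L(N) = a * N**(-b) by linear regression in log-log space.
import numpy as np

params = np.array([1e8, 1e9, 1e10, 1e11])  # model sizes (hypothetical)
loss   = np.array([4.5, 3.6, 2.9, 2.3])    # validation loss (hypothetical)

slope, intercept = np.polyfit(np.log(params), np.log(loss), deg=1)
print(f"fitted exponent b = {-slope:.3f}")  # what an extra 10x in parameters buys

# Extrapolate one more order of magnitude under the same power law.
predicted = np.exp(intercept) * (1e12) ** slope
print(f"predicted loss at 1e12 params: {predicted:.2f}")
```

Whether the curve stays linear in log space as you keep scaling, or bends into diminishing returns, is precisely the disagreement between the GPT-3 result and the paper's framing.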

### Comparing Scaling Neural Networks to Human Brains [43:40]

The paper does, though, make a good comparison to the human brain, which works fundamentally differently. It says that "while deep neural networks may be scalable, it may be prohibitively expensive to do so in a regime of comparable intelligence to humans. An apt metaphor is that we appear to be trying to build a ladder to the moon." The point being that at the rate we scale neural networks right now, it's not conceivable that we reach human-level intelligence simply by scaling them up, which is why we might want to investigate entirely different directions and entirely different hardware choices. Which, granted, is correct. Though I would say: transformers aren't particularly suited to current hardware either, because they require such huge memories, and GPUs have traditionally been rather limited in memory. Transformers still kick ass on this hardware even though its memory is extremely limited compared to CPU memory, and only now do we see GPU manufacturers focus on more memory. So you can argue from the paper's perspective and say, see, because we have neural network hardware now, people are building more neural network hardware; but you can also say that initially a sort of bad choice was made, researchers still managed to demonstrate that transformers work, and now the hardware is developing in that direction — which I think the paper also argues at some point. Again, I have a hard time parsing out a single direct point here; I think the paper is more meant to make you think about the different points it brings up, which is also probably why this video is more me rambling than anything else.

### The Way Forward [46:00]

The paper says that there are currently some initiatives to build other types of chips and other types of hardware, but they, like the earlier ones, might not be enough, because "producing a next-generation chip typically costs 30 to 80 million dollars and takes two to three years to develop," and "even investment of this magnitude may still be woefully inadequate," as hardware based on new materials requires long lead times of 10 to 20 years, and public investment is currently far below industry levels of R&D — this is about DARPA and China funding research in this direction. So the paper says it might be way too little, though it also sees a couple of lights at the end of the tunnel: experiments using reinforcement learning to optimize chip placement may help decrease cost — I think I've done a video on that paper — and there is also renewed interest in reconfigurable hardware such as field-programmable gate arrays and coarse-grained reconfigurable arrays. This is hardware that you can sort of meta-program: you can take the hardware and specialize it by programming it, make one of these things into something like a GPU if you need that, and then reprogram it differently for a different application. Though, if I take the other side of this paper again, I would say: well, isn't that the same thing CPUs were? And yet CPUs made it almost impossible for neural networks to run. Even though FPGAs are very general, aren't you making implicit choices about which ideas are well suited to FPGAs? Or using reinforcement learning to optimize chip placement — isn't that the exact same thing? You can make this argument ad infinitum. Infimum? No, infimum is different. Okay, this video must come to an end. The last part says that what is also needed is a kind of software revolution, a shorter feedback loop, where software can tell researchers which hardware their algorithm is particularly suited to, or how their algorithm would fare on different hardware. So if you invent a new algorithm and it doesn't work on a GPU, you could submit it to this software, and the software would tell you that it would work really well if hardware of type X existed, and then you could maybe invest money into that rather than discarding your idea.
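Software like that doesn't exist in the form the paper imagines, but one ingredient it could plausibly build on already does: a roofline-style check of whether an operation has enough arithmetic per byte of memory traffic to keep a given chip busy. A minimal sketch with hypothetical hardware numbers (nothing here is specified by the paper):

```python
# Minimal sketch (illustration only): score an operation's arithmetic
# intensity against a chip's compute/bandwidth ratio. Below the ridge point
# the operation is memory-bound and the chip's matmul muscle goes to waste.

def matmul_intensity(n, dtype_bytes=4):
    """Rough FLOPs per byte for an n x n x n matrix multiply (ignores caching)."""
    flops = 2 * n ** 3                      # multiply-adds
    bytes_moved = 3 * n * n * dtype_bytes   # read A and B, write C once
    return flops / bytes_moved

# Hypothetical accelerators: (peak TFLOP/s, memory bandwidth in TB/s).
chips = {
    "gpu-like accelerator": (100.0, 2.0),
    "cpu-like chip": (2.0, 0.1),
}

for n in (64, 4096):
    intensity = matmul_intensity(n)
    print(f"\nmatmul n={n}: {intensity:.0f} FLOPs/byte")
    for name, (tflops, tbs) in chips.items():
        ridge = tflops / tbs  # FLOPs/byte needed to keep the chip busy
        verdict = "compute-bound (good fit)" if intensity >= ridge else "memory-bound (poor fit)"
        print(f"  {name:22s} ridge {ridge:5.0f} FLOPs/byte -> {verdict}")
```

A real tool would need far more than this, but scoring an algorithm against several hardware profiles at once is the kind of shorter feedback loop the paper is asking for.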

### Conclusion & Comments [49:25]

In conclusion — and the conclusion isn't very long: "The performance of an algorithm is fundamentally intertwined with the hardware and software it runs on. This essay proposes the term hardware lottery to describe how these downstream choices determine whether a research idea succeeds or fails. Today the hardware landscape is increasingly heterogeneous. This essay posits that the hardware lottery has not gone away, and the gap between the winners and losers will grow increasingly larger. In order to avoid future hardware lotteries, we need to make it easier to quantify the opportunity cost of settling for the hardware and software we have." My conclusion: I generally agree with this paper, and I really appreciate the historic overview, but I do think it centers too much on hardware. I think you can make this lottery case for literally any single branching choice, and maybe weight it by the cost it would take to revert or change that choice in the future. It also focuses a lot on neural networks versus non-neural networks, with this winners-and-losers framing where neural networks are the winners, and if we investigate more into neural networks, they will remain the winners because of the feedback loop. In my opinion, that discards the fact that within neural networks, in the next choice of hardware, there will be winners and losers again and again, and entire branches of neural network research will be abandoned because they don't fit the hardware choices. And this gap between what the paper conceives of as the winners and the losers compares losers in terms of an idea that was had in one year to winners that are re-evaluated every year, so it's not really a fair comparison in my opinion. If you are interested in things like this, I do implore you to have a look: as I said, this is more of a historical and opinion piece, trying to make an argument and give you some directions to think about, which is pretty cool as a change from a plain research paper. All right, that was it from me. If you're still here waiting to learn how to win the lottery, this is not the video. Bye, see you next time.
