# Matthew Stewart - The emerging world of ML sensors

## Metadata

- **Channel:** Towards Data Science
- **YouTube:** https://www.youtube.com/watch?v=wE7MG20VT8g
- **Date:** 21.09.2022
- **Duration:** 41:34
- **Views:** 1,036
- **Source:** https://ekstraktznaniy.ru/video/45965

## Description

Matt Stewart, a deep learning and TinyML researcher at Harvard University, is back on the TDS Podcast to discuss ML sensors and the challenging ethical, privacy, and operational questions introduced by them.

Intro music:
➞ Artist: Ron Gelinas
➞ Track Title: Daybreak Chill Blend (original mix)
➞ Link to Track: https://youtu.be/d8Y2sKIgFWc

0:00 Intro
3:20 Special challenges with TinyML
9:00 Most challenging aspects of Matt’s work
12:30 ML sensors
21:30 Customizing the technology
24:45 Data sheets and ML sensors
31:30 Customers with their own custom software
36:00 Access to the algorithm
40:30 Wrap-up

## Transcript

### Intro [0:00]

Hey everyone, and welcome back to the Towards Data Science podcast. Today we live in the era of AI scaling, where it seems like everywhere you look, people are pushing to make large language models even larger, or more multimodal, and they're leveraging ungodly amounts of processing power to do it. But although that's one of the defining trends of the modern age of AI, it's not the only one, because at the far opposite extreme from the world of hyperscale Transformers and giant dense nets is the fast-evolving world of TinyML, where the goal is to pack AI systems onto edge devices.

My guest today is Matt Stewart, a deep learning and TinyML researcher at Harvard University, where he collaborates with the world's leading IoT and TinyML experts on projects aimed at getting small devices to do big things with AI. Recently, along with his colleagues, Matt co-authored a paper that introduced a new way of thinking about sensing. The idea here is to tightly integrate machine learning and sensing on one device. For example, today we might have a sensor like a camera embedded on an edge device, and that camera would have to send data about all the pixels in its field of view back to a central server, which might use that data to perform a task like facial recognition. That's not great, because it involves sending potentially sensitive data (in this case, images of people's faces) from an edge device to a server, which introduces security risks. So instead, what if the camera's output was processed on the edge device itself, so that all that had to be sent to the server was much less sensitive information, like whether or not a given face was actually detected? These systems, where edge devices harness onboard AI and share only processed outputs with the rest of the world, are what Matt and his colleagues are calling ML sensors. ML sensors really do seem like they'll be an important part of the future, and they introduce a whole host of challenging ethical, privacy, and operational questions that I'll be discussing with Matt on this episode of the Towards Data Science podcast.

You are the first person who's done the hat trick on the Towards Data Science podcast, so first off, congratulations. I think we'll have a trophy made out for you, and you can hang it on your wall. Well, I guess... anyway, we'll make the trophy so you can hang it on your wall. Great. I think this conversation is going to be really interesting, by the way. Anybody who hasn't heard Matt's previous episodes on the podcast, I do recommend checking them out, especially the last one, I think it was, which talked about TinyML, this idea of building edge devices with machine learning on them; that's sort of Matt's specialty. So today, Matt, we're going to be diving into some of your more recent work on a new kind of sensing strategy, a paradigm for the world of modern-day AI, that I think is going to get a lot of attention. This is something that I'm personally really excited about as well. Before we get to that, though, maybe a bit of a primer for people who are not as familiar with the world of TinyML, of edge machine learning: can you speak a little bit to what this landscape is like, what are some of the special challenges that come up when you're doing machine learning on

### Special challenges with TinyML [3:20]

edge devices, and what does the state of the art look like in this space?

Yeah, sure. Maybe I should first give some context on how this became a field in the first place. Most people are familiar with the growing model landscape: you have these large language models which take weeks to train and use hundreds of GPUs, and they have the carbon emissions of a small country or something like that. And that's really this bifurcation that happened, where people were like, okay, we have these huge models, but there's also stuff we could do at the small scale as well. And this kind of led into the IoT landscape. If you think of IoT, you're thinking of cloud-connected devices, and the cloud is fundamentally that big-ML space where all of the computation happens; we call it compute-centric. We're sending our data up there, that's where all the fun stuff's going on, and then they just send the results back. So if you imagine something like Alexa or OK Google, you're talking to it, and it's going to send this information to the cloud, do its fun stuff, and send it back. So it's fundamentally a dumb device; it can't actually do anything itself. The only thing it can do is listen for you to say that word, and that in itself is an application of tiny machine learning. It's listening out for that specific word, and roughly every second it's going through intervals and asking, did you say "OK Google"? Most of the time you don't, and funnily enough, if you do say it but it's not sure you said it, it's going to prime itself to be like, okay, now I'm really listening out to hear if you're saying that word. But one of the main issues with IoT is that you have to be connected to the cloud, so it consumes a lot of power, because you have to power the antenna, and
that gets into the weeds a bit, but generally, communication is quite an expensive thing to do energy-wise. Also, people can extract that information if they're listening in on it, so there's a security loss in sending that data to the cloud: you have no idea where it's going, who has access to it, or what else it's being used for. So really, there are some privacy, security, and transparency concerns going on there. The nice thing about TinyML is you don't actually have to be connected to the internet or to the cloud, because the intelligence is embedded on the system itself. You have a neural network that's on your device, and your device doesn't have to be connected to the cloud, which precludes the possibility of someone intercepting that information, stealing your data, or having the data stored on the cloud in the first place. And so there's a lot more stuff you can do with it. You can put it on machines and do preventative maintenance, by listening out for when a machine is starting to sound clunky, so maybe you should go and check on it instead of waiting for it to break. It's in iPhones; it looks to see if you're trying to talk to Siri or something like that (I hope I don't activate my phone). So that's the promise of TinyML in itself, but there are still some privacy and transparency concerns, because you don't know what the network that's on this device was trained on. Sometimes, if it is connected to the cloud, you don't know what it is doing; maybe it's just doing that for firmware updates, but the point is, you don't really know. And so that presents some additional concerns, and this kind of came to a head at one point with the Google Nest thermostat. I don't know if anyone's heard of this, but basically they found microphones inside
these thermostats, and people were like, well, what the hell is this for? It's a thermostat; why does it have a microphone? It turns out it was actually being used to listen for smoke alarms, things like that, but obviously, if you have a microphone in there, it could be used to extract other information. So really, the transparency angle is something that, as we start to get more and more cameras in our devices, is going to become a really big factor for people. We even had a professor who works in embedded systems, and he bought a TV recently and didn't even know there was a camera in it. That kind of shocked him when he worked it out, after about six weeks. And if someone as learned as that, who works in this field, doesn't know there's a camera in his TV, then what hope do the rest of us have?

Absolutely. And it speaks to, obviously, privacy, security, all that, but also just consumer trust. To your point: you discover one microphone, one camera, in one of these devices that doesn't seem like it should have one, and there's a major risk to the whole industry, really, as people start to wonder about this stuff. Now, I think it's fair to say that you've done a great job giving a sense of the stakes here. This seems like a very important problem, and one that maybe a lot of people haven't thought through the security and privacy lens. But what about the headaches? The headache of putting machine learning on an edge device; presumably there are a whole host of different challenges that come up in that context. Can you

### Most challenging aspects of Matt’s work [9:00]

provide a little bit of an overview? What's the bane of your existence as a person who worries about things like TinyML?

Yeah, so there are a few, and I would say one of the main ones is the heterogeneity of the computing platforms. That is to say, there are lots of different systems you have to work with, because these companies build their own circuit boards and sensor systems, and for tiny machine learning to be successful, it has to be able to interface with and be deployed on each of these individual systems. There are entire companies and infrastructures now whose sole function is to do this one thing, so that's a problem people are already on task to solve, and there are very clever ways of dealing with it. And then there's the social problem of, like we discussed, how do you best communicate with all of these stakeholders: with the companies, to make it easy for them to build these devices, and with the end users, to make sure that they themselves aren't scared by having a camera take pictures of them, because they don't know what it's going to do with it, or listening to them 24/7.
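As a loose illustration of the portability problem Matt describes, tooling that targets many vendor boards typically hides each board's peripherals behind a thin abstraction layer. The sketch below is hypothetical (the class and method names are invented for illustration, not any real framework's API), and assumes a made-up board with 256 KB of SRAM:

```python
from abc import ABC, abstractmethod

class Board(ABC):
    """Hypothetical hardware-abstraction layer: every vendor board
    implements the same small interface, so the ML tooling above it
    never needs to know which microcontroller it targets."""

    @abstractmethod
    def read_sensor(self) -> list:
        """Return one frame of raw sensor data."""

    @abstractmethod
    def free_ram_bytes(self) -> int:
        """RAM available for the model's tensors."""

def can_deploy(board: Board, model_ram_bytes: int) -> bool:
    """One deployment check shared across all boards: does the model fit?"""
    return board.free_ram_bytes() >= model_ram_bytes

class FakeCortexM4(Board):
    """Stand-in for one vendor's board (all values are made up)."""
    def read_sensor(self):
        return [0.0] * 64          # pretend 64-sample audio frame
    def free_ram_bytes(self):
        return 256 * 1024          # 256 KB of SRAM

board = FakeCortexM4()
print(can_deploy(board, 100 * 1024))  # True: a 100 KB model fits
print(can_deploy(board, 300 * 1024))  # False: a 300 KB model does not
```

The point of the abstraction is that `can_deploy` (and, in a real toolchain, the whole compile-and-flash pipeline) is written once against the interface, and supporting a new board means writing one small adapter class.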
And then there's actually putting it on the device itself: making sure the device has enough memory to fit the network, making sure you have the right sensors. And one thing that's becoming more common now is that people want to start training on the device, and that's pretty difficult, because in order to train a neural network you have to have gradients, and as people familiar with machine learning probably know, you need very small numbers to train a network. Generally these systems are 8-bit or something, so you really don't have the capacity to manipulate small numbers and float variables; because you're working with an 8-bit microcontroller, it doesn't have that capacity. So training really has to be done off the device, and there are clever ways around this. One of them is called federated learning, where you take, say, every person's iPhone, and each device's contribution gets sent up to the cloud and compiled into training a large-scale model, which then gets pushed out to everybody else. So your individual information is being used to train a large-scale model, which is then condensed and re-uploaded. But obviously, you have to be cloud-connected for that to be a possibility.

Right, okay, that's really helpful, and I think it gives people a bit of a sense of the challenge of this space. There are so many things; almost intrinsically, this industry is one that likes big models, it likes consolidation, and as a result there's a bit of a bias in that direction, whereas this edge device stuff, which is really important, still has a lot of those kinks to be ironed out. And so we're about to jump into this conversation's exploration of this new paradigm that you've proposed along with your colleagues, this idea of ML sensors, kind of a new way
of thinking about what it means for a sensor to do its work in the age of AI. So I wonder, for people who are less familiar with the idea of on-device sensing, could you explain: what's the current on-device sensing paradigm, how is it currently tackled, and then how is that different from what you're doing with ML sensors today?
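Before moving on: in its standard FedAvg formulation, the federated learning scheme mentioned a moment ago keeps raw data on each device and ships only locally trained model weights to the server, which averages them. Here's a minimal, hypothetical sketch (a single scalar weight fitted by plain SGD, three simulated clients), not production code:

```python
import random

def local_sgd(w, data, lr=0.01, epochs=5):
    """One client's local training: fit y = w*x by plain SGD.
    Only the updated weight leaves the device, never `data`."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of the squared error (w*x - y)**2
            w -= lr * grad
    return w

def fedavg_round(global_w, client_datasets):
    """One FedAvg round: every client trains locally on its own data,
    and the server averages the returned weights."""
    local_ws = [local_sgd(global_w, d) for d in client_datasets]
    return sum(local_ws) / len(local_ws)

# Three clients, each holding private noisy samples of the same y = 3x relation.
random.seed(0)
clients = [[(x, 3 * x + random.gauss(0, 0.1)) for x in (1.0, 2.0, 3.0)]
           for _ in range(3)]

w = 0.0
for _ in range(20):
    w = fedavg_round(w, clients)
print(f"learned weight: {w:.2f}")  # converges near the true slope of 3
```

The privacy-relevant detail is in `fedavg_round`: the server sees three floats per round, never the `(x, y)` samples themselves, which is the same "send the distilled result, not the raw data" instinct behind ML sensors.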

### ML sensors [12:30]

Yeah. So right now, the way tiny machine learning works when it's deployed is you have a model that's embedded on a microcontroller in some flash memory, and there might be a few sensors on board, say a camera, and the camera is a peripheral of the device. That is to say, it's fully connected to the system, and that data is being sent to the RAM, which is being used to interface with the processor. So it's all on the one system: if someone has access to that system, they have access to the raw data itself, and they can manipulate it and do whatever they want. Really, the crux of the thing we have proposed, the ML sensor, is that it's not actually a defined piece of hardware or architecture; it's more of a logical framework. The crux of it is a separation of the computation, from where the raw data is to where the processing occurs. That's probably going to sound a bit confusing to people who aren't familiar with this kind of thing, so maybe a good way of thinking about it is passwords. For example, there is a standard in the U.S. called PCI DSS which says, basically, that you can't store passwords in plain text in a database. So if someone hacks into the database and gets this list of all these passwords, they're not actually the passwords; the attacker doesn't have the information. They have a hash of each password, and you can't reverse this hash, so you can't extract the actual password from it. Even though the information has been stolen, the user is still kind of safe; you'd have to do a lot more work to extract that information. And it's kind of analogous here: if someone hacks into the system, they could feasibly get the raw data, but it's just so much more difficult that it's almost a waste of time, and you'd have to be physically present to extract it from the hardware.

So does this imply a tight coupling, then, between the machine learning and the raw data collection? Like, they're so integrated that it's kind of hard to think of the raw data in the absence of the analysis, the machine learning on top of it?

Yeah. The current way right now is that the machine learning and the processing are coupled together; they're occurring on the same device, in the same memory. Whereas what we would prefer is this separation of computation. In traditional memory architectures you have a concept of user space and kernel space, and basically all that means, for those who aren't familiar, is that the user shouldn't be able to interact with this kernel space. The user should be able to do the things they want to do, but they shouldn't be able to affect the crux of the system, the operating system itself. And so there has to be this logical separation of memory.

In case you just accidentally do something stupid, for example.

Yeah, so you don't delete an important operating system file.

Right. Yeah, so
we've kind of taken that and applied it to this computation system instead. We've said, okay, what if we had a sensor here which had some sort of processing power on it, interacting with the raw data itself, and then it sends information to the actual processor, which performs additional functions. And so this ML sensor is a self-contained, modular system. You could imagine it like a little circuit board or sensor that you go and buy from a shop, but the point is, it's embedded on that device, and it doesn't need, per se, to be connected in this logical way to the actual main processor.

Can you give a concrete example of how the data might flow in, say, a real system?

Yeah, sure. Right now we're trying to make an open-source version of a person detector. The application of a person detector is: is there a person in this picture, yes or no? If it's yes, it would send a binary response that would be a one, and if it's no, a binary response that's a zero. An ML sensor in this case would have just a power signal, a ground signal, and a data output signal; you don't even need a data input signal, because the sensor is already on the device. On this sensor you would have a camera, and you would have some kind of processing capability, and it would perform the computation at the level of the sensor. So this is a truly data-centric device, and it's sending the high-level information, which in this case is the one or the zero that says whether a person is present or not, to the broader device, the processor. And so that processor doesn't know what you look like, because it doesn't have access to the picture, the image; it only knows that there's either a person there or there isn't. It can't take that information, upload it to the cloud, and work out how many people live in your house, what their
ethnicities are, what your habits are. There are so many things you could do with this information that would make people uneasy, and this is a way of giving them the transparency and the assurance that that isn't happening, because this device only has one capability, and that one capability is to tell whether or not there's a person in the picture.

So essentially, it's almost like we're moving from this age of thinking of sensors as things like cameras, that give us a bunch of raw data about the environment, to sensors as things that have some kind of onboard intelligence, that do some of the chewing, the number crunching, for us ahead of time. And then what we get out of the sensor is this distilled thing, this conclusion, already there for us to chew on later. So essentially, as you say, we're not sending all these pixels, we're not sending Matt Stewart's face across the internet; we're just sending "Matt is here" or "Matt is not here," and that's the signal that's getting shared to the cloud or shared elsewhere. That's kind of the idea?

Yeah, pretty much. Because inevitably, some of these devices are going to have to be connected to the cloud, but as long as there's some assurance that you know what data is being transferred, and you know it's not sensitive information, then I think people would probably be okay with that. I mean, there are still concerns about what level of information is correct. This is maybe a very simple example, the person detector, whereas with things like voice data or other information, it maybe becomes a bit less obvious what the correct way is. So there obviously has to be a dialogue about this, about what the appropriate mechanism is for each application, and we're not really sure at this point; this is more of a philosophy paper and discussion. I know we were discussing operating systems before we
started the call. You have this period in the 80s and 90s where everybody was releasing these philosophical papers asking, what is the role of the operating system in the modern day? What should it be able to do, and what should it not be able to do? And now, when we look at things like user and kernel space, it's just so obvious: well, of course you wouldn't want the user to be able to delete the file system. But back in those times it wasn't so obvious, because no one had done it. And the same thing is kind of happening now. We're asking, what is the role of sensors in this world where we now have machine learning? In some sense it's a much more complex problem, because there are a lot of things you can do with an operating system, but there are also a lot of things you can do with machine learning; it's, you know, like the fifth pillar of science, essentially.

It really does make you think about, as you say, what a sensor is, what it means to sense something, and how to cleave reality at its joints, in a way. There are so many different ways you could think about segmenting this: this part is the sensor, this part is the processing, or whatever. And it also ties into, I guess, the product strategy, right? If you imagine being a company that builds ML sensors, I imagine one question you'd have to grapple with is the level of customizability, for example, of the onboard AI. So for example,

### Customizing the technology [21:30]

if we take a camera, we can turn it into an ML sensor that does face detection by having an algorithm on board that does facial recognition, or we could tweak that algorithm, with a little bit of fine-tuning, to make it do, I don't know, age prediction or something. So if you give the end user some access, some ability to tweak the onboard AI through transfer learning or other means, that feels like it might start to flirt with making it less of an ML sensor. I guess it's still an ML sensor, because you still have the sensing on board, but do you have any thoughts about that? From almost a production standpoint, do we get into a vast permutation of kind of ill-defined sensor devices?

Yeah, I mean, there are a lot of questions about that, and we don't really know, because right now no one's even built an ML sensor. Some people think they have, maybe because it's not the clearest definition that's been created. But there are certainly a lot of things you can do with it, and we're not quite sure yet what the role of this is going to be. We do know that this is probably the right direction to be going in, because this is a privacy-by-design, security-by-design, and, if we do it correctly, transparency-by-design approach. And one thing we've actually said is that if you build these sensors, the ones that are most commonly used should probably be open source, and that's why we're trying to make this open-source person detector. You could imagine that every house in the future has its lighting system controlled by a person detector, because you don't need a light on in a room if you're not in the room, and that would be a simple way to reduce energy costs in a very obvious way. You could do the same with your heating system: it turns on when you're in the house, it turns off when you leave.
You don't actually have to physically do it yourself, because sometimes you're going to forget. So there are those applications of it, but you could imagine there's a lot more you can do, and it gets very complicated when you start doing things like age prediction; then it becomes more like a traditional big-ML system, and you would have to have ten sensors coupled together. Honestly, I'm not really sure. We have proposed the idea of composability, which is: if you had this person detector, you could theoretically link it to another device which does face recognition, and you could use that for, say, letting people into their house, like a smart front-door lock, I suppose. But again, we're not exactly sure how that would work functionally. It seems plausible, though, and if you can do that with a privacy-by-design approach, where you're not having to get tens of millions of people's faces uploaded to the cloud, I think people would generally prefer that.

Yeah, that seems like a really big plus for this. One aspect that seems really interesting in this context, too: you talked about this idea of datasheets in your blog post and paper. So, just as background for everybody who's listening, datasheets are things that usually apply to

### Data sheets and ML sensors [24:45]

traditional sensors, and I'm sure Matt will correct me if I'm wrong: they basically give performance specs for the sensor. Here's what it can do well, here's what it can't do, and so on. And then there's this question of how you apply datasheets to ML sensors, when there's some interesting data processing happening on the device, and what those datasheets look like. Can you speak to that a little bit? How do you think about the datasheet ecosystem in this context?

Yeah, I could definitely go down a rabbit hole here. I've worked a lot with sensors, so I've read a lot of these datasheets and I'm kind of familiar with what they contain. But there's also a famous paper called "Datasheets for Datasets," which takes the idea of the sensor datasheet and starts applying it to data in general. And then I saw a paper called "The Dataset Nutrition Label," which tries to condense all of this information into a smart label, something you could communicate to a user. So you can imagine, if you're buying a TV and you see a symbol that says, hey, this has a camera in it, that would be a really useful piece of information, because you might not be okay with that. Or you might be flattered by the transparency of the manufacturer and say, okay, they don't just want to steal my information, they want to tell me. Or maybe you'd just say, I don't need that functionality. And so in the datasheet, we asked: how do we put this information in one condensed place, where we're combining the electrical characteristics you get in a normal sensor datasheet with the machine learning model and dataset characteristics you get from this idea of datasheets for datasets, and then also combine that with what the end user should care about? And so we came out with this idea that maybe each ML sensor should have
this datasheet. But then obviously, you get into the issue of: well, I could just lie about everything. I could say this doesn't have a camera in it, whatever; no one cares, no one's going to read this. And so then we said, well, maybe there needs to be some kind of regulation behind this, especially as you're really going to start seeing, in the next ten years, cameras everywhere in your house. It's already kind of happening, with things like smart mirrors. I've seen smart workout devices which have a bunch of cameras on them, looking at your poses and seeing if you're doing your downward dog in yoga correctly. And as these become more and more common, I think people are going to start to have a bit more resistance, and they would really like to have this sort of information. People haven't really started thinking about this right now, so you can imagine we could very easily get to a stage where it suddenly is a massive problem and no one's thought about it. It would be a much better position to say, okay, we've already thought about this, and we have this procedure in place, and maybe some mechanism in place as well for people to have some assurance. So there could be a third-party entity which does audits, in the same way that you might audit an algorithm for bias, or audit a company to make sure they're not cooking their books or something like that. And that would give assurance to people that, okay, this sensor only has this functionality, it's been verified by somebody else, so the manufacturer isn't lying, and I know this data is safe and secure and protected. I think that's probably the best we can do, but it's hard to say at this point.

One thing I'd imagine might be a challenge here, too, is the combinatoric explosion of possibilities, with every possible combination of algorithms and hardware.
And then you'd presumably have to measure their properties on a datasheet like this, so how would you avoid having to test almost every sensor variety for robustness and performance characteristics before it's shipped? Is that sensical?

There's something called MLPerf, which is a benchmarking framework, and the group I work with helped create a benchmark called MLPerf Tiny, which is basically the same idea. You take a bunch of standardized datasets, you say, make this very specific model, and we'll look at its performance. And it can assess hardware performance in terms of hardware characteristics: the number of inferences per second, or the energy consumption per inference, that kind of thing. So there are ways of benchmarking individual systems on the same task. In terms of a manufacturer that's selling their sensor ensuring that every single one is correct, that gets into more of an engineering and quality assurance problem, I guess. You could test a batch of them, every so many sensors or something, and send those to an auditor. Or you might just have a test suite: in the same way that you do a unit test on a software library, you might have a way of doing a unit test on one of these sensors, and just make sure the accuracy is within a certain band or something like that. So I think it's definitely feasible. Again, whether this is the best way, or a suitable way, I'm not sure, because again, this is a very new field and I'm just one person. I'm sure if you made an FDA for algorithms, they might have much more nuanced views than myself.

I was kind of thinking more, you know, what happens if a manufacturer says, hey, the end user can specify an algorithm that they want to slap onto this thing. Then the manufacturer might get into the business of pumping out a
whole bunch of different combinations of these things, perhaps more than they could test at scale. But, I mean, to your point, it's an engineering challenge more than anything.

Oh, wait, sorry: do you mean they could have one with, like, a hundred different networks on it that they're selling at the same time?

Yeah, exactly. I'm imagining, you know, a matrix of, hey, do you want a face detector, an age detector, or whatever, and that's just for when you have a camera on board. Then you change the hardware, and it's like, now we have a whole bunch of others. Especially if, as I imagine, things

### Customers with their own custom software [31:30]

continue you could imagine um customers actually installing their own custom software on these things in which case kind of assessing the performance characteristics of these things would well maybe the responsibility would then be delegated to the actual customer at that point because they're the ones with all the requirements but uh anyway so many directions that this thing could go it's hard to pin it down that's pretty fascinating actually yeah um I know one thing we were talking about a lot was uh you know we've mentioned the model performance and the data sets but I'd say probably the most important section of the data sheet that I kind of didn't really discuss is the end-to-end performance so if you have a person detector you might want to know okay well how does the accuracy depend on how far I am from the sensor because maybe I'm going to put it on a building and it's going to be looking 100 meters away and maybe it's going to be mostly operating at night time how does the sensor get affected by you know changes like that and so that would be the important section for those people and yeah you could imagine certainly like a manufacturer having a hundred different person detectors but they're probably going to pay for that in the sense that no one's going to know what to buy yeah you know what's it called the illusion of choice or something you have too many options yeah the embarrassment of riches in a way I guess the um yeah it all depends so much as well on like kind of the business model too like you know we just talked about this idea of like is it the company that's you know installing a bunch of pre-developed algorithms that they themselves can assure or you know maybe it's a kind of developer tool if you will a platform where other companies can order things install their own algorithms and then uh or who knows even their own hardware I have no idea this is the future where this is going to go but it kind of shows I think the
fact that you've identified this kind of um this division this natural separation between the sensor and kind of on-device sensing and the rest of the world uh the moment that it kind of clicks it really clicks and it's like okay this is going to become a thing and we do have to worry about what goes on that sensor the dynamics of like who's responsible for what components of it how do we assure performance and that sort of thing and then one thing that comes to mind in that context too is like adversarial attacks as well like do you have any sense of how you know adversarial AI obviously is done today do people find ways of defeating uh or deceiving AI systems big scale AI systems facial recognition that sort of thing do you think that that's going to change at all in the context of ml sensors does that introduce any new dimensions that are relevant for adversarial attacks you know if I'm totally honest I haven't thought about that too much um it certainly would be a problem I mean because adversarial attacks you can't really defend against too much unless you do sort of smoothing characteristics within the model landscape itself you know make sure that someone can't just like put a certain mask on and pretend to be Jeremy or something like that and fool this facial recognition software um but yeah you know that would be certainly something that once people are designing the ethical implications or maybe in the data sheet themselves they could even test you know maybe there needs to be some kind of uh challenge that someone creates where they're like okay we're going to test all of these different um algorithms or models or sensors and we're going to see how well they perform when we're trying to you know screw with them and that could even be a section of the data sheet who knows um I mean the one we propose is just an example of a data sheet but that's not to say that that's the final version or that all the sections are sort of
finalized in that way uh but yeah that's a super interesting question and I wish I had a better answer no I mean again as you say it's early days and adversarial attacks were a pretty niche field in any case the one thing that made me wonder about that was just like you know you imagine you embed the um the algorithm on the sensor essentially and now I would imagine it's easier for people to physically get access to the algorithm and then you get into extraction attacks basically like replicating the algorithm so that you can figure out what its weaknesses are um like is that something that would

### Access to the algorithm [36:00]

generally be easier if you know because the algorithm is physically accessible to downstream actors or is there a reason that's not the case yeah so I mentioned before with the like um the password example where you know you can take those hashes and you can run it through a hash table and you know it's going to take you forever and whatever but you could feasibly steal their passwords if you had the expertise and the time and it's the same with this if you really had the time and expertise you could probably you know do something finicky with a sensor and um screw with the internal structure of it but I think it would be difficult and there's probably ways of protecting against that I know we were discussing oh also you can also um imagine someone takes the uh the memory and they try and flash like a new model on it which you know would screw with the um with the model architecture especially if it's like a cascade of effects like maybe it controls your lighting heating um so you could probably do some funky things with that um but yeah so um you could probably defend against that if you did certain things like for example I proposed to these uh people that you could make it with ROM chips so you can't put any new model on it obviously there's limitations of that because you can't update the model either so if you do find a problem with it you're kind of just stuck with this broken uh model uh but yeah there's pros and cons like maybe the ROM way is the correct way or maybe uh maybe it's not it's kind of hard to say at this point actually one thing I wonder is sort of like looking at the future of this paradigm where do you see it being applied first like are there some areas where you see low-hanging fruit where like okay ml sensors would you know probably be a good fit for this use case yeah so um I mean we really think that uh the two main ones are going to be keyword spotting and person detection and so that's kind of
why we're making this open source person detection framework just because it's you know it's a very simple easy to understand example and there's plenty of things you could do with it I've mentioned the like lighting and heating you could link that to other systems which maybe like count how many people are in a room and maybe that could be useful for you know fire codes or predicting other aspects like that so I think those would be the main ones there's a lot of downstream things people could do with that if we release it as open source but one thing we were discussing is you know there's downsides uh to this and like if you have very niche applications maybe it makes more sense for a company to make their own system as opposed to sort of an open source ecosystem but then that leads to potentially problematic applications you know you could imagine you could use this same technology in the same way that you could use tiny ml um uh in a negative way to target specific uh groups for example or ethnicities um especially if you're putting this intelligent technology in things like bombs that only target specific types of people uh and you know I don't want to say it's an inevitability but that's just something with any technology it can be used in negative applications so um yeah obviously I hope no one does that but it's possible so it's one of the challenges actually like you know we often have this discussion uh with some of the folks I work with on the national security side where you're looking at you know what can actually be done we talk about weaponized drones things like that and you know it is frankly an inevitability as you say there's no incentive not to do these things and so it's a question of like how do we take the lead time that we have right now to look at these paradigms and say you know what are the principles of design that we want to inculcate what are the best practices what are the ethical questions
that are likely to come up in the future and I think that's a big part of the reason why this work is so important why it's so great that you're already putting your finger on this and sort of exploring the paradigm so thanks so much for exploring it with us here today and for sharing your thoughts this is a great third time around um by the way I just want to ask I know you guys have a website right mlsensors.org is that it yeah that's um kind of like our working

### Wrap-up [40:30]

group uh website if you like okay great so if people are interested in the concept you know want to learn more about it that's a great place to dig into it some more and there's a great blog post as well that uh Matt published on Towards Data Science uh so we'll share that here uh too is there anything else you wanted to share um yeah so I'd say you know this is kind of the brainchild of uh Pete Warden who is uh one of the creators of TensorFlow Lite Micro which is how you embed these models on sensor systems and he also has several blog posts on this for people who are still like you know what is an ml sensor or why is this important even and then if people really want to get deep in the weeds we have an arXiv paper and we have a few other things coming out um sort of later this year um and so I'll probably share those on social media if people are interested awesome so everybody stay tuned for that and I'll share Matt's Twitter handle and a bunch of these links in the blog post that'll come with the podcast Matt thanks so much again thanks Jeremy
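The core idea of the episode, that raw data stays inside the sensor and only a processed, low-risk output crosses to the rest of the system, can be sketched in a few lines. Everything here is invented for illustration (the class name, the brightness heuristic standing in for a real on-device person-detection network); it only shows the interface boundary the ML-sensor paradigm proposes, not any actual implementation.

```python
# Toy illustration of an ML sensor's privacy boundary: the raw frame never
# leaves the device object, and callers only ever see a high-level boolean.

class PersonDetectorSensor:
    """Hypothetical ML sensor: on-device model, high-level output only."""

    def __init__(self, threshold=0.5):
        self._threshold = threshold  # stand-in for a trained on-device model

    def _infer(self, frame):
        # Placeholder "model": mean-brightness heuristic instead of a real
        # person-detection network running on a microcontroller.
        return sum(frame) / len(frame)

    def read(self, frame):
        # Only a boolean crosses the wire; raw pixels are never exposed.
        return self._infer(frame) > self._threshold

sensor = PersonDetectorSensor()
print(sensor.read([0.9, 0.8, 0.7]))  # True: "person detected", no pixels shared
print(sensor.read([0.1, 0.0, 0.2]))  # False
```

Downstream systems such as lighting, heating, or occupancy counting would consume only `read()`-style outputs, which is what makes the sensor's data sheet, rather than its raw data stream, the thing to audit.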
