SIREN: Implicit Neural Representations with Periodic Activation Functions (Paper Explained)
56:05


Yannic Kilcher · 21.06.2020 · 49,169 views · 1,533 likes


Video description
Implicit neural representations are created when a neural network is used to represent a signal as a function. SIRENs are a particular type of INR that can be applied to a variety of signals, such as images, sound, or 3D shapes. This is an interesting departure from regular machine learning and required me to think differently.

OUTLINE:
0:00 - Intro & Overview
2:15 - Implicit Neural Representations
9:40 - Representing Images
14:30 - SIRENs
18:05 - Initialization
20:15 - Derivatives of SIRENs
23:05 - Poisson Image Reconstruction
28:20 - Poisson Image Editing
31:35 - Shapes with Signed Distance Functions
45:55 - Paper Website
48:55 - Other Applications
50:45 - Hypernetworks over SIRENs
54:30 - Broader Impact

Paper: https://arxiv.org/abs/2006.09661
Website: https://vsitzmann.github.io/siren/

Abstract: Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or Sirens, are ideally suited for representing complex natural signals and their derivatives. We analyze Siren activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how Sirens can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine Sirens with hypernetworks to learn priors over the space of Siren functions.

Authors: Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, Gordon Wetzstein

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Table of contents (13 segments)

Intro & Overview

Hi there. Today we're looking at "Implicit Neural Representations with Periodic Activation Functions" by Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. This paper is a bit of a special paper: if you're like me, coming from classic machine learning or deep learning, it requires you to think around your notion of what it means to handle data, and to think about data points, a bit differently. Essentially, what they're doing is representing signals, such as images, sound, waves in general, or point clouds, as functions mapping, for example, from their coordinates to their values, and we'll see what that entails. They're not the first ones to do this, but they manage to do it very well using these new models called SIRENs, which are basically neural networks that have sine waves as their nonlinearities instead of ReLUs or hyperbolic tangents. It turns out that if you initialize these very carefully, they can be made to capture these signals very well. That's the high-level overview, and we'll go through the paper from the perspective of someone who is not in this particular literature, so this is not going to be as in-depth or technical as usual, because I myself am not super familiar with this kind of literature on neural representations. If you approach this paper from a machine learning perspective, you're going to be super confused at the beginning, so I'm going to try to retrace the steps of my confusion. Okay, so I love how this

Implicit Neural Representations

paper starts out: "We are interested in a class of functions Φ that satisfy equations of the form…" Aren't we all. "We are interested in a class of functions" — okay, I've never particularly had many dreams about functions like this. So how can you look at this? We're interested in the relation between inputs and outputs. This here is the function: as you can see, it maps input to output. And we're also interested in its derivatives: first, second, third derivative, and so on. This function is what we're going to call an implicit representation, or a neural representation, I guess, if the function is a neural network. So far so good; you've seen things like this before: this could be a data point, and the function could map it to a label or something like that. Since we're going to represent images, you maybe already know GANs, generative adversarial networks, where this here is the latent vector and a neural network maps that latent vector to an image, so it produces an image. This here is quite similar, but not quite. In a GAN, I guess this would count as the representation — the continuous representation of this picture — and we learn the function Φ from data such that if I plug in one particular vector I get one particular image, and if I plug in another vector I get another image, and the function always stays the same. Here, by contrast, it's going to be one function per image: for each image, the function is the image. So how is a function an image? If I have an image, it's made of pixels, right? Each pixel has an x and a y coordinate — let's call them x1 and x2 — and each pixel also has a color value, which is three-dimensional: an RGB
color value. So technically, an image is a function from coordinates to pixel values. If my image is represented by a function, then if I input any coordinates, like (3, 4), that function should return the RGB values at that location — maybe they're 0.5, 0.7, and 0.1. So now the goal is to have this function be a neural network, a multi-layer perceptron — I think they always use five-layer MLPs, so really simple networks. You have two input neurons, where one gets the 3 and one gets the 4; this travels through the network; and at the end the network has three output nodes, which should output the 0.5, 0.7, 0.1. You train this network to map input to output — coordinates to values — and of course this is one particular image, so you're going to have one neural network per image. Now you might reasonably ask: why do we do it like this? Why don't we just save the image as the pixel values? Why do we need a function mapping coordinates to pixels? That's valid, and the image is just one example of this, but one advantage you immediately get is a continuous representation. If you store an image as pixels, you only know its value at each of the pixel locations; if you store it as a function, you know its value at any continuous in-between location. You can ask the network: what's the pixel value at (3.2, 4.1)? It will give you an answer, and if the network is trained well, it will give you an answer that makes sense — the exact color at that sub-pixel location. So far so good. Essentially this boils down not to a machine learning problem in the classic sense but to an
optimization problem, because all you have to do is make the neural network match all inputs to all outputs. There isn't really a training set and a test set here; your dataset is going to be all the pixels in the image. Each pixel in the image is one data point: an (x, y) → RGB pair. And the way they train these networks on these pixel examples: they simply sample a mini-batch of pixels — this one, this one, this one — use that mini-batch to do one training step, then sample another mini-batch, and so on. You might sample the same pixels multiple times, but ultimately what you want is a continuous representation of the image. This is not a new idea; it has been around, and they cite a lot of prior literature on it. What's new is that they say: these other representations — a neural network used in the classic sense, trained with mini-batches like this — end up giving you a bad image. Once you've trained the network, you can try to reproduce the image by querying each pixel location: you ask it, what's at (0, 1)? What's at (0, 2)? What's at (0, 3)? And you fill in the picture. That usually gives you very bad outcomes — or so they claim; I haven't checked it myself.
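The setup just described — the image itself is the entire dataset, one (coordinate → RGB) pair per pixel, trained by repeatedly sampling mini-batches of pixels — can be sketched like this (the 2×2 "image", the seed, and the batch size are invented for illustration):

```python
import random

# A hypothetical 2x2 "image": (x, y) coordinates -> RGB values in [0, 1].
image = {
    (0, 0): (0.5, 0.7, 0.1), (0, 1): (0.2, 0.9, 0.4),
    (1, 0): (0.8, 0.3, 0.6), (1, 1): (0.1, 0.1, 0.9),
}

# The entire dataset is just this one image: one (x, y) -> (r, g, b) pair
# per pixel; there is no separate train/test split.
dataset = [(xy, rgb) for xy, rgb in image.items()]

# One training step would use a mini-batch of pixels; the same pixel may
# be drawn again in later steps.
random.seed(0)
minibatch = random.sample(dataset, 2)
print(len(dataset), len(minibatch))
```

A network fit this way can then be queried at sub-pixel coordinates like (0.5, 0.5), which the discrete pixel grid cannot answer directly.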

Representing Images

But you can see it right here: this is the ground truth, and here you have a network parameterized with ReLU nonlinearities. As you can see, the ReLU network misses a lot of the higher-definition detail in the image, so how well a neural network can represent these things depends on the architecture you use. Again, you kind of need to forget what you know about classic machine learning, because I can still see people saying "just use a GAN or something" — yes, valid point, but we're in the business right now of solving this particular problem, and as we'll see, it's not just about images; images are just a nice example of a natural signal. The tanh networks, you can see, fail even harder; they have these artifacts back here. It gets better when you use ReLU networks with what is called a positional encoding: your x and y coordinates don't just go through a ReLU network, they first go through a positional encoding, very much like in a transformer. If you watch my video about "Attention Is All You Need," I explain how positional encodings work there, but basically you map the coordinates to sine and cosine waves — something like sin(10x), then sin(100x), and so on, and the same for y — which gives you more features the function can use to represent positions much better than the raw x and y coordinates. If you do that, you recover some of the image. They also analyze the derivatives: this is the ground truth, this is the gradient of the ground truth — basically a Sobel filter, an edge-detector, color-gradient thing — and this here is the second derivative, the Laplacian of the image.
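The positional-encoding trick described above can be sketched as follows — a minimal version that maps each coordinate through sines and cosines at a few frequencies (the exact frequencies used by the compared baselines are an assumption here, not taken from the paper):

```python
import math

def positional_encoding(x, y, freqs=(1.0, 10.0, 100.0)):
    # Map raw (x, y) coordinates to a richer feature vector of sine and
    # cosine waves at several frequencies, transformer-style. The network
    # then sees these features instead of (or alongside) the raw coordinates.
    feats = []
    for f in freqs:
        for v in (x, y):
            feats.append(math.sin(f * v))
            feats.append(math.cos(f * v))
    return feats

feats = positional_encoding(0.3, -0.7)
print(len(feats))  # 3 frequencies * 2 coordinates * (sin, cos) = 12
```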
Ideally, if your implicit representation models the signal very well, it should also model the derivatives of the signal very well. Now we're connecting back to what we saw at the beginning: these SIREN networks are specifically designed to not only match the signal itself but also match its derivatives. In an image, matching the derivatives might not seem so important — though it is: there are small things, like the grass not being as well represented here, and artifacts you see in the gradient — it might not matter as much for images in terms of human vision, but for many signals it is important to match the derivatives too. And with the SIREN, even though it's trained on the image itself, you can see that its derivatives are very much in line with the original signal. Simply by matching the signal, this architecture manages to also capture the signal's derivatives, and therefore gives a more faithful representation. So that was the positional encoding; "ReLU" is simply the plain ReLU network, and I think somewhere in here there is also an RBF kernel. If you young kids don't know what an RBF kernel is — I don't want to dunk on anyone — basically you map the input to an infinite-dimensional space using Gaussian kernels; maybe Wikipedia explains that better than I can. So, SIRENs: what do they do in order to capture a signal so well? How is a SIREN

SIRENs

different from, say, an RBF network? The answer is pretty simple. The architecture of a SIREN — does the N already stand for "network"? I'm not sure, honestly; maybe we'll find out. Yes: it's "sinusoidal representation networks," so the N is "networks," which means we don't say "SIREN network," we just say SIREN. And a SIREN is simply, what is it here, a multi-layer perceptron, basically. This here is the network; this is the final layer, which is a linear layer; before that you have all these layers, not concatenated but following each other. So it's a pretty regular multi-layer perceptron, and each layer is made up like this: you take an input, multiply it by a weight matrix, add a bias, and then put it through a sine wave. The sine is really the only change from a normal MLP — usually here you'd have something like a sigmoid or a ReLU. Now you have a sine wave, and that's a bit weird, right? A ReLU looks like this: it has this center point where it switches, but it's linear and monotonic on one side and constant on the other. Even a sigmoid — the sigmoid is like this: constant here, then monotonic, then constant again. We're used to monotonic activation functions, whereas a sine wave is not monotonic at all: if you want to increase your function value, and you're here going up the hill, and you take a step that's too large, you end up down the hill again. But it turns out these networks have some particularly good properties if you want to capture natural signals, and some bad properties, namely that they are periodic and come back down. The reason they get around the bad properties is, or so they claim, that they
initialize the network in a very particular fashion. I think when I started in deep learning I had this idea — and a lot of other people must have had it too — of "hey, what if I just replace the nonlinearity with a sine function?", then tried it, it didn't really work, and I scrapped it. Now, this paper isn't simply doing that replacement; it's also using the neural network for something completely different than I would have, namely to learn these implicit representations rather than to fit a dataset in the usual sense. But still, it seems like you need to

Initialization

initialize them with very careful consideration, and we'll get to that right now. Actually, they just describe it — it's not very interesting, but you need to sample the weights from a particular uniform distribution, and they have a proof in the supplementary material showing why. Here: "we propose to draw weights with c = 6 such that wᵢ ~ U(−√(6/n), √(6/n)). This ensures that the input to each sine activation is normally distributed with a standard deviation of one. Since only a few weights have a magnitude larger than π, the frequency throughout the sine network grows only slowly. Finally, we propose to initialize the first layer of the sine network with weights so that the sine function spans multiple periods over [−1, 1]. We found ω₀ = 30 to work well for all the applications in this work. The proposed initialization scheme yielded fast and robust convergence using the Adam optimizer for all experiments in this work." So the initialization takes a fairly prominent place in this paper, which tells me maybe they spent a lot of time working on it — and if so, that's to their credit, because I guess most people, like me, would try something like this, realize after a while that it doesn't work, and give up. To be so convinced, and to go and really figure out how these need to be initialized to make them work — while knowing there's still a 99% chance it won't — is quite respectable. Then again, it might have been really different; this might have been the first thing they thought of, and it just worked.
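The quoted scheme can be sketched in code. This is a hedged sketch following one common convention (the ω₀ factor multiplies the pre-activation, and hidden-layer weights are scaled down by ω₀ accordingly); treat the exact factoring as an assumption rather than the paper's definitive implementation:

```python
import numpy as np

def siren_init(fan_in, fan_out, w0=30.0, first_layer=False, rng=None):
    # Hidden layers: draw from U(-sqrt(6/fan_in), sqrt(6/fan_in)) / w0 so that
    # w0 * (W x + b) is roughly standard normal and frequencies grow slowly.
    # First layer: U(-1/fan_in, 1/fan_in); the w0 = 30 factor then makes the
    # sine span multiple periods over inputs in [-1, 1].
    rng = rng or np.random.default_rng(0)
    bound = (1.0 / fan_in) if first_layer else np.sqrt(6.0 / fan_in) / w0
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

def siren_layer(x, W, b, w0=30.0):
    # One SIREN layer: an ordinary affine map followed by a sine.
    return np.sin(w0 * (x @ W + b))

rng = np.random.default_rng(0)
coords = rng.uniform(-1, 1, size=(8, 2))            # batch of (x, y) inputs
W1 = siren_init(2, 64, first_layer=True, rng=rng)
h = siren_layer(coords, W1, np.zeros(64))
print(h.shape)                                       # activations all in [-1, 1]
```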

Derivatives of SIRENs

Okay, so what is the deal with all these derivatives? This network has sine waves as its nonlinearities — so what is the first derivative of the network with respect to its input? The cool thing is this: what's the first derivative of a sine wave? It's a cosine, which is just a sine that's phase-shifted. And the next derivative is again a shifted sine, and so on. So the derivative of a SIREN is a SIREN, and that does not hold for any of these other nonlinearities. For ReLUs: the derivative of a ReLU is a constant zero here and a constant one here, and if I take the derivative again, it's simply the constant zero function. All these other nonlinearities have derivatives that are different from themselves, and since we want to match not only a signal but also the signal's derivatives, this property of SIRENs comes in very, very handy. So how do you train a SIREN? We've already alluded to it for matching an image, where you simply train the coordinate-to-RGB mapping on pixel values. But there's more you can do, given that the derivatives of SIRENs are also SIRENs. With the image, we basically neglected all of this and said: we want a relationship between the input x and the output. What we can also do is say: no, we want a relationship between the input and the signal's first derivative, and not even have the signal itself be part of the loss function — and then see what comes out. That's what they do. Can I find it? Right here — okay.
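The key fact — differentiating a sine just phase-shifts it, so the derivative of a SIREN is again a SIREN — is easy to check numerically (a finite difference stands in for autodiff here):

```python
import math

def num_deriv(f, x, h=1e-6):
    # Central finite difference as a stand-in for automatic differentiation.
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
d = num_deriv(math.sin, x)

# d/dx sin(x) = cos(x) = sin(x + pi/2): a phase-shifted sine, not a new
# kind of function. Repeating the derivative keeps shifting the phase.
assert abs(d - math.cos(x)) < 1e-5
assert abs(d - math.sin(x + math.pi / 2)) < 1e-5
print("derivative of sin is a shifted sin")
```

By contrast, the second derivative of a ReLU network is zero almost everywhere, which is why those baselines have nothing to offer when a loss involves a Laplacian.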

Poisson Image Reconstruction

So here you see the ground truth image, and this is its gradient and its Laplacian. We've already seen that we can fit the image itself — but what if we just fit the first derivative? We input this gradient image into the SIREN setup. The SIREN still maps x and y to RGB, but the loss function isn't going to compare its output to RGB values; the loss function depends on the gradient of the function. The loss is something like the gradient of the image, ∇I, minus the gradient of the function we fit, ∇Φ. Because we have auto-differentiation tools now, we can easily turn this into a loss function: we are looking for the function whose gradient matches the gradient of the image. Again you could say: why not just match the image itself? And again, it's not about "why can't we just" — it's about demonstrating the power of these networks. If you only match the gradients — note that you still train the weights of the function itself; only the loss depends on the function's gradient — and then ask the function to produce the image by cycling over each coordinate, you'll find that, look at that: just by matching the gradient, you match the image itself pretty well. And that's pretty cool. Now, of course you're not going to match the exact values — this is a grayscale image — and there's a reason for that: the gradient loses the constant bias information. If you matched an RGB image this way, I'm going to guess you'd get color distortions; here what you get is just distortions
in luminosity. If you have the derivative of a function and you want to recover the function itself by integrating, the solution is always an entire space of functions: when you integrate, you have to add a constant, and you don't know what the constant was in the original function, because differentiating makes the constant drop away. Similarly here, we'd expect the image we get back to be faithful with respect to its edges — since we're matching the gradient, and the gradient is basically an edge detector, we'll match the edge information of the picture, which you can clearly see — but we'd expect some difference in overall luminosity. I don't even know exactly how they handled this, because they have to choose a constant to add; maybe they chose it in some way, or maybe they just let the network decide. But this is still pretty impressive: some detail is missing, but not much. And you can do the exact same thing for the second derivative: now you match the Laplacian of the image — and remember, ReLU networks don't even have a Laplacian, it's constant zero, so this is something you could never do with them — and the resulting image is still pretty good. It's now missing the constant information in the zeroth and first derivatives, and still the reconstruction is pretty good. So this demonstrates the power of these networks. Again, there is no big dataset: our entire dataset is just this image. It's the dataset and the test sample at the same time — I guess you can consider the Laplacian the dataset, and then
the actual image is the test sample — like the label, or something like this.
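The "fit only the derivative, recover the signal up to a constant" argument can be seen in a one-dimensional toy version, with finite differences standing in for the image gradient (the five-sample "signal" is invented for the demo):

```python
import numpy as np

# "Ground truth" 1-D signal and its gradient (finite differences).
signal = np.array([0.1, 0.4, 0.9, 0.7, 0.2])
grad = np.diff(signal)            # the only data the loss gets to see

# Integrating the gradient recovers the signal up to an unknown additive
# constant -- the analogue of the luminosity offset in the paper's figures.
# The constant of integration is arbitrarily chosen as 0 here.
recon = np.concatenate([[0.0], np.cumsum(grad)])

offset = recon - signal
print(bool(np.allclose(offset, offset[0])))   # the error is a pure constant shift
```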

Poisson Image Editing

So what does that buy you? Here's something you can do if you want to mix two images. If you want to mix this and this, you could linearly interpolate, but that would not be great: here you have a lot of very bright pixels, with values near 1, and here you have dark pixels with values close to zero, and if you simply add them together and divide by two, you get a wash of the two — you'd wash out the bear, because pixel values from here would carry over. Generally, it's not a good idea to mix images like this. With GANs we could do it, but we'd need a training dataset and so on. Here, instead, we simply take the gradient of each image and add the two gradient maps. What does that do? On the left is the composite gradient. In the sky there is no gradient information in this image, because it's just a flat patch of sky, and down here there's not that much gradient information either — there's a bit, but not here, where the bear's head is. If you want to mix images, it can be a good idea to mix their gradients, because generally the information in an image is where the gradients are. So we'd expect the composite gradient to carry over this portion, maybe a bit of this portion, this portion, and this portion — everything where the signal is not flat. Here you see the composite gradient, and if we again fit our function so that its gradient matches this mixed gradient, then this is the gradient of the function we fit, and this is the actual function. And it's pretty good: it mixed everywhere there was gradient, and this is now just
reconstructed from this gradient. As I understand it, there is no pixel information carried over from either of those images: the gradients are simply added, the mixed gradient is fit, and then the function is asked to output a pixel value at each location. And that's that. So this is just something you can play around with, but they do other, more interesting things — for example, representing shapes with signed distance functions.
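Gradient-domain compositing, the trick behind the bear/sky example, also has a small 1-D analogue: add the gradients of two signals, integrate back, and the edges of both survive while flat regions contribute nothing. (The two toy "signals" are invented for the demo.)

```python
import numpy as np

a = np.array([0.0, 0.0, 1.0, 1.0, 1.0])   # signal with one strong edge
b = np.array([0.5, 0.5, 0.5, 0.8, 0.8])   # signal with a smaller edge elsewhere

# Mix in the gradient domain: flat regions have zero gradient and so
# contribute nothing; only the edges of each signal carry over.
mixed_grad = np.diff(a) + np.diff(b)

# Reconstruct by integration (constant of integration chosen as 0).
recon = np.concatenate([[0.0], np.cumsum(mixed_grad)])
print(recon)   # both edges appear in the single reconstructed signal
```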

Shapes with Signed Distance Functions

So, if you go over the actual formulation of their loss function — we haven't quite done this yet — it's stated in a very complicated way, but ultimately what it means is this. A component here is these constraints Cₘ; the loss function operates on these constraints, and each constraint relates a(x) — basically anything depending on the input itself — the output of the function, its gradient, its second derivative, its third derivative, and so on. So these SIRENs can fit anything you can formulate as a set of constraints relating the input of the function to its output or to any of its derivatives. We've already seen this: when we fit an image, our only constraint is that the coordinates are mapped to the RGB values of the original image; when we match gradients, we don't care about that and only care about the relation between the input and the first derivative. The loss function is literally just: over the entire signal space — in our case, the entire image — we want these constraints to hold, or to be as small as possible; the constraints are always formulated so that they equal zero when fulfilled. For example, the L2 loss between the RGB values of the true image and the RGB values you fit would be one such constraint, and of course the more differentiable you make it, the easier the network can fit it — that's why there is this norm here. But it's not that complicated: whatever you can formulate as a constraint relating the inputs to the outputs, or any of the derivatives, of this implicit representation — that is the loss function. All right. The next interesting thing we can do, as I said, is represent shapes with signed distance functions. We're going to go slowly;
it's not that hard. "Inspired by recent work on shape representation with differentiable signed distance functions (SDFs), we fit SDFs directly on oriented point clouds using both ReLU-based implicit neural representations and SIRENs." Okay, so what is an SDF, a signed distance function? That's pretty easy: a signed distance function is simply a distance function with a sign. If you have a boundary somewhere between things, then any point has a distance to the boundary; a signed distance function simply means each point also gets a sign: all the things on one side of the boundary get a plus, say, and all the things on the other side get a minus. So even though two points could be the same distance from the boundary, one is +5 away and one is −5 away. This is useful, for example, when you fit point clouds, as they do here — usually in 3D space. With a point cloud, you have points, and you know the points should represent some kind of shape, maybe a wall. They have these room interiors, as you can see: this is a 3D scene, but you only have a point cloud of it. Maybe you were in the room and put up a laser scanner — I have no clue what a laser scanner actually looks like — which shoots lasers at various locations and measures the distances, and that's how you end up with a point cloud: you know, in 3D space, where the laser hit something. The reasonable assumption, if you have a dense sampling, is that you should be able to connect those points in some way to obtain the actual continuous shape of the thing you measured.
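For a concrete picture, here is about the simplest possible signed distance function — the distance to a sphere in 3-D — together with a numerical check of the eikonal property discussed below, that a true SDF has gradient norm 1 away from the surface:

```python
import math

def sdf_sphere(x, y, z, r=1.0):
    # Signed distance to a sphere of radius r centered at the origin:
    # negative inside, zero on the surface, positive outside.
    return math.sqrt(x*x + y*y + z*z) - r

assert sdf_sphere(0.0, 0.0, 0.0) == -1.0   # center: inside, distance 1
assert sdf_sphere(1.0, 0.0, 0.0) == 0.0    # on the surface
assert sdf_sphere(2.0, 0.0, 0.0) == 1.0    # outside

def grad_fd(f, p, h=1e-6):
    # Finite-difference gradient of f at point p.
    g = []
    for i in range(len(p)):
        hi = [c + h if j == i else c for j, c in enumerate(p)]
        lo = [c - h if j == i else c for j, c in enumerate(p)]
        g.append((f(*hi) - f(*lo)) / (2 * h))
    return g

# Eikonal property: ||grad f|| = 1 at an off-surface point.
g = grad_fd(sdf_sphere, (0.5, 0.7, -0.3))
grad_norm = math.sqrt(sum(v * v for v in g))
assert abs(grad_norm - 1.0) < 1e-4
print("eikonal check passed")
```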
And that's what we're going to try to do with these SIRENs: go from point clouds to shapes by training an implicit representation. We'll train a neural network that represents the shape by mapping coordinates to signed distance values: whenever we ask the network "at this location, what's the signed distance?", it should tell us "+5", or at that location, "0". So we train a neural network to do that — and okay, this is a bit more complicated. Since we have the power of the SIRENs, we can also add more constraints. "This amounts to solving a particular eikonal boundary value problem that constrains the norm of spatial gradients to be 1 almost everywhere." This eikonal boundary value problem — this is a property of signed distance functions: the norm of the gradient with respect to the input is 1 almost everywhere, which I guess means everywhere except at the boundary itself, where the distance is zero — though I could be wrong. "Note that ReLU networks are seemingly ideal for representing SDFs, as their gradients are locally constant and their second derivatives are zero. Adequate training procedures for working directly with point clouds were described in prior work. We fit a SIREN to an oriented point cloud using a loss of the form" — and now we look at the loss. The first thing you observe is that it's made of three different integrals, which simply means they partition the space into different regions, so to say. The first region is the boundary itself: wherever a point from the point cloud hit, that's going to be your Ω₀,
Omega without the Omega_0. So you're going to have different constraints for all of these things. For example, and I have to pay attention that I don't say anything wrong, you will have this constraint on the gradient. (My tablet... I mean, maybe I'll start monetizing just so I can get a new tablet.) Okay, so this condition right here says that the gradient should be 1, and that's actually everywhere, so I was wrong that the gradient is only 1 outside the boundary. Then you can see right here, the last part is all the points that are not on the boundary, since our network maps any point in 3D space to a signed distance value, so most of these points aren't going to be on the boundary itself, even though in the mini-batch where they train, they sample points on and off the boundary at equal rates, just to make the network train more stably. So this is a condition on all the points off of the boundary, and they say here: this function, this exponential function with alpha larger than 1, penalizes off-surface points for creating SDF values close to zero. So this is simply a regularizer that says whenever I input coordinates that are far away from the boundary, from the surface, then there should be a large signed distance value; it should not be close to zero, because it's away from the boundary. And in practice, how you're going to train this is: if you have a point cloud and your coordinates are far away from the nearest point, then this should be a high value, otherwise the network is penalized. So we have this condition right here on the gradients, which we know signed distance functions should fulfill; we have this thing right here, which is a regularizer, basically telling points far away from our data that they should have a high distance value; and then we have this last thing right here, which is for all the points on the surface itself. Here's what we require: first of all, we require
the output to be zero, or close to zero. This is the loss function, we want to minimize this, and this is simply the output value, so the signed distance value of points on the surface, the things we actually measured, should be zero, because the signed distance function measures how far away from the surface you are. So this is pretty intuitive. But then also, this right here says that the gradient of the signed distance function and the normal vector at that point should align. And that basically means, and I think this is because we have an oriented point cloud, yes, so what we can do is we can kind of connect points next to each other and then calculate the normal vectors from that. And if we ask the network, hey, what do you think about this position right here? the network should tell us, first of all, the signed distance should be zero, because it's on the boundary; second of all, the norm of the gradient of the signed distance function at that point should be 1, because that's a property of signed distance functions; and third, and that's the new thing, the gradient of the signed distance function should align with this normal vector. And that's pretty intuitive, because you want the signed distance function to increase in value along the normal direction; the gradient basically tells you where the highest increase in value of the function is, and you want it to increase along the normal direction and not along any other direction. So that's a pretty good constraint to have. You can see right here, I mean, you don't really have to understand exactly about signed distance functions and so on, but these SIRENs are pretty good at capturing all of these different constraints. And this was for a point on the surface; for points off the surface, we additionally say: hey, you should have a pretty high value, actually not a zero value but a pretty high value. And again, we only fit one particular scene, we only ever fit
one scene with an entire network. So the entire neural network, this whole structure right here, everything is captured by this one neural network that we trained on the point cloud. And you can see that if you use a ReLU, what you'll get is super wobbly, because even if you train the ReLU with the same loss function, these constraints on the gradients are just not going to work out with the ReLU, because its gradients are piecewise constant and discontinuous, whereas the SIREN can basically fulfill all of these constraints on the different parts of the loss function, on the values and on the gradients. And they have another example right here where they fit this shape. You see that the details are preserved way better, whereas the ReLUs simply kind of flatten over everything and make it wobbly. All right, so I hope this sort of made sense, and we'll go to the last thing right now.
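The three loss terms walked through above (eikonal constraint everywhere, value and normal alignment on the surface, exponential penalty off the surface) can be sketched numerically. This is my own illustration, not the paper's code; it uses finite differences in place of automatic differentiation, and checks the loss against the exact SDF of a unit sphere, where every term should be near zero.

```python
import numpy as np

# Finite-difference gradient of a scalar field f at points x of shape (N, 3).
def grad_fd(f, x, eps=1e-4):
    g = np.zeros_like(x)
    for i in range(x.shape[1]):
        d = np.zeros(x.shape[1]); d[i] = eps
        g[:, i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

# Sketch of the three-part point-cloud loss: eikonal term everywhere,
# value + normal-alignment terms on the surface, and an exponential
# penalty pushing off-surface SDF values away from zero.
def sdf_loss(f, on_pts, normals, off_pts, alpha=100.0):
    g_on, g_off = grad_fd(f, on_pts), grad_fd(f, off_pts)
    eikonal = (np.abs(np.linalg.norm(g_on, axis=1) - 1).mean()
               + np.abs(np.linalg.norm(g_off, axis=1) - 1).mean())
    cos = (g_on * normals).sum(1) / np.linalg.norm(g_on, axis=1)
    on_surface = np.abs(f(on_pts)).mean() + (1 - cos).mean()
    off_surface = np.exp(-alpha * np.abs(f(off_pts))).mean()
    return eikonal + on_surface + off_surface

# Sanity check with the exact SDF of a unit sphere: on-surface normals
# of the sphere are just the points themselves, so every term is ~0.
f = lambda p: np.linalg.norm(p, axis=1) - 1.0
on = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
off = np.array([[3.0, 0.0, 0.0]])
loss = sdf_loss(f, on, normals=on, off_pts=off)
```

During actual training, `f` is the SIREN and the loss is minimized over its parameters; any imperfect SDF would score strictly higher here.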

Paper Website

My tablet is restarting, so I want to show you the website they have for this; it's a pretty cool website to go along with the paper. As you can see right here, they have all the samples from the paper, but also in animated format. This is the fitting process, the learning process of how you represent these images. As I said, you want to fit these functions to the ground truth, and that happens in steps, so this is very much like how you would learn a deep learning function, and I think they use the Adam optimizer; it's just that the data set now all comes from this one ground truth image. And you can see that the SIREN network on the right pretty quickly zeroes in on the image and then gets the details subsequently. They also represent audio with this, and you can watch that; they represent video and compare that to ReLU representations. Then here, solving the Poisson equation is where you only fit the gradients or the Laplacian of an image and still get out the good image, which is pretty cool. And here you can see that you can actually play around with these things, so you can click on them and look at the learned result. On the left you can see what the SIREN network learned, and I must scroll down here a bit, and on the right is a ReLU representation of the same thing. So this is the same network with the same objective, it just has ReLUs instead of sine waves as activation functions, and you can see how much of a difference that makes. In the middle is a ReLU with positional encodings, still not as good. The only thing right here that you have to think of: if you look at how big these SIRENs are, how many parameters they have, they're about at the order of magnitude of how many pixels there are in the image. So it's certainly a method, but it's not like the implicit representation here is very good at generalizing. Though it would be very cool to see what happens outside, right, if you
because now you can input any XY coordinates, so technically you could continue the picture to the bottom and just see what the SIREN thinks should be there. All of these things would be pretty cool to actually experiment with, and they have the code available to do that. You can also see the fitting process of the Helmholtz equation right here, and related projects. A pretty cool website, I definitely recommend you check it out. And let's go back to the paper. And we're back, and my tablet

Other Applications

crashed. Let's continue. So they're now going on to use SIRENs in order to solve PDEs. In physics you often have these problems where you are given an equation, but the equation doesn't necessarily involve the function itself; it only involves derivatives of that function, or relates derivatives to the function, and so on. One example here is this Helmholtz equation, given as this, where I think the f is a known function, but this is the wavefield you want to figure out, which is unknown, and then this H_m includes, for example, this right here, which is the Laplace operator. So you're given the relation between the function and the Laplace operator of the wave that you want to find, and your task is to recover the wave. Now I don't want to go very much into this right here, but what you can do is, basically, you can have a room and you can have measurements of the wave or of its derivatives and so on, and then you kind of calculate backwards from the measurements to what the actual wave was, and these SIRENs turn out to be very good at things like this. I guess that's this solving-for-the-wavefield thing, but essentially what this amounts to is a numerical solution of these partial differential equations in physics using these SIRENs, and that's pretty cool.
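To see what "relating a function to its Laplacian" means here, a small 1D check of my own (not from the paper): u(x) = sin(kx) solves the homogeneous Helmholtz equation u'' + k²u = 0, which we can verify with a finite-difference Laplacian. A SIREN-based solver inverts this direction, fitting a network so that the PDE residual (plus measurement terms) is driven to zero.

```python
import numpy as np

# u(x) = sin(k x) satisfies the homogeneous 1D Helmholtz equation
# u'' + k^2 u = 0; verify with a second-order finite-difference Laplacian.
k = 3.0
x = np.linspace(0.0, 2.0 * np.pi, 2001)
h = x[1] - x[0]
u = np.sin(k * x)

# central second difference on interior points, accurate to O(h^2)
lap = (u[:-2] - 2.0 * u[1:-1] + u[2:]) / h**2
residual = lap + k**2 * u[1:-1]   # should be ~0 everywhere

max_residual = np.abs(residual).max()
```

A SIREN solving this problem would minimize exactly such a residual, but evaluated with the network's own (exact, analytic) derivatives rather than finite differences.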

Hypernetworks over SIRENs

The last thing they do, and this gets back to more of a machine learning context, is what they call learning a space of implicit functions. So they go ahead and say: we can represent images in terms of these functions, but each image is basically its own function, so its own optimization, a fitting problem. Can we somehow learn functions of functions? This comes back to more of a machine learning context, where you have a network right here that gives you the parameters of the SIREN. So let's go to an example. In this example, you have an image like this one, where a few pixels are masked, actually most of the pixels are masked, and you put this into a CNN, and the CNN should output the parameters of the SIREN network. The parameters, because the SIREN network, given its parameters, is the image itself: the SIREN is the image if you know its parameters. So here you train a CNN to give you the parameters of the SIREN. That's almost the same as training a CNN to give you the image directly, but again, we don't want to have the explicit representation of an image, we want the implicit representation, such that it's continuous and we can manipulate it, and so on. So the CNN is now trained on a data set: you take CIFAR-10 and you construct a whole bunch of images with only about a hundred pixels remaining, and then you train a CNN to give you the parameters of the SIREN that would reconstruct the ground truth. And then you can test that on the test images, and you can see right here the results are pretty good. So these are test samples, images that were not seen during training of this CNN, and therefore the resulting SIREN also hasn't seen those images; the SIREN is simply parameterized by the CNN. You can see this works pretty well. Even if you have only ten pixels, you already get something out of it, and if you have a
hundred pixels, you already get fairly close to the ground truth. Now these are not GAN-quality images, of course, but it's pretty impressive to see that an implicit parameterization, an implicit representation of the images, can be so powerful. Yeah, so this is a pretty cool thing, and again, it's kind of more back to the machine learning framework that you're used to, because there's a train and a test data set, and now the only thing is that the output is a function, given by its parameters, and not the actual pixel values. Okay, so that's
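The hypernetwork idea above can be sketched in a few lines. This is a toy illustration of my own, not the paper's architecture: a random linear map stands in for the trained CNN, and it emits a flat parameter vector that is reshaped into the weights of a tiny two-layer SIREN. All names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny SIREN: one sine layer plus a linear output layer, with its
# parameters supplied from outside rather than stored in the network.
def siren_forward(params, coords, w0=30.0):
    W1, b1, W2, b2 = params
    h = np.sin(w0 * (coords @ W1 + b1))
    return h @ W2 + b2            # one grayscale value per coordinate

# Stand-in "hypernetwork": any map from an observation (e.g. a masked
# image) to a flat parameter vector; here just a fixed random linear map.
def hypernetwork(observation, in_dim=2, hidden=16):
    n = in_dim * hidden + hidden + hidden * 1 + 1
    theta = 0.01 * (rng.standard_normal((observation.size, n)).T
                    @ observation.ravel())
    W1 = theta[:in_dim * hidden].reshape(in_dim, hidden)
    b1 = theta[in_dim * hidden:in_dim * hidden + hidden]
    W2 = theta[in_dim * hidden + hidden:-1].reshape(hidden, 1)
    b2 = theta[-1:]
    return W1, b1, W2, b2

masked_image = rng.standard_normal((8, 8))     # hypothetical sparse observation
params = hypernetwork(masked_image)
xy = np.stack(np.meshgrid(np.linspace(-1, 1, 8),
                          np.linspace(-1, 1, 8)), -1).reshape(-1, 2)
recon = siren_forward(params, xy)              # the image implied by the params
```

In the paper the hypernetwork is a trained CNN and the loss compares `recon` to the ground-truth image, so the CNN learns to emit SIREN parameters that reconstruct unseen images.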

Broader Impact

Actually, let's look at the broader impact statement: The proposed SIREN representation enables accurate representations of natural signals, such as images, audio, and video, in a deep learning framework. This may be an enabler for downstream tasks involving such signals, such as classification for images or speech-to-text systems for audio. Such applications may be leveraged for both positive and negative ends. SIRENs may in the future further enable novel approaches to the generation of such signals. This has potential for misuse in impersonating actors without their consent. For an in-depth discussion of so-called deepfakes, we refer the reader to a recent review article on neural rendering. This has like no perplexity at all; like, is anyone benefited by this? Seriously? Okay, but at least we made the authors think of the consequences of their research. Yeah, so I invite you to check out this paper; maybe with this video you can now follow a bit better what happens here. This is a different paradigm of research, it's a cool paradigm, it's away from your usual machine learning framework. I'm excited to see what happens next in this area. I also invite you to check out the website, they have lots of videos and goodies and so on, and with that, bye.
