adversarially generated images based on double exposures of pictures of masks from books on masks from all over the world
Michael Etzensperger cpress
Generative Adversarial Networks (GANs) can do a lot of really cool things; people have used them for all kinds of things. Like, you know, you draw a sketch of a shoe and it will render you an actual picture of a shoe, or a handbag. They're fairly low resolution right now, but it's very impressive the way that they can produce really quite good-looking images. You could make a neural network that's a classifier, right? You give it lots and lots of pictures of cats and lots and lots of pictures of dogs, and you present it with a picture of a cat and it outputs a number, let's say between 0 and 1, where 0 represents cats and 1 represents dogs. And so you give it a cat and it puts out one, and you say, no, that's not right, it should be zero, and you keep training until eventually it can tell the difference, right? So somewhere inside that network, it must have formed some model of what cats are and what dogs are, at least as far as images of them are concerned. But that model, you can only really use it to classify things. You can't say, okay, draw me a new cat picture, draw me a cat picture I haven't seen before; it doesn't know how to do that. So quite often you want a model that can generate new samples. You give it a bunch of samples from a particular distribution, and you want it to give you more samples which are also from that same distribution, so it has to learn the underlying structure of what you've given it. And that's kind of tricky. Actually, there's a lot of, well, there's a lot of challenges involved in that. But let's be honest, I don't think as a human you find that easy either. You know, I know what a cat looks like, but not being the greatest artist in the world, I'm not sure that I could draw you a decent cat. So, you know, this is not confined to just computing. Is it? Yeah, that's true, that's really true. But let's do a really simple example of a generative model. Say you give your network one thing that looks like this, and then you give it another one, these are your training samples: one looks like this, and another one that looks like this. And what are those dots? Are they instances of something in two dimensions? Yeah, I mean, right now it's literally just data; it doesn't matter what it is. These are just data points. And so these are the things you're giving it, and then you can train it, and it will learn a model. And the model it might learn is something like this, right? It's figured out that these dots all lie along a path, and if its model was always to draw a line, then it could learn by adjusting the parameters of that line: it would move the line around until it found a line that was a good fit and generally gave you a good prediction. But then if you were to ask this model, now make me a new one, then unless you did something clever, what you'd get is probably a point right on the line, because that is on average the closest to any of these. For any of these dots, you don't know if it's going to be above or below, to the left or the right; there's no pattern there, it's kind of random. So the best place you can go that will minimize your error is to go right on the line every time. But anybody looking at this will say, well, that's fake; that's not a plausible example of something from this distribution, even though, for a lot of the error functions that people use when training networks, a point on the line would score best.
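This collapse-to-the-average failure can be sketched in a few lines. It is an illustrative toy, assuming a plain least-squares line fit stands in for a network trained with mean squared error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training samples: points scattered around the line y = 2x + 1
x = rng.uniform(0, 10, size=200)
y = 2 * x + 1 + rng.normal(0.0, 0.5, size=200)

# "Training": a least-squares line fit, i.e. the model that minimizes
# mean squared error over the samples
a, b = np.polyfit(x, y, 1)

# "Generate a new point": the MSE-optimal answer lands exactly on the
# fitted line, with none of the scatter the real data has
x_new = 5.0
y_generated = a * x_new + b

data_spread = np.std(y - (a * x + b))           # real samples do scatter
line_distance = abs(y_generated - (a * x_new + b))  # generated one does not
print(data_spread, line_distance)
```

The fit recovers the line well, but every "generated" point it can produce sits exactly on it, which is precisely what an onlooker would call fake.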
So it's this interesting situation where there's not just one right answer. You know, generally speaking, the way that neural networks work is you're training them towards a specific target: you have a label, or you have a target output, and you get penalized the further away you are from that output. Whereas in an application like this, there's effectively an infinite number of perfectly valid outputs. So to generate properly, what you actually need is to take this model and then apply some randomness: say the points all occur randomly, normally distributed around this line with some standard deviation or whatever. But a lot of models would have a hard time actually picking one out of all of the possibilities; they have this tendency to kind of smooth things out and go for the average, whereas we actually want it to just pick one, it doesn't matter which. So that's part of the problem of generating. Adversarial training is a way of training, not just networks actually, a way of training learning systems, which involves focusing on the system's weaknesses. So let's say you're teaching your network to recognize handwritten digits. The normal way you would do that: you have your big training set of labeled samples. You've got an array of pixels that looks like a 3, and it's labeled with three, and so on. And the normal way that you'd train a network with this is you'd just present them all pretty much at random: you present as many ones as twos as threes, and just keep throwing examples at it. What's this? Yes, you got that right; no, you got that wrong, it should really be this. And you keep doing that, and the system will eventually learn. But if you were actually teaching a person to recognize the numbers, if you were teaching a child, you wouldn't do that. Like, if you'd been teaching them for a while, presenting examples, getting the response and correcting them and so on, and you noticed that they do fine with 2, 3, 4, 5, 6, 8 and 9, they're getting like 70, 80 percent recognition accuracy, but on one and seven it's like 50/50, because any time they get a 1 or a 7 they just guess, because they can't tell the difference between them. If you noticed that, you wouldn't keep training those other numbers, right? You would stop and say, well, you know what, we're just going to focus on one and seven, because this is an issue for you. I'm going to keep showing you ones and sevens and correcting you until the error rate on ones and sevens comes down to the error rate that you're getting on your other numbers. You're focusing the training on the area where the student is failing. And there's a balance there when you're teaching humans, because if you keep relentlessly focusing on their weaknesses and making them do stuff they can't do all the time, they'll just become super discouraged and give up. But neural networks don't have feelings yet, so that's really not an issue; you can just continually hammer on the weak points, find whatever they're having trouble with and focus on that. And I think some people have had teachers where it feels like this. It feels like an adversary, right? It feels like they want you to fail. So in fact, you can make them an actual adversary: if you have some process which is genuinely doing its best to make the network give as high an error as possible, you get this effect where, whenever it spots a weakness, it focuses on it.
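That "hammer on the weak points" loop can be sketched as a sampling rule: draw the next training example in proportion to the model's current error on each class. The per-digit error rates below are made-up numbers, just to mirror the ones-and-sevens story:

```python
import random

# Hypothetical current error rate per digit class (made-up numbers):
# ones and sevens are where this learner keeps guessing
error_rate = {0: 0.05, 1: 0.50, 2: 0.04, 3: 0.06, 4: 0.05,
              5: 0.07, 6: 0.04, 7: 0.50, 8: 0.06, 9: 0.05}

def pick_training_digit():
    """Sample a digit class in proportion to current error, so the
    weak classes (1 and 7 here) get drilled far more often."""
    digits = list(error_rate)
    weights = [error_rate[d] for d in digits]
    return random.choices(digits, weights=weights, k=1)[0]

random.seed(0)
counts = {d: 0 for d in range(10)}
for _ in range(10_000):
    counts[pick_training_digit()] += 1

print(counts[1] + counts[7])  # the bulk of the training stream
```

Every class still appears occasionally, but the stream is dominated by whatever the learner is currently worst at, which is the adversarial-teacher behaviour described above.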
And by focusing on the weakness, it forces the learner to learn not to have that weakness any more. Like, one form of adversarial training people sometimes do is, if you have a game-playing program, you make it play itself a lot of times, because the whole time each side is trying to look for weaknesses in its opponent and exploit them. And when it does that, it's forced to then improve or fix those weaknesses in itself, because its opponent is exploiting them. So every time the system finds a strategy that is extremely good against this opponent, the opponent, who is also them, has to learn a way of dealing with that strategy, and so on and so on. So as the system gets better, it forces itself to get better, because it's continuously having to learn how to play a better and better opponent. It's quite elegant, you know. This is where we get to generative adversarial networks. Let's say you want cat pictures: you want to be able to give the system a bunch of pictures of cats and have it spit out a new picture of a cat, one that you've never seen before, that looks exactly like a cat. The way a generative adversarial network works is, it's this architecture where you actually have two networks. One of the networks is the discriminator, which is a hell of a spelling. Yeah, I like that. The discriminator network is a classifier, right? A straightforward classifier: you give it an image, it outputs a number between 0 and 1, and you're training it in the standard supervised learning way. Then you have the generator. The generator is usually a convolutional neural network, although actually both of these can be other processes; people just tend to use neural networks for this. The generator, you give it some random noise, and that's where it gets its source of randomness, so that it can give multiple answers to the same question, effectively. It takes that random noise and generates an image from it, and the idea is that the image is supposed to look like a cat. So the way that we do this with a generative adversarial network is this architecture whereby you have two networks playing a game, effectively. It's a competitive game; it's adversarial between them. And in fact, it's a very similar kind of game to the ones we talked about in the previous video, the AlphaGo video, right? It's a min/max game, because these two networks are fighting over one number: one of them wants the number to be high, one of them wants it to be low. And what that number actually is, is the error rate of the discriminator. The discriminator wants a low error rate; the generator wants a high error rate. The discriminator's job is to look at an image, which could have come from the original data set or could have come from the generator, and its job is to say, yes, this is a real image, or no, this is a fake, and it outputs a number between 0 and 1, like 1 if it's real, 0 if it's fake, for example. And the generator gets fed, as its input, just some random noise, and it then generates an image from that, and its reward, its training signal, is pretty much the inverse of what the discriminator says for that image. So if it produces an image which the discriminator can immediately tell is fake, it gets a negative reward; it's trained not to do that. And if it manages to produce an image that the discriminator can't tell is fake, then that's really good.
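The one number the two networks fight over can be written out. In the standard GAN objective, the discriminator minimizes a binary cross-entropy over real and fake images, while the generator does well when the discriminator's score on its fakes is pushed towards "real". A minimal sketch of just that arithmetic, with no actual networks:

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator: low when it scores
    real images near 1 and fake images near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """The generator's training signal is the inverse: it does well
    when the discriminator scores its fake image near 1 ('real')."""
    return -math.log(d_fake)

# A sharp discriminator facing a bad generator: D's loss is low,
# G's loss is high
print(discriminator_loss(d_real=0.9, d_fake=0.1))
print(generator_loss(d_fake=0.1))

# A generator that fools the discriminator: G's loss drops
print(generator_loss(d_fake=0.9))
```

The two losses pull the shared score `d_fake` in opposite directions, which is exactly the min/max tug-of-war described above.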
So you train them in a cycle, effectively. You give the discriminator a real image and get its output; then you generate a fake image and give the discriminator that; then you give it a real one again. So it's alternating: real image, fake image, real image, fake image, usually. I mean, there are things you can do where you train them at different rates, but that's what happens by default. Doesn't the generator get any help with this at all, or is it purely on its own? Yes, so this is part of what makes this especially clever, actually: the generator does get help, because if you set up the networks right, you can use the gradient of the discriminator to train the generator. You know, we've done backpropagation before, about how neural networks are trained; it's gradient descent, right? And in fact, we talked about this in, like, 2014: if you're a blind person climbing a mountain, you can only see directly what's underneath your own feet, but you can still climb that mountain if you just follow the gradient. You just look directly under you: which way is the ground sloping? This is what we did with the hill climb algorithm. Exactly, yeah. Sometimes people call it hill climbing; sometimes people call it gradient descent. It's the same metaphor, upside down effectively, whether we're climbing up or climbing down. You're training them by gradient descent, which means that you're not just able to say, yes, that's good, no, that's bad. You're actually able to say, you should adjust your weights in this direction, so that you'll move down the gradient.
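The blind-mountaineer idea is easy to make concrete: repeatedly take a small step downhill along the local slope. A toy one-dimensional sketch:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Follow the local slope downhill, one small step at a time,
    like the climber who can only see under their own feet."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Error surface E(x) = (x - 3)^2, whose gradient is 2(x - 3);
# the minimum sits at x = 3, and we start far from it at x = 0
x_min = gradient_descent(grad=lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # converges close to 3
```

At no point does the procedure see the whole surface; it only ever reads the slope at its current position, yet it still finds the valley.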
Right, so generally you're trying to move down the gradient of error for the network. If you're training a thing to just recognize cats and dogs, you're moving it down the gradient towards the correct label. Whereas in this case, the generator is being moved sort of up the gradient of the discriminator's error, so it can find out not just "you did well" or "you did badly", but "here's how to tweak your weights so that the discriminator would have been more wrong", so that you can confuse the discriminator more. So you can think of this whole thing, an analogy people sometimes use, as like a forger and an expert investigator, right? At the beginning, let's assume there's one forger, there's one investigator, and all of the art buyers of the world are idiots. At the beginning, the quality of the forgeries is going to be quite low, right? The guy just goes and gets some paint, and he writes, you know, "the castle" on it, and he can sell it for a lot of money. And the investigator comes along and says, I'm not sure. I don't know. Maybe it's real, maybe it isn't; I haven't really figured it out. And then as time goes on, the investigator, who's the discriminator, will start to spot certain things that are different between the things that the forger produces and real paintings, and then they'll start to be able to reliably spot: oh, this is a fake, you know, this uses the wrong type of paint or whatever, so it's fake. And once that happens, the forger is forced to get better, right? He can't sell his fakes any more. He has to find that kind of paint, so he goes and, you know, digs up Egyptian mummies or whatever to get the legit paint, and now he can forge again, and now the investigator is fooled. And they have to find a new thing that distinguishes the real paintings from the fakes, and so on and so on. In a cycle, they force each other to improve, and it's the same thing here. So at the beginning, the generator is making just random noise, basically, because it's getting random noise in and it's doing something to it, who knows what, and the discriminator goes, that looks nothing like a cat, you know. But the discriminator is also not very smart at the beginning, right? And then they just both get better and better: the generator gets better at producing cat-looking things, and the discriminator gets better and better at identifying them, until eventually, in principle, if you run this for long enough, theoretically, you end up with a situation where the generator is creating images that are indistinguishable from images from the real data set, and the discriminator, given a real image or a fake image, always outputs 0.5. 50/50, I don't know, could be either; these things are literally indistinguishable. Then you can pretty much throw away the discriminator, and you've got a generator which you give random noise to, and it outputs brand-new indistinguishable images of cats. There's another cool thing about this, which is: every time we ask the generator to generate a new image, we're giving it some random data, right? We give it just this vector of random numbers, which you can think of as a randomly selected point in a space, because if you give it ten random numbers between 0 and 1 or whatever, that is effectively a point in a 10-dimensional space. And the thing that's cool is what happens to that space as the generator learns.
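That 0.5 endpoint isn't arbitrary. For a fixed generator, the best possible discriminator can be shown to output p_data(x) / (p_data(x) + p_gen(x)), the real density over the sum of real and generated densities, which collapses to 0.5 everywhere once the generated distribution matches the real one. A tiny sketch of that formula, with made-up density values:

```python
def optimal_discriminator(p_data, p_gen):
    """For a fixed generator, the best achievable discriminator score
    for an image with these densities under the real and generated
    distributions."""
    return p_data / (p_data + p_gen)

# Early on, generated images sit where real data rarely does,
# so the discriminator can be confident
print(optimal_discriminator(p_data=0.9, p_gen=0.1))  # 0.9

# At the theoretical end state the two distributions match exactly,
# and the best score anywhere is 50/50
print(optimal_discriminator(p_data=0.4, p_gen=0.4))  # 0.5
```

Once even the optimal discriminator is reduced to a coin flip, it carries no training signal left, which is why it can be thrown away.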
The generator is effectively making a mapping from that space into cat pictures. This is called a latent space, by the way. Generally, any two nearby points in that latent space will, when you put them through the generator, produce similar cat images, you know, similar pictures in general. Which means, sort of, as you move around, if you take that point and smoothly move it around the latent space, you get a smoothly varying picture of a cat. And so the directions you can move in that space actually end up corresponding to things that we as humans might consider meaningful about cats. So there's, you know, one direction, and it's not necessarily one dimension of the space, and it's not necessarily linear, a straight line or anything, but there will be a direction in that space which corresponds to how big the cat is in the frame, for example, or another direction will be, say, the colour of the cat, or whatever. So that's really cool, because intuitively you'd think the fact that the generator can reliably produce a very large number of images of cats means it must have some, like, understanding of what cats are, right? Or at least what images of cats are. And it's nice to see that it has actually structured its latent space in this way: by looking at a huge number of pictures of cats, it has actually extracted some of the structure of cat pictures in general, in a way which is meaningful when you look at it. So that means you can do some really cool things. One example was, they trained one of these systems on a really large database of just face photographs, and it could generate as many different faces as the input space allows.
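The smooth structure of the latent space can be illustrated with a toy stand-in for the generator: a fixed linear map from a 10-dimensional latent space to a vector of "pixels". A real generator is a deep network, but it is likewise a smooth function of its input, so nearby latent points give similar outputs, and moving along a fixed "attribute" direction shifts the output consistently:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a trained generator: a fixed linear map from a
# 10-dimensional latent space to a 64-"pixel" image. (A real generator
# is a deep network; linearity here is just for illustration.)
W = rng.normal(size=(64, 10))

def generate(z):
    return W @ z

z = rng.uniform(0, 1, size=10)             # a random latent point
z_nearby = z + 0.01 * rng.normal(size=10)  # a tiny step away
z_far = rng.uniform(0, 1, size=10)         # an unrelated point

near_diff = np.linalg.norm(generate(z) - generate(z_nearby))
far_diff = np.linalg.norm(generate(z) - generate(z_far))
print(near_diff < far_diff)  # small latent moves give small image changes

# A direction in latent space acting like an attribute: adding the same
# vector to different latent points shifts both outputs the same way
attribute = rng.normal(size=10)
shift_a = generate(z + attribute) - generate(z)
shift_b = generate(z_far + attribute) - generate(z_far)
print(np.allclose(shift_a, shift_b))
```

In this linear toy the attribute shift is exactly the same everywhere; in a real generator it is only approximately so, but that approximation is what makes the latent directions meaningful.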
And they found that actually, doing basic arithmetic, like just adding and subtracting vectors in the latent space, would produce meaningful changes in the image. If you took a bunch of latent vectors which, when you give them to the generator, produce pictures of men, and a bunch that produce pictures of women, and averaged each set, you'd get a point in your latent space which corresponds to a picture of a man, or a picture of a woman, which is not one of your input points but is sort of representative. And then you could do the same thing and say, give me the average point of all the things that correspond to pictures of men wearing sunglasses, right? And then if you take your men-wearing-sunglasses vector, subtract the man vector and add the woman vector, you get a point in your space, and if you run that through the generator, you get a woman wearing sunglasses, right? So doing basic vector arithmetic in your input space actually is meaningful in terms of images, in a way that humans would recognize. Which means that there's a sense in which the generator really does have an understanding of wearing sunglasses or not, or being a man or being a woman, which is kind of an impressive result.

Computerphile¹
1 https://www.youtube.com/watch?v=Sw9r8CL98N0
List of sources: Maske 22, Inkjet-Print, 35x42cm 2015
Maske 42, Inkjet-Print, 35x42cm 2015
Maske 04, Inkjet-Print, 35x42cm 2014
Maske 31, Inkjet-Print, 35x42cm 2015
Maske 19, Inkjet-Print, 35x42cm 2015
Maske 43, Inkjet-Print, 35x42cm 2016
Maske 58, Inkjet-Print, 35x42cm 2016
Maske 11, Inkjet-Print, 35x42cm 2014
Maske 10, Inkjet-Print, 35x42cm 2014
Maske 30, Inkjet-Print, 35x42cm 2015
Maske 54, Inkjet-Print, 35x42cm 2016
Maske 24, Inkjet-Print, 35x42cm 2015
Maske 35, Inkjet-Print, 35x42cm 2015
Maske 13, Inkjet-Print, 35x42cm 2014
Maske 47, Inkjet-Print, 35x42cm 2016
Maske 52, Inkjet-Print, 35x42cm 2016
Maske 53, Inkjet-Print, 35x42cm 2016
Maske 41, Inkjet-Print, 35x42cm 2015
Maske 07, Inkjet-Print, 35x42cm 2014
Maske 27, Inkjet-Print, 35x42cm 2015
Maske 37, Inkjet-Print, 35x42cm 2015
Maske 48, Inkjet-Print, 35x42cm 2016
Maske 32, Inkjet-Print, 35x42cm 2015
Maske 57, Inkjet-Print, 35x42cm 2016
Maske 55, Inkjet-Print, 35x42cm 2016
Maske 23, Inkjet-Print, 35x42cm 2015
Maske 46, Inkjet-Print, 35x42cm 2016
Maske 60, Inkjet-Print, 35x42cm 2016
Maske 18, Inkjet-Print, 35x42cm 2015
Maske 15, Inkjet-Print, 35x42cm 2014
Maske 02, Inkjet-Print, 35x42cm 2014
Maske 60, Inkjet-Print, 35x42cm 2016
Maske 38, Inkjet-Print, 35x42cm 2015
Maske 59, Inkjet-Print, 35x42cm 2016
Maske 20, Inkjet-Print, 35x42cm 2015
Maske 04, Inkjet-Print, 35x42cm 2014
Maske 49, Inkjet-Print, 35x42cm 2016
Maske 14, Inkjet-Print, 35x42cm 2014
Maske 09, Inkjet-Print, 35x42cm 2014
Maske 17, Inkjet-Print, 35x42cm 2015
Maske 33, Inkjet-Print, 35x42cm 2015
Maske 44, Inkjet-Print, 35x42cm 2016
Maske 05, Inkjet-Print, 35x42cm 2014
Maske 06, Inkjet-Print, 35x42cm 2014
Maske 28, Inkjet-Print, 35x42cm 2015
Maske 50, Inkjet-Print, 35x42cm 2016
Maske 16, Inkjet-Print, 35x42cm 2014
Maske 56, Inkjet-Print, 35x42cm 2016
Maske 01, Inkjet-Print, 35x42cm 2014
Maske 03, Inkjet-Print, 35x42cm 2014
Maske 26, Inkjet-Print, 35x42cm 2015
one book was used as a source for this book
GAN: Ivo Vigan, snk71
Concept/Image Editing: Michael Etzensperger, michaeletzensperger.ch
Graphic Design: Michael Etzensperger (based on a layout by Christof Nüssli, typosalon)
Lithography: Michael Etzensperger
Explanation on GAN: Computerphile
Transcription: Live Transcribe App
Print: online-druck.biz
This book is based on Masken, Michael Etzensperger, cpress 2018, ISBN 978-3-9524710-3-6
Edition: 35 unique copies, each copy contains 51 pictures from a stock of 1785 pictures, no picture was used twice
Published by: cpress, Zurich, cpress.ch
Distribution: cpress, cpress.ch
CC licence: CC-BY-NC-SA, creativecommons.org
Nr. 32/35, 2019
Michael Etzensperger
Thanks: Michael Bloomberg, Peter Burleigh, Nina Calderone, Computerphile, Institut Kunst FHNW, Birgit Kempker, Chus Martinez, Christof Nüssli, Christoph Oeschger, snk71, Anselm Stalder, Thorsten Strohmeier, Ivo Vigan