Sunday, February 24, 2019

Phantom Style Transfer with GCP

For a while now we’ve enjoyed trying out image processing algorithms on video footage frame by frame. Last year we took a TensorFlow implementation of the famous style transfer technique from the paper “Image Style Transfer Using Convolutional Neural Networks” and applied it to images of our Christmas tree. Now we figured we could apply that technique to the frames of a video and (hopefully) be amazed by the results. We are also aware of previous work on this subject, namely the paper “Artistic style transfer for videos”, which looked into making the style transfer smoother across frames. Here’s an example of their results.

For the video we chose the famous Phantom fight scene from the 2001 game Devil May Cry. Since it fits the theme, we chose Munch’s The Scream as the style input. The frames of the video were resampled to 512x288 pixels. The video itself was a bit more than one minute long, or about 2300 frames, which means we had to process about 340 million pixels. We started by applying the style transfer to a single frame for different numbers of iterations to see what effect we could achieve. Here are some of our results (raw version, after 5 and after 50 iterations):
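For reference, here is a minimal sketch of how frames can be extracted and resampled, using OpenCV; the video filename and output folder are just placeholders, and our actual pipeline differed in the details:

# Sketch: extract and downscale frames with OpenCV (placeholder paths)
import cv2
import os

VIDEO_PATH = "phantom_fight.mp4"   # hypothetical filename
FRAME_DIR = "frames"
os.makedirs(FRAME_DIR, exist_ok=True)

cap = cv2.VideoCapture(VIDEO_PATH)
index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # resample every frame to the 512x288 resolution we used
    small = cv2.resize(frame, (512, 288), interpolation=cv2.INTER_AREA)
    cv2.imwrite(os.path.join(FRAME_DIR, "frame_%05d.png" % index), small)
    index += 1
cap.release()
print("wrote", index, "frames")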



A look at the loss showed us that even after just five iterations, both the style and total loss improved significantly over the raw image. And while the total and style loss kept improving, the content loss only made small improvements in comparison. Here’s what the losses for a single frame over 50 iterations looked like:
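For context, the total loss in this kind of style transfer is a weighted sum of a content loss and a style loss built from Gram matrices, following Gatys et al. Here is an illustrative NumPy sketch (not our actual TensorFlow code); the feature maps and the weights alpha and beta are made up for demonstration:

# Illustrative sketch of how the plotted losses are composed (single layer)
import numpy as np

def gram_matrix(features):
    # features: (height, width, channels) activation map from one CNN layer
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat / (h * w * c)

def content_loss(gen_feat, content_feat):
    return np.mean((gen_feat - content_feat) ** 2)

def style_loss(gen_feat, style_feat):
    return np.mean((gram_matrix(gen_feat) - gram_matrix(style_feat)) ** 2)

alpha, beta = 1.0, 1e3                  # hypothetical weights
gen = np.random.rand(36, 64, 128)       # stand-ins for real VGG activations
content = np.random.rand(36, 64, 128)
style = np.random.rand(36, 64, 128)
total = alpha * content_loss(gen, content) + beta * style_loss(gen, style)
print(total)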



From this analysis on our laptop we learned that while a few frames can be processed in reasonable time, processing the entire video would take forever. The frame rate of our video was about 30 frames per second, and since even 5 iterations took about a minute on a single frame, our laptop would need about 30 minutes just to process a single (!) second of video.
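The back-of-the-envelope calculation behind that estimate, using the numbers from above:

fps = 30                  # frames per second of the video
sec_per_frame = 60        # roughly one minute for 5 iterations on the laptop
frames_total = 2300       # the whole clip

print(fps * sec_per_frame / 60, "minutes per second of video")    # ~30
print(frames_total * sec_per_frame / 3600, "hours for the clip")  # ~38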

Lucky for us, GCP offers Deep Learning VM Images: predefined virtual machine setups specifically for machine learning, including GPUs and preconfigured software frameworks like TensorFlow and PyTorch. We picked a predefined TensorFlow image with two CPUs and one NVIDIA Tesla K80 for about $0.40 per hour.

Our very first step was to copy our image frames to the VM instance.
$ gcloud compute scp --project <project name> --zone us-west1-b --recurse <my frame folder> tensorflow-1-vm:~/ 
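After SSHing in, a quick sanity check that TensorFlow actually sees the K80 can look like this (assuming the TensorFlow 1.x that the image shipped at the time):

# Sanity check: list the devices TensorFlow can use on the VM
from tensorflow.python.client import device_lib

devices = device_lib.list_local_devices()
print([d.name for d in devices])   # should include a '/device:GPU:0' entry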
 
Then we compared run times on a single frame with our laptop. While our laptop needed about one minute for just five iterations on a single frame, the GCP machine could do the same in less than three seconds. That meant processing the entire video with 5 iterations per frame could be done in about two hours (about $1), and with 50 iterations per frame in about 8 hours ($4). Here are our results. We were quite surprised by how smooth they are, considering that we processed every frame independently of each other.
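As a side note, stitching the styled frames back into a video only takes a few lines of OpenCV; this is a sketch with placeholder filenames rather than our exact script:

# Sketch: reassemble the independently styled frames into a 30 fps video
import cv2
import glob

frames = sorted(glob.glob("styled/frame_*.png"))   # hypothetical output folder
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter("styled_phantom.mp4", fourcc, 30.0, (512, 288))
for path in frames:
    out.write(cv2.imread(path))
out.release()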
 
 
