To get a better understanding of GANs we decided to train our own. We started by checking out this tutorial, which contains good explanations and all the code needed to train a GAN that produces pictures resembling the CIFAR-10 dataset. After reproducing those results we decided to train a GAN on a different kind of dataset: World of Warcraft.
This summer Blizzard re-released the classic version of their record-breaking online game World of Warcraft. We used to play it back in the day, and so far we are enjoying the re-release. We thought it might be fun to record some gameplay footage and train a GAN on the video frames. The tutorial mentioned above used about 50 thousand training images; by simply playing the game and recording it at 24 frames per second, we could collect a similarly sized dataset in a single afternoon.
For the recording we used this software, and for extracting the frames we simply used imageio with its ffmpeg plugin. Instead of storing raw image files or an HDF5 archive, we kept the frames in an MP4 file. During training we loaded random frames from that file, which itself consists of 3-minute video samples stacked together in random order. And since it is a valid video file, we could publish our dataset on YouTube!
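For illustration, loading a random frame from that video can look roughly like this (the file name is a placeholder, and count_frames() needs a reasonably recent imageio):

import imageio
import numpy as np

# Open the dataset video with the ffmpeg plugin; "wow_dataset.mp4" is a placeholder name.
reader = imageio.get_reader("wow_dataset.mp4", "ffmpeg")
n_frames = reader.count_frames()

def random_frame():
    # Decode a single random frame; it still has the recorded resolution
    # and needs to be cropped/resized to the network's input size.
    index = np.random.randint(n_frames)
    frame = reader.get_data(index)  # uint8 array of shape (height, width, 3)
    return frame.astype(np.float32) / 127.5 - 1.0  # common scaling to [-1, 1] for GANs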
Here are some results of the 32x32 GAN from the tutorial trained on WoW frames instead of CIFAR-10:
The GAN learned both rock and meadow grounds and made sure to put our character somewhere in the middle of the frame.
Obviously the game runs at a much higher resolution than that, and while we did not try to train a network on full-resolution images, we did want to train on images that are more "recognizable". However, simply swapping the 32x32 images for 256x256 images led to several difficulties.
- First, larger images and networks need more memory: while 50 thousand 32x32 images fit easily into memory, the 256x256 images did not.
- To work around that we lowered the batch size and loaded only part of the dataset into memory at each iteration (see the sketch after this list). That, however, slowed down training.
- Lastly, the results we got were disappointing, which made us realize that we needed a more sophisticated network architecture.
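What we mean by loading only part of the dataset at a time is, roughly, a batch generator that decodes a handful of frames per step instead of holding all images in RAM (reusing the reader from the snippet above; the batch size here is arbitrary):

import numpy as np

def frame_batches(reader, n_frames, batch_size=16):
    # Decode only batch_size frames per training step instead of
    # keeping all 50 thousand 256x256 images in memory at once.
    while True:
        indices = np.random.randint(n_frames, size=batch_size)
        batch = np.stack([reader.get_data(i) for i in indices])
        yield batch.astype(np.float32) / 127.5 - 1.0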
While looking for alternatives we found this implementation of a GAN trained on 256x256 images of clothes and shoes. In addition to the convolution and ReLU layers, it also uses batch normalization and upsampling layers in the generator and pooling layers in the discriminator. Again, we simply used their code and only changed the training data.
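As a rough Keras sketch of the idea (not the exact layers from that repository), the building blocks look something like this:

from tensorflow.keras import layers

def generator_block(x, filters):
    # Double the spatial resolution, then convolve; batch norm stabilizes training.
    x = layers.UpSampling2D()(x)
    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return x

def discriminator_block(x, filters):
    # Convolve, then pool to halve the spatial resolution.
    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)
    x = layers.Activation("relu")(x)
    x = layers.AveragePooling2D()(x)
    return x

Stacking a few of these blocks takes the generator from a small feature map up to 256x256, and the discriminator back down again.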
The network learned to display our character, especially our blue cape and red dress. It also learned to recreate some of the ground textures from the dataset. The trained model files as well as more results are on our GitHub; the code we used can be found via the links mentioned above. The code "as is" runs on the CPU. If you have an NVIDIA card with CUDA installed, you can speed things up by telling TensorFlow to use it, which is especially useful for the 256x256 GAN. You can check which devices TensorFlow sees like this:
# List the devices TensorFlow can access; a CUDA-capable card shows up as "GPU".
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
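TensorFlow usually places operations on the GPU automatically once CUDA is set up, but you can also pin the model explicitly; build_generator and build_discriminator below just stand in for the model code from the linked repositories:

import tensorflow as tf

with tf.device("/gpu:0"):  # first CUDA device
    generator = build_generator()          # placeholder for the tutorial's generator
    discriminator = build_discriminator()  # placeholder for the tutorial's discriminator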
The difference between the two generator networks can be seen here. While the 32x32 network relies only on convolution and ReLU layers, the 256x256 network also uses normalization, upsampling and activation layers. The larger network needed more iterations to train, and a single iteration took more time as well. That's why we trained the larger network on our GeForce GTX 1050 Ti.

For future work we think it would be cool to produce even higher resolution images. The game itself has configurable graphics options, which let the player choose how detailed the world should look. More detail means more plants, better textures and overall more "stuff". We think it would be interesting to see whether GANs trained on different detail settings of the game are able to reproduce some of those differences.