GPU vs CPU in Convolutional Neural Networks using TensorFlow
We at team science really enjoy writing about Deep Learning and TensorFlow! In a previous post, we compared NumPy with TensorFlow for matrix operations and found a large advantage for TensorFlow. In this blog post, we train two different Convolutional Neural Networks (CNNs) and compare the training speed of the GPU to that of the CPU.
It is well known that GPUs deliver large speedups in deep learning applications, but it is worth seeing on what scale. If you are not familiar with Deep Learning, go check out our simple introduction to Neural Networks.
Before we get to the results, let's look at why the GPU outperforms the CPU. Deep Learning involves a huge number of matrix and vector operations, mostly multiplications, that can be massively parallelized, and parallel workloads are exactly what GPUs accelerate. GPUs were designed to handle these matrix operations in parallel because that is essential for computer games, e.g. for 3D effects and terrain rendering. A single-core CPU, on the other hand, performs a matrix operation serially, one element at a time. A single GPU can have hundreds or thousands of cores, while a CPU typically has only a few (between two and eight).
For this performance comparison, we trained the models on the MNIST database, a database of handwritten digits. A Lenovo Y50-70 laptop was used, equipped with a GeForce GTX 860M (4GB GDDR5 memory, 1152 CUDA cores), 16GB of RAM (DDR3-1600, in two memory banks), and a CPU with eight cores in total. We created two models to compare the performance: one with two convolutional layers and one with four convolutional layers. The graphs of the models are shown in the following image, produced with TensorBoard, TensorFlow's visualization tool.
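Before running a comparison like this yourself, it is worth checking which devices TensorFlow can actually see; a short sketch (TensorFlow 2.x API assumed):

```python
# Sketch: listing the compute devices visible to TensorFlow 2.x.
# If the GPU list is empty, training will silently fall back to the CPU.
import tensorflow as tf

cpus = tf.config.list_physical_devices("CPU")
gpus = tf.config.list_physical_devices("GPU")
print(f"{len(cpus)} CPU device(s), {len(gpus)} GPU device(s) visible")
```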
Let me give you a quick explanation of the graph above:
ConvN contains a convolutional layer and its activation function. Our convolutions use a fixed stride of one and are zero-padded on all sides. The activation function is the ReLU.
PoolN are the pooling layers, which downsample the image data produced by the convolutional layers. We use max pooling over 2x2 blocks.
Densely1 is a fully connected layer. In dense layers, every node in the layer is connected to every node in the preceding layer. We use 1024 neurons.
Dropout (used to reduce overfitting): We use a value of 0.5, which implies that there is a probability of 0.5 that any given element will be dropped during training.
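The layer stack above can be sketched with the Keras API as follows. The stride-1, zero-padded convolutions, ReLU activations, 2x2 max pooling, 1024-unit dense layer, and 0.5 dropout rate come from the description; the filter counts (32 and 64) and the 5x5 kernel size are assumptions for illustration.

```python
# Sketch of the two-convolutional-layer model described above (Keras API).
# Filter counts and kernel size are assumed; the rest follows the text.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),  # MNIST: 28x28 grayscale images
    layers.Conv2D(32, 5, strides=1, padding="same", activation="relu"),  # Conv1
    layers.MaxPooling2D(pool_size=2),                                    # Pool1
    layers.Conv2D(64, 5, strides=1, padding="same", activation="relu"),  # Conv2
    layers.MaxPooling2D(pool_size=2),                                    # Pool2
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),  # Densely1
    layers.Dropout(0.5),                    # dropped with probability 0.5
    layers.Dense(10, activation="softmax"), # one output per digit class
])
model.summary()
```

The four-layer variant simply repeats the Conv/Pool pattern twice more before the dense layer.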
The results of the training models are presented in the next graph.
As expected, the GPU outperforms the CPU. In both models, with two and with four convolutional layers, training on the GPU is around 8 (!) times faster than on the CPU. There is little more to add; the results speak for themselves.
If you want to learn more, check out the code of a two-layer CNN model in this GitHub Gist. Stay tuned for upcoming advanced tutorials involving TensorFlow here on our blog!
Note: both models reached an accuracy of around 99.3% on the test data.