This is a walkthrough of writing a Deep Learning implementation using TensorFlow. I’m no expert in Machine Learning, so I expect you to learn the theory on your own before trying to understand the code here. That said, there is no harm in tinkering with it.
Deep Learning: The momentum
Buzzwords take over our world. In recent history, ‘social media’, ‘internet of things’ and now ‘deep learning’ have swept the planet. Let’s take a look at what the world has found interesting over the last ten-plus years among Computer Science-related fields of study.
From an academic perspective, among the fields of study compared above, Deep Learning sits at the #1 position, followed by Analysis of Algorithms at #2 and Computer Security at #3, while, surprisingly, Artificial Intelligence is at #4. Lastly, Computer Graphics is at the bottom of people’s interest.
It’s been less than a week since I first got my hands on solving problems with Deep Learning. It is not wise to write about an academic topic that deserves to be mastered first. However, I put on my developer hat and decided to write down how I built my first TensorFlow-based Deep Learning solution so that it won’t get lost.
I have used Theano before (briefly), and I am currently in the middle of deciding which toolchain to settle on. I’m also considering the H2O stack. The former has a huge community around the world, and the latter is sheer eye-candy. However, Google’s TensorFlow is most definitely having its moment, with hype, utility, and developer-friendliness. Just look at the spike in popularity at launch. It surely takes serious money to make that happen; still, they have apparently managed to keep a steady interest among practitioners. By the way, both Theano and TensorFlow are GPU-capable.
Dataset and expected outcome
The Car Evaluation dataset is publicly available, and its simplicity makes it approachable for noobs such as myself. I encourage you to go ahead and check it out yourself. The dataset boils down to a sort of survey that captures people’s evaluation of cars based on price, maintenance cost, how many doors it has, how many people it can carry, and so on.
Sample data looks like this:
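For reference, the first few rows of the UCI Car Evaluation file look roughly like this (columns: buying price, maintenance cost, doors, persons, luggage boot size, safety, and the evaluation class):

```
vhigh,vhigh,2,2,small,low,unacc
vhigh,vhigh,2,2,small,med,unacc
vhigh,vhigh,2,2,small,high,unacc
vhigh,vhigh,2,2,med,low,unacc
vhigh,vhigh,2,2,med,med,unacc
```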
The program I have written parses the data file, which is in CSV format, and trains a multi-layer perceptron model with backpropagation so that it can mimic human judgment and decide for us when unseen data points are encountered.
The program should output as follows (but fret not!):
The last two lines of the output are the most relevant. They say that the trained model predicted the output class correctly 100% of the time. Although our goal is always to maximize accuracy, this result is not always the case, especially because we randomize the training and test data. More on that later. The first part of the output is essentially a bunch of statistics which help us make sense of the data, or simply give us more information. Feel free to ignore it.
If you paid attention to the dataset, especially the top 5 rows, you will quickly notice that in our output the dataset has been converted to a numeric representation.
Google is trying to build a complete toolchain for TensorFlow; although it is incomplete at the moment, I think they are on the right track. TensorFlow is fun, but it is simply too explicit and hinders developer productivity. Therefore, they have been building TFLearn, their Scikit-learn-inspired API for TensorFlow, which is awesome fun to write code against. Some of the other tools I used in my pipeline, along with TFLearn and Scikit-learn, were pandas, NumPy and the obligatory SciPy. It is easy to set them all up at once.
Setting up the environment
First off, download the code from here, open a Terminal, and execute the following:
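Assuming a standard Python setup with pip available, the installation step likely boils down to something like this (the virtual environment is optional but recommended):

```shell
# Optional: isolate the dependencies in a virtual environment
python -m venv venv
source venv/bin/activate

# Install pandas, NumPy, SciPy, scikit-learn, TFLearn, etc. in one go
pip install -r requirements.txt
```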
Now that you have all the required packages installed, go ahead and run python main.py. You should see output similar to the one shown above.
The code consists of the following four files:
- car-data.csv: as you can understand this is the dataset
- requirements.txt: this file contains the list of packages to be installed
- main.py: this is the entry point for the code
- categorical_dnn.py: the class which contains all the Deep Learning logic
The contents of main.py are fairly straightforward. As you can see, it only instantiates the CategoricalDNN class and passes some important parameters. The learning rate, training size, and number of iterations can be specified here. Training size means the percentage of the data used for training; the rest is used for testing.
It is important to note that the CategoricalDNN class assumes that the invoker has no clue about the data: the class itself extracts meaning from the data, preprocesses it, and trains itself. Needless to say, it works best with categorical data.
The Categorical DNN
Let us begin by looking at the packages this class makes use of:
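The exact import list isn’t shown here, but based on the libraries mentioned in this post, it presumably looks something like the following (the alias choices are my own):

```python
import numpy as np
import pandas as pd
import tflearn                               # high-level TensorFlow API
from sklearn.metrics import accuracy_score   # model evaluation
```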
Although the code is fairly self-explanatory with complementary inline comments, important parts are discussed here in the sequence of invocation.
_load_csv is a (meant-to-be) private method which loads the CSV into memory using pandas. It then counts how many columns the dataset contains and assumes that the last column holds the class information against which the model is trained.
It then figures out the unique classes the last column contains (an important metric for designing the classifier), again using pandas. Finally, all the columns are looped over and every string is converted into a corresponding numeric category code for our classifier to work with. A nifty pandas trick does the heavy lifting.
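On a toy frame, that pandas trick might look like this (the column names are my own invention; the actual UCI file ships without a header row):

```python
import pandas as pd

# A tiny stand-in for the real dataset
df = pd.DataFrame({
    "buying": ["vhigh", "high", "med", "low"],
    "class":  ["unacc", "acc", "good", "vgood"],
})

for col in df.columns:
    # Map each distinct string to an integer code
    # (codes follow alphabetical category order)
    df[col] = df[col].astype("category").cat.codes

print(df["buying"].tolist())
```

The number of unique classes then falls out of `df["class"].nunique()`, which is what the classifier’s output layer size is based on.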
Shuffling and Splitting
The dataset is then shuffled and split according to the ratio defined earlier.
The classifier can only work with int32 / int64 labels, so both the training and test labels had to be converted. Finally, it was also made sure that the labels themselves are not fed into the classifier as features, which is why they are filtered out in the last two lines.
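A minimal NumPy sketch of that shuffle-and-split step, assuming a 70% training size (variable names are illustrative, not from the original code):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(20).reshape(10, 2)      # 10 rows; last column is the label
shuffled = rng.permutation(data)         # randomize the row order
split = int(0.7 * len(shuffled))         # 70% of rows for training
train, test = shuffled[:split], shuffled[split:]

# Labels must be integer-typed for the classifier, and must not be
# fed in as features, so they are separated out here.
train_X, train_y = train[:, :-1], train[:, -1].astype(np.int64)
test_X,  test_y  = test[:, :-1],  test[:, -1].astype(np.int64)
print(train_X.shape, test_X.shape)
```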
Training the Model and Measuring Performance
Now that we are done with all the preparations for the training, let us go ahead and design the network. The hidden layers can be arranged in any way that seems suitable or yields better results, but from our observation the following design suits us best. There is also literature that proposes or establishes many best practices, but I am not going into that discussion, simply because I do not know enough.
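As a hedged sketch (layer widths, optimizer, and learning rate here are illustrative guesses, not the post’s exact configuration), a TFLearn network for this six-feature, four-class problem could be wired up as:

```python
import tflearn

# 6 input features: buying, maint, doors, persons, lug_boot, safety
net = tflearn.input_data(shape=[None, 6])
# Hidden layers; widths are illustrative, not the author's exact choice
net = tflearn.fully_connected(net, 32, activation='relu')
net = tflearn.fully_connected(net, 32, activation='relu')
# 4 output classes (unacc, acc, good, vgood); softmax gives probabilities
net = tflearn.fully_connected(net, 4, activation='softmax')
net = tflearn.regression(net, optimizer='adam',
                         learning_rate=0.01,
                         loss='categorical_crossentropy')

model = tflearn.DNN(net)
# Training would then be a single call, with labels one-hot encoded:
# model.fit(train_X, tflearn.data_utils.to_categorical(train_y, 4),
#           n_epoch=100)
```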
I have used ReLU as the activation function, which can be swapped for softmax, etc. as required. Now that the model is trained, it is time to measure its performance.
We take Scikit’s help again to measure the model’s accuracy.
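For illustration, accuracy is simply the fraction of test rows whose predicted class index matches the true label, which scikit-learn computes in one call (the arrays below are made up):

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels vs. model predictions on five test rows
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]

print(accuracy_score(y_true, y_pred))  # 4 of 5 predictions match
```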
In this post, I have tried to walk you through a very basic categorical Deep Neural Network using TensorFlow, which was really a cakewalk. Well, if you learn something, what isn’t? I tried to explain the important blocks of the code. I hope you find it useful.
You may explore the source code here.