PyTorch text classification tutorial


Pic from https://www.pexels.com/ (they’re great and don’t even require attribution!)

Some time ago we saw how to classify texts with neural networks. That article covered the basics of how a neural network works and how to train one with TensorFlow.

In today’s article, we are going to build the same network, but with PyTorch instead of TensorFlow. We’ll focus only on the code, so if you need a primer on neural networks, it’s a good idea to check out the previous article. :)

We’ll create a machine learning model that classifies texts into categories. The dataset is 20 Newsgroups, which contains around 18,000 posts covering 20 different topics. We will use only 3 categories: comp.graphics, sci.space, and rec.sport.baseball.

What is PyTorch?

PyTorch is a Python-based scientific computing package that serves as a replacement for NumPy and uses the power of graphics processing units (GPUs). It is also a deep learning research platform that provides maximum flexibility and speed.

The biggest difference between PyTorch and TensorFlow is that PyTorch builds its computation graphs on the fly. This makes debugging so much easier (and fun!).

A primer on PyTorch dynamics

When you execute a line of code, it gets executed. There isn’t an asynchronous view of the world. When you drop into a debugger, or receive error messages and stack traces, understanding them is straightforward. The stack trace points to exactly where your code is defined.
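Here’s a tiny sketch of what this “define-by-run” behavior looks like (the values are just illustrative): you can mix plain Python control flow and print statements with tensor operations, and inspect results at any point.

import torch

x = torch.rand(3)        # runs immediately, no graph compilation or session
if x.sum() > 1.5:        # ordinary Python control flow over actual tensor values
    y = x * 2
else:
    y = x - 1
print(y)                 # inspect intermediate results whenever you want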

Building the network

Ok, let’s see how things work in PyTorch.

The basics

As usual, we have tensors, which are multi-dimensional matrices that contain elements of a single data type.

import torch

x = torch.IntTensor([1,3,6])
y = torch.IntTensor([1,1,1])
result = x + y
print(result)

>>> 2
>>> 4
>>> 7
>>> [torch.IntTensor of size 3]

The torch package contains data structures for multi-dimensional tensors and mathematical operations.
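Besides addition, the usual operations are all there. A couple of quick examples (values are illustrative):

import torch

a = torch.FloatTensor([[1, 2], [3, 4]])
b = torch.FloatTensor([[1, 0], [0, 1]])

print(a.size())         # torch.Size([2, 2])
print(torch.mm(a, b))   # matrix multiplication
print(a * b)            # element-wise multiplication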

Step 1: Define the network

With TensorFlow each layer operation has to be explicitly named:

def multilayer_perceptron(input_tensor, weights, biases):
    layer_1_multiplication = tf.matmul(input_tensor, weights['h1'])
    layer_1_addition = tf.add(layer_1_multiplication, biases['b1'])
    layer_1_activation = tf.nn.relu(layer_1_addition)

    layer_2_multiplication = tf.matmul(layer_1_activation, weights['h2'])
    layer_2_addition = tf.add(layer_2_multiplication, biases['b2'])
    layer_2_activation = tf.nn.relu(layer_2_addition)

    out_layer_multiplication = tf.matmul(layer_2_activation, weights['out'])
    out_layer_addition = out_layer_multiplication + biases['out']

    return out_layer_addition

With PyTorch we use torch.nn. We need to multiply each input node by a weight and add a bias. The class torch.nn.Linear does the job for us.

The base class for all neural network modules is torch.nn.Module. The forward(*input) method defines the computation performed at every call, and all subclasses should override it.

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

class OurNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(OurNet, self).__init__()
        self.layer_1 = nn.Linear(input_size, hidden_size, bias=True)
        self.relu = nn.ReLU()
        self.layer_2 = nn.Linear(hidden_size, hidden_size, bias=True)
        self.output_layer = nn.Linear(hidden_size, num_classes, bias=True)

    def forward(self, x):
        out = self.layer_1(x)
        out = self.relu(out)
        out = self.layer_2(out)
        out = self.relu(out)
        out = self.output_layer(out)
        return out

Cool, right?

Step 2: Update the weights

The way the neural network “learns” is by updating the weight values. With PyTorch we use the torch.autograd package to do that.

torch.autograd.Variable wraps a tensor and records the operations applied to it. This is very handy and lets us implement gradient descent in a very simple way. Let’s have a closer look at the documentation.

A variable is a thin wrapper around a Tensor object that also holds the gradient and a reference to the function that created it. This reference allows us to trace the entire chain of operations that created the data.

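Here is a minimal sketch of what that gives us: wrap a tensor in a Variable, run a few operations, call backward(), and the gradient shows up in .grad.

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = (x * x).sum()    # y = sum of x_i squared

y.backward()         # autograd traces the operations back to x
print(x.grad)        # dy/dx = 2 * x, so a 2x2 tensor full of 2s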

We didn’t specify the weight tensors like we did with TensorFlow, because torch.nn.Linear already holds a weight Variable of shape (out_features x in_features).
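You can check the shape quickly (the layer sizes here are just an example):

import torch.nn as nn

layer = nn.Linear(10, 5)       # in_features=10, out_features=5
print(layer.weight.size())     # torch.Size([5, 10])
print(layer.bias.size())       # torch.Size([5])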

To update the weights from the computed gradients, we will use Adaptive Moment Estimation (Adam). torch.optim is a package that implements various optimization algorithms.

To use torch.optim, you have to construct an optimizer object that will hold the current state and also update the parameters based on the computed gradients.

To construct an optimizer, you give it an iterable containing the parameters to optimize (all of them should be Variables). Then you can specify optimizer-specific options such as the learning rate, weight decay, etc.

Let’s construct our optimizer:

net = OurNet(input_size, hidden_size, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

The parameters() method from torch.nn.Module returns an iterator over the module parameters. To compute the loss we’ll use torch.nn.CrossEntropyLoss.
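If you want to see exactly what the optimizer receives, you can iterate over net.parameters() and print each parameter’s size (assuming net was built as above):

for param in net.parameters():
    print(param.size())   # the weights and biases of layer_1, layer_2 and output_layer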

One important thing about torch.nn.CrossEntropyLoss is that the input has to be a 2D tensor of size (minibatch, n), and the target has to be a 1D tensor of size minibatch holding a class index (0 to nClasses-1) for each example. For example:

import torch
import torch.nn as nn
from torch.autograd import Variable

loss = nn.CrossEntropyLoss()

input = Variable(torch.randn(2, 5), requires_grad=True)
print(">>> batch of size 2 and 5 classes")
print(input)

target = Variable(torch.LongTensor(2).random_(5))
print(">>> array of size 'batch_size' with the class index for each item")
print(target)

output = loss(input, target)
output.backward()

>>> batch of size 2 and 5 classes
>>> Variable containing:
>>> 0.0400  0.5079 -1.8777 -0.1471 -1.0270
>>> -1.1513  0.7802  0.6816  0.8945 -0.2956
>>> [torch.FloatTensor of size 2x5]

>>> array of size 'batch_size' with the class index for each item
>>> Variable containing:
>>> 1
>>> 2
>>> [torch.LongTensor of size 2]

So we need to change the get_batch() function from the previous article to work like it does in the example above.
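The exact function lives in the previous article’s notebook, but a minimal sketch of the idea looks like this (assuming a bag-of-words representation, with word2index and total_words built from the training vocabulary; those names are illustrative):

import numpy as np

def get_batch(df, i, batch_size):
    batches = []
    results = []
    texts = df.data[i * batch_size:(i + 1) * batch_size]
    categories = df.target[i * batch_size:(i + 1) * batch_size]
    for text in texts:
        layer = np.zeros(total_words, dtype=float)       # one bag-of-words vector per post
        for word in text.split(' '):
            if word.lower() in word2index:
                layer[word2index[word.lower()]] += 1
        batches.append(layer)
    for category in categories:
        results.append(category)                          # a class index from 0 to num_classes-1
    return np.array(batches), np.array(results)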

Now let’s update the weights and see the magic of the variables.

The method torch.autograd.backward computes the sum of gradients for the given variables. As the documentation says, this function accumulates gradients in the leaves, so you might need to zero them before calling it. To update the parameters, all optimizers implement a step() method, which can be called once the gradients have been computed (for example, by calling backward()).

In neural network terminology, one epoch is one forward pass (getting the output values) and one backward pass (updating the weights) over all the training examples. In our network, the get_batch() function gives us batches of texts of size batch_size.

Putting it all together, we get this:

# Train the Model
for epoch in range(num_epochs):
    total_batch = int(len(newsgroups_train.data)/batch_size)
    for i in range(total_batch):
        batch_x, batch_y = get_batch(newsgroups_train, i, batch_size)
        articles = Variable(torch.FloatTensor(batch_x))
        labels = Variable(torch.LongTensor(batch_y))  # CrossEntropyLoss expects class indices

        # Forward + Backward + Optimize
        optimizer.zero_grad()  # zero the gradient buffer
        outputs = net(articles)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
              % (epoch+1, num_epochs, i+1, total_batch, loss.data[0]))

And that’s it.

I never thought I would say this about a piece of code, but it’s beautiful.

Isn’t it?

Now let’s test the model:

# Test the Model
correct = 0
total = 0
total_test_data = len(newsgroups_test.target)
batch_x_test, batch_y_test = get_batch(newsgroups_test, 0, total_test_data)
articles = Variable(torch.FloatTensor(batch_x_test))
labels = torch.LongTensor(batch_y_test)       # plain tensor, only used for comparison
outputs = net(articles)
_, predicted = torch.max(outputs.data, 1)     # index of the highest score for each post
total += labels.size(0)
correct += (predicted == labels).sum()
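Finally, to turn those counts into a single number, print the accuracy:

print('Accuracy of the network on the test data: %d %%' % (100 * correct / total))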

And that’s it.

You have created a model using a neural network to classify texts into categories.

Congratulations. 😄

You can see the notebook with the final code here.

If you enjoyed this piece, please show your love and clap so that others can find it!