Review: MXNet deep learning shines with Gluon

With the addition of the high-level Gluon API, Apache MXNet rivals TensorFlow and PyTorch for developing deep learning models

When I reviewed MXNet v0.7 in 2016, I felt that it was a promising deep learning framework with excellent scalability (nearly linear on GPU clusters), good auto-differentiation, and state-of-the-art support for CUDA GPUs. I also felt that it needed work on its documentation and tutorials, and needed a lot more examples in its model zoo. In addition, I would have liked to see a high-level interface for MXNet, which I imagined would be Keras.

Since then, there has been quite a bit of progress. MXNet moved under the Apache Software Foundation umbrella early in 2017, and although it’s still “incubating” at version 1.3, it feels fairly well fleshed out.

While there has been work on Keras with an MXNet back end, a different high-level interface has become much more important: Gluon. Prior to the incorporation of Gluon, you could either write easy imperative code or fast symbolic code in MXNet, but not both at once. With Gluon, you can combine the best of both worlds, in a way that competes with both Keras and PyTorch.

What is Gluon for MXNet?

The advantages claimed for Gluon include simple code, flexible modeling, dynamic graphs, and high performance:

  1. Simple, easy-to-understand code: Gluon offers a full set of plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers.
  2. Flexible, imperative structure: Gluon does not require the neural network model to be rigidly defined, but rather brings the training algorithm and model closer together to provide flexibility in the development process.
  3. Dynamic graphs: Gluon enables developers to define neural network models that are dynamic, meaning they can be built on the fly, with any structure, and using any of Python’s native control flow.
  4. High performance: Gluon provides all of the above benefits without impacting the training speed that the underlying engine provides.

These four items, along with a vastly expanded collection of model examples, bring Gluon/MXNet to rough parity with Keras/TensorFlow and PyTorch for ease of development and training speed. You can see Gluon code examples illustrating each of these characteristics on the main Gluon page and repeated on the overview page for the Gluon API.
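
To make the imperative, dynamic style concrete, here is a minimal sketch of my own (not taken from the Gluon pages): a Gluon Block whose forward pass uses native Python control flow. The class name, layer sizes, and loop rule are arbitrary illustrations.

import mxnet as mx
from mxnet import nd, autograd
from mxnet.gluon import nn

class TinyDynamicNet(nn.Block):
    """Illustrative network: the forward pass is plain imperative Python."""
    def __init__(self, **kwargs):
        super(TinyDynamicNet, self).__init__(**kwargs)
        self.proj = nn.Dense(16, activation="relu")
        self.step = nn.Dense(16, activation="relu")

    def forward(self, x):
        x = self.proj(x)
        # Native Python control flow: the loop count depends on the data itself.
        n_steps = int(nd.sum(x > 0).asscalar()) % 3 + 1
        for _ in range(n_steps):
            x = self.step(x)
        return x

net = TinyDynamicNet()
net.initialize()
x = nd.random.uniform(shape=(2, 8))
with autograd.record():      # the graph is recorded on the fly
    y = net(x)
    loss = y.sum()           # reduce to a scalar for backward
loss.backward()              # autograd works through the Python loop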

The Gluon API includes functionality for neural network layers, recurrent neural networks, loss functions, data set methods and vision data sets, a model zoo, and a set of experimental contributed neural network methods. You can freely combine Gluon with standard MXNet and NumPy modules—for example, module, autograd, and ndarray—as well as with Python control flows.
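
For instance, here is a minimal training-step sketch (my own toy example, with made-up data and layer sizes) that mixes Gluon layers, a Gluon loss, and a Gluon Trainer with autograd and ndarray calls:

import mxnet as mx
from mxnet import nd, autograd, gluon
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(16, activation="relu"), nn.Dense(1))
net.initialize()

loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.01})

X = nd.random.uniform(shape=(32, 10))    # toy inputs
y = nd.random.uniform(shape=(32, 1))     # toy targets

with autograd.record():                  # record the imperative forward pass
    loss = loss_fn(net(X), y)
loss.backward()                          # compute gradients with autograd
trainer.step(batch_size=X.shape[0])      # SGD parameter update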

Gluon has a good selection of layers for building models, including basic layers (Dense, Dropout, etc.), convolutional layers, pooling layers, and activation layers. Each of these is a one-line call. These can be used, among other places, inside of network containers such as gluon.nn.Sequential().
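
As a quick illustration of those one-line calls (the layer choices here are mine, not from an MXNet example), a Sequential container can stack convolutional, pooling, Dense, and Dropout layers:

from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Conv2D(32, kernel_size=3, activation="relu"),  # convolutional layer
        nn.MaxPool2D(pool_size=2),                        # pooling layer
        nn.Flatten(),
        nn.Dense(64, activation="relu"),                   # basic Dense layer
        nn.Dropout(0.5),                                   # Dropout layer
        nn.Dense(10))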

A HybridSequential network can be cached (turned into a symbolic graph) for high performance using the hybridize() method:

from mxnet.gluon import nn   # Gluon layers and containers

net = nn.HybridSequential()
with net.name_scope():                        # scope for parameter names
    net.add(nn.Dense(256, activation="relu"))
    net.add(nn.Dense(128, activation="relu"))
    net.add(nn.Dense(2))
net.hybridize()   # cache the network as a symbolic graph for speed

Note the way the Dense layer constructor can take an activation layer name as a parameter. That’s one of many similarities between Gluon and Keras.

Neither the Sequential nor the HybridSequential container is documented as part of the Gluon API. As I discovered by searching the source code tree, they are implemented in incubator-mxnet/python/mxnet/gluon/nn/basic_layers.py.

What’s new in MXNet 1.3?

MXNet v1.3 includes a long list of new features, improvements, and bug fixes. Highlights include the ability to hybridize RNN (recurrent neural network) layers for performance, new and updated pre-trained vision models, model export to ONNX (Open Neural Network Exchange) format, and runtime integration of Nvidia TensorRT into MXNet in order to accelerate inference. Further, the integration of Intel MKL (Math Kernel Library) into MXNet gives up to 4x improvement in performance on Intel CPUs for intensive operations, including convolution nodes.
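
As an example of the new export path, the mxnet.contrib.onnx module that ships with MXNet 1.3 can convert a saved MXNet model to ONNX. In this sketch the file names and input shape are placeholders for a model you have already trained and saved:

import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

# Convert a previously saved MXNet model (symbol + params) to ONNX.
onnx_file = onnx_mxnet.export_model(
    sym="model-symbol.json",         # placeholder: your saved symbol file
    params="model-0000.params",      # placeholder: your saved parameters
    input_shape=[(1, 3, 224, 224)],  # placeholder: one NCHW image input
    input_type=np.float32,
    onnx_file_path="model.onnx")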

The MXNet community has also been paying closer attention to QA and continuous integration. Among the steps taken is to integrate the sample notebooks from the tutorial Deep Learning: The Straight Dope into the nightly CI testing.

Installation of MXNet without tears

If you already have a working, current installation of Python, MXNet, and Jupyter notebooks with Notedown, you can skip to the next section. Otherwise, please follow along.

I can’t tell you how many problems I had with older versions of the different software components throwing obscure errors, along with interference from the installations of other packages, before figuring out this reproducible sequence. This way, you shouldn’t encounter bugs, except in your own code, and you shouldn’t break other deep learning frameworks you may have installed.

Is it the only possible installation option? No, of course not. It’s even easier to run MXNet in Amazon SageMaker, or run a Deep Learning AMI on AWS, which has everything you need already installed.

Start by installing the latest version of Python 3 for your platform. (I’ve had problems running MXNet with Python 2 or earlier builds of Python 3.) I suggest installing Python 3 from Python.org. If you prefer an Anaconda or Miniconda environment, you can install Python 3 with one of those instead, and possibly skip the Jupyter installation step.

Verify that you can run python3 from the command line and that it reports the latest version. In my late-October 2018 installation, python3 -V returns Python 3.7.1; your version may be later.

Then install Jupyter. I used pip. This step isn’t needed if you installed Anaconda, which installs Jupyter by default.

python3 -m pip install --upgrade pip
python3 -m pip install jupyter

If you run jupyter notebook from the command line you should see a browser window open, and be able to create a new notebook with a Python 3 kernel. Close those two windows and stop the notebook server, typically by pressing Ctrl-c twice at the command line.

Now install Notedown using a tarball as described in the Gluon crash course Readme. The Notedown plug-in lets Jupyter read notebooks saved in markdown format, which is useful both for the crash course and for Deep Learning: The Straight Dope.

pip install https://github.com/mli/notedown/tarball/master

Smoke test this by running Jupyter with Notedown:

jupyter notebook --NotebookApp.contents_manager_class='notedown.NotedownContentsManager'

Once again close any webpages and stop the notebook server.

Now we are ready to create a virtual environment for MXNet. If you are an Anaconda user, you can instead create the virtual environment with conda. I used the native Python 3 venv facility, starting from my home directory:

python3 -m venv envs/mxnet

Now activate the virtual environment and install MXNet for your platform. I chose the MXNet build with MKL (Intel’s high-performance math library for its CPUs) because I’m on a Mac, for which there is no MXNet binary for CUDA GPUs. If you have a recent Nvidia GPU with CUDA installed on Linux or Windows, you could install an MXNet version with both CUDA and MKL support. In a Bash shell, the MXNet installation in the virtual environment was as follows:

source envs/mxnet/bin/activate
pip install mxnet-mkl

Activation is slightly different in the C shell and the Fish shell, which use the activate.csh and activate.fish scripts respectively. In any case, you’ll need to activate the environment whenever you want to get back to this MXNet installation after closing the shell. If you’re not in your home directory, the Bash activation command would be:

source ~/envs/mxnet/bin/activate

Test the MXNet installation at the command line by running Python 3 and importing the MXNet library we just installed. Note that the (mxnet) prefix on the command line means that we’re in the virtual environment.

(mxnet) Martins-Retina-MacBook:~ martinheller$ python3
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> from mxnet import nd
>>> nd.array(((1,2,3),(5,6,7)))

[[1. 2. 3.]
 [5. 6. 7.]]
<NDArray 2x3 @cpu(0)>
>>> ^D
(mxnet) Martins-Retina-MacBook:~ martinheller$

Now, we’re ready to test MXNet within a Jupyter notebook with Notedown, in the virtual environment where we installed MXNet:

[Screenshot: The Bash shell output from starting a Jupyter notebook.]

[Screenshot: The Jupyter notebook initial web page, showing a directory listing.]

Drop down the New menu at the top right to pick a kernel; we want Python 3. Add the three lines of Python code shown below and run the cell.
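
The three lines are the same quick check we ran at the command line:

import mxnet as mx
from mxnet import nd
nd.array(((1,2,3),(5,6,7)))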

[Screenshot: Testing MXNet within the Jupyter notebook. All is as expected.]

Now that you’ve tested your MXNet installation in a Jupyter notebook, you can take the next step and test Gluon more fully. Browse to the gluon-api/gluon-api repo on GitHub, and download the Jupyter notebook of the sample code. Change to the directory where you downloaded the notebook, activate your MXNet virtual environment if necessary, run Jupyter notebook, open the sample, and run it. It may take a while to complete the training. If all is well, you’ll see something like the following:

[Screenshot: The Gluon sample notebook after a successful run.]

This Jupyter notebook is the canonical test for Gluon in MXNet. It’s an implementation of a simple two-layer dense neural network for MNIST digit identification, a multilayer perceptron, using Softmax cross-entropy loss and a stochastic gradient descent algorithm.
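
In outline, the model and training objects the notebook builds look something like the following sketch; the layer sizes shown here are illustrative rather than copied from the notebook:

from mxnet import gluon
from mxnet.gluon import nn

net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(128, activation="relu"))   # first hidden dense layer
    net.add(nn.Dense(64, activation="relu"))    # second hidden dense layer
    net.add(nn.Dense(10))                       # one output per digit class
net.initialize()

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.1})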

Gluon and MXNet Module tutorials

MXNet now has a number of tutorials both for Gluon and the Module API. I have already mentioned the long course on deep learning with Gluon, Deep Learning: The Straight Dope, and the short version, the 60-Minute Gluon Crash Course.

In addition, there are about 30 Gluon tutorials for Python. On the Module API side, there are about 24 tutorials for Python, five for Scala, two for C++, nine for R, and four for Perl.

When I reviewed Keras in September of this year, I said that “If I were starting a new deep learning project today, I would most likely do the research with Keras.” I’m no longer quite so sure about that. Gluon/MXNet is almost as good a choice as Keras/TensorFlow for deep learning research on CPUs and GPUs.

On the down side, MXNet currently lacks support for TPUs or FPGAs, unlike TensorFlow, and it lacks an equivalent of TensorFlow’s TensorBoard for visualization of graphs. Further, Keras/TensorFlow has a larger ecosystem than Gluon/MXNet.

Keras can be deployed on more environments than Gluon, but you can deploy Gluon models for prediction to Android, iOS, Raspberry Pi, and Nvidia Jetson devices, as well as to Nvidia TensorRT, in addition to the computers that trained the models. Gluon and Keras are both currently more mature than PyTorch, which is still in a beta state. PyTorch and Gluon can both create models dynamically; Keras currently cannot.

Ultimately, the choice of which deep learning framework to use may well revolve around your specific requirements—or what you know and like. But thanks to Gluon and other dramatic improvements (in documentation, tutorials, models, etc.), MXNet is shaping up to be as good a choice as TensorFlow or PyTorch for deep learning.

At a Glance
  • Apache MXNet with Gluon is shaping up to be as good a choice as TensorFlow with Keras for deep learning research on CPUs and GPUs.

    Pros

    • High-level Gluon API eases model creation
    • Scales to multiple GPUs across multiple hosts with efficiency of 85 percent
    • Excellent development speed and programmability
    • Excellent training speed
    • Supports Python, R, Scala, Julia, Perl, C++, and Clojure (experimental)

    Cons

    • Smaller ecosystem than Keras/TensorFlow
    • Doesn’t support TPUs or FPGAs
    • Still considered “incubating” by the Apache Software Foundation

Copyright © 2018 IDG Communications, Inc.