
License: OTHER
Kernel: Python 3

Credits: Forked from deep-learning-keras-tensorflow by Valerio Maggio

ConvNet HandsOn with Keras

Problem Definition

Recognize handwritten digits

Data

The MNIST database is a collection of images of handwritten digits.

The training set has 60,000 samples. The test set has 10,000 samples.

The digits are size-normalized and centered in a fixed-size image.

The data page describes how the data was collected. It also reports benchmark results for various algorithms on the test dataset.

Load the data

The data is available in the repo's data folder. Let's load that using the keras library.

For now, let's load the data and see how it looks.

import numpy as np
import keras
from keras.datasets import mnist
Using Theano backend.
Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
!mkdir -p $HOME/.keras/datasets/euroscipy_2016_dl-keras/data/
# Path to mnist.pkl.gz; Keras resolves a relative path against ~/.keras/datasets
path_to_dataset = "euroscipy_2016_dl-keras/data/mnist.pkl.gz"
# Load the datasets
(X_train, y_train), (X_test, y_test) = mnist.load_data(path_to_dataset)
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz 15024128/15296311 [============================>.] - ETA: 0s

Basic data analysis on the dataset

# What is the type of X_train?
# What is the type of y_train?
# Find number of observations in training data
# Find number of observations in test data
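One possible way to answer the four questions above is sketched below. The arrays `X_demo` and `y_demo` are hypothetical stand-ins so that the snippet runs without downloading anything; in the notebook you would use `X_train`, `y_train`, `X_test` and `y_test` instead.

```python
import numpy as np

# Hypothetical stand-ins (not part of the notebook): six blank 28x28 images
# and six labels, stored with the same dtype as the real data.
X_demo = np.zeros((6, 28, 28), dtype=np.uint8)
y_demo = np.zeros(6, dtype=np.uint8)

# The type of the containers
print(type(X_demo))
print(type(y_demo))

# The number of observations is the length of the first axis
n_images = len(X_demo)
n_labels = y_demo.shape[0]
print(n_images, n_labels)
```

With the real data, `len(X_train)` and `len(X_test)` should match the 60,000 and 10,000 samples mentioned in the Data section.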
# Display first 2 records of X_train
array([[[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], [[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)
# Display the first 10 records of y_train
array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=uint8)
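The two outputs above are what plain NumPy slicing would produce. A minimal sketch, assuming stand-in arrays named `X_demo` and `y_demo` (hypothetical names); the first ten labels are copied from the output above and the last two are arbitrary padding:

```python
import numpy as np

# Stand-in images: twelve blank 28x28 arrays (hypothetical, for illustration)
X_demo = np.zeros((12, 28, 28), dtype=np.uint8)
# First ten values taken from the output above; the trailing 7, 7 is padding
y_demo = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 7, 7], dtype=np.uint8)

first_images = X_demo[:2]    # first 2 records, each a 28x28 array
first_labels = y_demo[:10]   # first 10 labels

print(first_images.shape)
print(first_labels)
```

In the notebook the equivalent expressions would be `X_train[:2]` and `y_train[:10]`.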
# Find the number of observations for each digit in the y_train dataset
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8), array([5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]))
# Find the number of observations for each digit in the y_test dataset
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8), array([ 980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]))
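The pairs of arrays above have the form returned by `np.unique` with `return_counts=True`, so that is one plausible way to produce them. The sketch below applies it to the ten labels shown earlier (`y_demo` is a hypothetical name), then checks that the training counts printed above add up to the size of the training set.

```python
import numpy as np

# The first ten training labels, copied from the earlier output
y_demo = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=np.uint8)

# Distinct digits and how often each one occurs
digits, counts = np.unique(y_demo, return_counts=True)
for digit, count in zip(digits, counts):
    print(digit, count)

# Sanity check on the per-digit training counts reported above
train_counts = np.array([5923, 6742, 5958, 6131, 5842,
                         5421, 5918, 6265, 5851, 5949])
print(train_counts.sum())
```

In the notebook the call would be `np.unique(y_train, return_counts=True)`, and likewise for `y_test`.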
# What is the dimension of X_train? What does that mean?
(60000, 28, 28)
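The shape reads as 60,000 images, each 28 pixels high and 28 pixels wide. The sketch below unpacks a shape of that form from a blank stand-in array (`X_demo` is a hypothetical name, not the real data).

```python
import numpy as np

# Blank stand-in with the same shape and dtype as X_train (hypothetical)
X_demo = np.zeros((60000, 28, 28), dtype=np.uint8)

# Axis 0 indexes the images; axes 1 and 2 are the pixel rows and columns
n_images, height, width = X_demo.shape
pixels_per_image = height * width

print(n_images, height, width)
print(X_demo[0].shape)      # a single record is one 28x28 image
print(pixels_per_image)
```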

Display Images

Let's now display some of the images and see how they look.

We will use the matplotlib library to display the images.

from matplotlib import pyplot
import matplotlib as mpl
%matplotlib inline
# Display the training record at index 1 (the second record)
fig = pyplot.figure()
ax = fig.add_subplot(1,1,1)
imgplot = ax.imshow(X_train[1], cmap=mpl.cm.Greys)
imgplot.set_interpolation('nearest')
ax.xaxis.set_ticks_position('top')
ax.yaxis.set_ticks_position('left')
pyplot.show()
Image in a Jupyter notebook
# Let's now display the 11th record
Image in a Jupyter notebook
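Because indexing starts at 0, the 11th record sits at index 10. The sketch below wraps the plotting steps from the earlier cell in a helper so that any record can be drawn. The helper name `plot_record` and the random stand-in array `X_demo` are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot


def plot_record(images, index):
    """Draw images[index] with the same styling as the earlier cell."""
    fig = pyplot.figure()
    ax = fig.add_subplot(1, 1, 1)
    imgplot = ax.imshow(images[index], cmap=mpl.cm.Greys)
    imgplot.set_interpolation('nearest')
    ax.xaxis.set_ticks_position('top')
    ax.yaxis.set_ticks_position('left')
    return fig


# Hypothetical stand-in: eleven random 28x28 greyscale images
rng = np.random.RandomState(0)
X_demo = rng.randint(0, 256, size=(11, 28, 28)).astype(np.uint8)

# The 11th record is at index 10
fig = plot_record(X_demo, 10)
```

In the notebook, `plot_record(X_train, 10)` followed by `pyplot.show()` would display the 11th training image.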