
# Sequence to sequence learning for performing number addition

**Author:** Smerity and others<br>
**Date created:** 2015/08/17<br>
**Last modified:** 2024/02/13<br>
**Description:** A model that learns to add strings of numbers, e.g. "535+61" -> "596".

View in Colab • GitHub source


---

## Introduction

In this example, we train a model to learn to add two numbers, provided as strings.

Example:

  • Input: "535+61"

  • Output: "596"

Input may optionally be reversed, which was shown to increase performance in many tasks in *Learning to Execute* and *Sequence to Sequence Learning with Neural Networks*.

Theoretically, sequence order inversion introduces shorter-term dependencies between source and target for this problem.
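As a concrete illustration (this snippet is not part of the training script below), with `DIGITS = 3` a padded query is reversed like this, so the encoder reads the least-significant digits first:

```python
# Illustrative only: how a padded query looks before and after reversal.
q = "12+345"
query = q + " " * (7 - len(q))  # pad to MAXLEN = 3 + 1 + 3 = 7 (defined in Setup below)
print(repr(query))              # '12+345 '
print(repr(query[::-1]))        # ' 543+21'
```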

Results:

For two digits (reversed):

  • One layer LSTM (128 HN), 5k training examples = 99% train/test accuracy in 55 epochs

Three digits (reversed):

  • One layer LSTM (128 HN), 50k training examples = 99% train/test accuracy in 100 epochs

Four digits (reversed):

  • One layer LSTM (128 HN), 400k training examples = 99% train/test accuracy in 20 epochs

Five digits (reversed):

  • One layer LSTM (128 HN), 550k training examples = 99% train/test accuracy in 30 epochs


---

## Setup

```python
import keras
from keras import layers
import numpy as np

# Parameters for the model and dataset.
TRAINING_SIZE = 50000
DIGITS = 3
REVERSE = True

# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of
# int is DIGITS.
MAXLEN = DIGITS + 1 + DIGITS
```

---

## Generate the data

```python
class CharacterTable:
    """Given a set of characters:
    + Encode them to a one-hot integer representation
    + Decode the one-hot or integer representation to their character output
    + Decode a vector of probabilities to their character output
    """

    def __init__(self, chars):
        """Initialize character table.

        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One-hot encode given string C.

        # Arguments
            C: string, to be encoded.
            num_rows: Number of rows in the returned one-hot encoding. This is
                used to keep the number of rows the same for every data item.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        """Decode the given vector or 2D array to their character output.

        # Arguments
            x: A vector or a 2D array of probabilities or one-hot representations;
                or a vector of character indices (used with `calc_argmax=False`).
            calc_argmax: Whether to find the character index with maximum
                probability, defaults to `True`.
        """
        if calc_argmax:
            x = x.argmax(axis=-1)
        return "".join(self.indices_char[x] for x in x)


# All the numbers, plus sign and space for padding.
chars = "0123456789+ "
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
print("Generating data...")
while len(questions) < TRAINING_SIZE:
    f = lambda: int(
        "".join(
            np.random.choice(list("0123456789"))
            for i in range(np.random.randint(1, DIGITS + 1))
        )
    )
    a, b = f(), f()
    # Skip any addition questions we've already seen. Since a+b == b+a, sort
    # the pair so that both orderings map to the same key.
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)
    # Pad the data with spaces such that it is always MAXLEN.
    q = "{}+{}".format(a, b)
    query = q + " " * (MAXLEN - len(q))
    ans = str(a + b)
    # Answers can be of maximum size DIGITS + 1.
    ans += " " * (DIGITS + 1 - len(ans))
    if REVERSE:
        # Reverse the query, e.g., '12+345 ' becomes ' 543+21'. (Note the
        # space used for padding.)
        query = query[::-1]
    questions.append(query)
    expected.append(ans)
print("Total questions:", len(questions))
```
```
Generating data...
Total questions: 50000
```
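As a quick sanity check (illustrative, not part of the original script), a string can be round-tripped through `ctable`: `encode` yields a `(num_rows, 12)` one-hot array and `decode` recovers the characters:

```python
# Round-trip a padded query through the character table (illustrative).
onehot = ctable.encode("12+345 ", MAXLEN)
print(onehot.shape)           # (7, 12): one row per character, one column per symbol
print(ctable.decode(onehot))  # '12+345 '
```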
---

## Vectorize the data

```python
print("Vectorization...")
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger
# digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print("Training Data:")
print(x_train.shape)
print(y_train.shape)

print("Validation Data:")
print(x_val.shape)
print(y_val.shape)
```
```
Vectorization...
Training Data:
(45000, 7, 12)
(45000, 4, 12)
Validation Data:
(5000, 7, 12)
(5000, 4, 12)
```
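To see what the model will actually consume, you can decode a vectorized sample back into text (an optional check, not in the original script); remember that with `REVERSE = True` the stored query reads right-to-left:

```python
# Decode the first training example back to characters (optional check).
sample_q = ctable.decode(x_train[0])  # reversed, space-padded query
sample_a = ctable.decode(y_train[0])  # space-padded answer
print("Q:", sample_q[::-1] if REVERSE else sample_q, "-> A:", sample_a)
```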
---

## Build the model

```python
print("Build model...")
num_layers = 1  # Try to add more LSTM layers!

model = keras.Sequential()
# "Encode" the input sequence using an LSTM, producing an output of size 128.
# Note: In a situation where your input sequences have a variable length,
# use input_shape=(None, num_feature).
model.add(layers.Input((MAXLEN, len(chars))))
model.add(layers.LSTM(128))
# As the decoder RNN's input, repeatedly provide the last output of the
# encoder for each time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))
# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(num_layers):
    # By setting return_sequences to True, return not only the last output but
    # all the outputs so far in the form of (num_samples, timesteps,
    # output_dim). This is necessary as the Dense layer below is applied to
    # every timestep.
    model.add(layers.LSTM(128, return_sequences=True))

# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
model.add(layers.Dense(len(chars), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```
```
Build model...
```
```
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Layer (type)                 ┃ Output Shape       ┃   Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ lstm (LSTM)                  │ (None, 128)        │    72,192 │
├──────────────────────────────┼────────────────────┼───────────┤
│ repeat_vector (RepeatVector) │ (None, 4, 128)     │         0 │
├──────────────────────────────┼────────────────────┼───────────┤
│ lstm_1 (LSTM)                │ (None, 4, 128)     │   131,584 │
├──────────────────────────────┼────────────────────┼───────────┤
│ dense (Dense)                │ (None, 4, 12)      │     1,548 │
└──────────────────────────────┴────────────────────┴───────────┘
 Total params: 205,324 (802.05 KB)
 Trainable params: 205,324 (802.05 KB)
 Non-trainable params: 0 (0.00 B)
```

---

## Train the model

```python
# Training parameters.
epochs = 30
batch_size = 32

# Formatting characters for results display.
green_color = "\033[92m"
red_color = "\033[91m"
end_char = "\033[0m"

# Train the model each generation and show predictions against the validation
# dataset.
for epoch in range(1, epochs):
    print()
    print("Iteration", epoch)
    model.fit(
        x_train,
        y_train,
        batch_size=batch_size,
        epochs=1,
        validation_data=(x_val, y_val),
    )
    # Select 10 samples from the validation set at random so we can visualize
    # errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        preds = np.argmax(model.predict(rowx, verbose=0), axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print("Q", q[::-1] if REVERSE else q, end=" ")
        print("T", correct, end=" ")
        if correct == guess:
            print(f"{green_color}☑ {guess}{end_char}")
        else:
            print(f"{red_color}☒ {guess}{end_char}")
```
```
Iteration 1
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 10s 6ms/step - accuracy: 0.3258 - loss: 1.8801 - val_accuracy: 0.4268 - val_loss: 1.5506
Q 499+58 T 557 ☒ 511
Q 51+638 T 689 ☒ 662
Q 87+12 T 99 ☒ 11
Q 259+55 T 314 ☒ 561
Q 704+87 T 791 ☒ 811
Q 988+67 T 1055 ☒ 101
Q 94+116 T 210 ☒ 111
Q 724+4 T 728 ☒ 777
Q 8+673 T 681 ☒ 772
Q 8+991 T 999 ☒ 900
```

```
Iteration 2
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.4688 - loss: 1.4235 - val_accuracy: 0.5846 - val_loss: 1.1293
Q 379+6 T 385 ☒ 387
Q 15+504 T 519 ☒ 525
Q 552+299 T 851 ☒ 727
Q 664+0 T 664 ☒ 667
Q 500+257 T 757 ☒ 797
Q 50+818 T 868 ☒ 861
Q 310+691 T 1001 ☒ 900
Q 378+548 T 926 ☒ 827
Q 46+59 T 105 ☒ 122
Q 49+817 T 866 ☒ 871
```

```
Iteration 3
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.6053 - loss: 1.0648 - val_accuracy: 0.6665 - val_loss: 0.9070
Q 1+266 T 267 ☒ 260
Q 73+257 T 330 ☒ 324
Q 421+628 T 1049 ☒ 1022
Q 85+590 T 675 ☒ 660
Q 66+34 T 100 ☒ 90
Q 256+639 T 895 ☒ 890
Q 6+677 T 683 ☑ 683
Q 162+637 T 799 ☒ 792
Q 5+324 T 329 ☒ 337
Q 848+34 T 882 ☒ 889
```

```
Iteration 4
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.6781 - loss: 0.8751 - val_accuracy: 0.7037 - val_loss: 0.8092
Q 677+1 T 678 ☒ 676
Q 1+531 T 532 ☒ 535
Q 699+60 T 759 ☒ 756
Q 475+139 T 614 ☒ 616
Q 327+592 T 919 ☒ 915
Q 48+912 T 960 ☒ 956
Q 520+78 T 598 ☒ 505
Q 318+8 T 326 ☒ 327
Q 914+53 T 967 ☒ 966
Q 734+0 T 734 ☒ 733
```

```
Iteration 5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.7142 - loss: 0.7807 - val_accuracy: 0.7164 - val_loss: 0.7622
Q 150+337 T 487 ☒ 489
Q 72+934 T 1006 ☒ 1005
Q 171+62 T 233 ☒ 231
Q 108+21 T 129 ☒ 135
Q 755+896 T 1651 ☒ 1754
Q 117+1 T 118 ☒ 119
Q 148+95 T 243 ☒ 241
Q 719+956 T 1675 ☒ 1684
Q 656+43 T 699 ☒ 695
Q 368+8 T 376 ☒ 372
```

```
Iteration 6
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.7377 - loss: 0.7157 - val_accuracy: 0.7541 - val_loss: 0.6684
Q 945+364 T 1309 ☒ 1305
Q 762+96 T 858 ☒ 855
Q 5+650 T 655 ☑ 655
Q 52+680 T 732 ☒ 735
Q 77+724 T 801 ☒ 800
Q 46+739 T 785 ☑ 785
Q 843+43 T 886 ☒ 885
Q 158+3 T 161 ☒ 160
Q 426+711 T 1137 ☒ 1138
Q 157+41 T 198 ☒ 190
```

```
Iteration 7
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.7642 - loss: 0.6462 - val_accuracy: 0.7955 - val_loss: 0.5433
Q 822+27 T 849 ☑ 849
Q 82+495 T 577 ☒ 563
Q 9+366 T 375 ☒ 373
Q 9+598 T 607 ☒ 696
Q 186+41 T 227 ☒ 226
Q 920+920 T 1840 ☒ 1846
Q 445+345 T 790 ☒ 797
Q 783+588 T 1371 ☒ 1360
Q 36+473 T 509 ☒ 502
Q 354+61 T 415 ☒ 416
```

```
Iteration 8
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.8326 - loss: 0.4626 - val_accuracy: 0.9069 - val_loss: 0.2744
Q 458+154 T 612 ☑ 612
Q 309+19 T 328 ☑ 328
Q 808+97 T 905 ☑ 905
Q 28+736 T 764 ☑ 764
Q 28+79 T 107 ☑ 107
Q 44+84 T 128 ☒ 129
Q 744+13 T 757 ☑ 757
Q 24+996 T 1020 ☒ 1011
Q 8+193 T 201 ☒ 101
Q 483+9 T 492 ☒ 491
```

```
Iteration 9
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9365 - loss: 0.2275 - val_accuracy: 0.9657 - val_loss: 0.1393
Q 330+61 T 391 ☑ 391
Q 207+82 T 289 ☒ 299
Q 23+234 T 257 ☑ 257
Q 690+567 T 1257 ☑ 1257
Q 293+97 T 390 ☒ 380
Q 312+868 T 1180 ☑ 1180
Q 956+40 T 996 ☑ 996
Q 97+105 T 202 ☒ 203
Q 365+44 T 409 ☑ 409
Q 76+639 T 715 ☑ 715
```

```
Iteration 10
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 7s 5ms/step - accuracy: 0.9717 - loss: 0.1223 - val_accuracy: 0.9744 - val_loss: 0.0965
Q 123+143 T 266 ☑ 266
Q 599+1 T 600 ☑ 600
Q 729+237 T 966 ☑ 966
Q 51+120 T 171 ☑ 171
Q 97+672 T 769 ☑ 769
Q 840+5 T 845 ☑ 845
Q 86+494 T 580 ☒ 570
Q 278+51 T 329 ☑ 329
Q 8+832 T 840 ☑ 840
Q 383+9 T 392 ☑ 392
```

```
Iteration 11
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 7s 5ms/step - accuracy: 0.9842 - loss: 0.0729 - val_accuracy: 0.9808 - val_loss: 0.0690
Q 181+923 T 1104 ☑ 1104
Q 747+24 T 771 ☑ 771
Q 6+65 T 71 ☑ 71
Q 75+994 T 1069 ☑ 1069
Q 712+587 T 1299 ☑ 1299
Q 977+10 T 987 ☑ 987
Q 742+24 T 766 ☑ 766
Q 215+44 T 259 ☑ 259
Q 817+683 T 1500 ☑ 1500
Q 102+48 T 150 ☒ 140
```

```
Iteration 12
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9820 - loss: 0.0695 - val_accuracy: 0.9823 - val_loss: 0.0596
Q 819+885 T 1704 ☒ 1604
Q 34+20 T 54 ☑ 54
Q 9+996 T 1005 ☑ 1005
Q 915+811 T 1726 ☑ 1726
Q 166+640 T 806 ☑ 806
Q 229+82 T 311 ☑ 311
Q 1+418 T 419 ☑ 419
Q 552+28 T 580 ☑ 580
Q 279+733 T 1012 ☑ 1012
Q 756+734 T 1490 ☑ 1490
```

```
Iteration 13
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9836 - loss: 0.0587 - val_accuracy: 0.9941 - val_loss: 0.0296
Q 793+0 T 793 ☑ 793
Q 79+48 T 127 ☑ 127
Q 484+92 T 576 ☑ 576
Q 39+655 T 694 ☑ 694
Q 64+708 T 772 ☑ 772
Q 568+341 T 909 ☑ 909
Q 9+918 T 927 ☑ 927
Q 48+912 T 960 ☑ 960
Q 31+289 T 320 ☑ 320
Q 378+548 T 926 ☑ 926
```

```
Iteration 14
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9915 - loss: 0.0353 - val_accuracy: 0.9901 - val_loss: 0.0358
Q 318+8 T 326 ☒ 325
Q 886+63 T 949 ☒ 959
Q 77+8 T 85 ☑ 85
Q 418+40 T 458 ☑ 458
Q 30+32 T 62 ☑ 62
Q 541+93 T 634 ☑ 634
Q 6+7 T 13 ☒ 14
Q 670+74 T 744 ☑ 744
Q 97+57 T 154 ☑ 154
Q 60+13 T 73 ☑ 73
```

```
Iteration 15
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9911 - loss: 0.0335 - val_accuracy: 0.9934 - val_loss: 0.0262
Q 24+533 T 557 ☑ 557
Q 324+44 T 368 ☑ 368
Q 63+505 T 568 ☑ 568
Q 670+74 T 744 ☑ 744
Q 58+359 T 417 ☑ 417
Q 16+428 T 444 ☑ 444
Q 17+99 T 116 ☑ 116
Q 779+903 T 1682 ☑ 1682
Q 40+576 T 616 ☑ 616
Q 947+773 T 1720 ☑ 1720
```

```
Iteration 16
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9968 - loss: 0.0175 - val_accuracy: 0.9901 - val_loss: 0.0360
Q 315+155 T 470 ☑ 470
Q 594+950 T 1544 ☑ 1544
Q 372+37 T 409 ☑ 409
Q 537+47 T 584 ☑ 584
Q 8+263 T 271 ☑ 271
Q 81+500 T 581 ☑ 581
Q 75+270 T 345 ☑ 345
Q 0+796 T 796 ☑ 796
Q 655+965 T 1620 ☑ 1620
Q 384+1 T 385 ☑ 385
```

```
Iteration 17
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9972 - loss: 0.0148 - val_accuracy: 0.9924 - val_loss: 0.0278
Q 168+83 T 251 ☑ 251
Q 951+53 T 1004 ☑ 1004
Q 400+37 T 437 ☑ 437
Q 996+473 T 1469 ☒ 1569
Q 996+847 T 1843 ☑ 1843
Q 842+550 T 1392 ☑ 1392
Q 479+72 T 551 ☑ 551
Q 753+782 T 1535 ☑ 1535
Q 99+188 T 287 ☑ 287
Q 2+974 T 976 ☑ 976
```

```
Iteration 18
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 7s 5ms/step - accuracy: 0.9929 - loss: 0.0258 - val_accuracy: 0.9973 - val_loss: 0.0135
Q 380+62 T 442 ☑ 442
Q 774+305 T 1079 ☑ 1079
Q 248+272 T 520 ☑ 520
Q 479+736 T 1215 ☑ 1215
Q 859+743 T 1602 ☑ 1602
Q 667+20 T 687 ☑ 687
Q 932+56 T 988 ☑ 988
Q 740+31 T 771 ☑ 771
Q 588+88 T 676 ☑ 676
Q 109+57 T 166 ☑ 166
```

```
Iteration 19
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9977 - loss: 0.0116 - val_accuracy: 0.9571 - val_loss: 0.1416
Q 635+89 T 724 ☑ 724
Q 50+818 T 868 ☑ 868
Q 37+622 T 659 ☑ 659
Q 913+49 T 962 ☑ 962
Q 641+962 T 1603 ☒ 1503
Q 11+626 T 637 ☑ 637
Q 20+405 T 425 ☑ 425
Q 667+208 T 875 ☑ 875
Q 89+794 T 883 ☑ 883
Q 234+55 T 289 ☑ 289
```

```
Iteration 20
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9947 - loss: 0.0194 - val_accuracy: 0.9967 - val_loss: 0.0136
Q 5+777 T 782 ☑ 782
Q 1+266 T 267 ☑ 267
Q 579+1 T 580 ☑ 580
Q 665+6 T 671 ☑ 671
Q 210+546 T 756 ☑ 756
Q 660+86 T 746 ☑ 746
Q 75+349 T 424 ☑ 424
Q 984+36 T 1020 ☑ 1020
Q 4+367 T 371 ☑ 371
Q 249+213 T 462 ☑ 462
```

```
Iteration 21
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 7s 5ms/step - accuracy: 0.9987 - loss: 0.0081 - val_accuracy: 0.9840 - val_loss: 0.0481
Q 228+95 T 323 ☑ 323
Q 72+18 T 90 ☑ 90
Q 34+687 T 721 ☑ 721
Q 932+0 T 932 ☑ 932
Q 933+54 T 987 ☑ 987
Q 735+455 T 1190 ☑ 1190
Q 790+70 T 860 ☑ 860
Q 416+36 T 452 ☒ 462
Q 194+110 T 304 ☑ 304
Q 349+70 T 419 ☑ 419
```

```
Iteration 22
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 40s 28ms/step - accuracy: 0.9902 - loss: 0.0326 - val_accuracy: 0.9947 - val_loss: 0.0190
Q 95+237 T 332 ☑ 332
Q 5+188 T 193 ☑ 193
Q 19+931 T 950 ☑ 950
Q 38+499 T 537 ☑ 537
Q 25+21 T 46 ☑ 46
Q 55+85 T 140 ☑ 140
Q 555+7 T 562 ☑ 562
Q 83+873 T 956 ☑ 956
Q 95+527 T 622 ☑ 622
Q 556+558 T 1114 ☑ 1114
```

```
Iteration 23
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9835 - loss: 0.0572 - val_accuracy: 0.9962 - val_loss: 0.0141
Q 48+413 T 461 ☑ 461
Q 71+431 T 502 ☑ 502
Q 892+534 T 1426 ☑ 1426
Q 934+201 T 1135 ☑ 1135
Q 898+967 T 1865 ☒ 1855
Q 958+0 T 958 ☑ 958
Q 23+179 T 202 ☑ 202
Q 138+60 T 198 ☑ 198
Q 718+5 T 723 ☑ 723
Q 816+514 T 1330 ☑ 1330
```

```
Iteration 24
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 20s 14ms/step - accuracy: 0.9932 - loss: 0.0255 - val_accuracy: 0.9932 - val_loss: 0.0243
Q 4+583 T 587 ☑ 587
Q 49+466 T 515 ☑ 515
Q 920+26 T 946 ☑ 946
Q 624+813 T 1437 ☑ 1437
Q 87+315 T 402 ☑ 402
Q 368+73 T 441 ☑ 441
Q 86+833 T 919 ☑ 919
Q 528+423 T 951 ☑ 951
Q 0+705 T 705 ☑ 705
Q 581+928 T 1509 ☑ 1509
```

```
Iteration 25
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9908 - loss: 0.0303 - val_accuracy: 0.9944 - val_loss: 0.0169
Q 107+34 T 141 ☑ 141
Q 998+90 T 1088 ☑ 1088
Q 71+520 T 591 ☑ 591
Q 91+996 T 1087 ☑ 1087
Q 94+69 T 163 ☑ 163
Q 108+21 T 129 ☑ 129
Q 785+60 T 845 ☑ 845
Q 71+628 T 699 ☑ 699
Q 294+9 T 303 ☑ 303
Q 399+34 T 433 ☑ 433
```

```
Iteration 26
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9965 - loss: 0.0139 - val_accuracy: 0.9979 - val_loss: 0.0094
Q 19+133 T 152 ☑ 152
Q 841+3 T 844 ☑ 844
Q 698+6 T 704 ☑ 704
Q 942+28 T 970 ☑ 970
Q 81+735 T 816 ☑ 816
Q 325+14 T 339 ☑ 339
Q 790+64 T 854 ☑ 854
Q 4+839 T 843 ☑ 843
Q 505+96 T 601 ☑ 601
Q 917+42 T 959 ☑ 959
```

```
Iteration 27
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 72s 51ms/step - accuracy: 0.9952 - loss: 0.0173 - val_accuracy: 0.9992 - val_loss: 0.0036
Q 71+628 T 699 ☑ 699
Q 791+9 T 800 ☑ 800
Q 19+148 T 167 ☑ 167
Q 7+602 T 609 ☑ 609
Q 6+566 T 572 ☑ 572
Q 437+340 T 777 ☑ 777
Q 614+533 T 1147 ☑ 1147
Q 948+332 T 1280 ☑ 1280
Q 56+619 T 675 ☑ 675
Q 86+251 T 337 ☑ 337
```

```
Iteration 28
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 8s 6ms/step - accuracy: 0.9964 - loss: 0.0124 - val_accuracy: 0.9990 - val_loss: 0.0047
Q 2+572 T 574 ☑ 574
Q 437+96 T 533 ☑ 533
Q 15+224 T 239 ☑ 239
Q 16+655 T 671 ☑ 671
Q 714+5 T 719 ☑ 719
Q 645+417 T 1062 ☑ 1062
Q 25+919 T 944 ☑ 944
Q 89+329 T 418 ☑ 418
Q 22+513 T 535 ☑ 535
Q 497+983 T 1480 ☑ 1480
```

```
Iteration 29
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 7s 5ms/step - accuracy: 0.9970 - loss: 0.0106 - val_accuracy: 0.9990 - val_loss: 0.0048
Q 2+962 T 964 ☑ 964
Q 6+76 T 82 ☑ 82
Q 986+20 T 1006 ☑ 1006
Q 727+49 T 776 ☑ 776
Q 948+332 T 1280 ☑ 1280
Q 921+463 T 1384 ☑ 1384
Q 77+556 T 633 ☑ 633
Q 133+849 T 982 ☑ 982
Q 301+478 T 779 ☑ 779
Q 3+243 T 246 ☑ 246
```
You'll get to 99+% validation accuracy after ~30 epochs.
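Once trained, the model can be queried on unseen additions. The sketch below is illustrative rather than part of the original example; `predict_sum` is a hypothetical helper that encodes a question exactly like the training data (including reversal) and decodes the prediction:

```python
def predict_sum(a, b):
    # Hypothetical helper: format, pad, and (optionally) reverse the query
    # exactly like the training data, then decode the model's prediction.
    q = "{}+{}".format(a, b)
    query = q + " " * (MAXLEN - len(q))
    if REVERSE:
        query = query[::-1]
    x_new = ctable.encode(query, MAXLEN)[np.newaxis, ...]  # shape (1, MAXLEN, 12)
    preds = np.argmax(model.predict(x_new, verbose=0), axis=-1)
    return ctable.decode(preds[0], calc_argmax=False).strip()


print(predict_sum(535, 61))  # expected: '596'
```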