Path: examples/vision/image_classification_efficientnet_fine_tuning.py
"""1Title: Image classification via fine-tuning with EfficientNet2Author: [Yixing Fu](https://github.com/yixingfu)3Date created: 2020/06/304Last modified: 2023/07/105Description: Use EfficientNet with weights pre-trained on imagenet for Stanford Dogs classification.6Accelerator: GPU7"""89"""1011## Introduction: what is EfficientNet1213EfficientNet, first introduced in [Tan and Le, 2019](https://arxiv.org/abs/1905.11946)14is among the most efficient models (i.e. requiring least FLOPS for inference)15that reaches State-of-the-Art accuracy on both16imagenet and common image classification transfer learning tasks.1718The smallest base model is similar to [MnasNet](https://arxiv.org/abs/1807.11626), which19reached near-SOTA with a significantly smaller model. By introducing a heuristic way to20scale the model, EfficientNet provides a family of models (B0 to B7) that represents a21good combination of efficiency and accuracy on a variety of scales. Such a scaling22heuristics (compound-scaling, details see23[Tan and Le, 2019](https://arxiv.org/abs/1905.11946)) allows the24efficiency-oriented base model (B0) to surpass models at every scale, while avoiding25extensive grid-search of hyperparameters.2627A summary of the latest updates on the model is available at28[here](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet), where various29augmentation schemes and semi-supervised learning approaches are applied to further30improve the imagenet performance of the models. These extensions of the model can be used31by updating weights without changing model architecture.3233## B0 to B7 variants of EfficientNet3435*(This section provides some details on "compound scaling", and can be skipped36if you're only interested in using the models)*3738Based on the [original paper](https://arxiv.org/abs/1905.11946) people may have the39impression that EfficientNet is a continuous family of models created by arbitrarily40choosing scaling factor in as Eq.(3) of the paper. However, choice of resolution,41depth and width are also restricted by many factors:4243- Resolution: Resolutions not divisible by 8, 16, etc. cause zero-padding near boundaries44of some layers which wastes computational resources. This especially applies to smaller45variants of the model, hence the input resolution for B0 and B1 are chosen as 224 and46240.4748- Depth and width: The building blocks of EfficientNet demands channel size to be49multiples of 8.5051- Resource limit: Memory limitation may bottleneck resolution when depth52and width can still increase. In such a situation, increasing depth and/or53width but keep resolution can still improve performance.5455As a result, the depth, width and resolution of each variant of the EfficientNet models56are hand-picked and proven to produce good results, though they may be significantly57off from the compound scaling formula.58Therefore, the keras implementation (detailed below) only provide these 8 models, B0 to B7,59instead of allowing arbitray choice of width / depth / resolution parameters.6061## Keras implementation of EfficientNet6263An implementation of EfficientNet B0 to B7 has been shipped with Keras since v2.3. To64use EfficientNetB0 for classifying 1000 classes of images from ImageNet, run:6566```python67from tensorflow.keras.applications import EfficientNetB068model = EfficientNetB0(weights='imagenet')69```7071This model takes input images of shape `(224, 224, 3)`, and the input data should be in the72range `[0, 255]`. 

Training EfficientNet on ImageNet takes a tremendous amount of resources and
relies on several techniques that are not part of the model architecture itself. Hence the Keras
implementation by default loads pre-trained weights obtained via training with
[AutoAugment](https://arxiv.org/abs/1805.09501).

For the B0 to B7 base models, the input shapes are different. Here is a list of the input shape
expected by each model:

| Base model | Resolution |
|----------------|------------|
| EfficientNetB0 | 224 |
| EfficientNetB1 | 240 |
| EfficientNetB2 | 260 |
| EfficientNetB3 | 300 |
| EfficientNetB4 | 380 |
| EfficientNetB5 | 456 |
| EfficientNetB6 | 528 |
| EfficientNetB7 | 600 |

When the model is intended for transfer learning, the Keras implementation
provides an option to remove the top layers:

```python
model = EfficientNetB0(include_top=False, weights='imagenet')
```

This option excludes the final `Dense` layer that turns the 1280 features of the penultimate
layer into predictions for the 1000 ImageNet classes. Replacing the top layers with custom
layers allows using EfficientNet as a feature extractor in a transfer learning workflow.

Another argument in the model constructor worth noticing is `drop_connect_rate`, which controls
the dropout rate responsible for [stochastic depth](https://arxiv.org/abs/1603.09382).
This parameter serves as a toggle for extra regularization in fine-tuning, but it does not
affect the loaded weights. For example, when stronger regularization is desired, try:

```python
model = EfficientNetB0(weights='imagenet', drop_connect_rate=0.4)
```

The default value is 0.2.

## Example: EfficientNetB0 for Stanford Dogs

EfficientNet is capable of a wide range of image classification tasks.
This makes it a good model for transfer learning.
As an end-to-end example, we will show how to use pre-trained EfficientNetB0 on the
[Stanford Dogs](http://vision.stanford.edu/aditya86/ImageNetDogs/main.html) dataset.
"""

"""
## Setup and data loading
"""

import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf  # For tf.data
import matplotlib.pyplot as plt
import keras
from keras import layers
from keras.applications import EfficientNetB0

# IMG_SIZE is determined by the EfficientNet model choice
IMG_SIZE = 224
BATCH_SIZE = 64


"""
### Loading data

Here we load data from [tensorflow_datasets](https://www.tensorflow.org/datasets)
(hereafter TFDS).
The Stanford Dogs dataset is provided in
TFDS as [stanford_dogs](https://www.tensorflow.org/datasets/catalog/stanford_dogs).
It features 20,580 images that belong to 120 classes of dog breeds
(12,000 for training and 8,580 for testing).

By simply changing `dataset_name` below, you may also try this notebook for
other datasets in TFDS such as
[cifar10](https://www.tensorflow.org/datasets/catalog/cifar10),
[cifar100](https://www.tensorflow.org/datasets/catalog/cifar100),
[food101](https://www.tensorflow.org/datasets/catalog/food101),
etc. When the images are much smaller than the size of the EfficientNet input,
we can simply upsample the input images. It has been shown in
[Tan and Le, 2019](https://arxiv.org/abs/1905.11946) that transfer learning
results are better with increased resolution even if the input images remain small.
"""
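
"""
As a side note (an illustrative aside, not part of the original tutorial), a
registered TFDS dataset's metadata can be inspected through the builder API
before loading it, which is handy when trying other datasets; fetching the
metadata may require network access.
"""

# Inspect split sizes and the label set of a TFDS dataset without loading data.
builder = tfds.builder("stanford_dogs")
print(builder.info.splits["train"].num_examples)  # 12000
print(builder.info.features["label"].num_classes)  # 120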

dataset_name = "stanford_dogs"
(ds_train, ds_test), ds_info = tfds.load(
    dataset_name, split=["train", "test"], with_info=True, as_supervised=True
)
NUM_CLASSES = ds_info.features["label"].num_classes


"""
When the dataset includes images of various sizes, we need to resize them to a
shared size. The Stanford Dogs dataset includes only images that are at least
200x200 pixels in size. Here we resize the images to the input size needed for
EfficientNet.
"""

size = (IMG_SIZE, IMG_SIZE)
ds_train = ds_train.map(lambda image, label: (tf.image.resize(image, size), label))
ds_test = ds_test.map(lambda image, label: (tf.image.resize(image, size), label))

"""
### Visualizing the data

The following code shows the first 9 images with their labels.
"""


def format_label(label):
    string_label = label_info.int2str(label)
    return string_label.split("-")[1]


label_info = ds_info.features["label"]
for i, (image, label) in enumerate(ds_train.take(9)):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image.numpy().astype("uint8"))
    plt.title("{}".format(format_label(label)))
    plt.axis("off")


"""
### Data augmentation

We can use the preprocessing layers APIs for image augmentation.
"""

img_augmentation_layers = [
    layers.RandomRotation(factor=0.15),
    layers.RandomTranslation(height_factor=0.1, width_factor=0.1),
    layers.RandomFlip(),
    layers.RandomContrast(factor=0.1),
]


def img_augmentation(images):
    for layer in img_augmentation_layers:
        images = layer(images)
    return images


"""
These augmentation layers can be used both as part of
the model we later build, and as a function to preprocess
data before feeding it into the model. Using them as a function makes
it easy to visualize the augmented images. Here we plot 9 examples
of the augmentation result for a given figure.
"""

for image, label in ds_train.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        aug_img = img_augmentation(np.expand_dims(image.numpy(), axis=0))
        aug_img = np.array(aug_img)
        plt.imshow(aug_img[0].astype("uint8"))
        plt.title("{}".format(format_label(label)))
        plt.axis("off")


"""
### Prepare inputs

Once we verify the input data and augmentation are working correctly,
we prepare the dataset for training. The input data are resized to a uniform
`IMG_SIZE`. The labels are put into one-hot
(a.k.a. categorical) encoding. The dataset is batched.
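
As a minimal illustration of the encoding (an aside, not part of the original
tutorial; `depth=5` is a toy value chosen for display), `tf.one_hot` turns
integer class indices into categorical vectors:

```python
labels = tf.constant([0, 3])
print(tf.one_hot(labels, depth=5))
# tf.Tensor(
# [[1. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0.]], shape=(2, 5), dtype=float32)
```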

Note: `prefetch` and `AUTOTUNE` may in some situations improve
performance, but this depends on the environment and the specific dataset used.
See this [guide](https://www.tensorflow.org/guide/data_performance)
for more information on data pipeline performance.
"""


# One-hot / categorical encoding
def input_preprocess_train(image, label):
    image = img_augmentation(image)
    label = tf.one_hot(label, NUM_CLASSES)
    return image, label


def input_preprocess_test(image, label):
    label = tf.one_hot(label, NUM_CLASSES)
    return image, label


ds_train = ds_train.map(input_preprocess_train, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.batch(batch_size=BATCH_SIZE, drop_remainder=True)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

ds_test = ds_test.map(input_preprocess_test, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(batch_size=BATCH_SIZE, drop_remainder=True)


"""
## Training a model from scratch

We build an EfficientNetB0 with 120 output classes, initialized from scratch:

Note: the accuracy will increase very slowly and the model may overfit.
"""

model = EfficientNetB0(
    include_top=True,
    weights=None,
    classes=NUM_CLASSES,
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

model.summary()

epochs = 40  # @param {type: "slider", min:10, max:100}
hist = model.fit(ds_train, epochs=epochs, validation_data=ds_test)


"""
Training the model is relatively fast. This might make it sound easy to simply train
EfficientNet from scratch on any dataset we want. However, training EfficientNet on smaller datasets,
especially those with lower resolution like CIFAR-100, faces the significant challenge of
overfitting.

Hence training from scratch requires a very careful choice of hyperparameters, and
suitable regularization is difficult to find. It would also be much more demanding in resources.
Plotting the training and validation accuracy
makes it clear that validation accuracy stagnates at a low value.
"""


def plot_hist(hist):
    plt.plot(hist.history["accuracy"])
    plt.plot(hist.history["val_accuracy"])
    plt.title("model accuracy")
    plt.ylabel("accuracy")
    plt.xlabel("epoch")
    plt.legend(["train", "validation"], loc="upper left")
    plt.show()


plot_hist(hist)

"""
## Transfer learning from pre-trained weights

Here we initialize the model with pre-trained ImageNet weights,
and we fine-tune it on our own dataset.
"""


def build_model(num_classes):
    inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    model = EfficientNetB0(include_top=False, input_tensor=inputs, weights="imagenet")

    # Freeze the pretrained weights
    model.trainable = False

    # Rebuild top
    x = layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
    x = layers.BatchNormalization()(x)

    top_dropout_rate = 0.2
    x = layers.Dropout(top_dropout_rate, name="top_dropout")(x)
    outputs = layers.Dense(num_classes, activation="softmax", name="pred")(x)

    # Compile
    model = keras.Model(inputs, outputs, name="EfficientNet")
    optimizer = keras.optimizers.Adam(learning_rate=1e-2)
    model.compile(
        optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"]
    )
    return model


"""
The first step to transfer learning is to freeze all layers and train only the top
layers. For this step, a relatively large learning rate (1e-2) can be used.
Note that validation accuracy and loss will usually be better than training
accuracy and loss. This is because the regularization is strong, which only
suppresses training-time metrics.
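
As a small illustration of this effect (an aside, not part of the original
tutorial), dropout perturbs activations only in training mode and acts as the
identity at inference time:

```python
demo = layers.Dropout(0.5)
x = tf.ones((1, 4))
print(demo(x, training=True))   # random units zeroed, the rest scaled by 1 / (1 - 0.5)
print(demo(x, training=False))  # identity: [[1. 1. 1. 1.]]
```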

Note that the convergence may take up to 50 epochs depending on the choice of learning rate.
If the image augmentation layers were not
applied, the validation accuracy may only reach ~60%.
"""

model = build_model(num_classes=NUM_CLASSES)

epochs = 25  # @param {type: "slider", min:8, max:80}
hist = model.fit(ds_train, epochs=epochs, validation_data=ds_test)
plot_hist(hist)

"""
The second step is to unfreeze a number of layers and fit the model using a smaller
learning rate. In this example we show unfreezing the top 20 layers, but depending on the
specific dataset it may be desirable to unfreeze more (or even all) layers.

When feature extraction with the
pretrained model works well enough, this step will give a very limited gain in
validation accuracy. In our case we only see a small improvement,
as ImageNet pretraining already exposed the model to a good amount of dog images.

On the other hand, when we use pretrained weights on a dataset that is more different
from ImageNet, this fine-tuning step can be crucial, as the feature extractor also
needs to be adjusted by a considerable amount. Such a situation can be demonstrated
by choosing the CIFAR-100 dataset instead, where fine-tuning boosts validation accuracy
by about 10% to pass 80% on `EfficientNetB0`.

A side note on freezing/unfreezing models: setting `trainable` of a `Model` will
simultaneously set all layers belonging to the `Model` to the same `trainable`
attribute. Each layer is trainable only if both the layer itself and the model
containing it are trainable. Hence when we need to partially freeze/unfreeze
a model, we need to make sure the `trainable` attribute of the model is set
to `True`.
"""


def unfreeze_model(model):
    # We unfreeze the top 20 layers while leaving BatchNorm layers frozen
    for layer in model.layers[-20:]:
        if not isinstance(layer, layers.BatchNormalization):
            layer.trainable = True

    optimizer = keras.optimizers.Adam(learning_rate=1e-5)
    model.compile(
        optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"]
    )


unfreeze_model(model)

epochs = 4  # @param {type: "slider", min:4, max:10}
hist = model.fit(ds_train, epochs=epochs, validation_data=ds_test)
plot_hist(hist)

"""
### Tips for fine-tuning EfficientNet

On unfreezing layers:

- The `BatchNormalization` layers need to be kept frozen
([more details](https://keras.io/guides/transfer_learning/)).
If they are also made trainable, the
first epoch after unfreezing will significantly reduce accuracy.
- In some cases it may be beneficial to open up only a portion of layers instead of
unfreezing all. This will make fine-tuning much faster when going to larger models like
B7.
- Each block needs to be all turned on or off. This is because the architecture includes
a shortcut from the first layer to the last layer for each block. Not respecting blocks
also significantly harms the final performance. A sketch of block-wise unfreezing is
shown after this list.
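
The following is an illustrative sketch (not part of the original tutorial); it
relies on the layer-name prefixes (`block1a_`, `block2a_`, ..., `top_`) used by
the Keras EfficientNet implementation to select complete blocks:

```python
def unfreeze_blocks(model, prefixes=("block6", "block7", "top")):
    # Unfreeze whole blocks by layer-name prefix, keeping BatchNorm frozen.
    for layer in model.layers:
        if layer.name.startswith(prefixes):
            if not isinstance(layer, layers.BatchNormalization):
                layer.trainable = True
```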

Some other tips for utilizing EfficientNet:

- Larger variants of EfficientNet do not guarantee improved performance, especially for
tasks with less data or fewer classes. In such cases, the larger the variant of
EfficientNet chosen, the harder it is to tune hyperparameters.
- EMA (Exponential Moving Average) is very helpful in training EfficientNet from scratch,
but not so much for transfer learning.
- Do not use the RMSprop setup as in the original paper for transfer learning. The
momentum and learning rate are too high for transfer learning. They will easily corrupt the
pretrained weights and blow up the loss. A quick check is to see whether the loss (as categorical
cross entropy) is getting significantly larger than log(NUM_CLASSES) after the same
epoch. If so, the initial learning rate/momentum is too high (see the snippet after
this list).
- Smaller batch sizes benefit validation accuracy, possibly because they effectively provide
regularization.
"""
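
"""
As an illustration of the loss sanity check above (an aside, not part of the
original tutorial): random guessing over `NUM_CLASSES` classes corresponds to a
categorical cross entropy of about log(NUM_CLASSES).
"""

# For Stanford Dogs, log(120) ≈ 4.79. Training losses far above this value
# after the first few epochs suggest the learning rate / momentum is too high.
print(np.log(NUM_CLASSES))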