Tutorial Part 2: Computer Vision with Transfer Learning¶
Transfer Learning - storing knowledge gained while solving one problem and applying it to a different but related problem.
TensorFlow has a good selection of pre-trained models that can be imported directly into a TensorFlow model. It is called TensorFlow Hub. As a complement to the main framework, Google has also published an additional package called TensorFlow Datasets, which contains a collection of the most popular datasets.
In this part of the tutorial we will use the Stanford Dogs dataset, imported through TensorFlow Datasets.
Check the TF version and GPU availability¶
import tensorflow as tf
tf.__version__
'2.9.2'
!nvidia-smi
Mon Oct 24 16:53:07 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   72C    P0    30W /  70W |  14632MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
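TensorFlow itself can also list the devices it sees; a quick built-in sanity check:
# List GPUs visible to TensorFlow (an empty list means CPU-only)
tf.config.list_physical_devices('GPU')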
Get data¶
import tensorflow_datasets as tfds

# Load the Stanford Dogs dataset: train/test splits, (image, label) tuples,
# dataset metadata, and batches of 32
(train_data, test_data), ds_info = tfds.load(name='stanford_dogs',
split=['train', 'test'],
shuffle_files=True,
as_supervised=True,
with_info=True,
batch_size=32)
ds_info
tfds.core.DatasetInfo(
    name='stanford_dogs',
    full_name='stanford_dogs/0.2.0',
    description="""
    The Stanford Dogs dataset contains images of 120 breeds of dogs from around
    the world. This dataset has been built using images and annotation from
    ImageNet for the task of fine-grained image categorization. There are
    20,580 images, out of which 12,000 are used for training and 8580 for
    testing. Class labels and bounding box annotations are provided for all the
    12,000 images.
    """,
    homepage='http://vision.stanford.edu/aditya86/ImageNetDogs/main.html',
    data_path='~/tensorflow_datasets/stanford_dogs/0.2.0',
    file_format=tfrecord,
    download_size=778.12 MiB,
    dataset_size=744.72 MiB,
    features=FeaturesDict({
        'image': Image(shape=(None, None, 3), dtype=tf.uint8),
        'image/filename': Text(shape=(), dtype=tf.string),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=120),
        'objects': Sequence({
            'bbox': BBoxFeature(shape=(4,), dtype=tf.float32),
        }),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=8580, num_shards=4>,
        'train': <SplitInfo num_examples=12000, num_shards=4>,
    },
    citation="""@inproceedings{KhoslaYaoJayadevaprakashFeiFei_FGVC2011,
    author = "Aditya Khosla and Nityananda Jayadevaprakash and Bangpeng Yao and Li Fei-Fei",
    title = "Novel Dataset for Fine-Grained Image Categorization",
    booktitle = "First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition",
    year = "2011",
    month = "June",
    address = "Colorado Springs, CO",
    }
    @inproceedings{imagenet_cvpr09,
    AUTHOR = {Deng, J. and Dong, W. and Socher, R. and Li, L.-J. and Li, K. and Fei-Fei, L.},
    TITLE = {{ImageNet: A Large-Scale Hierarchical Image Database}},
    BOOKTITLE = {CVPR09},
    YEAR = {2009},
    BIBSOURCE = "http://www.image-net.org/papers/imagenet_cvpr09.bib"}""",
)
train_data, test_data
(<PrefetchDataset element_spec=(TensorSpec(shape=(None, None, None, 3), dtype=tf.uint8, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>, <PrefetchDataset element_spec=(TensorSpec(shape=(None, None, None, 3), dtype=tf.uint8, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>)
Prepare data¶
Batch and prefetch¶
import tensorflow as tf
# train_data = train_data.batch(32).prefetch(tf.data.AUTOTUNE)
# test_data = test_data.batch(32).prefetch(tf.data.AUTOTUNE)
# TFDS has already batched the dataset for us, so we only resize and prefetch
# Resize images to 224x224, the default input size for EfficientNetB0
train_data = train_data.map(lambda image, label: (tf.image.resize(image, (224, 224)), label)).prefetch(tf.data.AUTOTUNE)
test_data = test_data.map(lambda image, label: (tf.image.resize(image, (224, 224)), label)).prefetch(tf.data.AUTOTUNE)
train_data, test_data
(<PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>, <PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>)
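As a quick sanity check, we can plot one image from the pipeline together with its human-readable class name (a minimal sketch assuming matplotlib is available; int2str comes from the dataset metadata in ds_info):
import matplotlib.pyplot as plt

# Grab one batch and show the first image with its decoded label
images, labels = next(iter(train_data))
plt.imshow(images[0].numpy().astype('uint8'))  # resized images are float32, cast back for display
plt.title(ds_info.features['label'].int2str(int(labels[0])))
plt.axis('off')
plt.show()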
Note:
- Batch size - the number of samples that are passed to the network at once. See the paper Revisiting Small Batch Training for Deep Neural Networks.
- Prefetching overlaps the preprocessing and model execution of a training step: while the model is executing step s, the input pipeline is already reading the data for step s+1. tf.data.AUTOTUNE tunes the prefetch buffer size dynamically at runtime.
Data Augmentation¶
Data augmentation is an important technique for fighting overfitting.
- Data Augmentation - a technique to increase the diversity of your training set by applying random (but realistic) transformations, such as image rotation.
- Overfitting - occurs when a model fits its training data too closely and fails to generalize to unseen data.
# Build data augmentation layer
# Note: in TensorFlow, models can be used as layers
from tensorflow.keras import layers
data_augmentation = tf.keras.models.Sequential([
layers.RandomHeight(0.2), # https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomHeight
layers.RandomWidth(0.2), # https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomWidth
layers.RandomFlip(), # https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomFlip
layers.RandomZoom(0.2), # https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomZoom
layers.RandomRotation(0.2) # https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomRotation
], name='data_augmentation')
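To see what the augmentation layer actually does, we can run the same image through it several times with training=True (the random layers are inactive at inference time). A minimal sketch:
import matplotlib.pyplot as plt

# Apply the augmentation layer to the same image a few times and compare
images, labels = next(iter(train_data))
plt.figure(figsize=(12, 4))
for i in range(3):
    # training=True activates the random transformations
    augmented = data_augmentation(tf.expand_dims(images[0], axis=0), training=True)
    plt.subplot(1, 3, i + 1)
    plt.imshow(augmented[0].numpy().astype('uint8'))
    plt.axis('off')
plt.show()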
Build model¶
Get pretrained model¶
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False # Freeze the model's weights
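A quick way to confirm the freeze worked: the base model should report zero trainable variables, so only the classification head we add later will be trained.
# Sanity check: should print 0 for the frozen base model
print(len(base_model.trainable_variables))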
Callbacks¶
- ModelCheckpoint - saves the model or model weights at some frequency.
- EarlyStopping - stops training when a monitored metric has stopped improving.
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath='model_checkpoints/checkpoint.ckpt',
save_weights_only=True,
save_best_only=True,
save_freq='epoch',
verbose=1)
early_stopping_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
verbose=1,
restore_best_weights=True,
patience=5)
Create a model with Functional API¶
inputs = tf.keras.layers.Input(shape=(224, 224, 3), name='input_layer')
x = data_augmentation(inputs)  # augmentation is only active during training
x = base_model(x, training=False)  # keep the frozen base (incl. BatchNorm) in inference mode
x = tf.keras.layers.GlobalAveragePooling2D(name='global_average_pooling_layer')(x)
outputs = tf.keras.layers.Dense(120, activation='softmax', name='output_layer')(x)  # 120 dog breeds
model = tf.keras.models.Model(inputs, outputs, name='cv_model')
model.summary()
Model: "cv_model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_layer (InputLayer) [(None, 224, 224, 3)] 0 data_augmentation (Sequenti (None, 224, 224, 3) 0 al) efficientnetb0 (Functional) (None, None, None, 1280) 4049571 global_average_pooling_laye (None, 1280) 0 r (GlobalAveragePooling2D) output_layer (Dense) (None, 120) 153720 ================================================================= Total params: 4,203,291 Trainable params: 153,720 Non-trainable params: 4,049,571 _________________________________________________________________
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
optimizer=tf.keras.optimizers.Adam(),
metrics=['accuracy'])
model.fit(train_data,
validation_data=test_data,
validation_steps=int(0.2*len(test_data)),
epochs=100,
callbacks=[early_stopping_callback,
checkpoint_callback],
verbose=1)
Epoch 1/100
375/375 [==============================] - ETA: 0s - loss: 3.8025 - accuracy: 0.1755
Epoch 1: val_loss improved from inf to 2.24280, saving model to model_checkpoints/checkpoint.ckpt
375/375 [==============================] - 89s 221ms/step - loss: 3.8025 - accuracy: 0.1755 - val_loss: 2.2428 - val_accuracy: 0.4900
Epoch 2/100
375/375 [==============================] - ETA: 0s - loss: 3.0469 - accuracy: 0.3007
Epoch 2: val_loss improved from 2.24280 to 1.78028, saving model to model_checkpoints/checkpoint.ckpt
375/375 [==============================] - 78s 206ms/step - loss: 3.0469 - accuracy: 0.3007 - val_loss: 1.7803 - val_accuracy: 0.5590
Epoch 3/100
375/375 [==============================] - ETA: 0s - loss: 2.8219 - accuracy: 0.3380
Epoch 3: val_loss improved from 1.78028 to 1.69082, saving model to model_checkpoints/checkpoint.ckpt
375/375 [==============================] - 76s 201ms/step - loss: 2.8219 - accuracy: 0.3380 - val_loss: 1.6908 - val_accuracy: 0.5696
Epoch 4/100
375/375 [==============================] - ETA: 0s - loss: 2.6891 - accuracy: 0.3624
Epoch 4: val_loss improved from 1.69082 to 1.65986, saving model to model_checkpoints/checkpoint.ckpt
375/375 [==============================] - 76s 201ms/step - loss: 2.6891 - accuracy: 0.3624 - val_loss: 1.6599 - val_accuracy: 0.5625
Epoch 5/100
375/375 [==============================] - ETA: 0s - loss: 2.6073 - accuracy: 0.3811
Epoch 5: val_loss improved from 1.65986 to 1.56977, saving model to model_checkpoints/checkpoint.ckpt
375/375 [==============================] - 74s 196ms/step - loss: 2.6073 - accuracy: 0.3811 - val_loss: 1.5698 - val_accuracy: 0.5784
Epoch 6/100
375/375 [==============================] - ETA: 0s - loss: 2.5088 - accuracy: 0.4027
Epoch 6: val_loss did not improve from 1.56977
375/375 [==============================] - 70s 186ms/step - loss: 2.5088 - accuracy: 0.4027 - val_loss: 1.5869 - val_accuracy: 0.5749
Epoch 7/100
375/375 [==============================] - ETA: 0s - loss: 2.4902 - accuracy: 0.4025
Epoch 7: val_loss improved from 1.56977 to 1.55353, saving model to model_checkpoints/checkpoint.ckpt
375/375 [==============================] - 69s 184ms/step - loss: 2.4902 - accuracy: 0.4025 - val_loss: 1.5535 - val_accuracy: 0.5784
Epoch 8/100
375/375 [==============================] - ETA: 0s - loss: 2.3965 - accuracy: 0.4259
Epoch 8: val_loss improved from 1.55353 to 1.55162, saving model to model_checkpoints/checkpoint.ckpt
375/375 [==============================] - 69s 182ms/step - loss: 2.3965 - accuracy: 0.4259 - val_loss: 1.5516 - val_accuracy: 0.5831
Epoch 9/100
375/375 [==============================] - ETA: 0s - loss: 2.3380 - accuracy: 0.4368
Epoch 9: val_loss improved from 1.55162 to 1.50542, saving model to model_checkpoints/checkpoint.ckpt
375/375 [==============================] - 69s 184ms/step - loss: 2.3380 - accuracy: 0.4368 - val_loss: 1.5054 - val_accuracy: 0.6061
Epoch 10/100
375/375 [==============================] - ETA: 0s - loss: 2.4117 - accuracy: 0.4186
Epoch 10: val_loss did not improve from 1.50542
375/375 [==============================] - 67s 179ms/step - loss: 2.4117 - accuracy: 0.4186 - val_loss: 1.5490 - val_accuracy: 0.5831
Epoch 11/100
375/375 [==============================] - ETA: 0s - loss: 2.2914 - accuracy: 0.4478
Epoch 11: val_loss did not improve from 1.50542
375/375 [==============================] - 65s 173ms/step - loss: 2.2914 - accuracy: 0.4478 - val_loss: 1.5647 - val_accuracy: 0.5896
Epoch 12/100
375/375 [==============================] - ETA: 0s - loss: 2.2642 - accuracy: 0.4505
Epoch 12: val_loss did not improve from 1.50542
375/375 [==============================] - 64s 171ms/step - loss: 2.2642 - accuracy: 0.4505 - val_loss: 1.5737 - val_accuracy: 0.5837
Epoch 13/100
375/375 [==============================] - ETA: 0s - loss: 2.2836 - accuracy: 0.4470
Epoch 13: val_loss did not improve from 1.50542
375/375 [==============================] - 65s 172ms/step - loss: 2.2836 - accuracy: 0.4470 - val_loss: 1.5595 - val_accuracy: 0.5825
Epoch 14/100
375/375 [==============================] - ETA: 0s - loss: 2.2200 - accuracy: 0.4554
Epoch 14: val_loss improved from 1.50542 to 1.49176, saving model to model_checkpoints/checkpoint.ckpt
375/375 [==============================] - 63s 168ms/step - loss: 2.2200 - accuracy: 0.4554 - val_loss: 1.4918 - val_accuracy: 0.6002
Epoch 15/100
375/375 [==============================] - ETA: 0s - loss: 2.2305 - accuracy: 0.4529
Epoch 15: val_loss did not improve from 1.49176
375/375 [==============================] - 64s 170ms/step - loss: 2.2305 - accuracy: 0.4529 - val_loss: 1.5279 - val_accuracy: 0.5973
Epoch 16/100
375/375 [==============================] - ETA: 0s - loss: 2.2025 - accuracy: 0.4647
Epoch 16: val_loss did not improve from 1.49176
375/375 [==============================] - 64s 171ms/step - loss: 2.2025 - accuracy: 0.4647 - val_loss: 1.5671 - val_accuracy: 0.5879
Epoch 17/100
375/375 [==============================] - ETA: 0s - loss: 2.2152 - accuracy: 0.4616
Epoch 17: val_loss did not improve from 1.49176
375/375 [==============================] - 64s 170ms/step - loss: 2.2152 - accuracy: 0.4616 - val_loss: 1.4943 - val_accuracy: 0.6008
Epoch 18/100
375/375 [==============================] - ETA: 0s - loss: 2.1761 - accuracy: 0.4733
Epoch 18: val_loss did not improve from 1.49176
375/375 [==============================] - 62s 166ms/step - loss: 2.1761 - accuracy: 0.4733 - val_loss: 1.5502 - val_accuracy: 0.5949
Epoch 19/100
375/375 [==============================] - ETA: 0s - loss: 2.1318 - accuracy: 0.4801
Restoring model weights from the end of the best epoch: 14.
Epoch 19: val_loss did not improve from 1.49176
375/375 [==============================] - 63s 169ms/step - loss: 2.1318 - accuracy: 0.4801 - val_loss: 1.5312 - val_accuracy: 0.5973
Epoch 19: early stopping
<keras.callbacks.History at 0x7fb200340310>
model.evaluate(test_data) # Note: test_data also served as the validation set during training; a separate hold-out set would be better practice
269/269 [==============================] - 27s 102ms/step - loss: 1.5423 - accuracy: 0.5885
[1.5422791242599487, 0.5884615182876587]
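To turn the model's output into something readable, take one batch from test_data, predict, and map the arg-max index back to a breed name through ds_info (a minimal sketch):
# Predict on one test batch and decode the top class for the first image
images, labels = next(iter(test_data))
pred_probs = model.predict(images)  # shape: (batch_size, 120)
pred_label = ds_info.features['label'].int2str(int(tf.argmax(pred_probs[0])))
true_label = ds_info.features['label'].int2str(int(labels[0]))
print(f'Predicted: {pred_label}, actual: {true_label}')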
How can we improve our model?¶
- Get more data
- Try a different architecture (more complex or simpler)
- Train for longer
So if you want to reach good results - experiment, experiment, experiment! Good luck!
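One classic experiment with a transfer-learning model like this one is fine-tuning: unfreeze the top layers of the base model and continue training with a much lower learning rate. The sketch below is a hypothetical continuation, not something run in this tutorial; the number of unfrozen layers and the learning rate are assumptions to tune.
# Hypothetical fine-tuning sketch: unfreeze only the top 10 layers of the base model
base_model.trainable = True
for layer in base_model.layers[:-10]:
    layer.trainable = False

# Recompile with a ~10x lower learning rate so the pre-trained weights shift slowly
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              metrics=['accuracy'])

# Then continue training, e.g. model.fit(...) with the same callbacks as above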
More model examples:¶
One more example of a computer vision classifier, an AlexNet-style Convolutional Neural Network (CNN) reaching 99.27% accuracy, can be found here.
Creating an image tf.data.Dataset from folders with images can be found there as well. More information.