Inconsistent Keras model.summary() output shapes on AWS SageMaker and EC2


I have the following model in a Jupyter notebook:

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import layers

# Let TensorFlow grow GPU memory as needed instead of reserving it all up front
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

SIZE = (549, 549)
SHUFFLE = False
BATCH = 32
EPOCHS = 20

# DataGenerator, train_files, and test_files are defined elsewhere in the notebook
train_datagen = DataGenerator(train_files, batch_size=BATCH, dim=SIZE, n_channels=1, shuffle=SHUFFLE)
test_datagen = DataGenerator(test_files, batch_size=BATCH, dim=SIZE, n_channels=1, shuffle=SHUFFLE)

inp = layers.Input(shape=(*SIZE, 1))

x = layers.Conv2D(filters=549, kernel_size=(5, 5), padding="same", activation="relu")(inp)
x = layers.BatchNormalization()(x)

x = layers.Conv2D(filters=549, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)

x = layers.Conv2D(filters=549, kernel_size=(1, 1), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)

x = layers.Conv2D(filters=549, kernel_size=(3, 3), padding="same", activation="sigmoid")(x)

model = Model(inp, x)

model.compile(loss=tf.keras.losses.binary_crossentropy, optimizer=Adam())

model.summary()

Both SageMaker and EC2 are running TensorFlow 2.7.1. The EC2 instance is a p3.2xlarge with the Deep Learning AMI GPU TensorFlow 2.7.0 (Amazon Linux 2) 20220607. The SageMaker notebook instance is an ml.p3.2xlarge using the conda_tensorflow2_p38 kernel. The notebook lives on an FSx for Lustre file system mounted on both SageMaker and EC2, so it is definitely the same code running on both machines.
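To confirm the runtime on each machine, the TensorFlow version and visible GPUs can be printed directly (standard TensorFlow calls, nothing specific to this setup):

import tensorflow as tf

print(tf.__version__)                           # expected: 2.7.1 on both machines
print(tf.config.list_physical_devices('GPU'))   # expected: one V100 on both machines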

nvidia-smi output on SageMaker:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   37C    P0    24W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvidia-smi output on EC2:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0    51W / 300W |   2460MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11802      C   /bin/python3.8                    537MiB |
|    0   N/A  N/A     26391      C   python3.8                        1921MiB |
+-----------------------------------------------------------------------------+

The model.summary() output on SageMaker is:

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 549, 549, 1)]     0         
                                                                 
 conv2d (Conv2D)             (None, 549, 549, 1)       7535574   
                                                                 
 batch_normalization (BatchN  (None, 549, 549, 1)      4         
 ormalization)                                                   
                                                                 
 conv2d_1 (Conv2D)           (None, 549, 549, 1)       2713158   
                                                                 
 batch_normalization_1 (Batc  (None, 549, 549, 1)      4         
 hNormalization)                                                 
                                                                 
 conv2d_2 (Conv2D)           (None, 549, 549, 1)       301950    
                                                                 
 batch_normalization_2 (Batc  (None, 549, 549, 1)      4         
 hNormalization)                                                 
                                                                 
 conv2d_3 (Conv2D)           (None, 549, 549, 1)       2713158   
                                                                 
=================================================================
Total params: 13,263,852
Trainable params: 13,263,846
Non-trainable params: 6
_________________________________________________________________

The model.summary() output on EC2 is (notice the shape change):


Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 549, 549, 1)]     0         
                                                                 
 conv2d (Conv2D)             (None, 549, 549, 549)     14274     
                                                                 
 batch_normalization (BatchN  (None, 549, 549, 549)    2196      
 ormalization)                                                   
                                                                 
 conv2d_1 (Conv2D)           (None, 549, 549, 549)     2713158   
                                                                 
 batch_normalization_1 (Batc  (None, 549, 549, 549)    2196      
 hNormalization)                                                 
                                                                 
 conv2d_2 (Conv2D)           (None, 549, 549, 549)     301950    
                                                                 
 batch_normalization_2 (Batc  (None, 549, 549, 549)    2196      
 hNormalization)                                                 
                                                                 
 conv2d_3 (Conv2D)           (None, 549, 549, 549)     2713158   
                                                                 
=================================================================
Total params: 5,749,128
Trainable params: 5,745,834
Non-trainable params: 3,294
_________________________________________________________________
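The parameter counts make the difference concrete. Keras computes Conv2D parameters as filters × (kernel_height × kernel_width × input_channels + 1), so the first convolution can be sanity-checked against both summaries:

def conv2d_params(filters, kernel_h, kernel_w, in_channels):
    # one kernel_h x kernel_w x in_channels weight block per filter, plus one bias each
    return filters * (kernel_h * kernel_w * in_channels + 1)

# EC2 reads the (549, 549, 1) input as 1 channel, as intended
print(conv2d_params(549, 5, 5, 1))    # 14274   -> matches conv2d on EC2

# The SageMaker count only works out if the layer saw 549 input channels
print(conv2d_params(549, 5, 5, 549))  # 7535574 -> matches conv2d on SageMaker

In other words, the SageMaker run behaves as if the 549-pixel spatial axis were the channel axis, which would be consistent with a channels_first reading of an input declared as (549, 549, 1).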

One other interesting thing: if I change my model on the EC2 instance to:

inp = layers.Input(shape=(*SIZE, 1))

x = layers.Conv2D(filters=1, kernel_size=(5, 5), padding="same", activation="relu")(inp)
x = layers.BatchNormalization()(x)

x = layers.Conv2D(filters=1, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)

x = layers.Conv2D(filters=1, kernel_size=(1, 1), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)

x = layers.Conv2D(filters=1, kernel_size=(3, 3), padding="same", activation="sigmoid")(x)

model = Model(inp, x)

model.compile(loss=tf.keras.losses.binary_crossentropy, optimizer=Adam())

My model.summary() output becomes:

Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_3 (InputLayer)        [(None, 549, 549, 1)]     0         
                                                                 
 conv2d_8 (Conv2D)           (None, 549, 549, 1)       26        
                                                                 
 batch_normalization_6 (Batc  (None, 549, 549, 1)      4         
 hNormalization)                                                 
                                                                 
 conv2d_9 (Conv2D)           (None, 549, 549, 1)       10        
                                                                 
 batch_normalization_7 (Batc  (None, 549, 549, 1)      4         
 hNormalization)                                                 
                                                                 
 conv2d_10 (Conv2D)          (None, 549, 549, 1)       2         
                                                                 
 batch_normalization_8 (Batc  (None, 549, 549, 1)      4         
 hNormalization)                                                 
                                                                 
 conv2d_11 (Conv2D)          (None, 549, 549, 1)       10        
                                                                 
=================================================================
Total params: 60
Trainable params: 54
Non-trainable params: 6
_________________________________________________________________

In this last model the output shapes match the SageMaker summary, but the trainable parameter count is tiny, which makes sense with only one filter per layer (the first Conv2D is just 1 × (5 × 5 × 1 + 1) = 26 parameters).

Any ideas why the output shapes differ between the two machines, and why the filter count seems to be ignored on SageMaker? When I run this model on my personal computer, the shapes match the EC2 output, so I think there might be an issue with SageMaker.
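One configuration difference that could explain a channel/spatial swap like this (an assumption worth ruling out, not something verified above) is the Keras image data format, which is read from ~/.keras/keras.json and can therefore differ between machines even when the notebook code is identical:

import tensorflow as tf

# 'channels_last' (the default) reads inputs as (batch, height, width, channels);
# 'channels_first' reads them as (batch, channels, height, width)
print(tf.keras.backend.image_data_format())

If the two machines print different values here, that would account for the differing summaries.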

asked 2 years ago · 240 views
1 Answer

Hello, I am checking the versions you mentioned, and in my notebook instance, using the conda_tensorflow2_p38 kernel, I get TensorFlow 2.5. Is it the same for you, or have you upgraded it to TensorFlow 2.7?

import tensorflow as tf
print(tf.__version__)
# 2.5.0

AWS EXPERT · answered 2 years ago
