Inconsistent keras model.summary() output shapes on AWS SageMaker and EC2
I have the following model in a Jupyter notebook:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import layers

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

SIZE = (549, 549)
SHUFFLE = False
BATCH = 32
EPOCHS = 20

train_datagen = DataGenerator(train_files, batch_size=BATCH, dim=SIZE, n_channels=1, shuffle=SHUFFLE)
test_datagen = DataGenerator(test_files, batch_size=BATCH, dim=SIZE, n_channels=1, shuffle=SHUFFLE)

inp = layers.Input(shape=(*SIZE, 1))
x = layers.Conv2D(filters=549, kernel_size=(5, 5), padding="same", activation="relu")(inp)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=549, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=549, kernel_size=(1, 1), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=549, kernel_size=(3, 3), padding="same", activation="sigmoid")(x)

model = Model(inp, x)
model.compile(loss=tf.keras.losses.binary_crossentropy, optimizer=Adam())
model.summary()
SageMaker and EC2 are both running TensorFlow 2.7.1. The EC2 instance is a p3.2xlarge with the Deep Learning AMI GPU TensorFlow 2.7.0 (Amazon Linux 2) 20220607. The SageMaker notebook instance is an ml.p3.2xlarge and I am using the conda_tensorflow2_p38 kernel. The notebook lives on an FSx for Lustre file system that is mounted on both SageMaker and EC2, so it is definitely the same code running on both machines.
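To rule out an environment mismatch, a quick check along the lines of the sketch below can be run in both kernels. It is just a generic verification snippet (nothing specific to my setup) that prints the interpreter, the TensorFlow/Keras builds, and the GPUs each runtime can see:

import sys
import tensorflow as tf

# Print the interpreter, TensorFlow/Keras builds, and visible GPUs
# so the two environments can be compared directly.
print("python:", sys.version)
print("tensorflow:", tf.__version__)
print("keras:", tf.keras.__version__)
print("GPUs:", tf.config.list_physical_devices('GPU'))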
nvidia-smi output on SageMaker:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:1E.0 Off | 0 |
| N/A 37C P0 24W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
nvidia-smi output on EC2:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:1E.0 Off | 0 |
| N/A 42C P0 51W / 300W | 2460MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 11802 C /bin/python3.8 537MiB |
| 0 N/A N/A 26391 C python3.8 1921MiB |
+-----------------------------------------------------------------------------+
The model.summary() output on SageMaker is:
Model: "model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 549, 549, 1)] 0 conv2d (Conv2D) (None, 549, 549, 1) 7535574 batch_normalization (BatchN (None, 549, 549, 1) 4 ormalization) conv2d_1 (Conv2D) (None, 549, 549, 1) 2713158 batch_normalization_1 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_2 (Conv2D) (None, 549, 549, 1) 301950 batch_normalization_2 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_3 (Conv2D) (None, 549, 549, 1) 2713158 ================================================================= Total params: 13,263,852 Trainable params: 13,263,846 Non-trainable params: 6
The model.summary() output on EC2 is (notice the change in the output shapes):
Model: "model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 549, 549, 1)] 0 conv2d (Conv2D) (None, 549, 549, 549) 14274 batch_normalization (BatchN (None, 549, 549, 549) 2196 ormalization) conv2d_1 (Conv2D) (None, 549, 549, 549) 2713158 batch_normalization_1 (Batc (None, 549, 549, 549) 2196 hNormalization) conv2d_2 (Conv2D) (None, 549, 549, 549) 301950 batch_normalization_2 (Batc (None, 549, 549, 549) 2196 hNormalization) conv2d_3 (Conv2D) (None, 549, 549, 549) 2713158 ================================================================= Total params: 5,749,128 Trainable params: 5,745,834 Non-trainable params: 3,294 _________________________________________________________________
One other interesting thing: if I change the model on the EC2 instance to:
inp = layers.Input(shape=(*SIZE, 1))
x = layers.Conv2D(filters=1, kernel_size=(5, 5), padding="same", activation="relu")(inp)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=1, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=1, kernel_size=(1, 1), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=1, kernel_size=(3, 3), padding="same", activation="sigmoid")(x)

model = Model(inp, x)
model.compile(loss=tf.keras.losses.binary_crossentropy, optimizer=Adam())
the model.summary() output becomes:
Model: "model_2" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_3 (InputLayer) [(None, 549, 549, 1)] 0 conv2d_8 (Conv2D) (None, 549, 549, 1) 26 batch_normalization_6 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_9 (Conv2D) (None, 549, 549, 1) 10 batch_normalization_7 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_10 (Conv2D) (None, 549, 549, 1) 2 batch_normalization_8 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_11 (Conv2D) (None, 549, 549, 1) 10 ================================================================= Total params: 60 Trainable params: 54 Non-trainable params: 6 _________________________________________________________________
In this last model the output shapes match what SageMaker reported for the filters=549 model, but the parameter counts are tiny.
Any ideas why the output shapes differ between the two machines, and why the filters argument seems to have no effect on SageMaker? When I run the original model on my personal computer, the shapes match the EC2 output, so I suspect something is wrong on the SageMaker side.
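For reference, the expected parameter counts can be checked by hand with the standard formulas (Conv2D: kernel_h * kernel_w * in_channels * filters + filters; BatchNormalization: 4 * channels). The small sketch below is just that arithmetic, nothing specific to TF 2.7:

# Hand-computed parameter counts for the filters=549 model.
def conv2d_params(kh, kw, in_ch, filters):
    # kernel weights plus one bias per filter
    return kh * kw * in_ch * filters + filters

def batchnorm_params(channels):
    # gamma, beta, moving mean, moving variance per channel
    return 4 * channels

f = 549
print(conv2d_params(5, 5, 1, f))   # 14274   -> matches the EC2 conv2d row
print(batchnorm_params(f))         # 2196    -> matches the EC2 batch_normalization rows
print(conv2d_params(3, 3, f, f))   # 2713158 -> matches conv2d_1 / conv2d_3
print(conv2d_params(1, 1, f, f))   # 301950  -> matches conv2d_2
print(conv2d_params(5, 5, 1, 1))   # 26      -> matches the filters=1 model

These hand-computed values line up with the EC2 summary (and with the filters=1 run), which is why the SageMaker output looks like the odd one out to me.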
Hello, I am checking the versions you mentioned, and in my notebook instance, using conda_tensorflow2_p38, I get TensorFlow 2.5. Is it the same for you, or have you upgraded it to TensorFlow 2.7?

import tensorflow as tf
print(tf.__version__)
2.5.0