Inconsistent keras model.summary() output shapes on AWS SageMaker and EC2
I have the following model in a Jupyter notebook:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import layers

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

SIZE = (549, 549)
SHUFFLE = False
BATCH = 32
EPOCHS = 20

train_datagen = DataGenerator(train_files, batch_size=BATCH, dim=SIZE, n_channels=1, shuffle=SHUFFLE)
test_datagen = DataGenerator(test_files, batch_size=BATCH, dim=SIZE, n_channels=1, shuffle=SHUFFLE)

inp = layers.Input(shape=(*SIZE, 1))
x = layers.Conv2D(filters=549, kernel_size=(5, 5), padding="same", activation="relu")(inp)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=549, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=549, kernel_size=(1, 1), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=549, kernel_size=(3, 3), padding="same", activation="sigmoid")(x)

model = Model(inp, x)
model.compile(loss=tf.keras.losses.binary_crossentropy, optimizer=Adam())
model.summary()
SageMaker and EC2 are both running TensorFlow 2.7.1. The EC2 instance is a p3.2xlarge with the Deep Learning AMI GPU TensorFlow 2.7.0 (Amazon Linux 2) 20220607. The SageMaker notebook instance is an ml.p3.2xlarge and I am using the conda_tensorflow2_p38 kernel. The notebook lives on an FSx for Lustre file system that is mounted on both SageMaker and EC2, so it is definitely the same code running on both machines.
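To rule out an environment mismatch, a quick check along the lines of the sketch below can be run in both kernels. It is just a generic verification snippet (nothing specific to my setup) that prints the interpreter, the TensorFlow/Keras builds, and the GPUs each runtime can see:

import sys
import tensorflow as tf

# Print the interpreter, TensorFlow/Keras builds, and visible GPUs
# so the two environments can be compared directly.
print("python:", sys.version)
print("tensorflow:", tf.__version__)
print("keras:", tf.keras.__version__)
print("GPUs:", tf.config.list_physical_devices('GPU'))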
nvidia-smi output on SageMaker:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:1E.0 Off | 0 |
| N/A 37C P0 24W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
nvidia-smi output on EC2:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:1E.0 Off | 0 |
| N/A 42C P0 51W / 300W | 2460MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 11802 C /bin/python3.8 537MiB |
| 0 N/A N/A 26391 C python3.8 1921MiB |
+-----------------------------------------------------------------------------+
The model.summary() output on SageMaker is:
Model: "model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 549, 549, 1)] 0 conv2d (Conv2D) (None, 549, 549, 1) 7535574 batch_normalization (BatchN (None, 549, 549, 1) 4 ormalization) conv2d_1 (Conv2D) (None, 549, 549, 1) 2713158 batch_normalization_1 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_2 (Conv2D) (None, 549, 549, 1) 301950 batch_normalization_2 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_3 (Conv2D) (None, 549, 549, 1) 2713158 ================================================================= Total params: 13,263,852 Trainable params: 13,263,846 Non-trainable params: 6
The model.summary() output on EC2 is (notice the change in the output shapes):
Model: "model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 549, 549, 1)] 0 conv2d (Conv2D) (None, 549, 549, 549) 14274 batch_normalization (BatchN (None, 549, 549, 549) 2196 ormalization) conv2d_1 (Conv2D) (None, 549, 549, 549) 2713158 batch_normalization_1 (Batc (None, 549, 549, 549) 2196 hNormalization) conv2d_2 (Conv2D) (None, 549, 549, 549) 301950 batch_normalization_2 (Batc (None, 549, 549, 549) 2196 hNormalization) conv2d_3 (Conv2D) (None, 549, 549, 549) 2713158 ================================================================= Total params: 5,749,128 Trainable params: 5,745,834 Non-trainable params: 3,294 _________________________________________________________________
One other interesting thing: if I change the model on the EC2 instance to:
inp = layers.Input(shape=(*SIZE, 1))
x = layers.Conv2D(filters=1, kernel_size=(5, 5), padding="same", activation="relu")(inp)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=1, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=1, kernel_size=(1, 1), padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(filters=1, kernel_size=(3, 3), padding="same", activation="sigmoid")(x)

model = Model(inp, x)
model.compile(loss=tf.keras.losses.binary_crossentropy, optimizer=Adam())
the model.summary() output becomes:
Model: "model_2" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_3 (InputLayer) [(None, 549, 549, 1)] 0 conv2d_8 (Conv2D) (None, 549, 549, 1) 26 batch_normalization_6 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_9 (Conv2D) (None, 549, 549, 1) 10 batch_normalization_7 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_10 (Conv2D) (None, 549, 549, 1) 2 batch_normalization_8 (Batc (None, 549, 549, 1) 4 hNormalization) conv2d_11 (Conv2D) (None, 549, 549, 1) 10 ================================================================= Total params: 60 Trainable params: 54 Non-trainable params: 6 _________________________________________________________________
In this last model the output shapes match what SageMaker reported for the filters=549 model, but the parameter counts are tiny.
Any ideas why the output shapes differ between the two machines, and why the filters argument seems to have no effect on SageMaker? When I run the original model on my personal computer, the shapes match the EC2 output, so I suspect something is wrong on the SageMaker side.
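For reference, the expected parameter counts can be checked by hand with the standard formulas (Conv2D: kernel_h * kernel_w * in_channels * filters + filters; BatchNormalization: 4 * channels). The small sketch below is just that arithmetic, nothing specific to TF 2.7:

# Hand-computed parameter counts for the filters=549 model.
def conv2d_params(kh, kw, in_ch, filters):
    # kernel weights plus one bias per filter
    return kh * kw * in_ch * filters + filters

def batchnorm_params(channels):
    # gamma, beta, moving mean, moving variance per channel
    return 4 * channels

f = 549
print(conv2d_params(5, 5, 1, f))   # 14274   -> matches the EC2 conv2d row
print(batchnorm_params(f))         # 2196    -> matches the EC2 batch_normalization rows
print(conv2d_params(3, 3, f, f))   # 2713158 -> matches conv2d_1 / conv2d_3
print(conv2d_params(1, 1, f, f))   # 301950  -> matches conv2d_2
print(conv2d_params(5, 5, 1, 1))   # 26      -> matches the filters=1 model

These hand-computed values line up with the EC2 summary (and with the filters=1 run), which is why the SageMaker output looks like the odd one out to me.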
Hello, I am checking the versions you mentioned, and in my notebook instance, using conda_tensorflow2_p38, I get TensorFlow 2.5. Is it the same for you, or have you upgraded it to TensorFlow 2.7?

import tensorflow as tf
print(tf.__version__)
2.5.0