%%bash
git clone https://github.com/ahaberlie/SVRIMG.git > /dev/null
cd SVRIMG && pip install . > /dev/null
pip install cartopy > /dev/null
pip install scikit-learn > /dev/null
pip install tensorflow > /dev/null

Cloning into 'SVRIMG'...

Exam 2 - Predicting storm mode using a convolutional neural network

Make sure when you connect to a Colab instance that you select “T4” to get a GPU.

Connect drop down arrow -> Change Runtime Type -> T4 GPU -> Save

If you do not do this, your training below may be slow.

Due May 6th at 11:59 p.m.

You must not make any edits to your notebook after the due date. If edits are made or the link is submitted after the due date, there is a 10% penalty each day it is late.
Rename the notebook in the following pattern: Exam_2_First_Last.ipynb. You must rename it on triton before you save and download. There is a 5% penalty if you do not rename it.

Unique exam rules:

You must use the same version of the notebook from when you upload to colab to when you submit the link on Blackboard.
I am going to apply a 50% penalty if there are not at least 3 separate unique dates (in central time) with evidence of substantial work in your notebook history.
Yes, this means you must start working on the Exam, at the latest, 3 days before the due date. No, changing the name does not count as an edit. I need to see legitimate, substantial work on 3 different days, including new/edited code, written answers, etc.

Examples of acceptable (✅) or unacceptable (❌) notebook history:

I see notebook edits only on 03/12/2026, 03/16/2026, and 03/20/2026. ✅
I see notebook edits only on 03/16/2026 and 03/17/2026. ❌
I see notebook edits only on 03/20/2026 ❌

NOTE: You can work on any 3 or more days before the due date. The dates above are only examples.

When you are ready to submit:

File -> Save Notebook (do this often)
Click Share (upper right)
Change general access to “Anyone with the link”
Share the notebook with ahaberlie1@gmail.com
Click “Copy Link” and then “Done”
Add a comment to this title cell by clicking the three dots in the upper right corner of this cell and clicking on “Add a Comment”. Type in “I have completed the presubmission steps on MM/DD/YYYY.” and click on “comment” to save the comment.
Submit the link on Blackboard under the Exam 2 submission link.

There is a 5% penalty for each email I have to send you to remind you to follow the steps above. Here are some tips if you are concerned about falsely being accused of missing steps:

Take a screenshot (with a clearly visible clock) of the changed settings on colab. You do not have to send this to me, just save it on your computer to show me in case you lose points.
Paste the copied URL into a browser in “Incognito Mode” (in other words, without your colab login information). If you are able to open the notebook, you at least have allowed “anyone” to view it.
Send me an email / check before the due date so that I can attempt to view and edit your notebook.

Exam 2 Description:

Your job is to use the storm mode dataset provided to you below to predict the y_ data values.

You must complete the exam on your own without a partner. If I see evidence of sharing code, all parties will receive a zero. Be prepared to explain your code in my office if I suspect extensive usage of ChatGPT or other generative AI tools.

There are 5 parts to this exam. Each part should include code that comprehensively supports your findings and at least one “markdown” cell describing what you are seeing, why you are making decisions, or anything else you would like me to know.

Please enter your ZID in the code below to set the random_state. If you do not change this value to your ZID (without the ‘z’ or ‘a’), there is a 10% penalty. This will make your results slightly different from those of other students, which is expected and intentional.

By changing this to your ZID, you are agreeing to complete this exam in your own words, without assistance from genAI. Extensive / obvious usage of genAI will result in a 50% reduction in points for that answer. Your notebook history will be reviewed for suspicious edits. Do not import numpy again after running this cell.

import numpy as np

zid = 9999999

Here is the code that reads in the csv file (exam1.csv). Make sure the csv file is in the same folder as your notebook. If it is not, rerun the first cell in this notebook to download the file.

Make sure that you run the code above to set your “random seed” before continuing with the exam.

Do not change the code in the following cell block that reads in the csv file. Doing so will result in a 0.

It should look exactly like this when you are done:

from svrimg.utils.get_images import get_example_data

(x_train, y_train) = get_example_data('training', data_dir=".", url="https://nimbus.niu.edu/svrimg/data/classifications/")
(x_val, y_val) = get_example_data('validation', data_dir=".", url="https://nimbus.niu.edu/svrimg/data/classifications/")
(x_test, y_test) = get_example_data('testing', data_dir=".", url="https://nimbus.niu.edu/svrimg/data/classifications/")

print("Size of x_train", x_train.shape)
print("Size of x_val", x_val.shape)
print("Size of x_test", x_test.shape)

from svrimg.utils.get_images import get_example_data

(x_train, y_train) = get_example_data('training', data_dir=".", url="https://nimbus.niu.edu/svrimg/data/classifications/")
(x_val, y_val) = get_example_data('validation', data_dir=".", url="https://nimbus.niu.edu/svrimg/data/classifications/")
(x_test, y_test) = get_example_data('testing', data_dir=".", url="https://nimbus.niu.edu/svrimg/data/classifications/")

print("Size of x_train", x_train.shape)
print("Size of x_val", x_val.shape)
print("Size of x_test", x_test.shape)

Size of x_train (1331, 136, 136, 1)
Size of x_val (110, 136, 136, 1)
Size of x_test (300, 136, 136, 1)

Test to see if GPU is available

You should see something like:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

If not, make sure you follow the directions above to connect to a T4 GPU instance.

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")

gpus

1. Description (10 pts)

Show me examples and the statistics of the images in the training data as we did in class. Think of what is important to know about image data.

Rubric:

The student comprehensively described the relevant dataset features using high quality figures, tables, or other relevant data (10 pts)
The student adequately described the relevant dataset features using figures, tables, or other relevant data (7 pts)
The student adequately described the dataset using relevant data (5 pts)
The student provided a low-quality description of the dataset using relevant data (3 pts)
The student did not describe the dataset (0 pts)
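
As a starting point for the description task above, here is a minimal sketch of the kind of summary statistics one might compute for an image array. It runs on a synthetic array with the same layout as `x_train` (samples, height, width, channels). The array `fake_images`, the labels `fake_labels`, the three classes, the value range, and the helper `summarize_images` are all hypothetical and are not part of the exam data or the `svrimg` package.

```python
import numpy as np

# Synthetic stand-in for x_train / y_train (hypothetical values, not exam data).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 81, size=(12, 136, 136, 1)).astype(np.float32)
fake_labels = np.array([0, 1, 2, 0, 1, 2, 0, 0, 1, 2, 0, 1])


def summarize_images(images, labels):
    """Collect basic facts worth knowing about an image dataset."""
    classes, counts = np.unique(labels, return_counts=True)
    return {
        "n_samples": images.shape[0],
        "image_shape": images.shape[1:],
        "dtype": str(images.dtype),
        "min": float(images.min()),
        "max": float(images.max()),
        "mean": float(images.mean()),
        "class_counts": dict(zip(classes.tolist(), counts.tolist())),
    }


summary = summarize_images(fake_images, fake_labels)
for key, value in summary.items():
    print(key, value)
```

Numbers like these are usually paired with plots of a few sample images per class, since value ranges and class balance alone do not show what the images look like.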

DESCRIPTION (write your description of the dataset here):

2. Preprocessing (10 pts):

Get the image data ready for training. Think of what steps we discussed / demonstrated in class.

Rubric:

The student correctly normalized the data and resized the data as described in class (10 pts)
The student incorrectly normalized the data and resized the data as described in class (5 pts)
The student made no attempt to normalize the data and/or did not resize the data as described in class (0 pts)
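
The exact normalization and resizing steps are the ones covered in class and are not reproduced here. As a generic illustration only, the sketch below min-max scales a synthetic image array to the range [0, 1] and halves its resolution by averaging 2x2 pixel blocks. The array `fake_images`, the scaling bounds, the factor of 2, and both helper functions are assumptions made for this example.

```python
import numpy as np

# Synthetic stand-in for an image array (hypothetical values, not exam data).
rng = np.random.default_rng(1)
fake_images = rng.integers(0, 81, size=(4, 136, 136, 1)).astype(np.float32)


def min_max_scale(images, low, high):
    """Linearly map values from [low, high] to [0, 1]."""
    return (images - low) / (high - low)


def downsample_by_two(images):
    """Average non-overlapping 2x2 blocks; height and width must be even."""
    n, h, w, c = images.shape
    blocks = images.reshape(n, h // 2, 2, w // 2, 2, c)
    return blocks.mean(axis=(2, 4))


# Here the bounds come from the synthetic array itself; in practice they
# would typically be taken from the training subset and reused elsewhere.
low, high = float(fake_images.min()), float(fake_images.max())
scaled = min_max_scale(fake_images, low, high)
small = downsample_by_two(scaled)
print(scaled.min(), scaled.max(), small.shape)
```

Whatever method is used, the same transformation generally has to be applied to the training, validation, and testing subsets so the model sees consistent inputs.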

3. Training an image classifier (10 pts)

Train the following convolutional neural network on your data using the correct subset of X and Y data as we demonstrated in class.

Rubric:

The student used the correct subset for training and tried at least 3 different learning rates, assessed their generalizability using the correct subset, and selected the best model configuration based on those results (10 pts)
The student used the correct subset for training and tried at least 3 different learning rates, assessed their generalizability using the incorrect subset, and selected the best model configuration based on those results (7 pts)
The student used the incorrect subset for training and tried at least 2 different learning rates, assessed their generalizability using the incorrect subset, and selected the best model configuration based on those results (5 pts)
The student used the incorrect subset for training, tried only 1 learning rate, assessed its generalizability using the incorrect subset, and selected the best model configuration with no evidence (3 pts)
The student did not use the correct subsets for training and selected the best model configuration with no evidence (1 pts)
The student did not train a classifier (0 pts)

Problem 3.a: Change the input shape to be the same size as one image in the dataset.

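from tensorflow import keras
from tensorflow.keras import layers

# num_classes (the number of storm mode classes) must be defined before building the model.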

input_shape = (1, 1, 1)

exam_model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3)),
        layers.MaxPooling2D(pool_size=(3, 3)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

Problem 3.b: Train the model as we did in class using 10 epochs for each model selection test.

JUSTIFICATION (write why/how you chose the model configuration):

4. Assessing the CNN classifier (10 pts):

Demonstrate the ability to assess how well the model predicts the labels using approaches we discussed in class.

HINT: Previous chapters go into detail about what performance metrics are needed and how to visualize those metrics.

Rubric:

The student used the correct subset for testing the generalizability of the classifier, calculated comprehensive performance metrics and a confusion matrix, assessed overfitting/underfitting, and summarized and interpreted the results (10 pts)
The student used the correct subset for testing the generalizability of the classifier, calculated some performance metrics, and summarized and interpreted the results (7 pts)
The student used the correct subset for testing the generalizability of the classifier, summarized and interpreted the results without any evidence (5 pts)
The student used the incorrect subset for testing the generalizability of the classifier, summarized and interpreted the results without any evidence (3 pts)
The student made no attempt to assess the classifier (0 pts)
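
For the confusion matrix and metrics named in the rubric above, here is a minimal sketch on toy labels. The label arrays and the helper `confusion_counts` are hypothetical; with a softmax classifier, the predicted class would usually be the argmax of the model output. Rows are true classes and columns are predicted classes, which matches the layout of scikit-learn's `confusion_matrix`.

```python
import numpy as np

# Toy labels for 3 classes (hypothetical, not exam results).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])


def confusion_counts(true, pred, n_classes):
    """Count samples per (true class, predicted class) pair."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (true, pred), 1)
    return matrix


cm = confusion_counts(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()

# Per-class recall: correct predictions divided by the true count of each class.
recall = np.diag(cm) / cm.sum(axis=1)
# Per-class precision: correct predictions divided by the predicted count.
precision = np.diag(cm) / cm.sum(axis=0)

print(cm)
print("accuracy", accuracy)
print("recall", recall)
print("precision", precision)
```

Per-class values matter when classes are imbalanced, because overall accuracy can look high even when one class is rarely predicted correctly.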

ASSESSMENT (write your analysis here):

5. Summary (10 pts)

Summarize the model workflow from start to finish, with a focus on how someone might use the model and what pitfalls / caveats / issues they may experience if using the model on a similar dataset.

Rubric:

The student provided a detailed assessment of the model, including where it did or did not perform well, what extra data may be needed to improve predictions, and explained why the model was producing the results it did (10 pts)
The student provided a detailed assessment of the model, including where it did or did not perform well, what extra data may be needed to improve predictions (7 pts)
The student provided a detailed assessment of the model, including where it did or did not perform well (5 pts)
The student provided a basic assessment of the model (3 pts).
The student did not attempt to summarize their findings (0 pts)

SUMMARY (write your summary here):