
L11 - Feature Extraction

%%bash
git clone https://github.com/ahaberlie/SVRIMG.git > /dev/null
cd SVRIMG && pip install . > /dev/null
pip install cartopy > /dev/null
pip install scikit-learn > /dev/null
Cloning into 'SVRIMG'...
from svrimg.utils.get_images import get_example_data

(x_train, y_train) = get_example_data('training', data_dir=".", url="https://nimbus.niu.edu/svrimg/data/classifications/")
(x_val, y_val) = get_example_data('validation', data_dir=".", url="https://nimbus.niu.edu/svrimg/data/classifications/")
(x_test, y_test) = get_example_data('testing', data_dir=".", url="https://nimbus.niu.edu/svrimg/data/classifications/")


Directions:

  1. Please rename the file by clicking on “LX-First-Last.ipynb” where X is the lab number, and replace First and Last with your first and last name.

  2. Click File -> Save to make sure your most recent edits are saved.

  3. In the upper right hand corner of the screen, click on “Share”. Click on “Restricted” and change it to “Anyone with the link”. Make sure you also share it with ahaberlie1@gmail.com.

  4. Copy the link and submit it on Blackboard. Make sure you follow these steps completely, or I will be unable to grade your work.

Overview

This lab will help you understand scikit-image and its basic methods. We will walk through some examples of how scikit-image can help solve geoscience problems. Periodically, I will either 1) ask you to repeat the demonstrated code in a slightly different way, or 2) ask you to combine two or more techniques to solve a problem.

You can use generative AI to help answer these problems. The answer should still be in your own words. Think of the generative AI descriptions as those from a book: you still have to cite your source, and you cannot plagiarize directly from it. For every question on which you used generative AI, please reference the tool you used and what your prompt or prompts were.

However, it is crucial that you understand the code well enough to effectively use generative AI tools that are likely to be widely available and recommended for use at many organizations. Although they are improving at an incredible rate, they still produce bugs, especially with domain-specific and complex problems. Make sure that you verify the answers before putting them in your own words.

scikit-image

This package, usually imported as “skimage”, provides an interface to many methods for digital image processing. We will work with both RGB images, stored as (y, x, 3) arrays (scikit-image places the color channel last), and grayscale images, stored as (y, x) arrays.
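If these array conventions are new to you, a quick sanity check using scikit-image's bundled sample images shows the shapes you should expect:

```python
# Check scikit-image's array conventions with its bundled sample images
from skimage import data

rgb = data.astronaut()   # color sample image
gray = data.camera()     # grayscale sample image

print(rgb.shape)   # (512, 512, 3) -- rows, cols, channels
print(gray.shape)  # (512, 512)    -- rows, cols
```

Note that the channel axis comes last, which matters when you slice or aggregate over spatial dimensions.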

Problem 1 (4 pts)

Create an alternate approach by extracting features from the images, as we have done in previous classes.

For each subset, calculate the following statistics for each sample:

  1. Mean of pixel intensities (example given to you below)

  2. Maximum pixel intensity (example given to you below)

  3. Minimum pixel intensity

  4. Count of pixels above 40 dBZ

  5. Count of pixels above 50 dBZ

Combine these into a new training set. The np.stack example below only has the mean and max results. You need to add the others. The sizes should be:

New train shape =  (1331, 5)
New val shape =  (110, 5)
New test shape =  (300, 5)
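As a sketch of the counting pattern (shown on a made-up array, not the radar data), comparing against a threshold gives a boolean array whose spatial sum yields one count per sample:

```python
import numpy as np

# Toy stand-in for a batch of 4 single-channel 8x8 "images"
toy = np.random.default_rng(0).uniform(0, 70, size=(4, 8, 8, 1))

# Boolean comparison, then sum over the spatial axes -> one count per sample
count_40 = np.sum(toy > 40, axis=(1, 2)).squeeze()
print(count_40.shape)  # (4,)
```

The same axis choices used for the mean and max examples below apply here.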
import numpy as np

mean_train = np.mean(x_train, axis=(1,2)).squeeze()
mean_val = np.mean(x_val, axis=(1,2)).squeeze()
mean_test = np.mean(x_test, axis=(1,2)).squeeze()

max_train = np.max(x_train, axis=(1,2)).squeeze()
max_val = np.max(x_val, axis=(1,2)).squeeze()
max_test = np.max(x_test, axis=(1,2)).squeeze()

### your code below


# min


# 40 dbz count


# 50 dbz count



### your code above

## change these below once you add the requested features
x_my_train = np.stack((mean_train, max_train), axis=1)
x_my_val = np.stack((mean_val, max_val), axis=1)
x_my_test = np.stack((mean_test, max_test), axis=1)

print("New train shape = ", x_my_train.shape)
print("New val shape = ", x_my_val.shape)
print("New test shape = ", x_my_test.shape)
New train shape =  (1331, 2)
New val shape =  (110, 2)
New test shape =  (300, 2)

Problem 2 (2 pt)

Use the new features to train a Random Forest model. Write your code between the `## Your code below` and `## Your code above` comments in the cell below; it must define `rf_classifier` so the assessment code after `## Your code above` runs.

Answer in this markdown: How did the model do? Assess the performance.
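As a reminder, the generic scikit-learn pattern looks like this (demonstrated on synthetic data from `make_classification`, not the lab features; the variable name `rf_classifier` matches what the assessment code below expects):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in data: 100 samples, 5 features, 3 classes
X, y = make_classification(n_samples=100, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

# Fit a forest; n_estimators and random_state are typical starting choices
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
rf_classifier.fit(X, y)

print(rf_classifier.score(X, y))  # training accuracy (optimistic)
```

For the lab, fit on your training features and evaluate on the held-out sets rather than scoring on the training data.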

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

## Your code below








## Your code above

# Predict the labels for the test set
y_pred = rf_classifier.predict(x_my_test)

# Create the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix using seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")

# Print the classification report
print(classification_report(y_test, y_pred))

importances = rf_classifier.feature_importances_
feature_importances = pd.DataFrame({'Feature': range(x_my_train.shape[1]), 'Importance': importances})
feature_importances = feature_importances.sort_values('Importance', ascending=False)
display(feature_importances)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_1983/3350439506.py in <cell line: 0>()
     13 
     14 # Predict the labels for the test set
---> 15 y_pred = rf_classifier.predict(x_my_test)
     16 
     17 # Create the confusion matrix

NameError: name 'rf_classifier' is not defined

Problem 3 (4 pts)

Use the template above to:

  1. In addition to the 5 features you extracted in Problem 1, add the following 3 features to your training, validation, and testing data:

    • sum of edge detection response

    • sum of corner detection response

    • sum of ridge detection response

The new feature sizes should be:

New train shape =  (1331, 8)
New val shape =  (110, 8)
New test shape =  (300, 8)
  2. Generate a new random forest with those new features.

  3. Create assessment products (like above).

  4. Answer in the markdown: Assess the final model. Did it do better or worse? How do you think you could further improve the model?
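One possible mapping of these responses to scikit-image functions (this is an assumption about which filters are intended; other choices such as `canny`, `corner_shi_tomasi`, or `frangi` would also be reasonable) is `filters.sobel` for edges, `feature.corner_harris` for corners, and `filters.sato` for ridges. On a toy grayscale image:

```python
import numpy as np
from skimage import filters, feature

# Toy grayscale image standing in for one radar sample
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0

edge_sum = filters.sobel(img).sum()            # edge detection response
corner_sum = feature.corner_harris(img).sum()  # corner detection response
ridge_sum = filters.sato(img).sum()            # ridge detection response

print(edge_sum, corner_sum, ridge_sum)
```

For the dataset, you would apply each filter to every (y, x) sample and collect the sums into three new feature columns.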

ANSWER: