
UPenn ENGR 1050 · December 2024

Cat/Dog Image Classifier

A PyTorch CNN trained on 24,000 images, a self-directed deep dive into convolutional neural networks that went well beyond course content.

Course

ENGR 1050 (Final Project)

Type

Individual project

Accuracy

97.8% validation

Why This Project?

This was the final project for an introductory engineering course, where students could build anything related to class material. The most advanced AI topic the course covered was a single-layer perceptron. I wanted to go further, so I taught myself convolutional neural networks and built this classifier from scratch. I consulted outside resources to understand how CNNs work, particularly how they account for the spatial relationships between pixels rather than only their individual values, which makes them far better suited to image processing than a simple perceptron.

Architecture

The CNN has 4 convolutional layers with increasing filter counts (64 → 128 → 256 → 512), each followed by BatchNorm, ReLU activation, and MaxPool. After flattening, three fully connected layers (output sizes 1024 → 256 → 1) reduce the features to a single output for binary classification, trained with BCEWithLogitsLoss. Dropout (0.5) after the first FC layer helps prevent overfitting. Training used the Adam optimizer (lr=0.0001) with a StepLR scheduler (step=10, gamma=0.1).

Conv2d(3→64) → BN → ReLU → Pool
Conv2d(64→128) → BN → ReLU → Pool
Conv2d(128→256) → BN → ReLU → Pool
Conv2d(256→512) → BN → ReLU → Pool
Flatten → FC(32768→1024) → Dropout(0.5)
FC(1024→256) → FC(256→1) → Sigmoid
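To see where the 32768 in the first FC layer comes from, here is a small shape trace in plain Python. The write-up does not state the kernel sizes, so this sketch assumes 3×3 convolutions with padding 1 and stride 1 (which preserve height and width) and 2×2 max pooling with stride 2 (which halves them). The helper names are illustrative and are not taken from the project code.

```python
# Assumed hyperparameters (not stated in the write-up): 3x3 convolutions with
# padding 1 and stride 1, and 2x2 max pooling with stride 2.

def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling along one dimension."""
    return (size - kernel) // stride + 1


def trace_shapes(input_size=128, channels=(64, 128, 256, 512)):
    """Return (channels, height, width) after each Conv -> BN -> ReLU -> Pool block."""
    shapes = []
    size = input_size
    for out_channels in channels:
        # BatchNorm and ReLU do not change the shape, so only conv and pool matter.
        size = pool_out(conv_out(size))
        shapes.append((out_channels, size, size))
    return shapes


shapes = trace_shapes()
for block, shape in enumerate(shapes, start=1):
    print(f"block {block}: {shape}")

final_channels, height, width = shapes[-1]
flat_features = final_channels * height * width
print("flattened features:", flat_features)
```

Under these assumptions, four halvings take the 128×128 input down to 8×8, and 512 channels × 8 × 8 gives the 32768 inputs to the first FC layer.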

Input

128×128 RGB

Dataset

24,000 images

Split

80% train, 20% val

Epochs

150

Peak Accuracy

97.8%
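The architecture description pairs a single output with BCEWithLogitsLoss, a loss that takes a raw logit and applies the sigmoid internally. The plain-Python sketch below (function names are illustrative, not from the project code) checks that the numerically stable logit form gives the same value as ordinary binary cross-entropy computed on sigmoid(logit). If the project followed the usual pattern, the Sigmoid at the end of the diagram would be applied only when reporting a probability at inference, but that is an assumption.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit, in its numerically stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def bce_on_probability(prob, target):
    """Textbook binary cross-entropy on a probability in (0, 1)."""
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


# target = 1.0 and target = 0.0 stand for the two classes; which class maps
# to which value is an assumption of this sketch.
for logit, target in [(2.0, 1.0), (-1.5, 1.0), (0.3, 0.0)]:
    stable = bce_with_logits(logit, target)
    naive = bce_on_probability(sigmoid(logit), target)
    print(f"logit={logit:+.1f} target={target:.0f} stable={stable:.6f} naive={naive:.6f}")
```

The stable form matters for very confident predictions: with a large logit, sigmoid(logit) rounds to exactly 1.0 in floating point and the textbook form would take log(0), while the logit form stays finite.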

Training Results

I initially planned around 50 epochs but pushed it to 150 to see what would happen. The model converged rapidly: by epoch 15, validation accuracy was already above 97%. The remaining 135 epochs showed essentially no improvement. Train loss fell to near zero while validation loss plateaued around 0.12, a gap that points to mild overfitting even though validation accuracy stayed high.

Training and validation curves over 150 epochs, convergence by epoch ~15, then diminishing returns
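One possible contributor to the early plateau is the learning-rate schedule itself. The sketch below reproduces the StepLR rule in plain Python using the settings stated in the Architecture section (lr=0.0001, step=10, gamma=0.1), assuming the scheduler was stepped once per epoch for the whole run. The function name is illustrative.

```python
def step_lr(epoch, base_lr=1e-4, step_size=10, gamma=0.1):
    """Learning rate at a 0-indexed epoch under a StepLR-style decay rule."""
    return base_lr * gamma ** (epoch // step_size)


# Print the learning rate at a few points across the 150-epoch run.
for epoch in (0, 9, 10, 20, 30, 149):
    print(f"epoch {epoch:3d}: lr = {step_lr(epoch):.1e}")
```

With a tenfold cut every 10 epochs, the rate is already down to 1e-7 by epoch 30, so under this assumption the later epochs could change the weights only marginally, which would be consistent with the flat curves.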

Real-World Testing

I tested the model on personal pet photos and internet images. When confident, it was very confident (100% on clear dog/cat photos). On ambiguous or out-of-distribution images, such as cartoon cats or memes, the confidence dropped, which suggests the model had learned meaningful features rather than just memorizing the training set.

Example 1: dog, 100% confident
Example 2: cartoon cat, 89.87% (lower confidence)
Example 3: dog in blanket, 100% confident
Example 4: real cats, 99.85%
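The confidence figures above can be read as a sigmoid applied to the network's single output. The following is a minimal sketch of that conversion, not the project's actual inference code: the function names, the example logits, and the choice of "dog" as the positive class are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def classify(logit, positive="dog", negative="cat"):
    """Map a raw logit to (label, confidence in that label)."""
    p_positive = sigmoid(logit)
    if p_positive >= 0.5:
        return positive, p_positive
    return negative, 1.0 - p_positive


# Hypothetical logits: a very confident positive, a moderate positive,
# and a confident negative.
for logit in (12.0, 2.0, -6.0):
    label, confidence = classify(logit)
    print(f"logit={logit:+.1f} -> {label}, {confidence:.2%}")
```

A sigmoid never reaches exactly 1 mathematically, so a displayed "100%" is a rounded value: a large enough logit produces a probability that prints as 100.00% at two decimal places.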

This first exposure to machine learning helped me realize how much of a black box AI can be and, at the same time, how engineerable it is. The project showed that with the right architecture, data pipeline, and training setup, you can build something that genuinely works without fully understanding every mathematical detail of why it works. That tension between black-box behavior and practical engineering is what made ML click for me.