require(knitr)
require(ggplot2)
read_chunk("ex2/ex2_chunks.R")

1 Multi-class Classification

1.1 Dataset

ex3data1 <- cbind(1, as.matrix(read.csv("data/ex3data1.csv")))
initial_theta <- rep(0, times = ncol(ex3data1) - 1)

1.2 Visualizing the data

The script that displays the greyscale digit images came with the exercise resources and was not part of the assignment itself, so I skipped it for now. I may revisit it later; part of exercise 7 should be adaptable to plot the handwritten digits.
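
A minimal sketch for viewing a single digit with base R's image(). It assumes, as in the original MATLAB files, that each example's 400 pixel values form a 20 x 20 image unrolled column by column. digit_matrix and show_digit are hypothetical helpers, not part of the assignment.

```r
# Hypothetical helper: reshape one 400-pixel row into a 20 x 20 matrix
# oriented for base R's image().
digit_matrix <- function(pixels, side = 20) {
    # Assumes column-by-column unrolling, which is also how matrix() fills,
    # so m[i, j] is pixel row i, pixel column j.
    m <- matrix(pixels, nrow = side, ncol = side)
    # image(z) draws z[i, j] at x = i, y = j with the origin at the bottom-left,
    # so transpose (pixel columns run along x) and reverse so pixel row 1 is on top.
    t(m)[, side:1]
}

# Hypothetical helper: draw one digit in greyscale.
show_digit <- function(pixels, side = 20) {
    image(digit_matrix(pixels, side), col = grey.colors(256),
          axes = FALSE, asp = 1)
}

# Usage (columns 2:401 skip the intercept column added by cbind(1, ...)):
# show_digit(ex3data1[1, 2:401])
```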

1.3 Vectorizing Logistic Regression

1.3.1 Vectorizing the cost function

I already implemented a vectorized version for exercise 2, so I reuse ex2_chunks.R, which contains all of the functions written for the previous exercise.
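
For reference, the vectorized cost, where $g$ is the sigmoid applied elementwise, $X$ is the $m \times (n+1)$ design matrix including its column of ones, and $y$ is the vector of 0/1 labels:

$$J(\theta) = -\frac{1}{m}\left[ y^\top \log\big(g(X\theta)\big) + (1 - y)^\top \log\big(1 - g(X\theta)\big) \right]$$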

1.3.2 Vectorizing the gradient
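
The gradient vectorizes the same way: a single product with $X^\top$ replaces the loop over the $n+1$ parameters, which is what `crossprod(X, ...)` evaluates in costFunction.

$$\nabla_\theta J(\theta) = \frac{1}{m} X^\top \big(g(X\theta) - y\big)$$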

1.3.3 Vectorizing regularized logistic regression
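
With regularization, the assignment's convention is to leave the intercept $\theta_0$ unpenalized, which is why the sum starts at $j = 1$ and the first entry of the penalty vector in the gradient is zero:

$$J(\theta) = -\frac{1}{m}\left[ y^\top \log\big(g(X\theta)\big) + (1 - y)^\top \log\big(1 - g(X\theta)\big) \right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

$$\nabla_\theta J(\theta) = \frac{1}{m} X^\top \big(g(X\theta) - y\big) + \frac{\lambda}{m}\begin{bmatrix} 0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$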

sig <- function(x){1 / (1 + exp(-x))}
h <- function(theta, x){
    # dot product: elementwise multiplication, then summed
    sig(sum(theta * x))
}
costFunction <- function(M, theta, lambda = 0){
    # M holds the design matrix (intercept column included) followed by y
    m <- nrow(M)
    X <- M[, 1:(ncol(M) - 1)]
    y <- M[, ncol(M)]
    p <- sig(X %*% theta)          # predicted probabilities, computed once
    penalty <- c(0, theta[-1])     # the intercept theta[1] is not regularized

    # y'log(p) + (1 - y)'log(1 - p) as one dot product of concatenated vectors
    J <- - (1 / m) * crossprod(c(y, 1 - y), c(log(p), log(1 - p))) +
        (lambda / (2 * m)) * sum(penalty ^ 2)

    grad <- (1 / m) * crossprod(X, p - y) +
        (lambda / m) * penalty
    list(J = as.vector(J), grad = as.vector(grad))
}
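
A finite-difference gradient check is a cheap way to confirm that a vectorized gradient agrees with its cost. So that it runs on its own, this sketch uses reg_logistic_cost, a hypothetical self-contained stand-in with the same formulas as costFunction; passing `function(th) costFunction(M, th, lambda)$J` instead should work the same way.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical stand-in for costFunction: same vectorized formulas, with the
# intercept theta[1] left out of the penalty.
reg_logistic_cost <- function(theta, X, y, lambda = 0) {
    m <- nrow(X)
    p <- as.vector(sigmoid(X %*% theta))
    penalty <- c(0, theta[-1])
    J <- -(1 / m) * sum(y * log(p) + (1 - y) * log(1 - p)) +
        (lambda / (2 * m)) * sum(penalty ^ 2)
    grad <- as.vector(crossprod(X, p - y)) / m + (lambda / m) * penalty
    list(J = J, grad = grad)
}

# Central differences: (f(theta + eps * e_j) - f(theta - eps * e_j)) / (2 * eps)
numerical_grad <- function(f, theta, eps = 1e-4) {
    vapply(seq_along(theta), function(j) {
        step <- replace(numeric(length(theta)), j, eps)
        (f(theta + step) - f(theta - step)) / (2 * eps)
    }, numeric(1))
}

set.seed(1)
X_chk <- cbind(1, matrix(rnorm(50 * 3), nrow = 50))  # intercept + 3 random features
y_chk <- as.numeric(runif(50) < 0.5)                  # random 0/1 labels
theta_chk <- rnorm(4)

g_analytic <- reg_logistic_cost(theta_chk, X_chk, y_chk, lambda = 0.5)$grad
g_numeric <- numerical_grad(
    function(th) reg_logistic_cost(th, X_chk, y_chk, lambda = 0.5)$J, theta_chk)
max(abs(g_analytic - g_numeric))   # should be tiny if the gradient is correct
```

Central differences are used rather than one-sided differences because their truncation error shrinks with the square of the step size.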

1.4 One-vs-all Classification

For one-vs-all classification, we loop over the K = 10 classes and, for each class k, train a binary logistic regression classifier that separates class k from all other classes, as in the previous exercise.

# One row of coefficients per class; label 10 stands for the digit 0
thetas <- matrix(0, nrow = 10, ncol = 401)

for(i in 1:10){
    # binary target: 1 if the example belongs to class i, 0 otherwise
    Mi <- cbind(ex3data1[, 1:401], ex3data1[, 402] == i)
    thetai <- optim(par = initial_theta,
                    fn = function(x){costFunction(Mi, x)$J},
                    gr = function(x){costFunction(Mi, x)$grad},
                    method = "BFGS", control = list(maxit = 400))
    thetas[i, ] <- thetai$par
}
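
The loop above hard-codes 401 columns, 10 classes and lambda = 0. Below is a sketch of a reusable version, assuming labels 1..K and a design matrix that already includes the intercept column; one_vs_all is a hypothetical name. Its defaults (lambda = 0.1, at most 50 iterations) loosely follow the original assignment's oneVsAll, but optim's BFGS is not the same optimizer as fmincg, so results will not match exactly. The cost is rewritten in a numerically stable form so that saturated probabilities never produce log(0).

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

one_vs_all <- function(X, y, K, lambda = 0.1, maxit = 50) {
    # X: m x (n + 1) design matrix including the intercept column
    # y: integer labels in 1..K
    m <- nrow(X)
    all_theta <- matrix(0, nrow = K, ncol = ncol(X))
    for (k in seq_len(K)) {
        yk <- as.numeric(y == k)   # 1 for class k, 0 otherwise
        cost <- function(theta) {
            z <- as.vector(X %*% theta)
            # -[y log g(z) + (1 - y) log(1 - g(z))] equals log(1 + e^z) - y z;
            # log(1 + e^z) is computed as max(z, 0) + log1p(exp(-|z|))
            (1 / m) * sum(pmax(z, 0) + log1p(exp(-abs(z))) - yk * z) +
                (lambda / (2 * m)) * sum(theta[-1] ^ 2)
        }
        grad <- function(theta) {
            p <- sigmoid(as.vector(X %*% theta))
            as.vector(crossprod(X, p - yk)) / m + (lambda / m) * c(0, theta[-1])
        }
        fit <- optim(par = numeric(ncol(X)), fn = cost, gr = grad,
                     method = "BFGS", control = list(maxit = maxit))
        all_theta[k, ] <- fit$par
    }
    all_theta
}

# Usage with this exercise's data:
# all_theta <- one_vs_all(ex3data1[, 1:401], ex3data1[, 402], K = 10)
```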

1.4.1 One-vs-all Prediction

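The predicted class is the one whose classifier assigns the highest probability. Because row i of thetas holds the classifier for label i, which.max() returns the label directly (the dataset encodes the digit 0 as label 10).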

ex3pred1 <- apply(ex3data1, 1, FUN = function(x){
    which.max(as.vector(apply(thetas, 1, FUN = function(y){
        h(y, x[1:401])
    })))
})

sum(ex3data1[, 402] == ex3pred1) / nrow(ex3data1)
## [1] 0.9698

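This is higher than the expected accuracy of 94.9%. A likely reason is that the original assignment trains each classifier with lambda = 0.1 and only 50 iterations of fmincg, whereas the loop above uses no regularization (costFunction's default lambda = 0) and up to 400 BFGS iterations. Since this accuracy is measured on the training set, the less constrained fit scores higher.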
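The nested apply() in the prediction step can also be written as a single matrix product. Because the sigmoid is monotonic, the class with the largest linear score is also the class with the largest probability, so the sigmoid can be skipped. predict_one_vs_all is a hypothetical helper, sketched under that observation.

```r
predict_one_vs_all <- function(all_theta, X) {
    # all_theta: K x (n + 1), one row per class (a data.frame also works)
    # X: m x (n + 1) design matrix including the intercept column
    scores <- X %*% t(as.matrix(all_theta))   # m x K matrix of linear scores
    max.col(scores, ties.method = "first")    # column of each row's maximum
}

# Usage with the objects above:
# ex3pred1 <- predict_one_vs_all(thetas, ex3data1[, 1:401])
```

ties.method = "first" matches which.max(), which also returns the first maximum; max.col's default breaks ties at random.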

2 Neural Networks

This is a quick implementation of forward propagation, hard-coded for this three-layer network (400 inputs, 25 hidden units, 10 outputs) with the provided pretrained weights. A more general version is implemented in the next exercise.
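
For a single example $x$, the forward pass computes the following, with $g$ the sigmoid, $\Theta^{(1)}$ of size $25 \times 401$ and $\Theta^{(2)}$ of size $10 \times 26$. The code handles all examples at once by stacking them as columns; the first column of ex3data1 already supplies the bias entry of $a^{(1)}$.

$$a^{(1)} = \begin{bmatrix} 1 \\ x \end{bmatrix}, \quad z^{(2)} = \Theta^{(1)} a^{(1)}, \quad a^{(2)} = \begin{bmatrix} 1 \\ g(z^{(2)}) \end{bmatrix}, \quad z^{(3)} = \Theta^{(2)} a^{(2)}, \quad h_\Theta(x) = a^{(3)} = g(z^{(3)})$$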

Theta1 <- as.matrix(read.csv("data/ex3weights_Theta1.csv"))
Theta2 <- as.matrix(read.csv("data/ex3weights_Theta2.csv"))

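# Hidden layer: one column per example, then a row of ones for the bias unit
z2 <- Theta1 %*% t(ex3data1[, 1:401])
a2 <- sig(z2)
a2 <- rbind(1, a2)

# Output layer: 10 rows (one per class), one column per example
z3 <- Theta2 %*% a2
a3 <- sig(z3)

# Predicted label = row with the highest output for each example
ex3pred2 <- apply(a3, 2, which.max)
sum(ex3data1[, 402] == ex3pred2) / nrow(ex3data1)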
## [1] 0.9752

This matches the expected accuracy of 97.5%.
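
A sketch of forward propagation generalized to any number of layers, assuming each weight matrix has size (units in next layer) x (units in current layer + 1), as Theta1 and Theta2 do. forward is a hypothetical helper and not necessarily how the next exercise implements it. Unlike the code above, it keeps one example per row.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# weights: list of matrices, applied in order
# X: one example per row, WITHOUT the intercept column
forward <- function(weights, X) {
    # At each layer: prepend a column of ones (bias), multiply, apply sigmoid
    Reduce(function(a, Theta) sigmoid(cbind(1, a) %*% t(Theta)), weights, X)
}

# Usage with the pretrained weights (columns 2:401 drop the intercept):
# ex3pred2 <- max.col(forward(list(Theta1, Theta2), ex3data1[, 2:401]),
#                     ties.method = "first")
```

Reduce() threads the activations through the list, so adding a layer only means appending another matrix to weights.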