This repository contains implementations and analyses of popular machine learning methods.
Libraries:
- NumPy, Keras, TensorFlow, PyTorch
The structure follows the assignments for CS 480/680: Machine Learning at the University of Waterloo.
My course mark: 100%.
Open this project in GitHub Pages here, or click the links below for easy viewing of the Jupyter notebooks.
- Implementations and analysis of k-nearest neighbours (KNN) and linear regression in NumPy, with cross-validation and Tikhonov regularization (a minimal ridge-regression sketch appears after this list).
- Implementations of logistic regression and mixture of Gaussians in NumPy, with cross-validation and weight regularization.
- Analysis of the number of parameters, hypothesis-space complexity, inductive biases, and computational complexity of the two algorithms.
- Implementations of non-linear regression techniques such as regularized generalized linear regression, Bayesian generalized linear regression, and Gaussian process regression (see the Gaussian process sketch after this list).
- Analysis of the complexity and hypothesis spaces of the identity, polynomial, and Gaussian kernels.
- Implementations of densely connected and convolutional neural nets in Keras and TensorFlow (see the Keras sketch after this list).
- Comparison of ReLU and sigmoid activation units in terms of convergence, complexity, and vanishing gradients.
- Comparison of the RMSProp, Adagrad, and Adam optimizers in terms of convergence and performance on the given dataset.
- Analysis of architectural choices such as filter sizes, strides, and max-pooling layers.
- Recurrent Neural Networks
- Implementation of sequence-to-sequence encoder-decoder recurrent neural networks, with and without attention, in PyTorch (see the attention sketch after this list).
- Analysis of linear, GRU, and LSTM units in RNNs in the context of preserving information in the hidden state.
- I review the state-of-the-art literature on action recognition in videos, from pre-deep-learning approaches in 2013 to modern deep learning techniques in 2019.
- I cover the architecture and reasoning behind improved dense trajectories, 3D CNNs, two-stream and single-stream CNNs, and channel-separated networks.
- I review state-of-the-art performance on the UCF101, Sports1M, and Kinetics datasets.
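For quick reference, here is a minimal NumPy sketch of Tikhonov-regularized (ridge) linear regression with k-fold cross-validation, in the spirit of the first notebook. The toy data, fold count, and lambda grid are placeholders I chose for illustration, not the notebook's code.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form Tikhonov-regularized least squares: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def kfold_mse(X, y, lam, k=5):
    """Average validation MSE of ridge regression over k random folds."""
    folds = np.array_split(np.random.permutation(len(X)), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        errors.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return np.mean(errors)

# Toy data (purely illustrative) and a small grid search over lambda.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
best_lam = min([0.01, 0.1, 1.0, 10.0], key=lambda lam: kfold_mse(X, y, lam))
print("selected lambda:", best_lam)
```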
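Similarly, a minimal NumPy sketch of Gaussian process regression with a Gaussian (RBF) kernel, to show the posterior mean and variance computation; the lengthscale, noise level, and 1-D toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=0.1, lengthscale=1.0):
    """Posterior mean and pointwise variance of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, lengthscale)
    K_ss = rbf_kernel(X_test, X_test, lengthscale)
    alpha = np.linalg.solve(K, y_train)                    # (K + sigma^2 I)^-1 y
    mean = K_s.T @ alpha
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.diag(cov)

# Illustrative 1-D example.
X_train = np.linspace(0, 5, 20)[:, None]
y_train = np.sin(X_train).ravel() + 0.1 * np.random.randn(20)
X_test = np.linspace(0, 5, 100)[:, None]
mean, var = gp_posterior(X_train, y_train, X_test)
```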
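A small Keras/TensorFlow convolutional network of the kind compared in the notebooks; the input shape, layer sizes, and optimizer choice are placeholders, not the exact architectures used in the assignments.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A small convolutional network for 28x28 grayscale inputs (MNIST-style data assumed).
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Swapping the optimizer string ("rmsprop", "adagrad", "adam") is enough to
# reproduce the kind of optimizer comparison described above.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```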
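Finally, a compact PyTorch sketch of the attention used in sequence-to-sequence models; this is a generic Luong-style dot-product attention module with dummy dimensions, not the notebook's exact implementation.

```python
import torch
import torch.nn as nn

class DotProductAttention(nn.Module):
    """Luong-style dot-product attention: scores the decoder hidden state against
    every encoder output and returns a weighted context vector."""

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden: (batch, hidden), encoder_outputs: (batch, seq_len, hidden)
        scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2)).squeeze(2)  # (batch, seq_len)
        weights = torch.softmax(scores, dim=1)                                       # attention distribution
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)        # (batch, hidden)
        return context, weights

# Shape check with dummy tensors (batch=2, seq_len=7, hidden=16).
attn = DotProductAttention()
context, weights = attn(torch.randn(2, 16), torch.randn(2, 7, 16))
print(context.shape, weights.shape)  # torch.Size([2, 16]) torch.Size([2, 7])
```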
Idea: Control the flow of residuals through residual gates during training by slowly closing the gates to "cool" down early layers once they arrive at some form of convergence. I hope this could reduce redundancies learned by residual networks and allow later epochs to focus on training deeper parts of the network with minimal updates to low-level features.
Background
I'd like to investigate tuning methods for deep residual networks. In the 2015 paper Deep Residual Learning for Image Recognition, He et al. at Microsoft Research introduced residual blocks to increase the depth of convolutional neural networks to 152 layers while alleviating the vanishing gradient problem experienced by earlier architectures like VGG and AlexNet. The skip connections they used allow gradients to bypass blocks of layers and travel back to earlier layers of the network, where they can still meaningfully change the weights. Highway Networks are a closely related architecture that uses parametric gates, trained by gradient descent, to control how much of the residual to let through; however, they did not achieve performance comparable to deep residual networks.
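For concreteness, a standard (ungated) residual block of this kind can be sketched in PyTorch as follows; the channel count and layer choices are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic two-convolution residual block: output = relu(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = x                          # identity skip connection
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + residual)     # gradients can bypass F(x) via the skip path
```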
Explanation
I would like to investigate the potential use of residual gates to improve the learning process for deep residual networks. The residual connections in a network like ResNet-152 behave like fully open gates: residuals travel freely through the residual blocks in every epoch. Early in training, however, we presumably want the earlier layers to receive many weight updates so that they converge to effective feature-extracting convolution filters, while in later epochs it could be desirable for the earlier layers to receive fewer updates so the network can focus on training deeper layers that build hierarchical features on top of the early-layer features. We can achieve this by slowly closing off the residual connections during training, starting from the early layers and moving toward the deeper ones. Ideally, this could reduce redundancies in the network, where early-layer features are relearned at later layers because the input of a block, x, is always added to the output passed to the next block, F(x). This training method gives us a way to bias deep residual networks towards hypotheses that build on low-level features. In a way, the process is similar to simulated annealing: the network can be considered "hot" when all the connections are open, and as we "cool" the network we expect subnetworks to converge to local optima that are like local constraint satisfaction problems induced by the architecture of the network.
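A minimal sketch of how this could be prototyped, assuming a scalar, non-learned gate on the skip path of each block and an arbitrary linear cooling schedule; the schedule shape, constants, and the cool_gates helper are my own placeholders, not a settled design.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Residual block whose skip path is scaled by a scheduled (non-learned) gate g:
    output = relu(F(x) + g * x). With g = 1 this is an ordinary residual block;
    driving g toward 0 "closes" the skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.register_buffer("gate", torch.tensor(1.0))   # set by the schedule, not learned

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.gate * x)

def cool_gates(blocks, epoch, total_epochs):
    """Toy cooling schedule: earlier blocks close first, later blocks stay open longer.
    Block i starts closing once training passes the fraction (i + 1) / (n + 1), then
    ramps its gate from 1 to 0 over a quarter of the total epochs."""
    n = len(blocks)
    for i, block in enumerate(blocks):
        start = (i + 1) / (n + 1) * total_epochs
        progress = max(0.0, min(1.0, (epoch - start) / (0.25 * total_epochs)))
        block.gate.fill_(1.0 - progress)

# Usage inside a training loop (constants are arbitrary choices for illustration):
blocks = nn.ModuleList([GatedResidualBlock(64) for _ in range(4)])
for epoch in range(100):
    cool_gates(blocks, epoch, total_epochs=100)
    # ... forward/backward passes over the data for this epoch ...
```

Keeping the gates as non-learned buffers, rather than the learned gates of Highway Networks, is deliberate in this sketch: the schedule, not gradient descent, decides when each block "cools".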