Olivia Lee - Portfolio

Mental Health Importance Classification

Predictive Modelling & Data Mining

Compared five classification algorithms and five attribute selection methods to find the best model for a binary classification task: whether employees of tech companies prioritise mental health as much as physical health.

Classification algorithms: k-NN, Naive Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks

Attribute selection methods: Chi-square test, Lasso Regression, Decision Tree Induction, Forward Selection, Backward Selection

Coding language: Python (pandas, numpy, sklearn, tensorflow, keras)

Code Report
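
The sketch below is not the project's code. It is a minimal, standard-library-only illustration of how a chi-square test can rank categorical attributes against a binary label, which is one of the attribute selection methods listed above. The column names (`has_benefits`, `remote_work`) and all values are hypothetical and exist only for illustration.

```python
from collections import Counter


def chi_square_score(feature, label):
    """Chi-square statistic of independence between two categorical lists."""
    n = len(feature)
    observed = Counter(zip(feature, label))   # joint counts per (feature, label) pair
    feature_totals = Counter(feature)         # marginal counts of feature values
    label_totals = Counter(label)             # marginal counts of label values
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            score += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return score


def rank_attributes(columns, label):
    """Return attribute names sorted from most to least associated with the label."""
    scores = {name: chi_square_score(values, label) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 1 = yes, 0 = no.
label = [1, 1, 1, 0, 0, 0]
columns = {
    "remote_work": [1, 0, 1, 0, 1, 0],
    "has_benefits": [1, 1, 1, 0, 0, 0],
}
ranking = rank_attributes(columns, label)
print(ranking)
```

In practice a library routine such as sklearn's `SelectKBest` with `chi2` would replace the hand-written scoring, and the top-ranked attributes would then be passed to each classifier for comparison.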

Uber vs. Lyft Prices

Statistics & Hypothesis Testing

Performed data cleaning, data transformation and hypothesis testing to determine whether Uber and Lyft prices differ and, if so, by how much. Developed easy-to-digest visuals and interpreted the findings using statistical metrics in a way that suits both technical and non-technical audiences.

Hypothesis testing methods: Two-sample t-test, ANCOVA

Interpretation methods: Confidence Intervals, Beta Estimates, R-Squared, Visualisations (Residuals Graph, Histogram, Boxplot, Scatterplot, Scale-Location, Normal Q-Q)

Coding language: R (tidyverse)

Code Report
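
The analysis for this project was done in R; the Python sketch below is not that code. It illustrates the arithmetic behind a two-sample t-test, assuming the Welch (unequal variance) variant, which may or may not be the variant used in the project. The fare values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group contribution to the squared standard error of the mean difference.
    term_a = variance(sample_a) / n_a
    term_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(term_a + term_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (term_a + term_b) ** 2 / (term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical fares in dollars, for illustration only.
uber_fares = [12.5, 15.0, 9.8, 20.1, 14.3]
lyft_fares = [13.1, 16.4, 10.2, 21.7, 15.9]
t_stat, df = welch_t(uber_fares, lyft_fares)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The t statistic and degrees of freedom would then be compared against a t distribution to obtain a p-value and a confidence interval for the price difference. In R, `t.test` performs the Welch variant by default.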

Image Resize

Computer Vision

Wrote a computer vision algorithm in C++ that detects and removes unimportant pixels, resizing images without visibly distorting them.

Coding language: C++

*Code cannot be shared due to the university's honor code

Post Classification

Natural Language Processing (NLP)

Wrote a program that uses NLP and machine learning techniques to read blog posts and determine their subjects, allowing efficient classification of past and future posts.

Coding language: C++

*Code cannot be shared due to the university's honor code

Data Science Job Salary Prediction

Predictive Modelling

Applied several machine learning algorithms and dimensionality reduction to predict data science job salaries from job attributes. Cleaned and preprocessed the data to make prediction more efficient and more accurate.

Machine learning algorithms: Logistic Regression, Naive Bayes, Random Forest, K-Nearest Neighbors

Coding language: Python (pandas, numpy, sklearn, matplotlib)

Code Report

Chicago Traffic & Accidents

Exploratory Data Analysis & Visualization

Conducted exploratory data analysis, data cleaning, and data transformation, then used regression models to study how different variables correlate with traffic accidents and to identify the leading causes.

Techniques: Linear Regression, Hypothesis Testing, Time Series Analysis

Coding language: R (tidyverse, ggplot, lubridate, cowplot, janitor)

Code
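
The project's analysis was written in R; the Python sketch below is not that code. It shows, under stated assumptions, how a simple linear regression relates one explanatory variable to an accident count. The variable names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: daily traffic volume (thousands of vehicles) vs. accidents.
traffic_volume = [10, 20, 30, 40, 50]
accidents = [3, 5, 8, 9, 12]
slope, intercept = fit_line(traffic_volume, accidents)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"R^2 = {r_squared(traffic_volume, accidents, slope, intercept):.3f}")
```

A positive slope with a high R-squared indicates a strong linear association, not proof of cause, so hypothesis tests on the slope are a natural follow-up.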

Data Science Portfolio

Let's get in touch!

Phone

Email