Mental Health Importance Classification
Predictive Modelling & Data Mining
Compared five classification algorithms and five attribute selection methods to find the best model for binary classification of whether employees of tech companies prioritise mental health as much as physical health.
Classification algorithms: k-NN, Naive Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks
Attribute selection methods: Chi-square test, Lasso Regression, Decision Tree Induction, Forward Selection, Backward Selection
Coding language: Python (pandas, numpy, sklearn, tensorflow, keras)
Code
Report
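As a rough illustration of chi-square attribute selection, the sketch below scores each categorical attribute against a binary label and keeps the highest-scoring ones. It uses only the Python standard library rather than sklearn, and the column names (`has_benefits`, `remote_work`) and the toy values are hypothetical, not taken from the project's dataset.

```python
from collections import Counter

def chi_square_score(feature, label):
    """Chi-square statistic of independence between two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, label))
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            expected = f_count * c_count / n  # count expected under independence
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score

def top_k_features(columns, label, k):
    """Rank attribute names by chi-square score and keep the k highest."""
    ranked = sorted(columns,
                    key=lambda name: chi_square_score(columns[name], label),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy survey data (assumption, for illustration only).
columns = {
    "has_benefits": [1, 1, 1, 0, 0, 0, 1, 0],
    "remote_work":  [1, 0, 1, 0, 1, 0, 1, 0],
}
label = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = mental health prioritised as much as physical
selected = top_k_features(columns, label, k=1)
print(selected)
```

The selected attributes would then be passed to each classifier in turn, so that every combination of selection method and algorithm can be compared on the same validation split.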
Uber vs. Lyft Prices
Statistics & Hypothesis Testing
Performed data cleaning, data transformation, and hypothesis testing to determine whether Uber and Lyft prices differ and, if so, by how much. Developed easily digestible visuals and interpreted the findings with statistical metrics in a way suitable for both technical and non-technical audiences.
Hypothesis testing methods: Two-sample t-test, ANCOVA
Interpretation methods: Confidence Intervals, Beta Estimates, R-Squared, Visualisations (Residuals Graph, Histogram, Boxplot, Scatterplot, Scale-Location, Normal Q-Q)
Coding language: R (tidyverse)
Code
Report
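The two-sample t-test used here can be sketched as follows. The project itself was written in R; this version is in Python only to keep the examples in one language. It assumes the Welch (unequal variance) form of the test, which the project may or may not have used, and the fares are made-up numbers.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up fares in dollars (assumption, for illustration only).
uber = [12.0, 15.5, 9.0, 22.0, 13.5, 17.0]
lyft = [13.0, 16.5, 11.0, 24.0, 15.5, 18.0]
t_stat, dof = welch_t(uber, lyft)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A negative statistic here means the first sample has the lower mean; the p-value and confidence interval would come from the t distribution with `dof` degrees of freedom.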
Image Resize
Computer Vision
Authored a computer vision algorithm in C++ that detects and removes unimportant pixels in images and resizes them without noticeably distorting them.
Coding language: C++
*Code cannot be shared due to the university's honor code.
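The description resembles content-aware resizing by seam carving, although that is an assumption and the project's actual method is not shown here. The generic Python sketch below (not the project's C++ code) computes a simple gradient energy for a greyscale image stored as a list of rows, finds the lowest-energy vertical seam with dynamic programming, and removes it.

```python
def energy(img):
    """Forward-difference gradient magnitude; pixel lookups are clamped at the borders."""
    h, w = len(img), len(img[0])
    def px(r, c):
        return img[min(r, h - 1)][min(c, w - 1)]
    return [[abs(px(r, c + 1) - px(r, c)) + abs(px(r + 1, c) - px(r, c))
             for c in range(w)] for r in range(h)]

def find_vertical_seam(e):
    """Column index per row of the minimum-energy top-to-bottom seam."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]
    for r in range(1, h):
        for c in range(w):
            # Cheapest of the (up to three) neighbours in the row above.
            cost[r][c] += min(cost[r - 1][max(c - 1, 0):min(c + 2, w)])
    # Backtrack from the cheapest cell in the last row.
    c = min(range(w), key=lambda j: cost[h - 1][j])
    seam = [c]
    for r in range(h - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 2, w)
        c = min(range(lo, hi), key=lambda j: cost[r - 1][j])
        seam.append(c)
    return seam[::-1]

def remove_vertical_seam(img):
    """Return a copy of img that is one pixel narrower."""
    seam = find_vertical_seam(energy(img))
    return [row[:c] + row[c + 1:] for row, c in zip(img, seam)]

# Toy image: a flat background with one bright vertical stripe.
image = [
    [10, 10, 10, 200, 10],
    [10, 10, 10, 200, 10],
    [10, 10, 10, 200, 10],
]
narrower = remove_vertical_seam(image)
print(narrower)
```

Calling `remove_vertical_seam` repeatedly shrinks the width one pixel at a time while leaving high-contrast regions, such as the bright stripe, in place.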
Post Classification
Natural Language Processing (NLP)
Wrote a program using NLP and machine learning techniques that reads blog posts and determines their subjects, allowing efficient classification of past and future posts.
Coding language: C++
*Code cannot be shared due to the university's honor code.
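One common way to classify posts by subject is a bag-of-words Naive Bayes classifier. Whether the project used this technique is an assumption; the generic Python sketch below (not the project's C++ code) trains on a handful of hypothetical labelled posts and predicts the subject of a new one.

```python
import math
from collections import Counter, defaultdict

def train(posts):
    """posts: list of (text, label). Returns (word counts per label, post counts, vocabulary)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in posts:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(text, model):
    """Return the label with the highest log-posterior under Naive Bayes."""
    word_counts, label_counts, vocab = model
    total_posts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_posts)  # log prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training posts (assumption, for illustration only).
model = train([
    ("how do pointers work in c++", "programming"),
    ("segfault when using pointers and arrays", "programming"),
    ("eigenvalues of a symmetric matrix", "math"),
    ("proof that the matrix is invertible", "math"),
])
print(predict("invertible matrix proof", model))
```

Because the model is just a set of counts, new posts can be classified without retraining, and newly labelled posts can be folded in by updating the counters.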
Data Science Job Salary Prediction
Predictive Modelling
Applied several machine learning algorithms together with dimensionality reduction to predict data science job salaries from job attributes. Performed data cleaning and preprocessing to make prediction more efficient and more accurate.
Machine learning algorithms: Logistic Regression, Naive Bayes, Random Forest, K-Nearest Neighbors
Coding language: Python (pandas, numpy, sklearn, matplotlib)
Code
Report
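As an illustration of one of the listed algorithms, the sketch below implements K-Nearest Neighbors with feature standardisation in plain Python instead of sklearn. It assumes the target is a salary band ("low" or "high"), since the listed algorithms are classifiers; the features (`years of experience`, `remote ratio`) and all values are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def standardise(rows):
    """Scale each column to zero mean and unit variance; also return the scaler."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns

    def scale(row):
        return [(x - m) / s for x, m, s in zip(row, mus, sds)]

    return [scale(r) for r in rows], scale

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical rows: [years of experience, remote ratio in percent].
rows = [[1, 0], [2, 50], [3, 100], [8, 0], [10, 50], [12, 100]]
bands = ["low", "low", "low", "high", "high", "high"]

scaled_rows, scale = standardise(rows)
prediction = knn_predict(scaled_rows, bands, scale([9, 50]), k=3)
print(prediction)
```

Standardising first matters for distance-based models: without it, the remote ratio (0 to 100) would dominate years of experience simply because of its larger numeric range.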
Chicago Traffic & Accidents
Exploratory Data Analysis & Visualization
Conducted exploratory data analysis, data cleaning, and data transformation, and used regression models to study how different variables correlate with traffic accidents and to identify the leading contributing factors.
Techniques: Linear Regression, Hypothesis Testing, Time Series Analysis
Coding language: R (tidyverse, ggplot, lubridate, cowplot, janitor)
Code
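The regression step can be illustrated with a simple least-squares fit and a Pearson correlation. The project itself was written in R; this Python sketch only shows the arithmetic, and the variable names and numbers are made up rather than taken from the Chicago data.

```python
import math
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Made-up data (assumption): daily rainfall in mm and accident counts.
rainfall = [1, 2, 3, 4, 5]
accidents = [3, 5, 7, 9, 11]

intercept, slope = fit_line(rainfall, accidents)
r = pearson_r(rainfall, accidents)
print(intercept, slope, r)
```

A strong correlation on its own does not show that a variable causes accidents, which is why the slope and correlation are best read alongside hypothesis tests and the time series view.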