Data Science Portfolio

Hi there! Welcome to a showcase of my data science, machine learning, and statistics projects. I am a graduate student expecting to graduate in May 2023, and I am open to data science and machine learning roles.

Combining the mathematics from my undergraduate major with computer science and statistics has been an amazing turn in my career path. I am excited to work in a field where I can apply my technical knowledge to make a tangible difference in the world. Wherever I am, I strive to be at the forefront of innovation and to be part of a team that integrates the latest technologies and learns something new every day. Data science and machine learning have introduced me to my passion, and I can't wait to dive deeper!

Projects

Mental Health Importance Classification

Predictive Modelling & Data Mining

Utilized 5 classification algorithms and 5 attribute selection methods to determine the best model for binary classification of whether employees of tech companies prioritise mental health as much as physical health.

Classification algorithms: k-NN, Naive Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks

Attribute selection methods: Chi-square test, Lasso Regression, Decision Tree Induction, Forward Selection, Backward Selection

Coding language: Python (pandas, numpy, sklearn, tensorflow, keras)
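As a rough illustration of the chi-square attribute selection step, here is a minimal standard-library sketch. It is not the project's code (which relied on the libraries listed above); the function names, attribute names, and toy survey values are hypothetical and chosen only to show how categorical attributes can be ranked against a binary label.

```python
from collections import Counter


def chi_square_score(feature, label):
    """Pearson chi-square statistic between two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, label))  # counts per (feature, label) cell
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for y_value, y_count in label_totals.items():
            expected = f_count * y_count / n  # count expected under independence
            score += (observed[(f_value, y_value)] - expected) ** 2 / expected
    return score


def rank_attributes(columns, label):
    """Return attribute names ordered from most to least associated with the label."""
    scores = {name: chi_square_score(values, label) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = treats mental health like physical health.
    label = [1, 1, 0, 0]
    columns = {
        "remote_work": ["yes", "no", "yes", "no"],    # unrelated to the label
        "has_benefits": ["yes", "yes", "no", "no"],   # perfectly aligned with the label
    }
    print(rank_attributes(columns, label))  # ['has_benefits', 'remote_work']
```

A higher score means the attribute's categories are distributed less independently of the label, so top-ranked attributes are the first candidates to keep before training a classifier.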


Code | Report

Uber vs. Lyft Prices

Statistics & Hypothesis Testing

Performed data cleaning, data transformation, and hypothesis testing to determine whether Uber and Lyft prices differ and, if so, by how much. Developed easily digestible visuals and interpreted findings using statistical metrics in a manner suitable for both technical and non-technical audiences.

Hypothesis testing methods: Two-sample t-test, ANCOVA

Interpretation methods: Confidence Intervals, Beta Estimates, R-Squared, Visualisations (Residuals Graph, Histogram, Boxplot, Scatterplot, Scale-Location, Normal Q-Q)

Coding language: R (tidyverse)
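The two-sample comparison can be sketched as follows. The project itself was written in R; this Python version is only an illustration, and it assumes Welch's unequal-variance form of the t-test, which may differ from the variant the analysis actually used. The function name and the price values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


if __name__ == "__main__":
    # Hypothetical fares in dollars, not taken from the project's dataset.
    uber_prices = [1, 2, 3, 4, 5]
    lyft_prices = [2, 3, 4, 5, 6]
    t_stat, df = welch_t(uber_prices, lyft_prices)
    print(t_stat, df)  # -1.0 8.0
```

The t statistic and degrees of freedom would then be compared against a t distribution to obtain a p-value or a confidence interval for the price difference.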


Code | Report

Image Resize

Computer Vision

Authored a computer vision algorithm in C++ that detects and removes unimportant pixels in images, resizing them without visibly distorting their content.

Coding language: C++

*Code cannot be shared due to the university's honor code.

Post Classification

Natural Language Processing (NLP)

Wrote a program using NLP and machine learning techniques that reads blog posts and determines their subjects, allowing for efficient classification of past and future posts.

Coding language: C++

*Code cannot be shared due to the university's honor code.

Data Science Job Salary Prediction

Predictive Modelling

Incorporated various machine learning algorithms and dimensionality reduction to predict data science job salaries based on different attributes. Performed data cleaning and data preprocessing for efficient prediction and more accurate outcomes.

Machine learning algorithms: Logistic Regression, Naive Bayes, Random Forest, K-Nearest Neighbors

Coding language: Python (pandas, numpy, sklearn, matplotlib)
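To give a feel for one of the listed algorithms, here is a minimal K-Nearest Neighbors sketch using only the standard library. It assumes salaries are grouped into bands (which the choice of classifiers suggests but the description does not state); the function name, features, and data points are hypothetical and not drawn from the project.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    ranked = sorted(zip(train_points, train_labels), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical features: (years of experience, seniority level).
    points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    bands = ["low", "low", "low", "high", "high", "high"]
    print(knn_predict(points, bands, (2, 2)))  # low
    print(knn_predict(points, bands, (8, 8)))  # high
```

In practice the features would be scaled first, since Euclidean distance is dominated by whichever attribute has the largest numeric range.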


Code | Report

Chicago Traffic & Accidents

Exploratory Data Analysis & Visualization

Conducted exploratory data analysis, data cleaning, and data transformation, then used regression models to study how different variables correlate with traffic accidents and to identify their leading causes.

Techniques: Linear Regression, Hypothesis Testing, Time Series Analysis

Coding language: R (tidyverse, ggplot2, lubridate, cowplot, janitor)


Code

Let's get in touch!

As my job search continues, I invite anyone to contact or connect with me! Please feel free to reach me through any of the following.

  • LinkedIn
  • Phone

    734-355-9533
  • Email

    oliviajoyilee@gmail.com