Evaluating Sentiments in Text

Description

In this project, I tackled the challenge of classifying sentiment in an imbalanced dataset of Reddit comments. I used 3 categories of sentiment: positive, neutral, and negative. I evaluated each model using accuracy, F1-score, AUROC, precision, sensitivity, and specificity, and used these metrics to inform my hyperparameter selection for a linear-kernel support vector machine (SVM). I tried multiple combinations of hyperparameter values, including penalty terms with L1-loss and hinge loss. Additionally, I implemented a quadratic-kernel SVM, using grid search and random search to optimize its parameters.
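To make the metrics above concrete, here is a minimal sketch of how accuracy, precision, sensitivity, specificity, and F1-score can be computed one-vs-rest for three sentiment classes. This is not the project code: the label encoding (-1, 0, 1), the function names, and the toy predictions are all assumptions made for illustration. AUROC is left out because it needs the classifier's decision scores rather than hard labels.

```python
# Illustrative sketch only; the label encoding and all names are assumptions.
LABELS = (-1, 0, 1)  # assumed encoding: negative, neutral, positive


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def one_vs_rest_counts(y_true, y_pred, label):
    """Count TP, FP, FN, TN, treating `label` as positive and the rest as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def class_metrics(y_true, y_pred, label):
    """Per-class precision, sensitivity (recall), specificity, and F1-score."""
    tp, fp, fn, tn = one_vs_rest_counts(y_true, y_pred, label)
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return safe_div(sum(1 for t, p in zip(y_true, y_pred) if t == p), len(y_true))


if __name__ == "__main__":
    # Toy, made-up labels with more positive examples than negative ones.
    y_true = [1, 1, 1, 1, 0, 0, -1, -1]
    y_pred = [1, 1, 1, 0, 0, 1, -1, 0]
    print("accuracy:", accuracy(y_true, y_pred))
    for label in LABELS:
        print(label, class_metrics(y_true, y_pred, label))
```

On an imbalanced dataset, the per-class numbers are often more informative than accuracy alone, since a model can score well on accuracy while performing poorly on the rarer classes.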

My optimized machine learning model achieved an accuracy of 0.728 on a withheld dataset when categorizing each comment's sentiment into 1 of 3 possibilities!

GitHub Repository

If you’d like to see the code, please let me know! I’m unable to make the repository fully public because this project contains material from my coursework in my machine learning course, EECS 445, at the University of Michigan. The GitHub repository can be found here.