This competition was part of an academic project for the course Machine Learning Project (CS2008P) at the IITM BS in Data Science Program.
- A Kaggle Competition to build the most accurate models for predicting the total amount paid by travelers for taxi rides.
- Successfully predicted taxi fares with an R^2 value of 94.6%, ranking 63rd out of 714 participants.
- Performed exploratory data analysis and handled missing values through imputation.
- Developed a robust system to test various sklearn ML estimators in a convenient manner. Tested 8+ estimators, including Linear Regression, Decision Trees, Random Forests, Extra Trees and more. Key statistics were also visualized.
This competition was interesting to me because, with the rise of on-demand taxi applications like Ola and Uber, it’s not always obvious how a fare is calculated from a mix of qualitative and quantitative factors. This was a valuable learning experience for me.
Additionally, Scikit-Learn is an indispensable library for developers and analysts alike. However, while training a single model is convenient and straightforward, training a variety of models with different pipelines and hyperparameters is not. In this project, I developed custom functions to facilitate training various models, comparing them, and choosing the best among them.
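As a rough illustration of the idea, here is a minimal sketch of how such a comparison helper could look. It is not the project's actual code: the function name `compare_estimators`, the `candidates` dictionary, the median imputation strategy, the 5-fold cross-validation and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor


def compare_estimators(estimators, X, y, cv=5):
    """Return {name: mean cross-validated R^2}, ordered best first.

    Hypothetical helper written for this example only.
    """
    scores = {}
    for name, estimator in estimators.items():
        # Imputation sits inside the pipeline, so it is re-fitted on the
        # training part of every fold and never sees the validation part.
        pipeline = make_pipeline(SimpleImputer(strategy="median"), estimator)
        fold_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="r2")
        scores[name] = fold_scores.mean()
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Synthetic stand-in data (assumption: NOT the competition's taxi dataset).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # blank out roughly 5% of the values

# A few of the estimator families mentioned above, with illustrative settings.
candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

results = compare_estimators(candidates, X, y)
best_name = next(iter(results))  # first key is the highest-scoring estimator

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

Keeping every candidate in a dictionary means adding another estimator is a one-line change, and wrapping each one in the same pipeline keeps the comparison fair because all models see identically preprocessed folds.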