November 2023

Taxi Fare Prediction

An extensive end-to-end machine learning pipeline to predict taxi fares from various qualitative and quantitative factors.

This competition was part of an academic project for the course Machine Learning Project (CS2008P) at the IITM BS in Data Science Program.

  • A Kaggle Competition to build the most accurate models for predicting the total amount paid by travelers for taxi rides.
  • Predicted taxi fares with an R^2 score of 94.6%, ranking 63rd out of 714 participants.
  • Performed Exploratory Data Analysis and handled missing values through imputation.
  • Developed a robust system to test various sklearn ML estimators in a convenient manner. Tested 8+ estimators, including Linear Regression, Decision Trees, Random Forests, Extra Trees and more. Key statistics were also visualized.
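
To illustrate the imputation step and the R^2 metric mentioned above, here is a minimal sketch. The feature names (`trip_distance`, `passenger_count`), the toy values, and the choice of median imputation are all assumptions made for illustration; they are not taken from the competition data or from the project's actual code.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy features: [trip_distance, passenger_count].
# np.nan marks a missing value.
X = np.array([
    [1.2, 1.0],
    [3.4, np.nan],
    [np.nan, 2.0],
    [7.8, 1.0],
    [2.1, 3.0],
    [5.5, 1.0],
])
# Hypothetical target: total fare paid for each ride.
y = np.array([8.0, 15.5, 12.0, 30.2, 10.1, 22.4])

# Median imputation (an assumed strategy) followed by a simple regressor.
model = make_pipeline(SimpleImputer(strategy="median"), LinearRegression())
model.fit(X, y)

# Inspect the imputed matrix: each NaN is replaced by its column median.
X_imputed = model.named_steps["simpleimputer"].transform(X)

# R^2 on the training data, shown only to demonstrate the metric.
train_r2 = r2_score(y, model.predict(X))
print(f"Training R^2: {train_r2:.3f}")
```

Placing the imputer inside the pipeline means the medians are learned from the training data only, which avoids leaking information from validation folds when the pipeline is later cross-validated.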

This competition interested me because, with the rise of on-demand taxi applications such as Ola and Uber, it is not always obvious how a fare is calculated from a mix of qualitative and quantitative factors. It was a valuable learning experience for me.

Additionally, Scikit-Learn is an indispensable library for developers and analysts alike. However, while training a single model is convenient and straightforward, training a variety of models with different pipelines and hyperparameters is less so. In this project, I developed custom functions to facilitate training various models, comparing them, and choosing the best among them.
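
As a rough sketch of what such a comparison helper could look like, the snippet below scores several estimators with cross-validation and ranks them by mean R^2. The function name `compare_estimators`, the candidate dictionary, and the synthetic dataset are hypothetical and only stand in for the project's own functions and data, which are not shown here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor


def compare_estimators(estimators, X, y, cv=5):
    """Return (name, mean cross-validated R^2) pairs, best first.

    Hypothetical helper: wraps each estimator in the same preprocessing
    pipeline so that all candidates are evaluated on equal terms.
    """
    results = []
    for name, estimator in estimators.items():
        pipeline = make_pipeline(SimpleImputer(strategy="median"), estimator)
        scores = cross_val_score(pipeline, X, y, cv=cv, scoring="r2")
        results.append((name, scores.mean()))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Synthetic stand-in data; the real project used the Kaggle competition data.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

ranking = compare_estimators(candidates, X, y)
best_name, best_score = ranking[0]
for name, score in ranking:
    print(f"{name}: mean R^2 = {score:.3f}")
```

Keeping the candidates in a dictionary makes it easy to add or remove an estimator without touching the evaluation loop, and the same idea extends to hyperparameter search by passing a `GridSearchCV` object as one of the candidates.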