What is Kaggle?

Kaggle is a large data science community where machine learning practitioners from around the world compete to solve prediction problems. The data sets used in Kaggle competitions are uploaded by public and private companies (e.g., Google, Facebook) as well as by government agencies (e.g., the U.S. Department of Homeland Security).

A “Kaggler” wins a competition if her algorithm is the most accurate on a particular data set. Winners receive financial rewards, job offers, and recognition from the community.

Kaggle competitions are one of the best places to practice your ML skills and learn about state-of-the-art ML methods. For instance, you can learn a lot about the tricks of the trade by reading interviews with past winners.

ml4econ Kaggle Competition

One of your tasks in this course will be a Kaggle competition. In this competition, you will use the “Boston Housing Data” to train and test the machine learning models covered in the course. In particular, you will be required to apply the tools introduced in the course to predict the median house value based on a set of area-specific housing market features. For more details, please visit the competition’s website.

Getting Started

  1. Visit www.kaggle.com and open an account.
  2. Go to the ml4econ competition webpage.
  3. Review the competition details: objectives, deadline, data, evaluation, submission rules, etc.

Basic Kaggle Competition Workflow

  1. Acquire domain knowledge.
  2. Explore the data.
  3. Preprocess the data (standardization, dummies, interactions, etc.).
  4. Choose a model class (Lasso, trees, NN, ensembles, etc.).
  5. Tune complexity (cross-validation).
  6. Submit your prediction.
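
Steps 4 and 5 can be illustrated with a minimal sketch. It assumes a single-feature Lasso fitted in closed form on synthetic data (not the competition data), and the helper names `fit_lasso_1d` and `cv_mse`, the penalty grid, and the objective scaling are all illustrative choices rather than anything prescribed by the course. Libraries may scale the penalty differently.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_lasso_1d(xs, ys, lam):
    """Minimize 0.5 * sum((y - a - b*x)^2) + lam * |b| for one feature."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = soft_threshold(sxy, lam) / sxx
    intercept = y_bar - slope * x_bar  # the intercept is not penalized
    return intercept, slope


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def cv_mse(xs, ys, lam, k=5, seed=0):
    """Average held-out MSE over k folds for a given penalty lam."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # same folds for every lam
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_lasso_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        preds = [a + b * xs[i] for i in fold]
        scores.append(mse([ys[i] for i in fold], preds))
    return sum(scores) / k


# Synthetic data (an assumption): y = 2 + 3x + noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# Evaluate each candidate penalty and keep the one with the lowest CV error.
grid = [0.0, 1.0, 10.0, 100.0, 1000.0]
cv_scores = {lam: cv_mse(xs, ys, lam) for lam in grid}
best_lam = min(cv_scores, key=cv_scores.get)
print(best_lam, cv_scores[best_lam])
```

The same loop applies to any model class: replace the fitting function and the grid of complexity parameters, and keep the folds fixed across candidates so the comparison is fair.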

Important note: Examining and preprocessing the data with the help of domain knowledge (a.k.a. “feature engineering”) is probably one of the most important steps in applied ML.
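
As a rough illustration of the preprocessing step, the sketch below standardizes a numeric feature, turns a categorical one into a dummy, and builds their interaction. The records and the column names `rooms` and `near_river` are hypothetical and do not come from the competition files.

```python
from statistics import mean, pstdev

# Hypothetical toy records; names and values are for illustration only.
rows = [
    {"rooms": 6.5, "near_river": "yes"},
    {"rooms": 5.0, "near_river": "no"},
    {"rooms": 7.5, "near_river": "no"},
    {"rooms": 5.0, "near_river": "yes"},
]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


# Standardization: put the numeric feature on a common scale.
rooms_std = standardize([r["rooms"] for r in rows])

# Dummy variable: encode the categorical feature as 0/1.
river_dummy = [1 if r["near_river"] == "yes" else 0 for r in rows]

# Interaction: lets the effect of rooms differ for areas near the river.
interaction = [z * d for z, d in zip(rooms_std, river_dummy)]

features = list(zip(rooms_std, river_dummy, interaction))
print(features)
```

When a separate test set is involved, the mean and standard deviation should be computed on the training data only and then reused to transform the test data, so that no information from the test set leaks into the model.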

How Does Kaggle Ranking Work?

  • The MSE on the public test set is available immediately upon submission.
  • The MSE on the private test set becomes available only once the competition closes.
  • The split between public and private test sets is arbitrary and unknown in advance.

The competition’s final ranking is based on how well individuals perform on the PRIVATE test set.
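
The scoring logic can be sketched as follows. The example uses synthetic values, and the 50/50 split, the seed, and the variable names are assumptions made for illustration; the actual split used by the competition is not known.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)

# Synthetic test-set outcomes and one participant's predictions.
y_true = [rng.uniform(10, 50) for _ in range(20)]
y_pred = [y + rng.gauss(0, 3) for y in y_true]

# The organizer splits the test observations once; participants never
# learn which rows fall on which side. The 50/50 ratio is an assumption.
idx = list(range(len(y_true)))
rng.shuffle(idx)
public_idx, private_idx = idx[:10], idx[10:]

# Public score: visible right after each submission.
public_mse = mse([y_true[i] for i in public_idx], [y_pred[i] for i in public_idx])

# Private score: revealed at the end and used for the final ranking.
private_mse = mse([y_true[i] for i in private_idx], [y_pred[i] for i in private_idx])

print(public_mse, private_mse)
```

Because the two scores are computed on different observations, they generally differ, which is why a model tuned to climb the public leaderboard can still drop in the final ranking.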

Resources


A website created by Itamar Caspi using RMarkdown.

Disclaimers: (1) The official syllabus and the content on the official Moodle website shall always prevail in case of any discrepancy or inconsistency between this website and its official HUJI versions; (2) This website and its content do not necessarily reflect the views of the Bank of Israel or any of its staff.