Kaggle is a data science community that hosts competitions and offers tools and training. It is a great way to get more experience with machine learning. A lot of employers ask applicants to do competitions on Kaggle.
I have created several Kaggle competitions for our class. You can use them to learn machine learning and also have a friendly competition with other students in our class. Don't worry if your solution doesn't get a high ranking. The goal is to learn about machine learning. Of course, if you thrive on competition, feel free to try to win every competition.
Overview
- Some vocabulary: the training data set has a response and a set of predictor variables for each observation. The test data set only has the predictor variables.
- I have posted several problems and data for you to model on Kaggle. You will develop a model on the training data to predict the responses. When you think you have a good fit, use the model to produce predicted responses for the test data set, then upload those predictions. Kaggle has the true response values for the test data. It will compute the cost (loss) function and list your ranking on the leaderboard.
- There will be two rankings: one on the public data set and one on the private data set.
The test data have two parts: a public subset and a private subset. You won't know which is which. The public subset produces the public ranking, and the private subset produces the private ranking.
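To make the two rankings concrete, here is a minimal sketch of how one submission could be scored on a public and a private subset of the test data. It is an illustration only, not Kaggle's actual procedure: the RMSE loss, the 50/50 split, the fixed seed, the function names `rmse` and `leaderboard_scores`, and the numbers are all assumptions made for this example.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def leaderboard_scores(y_true, y_pred, public_fraction=0.5, seed=0):
    """Score one submission on a hidden public/private split of the test set.

    The split is an assumption for illustration: rows are shuffled with a
    fixed seed, the first part is "public" and the rest is "private".
    """
    indices = list(range(len(y_true)))
    random.Random(seed).shuffle(indices)
    n_public = int(round(len(indices) * public_fraction))
    public_rows, private_rows = indices[:n_public], indices[n_public:]

    def score(rows):
        return rmse([y_true[i] for i in rows], [y_pred[i] for i in rows])

    return score(public_rows), score(private_rows)


# Made-up true responses (known only to the host) and one set of predictions.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
y_pred = [2.5, 5.5, 7.0, 8.0, 11.5, 14.0]

public_score, private_score = leaderboard_scores(y_true, y_pred)
print(f"public RMSE:  {public_score:.3f}")
print(f"private RMSE: {private_score:.3f}")
```

Because the two scores come from different rows, the same submission can rank differently on the two leaderboards.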
Your next steps
- Log in to Kaggle. The first time, you will have to create a Kaggle account.
- Go to one of our Kaggle InClass competitions (see below).
- Download the data. Alternatively, you can work in Kaggle with their notebooks, but it is probably easier to start with an analysis on your own computer.
- Develop a model to predict the responses.
- Click the "Submit Predictions" button to upload your answer to Kaggle, then see your standing on the leaderboard.
- Improve your model to improve your standing on the leaderboard. You can upload multiple submissions. The leaderboard will show your best submission.
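The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the required method: the data are made up, the model is a one-predictor least-squares line, and the submission column names `Id` and `Predicted` are hypothetical. Check each competition's data page for the file format it actually expects.

```python
import csv


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up training data: one predictor and a response for each observation.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Made-up test data: predictor values only, plus a row identifier.
test_ids = [101, 102, 103]
test_x = [6.0, 7.0, 8.0]

# Fit on the training data, then predict the test responses.
intercept, slope = fit_line(train_x, train_y)
predictions = [intercept + slope * x for x in test_x]

# Write the predictions to a CSV file to upload (column names are assumed).
with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Id", "Predicted"])
    for row_id, pred in zip(test_ids, predictions):
        writer.writerow([row_id, f"{pred:.4f}"])
```

To improve your standing, change the model, regenerate the file, and submit again.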
Kaggle data sets and competitions
Introduction to kaggle: Use this simple competition to learn about Kaggle. You will fit a model to a small data set. This competition was created just for our ISEC2020 short course.
If you are already familiar with Kaggle, then you may want to investigate one of the other competitions. More ISEC2020 competitions are coming soon!