Bike Sharing Demand
www.dataiku.com
Goal: forecast the use of a city bike-share system.
Get the Data!
http://www.kaggle.com/c/bike-sharing-demand/data
Load it in DSS
TRAIN - 10,886 rows
INPUT COLUMNS:
- Datetime
- Season
- Holiday
- Workingday
- Weather
- Temperature
- Atemp ("feels like" temperature)
- Humidity
- Windspeed
ADDITIONAL COLUMNS (not in test):
- Casual - number of non-registered user rentals initiated
- Registered - number of registered user rentals initiated
TARGET:
- Count - number of total rentals
Make your first preparation script.
Parse the date to extract date components (year, month, day, ...).
Remove the columns registered and casual.
Create new variables…
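Outside DSS, the same preparation can be sketched in pandas. This is a minimal sketch, not the DSS script itself: the function name `prepare` and the two sample rows are made up for illustration, and the lowercase column names assume the Kaggle CSV headers.

```python
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Parse `datetime`, add date components, drop columns missing from test."""
    out = df.copy()
    out["datetime"] = pd.to_datetime(out["datetime"])
    out["year"] = out["datetime"].dt.year
    out["month"] = out["datetime"].dt.month
    out["day"] = out["datetime"].dt.day
    out["hour"] = out["datetime"].dt.hour
    out["dayofweek"] = out["datetime"].dt.dayofweek  # Monday is 0
    # casual and registered do not exist in the test set, so a model cannot use them
    return out.drop(columns=["casual", "registered"], errors="ignore")

# Two made-up rows, only to show the shape of the result
sample = pd.DataFrame({
    "datetime": ["2011-01-01 05:00:00", "2012-07-16 17:00:00"],
    "casual": [0, 60],
    "registered": [1, 500],
    "count": [1, 560],
})
prepared = prepare(sample)
```

The same function can be applied to the test file, where `errors="ignore"` lets the drop pass even though the two columns are absent.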
Create a recipe
Use this to « industrialize » the rebuild of this output.
So now, you are ready to make your first model.
Our philosophy at Dataiku:
Go fast on data cleaning and boring tasks to have a lot of time for the modeling part! :)
Wait… let’s analyse our data a little first!
Let’s plot the average count of bikes taken per hour on working days and weekends.
Add sliders on weather conditions.
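The numbers behind such a chart can be sketched in pandas. This is a rough equivalent, not the DSS chart: `hourly_profile` and the sample rows are made up, and the optional `weather` argument only stands in for the slider. In the Kaggle data, `workingday` = 0 covers weekends and holidays together.

```python
import pandas as pd

def hourly_profile(df: pd.DataFrame, weather=None) -> pd.DataFrame:
    """Average `count` per hour, one column per `workingday` value."""
    # Optional filter, a crude stand-in for a slider on weather conditions
    data = df if weather is None else df[df["weather"] == weather]
    hours = pd.to_datetime(data["datetime"]).dt.hour
    return data.assign(hour=hours).pivot_table(
        index="hour", columns="workingday", values="count", aggfunc="mean"
    )

# Made-up rows: two working-day mornings, one weekend morning and afternoon
sample = pd.DataFrame({
    "datetime": ["2011-01-03 08:00:00", "2011-01-04 08:00:00",
                 "2011-01-08 08:00:00", "2011-01-08 14:00:00"],
    "workingday": [1, 1, 0, 0],
    "weather": [1, 2, 1, 1],
    "count": [100, 200, 20, 150],
})
profile = hourly_profile(sample)
# profile.plot() would draw one line per workingday value (requires matplotlib)
```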
Go create a new model and score the test dataset.
- Test some algorithms.
- Understand the different evaluation metrics. (The leaderboard is scored with RMSLE.)
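RMSLE is the root mean squared logarithmic error: the square root of the mean of (log(p + 1) - log(a + 1))², where p is the predicted count and a the actual count. A minimal sketch in plain Python, with a made-up function name `rmsle`:

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error between actual and predicted counts."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) computes log(1 + x), so a count of 0 is handled without error
    squared = [(math.log1p(p) - math.log1p(a)) ** 2
               for a, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

perfect = rmsle([1, 2, 3], [1, 2, 3])
off_by_factor = rmsle([0], [math.e - 1])
```

Because the error is measured on logarithms, predicting 20 when the truth is 10 costs about as much as predicting 200 when the truth is 100: the metric penalizes relative error rather than absolute error.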
Run the model on the test dataset
Make your first submission.
- Reshape it as required by Kaggle and download it.
See your performance on the leaderboard!
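For this competition the submission is a CSV with two columns, `datetime` and `count`. A minimal pandas sketch of the reshape: `make_submission`, the sample rows and the predictions are made up, and clipping negative predictions to 0 is an assumption of this sketch, not a step from the slides.

```python
import pandas as pd

def make_submission(test_df: pd.DataFrame, predictions) -> pd.DataFrame:
    """Build the two-column frame expected by the leaderboard."""
    sub = pd.DataFrame({
        "datetime": test_df["datetime"].values,
        "count": predictions,
    })
    # Assumption: a rental count cannot be negative, so clip at 0
    sub["count"] = sub["count"].clip(lower=0)
    return sub

# Made-up test rows and predictions
test_sample = pd.DataFrame({
    "datetime": ["2011-01-20 00:00:00", "2011-01-20 01:00:00"],
})
submission = make_submission(test_sample, [12.4, -3.0])
csv_text = submission.to_csv(index=False)
```

Calling `submission.to_csv("submission.csv", index=False)` would write the file to upload; the file name is arbitrary.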
Next: improve your models
RE-BUILD RE-RE-BUILD
…
- Run more models.
- Modify the model code in the iPython notebook
- Explore the scikit-learn documentation…
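One possible change to try in the notebook, sketched with scikit-learn: train on log(count + 1) so the model optimizes something close to RMSLE, then transform predictions back. This is an illustration, not the code DSS generates: the helper names, the choice of `RandomForestRegressor` and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_log_target(X, y, **params):
    """Fit a random forest on log1p(y) instead of y."""
    params.setdefault("random_state", 0)  # fixed seed for repeatable runs
    model = RandomForestRegressor(**params)
    model.fit(X, np.log1p(y))
    return model

def predict_counts(model, X):
    """Undo the log1p transform to get predictions on the count scale."""
    return np.expm1(model.predict(X))

# Synthetic data, made up for the sketch: columns are hour and workingday
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 24, 200), rng.integers(0, 2, 200)])
y = 10 + 5 * X[:, 0] + 20 * X[:, 1]

model = fit_log_target(X, y, n_estimators=20)
preds = predict_counts(model, X[:5])
```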
- The Kaggle forum: lots of shared code and solutions.
- Datascience.net, the « French Kaggle ».
- Paris Machine Learning Meetup (one every month)
- Pandas (Python for data analysis)
- Search for nbviewer on Google or Twitter.
- Check the news on DataTau.
- Check the awesome-machine-learning list.
- Some cool blogs: dataiku, yhat, datarobot, fastML, fulmicoton.com
- About dev & viz: D3js, AngularJS, Flask