Bike Sharing Demand

Academic year: 2022

(1)

Bike Sharing Demand

www.dataiku.com

Goal: forecast use of a city bikeshare system.

(2)

Get the Data!

http://www.kaggle.com/c/bike-sharing-demand/data

(3)

Load it in DSS

(4)

TRAIN - 10,886 rows

INPUT COLUMNS:

- Datetime
- Season
- Holiday
- Workingday
- Weather
- Temperature
- Atemp ("feels like" temperature)
- Humidity
- Windspeed

ADDITIONAL COLUMNS (not in test):

- Casual - number of non-registered user rentals initiated
- Registered - number of registered user rentals initiated

TARGET:

- Count - number of total rentals

(5)

Make your first preparation script.

Parse the date to extract date components (year, month, day, ...).

Remove the columns registered and casual.

Create new variables…
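Outside DSS, the same preparation steps could be sketched in pandas. This is only an illustration: the column names follow the Kaggle CSV header as we understand it, the sample rows are made up, and `dayofweek` is a hypothetical example of a "new variable", not one prescribed by the slides.

```python
import io

import pandas as pd

# Illustrative sample rows laid out like the Kaggle train.csv (values are made up).
SAMPLE_CSV = """datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 00:00:00,1,0,0,1,9.0,14.0,80,0.0,2,10,12
2011-01-03 08:00:00,1,0,1,1,8.0,10.0,47,15.0,5,83,88
"""


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Parse the date, extract its components, and drop the leaky columns."""
    out = df.copy()
    out["datetime"] = pd.to_datetime(out["datetime"])
    out["year"] = out["datetime"].dt.year
    out["month"] = out["datetime"].dt.month
    out["day"] = out["datetime"].dt.day
    out["hour"] = out["datetime"].dt.hour
    # Hypothetical extra variable: day of week, Monday = 0.
    out["dayofweek"] = out["datetime"].dt.dayofweek
    # casual and registered are not in the test set, so drop them if present.
    return out.drop(columns=["casual", "registered"], errors="ignore")


train = prepare(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(train[["datetime", "year", "month", "day", "hour", "dayofweek", "count"]])
```

Because of `errors="ignore"`, the same `prepare` function can be applied to the test file, which has no casual or registered columns.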

(6)

Create a recipe

Use this to "industrialize" the rebuild of this output.

So now you are ready to make your first model.

Our philosophy at Dataiku:

Go fast on data cleaning and boring tasks to leave plenty of time for the modeling part! :)

(7)

Wait… let's analyze our data a little first!

Let's plot the average count of bikes taken per hour on working days and weekends.

Add sliders on weather conditions.
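The table behind such a chart could be computed as below. This is a minimal sketch on made-up toy data, assuming the columns are named `hour`, `workingday` and `count`; note that in the Kaggle data a `workingday` of 0 covers holidays as well as weekends.

```python
import pandas as pd

# Illustrative toy data (not real rentals): hour, working-day flag, rental count.
rides = pd.DataFrame({
    "hour":       [8, 8, 8, 8, 17, 17],
    "workingday": [1, 1, 0, 0, 1, 0],
    "count":      [300, 340, 60, 80, 400, 200],
})

# Average count per hour, with one column per day type.
hourly = rides.pivot_table(
    index="hour", columns="workingday", values="count", aggfunc="mean"
)
# pivot_table sorts the column keys, so 0 comes first and 1 second.
hourly.columns = ["non_working_day", "working_day"]
print(hourly)
```

Calling `hourly.plot()` would draw the two curves if matplotlib is installed. Filtering `rides` on a weather column before the pivot would play the role of the sliders.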

(8)

Go create a new model and score the test dataset

- Test some algorithms.

- Understand the different evaluation metrics. (The leaderboard is scored with RMSLE.)
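RMSLE (root mean squared logarithmic error) is the square root of the mean of (log(p + 1) - log(a + 1))² over all rows, where p is the predicted count and a the actual count. A minimal pure-Python sketch, with a function name chosen here for illustration:

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        # log1p(x) computes log(1 + x), so counts of zero are handled safely.
        total += (math.log1p(p) - math.log1p(a)) ** 2
    return math.sqrt(total / len(actual))


# Under-predicting by 50 costs more than over-predicting by 50.
print(rmsle([100], [50]), rmsle([100], [150]))
```

Because the error is measured on a log scale, it penalizes relative rather than absolute differences, and an under-prediction is penalized more than an over-prediction of the same size.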

(9)

Run the model on the test dataset

(10)

Make your first submission.

- Reshape it as required by Kaggle and download it.
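The reshaping could look like the sketch below. It assumes the submission file has two columns named `datetime` and `count`, which is our reading of the competition's sample submission; check the Kaggle data page before relying on it. The helper name and the sample predictions are made up.

```python
import io

import pandas as pd


def make_submission(datetimes, predictions) -> pd.DataFrame:
    """Build a two-column frame in the assumed Kaggle layout: datetime, count."""
    sub = pd.DataFrame({"datetime": list(datetimes), "count": list(predictions)})
    # A rental count cannot be negative, so clip stray negative predictions to 0.
    sub["count"] = sub["count"].clip(lower=0)
    return sub


# Made-up predictions for two test rows.
sub = make_submission(["2011-01-20 00:00:00", "2011-01-20 01:00:00"], [12.4, -3.0])

# Write to an in-memory buffer here; pass a file path to to_csv to save it instead.
buffer = io.StringIO()
sub.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `index=False` keeps the pandas row index out of the file, so the header line contains only the two expected column names.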

(11)

See your performance on the leaderboard!

(12)

Next: improve your models

RE-BUILD RE-RE-BUILD

- Run more models.

- Modify the model code in the IPython notebook.

- Explore the scikit-learn documentation…

(13)

Some resources you should check:

- The Kaggle forum: lots of code sharing about solutions.

- Datascience.net, the "French Kaggle".

- Paris Machine Learning Meetup (one every month)

- Pandas (Python for data analysis)

- Look for nbviewer on Google or Twitter.

- Check the news on DataTau.

- Check the awesome-machine-learning list.

- Some cool blogs: dataiku, yhat, datarobot, fastML, fulmicoton.com

- About dev & viz: D3js, AngularJS, Flask
