Taxi Trip Duration
Install packages
Append notebooks directory to sys.path
Import packages
Create data directory
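
A minimal sketch of this step, assuming the directory is called `data` and sits next to the notebook (the original cell is not shown, so the name is a placeholder):

```python
from pathlib import Path

# "data" is a placeholder name; the notebook's actual directory is not shown.
data_dir = Path("data")

# parents=True creates missing parent folders; exist_ok=True makes re-runs safe.
data_dir.mkdir(parents=True, exist_ok=True)
```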
Download dataset
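
The download cell is not shown, so the following is only a sketch of one way to fetch a file once and reuse it on later runs. The helper name `download` is hypothetical, and the dataset URL is left to the caller because the notebook does not state it.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download(url: str, dest: Path) -> Path:
    """Fetch `url` into `dest` unless the file is already present."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        # urlretrieve streams the response body straight to disk.
        urlretrieve(url, str(dest))
    return dest
```

Calling `download(url, Path("data") / "trips.parquet")` with the dataset's URL would then place the file in the data directory; both the directory and the file name here are placeholders.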
| | VendorID | lpep_pickup_datetime | lpep_dropoff_datetime | store_and_fwd_flag | RatecodeID | PULocationID | DOLocationID | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | congestion_surcharge | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2021-01-01 00:15:56 | 2021-01-01 00:19:52 | N | 1.0 | 43 | 151 | 1.0 | 1.01 | 5.5 | 0.5 | 0.5 | 0.00 | 0.0 | None | 0.3 | 6.80 | 2.0 | 1.0 | 0.00 | 3.933333 |
| 1 | 2 | 2021-01-01 00:25:59 | 2021-01-01 00:34:44 | N | 1.0 | 166 | 239 | 1.0 | 2.53 | 10.0 | 0.5 | 0.5 | 2.81 | 0.0 | None | 0.3 | 16.86 | 1.0 | 1.0 | 2.75 | 8.750000 |
| 2 | 2 | 2021-01-01 00:45:57 | 2021-01-01 00:51:55 | N | 1.0 | 41 | 42 | 1.0 | 1.12 | 6.0 | 0.5 | 0.5 | 1.00 | 0.0 | None | 0.3 | 8.30 | 1.0 | 1.0 | 0.00 | 5.966667 |
| 3 | 2 | 2020-12-31 23:57:51 | 2021-01-01 00:04:56 | N | 1.0 | 168 | 75 | 1.0 | 1.99 | 8.0 | 0.5 | 0.5 | 0.00 | 0.0 | None | 0.3 | 9.30 | 2.0 | 1.0 | 0.00 | 7.083333 |
| 7 | 2 | 2021-01-01 00:26:31 | 2021-01-01 00:28:50 | N | 1.0 | 75 | 75 | 6.0 | 0.45 | 3.5 | 0.5 | 0.5 | 0.96 | 0.0 | None | 0.3 | 5.76 | 1.0 | 1.0 | 0.00 | 2.316667 |
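
The `duration` column in the table is consistent with the dropoff time minus the pickup time, expressed in minutes. The sketch below reproduces it for three of the rows shown. The index jumps from 3 to 7, which suggests some rows were filtered out; the 1 to 60 minute bounds used here are an assumption, not something the notebook states.

```python
import pandas as pd

# Pickup and dropoff times copied from three rows of the table above.
df = pd.DataFrame(
    {
        "lpep_pickup_datetime": [
            "2021-01-01 00:15:56",
            "2021-01-01 00:25:59",
            "2020-12-31 23:57:51",
        ],
        "lpep_dropoff_datetime": [
            "2021-01-01 00:19:52",
            "2021-01-01 00:34:44",
            "2021-01-01 00:04:56",
        ],
    }
)
for col in ["lpep_pickup_datetime", "lpep_dropoff_datetime"]:
    df[col] = pd.to_datetime(df[col])

# Trip duration in minutes.
df["duration"] = (
    df["lpep_dropoff_datetime"] - df["lpep_pickup_datetime"]
).dt.total_seconds() / 60

# Assumed bounds: keep trips lasting between 1 and 60 minutes.
df = df[(df["duration"] >= 1) & (df["duration"] <= 60)]
print(df["duration"].round(6).tolist())
```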
Duration distribution
Check summary statistics: the long tail makes the distribution chart hard to read
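
As a sketch of what such a check might look like, the snippet below summarizes a synthetic long-tailed series with `describe` and a few upper percentiles. The data is generated only to make the example runnable, and the 1 to 60 minute window is an assumed cut-off, not one taken from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic long-tailed durations, only to make the sketch runnable.
rng = np.random.default_rng(0)
duration = pd.Series(
    rng.lognormal(mean=2.5, sigma=0.8, size=10_000), name="duration"
)

# Upper percentiles show where the tail starts more clearly than a histogram.
summary = duration.describe(percentiles=[0.95, 0.98, 0.99])
print(summary)

# Share of trips inside an assumed 1 to 60 minute window.
share = ((duration >= 1) & (duration <= 60)).mean()
print(f"share within 1-60 min: {share:.3f}")
```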
One hot encoding
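
One common way to one-hot encode location IDs is scikit-learn's `DictVectorizer`, sketched below on values taken from the table above. Whether the notebook uses this class, and which columns it treats as categorical or numerical, is an assumption.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Values copied from the five rows of the table above.
df = pd.DataFrame(
    {
        "PULocationID": [43, 166, 41, 168, 75],
        "DOLocationID": [151, 239, 42, 75, 75],
        "trip_distance": [1.01, 2.53, 1.12, 1.99, 0.45],
    }
)
categorical = ["PULocationID", "DOLocationID"]  # assumed feature choice
numerical = ["trip_distance"]  # assumed feature choice

# DictVectorizer one-hot encodes string values and passes numbers through,
# so the IDs are cast to strings first.
df[categorical] = df[categorical].astype(str)

train_dicts = df[categorical + numerical].to_dict(orient="records")
dv = DictVectorizer()
X_train = dv.fit_transform(train_dicts)

print(X_train.shape)
print(dv.feature_names_)
```

Each distinct ID becomes its own column (for example `PULocationID=43`), while `trip_distance` stays a single numeric column.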
Target
Train Model
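
A minimal sketch of fitting a baseline, assuming plain linear regression from scikit-learn. The data is synthetic (duration growing roughly linearly with distance) so that the block runs on its own; in the notebook the inputs would be the encoded feature matrix and the `duration` target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the real features and target.
rng = np.random.default_rng(42)
X_train = rng.uniform(0.5, 10.0, size=(200, 1))
y_train = 3.0 * X_train[:, 0] + 2.0 + rng.normal(0.0, 0.5, size=200)

lr = LinearRegression()
lr.fit(X_train, y_train)

# The fitted slope and intercept should land near the generating values.
print(lr.coef_, lr.intercept_)
```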
Make predictions
Compare predictions to actual values
Calculating RMSE
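
RMSE is the square root of the mean squared difference between actual and predicted values. The sketch below computes it directly with NumPy on a tiny made-up example; the numbers are illustrative and not results from the notebook.

```python
import numpy as np

# Illustrative values only, not taken from the notebook.
y_true = np.array([3.0, 5.0, 8.0])
y_pred = np.array([4.0, 5.0, 6.0])

# Errors are -1, 0 and 2, so the mean squared error is (1 + 0 + 4) / 3.
rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
print(rmse)
```

Because the result is in the same unit as the target, an RMSE on `duration` reads directly as minutes.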
Lasso model
Ridge model
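
The Lasso and Ridge models can be fitted and scored the same way as the baseline. The sketch below does so on synthetic data; the `alpha` values are assumptions, since the notebook's settings are not shown.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data with two irrelevant features (true coefficient 0).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
true_coef = np.array([4.0, 0.0, -2.0, 0.0, 1.0])
y = X @ true_coef + rng.normal(0.0, 0.5, size=300)

results = {}
# alpha=0.1 is an assumed regularization strength for both models.
for name, model in [("lasso", Lasso(alpha=0.1)), ("ridge", Ridge(alpha=0.1))]:
    model.fit(X, y)
    y_pred = model.predict(X)
    results[name] = float(np.sqrt(np.mean((y - y_pred) ** 2)))
    print(name, round(results[name], 3), np.round(model.coef_, 2))
```

Lasso's L1 penalty can push the coefficients of irrelevant features to exactly zero, while Ridge's L2 penalty only shrinks them toward zero.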
Define output path
Save the model
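
A sketch of the last two steps, assuming the fitted vectorizer and model are pickled together so that preprocessing and prediction stay in sync at load time. The path `models/lin_reg.bin` and the tiny training set are placeholders.

```python
import pickle
from pathlib import Path

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression

# Tiny fitted objects, only so there is something to save.
records = [
    {"PULocationID": "43", "trip_distance": 1.01},
    {"PULocationID": "166", "trip_distance": 2.53},
    {"PULocationID": "41", "trip_distance": 1.12},
]
durations = [3.93, 8.75, 5.97]

dv = DictVectorizer()
X = dv.fit_transform(records)
model = LinearRegression().fit(X, durations)

# Hypothetical output location; the notebook's actual path is not shown.
output_path = Path("models") / "lin_reg.bin"
output_path.parent.mkdir(parents=True, exist_ok=True)

with open(output_path, "wb") as f_out:
    pickle.dump((dv, model), f_out)
```

Loading the file with `pickle.load` returns the same `(dv, model)` pair. Pickle files should only be loaded from trusted sources.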