I have not been able to write for a while as the semester just started, but quite frankly that is not an issue since no one other than me reads these posts :). Anyway, I wanted to do this week’s Tidy Tuesday, as it was about French train delays, which I got accustomed to while living in France.

library(tidyverse)
library(dynlm)
library(gganimate)
library(maptools)
library(maps)
library(lettercase)
library(magrittr)
library(ggfortify)
library(pander)
library(patchwork)
library(kableExtra)
trains_raw <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-26/full_trains.csv")

# Normalise the two station-name columns (columns 4 and 5) to title case
trains_raw[4:5] %<>% mutate_all(tolower) %>% mutate_all(str_title_case)

# Drop the free-text comment columns and the delay-cause breakdown
drop_cols <- c(
  "comment_delays_on_arrival", "comment_cancellations", "comment_delays_at_departure",
  "delay_cause_external_cause", "delay_cause_rail_infrastructure",
  "delay_cause_traffic_management", "delay_cause_rolling_stock",
  "delay_cause_station_management", "delay_cause_travelers"
)
trains_raw <- trains_raw[, !(names(trains_raw) %in% drop_cols)]

trains <- trains_raw[complete.cases(trains_raw),]

small_trains <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-26/small_trains.csv") 
france <- map_data("france")

Data Exploration

Let’s start by exploring the data. I could not find a really good model to predict the delay of trains at a station at a particular time, so I opted to build three models that predict the number of trains late by at least 15, 30, and 60 minutes.

I used stepwise regression to select the variables, but here are a few pretty plots to justify the selection.
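For anyone curious what stepwise selection looks like in practice, here is a minimal sketch using base R's `step()` on the built-in `mtcars` data. It is only an illustration under assumptions of my own: the dataset, the starting formula and the `direction = "both"` setting are placeholders, not the exact call used for the train models.

```r
# Illustration only: mtcars stands in for the train data
full_model <- lm(mpg ~ ., data = mtcars)

# step() adds or drops one term at a time and keeps the change
# that lowers the AIC the most, stopping when no change helps
selected <- step(full_model, direction = "both", trace = 0)

formula(selected)            # predictors that survived the selection
AIC(full_model, selected)    # compare the full and the reduced model
```

The same idea carries over to the train data by swapping in the response and candidate predictors of interest.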

The plots of late trains against the total number of trips are very interesting, as they show a clear relationship where the variance increases as the x-values increase. Because a Poisson model’s variance grows with its mean, this makes Poisson regression a good choice for keeping a good fit as the x-values increase. I did, however, use a linear regression for trains that were late by at least 15 minutes, as its fit was better judging by the AIC and \(R^2\).
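To make the variance argument a bit more concrete, here is a small sketch on simulated counts. The data frame `sim`, the rate of 0.05 late trains per trip and the bin width of 50 are all made-up values for illustration, not estimates from the real dataset. It bins the x-variable and compares the mean and the variance of the counts within each bin, which should rise together if a Poisson-type model is reasonable.

```r
# Simulated stand-in for the train data (hypothetical values)
set.seed(42)
sim <- data.frame(total_num_trips = round(runif(2000, 50, 800)))
sim$num_late <- rpois(nrow(sim), lambda = 0.05 * sim$total_num_trips)

# Bin by trip volume, then compare mean and variance of the counts per bin
sim$bin <- cut(sim$total_num_trips, breaks = seq(50, 800, by = 50),
               include.lowest = TRUE)
mean_by_bin <- tapply(sim$num_late, sim$bin, mean)
var_by_bin  <- tapply(sim$num_late, sim$bin, var)

print(round(cbind(mean = mean_by_bin, variance = var_by_bin), 2))
```

Running the same binning on the real `trains` data would be a quick sanity check before committing to a Poisson family.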

# 15 min: quadratic linear fit
g1 <- ggplot(data = trains, aes(x = total_num_trips, y = num_greater_15_min_late)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
  xlab("Total number of trains in the time period") +
  ylab("Number of trains greater than 15 min late") +
  labs(title = "Linear Regression for Trains that are Later than 15 min") +
  theme_bw()

# 30 min: Poisson regression with a log link
g2 <- ggplot(data = trains, aes(x = total_num_trips, y = num_greater_30_min_late)) +
  geom_point() +
  geom_smooth(
    method = stats::glm,
    formula = y ~ x,
    method.args = list(family = poisson(link = log))
  ) +
  xlab("Total number of trains in the time period") +
  ylab("Number of trains greater than 30 min late") +
  labs(title = "Prediction Using Poisson Regression for Trains that are Later than 30 min") +
  theme_bw()

# 60 min: Poisson regression with a log link
g3 <- ggplot(data = trains, aes(x = total_num_trips, y = num_greater_60_min_late)) +
  geom_point() +
  geom_smooth(
    method = stats::glm,
    formula = y ~ x,
    method.args = list(family = poisson(link = log))
  ) +
  xlab("Total number of trains in the time period") +
  ylab("Number of trains greater than 60 min late") +
  labs(title = "Prediction Using Poisson Regression for Trains that are Later than 60 min") +
  theme_bw()

g1

g2

g3

For the plots against average journey time there does not seem to be the same variance relationship as in the previous plots. However, I still used the same models for the smoothers, as they could give us an idea of whether a Poisson regression was a good choice.

# 15 min: quadratic linear fit
g1 <- ggplot(data = trains, aes(x = journey_time_avg, y = num_greater_15_min_late)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
  xlab("Average journey time (min)") +
  ylab("Number of trains greater than 15 min late") +
  labs(title = "Linear Regression for Trains that are Later than 15 min") +
  theme_bw()

# 30 min: Poisson regression with a log link
g2 <- ggplot(data = trains, aes(x = journey_time_avg, y = num_greater_30_min_late)) +
  geom_point() +
  geom_smooth(
    method = stats::glm,
    formula = y ~ x,
    method.args = list(family = poisson(link = log))
  ) +
  xlab("Average journey time (min)") +
  ylab("Number of trains greater than 30 min late") +
  labs(title = "Prediction Using Poisson Regression for Trains that are Later than 30 min") +
  theme_bw()

# 60 min: Poisson regression with a log link
g3 <- ggplot(data = trains, aes(x = journey_time_avg, y = num_greater_60_min_late)) +
  geom_point() +
  geom_smooth(
    method = stats::glm,
    formula = y ~ x,
    method.args = list(family = poisson(link = log))
  ) +
  xlab("Average journey time (min)") +
  ylab("Number of trains greater than 60 min late") +
  labs(title = "Prediction Using Poisson Regression for Trains that are Later than 60 min") +
  theme_bw()

g1

g2

g3

ggplot(data = trains) + geom_col(aes(x = arrival_station, y = num_greater_15_min_late)) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("City") + ylab("Trains greater than 15 min late")

ggplot(data = trains) + geom_col(aes(x = arrival_station, y = num_greater_30_min_late)) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("City") + ylab("Trains greater than 30 min late")

ggplot(data = trains) + geom_col(aes(x = arrival_station, y = num_greater_60_min_late)) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("City") + ylab("Trains greater than 60 min late")

It seems that certain cities have significantly more late trains. However, this might also be due to the number of trains passing through the station in that period, so when doing statistical tests it will be important to control for variation in the total number of trains in order to show that some cities are statistically more likely to have late trains.
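One simple way to look past raw counts is to compare the share of late trains per station instead. The sketch below uses a tiny made-up data frame (`toy`, with hypothetical stations "A" and "B") that borrows the column names of the real data; it is meant to show the idea, not to reproduce any result from the dataset.

```r
# Toy data with the same column names as the train data (values are invented)
toy <- data.frame(
  arrival_station         = c("A", "A", "B", "B"),
  total_num_trips         = c(400, 600, 50, 50),
  num_greater_15_min_late = c(40, 60, 10, 15)
)

# Sum late trains and total trips per station, then take the ratio
totals <- aggregate(
  cbind(num_greater_15_min_late, total_num_trips) ~ arrival_station,
  data = toy, FUN = sum
)
totals$late_share <- totals$num_greater_15_min_late / totals$total_num_trips

# Station "A" has more late trains in absolute terms, but "B" has the higher share
totals[order(-totals$late_share), ]
```

In a regression setting, including `total_num_trips` as a predictor (as the models further down do) plays a similar role.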

ggplot(data = trains, aes(x = month, y = num_greater_60_min_late, fill = month)) +
  stat_summary(fun = sum, geom = "bar") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Month of the Year") +
  ylab("Trains greater than 60 min late")

Statistical Model and Analysis

Models

Below are the models that I built to predict the number of trains late by at least 15 mins, 30 mins and 60 mins.

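# Use the first 80% of the rows for training and the remaining 20% for testing
n_train <- floor(nrow(trains) * 0.8)
train <- trains[1:n_train, ]
test <- trains[(n_train + 1):nrow(trains), ]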
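A note on slicing rows by a fraction: in R the `:` operator binds more tightly than `*`, so `1:n * 0.8` means `(1:n) * 0.8` rather than `1:(n * 0.8)`. The toy example below (with a made-up `n` of 10) shows the difference and one way to get the intended index ranges.

```r
n <- 10

# Parsed as (1:n) * 0.8: ten fractional values, not the first eight indices
scaled_sequence <- 1:n * 0.8

# Computing the cutoff first gives the intended ranges
cutoff       <- floor(n * 0.8)
first_eighty <- 1:cutoff          # indices 1 to 8
remaining    <- (cutoff + 1):n    # indices 9 and 10

scaled_sequence
first_eighty
remaining
```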


#Predict number that is later than 15min
# Step 1: ordinary least squares fit
model <- lm(data = train, num_greater_15_min_late ~ factor(departure_station)+factor(arrival_station)+ month + year + journey_time_avg + total_num_trips)

# Step 2: regress the absolute residuals on the same predictors to estimate the error spread
e <- residuals(model)

sHat <- predict(lm(abs(e) ~ factor(departure_station)+factor(arrival_station)+ month + year + journey_time_avg + total_num_trips , data = train))

# Step 3: refit with weights inversely proportional to the estimated variance
WLS <-lm(num_greater_15_min_late ~ factor(departure_station)+factor(arrival_station)+ month + year + journey_time_avg + total_num_trips , data = train, weights = 1/(sHat^2))

pander(summary.lm(WLS)["call"])
  • call: lm(formula = num_greater_15_min_late ~ factor(departure_station) + factor(arrival_station) + month + year + journey_time_avg + total_num_trips, data = train, weights = 1/(sHat^2))
pander(summary(WLS))
Table continues below
  Estimate Std. Error t value
(Intercept) -6231 261.8 -23.8
factor(departure_station)Angers Saint Laud -11.98 4.625 -2.591
factor(departure_station)Angouleme -2.339 4.9 -0.4773
factor(departure_station)Annecy 7.337 1.83 4.01
factor(departure_station)Arras 5.878 3.75 1.568
factor(departure_station)Avignon Tgv 14.41 3.283 4.388
factor(departure_station)Bellegarde (Ain) 2.45 1.928 1.271
factor(departure_station)Besancon Franche Comte Tgv -5.333 1.693 -3.15
factor(departure_station)Bordeaux St Jean 20.26 7.392 2.741
factor(departure_station)Brest -9.017 4.938 -1.826
factor(departure_station)Chambery Challes Les Eaux 10.11 1.885 5.364
factor(departure_station)Dijon Ville -9.509 2.059 -4.617
factor(departure_station)Douai 8.909 3.863 2.306
factor(departure_station)Dunkerque 6.918 3.823 1.81
factor(departure_station)Francfort -2.856 4.845 -0.5895
factor(departure_station)Geneve 1.787 1.811 0.9865
factor(departure_station)Grenoble -2.505 1.682 -1.49
factor(departure_station)Italie 7.28 2.027 3.592
factor(departure_station)La Rochelle Ville -10.16 4.917 -2.067
factor(departure_station)Lausanne 1.806 1.822 0.9915
factor(departure_station)Laval -8.848 4.799 -1.844
factor(departure_station)Le Creusot Montceau Montchanin -2.612 1.746 -1.496
factor(departure_station)Le Mans -9.394 4.513 -2.082
factor(departure_station)Lille -9.876 2.781 -3.551
factor(departure_station)Lyon Part Dieu -5.702 2.496 -2.284
factor(departure_station)Macon Loche 6.558 1.814 3.615
factor(departure_station)Marseille St Charles 1.372 1.992 0.6888
factor(departure_station)Metz -13.66 4.8 -2.847
factor(departure_station)Montpellier 12.87 2.451 5.25
factor(departure_station)Mulhouse Ville -7.934 1.738 -4.564
factor(departure_station)Nancy -21.58 4.716 -4.575
factor(departure_station)Nantes -9.848 4.789 -2.056
factor(departure_station)Nice Ville 17.34 3.071 5.645
factor(departure_station)Nimes 21.14 3.145 6.722
factor(departure_station)Paris Est -23.68 4.737 -4.998
factor(departure_station)Paris Lyon -32.52 2.791 -11.65
factor(departure_station)Paris Montparnasse -20.3 4.5 -4.51
factor(departure_station)Paris Nord -58.06 3.725 -15.58
factor(departure_station)Perpignan 14.46 2.653 5.452
factor(departure_station)Poitiers -10.04 4.712 -2.131
factor(departure_station)Quimper -7.726 4.938 -1.565
factor(departure_station)Reims -19.85 4.732 -4.195
factor(departure_station)Rennes -7.542 2.954 -2.553
factor(departure_station)Saint Etienne Chateaucreux 1.088 1.855 0.5864
factor(departure_station)St Malo -7.014 5.036 -1.393
factor(departure_station)St Pierre Des Corps -12.91 4.668 -2.767
factor(departure_station)Strasbourg -16.43 4.339 -3.787
factor(departure_station)Stuttgart -5.193 4.85 -1.071
factor(departure_station)Toulon 18.49 2.614 7.074
factor(departure_station)Toulouse Matabiau 4.56 5.357 0.8511
factor(departure_station)Tours -12.44 4.795 -2.594
factor(departure_station)Valence Alixan Tgv 19.71 2.839 6.943
factor(departure_station)Vannes -7.001 4.873 -1.437
factor(departure_station)Zurich -2.609 1.824 -1.43
factor(arrival_station)Angers Saint Laud -15.34 5.071 -3.026
factor(arrival_station)Angouleme -13.29 5.198 -2.558
factor(arrival_station)Annecy 2.377 2.599 0.9143
factor(arrival_station)Arras 17.81 4.561 3.905
factor(arrival_station)Avignon Tgv -3.702 2.9 -1.276
factor(arrival_station)Bellegarde (Ain) 5.157 2.684 1.922
factor(arrival_station)Besancon Franche Comte Tgv -1.015 2.556 -0.3972
factor(arrival_station)Bordeaux St Jean 12.3 6.55 1.877
factor(arrival_station)Brest -10.63 5.432 -1.957
factor(arrival_station)Chambery Challes Les Eaux 5.247 2.589 2.026
factor(arrival_station)Dijon Ville -9.187 2.843 -3.232
factor(arrival_station)Douai 24.05 4.639 5.184
factor(arrival_station)Dunkerque 24.16 4.664 5.181
factor(arrival_station)Francfort -5.427 5.061 -1.072
factor(arrival_station)Geneve 2.614 2.681 0.975
factor(arrival_station)Grenoble 5.46 2.637 2.07
factor(arrival_station)Italie 6.145 2.819 2.18
factor(arrival_station)La Rochelle Ville -11.07 5.51 -2.01
factor(arrival_station)Lausanne 4.488 2.625 1.71
factor(arrival_station)Laval -15.26 5.225 -2.921
factor(arrival_station)Le Creusot Montceau Montchanin -4.053 2.528 -1.603
factor(arrival_station)Le Mans -21.94 4.986 -4.399
factor(arrival_station)Lille 5.51 3.957 1.392
factor(arrival_station)Lyon Part Dieu -15.82 2.875 -5.502
factor(arrival_station)Macon Loche -2.696 2.508 -1.075
factor(arrival_station)Marseille St Charles 4.864 2.964 1.641
factor(arrival_station)Metz -17.81 5.036 -3.536
factor(arrival_station)Montpellier 9.748 3.08 3.165
factor(arrival_station)Mulhouse Ville 0.3909 2.599 0.1504
factor(arrival_station)Nancy -17.33 5.099 -3.399
factor(arrival_station)Nantes -13.11 5.166 -2.537
factor(arrival_station)Nice Ville 20.09 3.438 5.842
factor(arrival_station)Nimes 9.983 2.847 3.507
factor(arrival_station)Paris Est -18.45 5.144 -3.587
factor(arrival_station)Paris Lyon -31.28 3.305 -9.463
factor(arrival_station)Paris Montparnasse -22.51 4.734 -4.754
factor(arrival_station)Paris Nord -43.22 3.802 -11.37
factor(arrival_station)Perpignan 12.63 3.19 3.959
factor(arrival_station)Poitiers -24.1 5.08 -4.745
factor(arrival_station)Quimper -6.837 5.452 -1.254
factor(arrival_station)Reims -14.48 5.047 -2.868
factor(arrival_station)Rennes -14.74 3.855 -3.824
factor(arrival_station)Saint Etienne Chateaucreux 3.599 2.626 1.371
factor(arrival_station)St Malo -8.455 5.525 -1.53
factor(arrival_station)St Pierre Des Corps -25.91 4.965 -5.218
factor(arrival_station)Strasbourg -20.39 4.792 -4.255
factor(arrival_station)Stuttgart -4.502 5.083 -0.8855
factor(arrival_station)Toulon 16.96 3.62 4.683
factor(arrival_station)Toulouse Matabiau 1.228 5.703 0.2152
factor(arrival_station)Tours -13.31 5.289 -2.516
factor(arrival_station)Valence Alixan Tgv 0.6204 2.785 0.2228
factor(arrival_station)Vannes -6.246 5.39 -1.159
factor(arrival_station)Zurich 2.949 2.629 1.121
month 0.3716 0.02698 13.78
year 3.102 0.1299 23.88
journey_time_avg 0.01601 0.004668 3.43
total_num_trips 0.08216 0.00362 22.69
  Pr(>|t|)
(Intercept) 4.657e-117
factor(departure_station)Angers Saint Laud 0.009601
factor(departure_station)Angouleme 0.6332
factor(departure_station)Annecy 6.175e-05
factor(departure_station)Arras 0.1171
factor(departure_station)Avignon Tgv 1.173e-05
factor(departure_station)Bellegarde (Ain) 0.204
factor(departure_station)Besancon Franche Comte Tgv 0.001645
factor(departure_station)Bordeaux St Jean 0.006153
factor(departure_station)Brest 0.06794
factor(departure_station)Chambery Challes Les Eaux 8.623e-08
factor(departure_station)Dijon Ville 4.016e-06
factor(departure_station)Douai 0.02115
factor(departure_station)Dunkerque 0.07044
factor(departure_station)Francfort 0.5555
factor(departure_station)Geneve 0.324
factor(departure_station)Grenoble 0.1364
factor(departure_station)Italie 0.0003317
factor(departure_station)La Rochelle Ville 0.03881
factor(departure_station)Lausanne 0.3215
factor(departure_station)Laval 0.06532
factor(departure_station)Le Creusot Montceau Montchanin 0.1347
factor(departure_station)Le Mans 0.03745
factor(departure_station)Lille 0.0003883
factor(departure_station)Lyon Part Dieu 0.02242
factor(departure_station)Macon Loche 0.0003045
factor(departure_station)Marseille St Charles 0.491
factor(departure_station)Metz 0.004442
factor(departure_station)Montpellier 1.601e-07
factor(departure_station)Mulhouse Ville 5.18e-06
factor(departure_station)Nancy 4.902e-06
factor(departure_station)Nantes 0.03981
factor(departure_station)Nice Ville 1.768e-08
factor(departure_station)Nimes 2.051e-11
factor(departure_station)Paris Est 6.051e-07
factor(departure_station)Paris Lyon 7.236e-31
factor(departure_station)Paris Montparnasse 6.667e-06
factor(departure_station)Paris Nord 3.539e-53
factor(departure_station)Perpignan 5.299e-08
factor(departure_station)Poitiers 0.03319
factor(departure_station)Quimper 0.1177
factor(departure_station)Reims 2.786e-05
factor(departure_station)Rennes 0.01072
factor(departure_station)Saint Etienne Chateaucreux 0.5577
factor(departure_station)St Malo 0.1638
factor(departure_station)St Pierre Des Corps 0.005686
factor(departure_station)Strasbourg 0.0001547
factor(departure_station)Stuttgart 0.2843
factor(departure_station)Toulon 1.77e-12
factor(departure_station)Toulouse Matabiau 0.3948
factor(departure_station)Tours 0.00952
factor(departure_station)Valence Alixan Tgv 4.489e-12
factor(departure_station)Vannes 0.1509
factor(departure_station)Zurich 0.1528
factor(arrival_station)Angers Saint Laud 0.002495
factor(arrival_station)Angouleme 0.01058
factor(arrival_station)Annecy 0.3606
factor(arrival_station)Arras 9.597e-05
factor(arrival_station)Avignon Tgv 0.2019
factor(arrival_station)Bellegarde (Ain) 0.05474
factor(arrival_station)Besancon Franche Comte Tgv 0.6912
factor(arrival_station)Bordeaux St Jean 0.06053
factor(arrival_station)Brest 0.05038
factor(arrival_station)Chambery Challes Les Eaux 0.04279
factor(arrival_station)Dijon Ville 0.001241
factor(arrival_station)Douai 2.287e-07
factor(arrival_station)Dunkerque 2.32e-07
factor(arrival_station)Francfort 0.2836
factor(arrival_station)Geneve 0.3296
factor(arrival_station)Grenoble 0.0385
factor(arrival_station)Italie 0.02931
factor(arrival_station)La Rochelle Ville 0.04455
factor(arrival_station)Lausanne 0.08742
factor(arrival_station)Laval 0.003513
factor(arrival_station)Le Creusot Montceau Montchanin 0.1089
factor(arrival_station)Le Mans 1.114e-05
factor(arrival_station)Lille 0.1639
factor(arrival_station)Lyon Part Dieu 3.985e-08
factor(arrival_station)Macon Loche 0.2824
factor(arrival_station)Marseille St Charles 0.1009
factor(arrival_station)Metz 0.000411
factor(arrival_station)Montpellier 0.001562
factor(arrival_station)Mulhouse Ville 0.8804
factor(arrival_station)Nancy 0.0006834
factor(arrival_station)Nantes 0.01121
factor(arrival_station)Nice Ville 5.574e-09
factor(arrival_station)Nimes 0.0004581
factor(arrival_station)Paris Est 0.0003391
factor(arrival_station)Paris Lyon 4.998e-21
factor(arrival_station)Paris Montparnasse 2.065e-06
factor(arrival_station)Paris Nord 1.716e-29
factor(arrival_station)Perpignan 7.657e-05
factor(arrival_station)Poitiers 2.163e-06
factor(arrival_station)Quimper 0.2099
factor(arrival_station)Reims 0.004151
factor(arrival_station)Rennes 0.0001331
factor(arrival_station)Saint Etienne Chateaucreux 0.1705
factor(arrival_station)St Malo 0.126
factor(arrival_station)St Pierre Des Corps 1.907e-07
factor(arrival_station)Strasbourg 2.142e-05
factor(arrival_station)Stuttgart 0.3759
factor(arrival_station)Toulon 2.919e-06
factor(arrival_station)Toulouse Matabiau 0.8296
factor(arrival_station)Tours 0.01189
factor(arrival_station)Valence Alixan Tgv 0.8237
factor(arrival_station)Vannes 0.2466
factor(arrival_station)Zurich 0.2622
month 3.42e-42
year 9.711e-118
journey_time_avg 0.0006093
total_num_trips 3.287e-107
Fitting linear model: num_greater_15_min_late ~ factor(departure_station) + factor(arrival_station) + month + year + journey_time_avg + total_num_trips
Observations Residual Std. Error \(R^2\) Adjusted \(R^2\)
4026 1.244 0.7027 0.6943
autoplot(WLS) + theme_bw()
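The two-step weighting used for the 15-minute model can be easier to follow on a small simulated example. In the sketch below everything is made up for illustration (the variables `x` and `y`, the true slope of 3, and noise whose standard deviation grows with `x`); it only mirrors the structure of the approach: fit OLS, model the absolute residuals, then refit with weights `1 / s_hat^2`.

```r
set.seed(1)
x <- runif(500, 1, 10)
y <- 2 + 3 * x + rnorm(500, sd = 0.5 * x)   # noise spread grows with x

# Step 1: ordinary least squares
ols <- lm(y ~ x)

# Step 2: estimate the error spread from the absolute residuals
abs_resid_fit <- lm(abs(residuals(ols)) ~ x)
s_hat <- fitted(abs_resid_fit)

# Step 3: weighted least squares, down-weighting the noisier observations
wls <- lm(y ~ x, weights = 1 / s_hat^2)

# Compare the two slope estimates and their standard errors
rbind(
  ols = summary(ols)$coefficients["x", c("Estimate", "Std. Error")],
  wls = summary(wls)$coefficients["x", c("Estimate", "Std. Error")]
)
```

One caveat worth keeping in mind: if any fitted `s_hat` is close to zero, the corresponding weight becomes very large, so it is worth inspecting the range of `s_hat` before trusting the refit.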

#Predict number that is later than 30min
pois.model.30 <- glm(
  num_greater_30_min_late ~ total_num_trips + month + year + journey_time_avg +
    factor(departure_station) + factor(arrival_station),
  family = poisson(link = log),
  data = train
)

pander(summary(pois.model.30)["call"])
  • call: glm(formula = num_greater_30_min_late ~ total_num_trips + month + year + journey_time_avg + factor(departure_station) + factor(arrival_station), family = poisson(link = log), data = train)
pander(summary(pois.model.30))
Table continues below
  Estimate Std. Error z value
(Intercept) -379.5 12.94 -29.33
total_num_trips 0.00215 0.0001363 15.77
month 0.02993 0.001421 21.06
year 0.1892 0.00642 29.47
journey_time_avg 0.003165 0.0003405 9.295
factor(departure_station)Angers Saint Laud -0.8956 0.1245 -7.194
factor(departure_station)Angouleme -0.8331 0.1364 -6.108
factor(departure_station)Annecy -0.3453 0.08206 -4.208
factor(departure_station)Arras -0.4552 0.1283 -3.547
factor(departure_station)Avignon Tgv 0.4325 0.05529 7.822
factor(departure_station)Bellegarde (Ain) -0.2426 0.07548 -3.215
factor(departure_station)Besancon Franche Comte Tgv -0.9581 0.1068 -8.974
factor(departure_station)Bordeaux St Jean -0.885 0.1229 -7.198
factor(departure_station)Brest -1.48 0.1645 -9
factor(departure_station)Chambery Challes Les Eaux -0.05593 0.07101 -0.7877
factor(departure_station)Dijon Ville -0.2938 0.07603 -3.865
factor(departure_station)Douai -0.909 0.1503 -6.048
factor(departure_station)Dunkerque -0.6583 0.1408 -4.676
factor(departure_station)Francfort -1.525 0.1846 -8.263
factor(departure_station)Geneve -0.3808 0.08149 -4.673
factor(departure_station)Grenoble -0.5246 0.08114 -6.466
factor(departure_station)Italie -0.9357 0.106 -8.824
factor(departure_station)La Rochelle Ville -1.538 0.157 -9.797
factor(departure_station)Lausanne -0.8565 0.1035 -8.273
factor(departure_station)Laval -1.228 0.1486 -8.265
factor(departure_station)Le Creusot Montceau Montchanin -0.4839 0.1025 -4.719
factor(departure_station)Le Mans -0.685 0.1193 -5.742
factor(departure_station)Lille -0.6511 0.07347 -8.862
factor(departure_station)Lyon Part Dieu -0.03439 0.05949 -0.578
factor(departure_station)Macon Loche 0.01188 0.08648 0.1374
factor(departure_station)Marseille St Charles -0.0944 0.05231 -1.804
factor(departure_station)Metz -1.67 0.1828 -9.136
factor(departure_station)Montpellier 0.3139 0.05324 5.897
factor(departure_station)Mulhouse Ville -0.6383 0.08053 -7.927
factor(departure_station)Nancy -2.091 0.1863 -11.22
factor(departure_station)Nantes -0.9652 0.1211 -7.969
factor(departure_station)Nice Ville 0.08298 0.07971 1.041
factor(departure_station)Nimes 0.6371 0.05398 11.8
factor(departure_station)Paris Est 0.2203 0.1625 1.355
factor(departure_station)Paris Lyon -0.7934 0.06021 -13.18
factor(departure_station)Paris Montparnasse 0.2116 0.1324 1.598
factor(departure_station)Paris Nord -0.9015 0.1117 -8.071
factor(departure_station)Perpignan 0.03691 0.07511 0.4914
factor(departure_station)Poitiers -0.8402 0.1222 -6.873
factor(departure_station)Quimper -1.384 0.1636 -8.458
factor(departure_station)Reims -2.827 0.222 -12.74
factor(departure_station)Rennes -0.7896 0.1057 -7.471
factor(departure_station)Saint Etienne Chateaucreux -1.03 0.1238 -8.327
factor(departure_station)St Malo -1.821 0.1735 -10.5
factor(departure_station)St Pierre Des Corps -0.879 0.1257 -6.994
factor(departure_station)Strasbourg -1.605 0.1622 -9.895
factor(departure_station)Stuttgart -1.715 0.1833 -9.359
factor(departure_station)Toulon 0.422 0.05977 7.061
factor(departure_station)Toulouse Matabiau -1.246 0.1752 -7.113
factor(departure_station)Tours -1.742 0.1664 -10.47
factor(departure_station)Valence Alixan Tgv 0.6623 0.06312 10.49
factor(departure_station)Vannes -1.115 0.1516 -7.358
factor(departure_station)Zurich -0.8828 0.1018 -8.671
factor(arrival_station)Angers Saint Laud -0.9376 0.129 -7.268
factor(arrival_station)Angouleme -1.292 0.1469 -8.796
factor(arrival_station)Annecy -0.5259 0.08709 -6.039
factor(arrival_station)Arras -0.3057 0.1311 -2.332
factor(arrival_station)Avignon Tgv -0.0249 0.06162 -0.404
factor(arrival_station)Bellegarde (Ain) -0.2743 0.08477 -3.235
factor(arrival_station)Besancon Franche Comte Tgv -0.6572 0.1057 -6.216
factor(arrival_station)Bordeaux St Jean -0.8553 0.1265 -6.759
factor(arrival_station)Brest -1.204 0.1642 -7.335
factor(arrival_station)Chambery Challes Les Eaux -0.1446 0.08072 -1.792
factor(arrival_station)Dijon Ville -0.1558 0.07495 -2.079
factor(arrival_station)Douai -0.3018 0.1373 -2.198
factor(arrival_station)Dunkerque -0.05128 0.1286 -0.3987
factor(arrival_station)Francfort -1.449 0.1816 -7.977
factor(arrival_station)Geneve -0.4797 0.09266 -5.177
factor(arrival_station)Grenoble -0.09572 0.07833 -1.222
factor(arrival_station)Italie -0.6481 0.1064 -6.09
factor(arrival_station)La Rochelle Ville -1.455 0.1631 -8.921
factor(arrival_station)Lausanne -0.6956 0.1041 -6.684
factor(arrival_station)Laval -1.472 0.1549 -9.503
factor(arrival_station)Le Creusot Montceau Montchanin -0.7882 0.1199 -6.576
factor(arrival_station)Le Mans -1.108 0.1281 -8.648
factor(arrival_station)Lille -0.2837 0.07416 -3.825
factor(arrival_station)Lyon Part Dieu -0.3803 0.06203 -6.13
factor(arrival_station)Macon Loche -0.7034 0.1063 -6.62
factor(arrival_station)Marseille St Charles 0.02638 0.05424 0.4864
factor(arrival_station)Metz -1.709 0.1823 -9.377
factor(arrival_station)Montpellier 0.2288 0.05931 3.859
factor(arrival_station)Mulhouse Ville -0.1398 0.0744 -1.879
factor(arrival_station)Nancy -1.592 0.1816 -8.765
factor(arrival_station)Nantes -1.001 0.1253 -7.99
factor(arrival_station)Nice Ville 0.1703 0.08076 2.109
factor(arrival_station)Nimes 0.2817 0.0644 4.373
factor(arrival_station)Paris Est 0.659 0.1693 3.892
factor(arrival_station)Paris Lyon -0.7965 0.06159 -12.93
factor(arrival_station)Paris Montparnasse 0.2223 0.1314 1.692
factor(arrival_station)Paris Nord -0.7531 0.1176 -6.406
factor(arrival_station)Perpignan 0.02658 0.08208 0.3239
factor(arrival_station)Poitiers -1.222 0.1264 -9.665
factor(arrival_station)Quimper -1.112 0.1652 -6.732
factor(arrival_station)Reims -2.009 0.1991 -10.09
factor(arrival_station)Rennes -0.9455 0.1094 -8.641
factor(arrival_station)Saint Etienne Chateaucreux -0.9559 0.1164 -8.213
factor(arrival_station)St Malo -1.656 0.1748 -9.473
factor(arrival_station)St Pierre Des Corps -1.404 0.1338 -10.5
factor(arrival_station)Strasbourg -1.444 0.1542 -9.363
factor(arrival_station)Stuttgart -1.453 0.1844 -7.882
factor(arrival_station)Toulon 0.3048 0.06457 4.72
factor(arrival_station)Toulouse Matabiau -1.011 0.1795 -5.633
factor(arrival_station)Tours -1.733 0.1668 -10.39
factor(arrival_station)Valence Alixan Tgv -0.1012 0.07977 -1.268
factor(arrival_station)Vannes -0.8635 0.153 -5.642
factor(arrival_station)Zurich -0.3975 0.09354 -4.249
  Pr(>|z|)
(Intercept) 4.874e-189
total_num_trips 4.698e-56
month 1.729e-98
year 7.826e-191
journey_time_avg 1.469e-20
factor(departure_station)Angers Saint Laud 6.279e-13
factor(departure_station)Angouleme 1.01e-09
factor(departure_station)Annecy 2.58e-05
factor(departure_station)Arras 0.0003893
factor(departure_station)Avignon Tgv 5.19e-15
factor(departure_station)Bellegarde (Ain) 0.001306
factor(departure_station)Besancon Franche Comte Tgv 2.863e-19
factor(departure_station)Bordeaux St Jean 6.105e-13
factor(departure_station)Brest 2.256e-19
factor(departure_station)Chambery Challes Les Eaux 0.4309
factor(departure_station)Dijon Ville 0.0001112
factor(departure_station)Douai 1.468e-09
factor(departure_station)Dunkerque 2.932e-06
factor(departure_station)Francfort 1.422e-16
factor(departure_station)Geneve 2.975e-06
factor(departure_station)Grenoble 1.007e-10
factor(departure_station)Italie 1.104e-18
factor(departure_station)La Rochelle Ville 1.154e-22
factor(departure_station)Lausanne 1.304e-16
factor(departure_station)Laval 1.395e-16
factor(departure_station)Le Creusot Montceau Montchanin 2.366e-06
factor(departure_station)Le Mans 9.379e-09
factor(departure_station)Lille 7.856e-19
factor(departure_station)Lyon Part Dieu 0.5633
factor(departure_station)Macon Loche 0.8907
factor(departure_station)Marseille St Charles 0.07115
factor(departure_station)Metz 6.459e-20
factor(departure_station)Montpellier 3.713e-09
factor(departure_station)Mulhouse Ville 2.252e-15
factor(departure_station)Nancy 3.144e-29
factor(departure_station)Nantes 1.596e-15
factor(departure_station)Nice Ville 0.2978
factor(departure_station)Nimes 3.755e-32
factor(departure_station)Paris Est 0.1753
factor(departure_station)Paris Lyon 1.191e-39
factor(departure_station)Paris Montparnasse 0.1101
factor(departure_station)Paris Nord 6.984e-16
factor(departure_station)Perpignan 0.6231
factor(departure_station)Poitiers 6.284e-12
factor(departure_station)Quimper 2.717e-17
factor(departure_station)Reims 3.726e-37
factor(departure_station)Rennes 7.945e-14
factor(departure_station)Saint Etienne Chateaucreux 8.307e-17
factor(departure_station)St Malo 8.805e-26
factor(departure_station)St Pierre Des Corps 2.663e-12
factor(departure_station)Strasbourg 4.397e-23
factor(departure_station)Stuttgart 8.04e-21
factor(departure_station)Toulon 1.658e-12
factor(departure_station)Toulouse Matabiau 1.133e-12
factor(departure_station)Tours 1.214e-25
factor(departure_station)Valence Alixan Tgv 9.347e-26
factor(departure_station)Vannes 1.86e-13
factor(departure_station)Zurich 4.281e-18
factor(arrival_station)Angers Saint Laud 3.636e-13
factor(arrival_station)Angouleme 1.412e-18
factor(arrival_station)Annecy 1.549e-09
factor(arrival_station)Arras 0.01968
factor(arrival_station)Avignon Tgv 0.6862
factor(arrival_station)Bellegarde (Ain) 0.001215
factor(arrival_station)Besancon Franche Comte Tgv 5.107e-10
factor(arrival_station)Bordeaux St Jean 1.389e-11
factor(arrival_station)Brest 2.213e-13
factor(arrival_station)Chambery Challes Les Eaux 0.07315
factor(arrival_station)Dijon Ville 0.03759
factor(arrival_station)Douai 0.02793
factor(arrival_station)Dunkerque 0.6901
factor(arrival_station)Francfort 1.504e-15
factor(arrival_station)Geneve 2.25e-07
factor(arrival_station)Grenoble 0.2217
factor(arrival_station)Italie 1.127e-09
factor(arrival_station)La Rochelle Ville 4.631e-19
factor(arrival_station)Lausanne 2.328e-11
factor(arrival_station)Laval 2.048e-21
factor(arrival_station)Le Creusot Montceau Montchanin 4.83e-11
factor(arrival_station)Le Mans 5.263e-18
factor(arrival_station)Lille 0.0001305
factor(arrival_station)Lyon Part Dieu 8.76e-10
factor(arrival_station)Macon Loche 3.592e-11
factor(arrival_station)Marseille St Charles 0.6267
factor(arrival_station)Metz 6.817e-21
factor(arrival_station)Montpellier 0.000114
factor(arrival_station)Mulhouse Ville 0.06026
factor(arrival_station)Nancy 1.875e-18
factor(arrival_station)Nantes 1.353e-15
factor(arrival_station)Nice Ville 0.03496
factor(arrival_station)Nimes 1.223e-05
factor(arrival_station)Paris Est 9.924e-05
factor(arrival_station)Paris Lyon 2.949e-38
factor(arrival_station)Paris Montparnasse 0.09065
factor(arrival_station)Paris Nord 1.491e-10
factor(arrival_station)Perpignan 0.746
factor(arrival_station)Poitiers 4.23e-22
factor(arrival_station)Quimper 1.669e-11
factor(arrival_station)Reims 5.996e-24
factor(arrival_station)Rennes 5.58e-18
factor(arrival_station)Saint Etienne Chateaucreux 2.148e-16
factor(arrival_station)St Malo 2.708e-21
factor(arrival_station)St Pierre Des Corps 8.823e-26
factor(arrival_station)Strasbourg 7.717e-21
factor(arrival_station)Stuttgart 3.234e-15
factor(arrival_station)Toulon 2.353e-06
factor(arrival_station)Toulouse Matabiau 1.774e-08
factor(arrival_station)Tours 2.725e-25
factor(arrival_station)Valence Alixan Tgv 0.2047
factor(arrival_station)Vannes 1.682e-08
factor(arrival_station)Zurich 2.143e-05

(Dispersion parameter for poisson family taken to be 1 )

Null deviance: 29156 on 4025 degrees of freedom
Residual deviance: 9949 on 3915 degrees of freedom
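A rough check worth doing on any Poisson fit is to compare the residual deviance with its degrees of freedom; a ratio well above 1 hints at overdispersion. The sketch below computes that ratio from the two numbers printed above, and then uses simulated counts (negative binomial, with made-up parameters) to show how a quasi-Poisson fit keeps the same coefficients but widens the standard errors. The simulated part is an illustration under my own assumptions, not a refit of the train model.

```r
# Ratio from the summary output above: 9949 / 3915, roughly 2.5
dispersion_ratio <- 9949 / 3915

# Simulated overdispersed counts (hypothetical parameters)
set.seed(7)
x <- runif(1000, 0, 2)
y <- rnbinom(1000, mu = exp(0.5 + 0.8 * x), size = 2)

pois_fit  <- glm(y ~ x, family = poisson(link = "log"))
quasi_fit <- glm(y ~ x, family = quasipoisson(link = "log"))

# The quasi-Poisson family estimates the dispersion instead of fixing it at 1
est_dispersion <- summary(quasi_fit)$dispersion
se_pois  <- summary(pois_fit)$coefficients["x", "Std. Error"]
se_quasi <- summary(quasi_fit)$coefficients["x", "Std. Error"]

c(dispersion = est_dispersion, se_poisson = se_pois, se_quasipoisson = se_quasi)
```

If the same pattern holds for the train data, the very small p-values in the table should be read with some caution.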
autoplot(pois.model.30) + theme_bw()
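Since the model uses a log link, the estimates in the table above are on the log scale, and exponentiating them gives multiplicative effects on the expected count. As a small worked example, the snippet below takes the four numeric coefficients as printed in the summary (the vector name `log_coefs` is just a label chosen here) and converts them to percentage changes per one-unit increase, holding the other predictors fixed.

```r
# Estimates copied from the 30-minute Poisson summary (log scale)
log_coefs <- c(
  total_num_trips  = 0.00215,
  month            = 0.02993,
  year             = 0.1892,
  journey_time_avg = 0.003165
)

# exp(beta) is the rate ratio for a one-unit increase in the predictor
rate_ratios    <- exp(log_coefs)
percent_change <- 100 * (rate_ratios - 1)

round(rate_ratios, 3)
round(percent_change, 2)
```

For instance, the `year` estimate of 0.1892 corresponds to a rate ratio of about 1.21, that is, roughly 21% more trains late by at least 30 minutes for each additional year.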

#Predict number that is later than 60min
pois.model.60 <- glm(
  num_greater_60_min_late ~ factor(departure_station) + factor(arrival_station) +
    month + year + journey_time_avg + total_num_trips,
  family = poisson(link = log),
  data = train
)

pander(summary(pois.model.60)["call"])
  • call: glm(formula = num_greater_60_min_late ~ factor(departure_station) + factor(arrival_station) + month + year + journey_time_avg + total_num_trips, family = poisson(link = log), data = train)
pander(summary(pois.model.60))
Table continues below
  Estimate Std. Error z value
(Intercept) -350.3 21.47 -16.32
factor(departure_station)Angers Saint Laud -1.867 0.202 -9.242
factor(departure_station)Angouleme -1.82 0.2202 -8.264
factor(departure_station)Annecy -0.7036 0.1327 -5.303
factor(departure_station)Arras -1.036 0.2276 -4.552
factor(departure_station)Avignon Tgv 0.4042 0.08639 4.678
factor(departure_station)Bellegarde (Ain) -0.7273 0.1338 -5.436
factor(departure_station)Besancon Franche Comte Tgv -0.8643 0.1614 -5.354
factor(departure_station)Bordeaux St Jean -1.829 0.1989 -9.195
factor(departure_station)Brest -2.725 0.2673 -10.19
factor(departure_station)Chambery Challes Les Eaux -0.4116 0.1177 -3.497
factor(departure_station)Dijon Ville -0.2337 0.1222 -1.913
factor(departure_station)Douai -1.718 0.2732 -6.289
factor(departure_station)Dunkerque -1.515 0.2517 -6.019
factor(departure_station)Francfort -1.963 0.3158 -6.218
factor(departure_station)Geneve -0.7589 0.1388 -5.468
factor(departure_station)Grenoble -0.8302 0.1341 -6.192
factor(departure_station)Italie -1.806 0.1855 -9.733
factor(departure_station)La Rochelle Ville -2.773 0.2576 -10.77
factor(departure_station)Lausanne -1.3 0.1773 -7.333
factor(departure_station)Laval -2.041 0.2383 -8.567
factor(departure_station)Le Creusot Montceau Montchanin -0.9406 0.1962 -4.793
factor(departure_station)Le Mans -1.285 0.1895 -6.784
factor(departure_station)Lille -1.315 0.1197 -10.98
factor(departure_station)Lyon Part Dieu -0.04394 0.09537 -0.4607
factor(departure_station)Macon Loche -0.126 0.1443 -0.8731
factor(departure_station)Marseille St Charles -0.2048 0.08097 -2.529
factor(departure_station)Metz -1.982 0.3151 -6.29
factor(departure_station)Montpellier 0.1624 0.08214 1.977
factor(departure_station)Mulhouse Ville -0.7786 0.1283 -6.069
factor(departure_station)Nancy -2.191 0.3155 -6.944
factor(departure_station)Nantes -2.015 0.1959 -10.28
factor(departure_station)Nice Ville -0.5824 0.1301 -4.477
factor(departure_station)Nimes 0.5063 0.08388 6.036
factor(departure_station)Paris Est -0.3803 0.2584 -1.472
factor(departure_station)Paris Lyon -1.086 0.09556 -11.37
factor(departure_station)Paris Montparnasse 0.7059 0.2145 3.291
factor(departure_station)Paris Nord -0.4036 0.19 -2.124
factor(departure_station)Perpignan -0.4876 0.1207 -4.038
factor(departure_station)Poitiers -1.611 0.1953 -8.251
factor(departure_station)Quimper -2.546 0.2651 -9.603
factor(departure_station)Reims -3.035 0.391 -7.761
factor(departure_station)Rennes -1.436 0.1706 -8.416
factor(departure_station)Saint Etienne Chateaucreux -1.499 0.2214 -6.77
factor(departure_station)St Malo -2.715 0.2744 -9.894
factor(departure_station)St Pierre Des Corps -1.645 0.2019 -8.143
factor(departure_station)Strasbourg -1.878 0.2731 -6.879
factor(departure_station)Stuttgart -2.396 0.3207 -7.47
factor(departure_station)Toulon 0.05393 0.09513 0.567
factor(departure_station)Toulouse Matabiau -2.699 0.2873 -9.397
factor(departure_station)Tours -2.711 0.2758 -9.83
factor(departure_station)Valence Alixan Tgv 0.6568 0.09914 6.624
factor(departure_station)Vannes -2.022 0.243 -8.319
factor(departure_station)Zurich -1.092 0.1619 -6.745
factor(arrival_station)Angers Saint Laud -1.364 0.2126 -6.417
factor(arrival_station)Angouleme -1.836 0.2418 -7.592
factor(arrival_station)Annecy -0.9147 0.1642 -5.571
factor(arrival_station)Arras -0.9087 0.2367 -3.838
factor(arrival_station)Avignon Tgv 0.05482 0.1115 0.4919
factor(arrival_station)Bellegarde (Ain) -0.4447 0.1571 -2.831
factor(arrival_station)Besancon Franche Comte Tgv -0.5871 0.188 -3.123
factor(arrival_station)Bordeaux St Jean -1.414 0.2094 -6.754
factor(arrival_station)Brest -1.954 0.2707 -7.216
factor(arrival_station)Chambery Challes Les Eaux -0.1083 0.14 -0.7734
factor(arrival_station)Dijon Ville 0.1024 0.1333 0.7686
factor(arrival_station)Douai -1.287 0.2596 -4.957
factor(arrival_station)Dunkerque -0.7115 0.2247 -3.166
factor(arrival_station)Francfort -1.024 0.2965 -3.454
factor(arrival_station)Geneve -0.7069 0.1746 -4.048
factor(arrival_station)Grenoble -0.1543 0.1396 -1.105
factor(arrival_station)Italie -0.873 0.1796 -4.862
factor(arrival_station)La Rochelle Ville -2.269 0.2721 -8.34
factor(arrival_station)Lausanne -0.5708 0.1713 -3.332
factor(arrival_station)Laval -1.838 0.2531 -7.262
factor(arrival_station)Le Creusot Montceau Montchanin -0.8404 0.2317 -3.627
factor(arrival_station)Le Mans -1.373 0.2102 -6.532
factor(arrival_station)Lille -0.7733 0.1295 -5.971
factor(arrival_station)Lyon Part Dieu -0.07182 0.1088 -0.6601
factor(arrival_station)Macon Loche -0.7938 0.2038 -3.896
factor(arrival_station)Marseille St Charles 0.2078 0.09525 2.182
factor(arrival_station)Metz -1.009 0.2958 -3.41
factor(arrival_station)Montpellier 0.4045 0.1026 3.943
factor(arrival_station)Mulhouse Ville -0.2093 0.1358 -1.541
factor(arrival_station)Nancy -0.9827 0.2963 -3.317
factor(arrival_station)Nantes -1.403 0.2056 -6.825
factor(arrival_station)Nice Ville -0.1285 0.1386 -0.9268
factor(arrival_station)Nimes 0.5525 0.1093 5.055
factor(arrival_station)Paris Est 0.7847 0.2899 2.707
factor(arrival_station)Paris Lyon -0.7336 0.1071 -6.853
factor(arrival_station)Paris Montparnasse 1.171 0.2173 5.388
factor(arrival_station)Paris Nord -0.1813 0.209 -0.8678
factor(arrival_station)Perpignan -0.03003 0.1369 -0.2194
factor(arrival_station)Poitiers -1.614 0.208 -7.761
factor(arrival_station)Quimper -1.838 0.2723 -6.753
factor(arrival_station)Reims -1.142 0.3197 -3.573
factor(arrival_station)Rennes -1.227 0.1811 -6.777
factor(arrival_station)Saint Etienne Chateaucreux -0.924 0.2004 -4.612
factor(arrival_station)St Malo -2.358 0.2886 -8.172
factor(arrival_station)St Pierre Des Corps -1.771 0.2216 -7.992
factor(arrival_station)Strasbourg -0.87 0.2472 -3.52
factor(arrival_station)Stuttgart -1.004 0.3 -3.345
factor(arrival_station)Toulon 0.1894 0.1141 1.66
factor(arrival_station)Toulouse Matabiau -1.998 0.2974 -6.716
factor(arrival_station)Tours -2.008 0.2695 -7.449
factor(arrival_station)Valence Alixan Tgv 0.03144 0.141 0.2229
factor(arrival_station)Vannes -1.581 0.2532 -6.245
factor(arrival_station)Zurich -0.442 0.1646 -2.685
month 0.03619 0.002364 15.31
year 0.174 0.01065 16.34
journey_time_avg 0.00571 0.0005721 9.98
total_num_trips 0.001803 0.0002265 7.961
  Pr(>|z|)
(Intercept) 7.491e-60
factor(departure_station)Angers Saint Laud 2.423e-20
factor(departure_station)Angouleme 1.413e-16
factor(departure_station)Annecy 1.138e-07
factor(departure_station)Arras 5.304e-06
factor(departure_station)Avignon Tgv 2.894e-06
factor(departure_station)Bellegarde (Ain) 5.44e-08
factor(departure_station)Besancon Franche Comte Tgv 8.608e-08
factor(departure_station)Bordeaux St Jean 3.746e-20
factor(departure_station)Brest 2.101e-24
factor(departure_station)Chambery Challes Les Eaux 0.0004702
factor(departure_station)Dijon Ville 0.05579
factor(departure_station)Douai 3.197e-10
factor(departure_station)Dunkerque 1.756e-09
factor(departure_station)Francfort 5.05e-10
factor(departure_station)Geneve 4.541e-08
factor(departure_station)Grenoble 5.94e-10
factor(departure_station)Italie 2.19e-22
factor(departure_station)La Rochelle Ville 4.916e-27
factor(departure_station)Lausanne 2.25e-13
factor(departure_station)Laval 1.065e-17
factor(departure_station)Le Creusot Montceau Montchanin 1.642e-06
factor(departure_station)Le Mans 1.171e-11
factor(departure_station)Lille 4.687e-28
factor(departure_station)Lyon Part Dieu 0.645
factor(departure_station)Macon Loche 0.3826
factor(departure_station)Marseille St Charles 0.01143
factor(departure_station)Metz 3.168e-10
factor(departure_station)Montpellier 0.04804
factor(departure_station)Mulhouse Ville 1.289e-09
factor(departure_station)Nancy 3.802e-12
factor(departure_station)Nantes 8.291e-25
factor(departure_station)Nice Ville 7.56e-06
factor(departure_station)Nimes 1.581e-09
factor(departure_station)Paris Est 0.1411
factor(departure_station)Paris Lyon 6.106e-30
factor(departure_station)Paris Montparnasse 0.0009996
factor(departure_station)Paris Nord 0.03364
factor(departure_station)Perpignan 5.382e-05
factor(departure_station)Poitiers 1.576e-16
factor(departure_station)Quimper 7.762e-22
factor(departure_station)Reims 8.406e-15
factor(departure_station)Rennes 3.899e-17
factor(departure_station)Saint Etienne Chateaucreux 1.284e-11
factor(departure_station)St Malo 4.43e-23
factor(departure_station)St Pierre Des Corps 3.843e-16
factor(departure_station)Strasbourg 6.036e-12
factor(departure_station)Stuttgart 8.025e-14
factor(departure_station)Toulon 0.5707
factor(departure_station)Toulouse Matabiau 5.633e-21
factor(departure_station)Tours 8.382e-23
factor(departure_station)Valence Alixan Tgv 3.487e-11
factor(departure_station)Vannes 8.864e-17
factor(departure_station)Zurich 1.528e-11
factor(arrival_station)Angers Saint Laud 1.394e-10
factor(arrival_station)Angouleme 3.15e-14
factor(arrival_station)Annecy 2.531e-08
factor(arrival_station)Arras 0.0001238
factor(arrival_station)Avignon Tgv 0.6228
factor(arrival_station)Bellegarde (Ain) 0.004638
factor(arrival_station)Besancon Franche Comte Tgv 0.001789
factor(arrival_station)Bordeaux St Jean 1.439e-11
factor(arrival_station)Brest 5.346e-13
factor(arrival_station)Chambery Challes Les Eaux 0.4393
factor(arrival_station)Dijon Ville 0.4421
factor(arrival_station)Douai 7.148e-07
factor(arrival_station)Dunkerque 0.001547
factor(arrival_station)Francfort 0.0005519
factor(arrival_station)Geneve 5.165e-05
factor(arrival_station)Grenoble 0.269
factor(arrival_station)Italie 1.161e-06
factor(arrival_station)La Rochelle Ville 7.437e-17
factor(arrival_station)Lausanne 0.0008612
factor(arrival_station)Laval 3.818e-13
factor(arrival_station)Le Creusot Montceau Montchanin 0.0002866
factor(arrival_station)Le Mans 6.478e-11
factor(arrival_station)Lille 2.351e-09
factor(arrival_station)Lyon Part Dieu 0.5092
factor(arrival_station)Macon Loche 9.798e-05
factor(arrival_station)Marseille St Charles 0.02911
factor(arrival_station)Metz 0.0006502
factor(arrival_station)Montpellier 8.056e-05
factor(arrival_station)Mulhouse Ville 0.1233
factor(arrival_station)Nancy 0.0009104
factor(arrival_station)Nantes 8.795e-12
factor(arrival_station)Nice Ville 0.354
factor(arrival_station)Nimes 4.307e-07
factor(arrival_station)Paris Est 0.006791
factor(arrival_station)Paris Lyon 7.255e-12
factor(arrival_station)Paris Montparnasse 7.115e-08
factor(arrival_station)Paris Nord 0.3855
factor(arrival_station)Perpignan 0.8264
factor(arrival_station)Poitiers 8.409e-15
factor(arrival_station)Quimper 1.453e-11
factor(arrival_station)Reims 0.000353
factor(arrival_station)Rennes 1.23e-11
factor(arrival_station)Saint Etienne Chateaucreux 3.988e-06
factor(arrival_station)St Malo 3.021e-16
factor(arrival_station)St Pierre Des Corps 1.333e-15
factor(arrival_station)Strasbourg 0.0004321
factor(arrival_station)Stuttgart 0.0008225
factor(arrival_station)Toulon 0.09701
factor(arrival_station)Toulouse Matabiau 1.869e-11
factor(arrival_station)Tours 9.399e-14
factor(arrival_station)Valence Alixan Tgv 0.8236
factor(arrival_station)Vannes 4.244e-10
factor(arrival_station)Zurich 0.007251
month 6.705e-53
year 4.767e-60
journey_time_avg 1.865e-23
total_num_trips 1.701e-15

(Dispersion parameter for poisson family taken to be 1 )

Null deviance: 17368 on 4025 degrees of freedom
Residual deviance: 9184 on 3915 degrees of freedom
autoplot(pois.model.60) + theme_bw()
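
Two things can be read off the summary tables above. Because the models use a log link, exponentiating a coefficient gives a multiplicative effect on the expected count. The ratio of residual deviance to residual degrees of freedom also gives a rough check of the Poisson assumption that the variance equals the mean. The snippet below is a minimal sketch that only redoes this arithmetic with numbers copied by hand from the output above. The variable names are made up for illustration, and the quasi-Poisson refit mentioned in the last comment is a suggestion that was not run in this post.

```r
# Coefficients copied by hand from the 60 min summary table above
coefs <- c(year = 0.174, month = 0.03619, total_num_trips = 0.001803)

# Log link: exp(coefficient) is the multiplicative change in the expected
# count for a one-unit increase, holding the other variables fixed
rate_ratios <- exp(coefs)
print(round(rate_ratios, 4))

# Rough overdispersion check: residual deviance / residual degrees of freedom.
# Values well above 1 hint that the variance exceeds the mean.
dispersion_30 <- 9949 / 3915
dispersion_60 <- 9184 / 3915
print(round(c(model_30 = dispersion_30, model_60 = dispersion_60), 2))

# If overdispersion is a concern, one option (not run here) would be to refit
# with family = quasipoisson(link = "log"), which keeps the coefficients but
# rescales the standard errors
```

With these numbers, the year coefficient corresponds to roughly 19% more trains over 60 min late per additional year, and both ratios come out above 2, so the standard errors in the Poisson tables are probably somewhat optimistic.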

Analysis

As we said before, the increase in the number of late trains in certain cities might simply be due to the higher number of trains passing through those stations. Hence, I would like to look at the ANOVA table for the station effects once all other variables are controlled for.

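# Step 1: ordinary least squares fit
model <- lm(data = train, num_greater_15_min_late ~ total_num_trips + journey_time_avg + month + year + factor(departure_station) + factor(arrival_station))

e <- residuals(model)

# Step 2: regress the absolute residuals on the same variables to estimate the error spread
sHat <- predict(lm(abs(e) ~ total_num_trips + journey_time_avg + month + year + factor(departure_station) + factor(arrival_station), data = train))

# Step 3: weighted least squares with inverse-variance weights; the stations enter last,
# so their rows in the sequential ANOVA table are adjusted for the other variables
control <- lm(num_greater_15_min_late ~ total_num_trips + journey_time_avg + month + year + factor(departure_station) + factor(arrival_station), data = train, weights = 1/(sHat^2))

pander(anova(control))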
Analysis of Variance Table
  Df Sum Sq Mean Sq F value Pr(>F)
total_num_trips 1 5887 5887 3803 0
journey_time_avg 1 2760 2760 1783 1.828e-321
month 1 330 330 213.2 4.664e-47
year 1 856.4 856.4 553.2 1.547e-114
factor(departure_station) 53 2690 50.76 32.79 5.669e-268
factor(arrival_station) 53 1797 33.91 21.91 1.018e-179
Residuals 3915 6060 1.548 NA NA

My initial guess was that the total number of trains in the station was the reason why trains were late, but even after controlling for the other variables, the departure and arrival stations still have a highly statistically significant effect on the number of late trains.

Since the models for the 30 and 60 min late trains are Poisson models, the F-test above does not carry over directly (the analogous check would be a likelihood-ratio test on the deviance). Given how similar the three response variables are, and how small most of the station p-values in the summary tables are, I expect the departure and arrival stations to have a significant effect on the number of late trains there as well.
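
For a Poisson GLM, the usual counterpart of the F-test is a likelihood-ratio (analysis of deviance) test comparing a model with and without the term of interest. Below is a minimal sketch of that comparison on simulated data. The data frame `sim`, its columns, and the effect sizes are hypothetical stand-ins, not the real train data.

```r
set.seed(1)

# Simulated stand-in for the train data (hypothetical values, not the real data)
n <- 600
sim <- data.frame(
  station = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  total_num_trips = round(runif(n, 50, 500))
)
station_effect <- c(A = 0, B = 0.4, C = -0.3)
mu <- exp(0.5 + 0.003 * sim$total_num_trips +
            station_effect[as.character(sim$station)])
sim$num_late <- rpois(n, mu)

# Full model and a reduced model without the station term
full <- glm(num_late ~ total_num_trips + station,
            family = poisson(link = "log"), data = sim)
reduced <- glm(num_late ~ total_num_trips,
               family = poisson(link = "log"), data = sim)

# Likelihood-ratio (deviance) test: does adding station improve the fit?
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)

# p-value for the station term
p_station <- lrt[["Pr(>Chi)"]][2]
```

On the real models, something like `drop1(pois.model.60, test = "Chisq")` should give the same kind of test for each term given all the others, although that was not run for this post.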

Cross-Validation Errors and Model Performance

# Repeated random hold-out: 10 rounds, each holding out about a tenth of the rows
MAE.CV <- vector(length = 10)

for (i in 1:10) {
    cvTestIndex <- sample(1:nrow(trains), nrow(train)/10)
    cvDataTest  <- trains[cvTestIndex, ]
    cvDataTrain <- trains[-cvTestIndex, ]

    # Step 1: ordinary least squares fit on the training rows
    model <- lm(data = cvDataTrain, num_greater_15_min_late ~ factor(departure_station) + factor(arrival_station) + month + year + journey_time_avg + total_num_trips)

    # Step 2: model the absolute residuals to estimate the error spread
    e <- residuals(model)
    sHat <- predict(lm(abs(e) ~ factor(departure_station) + factor(arrival_station) + month + year + journey_time_avg + total_num_trips, data = cvDataTrain))

    # Step 3: weighted least squares with inverse-variance weights
    weighted.ls.cv <- lm(num_greater_15_min_late ~ factor(departure_station) + factor(arrival_station) + month + year + journey_time_avg + total_num_trips, data = cvDataTrain, weights = 1/(sHat^2))

    # Mean absolute error on the held-out rows
    trainPred <- predict(weighted.ls.cv, newdata = cvDataTest)
    MAE <- sum(abs(trainPred - cvDataTest$num_greater_15_min_late))/nrow(cvDataTest)

    MAE.CV[i] <- MAE
}
MAE.CV <- mean(MAE.CV)
pander(paste("The cross-validation error for predicting trains that will be at least 15 min late is ", MAE.CV))

The cross-validation error for predicting trains that will be at least 15 min late is 7.8401486857844

MAE.CV <- vector(length = 10)

for (i in 1:10) {
    cvTestIndex <- sample(1:nrow(trains), nrow(train)/10)
    cvDataTest  <- trains[cvTestIndex, ]
    cvDataTrain <- trains[-cvTestIndex, ]

    # Note: the model is fitted on train here, not on cvDataTrain
    model <- glm(data = train, num_greater_30_min_late ~ factor(departure_station) + factor(arrival_station) + month + year + journey_time_avg + total_num_trips, family = poisson(link = log))

    # predict() on a glm returns the linear predictor (log scale) unless type = "response" is given
    trainPred <- predict(model, newdata = cvDataTest)

    MAE <- sum(abs(trainPred - cvDataTest$num_greater_30_min_late))/nrow(cvDataTest)

    MAE.CV[i] <- MAE
}
MAE.CV <- mean(MAE.CV)
pander(paste("The cross-validation error for predicting trains that will be at least 30 min late is ", MAE.CV))

The cross-validation error for predicting trains that will be at least 30 min late is 8.9165172765

MAE.CV <- vector(length = 10)

for (i in 1:10) {
    cvTestIndex <- sample(1:nrow(trains), nrow(train)/10)
    cvDataTest  <- trains[cvTestIndex, ]
    cvDataTrain <- trains[-cvTestIndex, ]

    # Note: the model is fitted on train here, not on cvDataTrain
    model <- glm(data = train, num_greater_60_min_late ~ factor(departure_station) + factor(arrival_station) + month + year + journey_time_avg + total_num_trips, family = poisson(link = log))

    # predict() on a glm returns the linear predictor (log scale) unless type = "response" is given
    trainPred <- predict(model, newdata = cvDataTest)

    MAE <- sum(abs(trainPred - cvDataTest$num_greater_60_min_late))/nrow(cvDataTest)

    MAE.CV[i] <- MAE
}
MAE.CV <- mean(MAE.CV)
pander(paste("The cross-validation error for predicting trains that will be at least 60 min late is ", MAE.CV))

The cross-validation error for predicting trains that will be at least 60 min late is 3.04501014522092

Looking at the cross-validation errors, the models seem to predict the number of late trains quite accurately. I used the mean absolute error to measure the gap between the true values and the predictions because it is expressed in the same unit as the response (a number of trains), which gives a better idea of how close I was to the real result. Since the values of the response variables range from 0 to ~25000, this is an amazing performance. It is also important to understand what the good performance of our models implies about the SNCF train delays.
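
A caveat on the two Poisson loops above: `predict()` on a `glm` returns values on the link (log) scale by default, and those loops fit the model on `train` rather than on the training rows of each round, so their errors are not directly comparable to the 15 min figure. A minimal sketch of a variant that addresses both points, and that compares the model against a naive baseline, could look like the following. It uses simulated data, so the data frame `sim`, its columns, and the effect sizes are hypothetical stand-ins rather than the real train data.

```r
set.seed(42)

# Simulated stand-in for the train data (hypothetical values, not the real data)
n <- 1000
sim <- data.frame(
  station = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)),
  total_num_trips = round(runif(n, 50, 500))
)
station_effect <- c(A = 0, B = 0.5, C = -0.4, D = 0.2)
sim$num_late <- rpois(n, exp(0.5 + 0.004 * sim$total_num_trips +
                               station_effect[as.character(sim$station)]))

# k-fold split: every row is assigned to exactly one of k folds
k <- 10
fold <- sample(rep(1:k, length.out = n))

mae_model <- numeric(k)
mae_baseline <- numeric(k)

for (i in 1:k) {
  cv_train <- sim[fold != i, ]
  cv_test  <- sim[fold == i, ]

  # Fit on the training folds only
  fit <- glm(num_late ~ station + total_num_trips,
             family = poisson(link = "log"), data = cv_train)

  # type = "response" returns expected counts instead of log-counts
  pred <- predict(fit, newdata = cv_test, type = "response")

  mae_model[i] <- mean(abs(pred - cv_test$num_late))

  # Naive baseline: always predict the mean count of the training folds
  mae_baseline[i] <- mean(abs(mean(cv_train$num_late) - cv_test$num_late))
}

cv_result <- c(model = mean(mae_model), baseline = mean(mae_baseline))
print(round(cv_result, 2))
```

Reporting the baseline next to the model error makes it easier to judge whether a given MAE is actually small relative to the spread of the response.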

Conclusion and Observations

Train delays, when not caused by external problems, should be random events that are hard to predict. However, our model (which has only 6 explanatory variables) is able to predict the number of late trains, which seems to suggest that the delays in SNCF’s trains are a structural problem. In fact, looking at the ANOVA and summary tables, a few quick solutions to this delay problem seem to appear:

  • Increasing the size of stations or their operational capacity: the number of late trains seems to increase as the number of trains in the station increases, which suggests that big train stations do not have the logistical capacity to handle the number of trains they serve.

  • There is an increase in the number of late trains in the summer months. This seems to suggest that the SNCF does not take enough precautions against heat-related problems such as track failures.

  • Finally, there is a clear relationship between the cities of arrival/departure and the number of late trains. This relationship exists even when controlling for the effect of all other variables. This suggests that the SNCF administration should investigate certain regions further, to better understand whether those regions face more infrastructure or labor issues, and take the necessary measures to bring them closer to the national average.

If you have any questions or suggestions about the post, please drop me a line!