Ruta completa de un único taxi

Inspirándonos en una visualización de Chris Whong, hemos querido simularla.

Para ello, se ha obtenido el dataset de esa visualización, disponible en http://chriswhong.github.io/nyctaxi/. De ese dataset se han seleccionado todos los datos de un día para un único taxi. Como el archivo descargado era tan grande y sólo necesitábamos un pequeño conjunto de este, aquí mostramos únicamente el CSV creado a partir de la obtención de los datos necesarios. Este CSV se llama DiaTaxi.csv.

Ahora veremos que con este dataset se ha realizado:

  1. Un pequeño tratamiento de los datos
  2. Obtención de las rutas de los trayectos a través de una llamada a la API de Google Maps dado que el dataset sólo nos da información sobre los puntos de origen y fin del trayecto.
In [3]:
#Importación de las librerias necesarias
import pandas as pd
from datetime import datetime
import math
import datetime
In [4]:
#Llamada al dataset con la información sobre los datos de un día para un único taxi
data = pd.read_csv("DiaTaxi.csv")
data
Out[4]:
Unnamed: 0 Unnamed: 0.1 medallion hack_license vendor_id rate_code store_and_fwd_flag pickup_datetime dropoff_datetime passenger_count trip_time_in_secs trip_distance pickup_longitude pickup_latitude dropoff_longitude dropoff_latitude pickup_hora dropoff_hora duracion_segundos duracion_minutos
0 0 4853067 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 03:32:37 2013-02-28 03:40:16 1 459 2.7 -73.998985 40.761002 -73.963074 40.766300 03:32:37 03:40:16 459.0 7.65
1 1 4858433 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:04:30 2013-02-28 04:10:20 1 349 1.6 -74.001343 40.739399 -73.991425 40.731869 04:04:30 04:10:20 350.0 5.83
2 2 4858434 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:12:01 2013-02-28 04:14:04 1 123 0.6 -73.995049 40.727573 -73.994247 40.721252 04:12:01 04:14:04 123.0 2.05
3 3 3855524 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:24:29 2013-02-28 04:46:04 1 1294 6.5 -73.983955 40.725346 -73.962120 40.798973 04:24:29 04:46:04 1295.0 21.58
4 4 4859952 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:56:03 2013-02-28 05:01:05 2 302 2.1 -73.990120 40.762035 -73.997993 40.737797 04:56:03 05:01:05 302.0 5.03
5 5 5209030 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 05:27:48 2013-02-28 05:39:55 1 726 4.9 -73.992142 40.764126 -73.935196 40.796272 05:27:48 05:39:55 727.0 12.12
6 6 5228844 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 05:42:59 2013-02-28 05:48:43 1 342 1.7 -73.941490 40.791965 -73.959343 40.774483 05:42:59 05:48:43 344.0 5.73
7 7 5216844 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 05:54:02 2013-02-28 05:56:44 1 162 1.0 -73.975533 40.752213 -73.985504 40.741493 05:54:02 05:56:44 162.0 2.70
8 8 4873477 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 06:13:25 2013-02-28 06:16:39 1 194 0.9 -73.994194 40.751144 -73.984360 40.761513 06:13:25 06:16:39 194.0 3.23
9 9 5218280 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 06:25:27 2013-02-28 06:31:40 1 372 1.6 -73.990211 40.756058 -73.969986 40.762260 06:25:27 06:31:40 373.0 6.22
10 10 5217862 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 06:40:07 2013-02-28 06:43:03 1 175 1.0 -73.975151 40.752060 -73.985313 40.741177 06:40:07 06:43:03 176.0 2.93
11 11 4886705 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 5 N 2013-02-28 06:50:21 2013-02-28 07:21:58 1 1896 17.3 -73.987671 40.738258 -73.782166 40.644760 06:50:21 07:21:58 1897.0 31.62
12 12 4864450 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 09:02:17 2013-02-28 09:47:10 1 2692 11.1 -73.865662 40.771130 -74.005043 40.730316 09:02:17 09:47:10 2693.0 44.88
13 13 4891613 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 10:04:27 2013-02-28 10:16:50 1 742 0.9 -73.984047 40.737625 -73.975143 40.749741 10:04:27 10:16:50 743.0 12.38
14 14 5250202 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 10:20:36 2013-02-28 10:35:34 1 898 1.0 -73.973465 40.752602 -73.984146 40.760006 10:20:36 10:35:34 898.0 14.97
15 15 4899879 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 10:39:24 2013-02-28 10:53:27 1 842 1.1 -73.986603 40.756931 -73.974510 40.758270 10:39:24 10:53:27 843.0 14.05
16 16 4891848 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 11:00:14 2013-02-28 11:06:41 1 387 1.1 -73.973816 40.763123 -73.984688 40.748859 11:00:14 11:06:41 387.0 6.45
17 17 5220486 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 11:07:58 2013-02-28 11:23:33 1 934 1.4 -73.985016 40.748283 -73.980049 40.761963 11:07:58 11:23:33 935.0 15.58
18 18 4889753 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 11:27:08 2013-02-28 11:33:02 1 353 1.6 -73.977539 40.766296 -73.968658 40.786140 11:27:08 11:33:02 354.0 5.90
19 19 5230129 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 12:01:06 2013-02-28 12:11:22 1 616 2.8 -73.973137 40.790482 -73.939919 40.805435 12:01:06 12:11:22 616.0 10.27
20 20 4358454 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:16:46 2013-02-28 14:20:19 1 213 0.9 -73.967979 40.802135 -73.975296 40.790077 14:16:46 14:20:19 213.0 3.55
21 21 5274637 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:21:31 2013-02-28 14:26:37 1 306 0.8 -73.976151 40.788754 -73.972382 40.781311 14:21:31 14:26:37 306.0 5.10
22 22 4364202 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:27:40 2013-02-28 14:40:42 1 782 1.7 -73.973557 40.779659 -73.961227 40.769127 14:27:40 14:40:42 782.0 13.03
23 23 4399762 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:43:46 2013-02-28 14:47:10 1 203 0.5 -73.966888 40.767292 -73.971664 40.760464 14:43:46 14:47:10 204.0 3.40
24 24 4425098 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:48:16 2013-02-28 15:01:48 1 812 1.0 -73.972458 40.759346 -73.981621 40.746754 14:48:16 15:01:48 812.0 13.53
25 25 5304216 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:03:46 2013-02-28 15:11:12 1 446 1.2 -73.983513 40.743973 -73.981995 40.757545 15:03:46 15:11:12 446.0 7.43
26 26 5268010 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:12:45 2013-02-28 15:22:34 1 588 1.4 -73.980980 40.759254 -74.002563 40.760693 15:12:45 15:22:34 589.0 9.82
27 27 4416107 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:35:48 2013-02-28 15:47:52 1 723 1.6 -74.004250 40.742203 -73.987595 40.753639 15:35:48 15:47:52 724.0 12.07
28 28 4404304 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:49:25 2013-02-28 15:56:20 1 415 1.0 -73.985001 40.753620 -74.000328 40.761204 15:49:25 15:56:20 415.0 6.92
29 29 4429201 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:57:38 2013-02-28 16:06:40 1 541 1.7 -73.999062 40.761135 -73.994888 40.742882 15:57:38 16:06:40 542.0 9.03
30 30 4437097 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 16:08:11 2013-02-28 16:27:03 1 1131 3.4 -73.993073 40.742371 -73.956802 40.771358 16:08:11 16:27:03 1132.0 18.87
31 31 5295737 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 16:29:03 2013-02-28 16:48:35 1 1171 3.9 -73.953987 40.770142 -73.984512 40.736214 16:29:03 16:48:35 1172.0 19.53
32 32 5285863 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 16:50:50 2013-02-28 16:59:34 1 523 1.4 -73.984261 40.737247 -73.995293 40.749680 16:50:50 16:59:34 524.0 8.73
33 33 5291394 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 17:00:09 2013-02-28 17:14:43 1 874 2.8 -73.994957 40.750187 -73.988258 40.727261 17:00:09 17:14:43 874.0 14.57
34 34 5302287 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 17:21:00 2013-02-28 17:23:54 1 174 0.5 -73.984116 40.725578 -73.982887 40.731018 17:21:00 17:23:54 174.0 2.90
35 35 5311526 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 17:28:46 2013-02-28 17:48:08 1 1161 2.6 -73.991341 40.732086 -73.982758 40.759644 17:28:46 17:48:08 1162.0 19.37
36 36 5286085 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 Y 2013-02-28 17:49:04 2013-02-28 17:52:25 1 200 0.4 -73.977997 40.760799 -73.984474 40.764027 17:49:04 17:52:25 201.0 3.35
37 37 5292518 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 17:53:46 2013-02-28 18:00:49 1 422 0.8 -73.984436 40.764111 -73.991989 40.759190 17:53:46 18:00:49 423.0 7.05
38 38 5310008 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 18:05:04 2013-02-28 18:16:54 1 710 2.0 -73.996635 40.753105 -73.976631 40.736141 18:05:04 18:16:54 710.0 11.83
39 39 4405495 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 18:18:55 2013-02-28 18:43:51 1 1495 2.9 -73.978798 40.737137 -73.982971 40.767139 18:18:55 18:43:51 1496.0 24.93
40 40 5326657 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 18:50:31 2013-02-28 18:57:21 1 410 0.8 -73.983330 40.760937 -73.991287 40.750076 18:50:31 18:57:21 410.0 6.83
41 41 4453207 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 19:49:45 2013-02-28 20:00:04 1 618 1.6 -73.983612 40.738155 -73.995911 40.717445 19:49:45 20:00:04 619.0 10.32
42 42 5326944 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 20:12:59 2013-02-28 20:21:33 1 513 2.1 -74.008774 40.713879 -73.987732 40.729984 20:12:59 20:21:33 514.0 8.57
43 43 5328064 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 20:29:13 2013-02-28 20:39:14 1 600 1.8 -73.990906 40.734627 -73.972176 40.745735 20:29:13 20:39:14 601.0 10.02
44 44 4437662 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 20:52:36 2013-02-28 21:12:16 1 1180 4.3 -73.996918 40.731537 -73.956650 40.771324 20:52:36 21:12:16 1180.0 19.67
45 45 1306888 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:20:07 2013-02-28 21:37:54 1 1066 3.3 -73.970596 40.761681 -73.993568 40.721722 21:20:07 21:37:54 1067.0 17.78
46 46 4441009 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:44:02 2013-02-28 21:48:17 1 255 1.0 -73.993988 40.724686 -73.999847 40.734581 21:44:02 21:48:17 255.0 4.25
47 47 5310477 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:49:41 2013-02-28 21:53:44 1 243 0.8 -73.997200 40.736732 -73.988594 40.745213 21:49:41 21:53:44 243.0 4.05
48 48 4449737 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:54:14 2013-02-28 21:56:35 1 140 0.8 -73.988571 40.745205 -73.994270 40.735416 21:54:14 21:56:35 141.0 2.35
49 49 4453338 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:57:17 2013-02-28 22:02:38 1 321 0.7 -73.994415 40.735252 -74.004173 40.732151 21:57:17 22:02:38 321.0 5.35
50 50 4452186 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:05:17 2013-02-28 22:15:49 1 632 3.5 -74.003792 40.732025 -73.990784 40.690674 22:05:17 22:15:49 632.0 10.53
51 51 4469555 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:21:18 2013-02-28 22:30:31 1 552 3.1 -73.988808 40.700756 -73.963615 40.713688 22:21:18 22:30:31 553.0 9.22
52 52 5322069 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:32:13 2013-02-28 22:37:47 1 334 1.3 -73.961754 40.713947 -73.939301 40.715984 22:32:13 22:37:47 334.0 5.57
53 53 4496765 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:41:35 2013-02-28 22:46:09 1 274 1.1 -73.939919 40.715862 -73.959610 40.714108 22:41:35 22:46:09 274.0 4.57
54 54 4459176 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:56:41 2013-02-28 23:04:28 1 466 2.0 -73.985687 40.726879 -74.003571 40.744179 22:56:41 23:04:28 467.0 7.78
55 55 5348712 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 23:05:43 2013-02-28 23:15:05 1 561 2.0 -74.005798 40.745010 -74.004761 40.723461 23:05:43 23:15:05 562.0 9.37
56 56 5345566 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 23:17:26 2013-02-28 23:22:15 1 288 1.5 -74.002434 40.729027 -73.988289 40.746529 23:17:26 23:22:15 289.0 4.82
57 57 4486866 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 23:23:36 2013-02-28 23:37:46 1 850 3.8 -73.988007 40.745441 -73.946434 40.775642 23:23:36 23:37:46 850.0 14.17
58 58 4460313 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 23:42:31 2013-02-28 23:57:31 1 900 6.9 -73.952095 40.777088 -74.005455 40.706661 23:42:31 23:57:31 900.0 15.00

Del mismo modo que en Jupiter Notebook "Limpieza de datos", calculamos la duración total de cada trayecto

In [5]:
pickup_hora = [pickup_time.split(" ")[1] for pickup_time in data["pickup_datetime"]]
dropoff_hora = [dropoff_time.split(" ")[1] for dropoff_time in data["dropoff_datetime"]]

data['pickup_hora'] = pd.Series(pickup_hora, index=data.index)
data['dropoff_hora'] = pd.Series(dropoff_hora, index=data.index)

duracion_segundos=[]
duracion_minutos=[]
date_format = "%Y-%m-%d %H:%M:%S"
for index, fecha in enumerate(data["pickup_datetime"]):
    a = datetime.strptime(data["pickup_datetime"][index], date_format)
    b = datetime.strptime(data["dropoff_datetime"][index], date_format)
    c = b - a
    segundos = c.total_seconds()
    duracion_segundos.append(segundos)
    minutos = c.total_seconds()/60
    duracion_minutos.append(minutos)
    
data['duracion_segundos'] = pd.Series(duracion_segundos, index=data.index)
data['duracion_minutos'] = pd.Series(duracion_minutos, index=data.index)
data.head()
Out[5]:
Unnamed: 0 Unnamed: 0.1 medallion hack_license vendor_id rate_code store_and_fwd_flag pickup_datetime dropoff_datetime passenger_count trip_time_in_secs trip_distance pickup_longitude pickup_latitude dropoff_longitude dropoff_latitude pickup_hora dropoff_hora duracion_segundos duracion_minutos
0 0 4853067 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 03:32:37 2013-02-28 03:40:16 1 459 2.7 -73.998985 40.761002 -73.963074 40.766300 03:32:37 03:40:16 459.0 7.650000
1 1 4858433 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:04:30 2013-02-28 04:10:20 1 349 1.6 -74.001343 40.739399 -73.991425 40.731869 04:04:30 04:10:20 350.0 5.833333
2 2 4858434 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:12:01 2013-02-28 04:14:04 1 123 0.6 -73.995049 40.727573 -73.994247 40.721252 04:12:01 04:14:04 123.0 2.050000
3 3 3855524 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:24:29 2013-02-28 04:46:04 1 1294 6.5 -73.983955 40.725346 -73.962120 40.798973 04:24:29 04:46:04 1295.0 21.583333
4 4 4859952 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:56:03 2013-02-28 05:01:05 2 302 2.1 -73.990120 40.762035 -73.997993 40.737797 04:56:03 05:01:05 302.0 5.033333

Como podemos ver en la tabla anterior, las rutas del taxi durante ese día se presentan desordenas y es muy importante que estas se presenten en orden temporal para crear la animación correctamente en la visualización en Carto, por tanto, las vamos a ordenar según la hora de pickup.

In [6]:
data = data.sort_values(by=['pickup_hora'])
data
Out[6]:
Unnamed: 0 Unnamed: 0.1 medallion hack_license vendor_id rate_code store_and_fwd_flag pickup_datetime dropoff_datetime passenger_count trip_time_in_secs trip_distance pickup_longitude pickup_latitude dropoff_longitude dropoff_latitude pickup_hora dropoff_hora duracion_segundos duracion_minutos
0 0 4853067 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 03:32:37 2013-02-28 03:40:16 1 459 2.7 -73.998985 40.761002 -73.963074 40.766300 03:32:37 03:40:16 459.0 7.650000
1 1 4858433 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:04:30 2013-02-28 04:10:20 1 349 1.6 -74.001343 40.739399 -73.991425 40.731869 04:04:30 04:10:20 350.0 5.833333
2 2 4858434 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:12:01 2013-02-28 04:14:04 1 123 0.6 -73.995049 40.727573 -73.994247 40.721252 04:12:01 04:14:04 123.0 2.050000
3 3 3855524 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:24:29 2013-02-28 04:46:04 1 1294 6.5 -73.983955 40.725346 -73.962120 40.798973 04:24:29 04:46:04 1295.0 21.583333
4 4 4859952 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 04:56:03 2013-02-28 05:01:05 2 302 2.1 -73.990120 40.762035 -73.997993 40.737797 04:56:03 05:01:05 302.0 5.033333
5 5 5209030 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 05:27:48 2013-02-28 05:39:55 1 726 4.9 -73.992142 40.764126 -73.935196 40.796272 05:27:48 05:39:55 727.0 12.116667
6 6 5228844 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 05:42:59 2013-02-28 05:48:43 1 342 1.7 -73.941490 40.791965 -73.959343 40.774483 05:42:59 05:48:43 344.0 5.733333
7 7 5216844 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 05:54:02 2013-02-28 05:56:44 1 162 1.0 -73.975533 40.752213 -73.985504 40.741493 05:54:02 05:56:44 162.0 2.700000
8 8 4873477 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 06:13:25 2013-02-28 06:16:39 1 194 0.9 -73.994194 40.751144 -73.984360 40.761513 06:13:25 06:16:39 194.0 3.233333
9 9 5218280 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 06:25:27 2013-02-28 06:31:40 1 372 1.6 -73.990211 40.756058 -73.969986 40.762260 06:25:27 06:31:40 373.0 6.216667
10 10 5217862 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 06:40:07 2013-02-28 06:43:03 1 175 1.0 -73.975151 40.752060 -73.985313 40.741177 06:40:07 06:43:03 176.0 2.933333
11 11 4886705 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 5 N 2013-02-28 06:50:21 2013-02-28 07:21:58 1 1896 17.3 -73.987671 40.738258 -73.782166 40.644760 06:50:21 07:21:58 1897.0 31.616667
12 12 4864450 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 09:02:17 2013-02-28 09:47:10 1 2692 11.1 -73.865662 40.771130 -74.005043 40.730316 09:02:17 09:47:10 2693.0 44.883333
13 13 4891613 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 10:04:27 2013-02-28 10:16:50 1 742 0.9 -73.984047 40.737625 -73.975143 40.749741 10:04:27 10:16:50 743.0 12.383333
14 14 5250202 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 10:20:36 2013-02-28 10:35:34 1 898 1.0 -73.973465 40.752602 -73.984146 40.760006 10:20:36 10:35:34 898.0 14.966667
15 15 4899879 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 10:39:24 2013-02-28 10:53:27 1 842 1.1 -73.986603 40.756931 -73.974510 40.758270 10:39:24 10:53:27 843.0 14.050000
16 16 4891848 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 11:00:14 2013-02-28 11:06:41 1 387 1.1 -73.973816 40.763123 -73.984688 40.748859 11:00:14 11:06:41 387.0 6.450000
17 17 5220486 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 11:07:58 2013-02-28 11:23:33 1 934 1.4 -73.985016 40.748283 -73.980049 40.761963 11:07:58 11:23:33 935.0 15.583333
18 18 4889753 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 11:27:08 2013-02-28 11:33:02 1 353 1.6 -73.977539 40.766296 -73.968658 40.786140 11:27:08 11:33:02 354.0 5.900000
19 19 5230129 1B5C0970F2AE8CFFBA8AE4584BEAED29 F444BD8FE21D550FD8836840A6F6629D CMT 1 N 2013-02-28 12:01:06 2013-02-28 12:11:22 1 616 2.8 -73.973137 40.790482 -73.939919 40.805435 12:01:06 12:11:22 616.0 10.266667
20 20 4358454 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:16:46 2013-02-28 14:20:19 1 213 0.9 -73.967979 40.802135 -73.975296 40.790077 14:16:46 14:20:19 213.0 3.550000
21 21 5274637 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:21:31 2013-02-28 14:26:37 1 306 0.8 -73.976151 40.788754 -73.972382 40.781311 14:21:31 14:26:37 306.0 5.100000
22 22 4364202 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:27:40 2013-02-28 14:40:42 1 782 1.7 -73.973557 40.779659 -73.961227 40.769127 14:27:40 14:40:42 782.0 13.033333
23 23 4399762 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:43:46 2013-02-28 14:47:10 1 203 0.5 -73.966888 40.767292 -73.971664 40.760464 14:43:46 14:47:10 204.0 3.400000
24 24 4425098 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 14:48:16 2013-02-28 15:01:48 1 812 1.0 -73.972458 40.759346 -73.981621 40.746754 14:48:16 15:01:48 812.0 13.533333
25 25 5304216 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:03:46 2013-02-28 15:11:12 1 446 1.2 -73.983513 40.743973 -73.981995 40.757545 15:03:46 15:11:12 446.0 7.433333
26 26 5268010 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:12:45 2013-02-28 15:22:34 1 588 1.4 -73.980980 40.759254 -74.002563 40.760693 15:12:45 15:22:34 589.0 9.816667
27 27 4416107 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:35:48 2013-02-28 15:47:52 1 723 1.6 -74.004250 40.742203 -73.987595 40.753639 15:35:48 15:47:52 724.0 12.066667
28 28 4404304 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:49:25 2013-02-28 15:56:20 1 415 1.0 -73.985001 40.753620 -74.000328 40.761204 15:49:25 15:56:20 415.0 6.916667
29 29 4429201 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 15:57:38 2013-02-28 16:06:40 1 541 1.7 -73.999062 40.761135 -73.994888 40.742882 15:57:38 16:06:40 542.0 9.033333
30 30 4437097 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 16:08:11 2013-02-28 16:27:03 1 1131 3.4 -73.993073 40.742371 -73.956802 40.771358 16:08:11 16:27:03 1132.0 18.866667
31 31 5295737 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 16:29:03 2013-02-28 16:48:35 1 1171 3.9 -73.953987 40.770142 -73.984512 40.736214 16:29:03 16:48:35 1172.0 19.533333
32 32 5285863 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 16:50:50 2013-02-28 16:59:34 1 523 1.4 -73.984261 40.737247 -73.995293 40.749680 16:50:50 16:59:34 524.0 8.733333
33 33 5291394 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 17:00:09 2013-02-28 17:14:43 1 874 2.8 -73.994957 40.750187 -73.988258 40.727261 17:00:09 17:14:43 874.0 14.566667
34 34 5302287 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 17:21:00 2013-02-28 17:23:54 1 174 0.5 -73.984116 40.725578 -73.982887 40.731018 17:21:00 17:23:54 174.0 2.900000
35 35 5311526 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 17:28:46 2013-02-28 17:48:08 1 1161 2.6 -73.991341 40.732086 -73.982758 40.759644 17:28:46 17:48:08 1162.0 19.366667
36 36 5286085 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 Y 2013-02-28 17:49:04 2013-02-28 17:52:25 1 200 0.4 -73.977997 40.760799 -73.984474 40.764027 17:49:04 17:52:25 201.0 3.350000
37 37 5292518 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 17:53:46 2013-02-28 18:00:49 1 422 0.8 -73.984436 40.764111 -73.991989 40.759190 17:53:46 18:00:49 423.0 7.050000
38 38 5310008 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 18:05:04 2013-02-28 18:16:54 1 710 2.0 -73.996635 40.753105 -73.976631 40.736141 18:05:04 18:16:54 710.0 11.833333
39 39 4405495 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 18:18:55 2013-02-28 18:43:51 1 1495 2.9 -73.978798 40.737137 -73.982971 40.767139 18:18:55 18:43:51 1496.0 24.933333
40 40 5326657 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 18:50:31 2013-02-28 18:57:21 1 410 0.8 -73.983330 40.760937 -73.991287 40.750076 18:50:31 18:57:21 410.0 6.833333
41 41 4453207 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 19:49:45 2013-02-28 20:00:04 1 618 1.6 -73.983612 40.738155 -73.995911 40.717445 19:49:45 20:00:04 619.0 10.316667
42 42 5326944 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 20:12:59 2013-02-28 20:21:33 1 513 2.1 -74.008774 40.713879 -73.987732 40.729984 20:12:59 20:21:33 514.0 8.566667
43 43 5328064 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 20:29:13 2013-02-28 20:39:14 1 600 1.8 -73.990906 40.734627 -73.972176 40.745735 20:29:13 20:39:14 601.0 10.016667
44 44 4437662 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 20:52:36 2013-02-28 21:12:16 1 1180 4.3 -73.996918 40.731537 -73.956650 40.771324 20:52:36 21:12:16 1180.0 19.666667
45 45 1306888 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:20:07 2013-02-28 21:37:54 1 1066 3.3 -73.970596 40.761681 -73.993568 40.721722 21:20:07 21:37:54 1067.0 17.783333
46 46 4441009 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:44:02 2013-02-28 21:48:17 1 255 1.0 -73.993988 40.724686 -73.999847 40.734581 21:44:02 21:48:17 255.0 4.250000
47 47 5310477 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:49:41 2013-02-28 21:53:44 1 243 0.8 -73.997200 40.736732 -73.988594 40.745213 21:49:41 21:53:44 243.0 4.050000
48 48 4449737 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:54:14 2013-02-28 21:56:35 1 140 0.8 -73.988571 40.745205 -73.994270 40.735416 21:54:14 21:56:35 141.0 2.350000
49 49 4453338 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 21:57:17 2013-02-28 22:02:38 1 321 0.7 -73.994415 40.735252 -74.004173 40.732151 21:57:17 22:02:38 321.0 5.350000
50 50 4452186 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:05:17 2013-02-28 22:15:49 1 632 3.5 -74.003792 40.732025 -73.990784 40.690674 22:05:17 22:15:49 632.0 10.533333
51 51 4469555 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:21:18 2013-02-28 22:30:31 1 552 3.1 -73.988808 40.700756 -73.963615 40.713688 22:21:18 22:30:31 553.0 9.216667
52 52 5322069 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:32:13 2013-02-28 22:37:47 1 334 1.3 -73.961754 40.713947 -73.939301 40.715984 22:32:13 22:37:47 334.0 5.566667
53 53 4496765 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:41:35 2013-02-28 22:46:09 1 274 1.1 -73.939919 40.715862 -73.959610 40.714108 22:41:35 22:46:09 274.0 4.566667
54 54 4459176 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 22:56:41 2013-02-28 23:04:28 1 466 2.0 -73.985687 40.726879 -74.003571 40.744179 22:56:41 23:04:28 467.0 7.783333
55 55 5348712 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 23:05:43 2013-02-28 23:15:05 1 561 2.0 -74.005798 40.745010 -74.004761 40.723461 23:05:43 23:15:05 562.0 9.366667
56 56 5345566 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 23:17:26 2013-02-28 23:22:15 1 288 1.5 -74.002434 40.729027 -73.988289 40.746529 23:17:26 23:22:15 289.0 4.816667
57 57 4486866 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 23:23:36 2013-02-28 23:37:46 1 850 3.8 -73.988007 40.745441 -73.946434 40.775642 23:23:36 23:37:46 850.0 14.166667
58 58 4460313 1B5C0970F2AE8CFFBA8AE4584BEAED29 D961332334524990D1BBD462E2EFB8A4 CMT 1 N 2013-02-28 23:42:31 2013-02-28 23:57:31 1 900 6.9 -73.952095 40.777088 -74.005455 40.706661 23:42:31 23:57:31 900.0 15.000000

Ahora guardamos el archivo modificado pues lo vamos a tener que usar fuera de Jupypter Notebook.

In [7]:
data.to_csv("DiaTaxi.csv")

El siguiente script no ha sido corrido en Jupyter Notebook pues forma parte de una librería encontrada para convertir las rutas de Google Maps en un archivo geojson y para ello, eran necesarios correr algunos otros archivos y requerimientos de la librería.

La librería encontrada y usada fue: https://github.com/kexin-zhang/gmaps2geojson

El script principal usado ha sido el siguiente, en el cual se informa explicitamente mediante comentarios qué fue lo que se cambió del código para adaptarlo a nuestro caso

In [ ]:
import requests
import polyline
import json
import os
import pandas as pd   #Necesitamos la librería pandas, por lo que la importamos

#Aquí utilizo mi key de Google Maps en ese momento, si lo quieres correr por tu cuenta, seguramente deberás usar otra.
os.environ["GMAPS_KEY"] = "AIzaSyCYaEtCgI-orYSze2msqULpaK-g0gl5jdQ"

class Writer:
    def __init__(self):
        self.features = []

    def query(self, src, dest, custom_label = None):
        request_url = 'https://maps.googleapis.com/maps/api/directions/json?origin="{0}"&destination="{1}"&key={2}'
        .format(src, dest, os.environ["GMAPS_KEY"])
        try:
            r = requests.get(request_url)
            results = r.json()
            route = results['routes'][0]['overview_polyline']['points']
            coords = polyline.decode(route)
        except Exception as e:
            print("Error querying for directions for {0} tp {1} -- {2}".format(src, dest, str(e)))
            return

        #reverse order to comply with geojson spec
        coords_list = [[lon, lat] for lat, lon in coords]

        default_name = "{0} to {1}".format(src, dest)
        self.features.append({
            "type": "Feature",
            "properties": {
                "name": custom_label or default_name
            },
            "geometry": {
                "type": "MultiLineString",
                "coordinates": [coords_list]
            }
        })
        return [[lat, lon] for lat, lon in coords]

    def save(self, filename):
        geojson = {"type": "FeatureCollection", "features": self.features}
        with open(filename, "w") as out:
            json.dump(geojson, out)

#Código adaptado a nuestro caso a partir de aquí.
#Se hace una llamada en bucle de los pickups y dropoffs de todos los puntos que tenemos 
#y se guardan en un archivo geojson
filename = "DiaTaxi.csv"
data = pd.read_csv(filename)

writer = Writer()
for index, latitude in enumerate(data["pickup_longitude"]):
    s = ",";
    seq_pickup = (str(data["pickup_latitude"][index]), str(data["pickup_longitude"][index]))
    pickup = s.join(seq_pickup)
    
    seq_dropoff = (str(data["dropoff_latitude"][index]), str(data["dropoff_longitude"][index]))
    dropoff = s.join(seq_dropoff)
    
    writer.query(pickup, dropoff)

writer.save("DiaTaxi.geojson")

Este script, llamado "DiaTaxiToGeojson.py", dentro de la carpeta "DiaTaxi-get-geojson", como hemos dicho, simplemente se ha corrido en Visual Studio y nos ha dado un archivo GEOJSON utilizado para:

  1. Incluirlo como capa de las rutas en Carto
  2. Conseguir un CSV con cada conjunto de coordenadas recorridas por el taxi, expuestas en un determinado intervalo de tiempo para crear el efecto animación

Para conseguir lo segundo, necesitamos trabajar con los datos del archivo geojson conseguido pero en formato csv por lo que transformamos el archivo geojson en csv mediante esta página: http://www.convertcsv.com/geojson-to-csv.htm (no se si habrá algún modo de hacerlo a través de Python).

In [9]:
#Llamada al dataset
filename = "DiaTaxiGeojson.csv"
data = pd.read_csv(filename)
data.head()
Out[9]:
latitude longitude altitude geometry coordinates name
0 NaN NaN NaN MultiLineString -73.99898,40.761,-73.99892,40.76108,-73.99832,... 40.76100200000001,-73.998985 to 40.7663,-73.96...
1 NaN NaN NaN MultiLineString -74.00142,40.73929,-73.993,40.73575,-73.99162,... 40.739399,-74.001343 to 40.731869,-73.991425
2 NaN NaN NaN MultiLineString -73.995,40.72755,-73.99586,40.72653,-73.99507,... 40.727573,-73.995049 to 40.721252,-73.994247
3 NaN NaN NaN MultiLineString -73.98404,40.72538,-73.98397,40.72545,-73.9835... 40.725346,-73.98395500000002 to 40.798973,-73....
4 NaN NaN NaN MultiLineString -73.99005,40.76201,-73.99054,40.76131,-73.9912... 40.762035,-73.99011999999998 to 40.737797,-73....

Podemos ver que, en la columna "coordinates" tenemos el conjunto de coordenadas de cada ruta seguida. Para que estas sean útiles, tendremos que poner cada conjunto de coordenadas en una fila diferente, diferenciando entre latitud y longitud.

In [10]:
#Comprobación de modo en que está expresado uno de los conjuntos de coordenadas
data["coordinates"][0]
Out[10]:
'-73.99898,40.761,-73.99892,40.76108,-73.99832,40.76084,-73.99624,40.75997,-73.99546,40.75965,-73.99498,40.76031,-73.99406,40.76156,-73.99316,40.7628,-73.99224,40.76405,-73.99042,40.76656,-73.98997,40.7672,-73.98989,40.76729,-73.98906,40.76846,-73.98806,40.7698,-73.98714,40.77105,-73.9862,40.77229,-73.98578,40.77292,-73.98533,40.77354,-73.98486,40.77416,-73.98372,40.77367,-73.98342,40.77354,-73.98249,40.77315,-73.98188,40.77292,-73.97903,40.77173,-73.97894,40.77171,-73.97878,40.77169,-73.97865,40.7717,-73.97842,40.77174,-73.97814,40.7718,-73.97795,40.77179,-73.97763,40.77168,-73.97719,40.77139,-73.97563,40.77048,-73.97432,40.76962,-73.97377,40.76932,-73.97347,40.76919,-73.97329,40.76913,-73.97272,40.76898,-73.97238,40.76885,-73.97174,40.76869,-73.97135,40.76855,-73.97081,40.76828,-73.97061,40.76818,-73.96867,40.76735,-73.96544,40.76598,-73.96503,40.76581,-73.96384,40.76531,-73.96347,40.76583,-73.96311,40.76632'

Incluimos cada una de las coordendas en una lista (todavía no diferenciamos entre latitud y longitud)

In [11]:
lista_coordenadas = []
for index, latitude in enumerate(data["coordinates"]):
    result = data["coordinates"][index].split(",")
    lista_coordenadas.append(result)

#Comprobación
lista_coordenadas[0][4]
Out[11]:
'-73.99832'

Como en nuestro caso todas las longitudes son negativas y todas las latitudes, positivas, utilizamos esto para diferenciar entre ambas y agregar cada conjunto en una lista separada.

In [12]:
longitudes = []
latitudes = []
        
for list in lista_coordenadas:
     for number in list:
            number = float(number)
            if number < 0:
                longitudes.append(number)
            else: 
                latitudes.append(number)
In [13]:
len(latitudes)
Out[13]:
2584

Por último, vamos a necesitar una columna que hable del momento en el que el taxi están en cada uno de los puntos de las rutas.

Idealmente, tenemos:

  1. El momento exacto en el que el taxi empieza y termina cada ruta
  2. La duración del trayecto
  3. El conjunto de coordenadas que se recorren en cada ruta

Con estos datos, podríamos adaptar cada conjunto de coordenadas a un momento más o menos preciso en el que el taxi pasó por ese punto pero esto es algo tedioso de conseguir puesto que:

  • No todos los trayectos tienen la misma duración
  • No todas las rutas tienen el mismo número de conjuntos de coordenadas
  • Para cada ruta, tenemos que tener en cuenta su hora exacta de pickup y dropoff

No parece que hacer esta diferenciación temporal merezca mucho la pena para una animación tan cortita por lo que se ha hecho algo más simple: Teniendo en cuenta el número de conjuntos de coordenadas que hay (2584), se ha expandido cada una de estas a lo largo de una franja horaria de 16 horas por lo que presentamos cada conjunto de coordenadas cada 23 segundos.

Hemos establecido 16 horas debido a que el taxi es conducido por 2 taxistas, asignandoles a cada uno una jornada laboral de 8 horas.

In [14]:
#Redondeo a la alza
math.ceil(16*60*60/len(latitudes))
Out[14]:
23

Establecemos el inicio del dia, con el primer conjunto de coordenadas, a las 8 de la mañana y a partir de ahí se le asigna una hora equidistante a cada conjunto de coordenadas.

In [17]:
lista = []
a = datetime.datetime(2013,2,28,8,0,0)
segundos = 23
for i in range(0,len(latitudes),1):
    a = a + datetime.timedelta(0,segundos)
    lista.append(a)

#Comprobación
lista[0:10]
Out[17]:
[datetime.datetime(2013, 2, 28, 8, 0, 23),
 datetime.datetime(2013, 2, 28, 8, 0, 46),
 datetime.datetime(2013, 2, 28, 8, 1, 9),
 datetime.datetime(2013, 2, 28, 8, 1, 32),
 datetime.datetime(2013, 2, 28, 8, 1, 55),
 datetime.datetime(2013, 2, 28, 8, 2, 18),
 datetime.datetime(2013, 2, 28, 8, 2, 41),
 datetime.datetime(2013, 2, 28, 8, 3, 4),
 datetime.datetime(2013, 2, 28, 8, 3, 27),
 datetime.datetime(2013, 2, 28, 8, 3, 50)]
In [18]:
#Creación del dataframe necesario
df = pd.DataFrame({'latitudes':latitudes, 'longitudes':longitudes, 'fecha': lista})
df.head()
Out[18]:
latitudes longitudes fecha
0 40.76100 -73.99898 2013-02-28 08:00:23
1 40.76108 -73.99892 2013-02-28 08:00:46
2 40.76084 -73.99832 2013-02-28 08:01:09
3 40.75997 -73.99624 2013-02-28 08:01:32
4 40.75965 -73.99546 2013-02-28 08:01:55

Una vez terminado el proceso, guardamos el dataframe en un CSV

In [20]:
df.to_csv("puntos_ruta_dia_taxi.csv")