Sunday, 5 February 2012

Open Data Release

After weeks of speculation, backtracking, votes, letters to the editor, hand-wringing, accusations, rescinded business plans, and finally a vote... the GPS data will flow on March 22nd, as per the orders of OC Transpo's political commission.

This is all such a strange, overblown issue. The company itself thinks that GPS data has a monetary value and could be used to generate revenue. They're likely right in that regard, although I question whether the company could put together a tendering process in a timely manner and actually use the tender to generate revenue before GPS data technology gets replaced by some other technology. Public tender is such a slow process. I can't imagine what it would accomplish.

Having said that, even the smallest amount of forethought into the use of GPS data at the management level of OC Transpo could have produced an in-house app long before the general public even thought they were missing out on something. After all, the company is already monitoring the data within its own data infrastructure.

Prior to the installation of the new GPS systems, I was a lone voice criticizing the fact that all OC Transpo technology seemed to be used almost solely to supervise its employees instead of to help employees help customers. (Click here and scroll to « Everything is Automatic, Everything is Skin Deep ») There seems to be an odd way of thinking in the transit industry, and it's an epidemic of putting the service provider's needs ahead of the service itself. It took three different tracking technologies installed on OC Transpo buses before we finally got one that could actually provide the customer with information, and the company is still dithering over whether to actually use it in that vein.

It seems that there is way too much reluctance on the part of municipal entities to gear the service towards the taxpayer, instead of the managers. Look how long it has taken to put arena bookings online. The city moves at the speed of bread. We pay for this information through our taxes. It is not City data, it is taxpayer data. They have made the right decision to push it forward, but that still doesn't mean ALL the data needs to be released. All the tinfoil-hat wearing going on throughout the city and the forums that discuss these issues is getting tiresome. I've heard everything from terrorism worries to a big municipal cover-up on deadheading buses.

You may take the bus to work. You may even have a few ideas on how to make the service better. That does not make you an expert on scheduling buses. This is the real problem with open data. We live in a society where Idol watchers think they are experts. They get all the terminology of singing down pat... "He's too sharp", "Her falsetto breaks too early"... and yet they're just repeating what they have heard other people say. Cherry-picking a few deadheading buses from a data set made up of hundreds of thousands of time-points and trips could produce wasteful, out-of-context opinions of the service that are grossly unwarranted. In context, however, the data would represent something very different. It takes expertise to analyze this kind of data, folks. Simply being a bus rider doesn't make you Paul Harvey, even if you could look at all that information.

Ken Gray posted the following from transit activist Tim Lane:
Mr. Mayor and Councillors:
I believe that one of the main reasons OC Transpo are
so reluctant to release raw GPS data, is that it would allow
savvy outside programmers to track deadheading buses,
buses laying over, and buses that change routes throughout
the day.
And, Transpo doesn’t want this data to fall into the wrong
hands, because it might show, on careful analysis, that
many of the things that Transpo does with our buses,
in the name of “increased efficiency” in the operation of
the system, actually do the opposite.
They waste buses, and drivers, they use the wrong type
of bus on the wrong routes, they have buses sitting idle
at the busiest times of the day, they cause delays on one
route to quickly spread throughout other routes, and thus
all through the system, they cause drivers to forego much-
needed breaks, etc.
More specifically, the performance we have been getting
from our much-more-costly-to-purchase hybrid buses has
been considerably worse that the original expectations,
How much of this poorer performance can be laid at the
doorstep of one of OC Transpo’s most cherished operating
philosophies – Interlining?
I would love to have you Transit Commissioners ask that question,
and require OC Transpo to back up with raw data, their response.
Thank You,
Tim Lane

The X Files would be proud of this theory. He's partially right, in that the company should protect the non-revenue statistics from people like him. If programmers think they're getting the empty bus data, they're dreaming. Let's be honest here. The company will scrape the data to produce revenue service data only, and that's exactly what they should be doing. Do you really want an OC Transpo Failblog app that only sends out data on broken-down buses, buses involved in accidents, buses held up due to problem passengers, buses doing charter work, buses in transit for repairs, buses laying up between runs, buses driving empty to start or finish their assignment, buses laid up as extra buses waiting to fill in for other routes that have the aforementioned list of delays, buses stuck in snow, buses housing the tenants of evacuated apartment buildings, training buses, buses with defective GPS units, buses on road tests for safety issues, buses being towed... are you getting the picture yet?

No, we don't want all that. The open data should produce a product that accurately represents in real time what the revenue service is doing in the context of the posted schedule, and how late or early a specific route may be along its planned timeline. The general public really gains no benefit from knowing the rest of the data, because it doesn't affect their specific route.
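For the programmers in the audience, here is roughly what I mean, as a minimal sketch only. I have no idea what the released feed will actually look like, so the `BusReport` record, its fields, and the sample values below are all made up for illustration. The idea is simply to drop anything not in revenue service, then compare the observed time at a time-point against the posted schedule.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class BusReport:
    """Hypothetical GPS report; not the real OC Transpo feed format."""
    bus_id: str
    route: Optional[str]           # None when the bus is not in revenue service
    time_point: str                # name of the stop or time-point reported
    scheduled: Optional[datetime]  # posted schedule time; None if off-schedule
    observed: datetime             # when the bus actually reached the time-point


def revenue_only(reports: List[BusReport]) -> List[BusReport]:
    """Keep only reports from buses running a scheduled, in-service trip."""
    return [r for r in reports if r.route is not None and r.scheduled is not None]


def minutes_late(report: BusReport) -> float:
    """Positive means late, negative means early, relative to the schedule."""
    return (report.observed - report.scheduled).total_seconds() / 60


def schedule_status(reports: List[BusReport]) -> Dict[str, float]:
    """Map each route to the lateness (in minutes) of its latest report."""
    status: Dict[str, float] = {}
    for r in sorted(revenue_only(reports), key=lambda r: r.observed):
        status[r.route] = minutes_late(r)
    return status


# Made-up sample: one bus in service, one deadheading back to the garage.
sample = [
    BusReport("A", "1", "Stop A", datetime(2012, 3, 22, 8, 0), datetime(2012, 3, 22, 8, 4)),
    BusReport("B", None, "Garage", None, datetime(2012, 3, 22, 8, 5)),
]
for route, late in schedule_status(sample).items():
    print(f"Route {route}: {late:+.0f} min against the schedule")
```

The deadheading bus never reaches the rider's screen, which is the whole point: the rider sees how their route is doing against the schedule and nothing else.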

Now on the other hand, if you wanted to release the rest of that data to council... fine. I doubt they have the expertise to analyze it properly, but at least they could refer the data to the AG for some constructive analysis instead of having random programmers put the data into non-contextual apps for the sole purpose of manipulating it to whatever purpose suits their arguments.

The data will be available March 22nd according to the transit commission vote. This coincides with the first day of the spring booking for drivers. I'm anxious to see what the programmers will do with the data. I plan to follow myself in real time and report to you how accurate the apps may be. You can all figure out which bus is mine: just look for the one driving in circles.
