How Do Smartphone Data Compare To Conventional Data When It Comes To Bicycling Activity?

Active transport is non-motorized travel, like walking and cycling, for practical purposes. Examinations of active transport are limited because of a lack of data that can be used to study these modes. City planners and other stakeholders are interested in learning more about how to encourage and facilitate active transport as it has potential positive effects including health benefits from increased physical activity and lowered carbon emissions in urban areas.

Our study entitled “Comparing spatial patterns of crowdsourced and conventional bicycling datasets,” published in Applied Geography, investigated whether new smartphone apps can be used to fill the data gap when studying active transport modes.

Trips that people take when walking and cycling tend to be shorter in distance and duration; because they occur at finer scales of movement, studying them requires more detailed data than their motorized counterparts. Cycling activity, in particular, is difficult to capture because conventional methods for collecting data are limited. One of the most common ways to collect cycling data is to conduct a manual count. During a typical manual count, people who are positioned at selected intersections tally the number of cyclists who pass through said intersection, usually recording the direction of travel as well.

While this technique has the benefit of recording every cyclist that passes by, it has a number of notable limitations. Route information (e.g., path taken, duration, and distance of ride), purpose of travel, and cyclist demographics cannot be gathered. Count locations are limited in number and therefore cannot be distributed evenly through a city, which limits the conclusions that can be drawn about cycling activity. Further, manual counts are typically conducted for short periods, such as a few hours or days once or twice a year, which constrains their generalizability in terms of temporal trends (hourly, daily, or seasonal patterns of ridership). Automatic counters exist and are operational in many cities, which addresses the temporal limitations; however, they still suffer from the same spatial and informational problems as manual counts.

The proliferation of smartphone technologies, GPS in particular, offers new avenues for crowdsourcing cycling data and has the potential to overcome the limitations of conventional data. Smartphone-based apps that track activity utilize built-in GPS functionality to record where and when people ride their bicycles (among other activities). These detailed route data are therefore less limited in spatial and temporal scope than conventional data sources. They also often contain some basic demographic characteristics about the user and the purpose of their trips.

Despite potential benefits, there remain questions as to whether these crowdsourced cycling data are generalizable to broader populations of riders. Recording your cycling trips via smartphone app requires the motivation, access, and resources (e.g., money, time) to participate, so the data may be biased in terms of who is represented. Aging and elderly, student, low-income, or occasional cyclists may be absent from these datasets. If planning and policy decisions relating to cycling are made based on biased information, problems with transport inequity and safety could be exacerbated.

Given differences between conventional and crowdsourced cycling data, we investigated how manual count data correspond to smartphone cycling data in Sydney, Australia. Specifically, we examined differences between a local one-day manual count and crowdsourced ridership data from the cycling app Strava. Because Strava has a focus on fitness and competition, its data tend to represent recreational riders rather than those using cycling for commuting and other transport purposes.

We then asked whether there are socioeconomic or infrastructure characteristics that relate to the differences between datasets. Results highlighted where differences in bicycling ridership occurred and have implications for using crowdsourced data in planning and policy contexts.

The highest proportions of bicycle ridership occurred in the Sydney central business district for both datasets, though the manual count and crowdsourced data differed in patterns outside this area. When people cycle for transport, we would expect high levels of ridership in business districts and other employment centers.

Our analysis aimed to show where there were mismatches in relative ridership patterns between the two datasets. The Strava data showed high ridership proportions in the eastern Sydney suburbs which are not associated with expected centers of activity. A potential explanation is that Strava users are indeed seeking areas outside the urban core where they can partake in longer rides focused on fitness and recreation rather than transport.

We used rank difference to measure the relative importance of each count location within the data and to determine where there were locations of similar and dissimilar ridership, beyond what can be detected by examining high and low ridership alone. Our findings indicated that few locations had low rank differences relative to their neighbors. Low rank difference indicates that there is a similarity between the manual count and Strava data at that location. These locations, therefore, are those where planners and policymakers may consider the more-detailed crowdsourced data in the decision-making process. Potential explanations for the sociodemographic influence on ridership include little bicycle infrastructure (e.g., designated bicycle lanes), high rates of using public transport as a mode of travel for the journey to work, little residential land, and low relative disadvantage scores in terms of social and economic conditions.
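The rank-difference idea can be illustrated with a small sketch. This is not the study's actual procedure, which the article does not spell out; it assumes each location is ranked within each dataset (1 = highest ridership, ties sharing an average rank) and that the difference is taken as manual rank minus Strava rank. The function names and the counts are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (highest) downward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend the run while the next value ties with the current one.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rank_difference(manual_counts, strava_counts):
    """Manual rank minus Strava rank for each count location (assumed convention)."""
    manual_ranks = average_ranks(manual_counts)
    strava_ranks = average_ranks(strava_counts)
    return [m - s for m, s in zip(manual_ranks, strava_ranks)]


# Hypothetical ridership at four count locations (not data from the study).
manual = [120, 45, 80, 10]
strava = [300, 20, 250, 260]

differences = rank_difference(manual, strava)
print(differences)  # [0.0, -1.0, -1.0, 2.0]
```

Under this convention, a value near zero means the location holds a similar relative position in both datasets, while a large positive value (the last location here) means it matters more in the Strava data than in the manual count.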

Even fewer count locations showed a dissimilarity in their ridership patterns. In all instances, the Strava data showed higher ridership than the manual count data. While these locations were popular in the Strava dataset, they were located away from bicycling infrastructure, which again indicates that Strava users are likely focused on fitness and recreation. Despite the lack of infrastructure, roads on the outskirts of the city still support cycling activity. Further, these areas scored higher in socioeconomic status, which is typically associated with Strava ridership.

The majority of locations had no spatial association, meaning that their rank difference did not differ from what we would expect in a random spatial process. These locations tended to have high bicycle infrastructure density, and since many were located in the city center and Sydney harbor, there may be fewer route choices for cyclists to take, which would lead to similar patterns of ridership between both data sources.
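One way to read "what we would expect in a random spatial process" is as a permutation test. The sketch below is an illustration only. The article does not name the statistic used, so local Moran's I is an assumption here, as are the function names, the neighbor lists, and the rank-difference values. For each location, the values at the other locations are randomly reassigned to its neighbors many times, and the observed statistic is compared with the simulated ones.

```python
import random


def local_moran(values, neighbors):
    """Local Moran's I per location, using the mean of neighboring deviations.

    Assumes every location has at least one neighbor and the values are not all equal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # variance of the values
    stats = []
    for i in range(n):
        lag = sum(dev[j] for j in neighbors[i]) / len(neighbors[i])
        stats.append(dev[i] * lag / m2)
    return stats


def pseudo_p_values(values, neighbors, permutations=999, seed=0):
    """Share of random reassignments at least as extreme as the observed statistic."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n
    observed = local_moran(values, neighbors)
    p_values = []
    for i in range(n):
        others = dev[:i] + dev[i + 1:]  # hold location i fixed, shuffle the rest
        k = len(neighbors[i])
        extreme = 0
        for _ in range(permutations):
            sample = rng.sample(others, k)
            simulated = dev[i] * (sum(sample) / k) / m2
            if abs(simulated) >= abs(observed[i]):
                extreme += 1
        p_values.append((extreme + 1) / (permutations + 1))
    return p_values


# Hypothetical rank differences at four locations along a single corridor.
rank_diffs = [2.0, 2.0, -2.0, -2.0]
# Hypothetical adjacency: each location neighbors the ones next to it.
adjacent = [[1], [0, 2], [1, 3], [2]]

observed = local_moran(rank_diffs, adjacent)
p_values = pseudo_p_values(rank_diffs, adjacent, permutations=199, seed=1)
print(observed)
print(p_values)
```

A location whose pseudo p-value is large would be treated as showing no spatial association, because its observed pattern is indistinguishable from the random reassignments. A real analysis would use far more locations and a carefully chosen definition of neighbors.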

Overall, we find that there are differences in ridership patterns between conventional manual count data and our crowdsourced Strava data. In particular, area socioeconomic status and bicycle facilities influence how ridership volume is distributed. These results have implications for planners and other stakeholders when planning and designing bicycle facilities. There has been a great push to increase ridership and bicycle infrastructure in many urban areas, and decisions need to be made regarding where, when, and which type of facilities to install.

Areas of dissimilarity indicate where planners might target infrastructure improvements based on both data sources rather than one alone. Further, areas of similarity indicate where the data can be substituted for one another without additional bias in terms of ridership pattern volumes. That said, it is important that planners and other stakeholders do not use crowdsourced data alone, especially in areas where it differs significantly from conventional data, as these data may suggest changes that benefit only those types of riders who contribute to crowdsourced data sources.

These findings are described in the article entitled “Comparing spatial patterns of crowdsourced and conventional bicycling datasets,” recently published in the journal Applied Geography. This work was conducted by Lindsey Conrow, Elizabeth Wentz, and Trisalyn Nelson from Arizona State University, and Christopher Pettit from the University of New South Wales.