Data / Survey: Listen & Engage

September 15, 2025

08:30 AM – 10:00 AM at Johnson Great Room

Big Data, Big Ideas, so Little Time

This session focuses on the evolving landscape of transportation planning and modeling through the lens of big data, with a particular emphasis on addressing data biases and leveraging diverse data sources for more accurate and equitable insights. Presentations cover the challenges and opportunities of using crowdsourced data (like Strava) for active transportation, the validation and application of commercial GPS data (INRIX) for origin-destination analysis and travel demand modeling, and the strategic use of various geolocation big data sources (LBS, CV, TSP) for different planning needs. Additionally, the session explores the economic impact of transit systems on real estate development and the use of radar data for identifying speed violators and their correlation with crashes. A key theme is the shift in travel behaviors due to external factors like the COVID-19 pandemic and e-commerce growth, and the advancements in household travel survey methodologies.

9 Sub-sessions:
Understanding Representativeness in Crowdsourced Apps Data for Bicyclists and Pedestrians

The rise of crowdsourced apps has transformed transportation research and planning, offering new ways to analyze mobility patterns. Among these, Strava stands out as one of the most widely used crowdsourced applications, providing valuable data on bicycling and walking activity. However, questions remain about how well Strava data represents the broader population, particularly given demographic biases that may skew insights and impact decision-making. Research indicates that Strava usage tends to be male-dominated, leading to the underrepresentation of women’s mobility patterns. This disparity has profound implications for transportation planning, as decisions based on skewed datasets may overlook women’s infrastructure needs.

This study investigates gender imbalance in Strava data usage through a comprehensive review of existing literature and an analysis of publicly available datasets. Women’s mobility behavior is often characterized by trip chaining, off-peak travel, and a preference for routes perceived as safer. These patterns may not align with the activity-focused nature of Strava users. As a result, Strava data may undercapture the more routine travel patterns associated with women’s mobility needs. Beyond gender imbalances, Strava data also exhibits spatial and socioeconomic biases. High-income, urban areas tend to have more Strava users, whereas lower-income communities and rural areas are often underrepresented. The findings suggest that while Strava provides valuable information on high-usage corridors and emerging mobility trends, it does not fully reflect the diversity of active transportation users.

Addressing these biases requires methodological refinements and data integration strategies. First, we recommend supplementing Strava data with more representative sources, such as census-based travel data, and targeted data collection efforts focused on underrepresented groups. Additionally, statistical weighting techniques and machine learning models can be employed to adjust for gender and socioeconomic disparities, ensuring a more balanced analysis.
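As a rough illustration of the statistical weighting mentioned above, the sketch below applies simple post-stratification: each group receives a weight equal to its population share divided by its share of the sample. The function name, group labels, and numbers are hypothetical and are not taken from the study; the abstract does not say which weighting scheme the authors used.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per group so weighted sample shares match population shares."""
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / total
        # Groups that are underrepresented in the sample get weights above 1.
        weights[group] = population_shares[group] / sample_share
    return weights


# Hypothetical numbers for illustration only (not from the study).
sample_counts = {"men": 750, "women": 250}
population_shares = {"men": 0.5, "women": 0.5}

weights = poststratification_weights(sample_counts, population_shares)
weighted_counts = {g: sample_counts[g] * weights[g] for g in sample_counts}
weighted_total = sum(weighted_counts.values())
```

Weighting of this kind can rebalance the groups that appear in the sample, but it cannot recover trip types that the app rarely records at all, which is one reason supplementary data sources are also recommended.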

The implications of this study extend beyond data accuracy, directly influencing transportation policies and infrastructure investments. Gender-responsive and equity-focused transportation planning requires a comprehensive understanding of all users’ needs, which is currently hindered by data limitations. This study underscores the need for more inclusive data collection practices. By improving the representativeness of crowdsourced mobility data, planners and policymakers can make more effective decisions, ultimately fostering more accessible active transportation networks for all users.

INRIX O-D Evaluation and Use for Spatial Downscaling of the NextGen O-D

The free NextGen O-D for 2022 and a passively collected O-D from INRIX were used to validate Vermont's statewide travel demand model to support an update of its base year. This effort began with an evaluation of the INRIX O-D for quality and reasonableness, followed by a series of comparison tests to see whether it is better suited for use in this process than a randomly generated O-D. The quality and reasonableness evaluation consisted of checks for unreasonably long trips, time-of-day feasibility of trips, matrix symmetry, data coverage, and data biases. The comparison tests consisted of a comparison of spatial correlation and matrix orientation, all within a unified zone geography. After filtering the INRIX O-D to include only those trips found to be reasonable and uniform, it was used to downscale the free NextGen O-D to provide a validation data source.
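One common way to downscale a coarse O-D matrix with a finer sample is proportional allocation: each coarse flow is split across the fine zone pairs inside it according to their share of sample trips. The sketch below assumes that approach; the abstract does not describe the actual procedure, and the function name, zone labels, and trip values are hypothetical.

```python
from collections import defaultdict


def downscale_od(coarse_od, fine_od, zone_to_coarse):
    """Split each coarse O-D flow across fine zone pairs in proportion to sample trips."""
    # Total sample trips falling inside each coarse origin-destination pair.
    sample_totals = defaultdict(float)
    for (origin, dest), trips in fine_od.items():
        sample_totals[(zone_to_coarse[origin], zone_to_coarse[dest])] += trips

    downscaled = {}
    for (origin, dest), trips in fine_od.items():
        coarse_pair = (zone_to_coarse[origin], zone_to_coarse[dest])
        total = sample_totals[coarse_pair]
        if total > 0 and coarse_pair in coarse_od:
            # Fine pair receives its sample share of the coarse flow.
            downscaled[(origin, dest)] = coarse_od[coarse_pair] * trips / total
    return downscaled


# Hypothetical zones and flows for illustration only.
zone_to_coarse = {"a1": "A", "a2": "A", "b1": "B"}
coarse_od = {("A", "B"): 1000.0}
fine_od = {("a1", "b1"): 30.0, ("a2", "b1"): 10.0}

downscaled = downscale_od(coarse_od, fine_od, zone_to_coarse)
```

Because the fine shares within each coarse pair sum to one, the downscaled flows add back up to the coarse total, so the coarse matrix is preserved while the sample supplies the spatial detail.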

Harnessing the Power of Big Data in Transportation Planning: A Comparative Analysis of High Accuracy Geolocation Data Sources

This presentation covers the use of various types of Big Data in transportation planning and modelling applications, with actual public planning examples to support our findings.

No single data source will always provide the most accurate and reliable results in every circumstance. Although all data sources have value, that value depends on the attributes or applications that need to be planned, measured or modelled. We therefore strongly recommend a two-step process. First, determine what needs to be measured or modelled. Then, select the data source that best meets this need.

The three main high-accuracy geolocation Big Data sources are:

  • GPS data obtained from smartphone applications, also known as Location Based Services or LBS.

  • GPS data derived from devices installed in cars. This data can be provided either directly or indirectly by the car manufacturers (OEMs). It is known as Connected Vehicle or CV data.

  • GPS data derived from devices installed as a component of a fleet management system in cars, light commercial vehicles or heavy trucks. Data from these devices is gathered by Telematics Service Providers, and the data is known as TSP data.

Although many derivative datasets can be obtained from these three data sources for use in transportation planning or modelling, we will focus on Demand Analysis, also known as Origin-Destination Matrices (O-D Matrices). The differences in technical features between the data sources create significant variations between the O-D Matrices that can be generated from each source.

For example, data required for transit planning can be successfully generated by a combination of LBS data and TSP data. A robust long-distance travel model can leverage LBS data to close significant gaps in what would be deduced from a parallel CV-based dataset. Important insights about freight may be derived from a TSP-based study.

Several actual projects will be highlighted in the presentation to illustrate these differences. These projects include a Texas Department of Transportation and Texas Transportation Institute statewide planning model that leverages LBS data to close gaps left by a CV dataset (such as different modes of travel). Another example will describe how a major port authority used TSP data to understand the movement patterns of heavy trucks to and from ports in order to optimize traffic flow around the port areas. This information was not available from either LBS or CV data.

To conclude, in this session we will discuss the technical differences between each data source and how the utility of the O-D output is impacted. We will use actual projects to demonstrate and support our findings.

Evaluating GPS Data Sample Rates: A Case Study of INRIX Data in Texas

Using passive Big Data for Origin-Destination (O-D) analysis has gained interest in transportation planning in recent years. Among many diverse types of passive data sources, GPS location data collected from mobile devices, vehicles and telematics devices demonstrate superior accuracy, consistency, and richness of detail. Hence, it is observed to play a prominent role in many transportation planning and mobility studies. However, passively collected GPS location data typically represents only a sample of traveler movements within a study area and must be expanded to fully quantify the regional travel patterns. Therefore, understanding the sample representativeness and sample rate of the GPS location data is crucial for its applications in travel and mobility analysis.

The Texas Department of Transportation (TxDOT) has a long history of using passive Big Data products to support transportation mobility and planning studies, including the GPS location-based INRIX data, which provides vehicle trajectory and trip information for three vehicle weight classes. This study evaluates the sample rate of INRIX vehicle trajectory and trip data across the state of Texas by comparing the vehicle trip counts derived from INRIX data against the traffic count data compiled from the Texas Statewide Traffic Analysis and Reporting System (STARS II).

This study began by pre-routing the INRIX vehicle trajectory data onto the statewide OpenStreetMap (OSM) road network provided by INRIX using map-matching technology. In this process, the segments of the OSM network where INRIX vehicle trajectory data was available were identified, and a custom geoprocessing tool was then used to conflate them with the roadway segments in the STARS II roadway network, which contains traffic count data. Subsequently, a list of roadway segments with both INRIX vehicle trip counts and STARS II traffic counts was compiled to facilitate sample rate evaluation.

Using over 19 million INRIX trips within Wichita Falls, TX, this study focused on deriving per-trip expansion factors for INRIX trips through a multi-dimensional evaluation of vehicle trajectories and trips. Within the study area, approximately 75% of the trips had corresponding field counts from 257 locations. For these trips, Gradient Descent optimization was applied to develop per-trip expansion factors. The resulting distribution exhibited a relatively narrow range, indicating stability in the method, and achieved a close alignment between the expanded trip counts and observed AADT values.
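The idea of fitting per-trip expansion factors with gradient descent can be sketched as follows. This is a minimal illustration that assumes a squared-error loss between expanded trip counts and observed counts at each location; the abstract does not state the loss function, constraints, or step size the study used, and the function name and toy data are hypothetical.

```python
def fit_expansion_factors(trips_at_location, counts, n_trips, lr=0.1, n_iter=500):
    """Fit one expansion factor per trip by gradient descent on squared count error."""
    factors = [1.0] * n_trips  # start with every trip representing itself
    for _ in range(n_iter):
        # Residual at each count location: expanded trips minus observed count.
        residuals = [
            sum(factors[t] for t in trips) - count
            for trips, count in zip(trips_at_location, counts)
        ]
        # Gradient of the summed squared residuals with respect to each factor.
        gradient = [0.0] * n_trips
        for trips, residual in zip(trips_at_location, residuals):
            for t in trips:
                gradient[t] += 2.0 * residual
        factors = [f - lr * g for f, g in zip(factors, gradient)]
    return factors


# Toy data for illustration only: trip 1 passes both count locations.
trips_at_location = [[0, 1], [1, 2]]
counts = [30.0, 50.0]

factors = fit_expansion_factors(trips_at_location, counts, n_trips=3)
expanded = [sum(factors[t] for t in trips) for trips in trips_at_location]
```

The step size here is small enough for this toy problem to converge; a larger network would need a tuned step size and probably a constraint that keeps factors positive, neither of which this sketch enforces.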

To represent the entire INRIX trip population, the study also addressed the remaining 25% of trips without field counts. Several approaches were evaluated, including an overall average expansion factor, average factors by trip length and vehicle class, and cluster analysis. Among these, the overall average expansion factor proved to be both efficient and adequate for expanding INRIX trips within the Wichita Falls study area. The final expanded INRIX trip totals were found to be closely aligned with the estimates from the TexPack model, demonstrating the robustness of the selected approach.

Station Area Market Trend Analysis – Rail Route Introduction

This study evaluates how commuter rail influences surrounding property markets and explores opportunities for value capture. Using parcel-level data from 2010 to 2022, the analysis examined residential and commercial property values within one mile of existing and potential station areas. A Difference-in-Differences (DiD) model was applied to quantify rail’s impact on development patterns.
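In its simplest two-group, two-period form, a Difference-in-Differences estimate is the change in the treated group minus the change in the control group. The sketch below shows only that basic calculation; the study's actual model specification is not given in the abstract, and the function name and property values are hypothetical.

```python
from statistics import mean


def did_estimate(treated_before, treated_after, control_before, control_after):
    """Return the basic 2x2 Difference-in-Differences estimate."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    # The control group's change stands in for what would have happened without rail.
    return treated_change - control_change


# Hypothetical parcel values (in thousands of dollars), for illustration only.
station_before = [200, 220]
station_after = [260, 280]
comparison_before = [190, 210]
comparison_after = [220, 240]

effect = did_estimate(station_before, station_after, comparison_before, comparison_after)
```

The estimate is only meaningful if the two groups would have followed parallel trends without the rail opening, an assumption that a full analysis would need to examine.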

Results show that new rail openings are associated with statistically significant increases in both residential and commercial property values, reinforcing the potential for transit-oriented development. The study also highlights value capture strategies—including land-use policy adjustments and mobility fees—to support infrastructure investment and attract private development.

The findings emphasize the importance of strategic planning to maximize the economic and community benefits of rail expansion.

Speed Limit Violation and Fatalities in the Dallas-Fort Worth Area

In the Dallas-Fort Worth area there are around one thousand radar devices, called side-fires, that collect traffic volumes and speeds every 20 seconds on each lane of all freeways. These side-fires, which are spaced on average one mile apart, generate 30 million records of data every day. One important application of this data, among others, is the ability to identify the exact number of vehicles that are driving above the speed limit. Analysis of the side-fire dataset has allowed us to identify the locations with a higher percentage of speed violators, as well as the times and days of the week when violations are most prevalent. The findings from this study were compared with geographic data on where crashes and fatalities occur to determine whether these events correlate with speed violation.

From Biases to Opportunities: Leveraging Location-Based-Service (LBS) Data for Transportation Planning

Location-Based-Service (LBS) data, sourced from the numerous mobile devices that now accompany people everywhere, has the potential to revolutionize the practice of transportation planning in data collection, model development and policy design. Its potential is, however, hampered by the lack of transparency on the part of researchers, transportation professionals, and LBS data vendors. At the same time, transportation agencies now face an increasing demand from LBS data vendors globally. This presentation will provide an overview of the biases in LBS data. Specifically, I will present data quality issues and their effects on the resulting mobility metrics that are commonly used in the planning process. I will also discuss how LBS data can aid Household Travel Survey (HTS) data and transform the field of transportation planning. Last, I will discuss ways that the community can collaborate to establish benchmark datasets and standards for trip inference and reporting while adhering to privacy constraints.

From Big Data to Smart Decisions: Using Connected Vehicle Data to Power Regional Transportation Systems

Understanding Shifts in Household Travel: A Comparative Analysis of Surveys from the Houston Region

Household travel surveys (HTS) serve as a vital tool for enhancing travel demand models (TDM), forecasting future traffic volumes, assessing regional transportation plans, and understanding evolving travel behaviors, particularly as societal, economic, and technological factors shape mobility patterns over time. Cambridge Systematics (CS) analyzed data from the Texas Department of Transportation’s (TxDOT) 2023–2024 HTS and the agency’s previous survey conducted in 2007–2009. This longitudinal assessment offers insights into how travel behaviors have shifted over the past 15+ years, highlighting the influence of major external factors, including the COVID-19 pandemic, advancements in transportation technology, and changing consumer behaviors.

A key focus of this analysis is the lasting impact of the COVID-19 pandemic on travel behavior, particularly regarding work-related trips. The widespread adoption of working from home (WFH) and teleworking has transformed commuting patterns, reducing peak-hour travel demand and altering trip generation for jobs which can be performed remotely. Our findings suggest that WFH flexibility varies by industry and income, with higher-income workers and certain professional and technical industries having greater access to remote work, while jobs requiring physical presence remain largely in-person.

In addition to changes in commuting behavior, we explore shifts in discretionary travel, particularly the effects of e-commerce growth on shopping trips and commercial vehicle deliveries. As online shopping has become more widespread, household shopping trip frequency has shifted. Comparisons of HTS data over time indicate that overall household trip rates have decreased. Previously, trip rates consistently increased with income level, whereas recent HTS data shows a plateau at higher income levels. Similarly, recent household trip rates by household size do not exhibit the same rate of increase. Notably, large, high-income households are no longer generating as many trips as they once did. This shift may be attributed, in part, to the rise of e-commerce, as delivery services replace trips that would previously have been made to brick-and-mortar stores. At the same time, this shift has contributed to increased last-mile freight activity, as commercial vehicles make more frequent residential deliveries.

Beyond behavioral changes, this study evaluates the evolution of travel modes. The emergence of new mobility options, such as transportation network companies (TNCs), has influenced mode choice and accessibility. By analyzing survey responses across different travel modes, we assess how TNCs have integrated into the transportation landscape and whether they have replaced personal vehicle trips or supplemented public transit. We also examine transit ridership trends in light of these emerging options and other external factors.

Additionally, a significant aspect of this study is the transition in survey collection methodologies. HGAC’s 2007–2009 HTS relied on traditional paper-based household travel diaries, requiring participants to manually record their trips. In contrast, the 2023–2024 survey incorporates an online portal as well as smartphone GPS-based data collection, offering the potential for more precise trip details, reduced respondent burden, and improved data accuracy. While some respondents do not use the app, whether by choice or for lack of access, the increased capture of trip information justifies its inclusion.