Modeling - Big Data Integration
September 15, 2025
08:30 AM – 10:00 AM at Thomas H. Swain RoomHurricane
This session will show how to improve travel forecast accuracy. It will also describe how "big" data sources can complement existing model capabilities. Lastly, the session will describe the development of a machine learning (ML) tool to assist in exploring transit corridor feasibility, as well as how to leverage big data to study a key regional interchange that serves diverse land uses and faces safety and efficiency challenges, with the aim of improving connections between key roadways, increasing safety, and enhancing mobility for all users.
4 Sub-sessions:
The 15-minute podium slot will include a five-minute overview, based on the executive summary, of what has been completed for this research on Improving Travel Forecast Accuracy, followed by 10 minutes for next steps, collecting audience opinions via QR code, and Q&A.
Agencies increasingly look for ways that “big” data sources can complement existing model capabilities. Metro (Portland, Oregon's MPO) recently overhauled its Regional Mobility Policy (RMP), creating new demands on the modeling and data that support the policy's implementation. A core component of the updated RMP is "throughway reliability," defined by the number of hours per day that key regional roadways fail to meet a minimum travel speed, likely reflecting congested conditions and unreliable traffic flow. Throughways include both limited access freeways and local arterial streets that provide key regional connectivity for people and goods movement. Metro modeling and planning staff developed a new methodology to define and operationalize the new throughway reliability metric, leveraging the strengths and mitigating the weaknesses of regional travel models and observed traffic speed probe data.
Metro's trip-based regional travel demand model provides “typical day” hourly average speed estimates for both base-year and future planning scenarios. Modeled speeds at this level of measurement are subject to well-known inaccuracies due to aggregation, shortcomings in traffic assignment procedures, and other limitations. Relative differences between scenarios are thought to be more meaningful. Probe speed data—primarily from the National Performance Management Research Data Set (NPMRDS)—provides nearly continuous data on traffic speed but only for existing conditions. Metro staff used probe data to establish base conditions, and then applied relative changes between base and future-year model scenarios to project likely throughway reliability into the future.
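The approach described above can be sketched as follows: observed base-year probe speeds are scaled by the modeled future-to-base speed ratio, and the reliability metric counts the hours in which speed falls below a minimum. This is a minimal illustration only, not Metro's implementation. The function names, the sample speeds, and the 35 mph threshold are all hypothetical.

```python
def project_future_speeds(observed_base, model_base, model_future):
    """Scale observed base-year hourly speeds by the modeled future/base ratio."""
    return [
        obs * (fut / base)
        for obs, base, fut in zip(observed_base, model_base, model_future)
    ]


def hours_below_threshold(hourly_speeds, min_speed):
    """Count the hours in which average speed falls below the minimum speed."""
    return sum(1 for speed in hourly_speeds if speed < min_speed)


# Hypothetical hourly average speeds (mph) for four hours on one segment.
observed_base = [55.0, 40.0, 30.0, 50.0]   # probe data, existing conditions
model_base = [60.0, 50.0, 40.0, 60.0]      # modeled base-year scenario
model_future = [54.0, 40.0, 36.0, 60.0]    # modeled future-year scenario

MIN_SPEED = 35.0  # hypothetical threshold, not the RMP value

future_speeds = project_future_speeds(observed_base, model_base, model_future)
base_hours = hours_below_threshold(observed_base, MIN_SPEED)
future_hours = hours_below_threshold(future_speeds, MIN_SPEED)
```

Only the relative change comes from the model, so systematic bias in modeled absolute speeds does not carry into the projected metric.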
Augmenting modeled speeds with observed probe data increased confidence in the initial level of traffic congestion, while modeled changes allowed projection in future-year planning scenarios. Initial results supported a recent long-range regional plan update, helping to identify both predicted improvements due to investment and ongoing areas of need across the system. Comparison of modeled versus observed speeds in the base year also provided input into future focus areas for potential model improvements.
Abstract Background
Transit investments can be impactful for smaller cities. They spur local development and improve citizens' access to the region. However, estimating the ridership for a new station along an existing line can be challenging when travel models are not readily available, access to existing ridership data is not allowed, and results are needed within four months.
This presentation explains how Palm Beach Gardens was able to estimate ridership potential on Brightline for their city using big data and machine learning, thus allowing the City to feel confident in the station being feasible.
Description of Abstract
This presentation describes the development of a machine learning tool to assist the City of Palm Beach Gardens in exploring the feasibility of a new station on the Brightline corridor. The City needed to prove to Brightline and local officials that a station along the Miami to Orlando corridor would be beneficial to the region and Brightline.
However, the use of Brightline’s proprietary travel model for this purpose was not an option. In addition, detailed ridership data by station and along the route were not readily accessible, nor were actual fares or origin-destination pair data for Brightline available. There was a STOPS model for the region, but it did not include Brightline and would have needed to be updated to post-pandemic conditions for the study, an extensive effort for the region. Given the existing model challenges in the region and the desire for results in three (3) months, the decision was made to use machine learning to understand the correlations that explain ridership at the existing Brightline stations. Random forest machine learning was used to estimate the variables that mattered the most for ridership based on data from Replica and the regional model datasets.
The model was trained on the initial four stations and then tested with data from the two new stations that opened a year later. The ML model was able to predict the ridership at the two new stations within 5% of actual ridership, thus giving confidence in the model's ability to predict.
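The holdout check described above, comparing predictions for stations left out of training against their actual ridership, can be sketched as follows. This is an illustrative sketch only: the station labels and ridership figures are invented placeholders, not Brightline data, and the predictions stand in for the output of any trained model.

```python
def percent_error(predicted, actual):
    """Signed percent error of a prediction relative to the actual value."""
    return 100.0 * (predicted - actual) / actual


# Hypothetical (predicted, actual) ridership for two held-out stations.
holdout = {
    "Station A": (1040.0, 1000.0),
    "Station B": (1940.0, 2000.0),
}

TOLERANCE_PCT = 5.0  # the "within 5%" criterion from the text

errors = {name: percent_error(pred, act) for name, (pred, act) in holdout.items()}
within_tolerance = all(abs(err) <= TOLERANCE_PCT for err in errors.values())
```

With only a handful of stations, a check like this indicates plausibility rather than statistical certainty.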
The team was able to forecast what the demand at the proposed Palm Beach Gardens station would be, including the City testing changes to their land use policy to have more density near the stations. It showed that PBG would be viable and gave the City the confidence to approach Brightline for an official study.
Statement on Why Abstract is Noteworthy
This work is notable because it was a successful attempt to develop a forecast with limited OD flow data by utilizing big data to produce results on a quick turnaround. It shows that machine learning can be applied to transit planning without requiring cities to invest a lot of time and effort. It also suggests that combining survey datasets with machine learning may produce better transit models in the future.
The Herkimer-Oneida Counties Transportation Council (HOCTC) is conducting a Planning and Environment Linkages (PEL) Study of the I-90 Exit 31 interchange in Utica, NY. Exit 31 is a key regional interchange that serves diverse land uses and faces safety and efficiency challenges. The study aims to improve connections between key roadways, increase safety, and enhance mobility for all users. Key components of a PEL study include understanding existing and future travel demand, which feeds into traffic and environmental analyses, economic considerations, and establishing a purpose and need for the project.
This presentation introduces a prototype travel demand model (TDM) tool developed to analyze existing and future network operations for small Metropolitan Planning Organizations (MPOs). These agencies often do not have forecasting tools, which are cost prohibitive to develop and maintain. The presentation showcases how GPS data derived from TomTom, a big data vendor, was used to develop a model to forecast highway demand.
The tool developed in this case utilized a street network and a sample origin-destination (O-D) matrix at an operational level from TomTom. The O-D matrix was expanded using available count data to represent demand for a typical weekday. The network was reviewed for accuracy and connectivity, and its attributes were leveraged to define speed-capacity relationships.
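One simple way to expand a sample O-D matrix to counts is a single network-wide factor, sketched below. The abstract does not state which expansion method was used, so the uniform factor, the function names, and all numbers here are assumptions for illustration.

```python
def expansion_factor(counts, sample_volumes):
    """Ratio of observed counts to sample volumes at the same count locations."""
    return sum(counts) / sum(sample_volumes)


def expand_matrix(sample_od, factor):
    """Multiply every O-D cell by one expansion factor."""
    return {pair: trips * factor for pair, trips in sample_od.items()}


# Hypothetical sample trips between two zones.
sample_od = {("Z1", "Z2"): 12.0, ("Z2", "Z1"): 8.0}

# Hypothetical weekday counts and the sample volumes seen on the same links.
counts = [1000.0, 500.0]
sample_volumes = [100.0, 50.0]

factor = expansion_factor(counts, sample_volumes)
expanded_od = expand_matrix(sample_od, factor)
```

In practice the factor may vary by location or time period. A single factor is only the simplest case.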
The average weighted trip tables were assigned to the network in TransCAD, a modeling software package. The modeled volumes were validated using traffic count data. The model was calibrated by adjusting network characteristics as appropriate. The base year validation was assessed using root mean square error (RMSE) and volume-count percent difference across various facility types. For interstates, the RMSE is 8%, with a volume-count percent difference of -3.6%. Across all facility types, the RMSE is 30%, and the volume-count percent difference is -5.6%. The tool was subsequently used for future scenario analysis by expanding the base year trip table using projected socio-economic data for each zone. In addition, trips were added to a few zones to account for a new activity center under development, which is expected to function as a special generator.
This tool differs from a traditional TDM in that it replaces the trip generation and distribution steps of a conventional three-step model with an expanded trip table derived from big data. It is simple and quick to operate.
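The two validation statistics named above can be computed as in the sketch below. It assumes the common convention of percent RMSE as RMSE divided by the mean count, with n in the denominator (some guidance uses n - 1). The volumes are invented and do not reproduce the study's results.

```python
import math


def percent_rmse(modeled, counted):
    """RMSE of modeled vs. counted volumes, as a percent of the mean count."""
    n = len(counted)
    rmse = math.sqrt(sum((m - c) ** 2 for m, c in zip(modeled, counted)) / n)
    return 100.0 * rmse / (sum(counted) / n)


def percent_difference(modeled, counted):
    """Total modeled volume minus total count, as a percent of total count."""
    return 100.0 * (sum(modeled) - sum(counted)) / sum(counted)


# Hypothetical link volumes at two count stations.
modeled = [900.0, 1950.0]
counted = [1000.0, 2000.0]

prmse = percent_rmse(modeled, counted)
pdiff = percent_difference(modeled, counted)
```

A negative percent difference, as in the reported results, means the model assigns less volume in total than was counted.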
The existing validated model can project future traffic by incorporating expected population and employment growth into a trip table. This helps plan roadway improvements and assess construction impacts. However, new major activity centers require separate trip allocation studies due to their significant influence on travel patterns.
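A simple way to grow a base-year trip table from zonal growth, with extra trips added for a special generator, is sketched below. The abstract does not specify the growth method. Averaging the origin and destination growth factors is one assumption among several possible, and all zone names and values are hypothetical.

```python
def grow_trip_table(base_od, growth, special_trips=None):
    """Scale each O-D cell by the mean of its origin and destination growth
    factors, then add any separately estimated special-generator trips."""
    future = {
        (o, d): trips * (growth[o] + growth[d]) / 2.0
        for (o, d), trips in base_od.items()
    }
    for pair, extra in (special_trips or {}).items():
        future[pair] = future.get(pair, 0.0) + extra
    return future


# Hypothetical base-year trips and zonal growth factors (future / base).
base_od = {("Z1", "Z2"): 100.0, ("Z2", "Z1"): 80.0}
growth = {"Z1": 1.10, "Z2": 1.30}

# Hypothetical extra trips for a new activity center in zone Z2.
special = {("Z1", "Z2"): 25.0}

future_od = grow_trip_table(base_od, growth, special)
```

The special-generator trips are kept as a separate input because zonal growth factors alone would not capture a new activity center.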