This Washington Post census analysis involved a large volume of data and complex preprocessing, and it needed to support a personalized user experience in the interactive story. Census data extraction and analysis is a time-consuming process. First, data reporters took on data preprocessing, using The Post’s census data processing pipelines to reconcile changes in tract boundaries and field definitions between the historic 1990 census dataset and 2020 test data. They then output a dataset of racial makeup by census tract from 1990 through 2020.
This preprocessed dataset was then handed over to the Newsroom Engineering team, which used MTS to build the tileset that the graphics reporters would use to tell the complex story of racial and demographic shifts across the U.S. over the past 30 years.
Recipes. MTS configuration files, known as recipes, allow developers to iterate quickly after a single data upload.
“MTS recipes are a uniquely user-friendly way to configure the tileset layers and quickly test out changes in limited geographic areas before shipping changes to graphics reporters.” - Paige Moody, Newsroom Engineer
Speed. As a Spark-based service, MTS processed ~74k census tracts’ worth of time series race data in approximately one minute, enabling both rapid iteration and quick turnaround of revisions requested by the Graphics team.
Tile statistics. With readers on both desktop and mobile, The Washington Post needed insight into tileset performance. The detailed data returned by the Job API endpoint and the statistics provided by the Tileset explorer give developers what they need to ensure maps are fast on all devices for all users.
Quick iteration. Because the dataset is effectively static (reprocessed just once upon release of the 2020 data), developers using MTS needed to upload the dataset only once and could then rely on recipe alterations to iterate on zoom extent and field values. Developers quickly incorporated feedback across teams, such as using percent rather than proportion for feature properties.
MTS allowed the Newsroom Engineering team to build a streamlined tileset creation pipeline and a performant tileset, simplifying iteration and collaboration across reporting and engineering teams with seamless hand-offs throughout different portions of the project.
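The percent-versus-proportion feedback mentioned above comes down to a small change in field values. The sketch below shows that change in plain Python rather than in a recipe, so it is an illustration of the transformation and not the method The Post used. The property names and values are hypothetical.

```python
def proportions_to_percents(properties, fields, digits=1):
    """Return a copy of `properties` with the named proportion fields scaled to percent."""
    converted = dict(properties)  # shallow copy so the input record is left untouched
    for field in fields:
        value = converted.get(field)
        if value is not None:  # skip fields that are missing or null
            converted[field] = round(value * 100, digits)
    return converted


# Hypothetical tract record; the field names and values are made up for illustration.
tract = {"geoid": "hypothetical-tract", "white_2020": 0.25, "black_2020": 0.5}
print(proportions_to_percents(tract, ["white_2020", "black_2020"]))
```

Returning a copy instead of mutating the input keeps the original proportions available if another team later asks for them back.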
Why MTS is better than Tippecanoe for large datasets
Decreased Processing Time
MTS runs its tiling jobs in the cloud via Spark, which keeps performance consistent and exceptionally fast. Tippecanoe is a legacy desktop tool whose performance is largely determined by the hardware of the individual computer it runs on, which creates a lot of variability in tiling time. For the Washington Post census dataset, Tippecanoe takes approximately 8 minutes to produce a tileset, while MTS takes approximately 1 minute. As datasets become larger, the gap in processing time between the two continues to widen.
Better visual and statistical tooling
The Tileset explorer is an inspection tool available in Studio exclusively for MTS tilesets. It lets you visualize tiled data and provides tile size statistics, job and recipe history, and an x-ray data preview. The Tileset explorer offers unparalleled insight into the output tileset, helping to inform decisions around visualization and performance.
Seamless data pipeline automation
Tippecanoe is exclusively CLI-based, which means that tileset creation with Tippecanoe usually happens in an executable file (e.g., a Makefile) that is difficult to reuse across projects and services. MTS offers both API and CLI options, which makes it easier to integrate into one or more data pipelines.
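To make the pipeline point concrete, here is a rough sketch of how a build could be described as a list of HTTP steps that any service can reuse. It only constructs the requests and sends nothing. The endpoint paths reflect our understanding of the Tilesets API and should be treated as assumptions to verify against the current documentation. The username, source ID, and tileset name are hypothetical.

```python
from dataclasses import dataclass

# Assumed base path for the Tilesets API; verify against the official docs.
API_ROOT = "https://api.mapbox.com/tilesets/v1"


@dataclass(frozen=True)
class Step:
    description: str
    method: str
    url: str


def plan_tileset_build(username, source_id, tileset_name):
    """Describe the upload, create, and publish steps without sending any request."""
    tileset_id = f"{username}.{tileset_name}"
    return [
        Step("upload source data once", "POST", f"{API_ROOT}/sources/{username}/{source_id}"),
        Step("create tileset from a recipe", "POST", f"{API_ROOT}/{tileset_id}"),
        Step("publish to start a tiling job", "POST", f"{API_ROOT}/{tileset_id}/publish"),
    ]


# Hypothetical account and dataset names, for illustration only.
for step in plan_tileset_build("example-user", "census-tracts", "census-race"):
    print(f"{step.method:5} {step.url}  # {step.description}")
```

Keeping the plan as data, separate from the code that sends requests, is one way to share the same steps between a scheduled job and a developer's local run.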
More robust and referenceable data manipulation & GIS operations
With MTS, iteration is incredibly fast because a JSON configuration file (the recipe) can reference source files that have already been uploaded. Recipes are a user-friendly way to set the data conditions of your output tileset. With Tippecanoe, you need to process and upload your source file each time you want to iterate and test out new tileset options. Tippecanoe tileset options are also set by CLI args/flags, which can be more difficult to reference and decipher.
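As a rough illustration of why this helps, the sketch below builds a minimal recipe in Python and then changes only its zoom extent, leaving the uploaded source untouched. The recipe keys shown (`version`, `layers`, `source`, `minzoom`, `maxzoom`) follow the recipe format as we understand it. The username, source ID, layer name, and zoom values are hypothetical and not taken from The Post's project.

```python
import copy
import json


def make_recipe(source_uri, layer_name, minzoom, maxzoom):
    """Build a minimal single-layer recipe that points at an already-uploaded source."""
    return {
        "version": 1,
        "layers": {
            layer_name: {
                "source": source_uri,
                "minzoom": minzoom,
                "maxzoom": maxzoom,
            }
        },
    }


def with_zoom_extent(recipe, layer_name, minzoom, maxzoom):
    """Return a copy of the recipe with a new zoom extent; the source reference is unchanged."""
    updated = copy.deepcopy(recipe)
    updated["layers"][layer_name].update(minzoom=minzoom, maxzoom=maxzoom)
    return updated


# Hypothetical source URI and layer name, for illustration only.
recipe = make_recipe("mapbox://tileset-source/example-user/census-tracts", "tracts", 0, 8)
revised = with_zoom_extent(recipe, "tracts", 2, 10)
print(json.dumps(revised, indent=2))
```

Each iteration here is a small JSON change that can be diffed and reviewed, which is harder to do with a long string of command-line flags.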
The census story was the latest in a series of projects, ranging from the 1968 DC riots to wildlife migration patterns, that use Mapbox. If you want to showcase large datasets in high-quality interactive maps, start using MTS today!