Visitation Datasets v202303
Release notes summarize changes from the previous product version to this version. They are primarily of interest to existing customers who want to stay informed about the latest changes and their potential impact on their analyses.
Visitation Datasets v202303 comes with three major changes:
- Visitation to locations is now based on a machine learning model rather than on data aggregation.
- To reduce the number of individual tables, several metrics are now bundled into only seven datasets. The daily visitation metric was removed because it contained too much noise.
- Locations are restricted to the retail industry.
Differences in this Version
From a data perspective, the biggest changes in this version stem from the new methodology, which uses a machine learning model to estimate foot traffic. We therefore focus on those differences here and compare the weekly visitation with that of the previous version.
Comparing the new version's weekly foot traffic with the previous version's for the year 2021, we see strong similarities. The mean percentage difference in absolute volume between the two versions ranges from +16% (state level) to -1% (brand level), where a positive number indicates a higher estimate in the new version. We also observe high similarity when correlating these absolute volumes across venues.
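The release notes do not spell out how the comparison above is computed. As a minimal sketch, assuming the mean percentage difference is the average of per-unit relative differences, (new - old) / old * 100, over paired weekly volumes at one aggregation level (venue, brand, or state), and assuming a Pearson correlation for the cross-venue comparison, both metrics could be computed as follows. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def mean_percentage_difference(old: Sequence[float], new: Sequence[float]) -> float:
    """Average of per-unit (new - old) / old * 100 (assumed definition).

    Positive values mean the new version estimates more visitation,
    matching the sign convention in the release notes. Units whose
    previous-version volume is zero are skipped to avoid dividing by zero.
    """
    diffs = [(n - o) / o * 100.0 for o, n in zip(old, new) if o != 0]
    if not diffs:
        raise ValueError("no units with non-zero previous-version volume")
    return sum(diffs) / len(diffs)


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical weekly volumes for four venues (illustrative only, not product data).
    previous = [1200.0, 800.0, 450.0, 300.0]
    current = [1260.0, 840.0, 450.0, 285.0]
    print(f"mean % difference: {mean_percentage_difference(previous, current):+.2f}%")
    print(f"correlation: {pearson_correlation(previous, current):.3f}")
```

Averaging per-unit ratios weights small and large units equally; comparing summed volumes instead would give a volume-weighted figure, so the two approaches can yield different numbers.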
- The Visitation Dataset is delivered with a default four-day lag, meaning that data becomes available four days after the fact.
- Dynamic Trade Areas now report a fraction and no longer show the absolute person_count.
- Cross Visitation now focuses on the venue-to-brand comparison, since this was the comparison customers requested most.
- By switching to a machine learning model to estimate visitation, we improved the robustness of the product. The previous product version was sensitive to volatility in the underlying raw GPS supply.
- Several venues that previously had too little GPS supply (and thus too little data) now have visitation estimates.
- With the new bundling of tables, fewer tables need to be downloaded to obtain the complete data.
- We now have the capability to include other data sources that can improve our modeling. Previously, visitation numbers were estimated solely based on GPS data. This is now complemented by additional contextual data.
- For a limited number of locations, the model estimates the same visitation over time, creating a flatline. The affected locations are most often convenience stores, drug or toy stores, or gas stations, and they tend to be small. These venues will be addressed in the next revision.
- Visitation estimates are more accurate in urban areas. This is because mobility data is inherently more abundant in metropolitan areas, and the more data is available for a location, the closer our estimates can get to its underlying ground truth.
- We observe a peak of elevated visitation in October 2022, after which visitation tends to fluctuate more. More urban locations, however, appear to be less affected.
- The algorithm has been trained on historical data. Any unpredictable event that severely affects human mobility at a local or national scale (e.g., a natural catastrophe or a pandemic) will not be captured by our algorithm immediately; it will require a model update once time has passed.
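Analysts who want to exclude venues affected by the flatline limitation from their own analysis could screen each venue's weekly series for constant values. The sketch below assumes a hypothetical mapping from venue ID to a list of weekly visitation estimates; the function names, the relative tolerance `rel_tol`, and the `min_weeks` threshold are illustrative assumptions, not part of the product.

```python
from typing import Dict, List, Sequence


def is_flatline(weekly_visits: Sequence[float], rel_tol: float = 1e-3, min_weeks: int = 8) -> bool:
    """Return True if a venue's weekly estimates are (nearly) constant.

    A series counts as a flatline when its spread (max - min) is within
    rel_tol of its mean. Series shorter than min_weeks are not flagged,
    since too few points cannot establish a flat pattern.
    """
    if len(weekly_visits) < min_weeks:
        return False
    lo, hi = min(weekly_visits), max(weekly_visits)
    mean = sum(weekly_visits) / len(weekly_visits)
    if mean == 0:
        return hi == lo
    return (hi - lo) <= rel_tol * abs(mean)


def flag_flatline_venues(visits_by_venue: Dict[str, Sequence[float]], **kwargs) -> List[str]:
    """Return the sorted IDs of venues whose weekly series look like a flatline."""
    return sorted(v for v, series in visits_by_venue.items() if is_flatline(series, **kwargs))


if __name__ == "__main__":
    # Hypothetical venue IDs and weekly estimates (illustrative only, not product data).
    data = {
        "venue_a": [140.0] * 12,                    # constant: flagged
        "venue_b": [120.0, 135.0, 128.0, 150.0,
                    142.0, 131.0, 126.0, 139.0],    # varies: not flagged
        "venue_c": [90.0] * 4,                      # too short: not flagged
    }
    print(flag_flatline_venues(data))  # ['venue_a']
```

A small non-zero tolerance also catches series that are flat up to rounding; setting `rel_tol=0` flags only exactly constant series.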
Overall, it is important to highlight that the machine learning model will be improved over time. As a result, the product will only get better with each future version and thus become less dependent on external factors (such as new privacy regulations on GPS data).