Using Wappalyzer to Analyze CPU Times Across JS Frameworks

This is weird. There are only 462,646 pages in the table. What does it mean to have 699,620 nulls?

Great question. The answer is: sort of. Wappalyzer’s detections are run during the crawl in WebPageTest. We can’t crawl back in time, but we can use the HAR artifacts in BigQuery as a representation of the web page and run the detection logic against that. The HAR data only goes back as far as 2016, though. We’re tracking this work in [Integrate Wappalyzer platform detection (HTTPArchive/bigquery#19)](https://github.com/HTTPArchive/bigquery/issues/19).
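As a rough illustration of what “running the detection logic against the HAR artifacts” could look like, here is a minimal BigQuery Standard SQL sketch. It assumes the `httparchive.response_bodies.*` tables with `page`, `url`, and `body` columns; the WordPress pattern is purely illustrative and is not Wappalyzer’s actual rule.

```sql
-- Minimal sketch: apply a Wappalyzer-style HTML signal to the stored HAR
-- response bodies. The table name, columns, and regex are assumptions for
-- illustration, not the real Wappalyzer rule set.
SELECT
  page,
  LOGICAL_OR(
    REGEXP_CONTAINS(body, r'(?i)<meta[^>]*generator[^>]*WordPress')
  ) AS detected_wordpress
FROM
  `httparchive.response_bodies.2017_07_01_desktop`
WHERE
  page = url  -- simplification: only inspect the main HTML document
GROUP BY
  page
```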

We just need to convert the Wappalyzer detection signals into boolean expressions in BigQuery, using the HAR and page metadata as input, similar to Paul’s manual detection of Magento. Then we need to figure out where to save the results. Ideally we’d keep them consistent with the WPT-based JSON data in the HAR files, but that would require overwriting 50 million rows :stuck_out_tongue:. Maybe we’ll extract the results into a separate dataset?
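To make that concrete, here is a hedged sketch of a detection written as boolean expressions and materialized into a separate dataset rather than rewriting the HAR tables in place. The dataset and table names are hypothetical, and the Magento signals (an inline `Mage.Cookies` reference and `/skin/frontend/` asset paths) are only meant to echo the spirit of Paul’s manual detection, not reproduce his exact query.

```sql
-- Hypothetical: write per-page boolean detections into a separate dataset
-- (`my_dataset` is a placeholder) instead of overwriting the HAR tables.
CREATE TABLE `my_dataset.detected_magento_2017_07_01_desktop` AS
SELECT
  page,
  'Magento' AS app,
  LOGICAL_OR(
    REGEXP_CONTAINS(body, r'(?i)Mage\.Cookies')   -- inline JS signal
    OR REGEXP_CONTAINS(url, r'/skin/frontend/')   -- Magento asset path signal
  ) AS detected
FROM
  `httparchive.response_bodies.2017_07_01_desktop`
GROUP BY
  page;
```

A separate per-crawl table like this leaves the historical HAR data untouched and can later be joined back against the pages tables by URL.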