HTTP Archive New Leadership


#1

I announced the HTTP Archive six years ago. Six years ago! It has exceeded my expectations and its value continues to grow. In order to expand the vision, I’ve asked Ilya Grigorik, Rick Viscomi, and Pat Meenan to take over leadership of the project.

The HTTP Archive is part of the Internet Archive. The code and data are open source. The project is funded by our generous sponsors: Google, Mozilla, New Relic, O’Reilly Media, Etsy, dynaTrace, Instart Logic, Catchpoint Systems, Fastly, SOASTA mPulse, and Hosting Facts.

From the beginning, Pat and WebPageTest made the HTTP Archive possible. Ilya and Rick will join Pat to make the HTTP Archive even better. A few of the current items on the agenda:

  • Enrich the collected data during the crawl: detect JavaScript libraries in use on the page, integrate and capture LightHouse audits, feature counters, and so on.
  • Build new analysis pipelines to extract more information from the past crawls
  • Provide better visualizations and ways to explore the gathered data
  • Improve code health and overall operation of the full pipeline
  • … and lots more - please chime in with your suggestions!

Since its inception, the HTTP Archive has become the goto source for objective, documented data about how the Web is built. Thanks to Ilya, that data was brought to BigQuery so the community can perform their own queries and follow-on research. It’s a joy to see the data and graphs from HTTP Archive used on a daily basis in tech articles, blog posts, tweets, etc.

I’m excited about this next phase for the HTTP Archive. Thank you to everyone who helped get the HTTP Archive to where it is today. (Especially Stephen Hay for our awesome logo!) Now let’s make the HTTP Archive even better!