We extracted URLs from the harachrive.runs* table (as in "Adoption of HTTP Security Headers on the Web") to study security headers in HTTPArchive data. During this exercise, we noticed that the number of URLs on 5 specific days (a few months apart) does not follow a consistent trend: the number of URLs in the harachrive.runs* table does not always increase over time. As shown below, the number of URLs dropped to 1262020 on 2018_12_01, compared with 1277461 in the previous data dump on 2018_08_01.
date | URLs |
---|---|
2017_07_01 | 475122 |
2018_01_01 | 490241 |
2018_05_01 | 504672 |
2018_08_01 | 1277461 |
2018_12_01 | 1262020 |
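For reference, here is a minimal sketch of how we compared consecutive dumps. It only uses the counts from the table above; the helper name `changes` is ours and is not part of any HTTPArchive tooling.

```python
# URL counts per crawl date, copied from the table above.
counts = [
    ("2017_07_01", 475122),
    ("2018_01_01", 490241),
    ("2018_05_01", 504672),
    ("2018_08_01", 1277461),
    ("2018_12_01", 1262020),
]


def changes(rows):
    """Return (previous_date, date, delta) for each consecutive pair of dumps."""
    return [(prev[0], cur[0], cur[1] - prev[1]) for prev, cur in zip(rows, rows[1:])]


deltas = changes(counts)

# Any negative delta means the URL count went down between two dumps.
drops = [d for d in deltas if d[2] < 0]

for prev_date, date, delta in deltas:
    print(f"{prev_date} -> {date}: {delta:+d}")
```

The only negative change is between 2018_08_01 and 2018_12_01, which is the drop we are asking about.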
Therefore, we would like to know the following:
- Shouldn't the number of URLs (in the harachrive.runs* table) increase with time? If not, why?
- What is the procedure used to extract headers? (e.g., are all redirects followed, and are the headers of the final landing page used when a page load involves multiple redirects?)
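To make the second question concrete, this is a rough sketch of the behaviour we have in mind, not a description of what HTTPArchive actually does. The `Hop` type, the `final_headers` helper, and the example.com chain are all hypothetical and only illustrate "use the headers of the final landing page".

```python
from typing import Dict, List, NamedTuple

# Status codes that send the client on to another URL.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


class Hop(NamedTuple):
    """One response in a page load's redirect chain (hypothetical structure)."""
    url: str
    status: int
    headers: Dict[str, str]


def final_headers(chain: List[Hop]) -> Dict[str, str]:
    """Return the headers of the first response that is not a redirect."""
    for hop in chain:
        if hop.status not in REDIRECT_STATUSES:
            return hop.headers
    raise ValueError("redirect chain never reached a final response")


# Made-up example: http -> https -> www, with headers differing per hop.
chain = [
    Hop("http://example.com/", 301, {"Location": "https://example.com/"}),
    Hop("https://example.com/", 302, {"Location": "https://www.example.com/"}),
    Hop("https://www.example.com/", 200, {"Strict-Transport-Security": "max-age=31536000"}),
]

landing_headers = final_headers(chain)
```

If the stored headers instead come from the first response, or from every hop, the security headers we count could differ, so knowing which of these applies would help.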
Thanks
Naya