Are the _blinkFeatureFirstUsed timestamps reliable?

Hi all,

I’ve been doing some HTTP Archive metrics research and I’ve come across a problem.

To begin with, I was checking if features (Blink use counters) were used at all, and looking for a correlation with the FCP metric. That gave results that looked reasonable. Then I thought to improve it by checking if the features were used before FCP to exclude things that couldn’t have influenced it.
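
For context, a minimal sketch of that before-FCP check might look like the query below. It assumes FCP is exposed in the page payload as _firstContentfulPaint, which may not match the exact field name in every crawl:

SELECT url,
CAST(JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.DocumentAll') AS FLOAT64) AS feature_time,
CAST(JSON_EXTRACT(payload, '$._firstContentfulPaint') AS FLOAT64) AS fcp
FROM `httparchive.pages.2019_06_01_mobile`
WHERE CAST(JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.DocumentAll') AS FLOAT64)
<= CAST(JSON_EXTRACT(payload, '$._firstContentfulPaint') AS FLOAT64)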

This is where things start to look odd. I see a lot (~79k) of negative values in the data I use, illustrated by this query:
SELECT url,
CAST(JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.DocumentAll') AS FLOAT64) AS time
FROM `httparchive.pages.2019_06_01_mobile`
GROUP BY url, time
HAVING time < 0

I see that with httparchive.pages.2019_07_01_mobile there are far fewer, just 108. Was this a deliberate fix? Does it make sense to only compare time values starting with the most recent crawl, or are the timelines of this and other metrics still different?
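
A quick way to compare the two crawls is to count the negative values in each, something along these lines:

SELECT '2019_06_01' AS crawl, COUNT(*) AS negative_rows
FROM `httparchive.pages.2019_06_01_mobile`
WHERE CAST(JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.DocumentAll') AS FLOAT64) < 0
UNION ALL
SELECT '2019_07_01', COUNT(*)
FROM `httparchive.pages.2019_07_01_mobile`
WHERE CAST(JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.DocumentAll') AS FLOAT64) < 0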

Thanks!


No deliberate changes that I’m aware of. The timestamp is relative to the start of navigation, and the start time shouldn’t be reset at any point, so negative values would be unexpected. The timestamps for everything come from the trace event timestamps, so these (and all other) metrics should be on the same clock.

Trace recording is also only started right before the page navigation, so even if Chrome’s internal pages were triggering the events, they still shouldn’t show up in the results.

If you can find a page that reproduces it on the public WebPageTest it would really help track it down. It MIGHT be possible to figure out if you send over a few test IDs where the negative values came from, since most of the raw trace data is saved.


Modified @foolip’s query to include the WPT test ID:

SELECT url, wptid,
CAST(JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.DocumentAll') AS FLOAT64) AS time
FROM `httparchive.pages.2019_06_01_mobile`
JOIN `httparchive.summary_pages.2019_06_01_mobile`
USING (url)
GROUP BY url, wptid, time
HAVING time < 0
LIMIT 100

Here are some sample URLs, WPT test IDs, and feature first-used timestamps:

url wptid time
https://industrial.omron.fi/ 190601_Mx4P_964M -3427.751
http://www.freshremix.biz/ 190601_MxD_TZT -24675.638
https://www.chasingame.com/ 190602_Mx3M_73NG -599.839
http://teatr-versia.tomsk.ru/ 190601_Mx46_85Q4 -2914.999
https://wikiwon.com/ 190602_Mx1K_35GS -2350.382
https://smartbeautychoice.com/ 190604_Mx2P_58CM -11141.562
http://www.vipmobile.rs/ 190602_Mx39_6DS2 -18542.995
https://arc.eaa-online.org/ 190602_Mx24_46E9 -5018.716
http://www.dom-skazka.ru/ 190602_Mx45_85D3 -13505.582
https://eisundfeuer.fandom.com/ 190603_Mx6Q_D5GR -6103.407
http://samsunokul.net/ 190601_Mx2K_53RT -5091.867

Could this be interference from the previously loaded page? Is the same browser profile used to load multiple pages, and is there any way that trace events from a previous page could be delivered later?

Some of the time values being negative by many seconds makes this hypothesis seem implausible, but then with negative values something is clearly wrong, so…

No. The browser is closed and re-launched with a completely clean profile for every page. I’ll take a look and see what I can find. I know there have been issues with the timestamps on some devtools events before, and there’s some code to fix up the timestamps, so it’s possible these events aren’t getting fixed up.


Great, thanks for looking into this!

I beefed up the logic around the timing so you shouldn’t see any more negative values.

Not sure exactly what was causing it before, but I’m pretty sure the trace events were coming in out of order and the code was picking an incorrect start time while incrementally processing the feature data. I changed it to just store the raw timestamps during processing and then adjust them relative to the start timestamp in a post-processing step instead.

I also added some logic to keep track of all of the timestamps for each event, throw out any before the test start, and use the earliest of the remaining times (if there is one). Not only does that guarantee there will never be negative values, it should make the data a tiny bit cleaner as well.
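
In SQL-ish terms (illustration only, with made-up table and column names, since the real change lives in the test agent code), the post-processing now amounts to something like:

-- hypothetical per-event rows (feature, raw_ts) plus the test start timestamp start_ts
SELECT feature,
MIN(raw_ts - start_ts) AS first_used -- earliest of the remaining times, relative to the start
FROM raw_feature_events
WHERE raw_ts >= start_ts -- drop any events recorded before the test start
GROUP BY feature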


Awesome, thanks Patrick! For existing crawls, is there a limit on how wrong the timestamps can be, or should they be distrusted entirely?

Sorry, I’m still not sure what the exact cause was so I don’t know how reliable (or not) the earlier timestamps were. They SHOULD have been reliable but clearly there was an edge case somewhere getting triggered.

Alright, to be safe I think I’ll just distrust the timestamps in current crawls. Do you know which crawl will be the first with fixed timestamps?

The August 1 crawl that just started.