To begin with, I was checking whether features (Blink use counters) were used at all, and looking for a correlation with the FCP metric. That gave results that looked reasonable. Then I tried to improve on it by checking whether the features were used before FCP, to exclude features that couldn’t have influenced it.
This is where things start to look odd. I see a lot (~79k) of negative values in the data I use, illustrated by this query:
```sql
SELECT url,
CAST(JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.DocumentAll') AS FLOAT64) AS time
FROM `httparchive.pages.2019_06_01_mobile`
GROUP BY url, time
HAVING time < 0
```
I see that with httparchive.pages.2019_07_01_mobile there are far fewer, just 108. Was this a deliberate fix? Does it make sense to compare time values starting with the most recent crawl, or are the timelines of this and other metrics still different?
No deliberate changes that I’m aware of. The timestamp is relative to the start of navigation, and the start time shouldn’t be able to be reset at any point, so negative values would be unexpected. The timestamps for everything come from the trace event timestamps, so these (and all other) metrics should be on the same clock.
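For illustration, here is a minimal sketch (in Python, with hypothetical event data and variable names, not the actual agent code) of how a trace-event timestamp gets turned into a time relative to navigation start. Chrome trace events report a `ts` field in microseconds on a shared clock, so a negative result like the ones in the query above should be impossible:

```python
# Sketch: normalizing trace-event timestamps against navigation start.
# `navigation_start_us` and the event dicts are hypothetical inputs;
# Chrome trace events report `ts` in microseconds on a shared clock.

def relative_ms(event, navigation_start_us):
    """Offset an event against navigation start, in milliseconds."""
    return (event['ts'] - navigation_start_us) / 1000.0

events = [
    {'name': 'FeatureFirstUsed', 'ts': 1_000_500},
    {'name': 'firstContentfulPaint', 'ts': 1_250_000},
]
navigation_start_us = 1_000_000

for e in events:
    print(e['name'], relative_ms(e, navigation_start_us))
# A value below zero here is exactly the anomaly seen in the data above.
```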
Tracing is also only started right before the page navigation, so even if Chrome’s internal pages were triggering the events, they shouldn’t show up in the results.
If you can find a page that reproduces it on the public WebPageTest, it would really help track it down. It MIGHT be possible to figure out where the negative values were coming from if you send over a few test IDs, since most of the raw trace data is saved.
Modified @foolip’s query to include the WPT test ID:
```sql
SELECT url, wptid,
CAST(JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.DocumentAll') AS FLOAT64) AS time
FROM `httparchive.pages.2019_06_01_mobile`
JOIN `httparchive.summary_pages.2019_06_01_mobile`
USING (url)
GROUP BY url, wptid, time
HAVING time < 0
LIMIT 100
```
Here are some sample test IDs and feature counter time values:
Could this be interference from the previously loaded page? Is the same browser profile used to load multiple pages, and is there any way that trace events from a previous page could be delivered later?
Some of the time values are negative by many seconds, which makes this hypothesis seem implausible, but then with negative values something is clearly wrong, so…
No. The browser is closed and re-launched with a completely clean profile for every page. I’ll take a look and see what I can find. I know there have been issues with the timestamps on some DevTools events before, and there’s some code to fix up the timestamps, so it’s possible these events aren’t getting fixed up.
I beefed up the logic around the timing so you shouldn’t see any more negative values.
I’m not sure exactly what was causing it before, but I’m pretty sure the trace events were coming in out of order and the code was picking an incorrect start time while incrementally processing the feature data. I changed it to just store the raw timestamps during processing and then adjust them relative to the start timestamp in a post-processing step instead.
I also added some logic to keep track of all of the timestamps for each event, throw out any that fall before the test start, and use the earliest of the remaining times (if there is one). That guarantees there will never be negative values, and it should make the data a tiny bit cleaner as well.
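As a rough sketch of that post-processing step (not the actual agent code; all names here are made up), assuming raw microsecond timestamps were collected per feature during the incremental pass:

```python
# Sketch of the described fix: keep every raw timestamp seen for each
# feature, then resolve them against the test start in one pass at the
# end instead of incrementally. All names here are hypothetical.

def resolve_feature_times(raw_times, test_start_us):
    """raw_times: dict mapping feature name -> list of raw microsecond
    timestamps observed in the trace (possibly out of order)."""
    resolved = {}
    for feature, stamps in raw_times.items():
        # Throw out anything recorded before the test started.
        valid = [ts for ts in stamps if ts >= test_start_us]
        if valid:
            # Use the earliest remaining time; relative values can
            # never be negative after the filter above.
            resolved[feature] = (min(valid) - test_start_us) / 1000.0
    return resolved

raw = {'DocumentAll': [990_000, 1_200_000], 'PaymentRequest': [1_500_000]}
print(resolve_feature_times(raw, 1_000_000))
# {'DocumentAll': 200.0, 'PaymentRequest': 500.0}
```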
Sorry, I’m still not sure what the exact cause was, so I don’t know how reliable (or not) the earlier timestamps were. They SHOULD have been reliable, but clearly there was an edge case somewhere getting triggered.