Is there a way to get image dimensions (in pixels) from the crawl?

Hi, I am trying to run some analysis on image data. More specifically, I want each image's type, file size, and dimensions (width and height in pixels).

I managed to get the first two simply by querying a requests table:

SELECT
respSize,
mimeType
FROM
[httparchive:runs.latest_requests]
WHERE
mimeType CONTAINS 'image'
LIMIT
1000000

But is there a simple way to get the image dimensions, or will I have to fetch each image myself?

Thanks

Hi,
The Archive does not record image dimensions, but you could get the image URLs and then query them with ImageMagick:

magick identify http://res.cloudinary.com/dougsillars/image/upload/v1532673490/IMG_20150625_192917267_o4bvyk.jpg

gives the response:
http://res.cloudinary.com/dougsillars/image/upload/v1532673490/IMG_20150625_192917267_o4bvyk.jpg=>IMG_20150625_192917267_o4bvyk.jpg JPEG **4160x2340** 4160x2340+0+0 8-bit sRGB 2655710B 0.000u 0:00.049
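identify's default output is meant to be read by people, so if you keep it you need a small parser to pull out the fields. Here is a minimal Python sketch, assuming the one-line layout shown above (file name, then format, then WIDTHxHEIGHT); the function name is hypothetical, and other ImageMagick versions or multi-frame images may print lines this regex does not expect:

```python
import re

# Matches the " FORMAT WIDTHxHEIGHT " fields that follow the file name in
# identify's default one-line output (layout assumed from the sample above).
_FIELDS = re.compile(r"\s([A-Z0-9]+)\s(\d+)x(\d+)\s")


def parse_default_identify(line: str):
    """Return (format, width, height) from one line of `magick identify` output."""
    match = _FIELDS.search(line)
    if match is None:
        raise ValueError(f"unrecognised identify output: {line!r}")
    fmt, width, height = match.groups()
    return fmt, int(width), int(height)


sample = ("http://res.cloudinary.com/dougsillars/image/upload/v1532673490/"
          "IMG_20150625_192917267_o4bvyk.jpg=>IMG_20150625_192917267_o4bvyk.jpg "
          "JPEG 4160x2340 4160x2340+0+0 8-bit sRGB 2655710B 0.000u 0:00.049")
print(parse_default_identify(sample))  # ('JPEG', 4160, 2340)
```

Only format and dimensions are extracted, since the file size is already available as respSize from the Archive.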

So something like:
xargs -n 1 magick identify -format "%m,%w,%h\n" < listofimageurls.csv >> output.csv
will run identify on each URL in the list (one URL per line) and append a comma-separated line with each image's format, width, and height to output.csv. Note that animated GIFs print one line per frame. You can add other -format escapes to get more detailed information about each image.
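A bare xargs loop does not write the URL next to its result, so a failed fetch or a multi-frame GIF can shift the rows out of line with the URL list. A minimal Python sketch of the same loop that keeps them aligned, assuming ImageMagick 7's `magick` binary is on the PATH and listofimageurls.csv holds one URL per line; `parse_identify`, `identify`, and `main` are hypothetical names:

```python
import csv
import subprocess

# Standard identify -format escapes: %m = format, %w = width, %h = height.
FORMAT = "%m,%w,%h\n"


def parse_identify(output: str):
    """Turn identify -format output into (format, width, height).

    Multi-frame images (e.g. animated GIFs) print one line per frame,
    so only the first line is used.
    """
    first = output.strip().splitlines()[0]
    fmt, width, height = first.split(",")
    return fmt, int(width), int(height)


def identify(url: str):
    """Run `magick identify` on one URL; return None if it fails.

    Raises FileNotFoundError if the magick binary is not installed.
    """
    result = subprocess.run(
        ["magick", "identify", "-format", FORMAT, url],
        capture_output=True, text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return parse_identify(result.stdout)


def main(url_file: str, out_file: str) -> None:
    with open(url_file) as src, open(out_file, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["url", "format", "width", "height"])
        for line in src:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            info = identify(url)
            # Keep a row even on failure so every URL stays accounted for.
            writer.writerow([url, *(info or ("", "", ""))])
```

Usage: `main("listofimageurls.csv", "output.csv")`. Each URL still costs one process and one download, so large lists will take a while.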

Also, [httparchive:runs.latest_requests] is no longer updated; its data is from February 2018. To look at recent image requests, use httparchive:summary_requests.2018_07_15_mobile (or the corresponding _desktop table).
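To get type, file size, and dimensions side by side, the query results (with the url column added to the SELECT, assuming the table exposes one as summary_requests does) can be joined with the dimension results by URL. A minimal Python sketch, assuming both sides were exported to CSV; the column names for the dimension file and the `join_on_url` name are hypothetical, and the sample rows are made-up illustration data:

```python
import csv
import io


def join_on_url(requests_file, dims_file):
    """Merge request rows with image dimensions, matching rows by URL.

    requests_file: CSV with url, mimeType, respSize columns (hypothetical
    export of the BigQuery results). dims_file: CSV with url, format,
    width, height columns (hypothetical identify results).
    """
    # Index the dimension rows by URL for constant-time lookups.
    dims = {row["url"]: row for row in csv.DictReader(dims_file)}
    merged = []
    for row in csv.DictReader(requests_file):
        d = dims.get(row["url"])
        if d is None or not d["width"]:
            continue  # identify was not run on this URL, or it failed
        merged.append({
            "url": row["url"],
            "mimeType": row["mimeType"],
            "respSize": int(row["respSize"]),
            "width": int(d["width"]),
            "height": int(d["height"]),
        })
    return merged


requests_csv = io.StringIO(
    "url,mimeType,respSize\n"
    "http://example.com/a.jpg,image/jpeg,2655710\n"
    "http://example.com/b.png,image/png,512\n"
)
dims_csv = io.StringIO(
    "url,format,width,height\n"
    "http://example.com/a.jpg,JPEG,4160,2340\n"
    "http://example.com/b.png,,,\n"
)
print(join_on_url(requests_csv, dims_csv))
```

With real files, open both with `newline=""` as the csv module recommends. The same image URL can appear on many pages, so each request row is kept rather than deduplicated.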

Doug
