Is there a way to crawl image size (pixels)?


#1

Hi, I am trying to run some analysis on image data. More specifically, I want each image's type, file size, and pixel dimensions.

I managed to get the first two simply by querying a requests table:

SELECT
respSize,
mimeType
FROM
[httparchive:runs.latest_requests]
WHERE
mimeType CONTAINS 'image'
LIMIT
1000000

But is there a simple way to get the image dimensions? Or will I have to fetch each image myself to do so?

Thanks


#2

Hi,
The Archive does not have the image dimensions, but you could get the image URLs and then query them with ImageMagick:

magick identify http://res.cloudinary.com/dougsillars/image/upload/v1532673490/IMG_20150625_192917267_o4bvyk.jpg

gives the response:
http://res.cloudinary.com/dougsillars/image/upload/v1532673490/IMG_20150625_192917267_o4bvyk.jpg=>IMG_20150625_192917267_o4bvyk.jpg JPEG **4160x2340** 4160x2340+0+0 8-bit sRGB 2655710B 0.000u 0:00.049

so something like:
xargs -n 1 magick identify -format '%i,%m,%w,%h,%b\n' < listofimageurls.csv >> output.csv
will query each of the URLs (assuming one URL per line in listofimageurls.csv) and append one CSV row per image with the input name, format, width, height, and file size. You can adjust the -format string to get more detailed information about each image.
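If you would rather drive this from a script than from xargs, here is a minimal Python sketch of the same idea. It assumes the `magick` binary is on your PATH and that the URL list has one URL per line. The function names `identify` and `write_dimensions` and the `run` parameter are made up for illustration; `-format` with `%m`, `%w`, and `%h` asks ImageMagick for the format, width, and height.

```python
import csv
import subprocess


def identify(url, run=subprocess.run):
    """Return (format, width, height) for one image URL using ImageMagick."""
    result = run(
        ["magick", "identify", "-format", "%m,%w,%h\n", url],
        capture_output=True,
        text=True,
        check=True,
    )
    # Multi-frame images (e.g. animated GIFs) print one line per frame;
    # only the first frame is kept here.
    fmt, width, height = result.stdout.splitlines()[0].split(",")
    return fmt, int(width), int(height)


def write_dimensions(urls, out, run=subprocess.run):
    """Write one CSV row (url, format, width, height) per URL to `out`."""
    writer = csv.writer(out)
    writer.writerow(["url", "format", "width", "height"])
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the URL list
        try:
            writer.writerow([url, *identify(url, run=run)])
        except (subprocess.CalledProcessError, ValueError, IndexError):
            # Unreachable or non-image URLs get a row with empty fields.
            writer.writerow([url, "", "", ""])


# Example usage (hypothetical file names, matching the one-liner above):
# with open("listofimageurls.csv") as f, open("output.csv", "w", newline="") as out:
#     write_dimensions(f, out)
```

The `run` parameter is only there so the ImageMagick call can be swapped out when testing. This processes one URL at a time, so for a million URLs you would probably want to run several in parallel.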

Also, [httparchive:runs.latest_requests] is no longer updated and only contains data from February 2018. You want to be using httparchive:summary_requests.2018_07_15_mobile (or the desktop equivalent) to look at recent requests for images.

Doug