Wednesday 30 September 2020

Of bookscanning and image sizes

J, my life companion, is an accomplished pastel artist, and wanted to put some of her artwork into a competition.

Pre-Covid, this would have meant selecting a picture or two, getting them framed, driving somewhere, and watching someone from the exhibition team put them on the wall.

This year, of course, everything is different. Pictures are photographed, and the images uploaded to the exhibition website, where they are loaded into some gallery software.

Now, what was interesting about this process is that the exhibition organisers said to use a digital SLR for the images, not a mobile phone, because of the image quality.

J's artworks are normally somewhere between A4 and A3 in size (those are the sizes that specialist pastel paper comes in). For archival purposes she photographs each one with her iPhone, which has an 8 megapixel camera, and archives the pictures in iCloud using what I'll call iPhoto (it's actually called Photos these days).

Apart from iPhoto's tendency to produce smaller-than-expected JPEGs on export, this works well as a process.


Internally, Photos uses the newer High Efficiency Image File Format (HEIF) rather than one of the more established formats such as JPEG, which lets it store images more compactly.

As always, we can argue about compression, image formats and archiving, but using HEIF is no more likely to introduce compression artefacts than anything else, and may even be better, as it is claimed to be lossless.

Professionally, though, most people use dedicated cameras rather than mobile phones for archiving work.

We've all seen pictures of archivists using digital SLRs mounted vertically on a stand to photograph old photographs, and when you don't know the exact size of the original in advance and want a high-quality image, this obviously makes sense.

But the question is: what is good enough?

Well, my little experiment with a photo-scanning app on a phone has convinced me that a phone produces a good enough image, even if the OCR'd text would need a little work.


And there was a report in Nature this morning (which I retweeted) about a group of scientists using the Covid hiatus to scan old lab notebooks.


Now, the interesting thing is that most of the work was done using mobile phone cameras and a phone scanning app. In other words, the scientists concerned found the images perfectly adequate.

At the same time, if one searches for 'book scanner' on Google Shopping or Amazon, one gets results similar to this:


Delving into the specifications, one finds that they all use a camera with a fixed image size. The cheaper ones tend to be designed to image only a set page size, usually A4, while the more sophisticated 'bendy' ones can be adjusted to scan a page up to a maximum paper size, usually A4 or A3. All, or almost all, use either an 8 megapixel or a 5 megapixel camera, presumably with the better or pricier devices using an 8MP camera and the cheaper fixed-image-size devices a 5MP camera.

I don't know this for certain, but I'd guess that the scanners are using mobile phone camera assemblies. An 8MP image of an A4 page would give you roughly 300 dots per inch, which is pretty sharp, as sharp as many high-quality printed images. (If you are planning to OCR the text, you actually don't want a super-sharp image of old typeset pages, as these can introduce artefacts that confuse the OCR software.)
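To put rough numbers on that, here is a small Python sketch that works out the effective dots per inch when a page fills the frame. The sensor resolutions are my assumption rather than anything from the scanner specifications: 3264 x 2448 pixels for 8MP and 2592 x 1944 for 5MP, the usual 4:3 shapes for phone cameras of those sizes. The paper sizes are the standard ISO 216 dimensions for A4 and A3.

```python
MM_PER_INCH = 25.4

# ISO 216 paper sizes in millimetres, (short side, long side).
PAPER_MM = {"A4": (210, 297), "A3": (297, 420)}

# Assumed 4:3 sensor resolutions in pixels, (short side, long side).
SENSORS_PX = {"5MP": (1944, 2592), "8MP": (2448, 3264)}


def effective_dpi(sensor_px, paper_mm):
    """Dots per inch achieved when the page is framed to fill the image.

    The page and sensor are aligned long side to long side. Because the
    4:3 sensor and the A-series page have different shapes, one axis has
    pixels to spare, so the smaller (limiting) resolution is returned.
    """
    short_px, long_px = sensor_px
    short_mm, long_mm = paper_mm
    dpi_short = short_px / (short_mm / MM_PER_INCH)
    dpi_long = long_px / (long_mm / MM_PER_INCH)
    return min(dpi_short, dpi_long)


for sensor, px in SENSORS_PX.items():
    for paper, mm in PAPER_MM.items():
        print(f"{sensor} on {paper}: {effective_dpi(px, mm):.0f} dpi")
```

Run as is, it gives about 279 dpi for 8MP on A4 and about 222 dpi for 5MP on A4, so "roughly 300" is a little generous once the mismatch between the sensor and page shapes is allowed for, but it's in the right ballpark. At A3 the same 8MP camera drops to about 197 dpi.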

So, where does that leave us?

For J's artwork, an 8MP image is probably good enough for anything smaller than A4, and for book scanning it's certainly good enough for OCR.

If your original is bigger, then yes, there's probably an advantage in using a higher-quality camera, but for most purposes 8MP is good enough ...
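Turning the question round, this Python sketch estimates how many megapixels a camera would need to reach 300 dpi at each paper size. It assumes an ideal crop, with the image exactly matching the page shape; a real 4:3 sensor wastes some pixels on the border, so it would need a little more. The 300 dpi target is simply the figure used above, not any kind of archival standard.

```python
MM_PER_INCH = 25.4

# ISO 216 paper sizes in millimetres, (short side, long side).
PAPER_MM = {"A4": (210, 297), "A3": (297, 420)}


def megapixels_needed(paper_mm, dpi):
    """Pixels needed to cover the whole page at the given dpi, in megapixels.

    Assumes the image is cropped exactly to the page, so no pixels are
    spent outside it.
    """
    w_in, h_in = (side / MM_PER_INCH for side in paper_mm)
    return (w_in * dpi) * (h_in * dpi) / 1e6


for paper, mm in PAPER_MM.items():
    print(f"{paper} at 300 dpi: {megapixels_needed(mm, 300):.1f} MP")
```

This comes out at about 8.7MP for A4 and 17.4MP for A3, so an 8MP camera is close to the mark for A4, while an A3 original at the same sharpness needs roughly twice the pixels. That is where a higher-resolution camera starts to earn its keep.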


