Online Document Imaging
By John Fuex
As the Internet gains momentum as an application platform, those of us responsible for building, buying or otherwise supplying information infrastructures find ourselves scrambling to convert or replace legacy systems in order to take advantage of the collaborative power of the Web.
However, when it comes to the cornerstone application of litigation support -- image-enabled document databases -- the low bandwidth of the Internet can present major performance issues.
At the heart of the problem is the fact that the Internet is not yet a very efficient means for transmitting the large files typically associated with document imaging. Vendors have tried a variety of approaches to overcoming this limitation. Regardless of which side of the build/buy decision you fall on, it is important to have an understanding of how this problem is being addressed in order to make informed decisions on the most appropriate technology for your particular situation.
Because the problem of image delivery over the Internet is essentially one of limited bandwidth, the most direct answer is to increase the size of the "pipe" over which the information is carried, thus speeding image delivery. Increasing bandwidth improves application performance, but it comes at a cost. High-speed Internet connections typically carry a high monthly fee, and may not be practical or even possible for mobile or telecommuting employees or for satellite offices. Combined with other techniques, added bandwidth can be a practical answer to performance problems, but it should not be the sole answer to the question of image delivery speed.
Attacking the problem from the other side of the equation, you can diminish the importance of connection speed by decreasing the size of the images being transferred. This technique is especially attractive because of the potential cost savings from reduced storage requirements, decreased load on the server, and lower server-side bandwidth use. There are two methods for reducing the size of the image file:
- Reduce the amount of information: There are several methods by which image quality can be reduced within an acceptable range in order to reduce the overall size of the image file. However, most of these techniques work only on color images, and are ineffective with the bi-tonal (black and white) images that comprise the bulk of legal document collections. The approach that most significantly reduces the size of images, and that works for both color and bi-tonal images, is adjusting the resolution of the image. By simply scanning images at 200 dpi (dots per inch) rather than 300 dpi, you can generally expect to see around a 30 percent reduction in file size (in TIFF G4 format). Because a typical computer monitor displays fewer than 200 dots per inch, this should not visibly affect the on-screen quality of your images. As an added bonus, scanning at this lower resolution will improve the throughput of your scanners and result in better OCR accuracy in most cases.
- Reduce the size of the information: Using compression, it is possible to represent the same image in a smaller file without reducing the quality of the image. The most common compression used in document imaging is TIFF Group IV. This compression scheme has become the standard both for its ability to store multiple pages in a single file and, more importantly, for its ability to store bi-tonal images in a small amount of space.
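The resolution trade-off above can be checked with a little arithmetic. The sketch below computes the uncompressed size of a bi-tonal scan at both resolutions; the function name `raw_bitonal_bytes` and the 8-by-11-inch page are illustrative assumptions. Note that it measures raw pixel data only. The compressed saving is smaller, plausibly because Group IV coding records where each scan line changes between black and white relative to the line above, so file size tends to follow the number of scan lines more closely than the total number of pixels.

```python
def raw_bitonal_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size in bytes of a 1-bit-per-pixel scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each scan line is padded up to a whole number of bytes.
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


size_300 = raw_bitonal_bytes(8, 11, 300)   # 990,000 bytes
size_200 = raw_bitonal_bytes(8, 11, 200)   # 440,000 bytes

# Raw pixel data scales with the square of the resolution...
raw_reduction = 1 - size_200 / size_300
# ...while the number of scan lines scales only linearly.
line_reduction = 1 - 200 / 300

print(f"300 dpi raw size: {size_300:,} bytes")
print(f"200 dpi raw size: {size_200:,} bytes")
print(f"Reduction in raw pixels: {raw_reduction:.0%}")
print(f"Reduction in scan lines: {line_reduction:.0%}")
```

The second figure lands close to the roughly 30 percent saving quoted for compressed files, though actual results will vary with the content of each page.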
For example, an uncompressed scanned image of a single 8-inch by 11-inch page at 300 dpi will run about 1 MB. The same image, compressed using TIFF Group IV compression, will typically be reduced to around 55K. While this amount of compression works well for desktop applications, it doesn't go far enough for transmission over the Internet.
With an average page size of around 55K, a typical 20-page document file would be more than a megabyte. On a 28.8 Kbps modem, this 20-page document can take more than five minutes to download.
Unfortunately, improvements in bi-tonal image compression have been slow, and there are few high-compression alternatives to TIFF. If you are considering an alternate image format to improve compression rates, keep the following in mind:
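The download figure can be reproduced with a short calculation. This is a minimal sketch that assumes an ideal link with no protocol overhead, latency, or modem compression, and treats 55K as 55,000 bytes; the function name `download_seconds` is illustrative.

```python
def download_seconds(size_bytes: float, link_bps: float) -> float:
    """Ideal transfer time: ignores protocol overhead and latency."""
    return size_bytes * 8 / link_bps


PAGE_BYTES = 55_000    # average compressed page size from the text
PAGES = 20
MODEM_BPS = 28_800     # 28.8 Kbps modem

document_bytes = PAGE_BYTES * PAGES
seconds = download_seconds(document_bytes, MODEM_BPS)

print(f"Document size: {document_bytes:,} bytes")
print(f"Download time: {seconds:.0f} seconds ({seconds / 60:.1f} minutes)")
```

Real-world transfers would usually take somewhat longer, since a modem rarely sustains its rated speed.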
- How much support is available for this image format? Many of these formats have very few utilities available or require the purchase of expensive toolkits to work with them.
- Can they easily be converted to a standard format? When documents need to be produced or supplied electronically to persons outside of the firm, a way to export them to standard formats is essential.
- How fast can the viewer decompress the images? In some cases, images can take more than 10 seconds per page to decompress to the screen on low-end Pentium-class machines. Using these formats on low-end computers can quickly eat up any performance gains from faster transmissions.
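The last point is easiest to see by adding transfer time and decompression time together. In the sketch below, the alternate format is hypothetical: it is assumed to halve the page size but to need the 10 seconds per page mentioned above, and the half-second decompression time for TIFF Group IV is likewise an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImageFormat:
    name: str
    page_bytes: int            # average compressed page size
    decompress_seconds: float  # time to decode one page on the viewer


def seconds_per_page(fmt: ImageFormat, link_bps: float) -> float:
    """Time until a page is on screen: ideal transfer plus decompression."""
    return fmt.page_bytes * 8 / link_bps + fmt.decompress_seconds


# Assumed figures, for illustration only.
tiff_g4 = ImageFormat("TIFF Group IV", 55_000, 0.5)
alternate = ImageFormat("hypothetical high-compression format", 27_500, 10.0)

for fmt in (tiff_g4, alternate):
    total = seconds_per_page(fmt, 28_800)
    print(f"{fmt.name}: {total:.1f} seconds per page")
```

Under these assumptions the smaller file still reaches the screen later, and the gap widens as the connection gets faster, because the fixed decompression cost makes up a larger share of the total.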
In many situations, the users only need to see a few pages of the document to get the information they need. Thus, if you break the document images down to a page level and only deliver the individual pages as they are requested, the user need not download an entire document just to see the first page or two. This technique is effective with collections consisting of a lot of similar documents or where the users are very familiar with the documents.
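Page-level delivery can be sketched as a store that keeps every page as a separate image and hands back only the page requested. The class `PageStore`, its methods, and the document ID below are hypothetical names chosen for illustration, and the pages are blank placeholders of 55,000 bytes each.

```python
from typing import Dict, List


class PageStore:
    """Holds each document as a list of separately stored page images."""

    def __init__(self) -> None:
        self._docs: Dict[str, List[bytes]] = {}

    def add_document(self, doc_id: str, pages: List[bytes]) -> None:
        self._docs[doc_id] = list(pages)

    def page_count(self, doc_id: str) -> int:
        return len(self._docs[doc_id])

    def get_page(self, doc_id: str, page_number: int) -> bytes:
        """Return one page (numbered from 1), so only that page is sent."""
        pages = self._docs[doc_id]
        if not 1 <= page_number <= len(pages):
            raise IndexError(f"{doc_id} has no page {page_number}")
        return pages[page_number - 1]


store = PageStore()
store.add_document("DOC-0001", [bytes(55_000) for _ in range(20)])

# The user looks at the first two pages and finds what they need.
viewed = [store.get_page("DOC-0001", n) for n in (1, 2)]
bytes_sent = sum(len(page) for page in viewed)
bytes_whole = 55_000 * store.page_count("DOC-0001")

print(f"Sent {bytes_sent:,} of {bytes_whole:,} bytes")
```

In this example the user receives a tenth of the data that a whole-document download would require.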
Progressive display is a popular concept used by many Internet technologies. While it does not actually speed up the transmission of images, it does improve the perceived speed by displaying the document as it is downloaded, allowing the user to see the beginning of the document, or a thumbnail version of it, while the rest is still downloading.
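One way to picture progressive display is a viewer that hands over each page the moment its last byte arrives, rather than waiting for the whole file. The sketch below is an illustration only: it assumes the viewer already knows the size of each page (for instance from a header sent first), and the names `receive_pages` and `simulated_download` are hypothetical.

```python
from typing import Iterable, Iterator, List


def receive_pages(chunks: Iterable[bytes], page_sizes: List[int]) -> Iterator[bytes]:
    """Yield each page as soon as all of its bytes have arrived."""
    buffer = bytearray()
    sizes = iter(page_sizes)
    needed = next(sizes, None)
    for chunk in chunks:
        buffer.extend(chunk)
        # A single chunk may complete one page, several pages, or none.
        while needed is not None and len(buffer) >= needed:
            yield bytes(buffer[:needed])
            del buffer[:needed]
            needed = next(sizes, None)


def simulated_download(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Stand-in for a network connection delivering data in small chunks."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# Three tiny placeholder "pages" sent back to back in one stream.
pages = [b"A" * 5, b"B" * 3, b"C" * 4]
stream = b"".join(pages)

shown = []
for page in receive_pages(simulated_download(stream, 4), [len(p) for p in pages]):
    shown.append(page)   # a real viewer would render the page here
    print(f"Page {len(shown)} ready ({len(page)} bytes)")
```

The total transfer time is unchanged, but the first page becomes available after only part of the stream has arrived.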
John Fuex is manager of research and development for Infoedge Technology Inc. Web site: www.InfoDox.com.