Anacomp

Capture FAQ

  1. What are the key steps involved in Anacomp's document capture process?
  2. What is the difference between a document, a page, and an image?
  3. What are the standard U.S. paper sizes?
  4. What is document preparation?
  5. When is document preparation required?
  6. What is document de-preparation?
  7. What is the standard format used to store images?
  8. What are the different types of PDF image formats?
  9. What is the image size of a scanned document?
  10. What is image resolution?
  11. What about color files or photographs?
  12. How are double-sided or duplex documents scanned?
  13. How are blank pages handled during duplex scanning?
  14. How are "skewed" images handled?
  15. Can I view combinations of images, text and index fields side-by-side?
  16. Can I open and display more than one document at a time?
  17. What is OCR?
  18. How accurate is OCR?
  19. What is ICR?
  20. What is OMR?
  21. What are barcodes?
  22. What is MICR?
  23. How are images indexed?
  24. Can Anacomp's customers scan documents themselves for viewing on docHarbor Online?
What are the key steps involved in Anacomp's document capture process?

Anacomp has developed an accurate and efficient capture process utilizing state-of-the-art hardware, software, and methodologies. The following table summarizes each step within Anacomp's capture process.

Process               Description
Document Preparation  Prepare documents for scanning by removing fasteners, unfolding, repairing, sorting, inserting document separators, etc.
Scan                  Scan source documents, microfilm, or microfiche into the scanning workflow.
Recognition           Automatically separate documents, identify forms, and perform auto recognition (OCR, ICR, OMR, barcodes, etc.).
Quality Control       Validate form identification and image quality; rescan as needed.
Validation            Validate auto-recognized data; manual data entry.
Verification          Blind data entry; independent verification if required.
OCR Full Text         Full-text OCR for each document, output in a specified format if required.
PDF Generator         Produce Adobe PDF images.
Release               Format the images and associated indexing data into the required output format.
Document De-Prep      De-prepare documents if required.
What is the difference between a document, a page, and an image?

A "document" consists of one or more pages of data that are typically related or part of a logical group. A "page" generally refers to a physical piece of paper that may either contain data on a single side of the page (simplex) or on both sides of the page (duplex). An "image" generally refers to a digital representation of a single side of a page. For example, consider a monthly bank statement for Account 123456 which contains 12 duplex pages. This statement would then represent 1 document, 12 pages, and 24 images.
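The bank statement arithmetic can be sketched in a few lines of Python. This is only an illustration: the `Page` class and `count_images` function are hypothetical names, not part of any Anacomp system.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One physical sheet of paper (hypothetical model for illustration)."""
    duplex: bool  # True if both sides of the sheet carry data


def count_images(pages):
    """A simplex page yields one image; a duplex page yields two."""
    return sum(2 if page.duplex else 1 for page in pages)


# One document: a statement made up of 12 duplex pages.
statement = [Page(duplex=True) for _ in range(12)]
print(len(statement), "pages,", count_images(statement), "images")
```

A document that mixes simplex and duplex pages is counted the same way, one side at a time.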

What are the standard U.S. paper sizes?

Please refer to the table below. Each successive paper size uses the long dimension of the previous size as its new short dimension and doubles the previous short dimension to get its new long dimension.

Description  Dimensions
A Size       8-1/2" x 11"
B Size       11" x 17"
C Size       17" x 22"
D Size       22" x 34"
E Size       34" x 44"
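As a quick check of the sizing rule, the sketch below regenerates the table starting from A size. The function names are illustrative only.

```python
def next_size(short_in, long_in):
    """Apply the rule: new short = previous long, new long = 2 x previous short."""
    return long_in, 2 * short_in


def us_paper_sizes():
    """Return (letter, short, long) in inches for sizes A through E."""
    short_in, long_in = 8.5, 11.0  # A size
    sizes = []
    for letter in "ABCDE":
        sizes.append((letter, short_in, long_in))
        short_in, long_in = next_size(short_in, long_in)
    return sizes


for letter, short_in, long_in in us_paper_sizes():
    print(f'{letter} Size: {short_in:g}" x {long_in:g}"')
```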
What is document preparation?

Document preparation is the manual process of preparing documents for scanning and can often be critical to the success of any imaging project. Document preparation typically involves, but is not limited to, removing fasteners, unfolding and repairing pages, sorting, and inserting document separators.

When is document preparation required?

In most cases, document preparation is required prior to scanning. Customers may reduce the amount of document preparation performed by Anacomp by modifying their internal workflow before releasing the documents for scanning. For example, documents may be stored unfolded, or the customer may eliminate the use of staples when documents are created.

What is document de-preparation?

Document de-preparation is the manual process of preparing documents for physical storage after they have been scanned. The de-preparation process will vary based upon the degree to which the documents must be handled. Document de-preparation typically involves but is not limited to the following:

  1. Returning documents into the envelopes, folders, or other containers from which they came
  2. Inserting staples, paper clips, or other fasteners that bound the pages prior to scanning
  3. Resorting documents
  4. Removing document or batch separator sheets
What is the standard format used to store images?

Black and white images, sometimes referred to as "bi-tonal images," are most commonly stored as standard TIFF files using CCITT Group 4 compression. Grayscale and color images may be stored as TIFF, JPEG, or PDF files and generally result in larger image files.

If scanned images are ingested into docHarbor Online as PDF, the user may select multiple documents and create a combined PDF document.

What are the different types of PDF image formats?

PDF (Portable Document Format) is a de facto standard created by Adobe. PDF files are compact and cross-platform, and can be viewed by anyone with Acrobat Reader. Two PDF formats are relevant to document capture. The first, PDF Image Only, is a bitmap representation of the actual document; the image's full text is not searchable. The second, PDF Searchable Image, also referred to as Image+Hidden Text, combines a bitmap image with text embedded in the PDF, so the full text of the document is searchable. This format is generally more costly to produce and should be considered when full-text searches are justified.

What is the image size of a scanned document?

A single image typically occupies approximately 50 KB of disk space if it is stored as TIFF Group 4. The actual size of an image depends upon several factors, including the image type (bi-tonal, grayscale, color), bit depth, compression type, resolution, and document data density. The following table shows approximately how many images fit in a given amount of storage.

Storage Amount              Number of TIFF Group 4 Images
1 MB                        ~20 images
1 GB                        ~20,000 images
Standard CD (images only)   10,000 to 15,000 images
Standard DVD (images only)  90,000 to 100,000 images
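These capacity figures follow from dividing the storage amount by the average image size. The sketch below assumes the 50 KB average quoted above and decimal units (1 MB = 1,000 KB). The CD and DVD capacities of 700 MB and 4.7 GB are assumptions for illustration and are not stated in this FAQ.

```python
AVG_IMAGE_KB = 50  # approximate size of one TIFF Group 4 image, per the text


def images_that_fit(capacity_kb, avg_image_kb=AVG_IMAGE_KB):
    """Estimate how many average-sized images fit in the given capacity."""
    return capacity_kb // avg_image_kb


capacities_kb = {
    "1 MB": 1_000,
    "1 GB": 1_000_000,
    "CD (assumed 700 MB)": 700_000,
    "DVD (assumed 4.7 GB)": 4_700_000,
}

for label, capacity_kb in capacities_kb.items():
    print(f"{label}: ~{images_that_fit(capacity_kb):,} images")
```

Real counts vary with resolution and data density, which is why the table gives ranges for CD and DVD.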
What is image resolution?

Image resolution is measured in dots per inch (dpi) and typically ranges from 200 dpi to 400 dpi. Resolution directly affects the file size of the image. The recommended resolution should balance several considerations: image quality against image size, the number of pages per document, the time to download documents from the image repository, and the use of recognition technologies such as OCR, ICR, or OMR.
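To see why resolution drives file size, the following sketch computes the pixel count and the approximate uncompressed size of a scanned page. It is a rough estimate only: it ignores compression and any per-row padding a real file format may add, and the function name is illustrative.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Approximate raw size of a scan; bits_per_pixel=1 means bi-tonal."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# An 8-1/2" x 11" page scanned bi-tonal at common resolutions.
for dpi in (200, 300, 400):
    size = uncompressed_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size:,} bytes before compression")
```

Doubling the resolution from 200 dpi to 400 dpi quadruples the pixel count, so the raw data grows fourfold before compression is applied.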

What about color files or photographs?

Imaging systems should support black and white, grayscale and color images. Color files can be scanned with a color scanner or imported into an imaging system.

How are double-sided or duplex documents scanned?

An imaging system should provide two different ways to do this. It should support duplex scanners, which simultaneously scan both sides of a page. With a simplex scanner, the user should be able to scan all the front sides, reload the stack upside down, and scan all the back sides; the system should then automatically collate the images into the correct order.
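The simplex collation step can be sketched as follows. The sketch assumes that flipping the stack causes the back sides to be scanned in reverse page order. That depends on the scanner and on how the stack is reloaded, so treat it as an assumption rather than a description of any particular system.

```python
def collate_simplex(fronts, backs_as_scanned):
    """Interleave front and back images into reading order.

    fronts: front-side images in page order (page 1 first).
    backs_as_scanned: back-side images in the order they were scanned,
        assumed here to be last page first because the stack was flipped.
    """
    if len(fronts) != len(backs_as_scanned):
        raise ValueError("front and back passes must have the same length")
    backs = list(reversed(backs_as_scanned))
    ordered = []
    for front, back in zip(fronts, backs):
        ordered.extend([front, back])
    return ordered


fronts = ["p1-front", "p2-front", "p3-front"]
backs_as_scanned = ["p3-back", "p2-back", "p1-back"]
print(collate_simplex(fronts, backs_as_scanned))
```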

How are blank pages handled during duplex scanning?

Blank pages may be automatically identified and ignored during the scanning process by configuring a predetermined threshold within the capture system. For example, a blank page threshold of 2,000 bytes could be configured to ignore any image that is less than 2,000 bytes in size. Anacomp works with each customer and each application to determine the optimum threshold for blank page handling.
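A size-based blank page filter of this kind might look like the sketch below. The 2,000-byte value is the example threshold from the text, and the image names, sizes, and function name are made up for illustration.

```python
BLANK_PAGE_THRESHOLD_BYTES = 2000  # example threshold from the text


def drop_blank_images(images, threshold=BLANK_PAGE_THRESHOLD_BYTES):
    """Keep only images whose compressed size meets the threshold.

    images: list of (name, size_in_bytes) tuples.
    Images smaller than the threshold are treated as blank and ignored.
    """
    return [(name, size) for name, size in images if size >= threshold]


scanned = [
    ("page1-front.tif", 48_210),
    ("page1-back.tif", 1_350),   # nearly empty, compresses very small
    ("page2-front.tif", 51_877),
    ("page2-back.tif", 2_000),   # exactly at the threshold, so it is kept
]
print(drop_blank_images(scanned))
```

Setting the threshold too high risks discarding pages with only a little content, such as a signature or a stamp.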

How are "skewed" images handled?

Skewed (crooked or tilted) images can adversely affect the accuracy of the OCR process, so an imaging system should include software that recognizes skewed images and compensates for them.

Can I view combinations of images, text and index fields side-by-side?

To allow convenient access to document information, a well-designed imaging system will allow the view screen to be configured to show the text, images, template index fields, or thumbnail images.

Can I open and display more than one document at a time?

Some imaging systems will allow you to display multiple documents, with the number of documents you can have open simultaneously limited only by the amount of memory available.

What is OCR?

OCR stands for Optical Character Recognition, which is how a computer converts words in an unsearchable scanned image to searchable text. OCR engines can generally only recognize typed or laser printed text, not handwriting.

How accurate is OCR?

Accuracy on a freshly laser printed page in excellent condition is typically better than 95%. Accuracy on faxed, dirty or degraded documents may be significantly lower.

What is ICR?

ICR (Intelligent Character Recognition) is pattern-based character recognition and is also known as Hand Print Recognition. Handwritten text is more difficult for computers to recognize and results in higher error rates than printed text. ICR engines usually do best at recognizing constrained printing, which means block printed letters with one letter in each box. Accurate recognition of unconstrained handwriting, especially cursive handwriting, typically requires that the ICR engine be trained to recognize each user's style of writing.

What is OMR?

OMR (Optical Mark Recognition), also called Mark Sense Recognition, is the recognition of marks commonly used on forms, such as check marks, circled choices, and filled-in bubbles. OMR can be an important part of an imaging system for organizations that process many standard forms.

What are barcodes?

A barcode is an array of vertical rectangular marks and spaces in a predetermined pattern used to represent data. Barcodes are generally used in document capture systems to reduce or eliminate certain costs by automating document type identification or index capture. Several barcode types are used in document capture, including Code 39, Code 128, and Interleaved 2 of 5.

What is MICR?

MICR (Magnetic Ink Character Recognition) is a character recognition system used on bank checks. Special characters are printed using magnetized ink for automatic reading.

How are images indexed?

Images may be indexed using several capture methods. docHarbor's Professional Services team will evaluate the most accurate and cost-effective way to meet and/or exceed the scan project's requirements.

Indexing Method                                   Description
Manual Data Entry                                 A key operator views a document, identifies the index value, and keys in the value.
Manual Data Entry with double blind verification  A key operator views a document, identifies the index value, and keys in the value. A second key operator repeats the same process without knowing the index value keyed by the first operator. If the values match, the document is considered accurate. If the values do not match, the document is handled as an exception.
Zonal OCR                                         A template is used to map specific regions of an image to predefined index fields. An OCR process is applied to each region and captures the index value based upon a predetermined confidence threshold.
Barcode Recognition                               A template is used to identify the location and type of barcode used within a document. A barcode recognition process captures each barcode index value based upon a predetermined confidence threshold.
Database Merge                                    A unique index field is captured for each document using manual data entry, zonal OCR, or barcode recognition. An automated process matches the unique index field against other index fields contained in a database. When a match is made, the other index fields are associated with the document.
Hybrid Indexing                                   docHarbor will utilize multiple indexing methods and develop the most accurate and cost-efficient system.
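Two of the methods in the table, double blind verification and database merge, are simple enough to sketch together. The example below is a hypothetical illustration: the function name, the status labels, and the sample account records are invented for this sketch and do not describe docHarbor's actual implementation.

```python
def index_document(first_entry, second_entry, database):
    """Combine double blind verification with a database merge.

    first_entry, second_entry: the unique index value as keyed
        independently by two operators.
    database: dict mapping the unique index value to its other index fields.
    """
    # Double blind verification: the two independently keyed values must match.
    if first_entry != second_entry:
        return {"status": "exception", "reason": "keyed values differ"}

    # Database merge: look up the remaining index fields by the unique key.
    record = database.get(first_entry)
    if record is None:
        return {"status": "exception", "reason": "no database match"}

    return {"status": "indexed", "account": first_entry, **record}


# Hypothetical customer database keyed by account number.
accounts = {
    "123456": {"name": "J. Smith", "branch": "Downtown"},
}

print(index_document("123456", "123456", accounts))
print(index_document("123456", "123465", accounts))
```

Mismatched or unmatched documents are returned as exceptions rather than guessed at, which mirrors the exception handling described in the table.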
Can Anacomp's customers scan documents themselves for viewing on docHarbor Online?

Yes. docHarbor's Professional Services team can assist with the recommended image and indexing format specifications.