Hudson Community Enterprises

Giving Special Needs People a Chance to Succeed

Document Scanning Glossary

Bitmap:  An image made from a combination of pixels (individual squares) in various colors that together form a picture.  Enlarging the image makes the individual pixels visible but distorts the picture.  Shrinking the picture discards pixels, which reduces its clarity.  The formatting of the bitmap arranges the individual pixels in a fixed pattern which cannot be manipulated.
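
The effect of enlarging and shrinking can be sketched in a few lines of Python. This is only an illustration: the bitmap is modeled as a plain list of rows of numbers, and the function names scale_nearest and shrink_by_dropping are hypothetical, not part of any imaging product.

```python
def scale_nearest(pixels, factor):
    """Enlarge a bitmap by an integer factor by repeating each pixel."""
    scaled = []
    for row in pixels:
        # Repeat every pixel `factor` times across the row.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times down the image.
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


def shrink_by_dropping(pixels, step):
    """Shrink a bitmap by keeping every `step`-th pixel; the rest are lost."""
    return [row[::step] for row in pixels[::step]]


bitmap = [
    [0, 1],
    [1, 0],
]

# Each original pixel becomes a visible 2x2 block of identical values.
enlarged = scale_nearest(bitmap, 2)

# A detailed 4x4 image loses three quarters of its pixels when halved.
detailed = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
shrunk = shrink_by_dropping(detailed, 2)
```

Enlarging adds no new detail, it only makes each square bigger, while shrinking throws pixel values away for good.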

Burn:  The process of recording or writing data on CDs or DVDs.

Corrupt file:  A file whose data has been damaged by an outside force, either computer based or environmental.  Causes could include viruses, hardware or software issues, power outages, extreme temperatures or other factors.

De-duplication:  The process of removing duplicate records from computer files.  The three types of de-duplication are: production, custodian, and case.
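
One common way to find duplicates is to compare a hash of each file's contents. The sketch below assumes that approach and assumes that the de-duplication type simply controls the scope within which copies are compared (for example, within one custodian's files or across the whole case). The record layout and the function names are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a fingerprint that is identical for identical content."""
    return hashlib.sha256(data).hexdigest()


def deduplicate(records, scope=lambda record: None):
    """Keep the first record for each (scope, content) pair.

    `scope` is an assumption of this sketch: it returns the group within
    which duplicates are compared. The default puts every record in one
    group, so duplicates are removed across the whole set.
    """
    seen = set()
    unique = []
    for record in records:
        key = (scope(record), content_hash(record["content"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"custodian": "alice", "name": "memo.txt", "content": b"Quarterly report"},
    {"custodian": "alice", "name": "memo_copy.txt", "content": b"Quarterly report"},
    {"custodian": "bob", "name": "memo.txt", "content": b"Quarterly report"},
]

# Compare only within each custodian's files: one copy per custodian remains.
per_custodian = deduplicate(records, scope=lambda r: r["custodian"])

# Compare across all records: a single copy remains.
whole_set = deduplicate(records)
```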

Document boundaries:  Every document includes a beginning and ending page.  These are the boundaries.  When a document is scanned, it is important that the user make note of these boundaries to avoid separate documents running together.

ECM:  Enterprise Content Management.

Enterprise Content Management:  The organization and storage of an organization's documents, both paper and digital, throughout the life of the documents.  

File Transfer Protocol (FTP):  A standard protocol for transferring files between computers over the Internet.

Image key:  Term used for a file created by scanning a page of a document.

Imaging:  The creation of a document image in TIFF or PDF format that shows how the document will appear when printed.

Index:  A database of words contained in a document and used by the software to locate a particular section of that document as requested by the user.  This index search provides faster access to the information than a search of the entire document.
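
A minimal sketch of such an index, assuming the document is available as a list of page texts: each word is mapped to the pages on which it appears, so a lookup does not have to reread the whole document. The names build_index and lookup are hypothetical, and real search software is considerably more elaborate.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page numbers containing it."""
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(page_number)
    return index


def lookup(index, word):
    """Return the sorted page numbers for a word, or an empty list."""
    return sorted(index.get(word.lower(), set()))


pages = [
    "Invoice for scanning services",
    "Payment terms and invoice total",
]
index = build_index(pages)

invoice_pages = lookup(index, "Invoice")   # pages 1 and 2
payment_pages = lookup(index, "payment")   # page 2 only
missing_pages = lookup(index, "contract")  # no pages
```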

JPEG:  Joint Photographic Experts Group.  This standard image compression format supports millions of colors and is best used for photographic images.

Keyword search:  A search in which the user supplies one or more words to find the document or documents containing them.

Metadata:  Information about the characteristics of an image at the time it was created.  These include its size, color, resolution, format, history, etc. 

Native file:  The document in its original format.

OCR (Optical Character Recognition):  OCR converts a scanned document into a searchable document whose text can be copied and pasted into a new file.  Because the result depends on the quality of the printed material and the conversion accuracy of the software, its accuracy is generally considered to be about 85 percent.
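
One way to put a number on OCR accuracy is to compare the OCR output with a hand-checked transcription and count the character errors. The sketch below assumes that measure (one minus the edit distance divided by the length of the correct text); the source does not say how its figure was calculated, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Count the single-character insertions, deletions and substitutions
    needed to turn string `a` into string `b` (Levenshtein distance)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_output):
    """Return the share of reference characters recognized correctly."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))


# The OCR software misread the letter "o" as the digit "0":
# 1 error in 16 characters.
accuracy = character_accuracy("scanned document", "scanned d0cument")
```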

Parent document:  The initial document in a group of similar documents, such as a transmittal letter or a cover sheet.

PDF (Portable Document Format):  A format created by Adobe that is used to transmit documents, which can be protected from being altered.  It is used for safeguarding the integrity of published and printed documents.

Query:  A request for the retrieval of information from a database.
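
As an illustration, the sketch below runs a query against a small in-memory SQLite database using Python's built-in sqlite3 module. The table name, column names and sample rows are made up for the example.

```python
import sqlite3


def find_documents(connection, doc_type):
    """Return the titles of all documents of the requested type."""
    cursor = connection.execute(
        "SELECT title FROM documents WHERE doc_type = ? ORDER BY title",
        (doc_type,),
    )
    return [row[0] for row in cursor.fetchall()]


# Build a throwaway database that exists only in memory.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE documents (title TEXT, doc_type TEXT)")
connection.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("March invoice", "invoice"),
        ("Service contract", "contract"),
        ("April invoice", "invoice"),
    ],
)

# The query: ask the database for every invoice it holds.
invoices = find_documents(connection, "invoice")
```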

Render:  The processing of documents into a standard format (TIFFs, PDFs).  The rendered documents can be printed or produced as electronic files.

Repository:  An archival data-storage receptacle which can be accessed by the user.

Searchable TIFF:  A database image file with OCR'd text which can be searched.  (See OCR)

TIFF (Tagged Image File Format):  A graphics file format that is considered to be the standard for lossless images; it is basically a picture of a document.

Unitization:  The process of assembling scanned pages into a document by means of Physical Unitization or Logical Unitization.  Physical Unitization relies on objects (staples, paper clips, folders, etc.) to determine which pages complete the document.  Logical Unitization relies on human review of the pages to determine which ones complete the document.
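
Once the first page of each document has been identified, whether from a staple or from a reviewer's decision, assembling the pages is mechanical. The sketch below assumes each scanned page carries a hypothetical starts_document flag recording that decision; the field names and file names are illustrative only.

```python
def unitize(pages):
    """Group scanned pages into documents.

    A new document begins at every page flagged `starts_document`.
    The first page always begins a document, even if it is not flagged.
    """
    documents = []
    for page in pages:
        if page["starts_document"] or not documents:
            documents.append([])
        documents[-1].append(page["image"])
    return documents


scanned_pages = [
    {"image": "page1.tif", "starts_document": True},
    {"image": "page2.tif", "starts_document": False},
    {"image": "page3.tif", "starts_document": True},
]

# Two documents: pages 1-2 together, and page 3 on its own.
documents = unitize(scanned_pages)
```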
