Who We Are
The DIGIEON team has over 80 years of combined experience developing products and offering services, primarily in data extraction, DMS, publishing, data capture, retrieval and conversion, for companies across multiple verticals.
White Papers
The following whitepaper details a sample production conversion. The key to an effective conversion is developing the right processes for the job. The processes and workflow detailed below are simple and efficient, and they yield successful results for any conversion and extraction service. DIGIEON employs task-specific technologies to perform the following tasks:
Document Preparation; Scanning; Indexing and Data Extraction; Bates Numbering; Quality Control; Formatting Digital Documents; Document Conversions; OCR/PDF Processing; Data Export; Project Management; Document Re-assembly.
We use state-of-the-art equipment, facilities and software, together with experienced professionals, to optimize scanning, imaging and retrieval for our document scanning services. We use the following processes to build a complete, customized document conversion and storage system, either at our secure and efficient facility or on-site at client locations. Our facility has separate work areas for document preparation, scanning, data capture and indexing, and quality assurance.
  1. Document Preparation
    In order for documents to be scanned in a production environment, pages must be removed from folders, and all bindings such as rubber bands, staples and paper clips must be removed. After the bindings have been removed it can be difficult to determine where documents begin and end. We recommend using bar code separator sheets for document breaks. Our process takes this method one step further. Each document bar code sheet is printed on pastel yellow paper and each bar code is unique. Each bar code name becomes the final document name unless directed otherwise. In addition to unique document bar code sheets, we use unique box identification bar codes and, if required, unique folder bar codes.
    We break large jobs into natural tracking units and prepare a log sheet for each unit. Typically a tracking unit is a box of documents. The log sheet for each box is used at each step in the process. For instance, the doc prep operator indicates the bar code sequences used within the box for folders and documents. Special circumstances, such as stained pages, are annotated on the log sheet. This saves time in QA processing. If a bad image is encountered during the QA process, the first step is to check the log sheet. This can spare the operator from pulling the originals just to determine that the poor image is due to a problem with the original and not the scanning process.
    In some cases we do our indexing initially rather than later in the process. When the index information is to be extracted from a folder tab, we capture an image of that tab along with the folder bar code sheet. We assign a unique designation to each folder in a box by combining the box number with a folder sequence (Box 123 Folder 022). We print special Folder Level Bar Code Sheets for each folder. We capture an image of the bar code portion of the sheet along with the full image area of the folder tab with the index data. The image data from the tabs, along with the bar code data, are then keyed from the image to create the index. The index is synchronized with the scanned documents by the bar code sequence. This is Folder Level Tab Indexing.
    The bar code is used in our quality assurance process to confidently determine that all documents were scanned and separated properly. The log file is simply a list of the bar codes used in a box with corresponding check boxes for each. At doc prep, the doc prep box is checked and initialed by the operator. When the operator wants to flag a tab, they place a number inside the check box and write the same number on the back of the log sheet with a written comment about their concern. Notes might be made if oversized, torn, or faded pages were encountered within a tab. Notes might also be made if a copy of a page has a noticeable skew.
    Typically all of the pages within a folder and the corresponding bar code sheets are treated as a batch of pages to be processed through the scanner. After document preparation the batch of pages remains with the original folder throughout the conversion process. After the folders successfully clear the Quality Control process, the documents can be re-attached to the corresponding folder tabs. The yellow separator sheets stand out visually, so it is relatively easy to properly rebind the folders.
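    The folder designation and log sheet described above can be modeled in a few lines. The following is only a minimal sketch: the `B123-F022` label format, the class names, and the step name `doc_prep` are assumptions made for illustration, not the format DIGIEON actually prints on its sheets.

```python
from dataclasses import dataclass, field


def folder_id(box: int, folder_seq: int) -> str:
    """Combine a box number and a folder sequence into one label.

    The 'B123-F022' layout is a hypothetical format chosen for this sketch.
    """
    return f"B{box}-F{folder_seq:03d}"


@dataclass
class LogEntry:
    barcode: str
    checked: dict = field(default_factory=dict)  # step name -> operator initials
    notes: list = field(default_factory=list)    # annotations such as "stained page"


@dataclass
class LogSheet:
    box: int
    entries: dict = field(default_factory=dict)  # barcode -> LogEntry, in prep order

    def add_folder(self, folder_seq: int) -> str:
        code = folder_id(self.box, folder_seq)
        self.entries[code] = LogEntry(code)
        return code

    def check_off(self, barcode: str, step: str, initials: str, note: str = "") -> None:
        """Tick the check box for one step and optionally record a concern."""
        entry = self.entries[barcode]
        entry.checked[step] = initials
        if note:
            entry.notes.append(f"{step}: {note}")

    def pending(self, step: str) -> list:
        """Bar codes that have not yet been checked off for the given step."""
        return [code for code, e in self.entries.items() if step not in e.checked]


sheet = LogSheet(box=123)
for seq in (21, 22, 23):
    sheet.add_folder(seq)
sheet.check_off("B123-F022", "doc_prep", "JS", note="page 4 is a skewed photocopy")
print(sheet.pending("doc_prep"))
```

    Keeping the notes next to the bar code is what lets a later step look up an annotation before anyone pulls the original pages.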
  2. Scanning
    The scanning process takes place under strict quality control, as it is the point at which a record moves from a paper document or form to its electronic format. Documents are normally scanned in bi-tonal mode at 300 dpi for increased OCR efficiency. They can also be scanned at 200 dpi. The same log sheet prepared at the document preparation phase is passed on to the scan operator for logging folders. The folders should be provided and scanned in the same sequence as they were prepared. We scan in duplex mode and automatically delete blank pages by file size. Our scan software automatically reads the barcode on the separator sheets, creates a directory with the same name as the barcode, and places all images scanned into that directory. The front and back images of the barcode sheet are automatically deleted. The scan operator scans all images from a folder and then stops to place the stack of pages back in that same folder before going on to the next folder. When using production scanning equipment, it is typically more efficient to delete a batch and re-scan it if a double feed is encountered.
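    The grouping logic the scan software performs can be illustrated as follows. This is a sketch under stated assumptions, not the vendor software: the `Page` record, the `BLANK_MAX_BYTES` threshold, and the idea that a barcode reader has already filled in `barcode` for separator fronts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off: a compressed bi-tonal image this small is treated as blank.
BLANK_MAX_BYTES = 2_000


@dataclass
class Page:
    filename: str
    size_bytes: int
    barcode: Optional[str] = None  # set only on the front of a separator sheet


def group_batch(pages):
    """Group a duplex-scanned batch into {barcode: [image filenames]}.

    Separator sheets (front and back) are dropped, and images at or below
    BLANK_MAX_BYTES are treated as blank sides and dropped as well.
    """
    documents = {}
    current = None
    skip_back = False
    for page in pages:
        if page.barcode is not None:
            current = page.barcode
            documents.setdefault(current, [])
            skip_back = True  # duplex: the next image is the back of the separator
            continue
        if skip_back:
            skip_back = False
            continue
        if current is None:
            raise ValueError(f"{page.filename} appears before any separator sheet")
        if page.size_bytes <= BLANK_MAX_BYTES:
            continue
        documents[current].append(page.filename)
    return documents


batch = [
    Page("0001.tif", 9_000, barcode="B123-F022-D001"),
    Page("0002.tif", 900),
    Page("0003.tif", 41_000),
    Page("0004.tif", 800),
    Page("0005.tif", 38_000),
    Page("0006.tif", 36_000),
    Page("0007.tif", 9_000, barcode="B123-F022-D002"),
    Page("0008.tif", 900),
    Page("0009.tif", 52_000),
    Page("0010.tif", 700),
]
print(group_batch(batch))
```

    A size threshold is cheap to evaluate but has to be tuned for the scanner and compression in use, since a nearly empty page with a single stamp can compress almost as small as a blank one.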
  3. Indexing and Data Extraction
    As mentioned earlier, indexing can be accomplished at the front end rather than the back end. This allows us to create the unique barcode separator sheets that become a key component throughout all following processes of the conversion. When the highest possible levels of accuracy are critical, we offer OCR (Optical Character Recognition) with cleanup and data entry. The index is either keyed or captured via an OCR wand and then printed on the log form. The operators at all following steps will be looking at the barcode sheets and the original documents, so it is highly unlikely that a mis-typed index would go unnoticed during any of the following processes. The other method of indexing we use is to pull information from the images of a document. Data entry operators key data from the scanned images that cannot be extracted using OCR. All of this information is indexed and full-text searchable. This double-key indexing yields 99% plus accuracy.
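    The text does not spell out how the two keying passes are reconciled, so the following is only one plausible sketch of double-key verification: two operators key the same fields independently, matching values are accepted, and any disagreement is routed to a reviewer. The function name, field names and sample values are hypothetical.

```python
def compare_keyings(first_pass: dict, second_pass: dict):
    """Compare two independent keyings of {barcode: {field: value}}.

    Returns (agreed, disputes): values both operators keyed identically,
    and a list of (barcode, field, first_value, second_value) to review.
    """
    agreed, disputes = {}, []
    for barcode in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(barcode, {})
        b = second_pass.get(barcode, {})
        for name in sorted(set(a) | set(b)):
            va, vb = a.get(name), b.get(name)
            if va is not None and va == vb:
                agreed.setdefault(barcode, {})[name] = va
            else:
                disputes.append((barcode, name, va, vb))
    return agreed, disputes


operator_1 = {"B123-F022": {"name": "ACME LEASE", "date": "1998-04-12"}}
operator_2 = {"B123-F022": {"name": "ACME LEASE", "date": "1998-04-21"}}
agreed, disputes = compare_keyings(operator_1, operator_2)
print(agreed)
print(disputes)
```

    Here the transposed digits in the date are caught because the two passes disagree, while the matching name is accepted without review.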
  4. Bates Numbering
    After the images have been processed and validated through rigorous Quality Assurance steps, digital Bates Numbers are added to each image if required. A program automatically generates and appends a unique Bates Number to every page in the file, which is then verified by a QC process. The process also checks document-to-document continuity, confirming that each number directly follows the previous one. The Bates Number is part of the image and not a layer added to the image. So as not to interfere with any information on an image, we add a strip to the bottom of the image that carries the Bates Number. As we add the Bates Numbers to their corresponding images, we track everything in a database. This allows us to later rename a document by the Bates Number of its first page. We can also record in the database the beginning and ending Bates Numbers for each document.
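    The numbering, continuity check and begin/end tracking can be sketched as below. The `DIG` prefix, the seven-digit width and the in-memory structures standing in for the database are assumptions for illustration; stamping the strip onto the image itself is not shown.

```python
def bates_label(number: int, prefix: str = "DIG", width: int = 7) -> str:
    """Format one Bates Number; prefix and width are hypothetical defaults."""
    return f"{prefix}{number:0{width}d}"


def assign_bates(documents, prefix="DIG", start=1, width=7):
    """Assign consecutive Bates Numbers across an ordered list of documents.

    documents: list of (document_name, page_count) in delivery order.
    Returns (pages, ranges) where pages is a list of
    (document_name, page_index, label) and ranges maps each document
    to its (first_label, last_label).
    """
    pages, ranges = [], {}
    number = start
    for doc_name, page_count in documents:
        if page_count < 1:
            raise ValueError(f"{doc_name} has no pages")
        first = number
        for page_index in range(page_count):
            pages.append((doc_name, page_index, bates_label(number, prefix, width)))
            number += 1
        ranges[doc_name] = (
            bates_label(first, prefix, width),
            bates_label(number - 1, prefix, width),
        )
    return pages, ranges


def find_gaps(pages, prefix="DIG"):
    """Return the positions whose number is not exactly previous + 1."""
    numbers = [int(label[len(prefix):]) for _, _, label in pages]
    return [i for i in range(1, len(numbers)) if numbers[i] != numbers[i - 1] + 1]


pages, ranges = assign_bates([("B123-F022-D001", 3), ("B123-F022-D002", 2)])
print(ranges)
print(find_gaps(pages))
```

    Because the first label of each document is recorded, renaming a document to the Bates Number of its first page is a simple lookup in `ranges`.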
  5. Quality Control
    At the outset, prior to starting any production work, the operations, technical and project management teams communicate and mutually review the work requirements as per the SOW. Developing a process that best fits the job is one key to Quality Control. Document conversion is a tedious job and we know that mistakes will be made, even by the very best operators. Therefore, we try to develop processes and procedures that automatically flag any inconsistencies for further investigation, so that mistakes are easily discovered at the steps that follow. We have two stages of keying and three stages of QC, Final QC and Post FQC. For instance, if the indexing operator mis-keyed an entry, it would be easily caught at a following step. The mistake would be annotated on the log sheet and corrected. If the doc prep operator missed a folder, it would be noted by the scan operator as they check off the folders on the log sheet. If a scan operator skipped scanning a folder, the Quality Control operator would catch it when comparing the scan results against the log sheet.
    When poor quality images are encountered at Quality Control, the first step is to check the log sheet for any special annotations. A poor image due to a poor original may not be correctable. If an image appears to be abnormally skewed, you would typically have to go back to the original for rescanning. However, if the log sheet has a remark that the document in question is a skewed photocopy, you need not expend the time and effort to pull that original, which makes the process a little more efficient. In this case the log file and the digital file names are the same, so checking images against the log file is rather simple. If a document or a page of a document is flagged for rework, we will typically rescan and replace the entire document. Because the scanners are so fast, it is easier and less prone to error to handle rework at the document level rather than the page level.
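    Because the log entries and the digital names match, the comparison of scan results against the log sheet reduces to a set difference. The sketch below assumes the bar codes from the log sheet and the scanned directory names are available as plain lists; the function names and the two triage outcomes are hypothetical simplifications.

```python
def reconcile(logged_barcodes, scanned_directories):
    """Compare the log sheet against the directories the scanner produced."""
    logged = set(logged_barcodes)
    scanned = set(scanned_directories)
    return {
        "missing": sorted(logged - scanned),     # logged at doc prep, never scanned
        "unexpected": sorted(scanned - logged),  # scanned, but not on the log sheet
    }


def triage_bad_image(barcode, log_notes):
    """Decide what to do with a poor image.

    log_notes: {barcode: [annotation, ...]} copied from the log sheet.
    Simplifying assumption: any annotation means the defect was already
    seen on the original, so the original does not need to be pulled.
    """
    if log_notes.get(barcode):
        return "accept_as_original"
    return "pull_and_rescan_document"


logged = ["B123-F021", "B123-F022", "B123-F023"]
scanned = ["B123-F021", "B123-F023", "B123-F032"]
notes = {"B123-F021": ["doc_prep: page 4 is a skewed photocopy"]}

print(reconcile(logged, scanned))
print(triage_bad_image("B123-F021", notes))
print(triage_bad_image("B123-F023", notes))
```

    In this example the report shows both a skipped folder and a directory whose name matches nothing on the log sheet, which is the kind of inconsistency that should be flagged for investigation.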
  6. Formatting Digital Documents
    After scanning, we process the images to correct any skewing that may have occurred during the scanning process. In addition, we can automatically crop the images to remove black borders and de-speckle the images to remove extraneous background noise. We use a production-grade PDF conversion package that creates multi-page PDF/A files. The software converts all images within a directory to a multi-page PDF/A and gives the file the name of the parent directory. In this case the name of the directory was created from our barcode sheets, which already incorporate the proper naming convention. The software is state of the art and can perform the following:
    • Improve the contrast resolution of all originally captured data from the contracts.
    • Ensure that all image raster content is ISO-compliant for consistent functioning within the Adobe-compliant PDF wrapper files.
    • Present image content in ISO- and Adobe-compliant files to improve image display utility for near- and long-term interpretation and archival purposes.
    • Significantly improve overall rendering/display performance and reduce the cost of reuse through the ISO methods listed above, i.e., all raster content can be output bit for bit. Other common methods of putting raster content from TIFF formats into PDF files adapt the content and suffer pixel sample rate losses. Further losses are encountered when the content is exported out of the PDFs.
    All images will be delivered, as indicated above, in Adobe/ISO-compliant and tested PDFs. While our software can produce CCITT compression within the PDF, we can also provide a much better, more efficient, standard compression by using JBIG2 lossless compression within a PDF for the same price. It will yield a smaller file that appears crisper and renders faster.
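    The directory-to-file naming rule described in this section can be sketched with the standard library alone. The function below only plans the work (which images go into which output file); the actual PDF/A creation is left to the conversion package, and the `.tif` suffix and the function name are assumptions for illustration.

```python
from pathlib import Path


def plan_pdfs(scan_root, out_dir, image_suffix=".tif"):
    """Map each barcode-named directory to the PDF that should be built from it.

    Returns a list of (output_pdf_path, [image_paths in page order]).
    Directories without images are skipped.
    """
    jobs = []
    directories = sorted(p for p in Path(scan_root).iterdir() if p.is_dir())
    for directory in directories:
        images = sorted(directory.glob(f"*{image_suffix}"))
        if images:
            # The PDF takes the name of the parent directory, i.e. the barcode.
            jobs.append((Path(out_dir) / f"{directory.name}.pdf", images))
    return jobs
```

    Sorting by file name reproduces page order only if the scanner writes zero-padded sequence numbers, which is assumed here.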
  7. Document Conversions
    We handle document conversion from any document format into the exact format required, and we provide full support for most formats. Using a reliable and cost-effective conversion process, documents are scanned and converted to a wide variety of formats, or transformed from one format to another.
  8. OCR/PDF Processing
    In OCR processing, the digital documents are run through OCR software. We add the Bates Number to the image prior to running the OCR process so that a user can find a Bates Number with a simple full-text search. By using OCR, an operator can do full-text searching across an entire repository of information. OCR only works for typewritten characters, so it supplements the manual indexing rather than replacing it. The results and accuracy of the OCR depend on the quality of the image.
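    Because the Bates Number is stamped into the image before OCR, it appears in the recognized text and can be found like any other word. The sketch below assumes the OCR text for each document is already available as a string; the function name and sample data are hypothetical.

```python
import re


def find_bates(ocr_text_by_document, bates_number):
    """Return the documents whose OCR text contains the given Bates Number."""
    # Word boundaries keep 'DIG0000002' from matching inside 'DIG00000021'.
    pattern = re.compile(rf"\b{re.escape(bates_number)}\b")
    return sorted(
        name for name, text in ocr_text_by_document.items() if pattern.search(text)
    )


texts = {
    "DIG0000001.pdf": "Lease agreement DIG0000001 Schedule A DIG0000002",
    "DIG0000003.pdf": "Addendum to lease DIG0000003",
}
print(find_bates(texts, "DIG0000002"))
```

    A lookup like this depends on the OCR engine reading the stamped strip correctly, which in turn depends on image quality.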
  9. Data Export
    After the creation of multi-page PDF files, we export the images along with all index information to a self-contained software product called DIGIArcx™. The product consists of all images, full-text OCR data, the document indexes mentioned above, and management software. The data can be delivered on a DVD or an external USB drive. If desired, the material can be transferred to a local hard drive for faster operation. Alternatively, all files can be loaded along with their text into a document imaging system or into a web-based document management system called DIGIDocx™. All files generated in the previous steps, such as PDF conversion and OCR file conversion, are loaded into the system. These files are then part of an archive that enables easy search, retrieval and management.
  10. Project Management
    A senior project manager will be on-site throughout the conversion.
  11. Document Re-assembly
    As mentioned earlier, document re-assembly is made easy by the pastel-colored barcode separator sheets. The documents remain with their original folder throughout the process. The colored sheets and their sequencing make it easy to determine where the document breaks occur, and the operator also has the log sheet as a work list. Re-assembly does not occur until an entire job sheet has been cleared through Quality Control. The corresponding log sheet becomes a work list and checklist for folders that have been successfully re-assembled.