DSS SCANNED DOCUMENT PROJECT

This document describes the process for updating the DSS scanned document web page. The scanned document files have been put in dataset ds002.0.

Most of our scanned documents are created by the Image and Design Center (IDC). Cecilia ships them the documents that need to be scanned. The IDC then uploads the scanned files to /huron/ftp/rossby/upload/papers.roy. The files are scanned at 400 dpi and are multi-page tif files with group 4 compression. The format of the file name is llnnnn.tif, where ll is a 2-letter abbreviation of the person having the docs scanned and nnnn is a sequential number identifier for that person. Cecilia then checks the scanned version against the original.
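The file naming convention can be checked mechanically. The following is a minimal sketch, not part of the existing tooling; the function name parse_scan_name is hypothetical, and it assumes that the letters may be upper or lower case and that the number is exactly four digits.

```python
import re

# Pattern for the llnnnn.tif convention: two letters, then four digits.
# Assumption: case-insensitive letters and exactly four digits.
SCAN_NAME = re.compile(r"^([A-Za-z]{2})(\d{4})\.tif$")


def parse_scan_name(filename):
    """Return (initials, sequence_number), or None if the name does not match."""
    match = SCAN_NAME.match(filename)
    if match is None:
        return None
    return match.group(1).lower(), int(match.group(2))
```

A name that returns None here would be worth checking by hand before the file is released.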
Cecilia keeps an Excel spreadsheet on her PC which describes each of the scanned documents. She also keeps track of which have been sent to the IDC, which have been returned, and which have been checked and are ready for public release.
After the scanned file has been checked and approved by Cecilia, it has to be moved to /huron/ftp/docs/papers-scanned/tif. A PDF version is then created in /huron/ftp/docs/papers-scanned/pdf. The PDF version is what is served via the DSS scanned documents web page. This step is done by a DSS member.
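As an illustration of the directory layout just described, the sketch below derives the release locations for one scanned file. It assumes that the PDF keeps the same base name as the tif file, which this document does not state explicitly; release_paths is a hypothetical helper name.

```python
from pathlib import PurePosixPath

TIF_DIR = PurePosixPath("/huron/ftp/docs/papers-scanned/tif")
PDF_DIR = PurePosixPath("/huron/ftp/docs/papers-scanned/pdf")


def release_paths(tif_name):
    """Return (tif_path, pdf_path) for an approved scan.

    Assumption: the PDF shares the tif file's base name.
    """
    name = PurePosixPath(tif_name).name
    pdf_name = PurePosixPath(name).with_suffix(".pdf").name
    return TIF_DIR / name, PDF_DIR / pdf_name
```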
The DSS scanned document web page has to be updated to reflect the new documents.
A script has been created to move the files from the upload area to the /huron/ftp/docs/papers-scanned/tif directory, create the PDF, and update the web page. The script is in /huron/dss/bin/scan2web. When adding 4 or 5 scanned docs to the web page, the script takes about 10-15 minutes to run. This is what the script does:
1. The script takes as input a TAB-delimited version of the Excel file describing each scanned document. This is the file noted above on Cecilia's PC.
2. The script compares the original (in /huron/rossby...) of each document on the list to the version (in /huron/ftp/docs...). If there is a difference, a new version of the original is copied to the docs area and the PDF is recreated. This step is necessary as Roy occasionally updates some of his documents. At this point, it takes about 2 hours to convert ALL of the documents from tif to PDF, so converting only the changed files is much faster.
  
  The script also copies any new files from the rossby/upload area to the docs/papers-scanned area and creates the PDF files.
3. A web page is then created and put in /huron/ftp/docs/papers-scanned/internal/papers.html. All the links in this page are then checked.
4. Output from the script goes to standard out AND you get a copy via email. The output will notify you of problems in finding files in the rossby/upload directory and will note any bad links.
  
  If everything is OK, you have to move the papers.html file from the internal directory to the /huron/ftp/docs/papers-scanned directory.
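The compare-before-convert logic of steps 1 and 2 can be sketched as follows. This is not the scan2web script itself, only an illustration of the idea. The function name, the name_column parameter, and the assumption that the file name sits in the first column of the TAB-delimited listing are all hypothetical.

```python
import csv
import filecmp
import os


def files_needing_conversion(listing_path, upload_dir, tif_dir, name_column=0):
    """Return file names whose docs copy is missing or differs from the upload copy.

    name_column is a hypothetical position of the file name in the listing.
    """
    stale = []
    with open(listing_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) <= name_column or not row[name_column].strip():
                continue  # skip blank or short rows
            name = row[name_column].strip()
            source = os.path.join(upload_dir, name)
            target = os.path.join(tif_dir, name)
            if not os.path.exists(source):
                # Report the problem, as the script output does for missing uploads.
                print("missing from upload area:", name)
                continue
            # shallow=False forces a byte-by-byte comparison of the two files.
            if not os.path.exists(target) or not filecmp.cmp(source, target, shallow=False):
                stale.append(name)
    return stale
```

Only the names returned would need to be copied and converted again, which is why a run with a few new documents takes minutes rather than the 2 hours needed for a full conversion.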

The files are periodically written to the MSS. Dataset ds002.0 has been created to keep track of the MSS files. Most of Roy's files are tarred 50 files to an MSS y volume. Files for everyone else are tarred into another file.
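The grouping of 50 files per tar file can be illustrated with a short sketch. How the files are actually ordered and grouped before being written to the MSS is not described here, so the sorted order and the helper name tar_batches are assumptions.

```python
def tar_batches(names, size=50):
    """Split file names into consecutive groups of at most `size` names.

    Assumption: names are grouped in sorted order.
    """
    ordered = sorted(names)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

For example, 120 file names would produce three groups of 50, 50, and 20 names.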