DSS SCANNED DOCUMENT PROJECT
This document describes the process for updating the DSS scanned document
web page. The scanned document files have been put in dataset ds002.0.
- Most of our scanned documents are created by the Image and Design
Center. Cecilia ships them documents that need to be scanned. The IDC
then uploads the scanned files to /huron/ftp/rossby/upload/papers.roy.
The files are scanned at 400bpi and saved as multi-page tif files
with group 4 compression. The file name format is llnnnn.tif, where
ll is a 2-letter abbreviation of the person having the docs scanned and
nnnn is a sequential number identifier for that person.
Cecilia then checks the scanned version against the original.
- Cecilia keeps an Excel file on her PC which describes each of the
scanned documents. She also keeps track of which have been sent to
the IDC, which have been returned, and which have been checked and are
ready for public release.
- After a scanned file has been checked and approved by Ces, it has to be
moved to /huron/ftp/docs/papers-scanned/tif. A PDF version is then
created in /huron/ftp/docs/papers-scanned/pdf. The PDF version is what
is served via the DSS scanned documents web page. This step is done by a
DSS member.
- The DSS scanned document web page has to be updated to reflect the new
documents.
- A script has been created to move the files from the upload area to
the /huron/ftp/docs/papers-scanned/tif directory, create the PDF, and
update the web page. The script is /huron/dss/bin/scan2web. When adding
4 or 5 scanned docs to the web page, the script takes about 10-15 minutes
to run. This is what the script does:
- It takes as input a tab-delimited version of the Excel file describing
each scanned document. This is the file noted above on Ces's PC.
- The script compares the original (in /huron/rossby...) of each document on the
list to the version (in /huron/ftp/docs...). If there is a difference, the
new version of the original is copied to the docs area and the
PDF is re-created. This step is necessary as Roy occasionally updates some
of his documents. At this point, it takes about 2 hours to convert
ALL of the documents from tif to PDF, so converting only the changed
ones is much faster.
The script also copies any new files from the rossby/upload area to
the docs/papers-scanned area and creates the PDF files.
- A web page is then created and put in
/huron/ftp/docs/papers-scanned/internal/papers.html. All the links
in this page are then checked.
- Output from the script goes to standard out AND you get a copy
via email. The output will notify you of problems in finding files
in the rossby/upload directory and will note any bad links.
If everything is OK, you have to move the papers.html file
from the internal directory to the /huron/ftp/docs/papers-scanned directory.
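
The naming convention and the compare-before-convert step described above
can be sketched in Python as follows. This is only an illustration, not the
actual scan2web code. The function names, the assumption that nnnn is
exactly four digits, and the assumption that the file name sits in the first
column of the tab-delimited file are all hypothetical.

```python
import csv
import filecmp
import re
from pathlib import Path

# Assumed reading of llnnnn.tif: two letters followed by four digits.
SCAN_NAME = re.compile(r"^([A-Za-z]{2})(\d{4})\.tif$")


def parse_scan_name(name):
    """Return (initials, sequence number) for an llnnnn.tif name, else None."""
    m = SCAN_NAME.match(name)
    if m is None:
        return None
    return m.group(1).lower(), int(m.group(2))


def needs_update(upload_file, docs_file):
    """True if the docs copy is missing or its contents differ from the upload."""
    docs_file = Path(docs_file)
    if not docs_file.exists():
        return True
    # shallow=False forces a byte-by-byte comparison of the two files.
    return not filecmp.cmp(upload_file, docs_file, shallow=False)


def files_to_convert(tsv_path, upload_dir, docs_tif_dir, name_column=0):
    """Read the tab-delimited list and sort the names into two lists.

    Returns (to_convert, missing): names that are new or changed in the
    upload area, and names on the list that are not in the upload area.
    name_column is a hypothetical column index for the file name.
    """
    to_convert, missing = [], []
    with open(tsv_path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) <= name_column:
                continue
            name = row[name_column].strip()
            if parse_scan_name(name) is None:
                continue  # header line or a name outside the convention
            src = Path(upload_dir) / name
            if not src.exists():
                missing.append(name)
            elif needs_update(src, Path(docs_tif_dir) / name):
                to_convert.append(name)
    return to_convert, missing
```

Only the names returned in to_convert would need to be copied and converted
to PDF, which is the reason a run takes minutes rather than the 2 hours
needed for a full conversion. The names in missing correspond to the
"problems in finding files" reported in the script output.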
The files are periodically written to the MSS. Dataset ds002.0 has
been created to keep track of the MSS files. Most of Roy's files are tarred
50 files to an MSS y volume. Files for everyone else are tarred into
another file.
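
Grouping files 50 to a tar file, as described above, could look roughly
like the following Python sketch. The tar file naming scheme and the
function names are hypothetical, and the transfer of the resulting tar
files to the MSS is not shown because the command used for it is not
documented here.

```python
import tarfile
from pathlib import Path


def batches(items, size=50):
    """Split a sequence into consecutive groups of at most `size` items."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_tar_batches(tif_dir, out_dir, prefix, size=50):
    """Tar the files named <prefix>*.tif in groups of `size`.

    Returns the list of tar files written. The <prefix>_NNN.tar naming
    is an assumption made for this example only.
    """
    tif_dir, out_dir = Path(tif_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(tif_dir.glob(f"{prefix}*.tif"))
    tar_paths = []
    for n, group in enumerate(batches(files, size), start=1):
        tar_path = out_dir / f"{prefix}_{n:03d}.tar"
        with tarfile.open(tar_path, "w") as tar:
            for f in group:
                # Store only the file name, not the full directory path.
                tar.add(f, arcname=f.name)
        tar_paths.append(tar_path)
    return tar_paths
```

Sorting the names first keeps each tar file aligned with a contiguous range
of sequence numbers, which makes it easier to record in ds002.0 which tar
file holds a given document.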