SuperJournal Production Process
Ann Apps, Manchester Computing, University of Manchester
SuperJournal Technical Report SJMC130
Contents:
1. Overall Approach
2. Journal Data Transfer
3. Journal Data Receipt
4. Verification
5. Archive
6. Journal Data Pre-Processing
7. Data Conversion
8. Work Tidy File
9. Application Data Load
10. Index New Data
11. Error Logging
12. Tables of Contents
13. New Journals
14. Background Tasks
15. Technical Appendices
1. Overall Approach
This document details the actions required to process the journal data received from
the publishers, from receipt through to loading into the SuperJournal application. It is a
"user guide" defining the actions to perform during the Production process
rather than a description of the Data Handling and Conversion process which may be found
in [SJMC140].
Within this document, the journal names, and hence some conversion program names, have
been masked. A "Manchester Computing internal" version of this document also
exists [SJMC131] which uses real journal and program names, thus providing the
"real" Data Handling User Guide used for day-to-day SuperJournal production.
For a diagrammatic representation of the Production Process see Appendix 15.1.
1.1 Data Handling Environment
All data handling operations should be run as the user supjinfo on the CS6400, except
where stated otherwise, so that file ownership is consistent. All executable
programs are accessible from the "PATH" of supjinfo. The SuperJournal
journals directory structure is defined in Appendix 15.4.
1.2 SuperJournal Data Spreadsheet
A SuperJournal Data Spreadsheet is maintained which records the data handling actions
performed from receipt to data load. Details of this spreadsheet are given in Appendix 15.2. Events which should be recorded are
indicated throughout this User Guide.
2. Journal Data Transfer
Journal data is generally transferred from the publisher to SuperJournal by FTP. The data
consists of either:
- PDF (or PostScript) articles with associated SGML headers, where in some cases the SGML
headers are packed into a single file; or
- full text SGML or HTML articles with associated graphics, usually together with PDF articles
and in some cases SGML headers, which may also be packed.
The actual format of supplied data may be:
- A tarred, gzipped file. This is the preferred supply format.
- A zipped file.
- A directory, or directories, containing the uncompressed files for the issue.
Details of how particular journals are supplied are given in Appendix 15.3.
Publishers should send an email notification of each journal data transfer, but the FTP site
should also be checked regularly for newly arrived data by:
du /superj1/Publishers | more
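The du listing above is essentially a size check on each publisher's FTP directory. The following is a minimal Python sketch of the same check. It assumes one sub-directory per publisher under /superj1/Publishers and that the sizes from the previous check are kept somewhere between visits (a plain dictionary here, purely for illustration). It reports publisher directories that are new or have changed size. Sizes are counted in bytes, not in du's disk blocks.

```python
import os
from pathlib import Path


def directory_sizes(root: str) -> dict[str, int]:
    """Total size in bytes of each immediate sub-directory of root (like du)."""
    sizes = {}
    for entry in Path(root).iterdir():
        if not entry.is_dir():
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):  # do not count link targets
                    total += os.path.getsize(path)
        sizes[entry.name] = total
    return sizes


def changed_publishers(previous: dict[str, int], current: dict[str, int]) -> list[str]:
    """Publisher directories that are new, or whose size differs from the last check."""
    return sorted(name for name, size in current.items() if previous.get(name) != size)
```

If a directory's size is still changing between two consecutive checks, the transfer may not be complete yet.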
3. Journal Data Receipt
On receipt of a new journal issue a new directory should be created within the
SuperJournal standard journals directory structure, with a symbolic link from the main
journals directory where necessary. The SuperJournal journals directory naming convention
is defined in Appendix 15.4. Thus the directory will be:
/superj<n>/<Publisher>/<JournalIdentifier>/V<Volume>I<Issue>
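The naming pattern above can be captured in a small helper. This is a sketch only. It follows just the pattern shown here (the full convention is in Appendix 15.4). The publisher and journal names in the example are hypothetical placeholders, because real names are masked in this document. The second function gives the directory on /superj1 that needs a symbolic link when the issue is held on another disk.

```python
from pathlib import PurePosixPath
from typing import Optional


def issue_directory(disk: int, publisher: str, journal_id: str,
                    volume: int, issue: str) -> PurePosixPath:
    """Return /superj<n>/<Publisher>/<JournalIdentifier>/V<Volume>I<Issue>."""
    return PurePosixPath(f"/superj{disk}", publisher, journal_id, f"V{volume}I{issue}")


def link_directory(disk: int, publisher: str, journal_id: str) -> Optional[PurePosixPath]:
    """Directory on /superj1 that needs a link to the issue, or None if already on /superj1."""
    if disk == 1:
        return None
    return PurePosixPath("/superj1", publisher, journal_id)


# Example with hypothetical names:
print(issue_directory(5, "PublisherX", "JNL1", 12, "3"))  # /superj5/PublisherX/JNL1/V12I3
```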
Copy the data to this new issue directory. Unzip and untar where necessary.
- For gzipped/tarred files:
- gunzip <file>.tar.gz
- tar xvfm <file>.tar
- For zipped files:
- sjunzip <file>.zip
- [sjunzip is an alias for an unzip facility on the CS6400 that is not generally available]
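Where Python is available, the standard tarfile and zipfile modules can stand in for the gunzip/tar and sjunzip steps. This is a minimal sketch that assumes the archive has already been copied into the issue directory. Unlike tar xvfm, the tar branch keeps the modification times stored in the archive.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack_issue_file(archive: Path, issue_dir: Path) -> list[str]:
    """Extract a .tar.gz/.tgz or .zip archive into issue_dir; return the member names."""
    name = archive.name.lower()
    if name.endswith((".tar.gz", ".tgz")):
        with tarfile.open(archive, "r:gz") as tar:
            members = tar.getnames()
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: reject absolute paths and members outside issue_dir.
                tar.extractall(issue_dir, filter="data")
            else:
                tar.extractall(issue_dir)
    elif name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            members = zf.namelist()
            zf.extractall(issue_dir)
    else:
        raise ValueError(f"unsupported archive format: {archive.name}")
    return members
```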
It is sometimes necessary to flatten publisher supplied directory structures. All
supplied files should be in the main issue directory, except for any graphics files which
should be in a single sub-directory, which will eventually be called graphics.
Exceptions to this are:
- MGP1: there is a separate graphics directory for each article.
- MC1: there is a separate directory for each article.
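The flattening rule above can be sketched as follows. The set of extensions treated as graphics is an assumption for illustration. The sketch refuses to overwrite a file when two sub-directories contain the same name. It does not handle the MGP1 and MC1 exceptions, whose per-article directories must be kept.

```python
import shutil
from pathlib import Path

# Assumption: extensions treated as graphics (illustrative, not from the guide).
GRAPHICS_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".eps", ".tif", ".tiff"}


def flatten_issue(issue_dir: Path, graphics_name: str = "graphics") -> None:
    """Move every graphics file into issue_dir/graphics and every other file to issue_dir."""
    graphics_dir = issue_dir / graphics_name
    for path in [p for p in issue_dir.rglob("*") if p.is_file()]:
        is_graphic = path.suffix.lower() in GRAPHICS_EXTENSIONS
        target_dir = graphics_dir if is_graphic else issue_dir
        if path.parent == target_dir:
            continue  # already where it belongs
        target = target_dir / path.name
        if target.exists():
            raise FileExistsError(f"{path} would overwrite {target}")
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target))
    # Remove sub-directories left empty, deepest first.
    subdirs = [p for p in issue_dir.rglob("*") if p.is_dir() and p != graphics_dir]
    for directory in sorted(subdirs, key=lambda p: len(p.parts), reverse=True):
        if not any(directory.iterdir()):
            directory.rmdir()
```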
3.1 Extraneous Files
Copy any extraneous files which may possibly be required in the future to another
directory. Delete any unwanted extraneous files, e.g. blank PDF pages; files in
unnecessary formats. Details of dealing with extraneous files are given in Appendix 15.3.4.
3.2 Unpack SGML Headers
For journals where the SGML headers are supplied packed into one file, this file must
be unpacked.
- For journals: CCS1; CCS3; CCS8; PS1; MGP15; MGP6;
MGP11.
- Change file extension of packed headers file to something other than ".sgml".
- ccs1hds < <HeadersFile>
- For MC11
- rmctrlms <HeaderFile>
- If the PDF files do not have a .pdf extension: mkextpdf -e .PDF *.PDF
- mc112sj <HeaderFile>
- [Note that the program mc112sj also renames the PDF files so that they are
consistent with the SGML header files]
4. Verification
Problems with the data transfer, and any requests for resupply, should be sent to the
publisher at this stage, although further problems with the data itself may come to light later.
If the data has arrived and unzipped OK:
- Email an acknowledgement to the publisher.
- Catalogue on the SuperJournal Data Spreadsheet: journal issue; cover date; number of
files of each type; receipt date.
- Report the above information to the SuperJournal Production Coordinator. The Production
Coordinator should also be notified of bound paper copies of journal issues received by
Manchester Computing.
- If the journal issue directory is not on disk /superj1, create a symbolic link in /superj1/<Publisher>/<JournalIdentifier>/ by:
- ln -s /superj<n>/<Publisher>/<JournalIdentifier>/V<Volume>I<Issue> .
- Check that the numbers of SGML and PDF files are consistent. If the actions below do not
resolve a discrepancy, significant missing files should be reported back to the publisher
and the data conversion process put on hold until they are supplied.
- If there are too many PDF files, look at the extra ones.
- Change the file extension to ".pde" for any files which contain unwanted
pages, e.g. end pages, tables of contents.
- If an article such as a Book Review has one SGML header but separate PDF files for each
review, reassemble into one PDF file and delete any consequent extraneous PDF files.
- If a simple header is missing, then it may be created by SuperJournal rather than
waiting for publisher supply. This is often necessary for "Errata" articles, and
sometimes for "Reviews".
- If there are too many SGML header files:
- If a PDF file contains several short articles each of which has an SGML header, create
symbolic links to replicate the correlation between the SGML headers and PDF files.
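The count check in the first step above can be sketched in Python. Besides the two counts, it lists file names (without extension) that appear on only one side. At this stage the publisher's SGML and PDF names may not yet match, so the name lists are only a guide until pre-processing is complete. The extensions are parameters because supplied header extensions vary. Files renamed to ".pde" are not counted. Symbolic links created for shared PDF files are counted.

```python
from pathlib import Path


def sgml_pdf_check(issue_dir: Path, sgml_ext: str = ".sgml", pdf_ext: str = ".pdf") -> dict:
    """Count SGML headers and PDF files, and list names present on only one side."""
    sgml = {p.stem for p in issue_dir.glob("*" + sgml_ext)}
    pdf = {p.stem for p in issue_dir.glob("*" + pdf_ext)}
    return {
        "sgml_count": len(sgml),
        "pdf_count": len(pdf),
        "sgml_only": sorted(sgml - pdf),
        "pdf_only": sorted(pdf - sgml),
    }
```

The same check is useful again at the error checking stage (6.2.3), when the names should match exactly.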
5. Archive
5.1 Long Term Archive
Data which has been supplied tarred and gzipped or zipped is archived by using the
Legato Networker software attached to the CS6400.
- Copy the data to the directory: /superj2/Journals/SJ/ARCHIVE (Archiving is
performed from this directory so that any subsequent retrieval will also be to this
directory.)
- Archive by: nwarchive [Note that this command should not be run in the
background.]
- For each file to be archived, select the file and then "Start". Fill in the
options:
- Description: supjinfo <Date> <JournalIssue>
- Archive Pool: Archive
- Clone Pool: Archive Clone
- Compression: uncheck
- Clone: select
- Verify: select
- Grooming: select (to remove the files from the current directory when archiving is
complete)
5.2 Temporary Archive
To protect against any possible destructive errors in the data conversion process, copy
the files within the new issue directory to a temporary directory:
and if there are graphics files:
- cp graphics/* ../tmp/graphics/.
[Note that more detailed temporary archiving may be needed for journals with more
directory structure.]
These files should be removed when the data conversion is complete.
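The copy to the temporary directory can be sketched in Python. This minimal sketch assumes the issue layout described in section 3: files at the top level and graphics in a single graphics sub-directory. As noted above, journals with deeper directory structures need more than this.

```python
import shutil
from pathlib import Path


def temporary_archive(issue_dir: Path, tmp_dir: Path) -> int:
    """Copy top-level issue files, and any graphics/ files, into tmp_dir; return the count."""
    copied = 0
    for sub in ("", "graphics"):
        source = issue_dir / sub
        if not source.is_dir():
            continue
        destination = tmp_dir / sub
        destination.mkdir(parents=True, exist_ok=True)
        for path in source.iterdir():
            if path.is_file():
                shutil.copy2(path, destination / path.name)  # keeps timestamps
                copied += 1
    return copied
```

A call such as temporary_archive(issue, issue.parent / "tmp") corresponds to the ../tmp location used above.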
5.3 SuperJournal FTP Site Tidy
The newly supplied data should now be removed from the SuperJournal FTP site.
6. Journal Data Pre-Processing
6.1 PDF Articles
6.1.1 PDF File Naming
If the file extension of the supplied PDF is not ".pdf", then this must be
corrected by:
- mkextpdf -e .<extn> *.<extn>
where <extn> is the supplied file extension, typically "PDF" (i.e. files named *.PDF).
If the SGML and PDF file names are inconsistent, but the difference is only in case:
to make all characters in filenames lower case.
Any further file renaming necessary should be performed by the SGML pre-processing
scripts detailed below, but occasionally manual intervention is necessary to make the PDF
and SGML file names consistent.
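The two renaming steps above can be sketched in Python: correcting the PDF extension and, where names differ only in case, converting them to lower case. This is not the mkextpdf program itself, whose behaviour is known here only from its usage. The sketch refuses to rename a file onto a different existing file.

```python
from pathlib import Path


def _safe_rename(path: Path, target: Path) -> None:
    # Allow case-only renames on case-insensitive file systems, but never clobber another file.
    if target.exists() and not target.samefile(path):
        raise FileExistsError(f"{target} already exists")
    path.rename(target)


def fix_pdf_extension(directory: Path, supplied_ext: str = ".PDF") -> list[tuple[str, str]]:
    """Rename *<supplied_ext> files to *.pdf; return (old, new) name pairs."""
    renames = []
    for path in sorted(directory.glob("*" + supplied_ext)):
        target = path.with_suffix(".pdf")
        _safe_rename(path, target)
        renames.append((path.name, target.name))
    return renames


def lower_case_names(directory: Path) -> list[tuple[str, str]]:
    """Lower-case every file name in directory; return (old, new) name pairs."""
    renames = []
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        new_name = path.name.lower()
        if new_name != path.name:
            _safe_rename(path, path.with_name(new_name))
            renames.append((path.name, new_name))
    return renames
```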
6.1.2 PDF Security
Indexing by the RetrievalWare search engine requires that the PDF articles have no
security settings at all, i.e. no password and no restrictions. PDF articles which are
known to have security settings, namely MGP5 and possibly PS1 (both of which use
passwords), should be processed using Acrobat Exchange to remove the security settings. To
use Acrobat Exchange on the CS6400 as user supjinfo type:
Any further PDF articles with security settings will be discovered during the Main
Header Conversion phase, and should then be corrected.
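Secured PDF files can often be spotted before the Main Header Conversion with a simple heuristic. Both passwords and permission restrictions are implemented through PDF encryption. An encrypted PDF normally carries an /Encrypt entry in its trailer (or cross-reference stream) dictionary, which is stored uncompressed. The Python sketch below only flags candidates for Acrobat Exchange; it does not remove security. An article whose text happens to contain the literal string "/Encrypt" would be a false positive.

```python
from pathlib import Path


def looks_secured(pdf_bytes: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary from the trailer."""
    return b"/Encrypt" in pdf_bytes


def find_secured_pdfs(directory: Path) -> list[str]:
    """Names of *.pdf files in directory that appear to have security settings."""
    return sorted(p.name for p in directory.glob("*.pdf") if looks_secured(p.read_bytes()))
```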
6.1.3 PDF Generation from PostScript
Journals CCS5 and PS18 are supplied as PostScript, generally one file per
page of the journal issue, which must be distilled into PDF articles.
- Move all the PostScript files into a separate directory called V<Volume>I<Issue>ps
- The files should have a ".ps" extension which is added or corrected by one of:
- addextps *
- mkextps -e .<extn> *.<extn>
- Identify the Table of Contents pages. Correlate the articles and pages using the SGML
headers.
- Put the pages into a directory for each article: art1, art2, etc., leaving
any extra pages in the main directory. It is important that these files are added to the
article directory in the correct page number order, because files are distilled in the
order in which they were added to the directory, not alphabetically.
- Copy /superj2/Acrobat3/Distillr/Xtras/RunDirEx.txt to the main PostScript
directory for this issue. A copy of this file is needed for each article; if the SGML
headers are called e.g. s1.sgml, s2.sgml, etc., these files should be called
s1, s2, etc. In each of these files edit the "/Pathname" line to
read, e.g. for art1:
- /Pathname (art1/*.ps) def
- Distill to PDF by
- Check the s<n>.log files, in particular that the pages have been distilled
in the correct order.
- Move the created PDF files to the journal issue directory.
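The per-article copies of RunDirEx.txt described above differ only in their /Pathname line, so they can be generated. This is a minimal Python sketch. It assumes that the template's /Pathname line starts a line (possibly indented), and that the article directories art1, art2, ... are numbered in the same order as the SGML headers s1.sgml, s2.sgml, ....

```python
import re
from pathlib import Path

# The /Pathname line of the template, possibly indented (spaces or tabs only).
PATHNAME_LINE = re.compile(r"^[ \t]*/Pathname\b.*$", re.MULTILINE)


def write_rundir_files(template: str, ps_dir: Path, article_count: int) -> list[Path]:
    """Write s1, s2, ... into ps_dir, each a copy of template that reads artN/*.ps."""
    written = []
    for n in range(1, article_count + 1):
        text, count = PATHNAME_LINE.subn(f"/Pathname (art{n}/*.ps) def", template)
        if count != 1:
            raise ValueError("template must contain exactly one /Pathname line")
        out = ps_dir / f"s{n}"
        out.write_text(text)
        written.append(out)
    return written
```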
6.2 SGML Pre-Processing
6.2.1 Control Characters
If the supplied SGML files contain Control characters, in particular ControlM
characters at the end of lines, remove these characters by
where <extn> is the file extension for the SGML files. Note that for
journals MGP13 and MC4 this program must be run on both the SGML headers and
the full text SGML.
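The removal of Control-M characters can be sketched in Python. This illustrates the effect described above and is not the program used in production. It removes carriage returns that come just before a newline or at the end of the file, and leaves any other control characters alone.

```python
import re
from pathlib import Path

# Carriage returns immediately before a newline, or at the very end of the data.
TRAILING_CR = re.compile(rb"\r+(?=\n)|\r+\Z")


def strip_control_m_bytes(data: bytes) -> bytes:
    return TRAILING_CR.sub(b"", data)


def strip_control_m(path: Path) -> int:
    """Remove Control-M characters from a file in place; return how many were removed."""
    data = path.read_bytes()
    cleaned = strip_control_m_bytes(data)
    if cleaned != data:
        path.write_bytes(cleaned)
    return len(data) - len(cleaned)
```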
6.2.2 Journal Specific Pre-Processing
Journal specific pre-processing may rename the files, both SGML and PDF as well as
modifying the SGML. The pre-processing required for each journal is as follows. Note that
the file extension shown for the SGML files is the most commonly used one, but actual
supplied data may differ.
- CCS1; CCS3; CCS8; PS1; MC11
- No pre-processing is required.
- CCS2; CCS4; CCS6; CCS9; CCS12; PS2; PS6; PS9;
PS11; PS13; PS15
- CCS5; PS18
- Check that SGML headers will sort in page number order. The filename of the last article
may need changing.
- mkextsgml -e .ehd *.ehd
- Manually edit the authors within the SGML headers to: separate the authors; separate
their first and surnames; remove extra punctuation. Authors should be e.g.:
- <aug><au><fnms>Fred</fnms><snm>Bloggs</snm></au>
- <au><fnms>John</fnms><snm>Smith</snm></au></aug>
- CCS7; PS8; PS17
- newccs7pre -c <CoverDate> *.sgm [where <CoverDate> is of the form: mmyyyy]
- CCS10; PS4; PS16
- CCS11; PS3; PS7; PS10; PS14
- PS5; PS12; MC2; MC3; MC5; MC7
- ps5files -c <CoverDate> -n *.sgc [where <CoverDate> is of the form: mmyy].
Without the "-n" option author-affiliation references are generated from author
groupings. The "-n" option is generally required for current data, but not for
some back data.
- MGP1
- rmctrlms *.sgm [to remove possible Control characters at end of files]
- Rename any "a" or "b" files: e.g. rename 335a123 to 3350123a
- mkextsgml -e .sgm *.sgm
- grep "http:" *.sgml
- Check for and correct newlines, etc. in the URLs
- Look for any equations supplied as GIF files by:
- ls <v>n<i>p*/math* (where <v> is volume number and <i>
is issue number)
- Change SGML markup for any GIF equations
- Check that <v>n<i>p directories have write access
- MGP2
- addextsgml *
- grep "http:" *.sgml
- Check for and correct newlines, etc. in the URLs
- grep "<issue>" *.sgml (to check issue numbers)
- MGP3; MGP7
- MGP5
- MGP6
- domgp6html -v <Volume> -i <Issue> *.html
- Check the HTML files, particularly the headers and footers
- Check the PDF filenames in the generated file figsmv are correct and edit the
file if necessary
- figsmv 2> figs.log
- mgp6mgpfiles -h *.sgml (where the "-h" option indicates that HTML articles
are available as well as PDF)
- mkexthd *.sgml (to rename all the SGML headers with a ".hd" extension)
- MGP10; MGP12 (previous data)
- domgp10pre -c <CoverDate> *.gml (where <CoverDate> is of the
form: mmyy)
- MGP10; MGP12 (current data)
- MGP11; MGP15
- mgp6mgpfiles -p *.sgml (The -p option indicates that only PDF articles are
available)
- MGP13; MC4
- The initial files should be:
- SGML header: nnnnpppp.nnn
- PDF: nnnnpppp.pdf
- Full text SGML: ppppllll.new
- Graphics: graphics/ppppllllfn.gif
- where: n is a number; pppp is article first page; llll is article
last page; f may be one of "f", "s", "e",
"u" for figure, scheme, equation, unnumbered equation
- For MC4 only: grep "<LI" *.new | more
- Surround any lists found by <P><LIST>...</LIST></P>
- For MGP13 only: add <figr> references to the figure captions at the
end of the full text SGML file; merge all captions for one table into a single table with <BR>
separators.
- mkexthd -e .<extn> *.<extn> (where <extn> is the header
file extension)
- domc4fnames *.new (to correct figure, etc. identifiers)
- Pre-process header SGML:
- For MC4: domc4pre *.hd
- For MGP13: domgp13pre *.hd
- Assemble full text SGML file:
- For MC4: mc4sgml *.hd
- For MGP13: mgp1sgml *.hd
- grep "http:" *.sgml
- Check for and correct newlines, etc. in the URLs
- MC1
- The initial files are separate directories per article, generally A*, each
containing:
- main.sgm
- main.pdf
- figures
- Check that figure file types correspond to their file extensions
- Remove any garbage from the start of EPS files, i.e. anything preceding %PDF
- mkdir graphics
- Manually add article attributes ppf and ppl in the SGML files using the
publisher supplied .toc file.
- MC9
- Manually edit the authors within the SGML headers to: separate the authors; separate
their first and surnames; remove extra punctuation. Authors should be e.g.:
- <aug><au><fnms>Fred</fnms><snm>Bloggs</snm></au>
- <au><fnms>John</fnms><snm>Smith</snm></au></aug>
- mc9files *.ehd
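The manual author markup described above for CCS5, PS18 and MC9 follows a fixed pattern, so a first draft can be produced mechanically. This Python sketch is a naive, hypothetical helper and not part of the production suite. It assumes a plain author list such as "Fred Bloggs, John Smith and Ann Jones" and takes the last word of each name as the surname. Compound surnames, suffixes such as "Jr." and "Surname, Initials" lists still need manual editing.

```python
import re

# Split on commas (optionally followed by "and") or on a standalone "and".
SEPARATOR = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")


def split_authors(text: str) -> list[tuple[str, str]]:
    """Naively split an author string into (first names, surname) pairs."""
    pairs = []
    for name in SEPARATOR.split(text.strip().rstrip(".")):
        words = name.split()
        if words:
            pairs.append((" ".join(words[:-1]), words[-1]))
    return pairs


def author_group(pairs: list[tuple[str, str]]) -> str:
    """Mark up authors as <aug><au><fnms>...</fnms><snm>...</snm></au>...</aug>."""
    authors = [f"<au><fnms>{first}</fnms><snm>{surname}</snm></au>" for first, surname in pairs]
    return "<aug>" + "\n".join(authors) + "</aug>"
```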
6.2.3 Error Checking
After running the relevant journal pre-processing script, check the named log file. Any
errors should be corrected in the base files and the pre-processing rerun before
progressing to the main data conversion. At this point a check should be made that the
numbers of SGML and PDF files are still the same as supplied. A difference in numbers
caused by file renaming may indicate:
- A mistyped page number in an SGML file.
- A mistyped volume number in an SGML file.
- A wrongly supplied SGML file, i.e. two SGML files contain the data for the same article
and another one is missing.
In the first two cases, the base SGML data should be corrected and the pre-processing
rerun. In the last case, this must be reported to the publisher and data conversion for
this issue put on hold until a correct article is resupplied.
7. Data Conversion
7.1 Full Article SGML Conversion
7.1.1 Journal Specific Conversion
Journals supplied as full text SGML require the following journal specific conversion:
- MGP1
- mgp12sj *.sgml
- Check log file, correct any errors and rerun.
- mkdir graphics; chmod a+r graphics
- figsmv 2> figs.log
- Check figs.log. Sort out any problems.
- MGP2
- mgp22sj *.sgml
- Check log file, correct any errors and rerun. Note: remove any <subfig>s
manually to separate figures; for ",Jr.," in <refau> change to
";Jr.,".
- fsjmv
- figsmv 2> figs.log
- Check figs.log. Sort out any problems.
- MGP3; MGP7
- mgp32sj *.sgml
- Check log file, correct any errors and rerun.
- MGP13
- mgp132sj *.sgml
- Check log file, correct any errors and rerun. Typically some internal references require
manual insertion.
- figsmv 2> figs.log
- Check figs.log. Sort out any problems.
- MC4
- mc42sj *.sgml
- Check log file, correct any errors and rerun. Typically some internal references require
manual insertion.
- figsmv 2> figs.log
- Check figs.log. Sort out any problems.
- MC1
- mc12sj -v <Volume> -i <Issue> -c <CoverDate> A* [where <Volume>
is the volume number; <Issue> is the issue number; <CoverDate>
is of the form mmyy; article directories are A*]
- grep "http:" *.fsj [to check URLs]
- fsjmv
- figsmv 2>> figs.log & [This should ideally be run overnight because it
takes a long time.]
- Check figs.log. Sort out any problems.
7.1.2 General Full Article SGML Conversion
For all the above journals:
- Generate temporary HTML articles
- sj2html *.fsj
- Note that this creates temporary HTML articles, without any reference linking, which must
exist for the Main Header Conversion to run. The HTML articles are regenerated later
in the conversion process.
- Ascertain Medline identifiers for article references
- sjref *.fsj [to extract the references in Medline format into the file medrefs]
- Email file medrefs to Medline at citation_matcher@ncbi.nlm.nih.gov
[This may be done as a different user from supjinfo.]
- Copy reply from Medline to file medids
- Generate SGML headers in files *.hd
- For MC4 only:
- Copy existing SGML files to a sub-directory spheads (Keywords are probably
incorrect in supplied headers.)
- For all the above journals except MGP13:
7.1.3 Figure Problems
Figure problems are generally:
- Unsupplied figures which should be reported to the publisher.
- Figure identifiers mistyped in the SGML file, with resulting reference to the wrong
figure.
- Misnamed figures, including case problems.
- Figures spread across several GIF files, denoted as "a", "b", etc.,
but not marked up as such in the SGML. In this case dummy figure definitions must be added
to the SGML file.
7.1.3.1 ImageMagick
ImageMagick is generally a useful tool for investigating figures. A figure may be
displayed by:
- imdisplay <fig>.gif [where imdisplay is an alias for: /superj1/ImageMagick/display]
7.2 PDF Article Reference Extraction
Journals for which PDF reference extraction programs exist require the following
conversion:
- MGP5
- mgp5refs *.fsj
- sjref *.fsj
- Email file medrefs to Medline as for full article SGML processing.
7.3 Main Header Conversion
7.3.1 Validation
- Run the header conversion validation program:
- For journals where the SGML headers are in files *.sgml:
- For journals where the SGML headers are in files *.hd:
- Check log file, sjconv.log, correct any errors and rerun.
7.3.2 Header Conversion
- Run the main header conversion program:
- For Communication and Cultural Studies and Materials Chemistry journals
where the SGML headers are in files *.sgml
- For Molecular Genetics and Proteins journals where the SGML headers are in files *.sgml
- For Political Science journals where the SGML headers are in files *.sgml
- For journals where the SGML headers are in files *.hd
- Check the BRS verification log file sjbrsvfy.log. Errors about line lengths and
numbers of characters may be ignored; they simply warn that BRS has inserted a newline.
- Check log file sjconv.log. Correct any SGML errors and rerun. PDF errors may also
be indicated:
- PDF file has security settings: remove security with Acrobat Exchange, then rerun PDF to
text by:
- dopdf2txt *.pdf; chmod a+r *.txt
- PDF file is corrupt. It may be possible to mend the file using Acrobat Exchange,
otherwise it should be reported back to the publisher. If only one or two PDF files are
corrupt, a journal issue may be loaded with dummy versions of the PDF files as symbolic
links to: /superj1/Journals/SJ/dummy.pdf. These dummy PDF files can be replaced
later by the resupplied PDF files. Otherwise, the issue data conversion must be put on
hold until new PDF files are supplied.
- Other PDF error messages can generally be ignored, but the article should be checked in
Acrobat to confirm that it displays correctly.
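The dummy-PDF workaround above can be sketched in Python. The sketch keeps the corrupt original beside the link under a ".bad" suffix. That suffix is a hypothetical convention of this sketch, not part of the procedure. The dummy path is the one given above and can be overridden for testing.

```python
import os
from pathlib import Path

DUMMY_PDF = "/superj1/Journals/SJ/dummy.pdf"


def link_dummy_pdfs(issue_dir: Path, corrupt: list[str], dummy: str = DUMMY_PDF) -> None:
    """Replace each named corrupt PDF with a symbolic link to the dummy PDF."""
    for name in corrupt:
        pdf = issue_dir / name
        if pdf.is_symlink():
            pdf.unlink()  # e.g. an earlier dummy link
        elif pdf.exists():
            # Hypothetical convention: keep the corrupt file for reference.
            pdf.rename(pdf.with_name(pdf.name + ".bad"))
        os.symlink(dummy, pdf)
```

When the resupplied PDF arrives, it simply replaces the link.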
7.4 Reference Linking
- Medline links are added to the SuperJournal Full SGML for Molecular Genetics and
Proteins journals supplied as full article SGML and those where references have been
extracted from the PDF files, i.e. MGP1; MGP2; MGP3; MGP5; MGP7;
MGP13:
- Intra-SuperJournal links are added to the SuperJournal Full SGML for journals supplied
as full article SGML and those where references have been extracted from the PDF files,
i.e. MGP1; MGP2; MGP3; MGP5; MGP7; MGP13; MC1;
MC4:
- addsj *.fsj [Note that the Main Header conversion must have been run before this
step so that any links to articles in the same issue are generated.]
- Medline links are added to the PDF files for journals where references have been
extracted from the PDF, i.e. MGP5:
7.5 HTML Generation
HTML articles and MiniContents are generated for journals supplied as Full SGML, i.e. MGP1;
MGP2; MGP3; MGP7; MGP13; MC1; MC4:
- sj2html *.fsj
- sjminc *.fsj
7.6 Medline Article Links
For Molecular Genetics and Proteins journals, Medline identifiers are determined
for the articles themselves:
- sjartref *.sj [to extract the references in Medline format into the file artmedrefs]
- Email file artmedrefs to Medline.
- Copy reply from Medline to file artmedids
- addartmed *.sj [which adds the Medline identifiers to the SuperJournal Header SGML,
.sj, files, and also to the SuperJournal Full SGML, .fsj, files if these exist.]
- In reality the Medline identifiers are often not available for new journal issues, so
this step must be performed at a later time.
- To update just the .sj files use addartmedsj; to update just the .fsj
files use addartmedfsj.
8. Work Tidy File
When the data conversion for a new issue is complete:
- Delete any editor back-up files.
- Remove write permission from all the files as a security precaution:
- Delete temporary back-up files:
- rm ../tmp/* [../tmp/graphics/*]
- Catalogue on the SuperJournal Data Spreadsheet that conversion has been performed,
including information about the PDF files.
9. Application Data Load
- Create the SuperJournal application data load file by editing: /superj6/NewApp/bin/headerFileList.tmp
- This file should contain a line for each journal issue to be loaded of the form:
- <Cluster>,/superj1/Journals/<Publisher>/<JournalIdentifier>/V<Volume>I<Issue>/catalog
- where <Cluster> may be: CCS; PS; MGP; MC
- Data is loaded overnight using the Unix at command, e.g. to load data at 2am the
following morning and mail output to supjinfo:
- at -m 0200 tomorrow
- at>perl loadExportMailProcess
- at>CtrlD
- Email notifications of the new journal issues loaded, both Table of Contents and
"Personal Alert", are sent out to users who have requested them, during the
SuperJournal application data load process.
- Check data load.
- Check the load log which is a dated file in: /superj6/NewApp/bin/loaderLog
- Check the new journal issues within the SuperJournal application.
- Check email notifications (it is sensible for the person responsible for the data
handling to have requested email alerts for all the journals within SuperJournal).
- Catalogue the data load, including the load date, on the SuperJournal Data Spreadsheet.
- If notes are required alongside the new issue's entry on the journal issue page
within the SuperJournal application, e.g. to indicate a combined issue such as
"1-2", these may be entered by:
- In the directory /superj6/NewApp/bin/utility run the program
- writeIssueNotice
- When prompted, type in the issue key in the format <JournalIdentifier>V<Volume>I<Issue>
- When prompted, type in the required notes, but replacing any spaces in the string with
"_". E.g. to add a note: "[Issue 1-3, Part 4]" type in
"[Issue_1-3,_Part_4]"
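Two of the steps above involve exact formats: the lines of headerFileList.tmp, and the issue key and underscore form of notes typed into writeIssueNotice. The following is a minimal Python sketch that builds these strings. The publisher and journal names in the example are hypothetical.

```python
CLUSTERS = {"CCS", "PS", "MGP", "MC"}


def load_list_line(cluster: str, publisher: str, journal_id: str, volume: int, issue: str) -> str:
    """One line of headerFileList.tmp for a journal issue."""
    if cluster not in CLUSTERS:
        raise ValueError(f"unknown cluster: {cluster!r}")
    return f"{cluster},/superj1/Journals/{publisher}/{journal_id}/V{volume}I{issue}/catalog"


def issue_key(journal_id: str, volume: int, issue: str) -> str:
    """Issue key typed at the writeIssueNotice prompt: <JournalIdentifier>V<Volume>I<Issue>."""
    return f"{journal_id}V{volume}I{issue}"


def issue_note(note: str) -> str:
    """Issue note as typed at the writeIssueNotice prompt, with spaces replaced by '_'."""
    return note.replace(" ", "_")


print(load_list_line("MGP", "PublisherX", "JNL1", 12, "3"))
# MGP,/superj1/Journals/PublisherX/JNL1/V12I3/catalog
print(issue_note("[Issue 1-3, Part 4]"))  # [Issue_1-3,_Part_4]
```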
10. Index New Data
The SuperJournal application includes three search engines, requiring the new journal
issue data to be indexed by each of them.
10.1 Isite
The new issue article headers are indexed by Isite during the SuperJournal application
data load. No further action is required, except an occasional separate re-index when the
number of Isite databases becomes too large.
10.2 RetrievalWare
The data is indexed, as both article headers and full articles, by RetrievalWare during
the SuperJournal application data load. No further action is required for the article
header index. Following the data load it is necessary to check that the full article index
was performed correctly by:
- Check the file indexlog.dat in the appropriate RetrievalWare Index Log directory
(see Appendix 15.5).
- Note that this file is overwritten by each data load, so it must be checked in a timely
fashion.
- If the "number of files to index" at the start of the file and the
"number of files indexed" at the end of the file do not tally, determine which
files have not indexed correctly. Failure to index usually indicates a corrupt or secured
file. Index failures were a problem with earlier versions of RetrievalWare, but they
appear very rarely with the current version.
- Some PDF files have been created as images and do not contain any text. This is apparent
from the small number of characters indexed, as reported at the end of the file indexlog.dat.
- Record on the SuperJournal Data Spreadsheet that the full article indexing was
successful, or if there were problem files record them. Note that there is a separate
column for PDF or HTML indexing in the journal clusters where this is relevant.
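The tally check on indexlog.dat can be automated once the log's layout is known. The exact wording and layout of that file are not given here. The Python sketch below therefore assumes that each count appears on the same line as the phrases quoted above; the patterns would need adjusting to the real file.

```python
import re

# Assumption: each count follows its phrase on the same line.
TO_INDEX = re.compile(r"number of files to index[^\d\n]*(\d+)", re.IGNORECASE)
INDEXED = re.compile(r"number of files indexed[^\d\n]*(\d+)", re.IGNORECASE)


def index_tally(log_text: str) -> tuple[int, int]:
    """Return (files to index, files indexed) from the text of indexlog.dat."""
    to_index = TO_INDEX.search(log_text)  # reported at the start of the file
    indexed = INDEXED.findall(log_text)  # reported at the end of the file
    if to_index is None or not indexed:
        raise ValueError("counts not found; check indexlog.dat by hand")
    return int(to_index.group(1)), int(indexed[-1])


def index_ok(log_text: str) -> bool:
    expected, actual = index_tally(log_text)
    return expected == actual
```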
10.3 NetAnswer
The new issue article headers are indexed by BRS, the search engine which underlies
NetAnswer, as a separate operation during the data handling process. This action must be
performed under a user who has BRS load privileges. The data to be indexed is in BRS
format in the *.brs files in the issue directory which are created by the main
header conversion program. Indexing is run within the new issue directory.
- sjbrsldg<n> *.brs [where <n> is the number of the
relevant BRS database (see Appendix 15.6)]
- Check the log file in /superj1/Journals/SJ/brsld.log [checking for the string
"ABN" (ABNORMAL) is sufficient]. Note that this file is overwritten by each BRS
indexing.
- Copy the log file to the issue directory, to provide a record that the index has been
performed.
- Record on the SuperJournal Data Spreadsheet that the BRS index has been performed.
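Because brsld.log is overwritten by each BRS indexing run, checking it and keeping a copy belong together. This is a minimal Python sketch of those two steps. Like the manual check, it only looks for the string "ABN".

```python
import shutil
from pathlib import Path


def abnormal_lines(log_text: str) -> list[str]:
    """Lines of a BRS load log that contain "ABN" (as in ABNORMAL)."""
    return [line for line in log_text.splitlines() if "ABN" in line]


def check_and_keep_brs_log(log_path: Path, issue_dir: Path) -> list[str]:
    """Check the shared BRS log, then copy it into the issue directory as a record."""
    problems = abnormal_lines(log_path.read_text(errors="replace"))
    shutil.copy2(log_path, issue_dir / log_path.name)
    return problems
```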
10.3.1 BRS Data Modification
If it becomes necessary to modify the data indexed by BRS:
- Find the BRS document number by using the simple BRS search interface brsmate.
- At the start of each file to be modified add:
- ..Document-Number:
- <5spaces><BRSDocumentNumber>
- Note that it is important that the <BRSDocumentNumber> is preceded on its
line by exactly five spaces.
- Verify the edited files by:
- sjbrsvfjg0 *.brs and check the log file for the verification.
- Modify the documents from the edited files:
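The two header lines described above are easy to get wrong by hand, because the document number must be preceded by exactly five spaces. This is a minimal Python sketch that adds them to an edited .brs file. The document number itself still has to be found with brsmate.

```python
from pathlib import Path


def with_document_number(brs_text: str, document_number: int) -> str:
    """Prefix a BRS record with ..Document-Number: and the number indented by five spaces."""
    return "..Document-Number:\n" + " " * 5 + f"{document_number}\n" + brs_text


def add_document_number(path: Path, document_number: int) -> None:
    text = path.read_text()
    if text.startswith("..Document-Number:"):
        raise ValueError(f"{path} already has a document number")
    path.write_text(with_document_number(text, document_number))
```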
11. Error Logging
All the SuperJournal data conversions write error and diagnostic information to log
files. These log files should be checked after each stage of the data conversion process.
Any indicated errors should be corrected and the program re-run before
continuing to the next stage of the process. The majority of these errors are caused by
incorrect base data, but some may be program bugs which require correction. Error checking
has been indicated in the above sections only where errors are likely to occur, but every
log file should be checked. Each data conversion program reports the name of its log file
to the data handler when it is run.
The log files are left in the journal issue directory after the data handling process
is complete. The existence of a particular log file indicates that this part of the
process was performed.
12. Tables of Contents
Tables of Contents are created or updated within the SuperJournal/Journal/Issue
hierarchy during the data handling process. These files are created or updated
automatically during the Main Header Conversion phase. No manual intervention should be
required, except where a journal cluster is moved onto a new disk.
12.1 SuperJournal Contents
The top-level Table of Contents file is: /superj1/Journals/SuperJournal.toc.
This file lists the journals available within SuperJournal by cluster, and indicates the
latest issue. A new "dummy" entry must be made manually in this file if a new
journal is added to SuperJournal (see below).
12.2 Journal Contents
The journal contents file is: /superj<n>/Journals/<Publisher>/<JournalIdentifier>/<JournalIdentifier>.toc
where <n> is the number of the disk containing the data for the particular
journal. It lists the issues available for this journal, latest issue first. There is a
symbolic link to this file from /superj1/Journals/<Publisher>/<JournalIdentifier>/<JournalIdentifier>.toc.
If the disk where data for a journal is held is changed (e.g. Molecular Genetics and
Proteins data is held on /superj5 from January 1998, but on /superj2
previously):
- Move the journal contents file to the journal directory on the new disk.
- Change the symbolic link in the journal directory on /superj1 to point to the new
journal contents file.
12.3 Issue Contents
The issue contents file is: /superj<n>/Journals/<Publisher>/<Jid>/V<Volume>I<Issue>/<Jid>V<Volume>I<Issue>.toc
where <Jid> is the journal identifier. It lists the articles within the issue
and is created automatically.
13. New Journals
If a new journal is added to SuperJournal:
- A new "dummy" entry must be made within the relevant cluster in the
SuperJournal contents file: /superj1/Journals/SuperJournal.toc:
<sjjnl>
<jnlinfo>
<jid>JournalIdentifier</jid>
<pubdir>PublisherDirectory</pubdir>
<jtl>JournalTitle</jtl>
</jnlinfo>
</sjjnl>
- Other information about this journal within this contents file will be added
automatically when the first issue becomes available.
- The journal name, publisher, etc. must be added to the lists of journal information in
the relevant code header file of the SuperJournal SGML Generation program (see [SJMC140]).
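The dummy entry above has a fixed shape, so it can be generated from the three values it contains. This is a minimal Python sketch. Placing the result within the correct cluster of SuperJournal.toc is still a manual edit. Values containing characters that would need SGML entity references are rejected rather than guessed at.

```python
def dummy_toc_entry(journal_id: str, publisher_dir: str, title: str) -> str:
    """Build the <sjjnl> dummy entry for a new journal in SuperJournal.toc."""
    for value in (journal_id, publisher_dir, title):
        if "<" in value or "&" in value:
            raise ValueError(f"needs an SGML entity reference: {value!r}")
    return (
        "<sjjnl>\n"
        "<jnlinfo>\n"
        f"<jid>{journal_id}</jid>\n"
        f"<pubdir>{publisher_dir}</pubdir>\n"
        f"<jtl>{title}</jtl>\n"
        "</jnlinfo>\n"
        "</sjjnl>\n"
    )
```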
14. Background Tasks
14.1 Medline Article Links
At the time when a new Molecular Genetics and Proteins journal issue is
processed and loaded into SuperJournal, the Medline identifiers for its articles are often
not available. These Medline identifiers must be ascertained and added to the
SuperJournal SGML article headers at a later time. The best strategy is to process
a batch of these unrecorded Medline identifiers at regular intervals. A note is made on
the SuperJournal Data Spreadsheet for journal issues whose article Medline identifiers are
not yet known. For some journal issues, several attempts are necessary before Medline
becomes aware of them.
14.2 Back References
During the generation of intra-SuperJournal links a list of unresolved "back
references" is automatically created. This is in a directory:
- /superj1/Journals/BackRefs
For each journal where references could not be found there is a file whose name is the
SuperJournal Journal Identifier. There is a two line entry in this file for each
unresolved reference to the particular journal:
- Full path name of journal issue where reference is made: /superj1/Journals/<Publisher>/<JournalIdentifier>/V<v>I<i>
- Referenced article: <JournalIdentifier> <Year> <Volume>
<Page>
At regular intervals, this list of "back references" should be inspected and
the references resolved where possible, with resolved references being removed from the
list. This list of "back references" was introduced to cover a potential problem
of late-loading of some journals. In reality, most unresolved references are caused by
invalid references, usually an incorrect year, volume or page.
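The two-line layout of the back-references files lends itself to simple parsing. This can help, for example, in grouping unresolved references by year or volume when looking for the incorrect citations mentioned above. This is a minimal Python sketch. It assumes that the fields on the second line are separated by whitespace and that blank lines carry no meaning. The example path in the test uses hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BackRef:
    citing_issue: str  # full path of the issue directory making the reference
    journal_id: str
    year: str
    volume: str
    page: str


def parse_back_refs(text: str) -> list[BackRef]:
    """Parse a BackRefs file made of two-line entries (issue path, then reference)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected two lines per unresolved reference")
    refs = []
    for path_line, ref_line in zip(lines[0::2], lines[1::2]):
        fields = ref_line.split()
        if len(fields) != 4:
            raise ValueError(f"unexpected reference line: {ref_line!r}")
        refs.append(BackRef(path_line, *fields))
    return refs
```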
This web site is maintained by epub@manchester.ac.uk
Last modified: July 07, 1999