SuperJournal Usage Statistics Generation: User Guide


Ann Apps, Manchester Computing, University of Manchester

SuperJournal Technical Report SJMC262

Contents:
1.  Overall Approach
2.  Directory per Month
3.  Pre-Process Log Files from the Main SuperJournal Application
4.  Copy Information from Last Month
5.  Copy Other Log Files for this Month
6.  Pre-Process NetAnswer Log Files
7.  Pre-Process RetrievalWare Log Files
8.  Generate Usage Statistics
9.  Check for Errors
10.  Make Statistics Available within Project
11.  SuperJournal Web Site "Members Only" Pages

1.  Overall Approach

SuperJournal usage statistics are taken from logging within the SuperJournal application. These statistics are fed into the SuperJournal Evaluation Research undertaken by the HUSAT Research Institute at the University of Loughborough. They are also displayed in the "members only" section of the SuperJournal web site. The statistics are processed on a monthly basis, covering February 1997, when the SuperJournal application was launched, through November 1998.

This document describes, in "User Guide" format, the monthly processing of the log files produced by the various parts of the SuperJournal application. The processing generates the web pages which display usage statistics in tabular form, and the SPSS data file of SuperJournal usage from which further statistics reports may be produced.

The processing of the various SuperJournal log files to generate the usage statistics in both HTML table and SPSS format is described in detail in [SJMC260]. The specifications of the log files, the generated web pages and the SPSS file are given in [SJMC261].

2.  Directory per Month

Usage statistics are maintained in /superj4/Logs, with one directory per month. Each month a new directory should be created, named Mmmyy (e.g. Dec97).

Within this directory create two directories, Logs and Htbs.
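
As an illustration only, the directory layout described above could be created as follows. This is a minimal Python sketch; the function name make_month_dirs is hypothetical and is not part of the SuperJournal tools.

```python
from pathlib import Path


def make_month_dirs(base: Path, month: str) -> Path:
    """Create <base>/<month> together with its Logs and Htbs subdirectories.

    `month` is expected in the Mmmyy form used by this guide, e.g. "Dec97".
    """
    month_dir = base / month
    for sub in ("Logs", "Htbs"):
        # parents=True also creates the month directory itself if needed.
        (month_dir / sub).mkdir(parents=True, exist_ok=True)
    return month_dir


# Example use (path taken from this guide):
#   make_month_dirs(Path("/superj4/Logs"), "Dec97")
```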

3.  Pre-Process Log Files from the Main SuperJournal Application

The main SuperJournal application, which is controlled by the ODB-II database, logs user interactions to dated log files in /superj6/NewApp/logs. (A previous version of the application logged to /superj2/GODB/logs.)

3.1 Copy This Month's SuperJournal Application Log Files

Copy the month's main SuperJournal application log files into the Logs directory, selecting them by the date pattern in the file name (e.g. *98.8.* for August 1998 log files).
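
The copy step can be sketched in Python as below. The function copy_month_logs is a hypothetical helper written for this illustration; only the directory names and the file name pattern come from this guide.

```python
import shutil
from pathlib import Path


def copy_month_logs(src_dir: Path, logs_dir: Path, pattern: str) -> list[str]:
    """Copy every file in src_dir whose name matches pattern into logs_dir.

    Returns the names of the copied files, in sorted order.
    """
    copied = []
    for src in sorted(src_dir.glob(pattern)):
        if src.is_file():
            # copy2 preserves the modification time of the original log file.
            shutil.copy2(src, logs_dir / src.name)
            copied.append(src.name)
    return copied


# Example use for August 1998:
#   copy_month_logs(Path("/superj6/NewApp/logs"),
#                   Path("/superj4/Logs/Aug98/Logs"), "*98.8.*")
```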

3.2 Remove Invalid Entries

All invalid entries within the log files, i.e. those which contain the keyword "Invalid", are removed from the log files and compiled into a list within a file invallog, by running the script rminval within the Logs directory.
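
The effect of this step can be approximated by the following Python sketch. It is not the rminval script itself, whose contents are not reproduced here; the function name remove_invalid and the default file pattern "logfile*" are assumptions made for illustration.

```python
from pathlib import Path


def remove_invalid(logs_dir: Path, pattern: str = "logfile*",
                   keyword: str = "Invalid") -> int:
    """Move every line containing keyword from the log files into invallog.

    Returns the number of lines removed. Files are read as latin-1 so that
    any byte value in a log line survives the round trip (an assumption about
    the log encoding, which this guide does not state).
    """
    removed = 0
    with open(logs_dir / "invallog", "a", encoding="latin-1") as invallog:
        for log in sorted(logs_dir.glob(pattern)):
            lines = log.read_text(encoding="latin-1").splitlines(keepends=True)
            bad = [ln for ln in lines if keyword in ln]
            if not bad:
                continue  # leave files without invalid entries untouched
            invallog.writelines(bad)
            good = [ln for ln in lines if keyword not in ln]
            log.write_text("".join(good), encoding="latin-1")
            removed += len(bad)
    return removed
```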

3.3 Manual Correction

4.  Copy Information from Last Month

4.1 Copy User Register

The User Register, as updated by generation of the previous month's statistics, is required as input to this month's log file processing. Copy it from Lmmyy, the directory containing the previous month's statistics.

4.2 Copy Cumulative Journal Usage Tables

The previous month's journal usage tables are preserved in the Htbs directory, for appending to the current month's tables when they are created. These files have a .htb file extension.

5.  Copy Other Log Files for this Month

The log files generated by the other parts of the SuperJournal application, such as the search engines, are copied to the Logs directory.

5.1 Isite

The Isite search engine writes to dated log files in /superj1/Isite/logs/NEWAPPDB. (A previous version of the application logged to /superj1/Isite/logs/GODB.) Note that the Isite log files are distinguishable from the main application log files by an extra `.' in the name. For example, for 1st January 1998 the main application log file will be "logfile98.1.1" whereas the Isite log file will be "logfile.98.1.1".

Copy the month's Isite log files into the Logs directory (e.g. *.98.8.* for August 1998 log files).
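
Because the two kinds of log file differ only by one `.' in the name, a small check can help avoid mixing them up. The Python sketch below is based solely on the two example names given above; the function classify_log is hypothetical, and the assumption that month and day are written without leading zeros is taken from the "98.1.1" example.

```python
import re

# logfile98.1.1  -> main application log for 1st January 1998
# logfile.98.1.1 -> Isite log for the same day (note the extra dot)
MAIN_LOG = re.compile(r"^logfile(\d{2})\.(\d{1,2})\.(\d{1,2})$")
ISITE_LOG = re.compile(r"^logfile\.(\d{2})\.(\d{1,2})\.(\d{1,2})$")


def classify_log(name: str):
    """Return (kind, yy, month, day) for a dated log file name, else None."""
    for kind, rx in (("main", MAIN_LOG), ("isite", ISITE_LOG)):
        m = rx.match(name)
        if m:
            yy, month, day = (int(g) for g in m.groups())
            return kind, yy, month, day
    return None
```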

5.2 NetAnswer

NetAnswer logs to a file called log.cfg in a designated directory for each BRS database, i.e. for each cluster. These directories are:

Cluster   Database   Log Directory
CCS       SJG0       /superj2/BRSdbs/Logs/sjg0
MGP       SJG1       /superj2/BRSdbs/Logs/sjg1
PS        SJG2       /superj4/BRSdbs/Logs/sjg2
MC        SJG3       /superj3/BRSdbs/Logs/sjg3

Move this month's part of log.cfg to `logmmyy' for each BRS database, e.g. log0898 for August 1998. It is advisable to do this as early in the new month as possible, preferably before any logging for the new month has been written. Then copy the month's log files into the Logs directory.

5.3 RetrievalWare

Processing RetrievalWare files is described in Section 7.

5.4 Multimedia and MiniContents

Logging of multimedia accesses is to the file /superj2/MMlogs/mmlog. Logging of many "low-level browsing" actions, such as MiniContents browsing and following reference links within SuperJournal, is to the file /superj2/MMlogs/minclog.

Move this month's part of mmlog to `mmlogmmyy', and minclog to `minclogmmyy', e.g. minclog0898 for August 1998. It is advisable to do this as early in the new month as possible, preferably before any logging for the new month has been written. Then copy the month's log files into the Logs directory.
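
The monthly file names used for the NetAnswer, multimedia and MiniContents logs all follow the same pattern: a prefix followed by the two-digit month and two-digit year. A minimal Python sketch of that naming rule is given below; the function name monthly_name is hypothetical.

```python
def monthly_name(prefix: str, year: int, month: int) -> str:
    """Build a monthly log file name of the form <prefix>mmyy.

    For example, prefix "log" with August 1998 gives "log0898".
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    # Two-digit month first, then the last two digits of the year.
    return f"{prefix}{month:02d}{year % 100:02d}"
```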

At the same time as "cutting off" these files for the month, check the application error log files for "Abstract Display" and "References Display" (sjabs.log and sjref.log), deal with any problems they report, and then delete them.

6.  Pre-Process NetAnswer Log Files

The NetAnswer log files require some manual editing before they are in a suitable format for input to the log file processing program.

6.1 Queries

s1 ANY
s2 TITLE
s3 KEYWORD
s4 ABSTRACT
s5 AUTHOR
s6 ADDRESS
s7 JOURNAL
s8 PUBLISHER

6.2 Download Tagged Bibliographic

Check for where a user has performed a "Download Search Results in a Tagged Bibliographic Format" and correct the BRS function number from `2' to `8'. These actions may be identified in the log file as a duplicate search where:

See [SJMC261] for a definition of the NetAnswer log file entries.

6.3 Abstracts: Determine SuperJournal Identifier

For log entries which indicate "View Header/Abstract", i.e. BRS function number is `1', the logged BRS document number must be manually edited to become the SuperJournal Article Identifier (SJAID) which is defined as <jnl>V<vol>I<ino>A<ano>. To determine the SJAID from the BRS Document Number:
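
The SJAID pattern <jnl>V<vol>I<ino>A<ano> can be illustrated with the short Python sketch below. The function make_sjaid is hypothetical, and reading <ino> as the issue number and <ano> as the article number is an assumption; the journal code "abc" in the usage comment is invented for the example.

```python
def make_sjaid(jnl: str, vol, ino, ano) -> str:
    """Assemble an identifier of the form <jnl>V<vol>I<ino>A<ano>.

    jnl is the journal code; vol, ino and ano are assumed to be the volume,
    issue and article numbers respectively.
    """
    parts = [str(p) for p in (jnl, vol, ino, ano)]
    if any(p == "" for p in parts):
        raise ValueError("all four components are required")
    jnl_s, vol_s, ino_s, ano_s = parts
    return f"{jnl_s}V{vol_s}I{ino_s}A{ano_s}"


# Example with an invented journal code:
#   make_sjaid("abc", 12, 3, 4) gives "abcV12I3A4"
```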

6.4 Full Article File Paths

The majority of the full article retrieval log entries in the NetAnswer log files do not need any correction. The only ones which may require manual editing are:

7.  Pre-Process RetrievalWare Log Files

RetrievalWare logs to dated log files, one per day of use, in the directory /superj2/Excalibur6.5GO/rware/demos/logs. The RetrievalWare log files are converted into a format consistent with the main SuperJournal log files before processing by the main usage statistics generation program. Additionally, RetrievalWare document numbers are replaced by SuperJournal Article Identifiers for logged accesses to abstracts following a search. They are replaced by the file path for full article accesses. The RetrievalWare log file format, and the "SuperJournal RetrievalWare" format are specified in [SJMC261].

7.1 RetrievalWare Log Files Conversion

Copy this month's RetrievalWare log files to an Rware directory within the directory for the month (/superj4/Logs/Mmmyy), and generate a single "SuperJournal RetrievalWare" format log file, rwlog. Note that the date in a log file's name is always one day after the date of the information it contains.

This last action creates the log file rwlog and also a list of accessed Document Numbers by cluster and file type (i.e. PDF, HTML, ABStract).
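
Because a RetrievalWare log file is dated one day later than its contents, the files needed for a calendar month run from the 2nd of that month to the 1st of the next. The Python sketch below works this out; the function names are hypothetical and only the one-day offset comes from this guide.

```python
from datetime import date, timedelta


def content_date(file_date: date) -> date:
    """Date of the information held in a log file named for file_date."""
    return file_date - timedelta(days=1)


def file_dates_for_month(year: int, month: int) -> list[date]:
    """File-name dates whose contents fall within the given calendar month."""
    first = date(year, month, 1)
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    days_in_month = (next_first - first).days
    # Shift every day of the month forward by one to get the file-name date.
    return [first + timedelta(days=d + 1) for d in range(days_in_month)]
```

For August 1998 this gives the files dated 2nd August 1998 to 1st September 1998 inclusive.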

7.2 Determine SuperJournal Identifier

In order to determine the SuperJournal Article Identifier which corresponds to a particular RetrievalWare Document Number, it is necessary first to generate up-to-date catalogues from the RetrievalWare databases (libraries). The RetrievalWare libraries are in /superj2/Excalibur6.5GO/rware/demos/indexes. For each cluster there is both an abstract and a full article library:

Cluster                              Abstract Library    Article Library
Communication and Cultural Studies   ccs_abstracts_lib   ccs_pdf_lib
Molecular Genetics and Proteins      mgp_abstracts_lib   mgp_pdf_html_lib
Political Science                    ps_abstracts_lib    ps_pdf_lib
Materials Chemistry                  mc_abstracts_lib    mc_pdf_html_lib

7.2.1 RetrievalWare Document Catalogues

Create up-to-date RetrievalWare Document Catalogues in /superj4/Logs/RWdocids. Suggested filenames for each catalogue are:

Cluster   Abstracts   Articles
CCS       ccsa        ccsp
MGP       mgpa        mgpp
PS        psa         psp
MC        mca         mcp

These files are created by appending several sub-catalogue files. Each sub-catalogue file is created by calling rwhls for the appropriate library and then performing a search to extract document catalogues from the library. The strategy is to perform sufficient searches to extract all possible documents. The resulting document catalogue is not free of duplicates (it may contain the same document reference several times), but it is adequate for the purpose of correlating RetrievalWare document numbers and SuperJournal article identifiers. The utility rwhls was written "in house" using the RetrievalWare API; the name is shorthand for the command: `/superj2/Excalibur6.5GO/rware/src/query/DOC_IDS_IN_RWLIBS/hlsquery /superj2/Excalibur6.5GO/rware/demos/config/rware.cfg'

7.2.1.1 Abstract Document Catalogue

To create an abstract document catalogue for the CCS cluster:

Abstract document catalogues are created in a similar way for the other clusters, the only exception being in Materials Chemistry where the fielded query for "Colloid and Polymer Science" should be performed with Colloid in the journal field.

7.2.1.2 Article Document Catalogue

To create a full article document catalogue for the CCS cluster:

Article document catalogues are created in a similar way for the other clusters. Suggested search words to retrieve documents from all journals are:

CCS             MGP          PS              MC
society         molecular    governance      materials
communication   protein      public          polymer
gazette         embo         political       science
material        human        studies         metals
critical        chromosoma   parliamentary   crystals
cultural        mammalian    review
screen          therapy      politics
sound           oncogene     journal
social          research     conflict

7.2.2 RetrievalWare Document Number Lookup

Look up, in the appropriate document catalogue, the RetrievalWare document numbers shown in the rwlog file within abstract and full article access entries. The file rwdocnos may assist: it lists all the referenced document numbers and the RetrievalWare library to which each belongs. Document numbers are found in the catalogue by searching for "#nnn " where "nnn" is the document number. Either the SuperJournal article identifier, for an abstract access, or the full article path should be copied into rwlog to replace the document number. This is a manual editing process.
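
The "#nnn " search can be sketched in Python as below. The trailing space in the search string matters: it stops document number 12 from also matching 123. The function find_document is hypothetical, and the layout of a catalogue line beyond the "#nnn " marker is not specified in this guide, so the sketch simply returns whole matching lines, with duplicates removed because a catalogue may list the same document several times.

```python
def find_document(catalogue_lines, doc_no: int) -> list[str]:
    """Return the distinct catalogue lines that mention document doc_no."""
    needle = f"#{doc_no} "  # trailing space: "#12 " must not match "#123 "
    matches = [ln.rstrip("\n") for ln in catalogue_lines if needle in ln]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(matches))


# Example use with a catalogue file (file name as suggested in 7.2.1):
#   with open("/superj4/Logs/RWdocids/ccsa") as f:
#       print(find_document(f, 12))
```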

7.3 RetrievalWare Log File Inclusion

Copy the corrected SuperJournal format RetrievalWare log file to this month's Logs directory.

8.  Generate Usage Statistics

The SuperJournal usage statistics for a month Mmmyy are generated using the main log file processing program dologs (see [SJMC260]). Any syntax errors in the log files will stop the log file processing, so they must be corrected and dologs re-run. When processing is successfully completed, the HTML pages generated are made "world readable" to allow for their inclusion on the SuperJournal web site.

8.1 Machine Location Codes

The log file processing program creates a file for each library, "lipsyymm.ips" (where l is the library code letter), which lists user machine and IP addresses unrecognised by the program. Where the machine's location can be deduced, these IP addresses and their corresponding "place" codes are made known to the program by adding them to the source header file libmachine.hxx and recompiling the program (see [SJMC260] for the program documentation). This is a manual editing process, which also involves "educated" guesswork. Re-running dologs after this recompilation will include the deduced "place" codes in the generated usage statistics.

8.2 Registrations Pre-February 1997

Amongst the usage statistics web pages generated by the log file processing program are lists, per library, of new registrations. These pages are available to project staff and possibly librarians only. In every case except February 1997 these pages list new registrations for the month which the statistics cover. The registrations listed for February 1997, the first month for which statistics are available, include details of all registrations prior to that date. Details of users who registered before February are included in the User Register used as input when processing the February 1997 statistics. If the February 1997 statistics were to be re-processed for any reason, a line of code within the log file processing program would have to be changed, and the program recompiled, in order to include these users in the registration pages. After the February 1997 statistics have been generated the code should be restored to its original form and the program recompiled again.

Within the file LogLib.C, within the procedure out_registrs:

9.  Check for Errors

Check the file `sjlg*.err' for SPSS errors. If this file shows any errors, the problem must be identified and corrected, followed by regeneration of the usage statistics.

10.  Make Statistics Available within Project

The generated HTML files and SPSS data file are made available to project staff. The HTML pages are included in the secure "private" statistics area of the SuperJournal Web site. All generated statistics are made available to project staff on the SuperJournal FTP site. Details of user registrations will be passed on to the librarians as necessary by the Project Office.

10.1 Update Indexes on Statistics Web Site

The indexes to the pages on the secure statistics web site require updating to include links to the newly created HTML pages. The statistics web site including the indexes is in /superj4/sjlogs. There are copies of the indexes in /superj4/sjlogs/indxs. It is recommended that these copies are edited and then copied to the web site. The indexes list the monthly usage statistics in reverse chronological order. The index pages which require updating are:

10.2 Link to New Web Pages

In order to make the generated HTML pages available on the statistics web site, symbolic links are made from the web site directory /superj4/sjlogs to the new HTML files. Where new pages supersede previous ones the previous ones should be removed from the statistics web site. These are the cumulative files for journal use and user lookup.

Scripts for creating these symbolic links are in /superj4/sjlogs/indxs. There are also scripts for creating similar symbolic links to make the new usage statistics pages available within the "members only" section of the SuperJournal web site. It is sensible to create these scripts at this point, even though the "members only" web site will be updated later (see Section 11).

The scripts contain lists of symbolic link creation commands:

For example:

These scripts are:

To create the symbolic links, these scripts are copied to the statistics web site directory and run after the previous month's cumulative files have been removed:
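
The effect of such a link script can be sketched in Python as below. This is not one of the project's scripts; the function link_pages and the "*.html" file pattern are assumptions made for illustration. The sketch replaces an existing symbolic link of the same name, but deliberately refuses to overwrite a regular file.

```python
from pathlib import Path


def link_pages(html_dir: Path, site_dir: Path,
               pattern: str = "*.html") -> list[str]:
    """Create a symbolic link in site_dir for each matching file in html_dir.

    Returns the names of the linked pages, in sorted order.
    """
    linked = []
    for page in sorted(html_dir.glob(pattern)):
        link = site_dir / page.name
        if link.is_symlink():
            link.unlink()  # replace a stale link of the same name
        # symlink_to raises FileExistsError if a regular file is in the way.
        link.symlink_to(page.resolve())
        linked.append(page.name)
    return linked


# Example use (the location of the generated pages is an assumption):
#   link_pages(Path("/superj4/Logs/Aug98"), Path("/superj4/sjlogs"))
```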

10.3 SuperJournal FTP Site

The new usage statistics HTML pages and SPSS portable file are made available to project staff within the SuperJournal FTP site (/superj1/Publishers) within the directory SuperJournal. The HTML pages will be in a new directory for the month (Mmmyy), the SPSS portable file will be in the existing directory SPSS. These files and directories must be made world readable to allow downloading via FTP.
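
Making a directory tree world readable means adding read permission for everyone on each file, plus search (execute) permission on each directory so that it can be entered. A minimal Python sketch is given below; the function make_world_readable is hypothetical, and the same result is normally achieved with the standard chmod command.

```python
import os
import stat
from pathlib import Path


def make_world_readable(root: Path) -> None:
    """Add read permission for all users to root and everything beneath it.

    Directories also receive execute (search) permission for all users.
    Existing permission bits are kept; none are removed.
    """
    read_all = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    exec_all = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    for path in [root, *root.rglob("*")]:
        mode = stat.S_IMODE(path.stat().st_mode)
        extra = read_all | (exec_all if path.is_dir() else 0)
        os.chmod(path, mode | extra)
```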

These files should be removed from the FTP site when all project staff have downloaded the files they require.

10.4 Notification

Project staff should be notified when new usage statistics are available. Email notification should be sent to:

11.  SuperJournal Web Site "Members Only" Pages

SuperJournal usage statistics, excluding the "private" pages which identify users or journals, are made available to project members by their inclusion in the "members only" section of the SuperJournal web site. They are accessible via Members Only Project Results, beneath a list entitled Usage Statistics List.

The indexes to the usage statistics pages are provided by the Project Office. When these indexes are available, the links at the head of the pages to other web site pages are first amended, using the utility dowebint, so that they operate successfully from the "internal" section; the indexes are then copied to the "internal" part of the SuperJournal web site. Symbolic links from the "internal" section of the web site to the new HTML files are generated using the scripts already created as described in Section 10.2. To create the symbolic links, these scripts must be copied to the "internal" web site directory and run after the previous month's cumulative files have been removed. The "internal" web site directory is /superj4/sjinternal with a symbolic link from ~supjinfo/WWW/sjinternal.


This web site is maintained by epub@manchester.ac.uk
Last modified: July 06, 1999