SuperJournal Log File Specification



Ann Apps, Manchester Computing, University of Manchester

SuperJournal Technical Report SJMC261

Contents:
1.  Overall Approach
2.  Definitions
3.  Log Files Specification
4.  Usage Statistics Reports
5.  SPSS Specification

1.  Overall Approach

SuperJournal usage statistics are taken from logging within the SuperJournal application. These statistics are fed into the SuperJournal Evaluation Research undertaken by the HUSAT Research Institute at the University of Loughborough. They are also displayed in the "members only" section of the SuperJournal Web site. The statistics are processed monthly, from February 1997, when the SuperJournal application was launched, through November 1998.

This document specifies the log files produced by the various parts of the SuperJournal application (in Section 3), the Web pages generated each month which display usage statistics in tabular form (in Section 4), and the SPSS data file of SuperJournal usage from which further statistics reports may be produced (in Section 5). This specification was originally defined from requirements contained in "The SuperJournal Evaluation Plan" prepared by HUSAT (August 1996), but has since been refined, both to incorporate new features within SuperJournal, and to include usage report requirements from other SuperJournal stakeholders.

The processing of the various SuperJournal log files to generate the usage statistics in both HTML table and SPSS format is described in [SJMC260] and in a "User Guide" form in [SJMC262].

2.  Definitions

2.1 Browsing and Searching

Users find articles within SuperJournal either by searching with one of the Search Engines or by browsing through the hierarchical journal structure. All logged events have a search type set in the generated statistics, "browse" being regarded as a search type. Events which are not concerned with article discovery have a search type set to "Not Applicable". These are: Register; Login; View/Change Preferences; Access/Send Feedback; Email Alert. Note that access to "Help" is treated as a special case, with access to general "help" having a search type of "browse" and access to the "help" page for a search engine having that search type.

2.1.1 Searching

Article discovery by searching employs one of the SuperJournal search engines: Isite; NetAnswer; RetrievalWare; Author Index (via Isite); Keyword Index (via Isite); Personal Alert (via Isite). The search type is set appropriately in the generated statistics. It is possible for a user to access articles within SuperJournal without performing any "browse" action, by selecting "Search" immediately after Login.

2.1.2 Browsing

Simplistically, "Browsing" is article discovery which does not utilise one of the SuperJournal search engines. A user may find an article of interest by clicking on hypertext links through the journal hierarchy of: cluster list; journal list; issue list; issue table of contents; abstract or full article. At all levels there are options to move up, down, or sideways to next or previous, or the user may employ the Web browser's "Back" button. Within the SuperJournal application there are other hypertext links whose activation is recorded as browsing, for example the link to the journal issue Table of Contents on the Isite search results page. Browses are also recorded when a user returns from a non-search option such as "Preferences" or "Feedback", if that option was accessed from a "browse" screen. In all these cases, events in the generated statistics have a search type set to "Browse".

There are also methods of discovering articles, and parts of articles, by following hyperlinks from the abstract or article display screens. These methods, together with some orthogonal methods of article discovery, could be regarded as "low level browsing" and are described below.

2.1.2.1 Low Level Browsing

Low level browsing methods, and their search types in the generated statistics, are given with the corresponding log file specifications in Sections 3.7 to 3.10. Note that a few of these methods have a search type of "browse", but the majority do not. Within the reports produced by HUSAT during the evaluation research, most of this low level browsing will probably be recorded as use of "special features".

2.2 View Abstract

Within the SuperJournal application, whether browsing or searching, the user is generally given the option of either "View Abstract" or "View Article". The second of these options provides the full text of the article in either PDF or HTML, depending on the particular journal's data supply format. "View Abstract" provides a page of article header information, including the title, authors, and abstract (if available). Strictly, these links should have been named "View Article Header", but that label would probably have been less easily understood by end-users. All articles have this header information, and every article within SuperJournal has an abstract (this was a requirement of the implementation of the SuperJournal application); where a publisher did not supply an abstract, the abstract reads "Abstract unavailable". This lack of "real" abstracts for some articles may have affected end-user behaviour, encouraging users always to go straight to the full article. Throughout this document, wherever "View Abstract" is logged, the action logged is actually "View Article Header".

3.  Log Files Specification

Below are the specifications of the log file entries for all recorded events. In each case the specification is followed by an example. Keywords within the log file are shown in Italics on the specification line. In general, user events within SuperJournal are recorded in a consistent format of the form:

Date Time Machine name IP address Email Id Interaction type Further Information

Separate parts of the SuperJournal application log to separate files, and the information in these files is merged during the log file processing which generates the various reports. In some cases, as indicated, some of this information is unavailable at the time of logging. The search engines NetAnswer and RetrievalWare produce log files in their own formats. Missing information is deduced during the processing of the log files. In particular, some log file entries do not contain the user's email identifier, because this information is not available to the application performing the logging. These include the NetAnswer log file entries and the "minicontents" (see Section 3.8, etc.) and "multimedia" (see Section 3.7) log file entries. In these cases the main log file processing program deduces which user performed the interaction from the logged time and the logged IP address, as described in [SJMC260].
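The actual deduction method is the one described in [SJMC260]. As an illustration only, the following Python sketch assumes that an entry without an email identifier belongs to the user whose most recent identified entry came from the same IP address no more than a fixed interval earlier. The 30-minute window and the function name `attribute_user` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical look-back window for matching an anonymous entry to a user.
MATCH_WINDOW = timedelta(minutes=30)

def attribute_user(entry_time, entry_ip, known_entries, window=MATCH_WINDOW):
    """Return the email id of the latest identified entry from entry_ip
    logged at or before entry_time and within window, or None.

    known_entries: iterable of (timestamp, ip, email) tuples taken from
    log files that do record the Email Id.
    """
    best = None
    for ts, ip, email in known_entries:
        if ip != entry_ip or ts > entry_time:
            continue  # different machine, or logged after the entry
        if entry_time - ts > window:
            continue  # too long ago to be the same session
        if best is None or ts > best[0]:
            best = (ts, email)
    return best[1] if best else None
```

A real implementation would also need to cope with several users sharing one IP address (e.g. a proxy), which this sketch ignores.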

The values of "Event Type" and "Search Type" SPSS variables in the SPSS data file generated during the log file processing are indicated along with each event log file specification.

3.1 Main SuperJournal Application

The main SuperJournal application is controlled by the ODB-II database. It logs to dated log files, one per day, in a "logs" directory. The first five fields of every entry have the same form (except that the Email Id may not yet be known), so these fields are omitted from the individual specifications below. For example:

Date Time Machine name IP address Email Id
96.12.10 11.55.23 aa.mcc.ac.uk 130.88.201.22 ann.apps@mcc.ac.uk

The "Email Id" provides a unique identifier for each SuperJournal user.
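A main application log line can be split into its five common fields and a remainder of possibly quoted fields. The Python sketch below assumes whitespace-separated fields, the date and time formats shown in the example above (YY.MM.DD and HH.MM.SS), and no unbalanced quote characters in the remainder; `parse_main_entry` is a hypothetical name. It does not handle the converted RetrievalWare files, whose times use colons.

```python
import shlex
from datetime import datetime

def parse_main_entry(line):
    """Split a main-application log line into the five common fields
    plus the remaining, possibly quoted, fields."""
    # Raises ValueError if the line has fewer than six whitespace-separated parts.
    date, time, machine, ip, email, rest = line.split(None, 5)
    # Dates are logged as YY.MM.DD and times as HH.MM.SS, e.g. 96.12.10 11.55.23
    timestamp = datetime.strptime(f"{date} {time}", "%y.%m.%d %H.%M.%S")
    return {
        "timestamp": timestamp,
        "machine": machine,
        "ip": ip,
        "email": email,
        # shlex.split keeps double-quoted multi-word fields together
        "fields": shlex.split(rest),
    }
```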

3.1.1 Registration

3.1.1.1 Unsuccessful Registration - Invalid Library Name

unknown Registration "Invalid Library Name" "Library Name" "Password" "User's Name"
unknown Registration "Invalid Library Name" "bham" "lbypasswd" "Joe Bloggs"

3.1.1.2 Unsuccessful Registration - Invalid Library Password

unknown Registration "Invalid Library Password" "Library Name" "Password" "User's Name"
unknown Registration "Invalid Library Password" "Birmingham" "lbypasswd" "Joe Bloggs"

3.1.1.3 Unsuccessful Registration - Invalid Email

This occurs when a user provides an Email Identifier which cannot be a valid email address, i.e. it contains white space, or it doesn't contain an "@". The invalid email address will be in the "Email Id" field of the log entry.

unknown Registration "Invalid Email" "Library Name" "Password" "User's Name"
unknown Registration "Invalid Email" "Manchester" "lbypasswd" "Joe Bloggs"

3.1.1.4 Unsuccessful Registration - Invalid Domain

This occurs when a user's machine domain does not match the library.

unknown Registration "Invalid Domain" "Library Name" "Password" "Domain" "User's Name"
unknown Registration "Invalid Domain" "Manchester" "lbypwd" "ab.co.uk" "Joe Bloggs"

3.1.1.5 Successful registration

Registration Library "Name" "Status" "Address"
Registration Manchester "Ann Apps" "researcher" "MC, 0161 275 6039"

User academic status, which is selected by the user from a supplied list, may be:

Status                   Logged value
Academic                 lecturer
Researcher               researcher
Postgraduate student     postg
Undergraduate student    underg
Librarian                librarian
Other                    user supplied text

In some cases, where thought appropriate, an "other" status may be manually edited to one of the specific types before the log files are processed to generate the usage statistics. Some users determinedly type in their job title. Any "computer staff" are included with librarians. Visiting research staff become "researchers".

Note that some of the registration fields (Name, Status, Address) may contain an empty string, because the user may have chosen not to fill in the details.
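As a sketch of how the logged status values might be grouped for reporting, the Python fragment below maps the supplied values to their categories and treats any other non-empty value as "Other", after any manual editing described above has been done. Returning `None` for an empty status, and the names `STATUS_CATEGORIES` and `status_category`, are assumptions for illustration.

```python
# Logged values for the statuses offered in the supplied list (Section 3.1.1.5).
STATUS_CATEGORIES = {
    "lecturer": "Academic",
    "researcher": "Researcher",
    "postg": "Postgraduate student",
    "underg": "Undergraduate student",
    "librarian": "Librarian",
}

def status_category(logged_status):
    """Map a logged status to its category; any other non-empty value
    was typed in by the user and is treated as 'Other'."""
    value = logged_status.strip()
    if not value:
        return None  # the user left the status blank
    return STATUS_CATEGORIES.get(value.lower(), "Other")
```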

Event Type Search Type
Register Not applicable

3.1.2 Login

3.1.2.1 Unsuccessful Login – Invalid User Name

Login "Invalid User Name"
Login "Invalid User Name"

3.1.2.2 Unsuccessful Login – Invalid Password

Login "Invalid Password" Invalid Password
Login "Invalid Password" anne

3.1.2.3 Successful Login

Login "Software Viewer"
Login "Mozilla_2.01 (Win16; I)"
Event Type Search Type
Login Not applicable

3.1.2.4 Successful Login Direct to Journal Screen Using ISSN

LoginEx "ISSN" "Software Viewer"
LoginEx "0957-9265" "Mozilla_2.01 (Win16; I)"

This is shown in the generated statistics as a Login, but with the journal and cluster name set. The following action in the log will be "View Journal" for the same journal. This feature was introduced into the SuperJournal application in July 1998, but it was unused before the end of the project, so there are no instances of its use in the log files.

Event Type Search Type Cluster Journal
Login Not applicable Cluster code Journal code

3.1.3 Browsing

Where appropriate, logging of a user's browsing actions indicates whether an article viewed is from a current or a back issue. The SuperJournal application regards the "current" issue of a journal as the loaded issue with the most recent cover date. Journal issue and article identifiers in the log files take the forms shown in the examples below, e.g. issue "EJCV11I3" and article (SJAID) "EJCV11I3A2".
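The identifier structure used in the following Python sketch (journal code, then V and volume, I and issue, and optionally A and article number) is inferred from the examples in this document and is an assumption rather than a published definition. The sketch also shows the "current" issue rule: of a journal's loaded issues, the one with the most recent cover date. The cover dates in any input are supplied by the caller; the function names are hypothetical.

```python
import re

# Pattern inferred from examples such as "EJCV11I3" and "EJCV11I3A2" (assumption).
SJ_ID = re.compile(
    r"^(?P<journal>[A-Z]+)V(?P<volume>\d+)I(?P<issue>\d+)(?:A(?P<article>\d+))?$"
)

def parse_sj_id(identifier):
    """Return (journal, volume, issue, article) for an issue id or SJAID;
    article is None for an issue id."""
    m = SJ_ID.match(identifier)
    if m is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    article = m.group("article")
    return (m.group("journal"), int(m.group("volume")), int(m.group("issue")),
            int(article) if article else None)

def current_issue(cover_dates):
    """cover_dates maps issue id -> cover date for one journal's loaded
    issues; the current issue has the most recent cover date."""
    return max(cover_dates, key=cover_dates.get)
```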

3.1.3.1 SuperJournal Clusters Page

This event occurs when the user views the list of journal clusters within SuperJournal.

View "SuperJournal Clusters"
View "SuperJournal Clusters"
Event Type Search Type
ViewSJC Browse

3.1.3.2 Cluster Screen

This event occurs when the user views the list of journals within a cluster.

View "Cluster"
View "Communication and Cultural Studies"
Event Type Search Type
ViewCluster Browse

3.1.3.3 Journal Screen

This event occurs when the user views the list of issues available for a particular journal.

View "Journal"
View "Cultural Critique"
Event Type Search Type
ViewJournal Browse

3.1.3.4 Issue (Table of Contents) Screen

This event occurs when the user views the table of contents of a particular journal issue.

View "Issue Id" Current | Back
View "EJCV11I3" Current
View "EJCV11I1" Back
Event Type Search Type
ViewIssue Browse

3.1.3.5 View Abstract

Abstract "SJAID" Current | Back
Abstract "EJCV11I3A2" Current
Abstract "EJCV11I1A2" Back
Event Type Search Type
ViewAbstract Browse

3.1.3.6 View Full Text

The file format of the full article may be deduced from the file extension.

Article "File path name" Current | Back
Article "Sage/EJC/V11I3/art1.pdf" Current
Article "Sage/EJC/V11I1/art1.pdf" Back
Event Type Search Type
ViewArticle Browse

3.2 Isite

The Isite search engine logs to dated log files, one per day of use, in an "Isite logs" directory. The format of the log files is the same as that for the main SuperJournal application, the Isite search engine being sufficiently integrated for user information to be available. It is not possible to deduce whether an article viewed after a search is from a "current" or a "back" issue.

3.2.1 Search

Search Isite Database "Query" No. hits Retrieval Time (secs)
Search Isite NEWAPPDB "abcd:1" 0 1.0
Search Isite NEWAPPDB "television:1" 11 4.0

Field names within the search query appear in the form "ABSTRACT/" or "TITLE/". If no field name is included, the query was made in the "ANY/" field. The number of field names in the search query indicates how many fields were searched.

Weightings follow a colon within the search query, e.g. ":3".
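A minimal Python sketch of extracting the searched fields and weightings from a logged Isite query follows. It assumes that field names are upper-case words immediately followed by "/" and that each weighting is the integer after a colon; the multi-field query used in testing is a made-up example, since only single-term queries are shown above.

```python
import re

FIELD_NAME = re.compile(r"\b([A-Z]+)/")  # e.g. "TITLE/", "ABSTRACT/"
WEIGHT = re.compile(r":(\d+)")           # e.g. ":3"

def isite_query_fields(query):
    """Return (field names searched, weightings) for a logged Isite query.
    A query without any field name was made in the ANY/ field."""
    fields = FIELD_NAME.findall(query) or ["ANY"]
    weights = [int(w) for w in WEIGHT.findall(query)]
    return fields, weights
```

The number of fields searched is then `len(fields)`.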

The cluster searched, which is either "all" clusters or one specific cluster, is deduced by the log file processing program from the Isite Database name (note that the Isite database names were changed during the course of the project when the SuperJournal application was updated):

Isite Database    Old Name    Journal Cluster
NEWAPPDB          GODB        All
CCSNEWAPPDB       CCSGODB     Communication and Cultural Studies
MGPNEWAPPDB       MGPGODB     Molecular Genetics and Proteins
PSNEWAPPDB        PSGODB      Political Science
MCNEWAPPDB        MCGODB      Materials Chemistry
Event Type Search Type
Query Isite

3.2.2 View Abstract after Isite Search

Isite Database ABSTRACT SJAID
Isite NEWAPPDB ABSTRACT EJCV11I3A2
Event Type Search Type
ViewAbstract Isite

3.2.3 View Full Text after Isite Search

Isite Article File path name
Isite Article Sage/EJC/V11I3/art1.pdf
Event Type Search Type
ViewArticle Isite

3.2.4 View Table of Contents after Isite Search

This event occurs when the user selects the "View Table of Contents" option on the Isite search results screen, to view the contents of the journal issue containing the retrieved article. The logging is identical to the logging when the table of contents is viewed by browsing.

View "Issue Id" Current | Back
View "EJCV11I3" Current
View "EJCV11I1" Back
Event Type Search Type
ViewIssue Browse

3.3 Index Lists – Authors

Searching via "index lists" is implemented by an initial listing of the authors or keywords by an ODB-II search, followed by a specific search via Isite. Thus the logging is distributed across the main application (ODB-II) log file and the Isite log file.

3.3.1 Search

This entry appears in the ODB-II log file.

Index Authors Retrieval Time (secs)
Index Authors 5
Event Type Search Type
IndexAuthors AuthorIndex

3.3.2 Search for Author's Articles

This entry appears in the Isite log file.

Search Author Database "Author Query" No. hits Retrieval Time (secs)
Search Author GODB "Joe Bloggs" 4 2.0
Event Type Search Type
Query AuthorIndex

3.3.3 View Abstract after Author Index Search

This entry appears in the Isite log file.

Author Database ABSTRACT SJAID
Author GODB ABSTRACT EJCV11I3A2
Event Type Search Type
ViewAbstract AuthorIndex

3.3.4 View Full Text after Author Index Search

This entry appears in the Isite log file.

Author Article File path name
Author Article Sage/EJC/V11I3/art1.pdf
Event Type Search Type
ViewArticle AuthorIndex

3.3.5 View Table of Contents after Author Index Search

This event occurs when the user selects the "View Table of Contents" option on the Isite search results screen, to view the contents of the journal issue containing the retrieved article. The logging is identical to the logging when the table of contents is viewed by browsing.

View "Issue Id" Current | Back
View "EJCV11I3" Current
View "EJCV11I1" Back
Event Type Search Type
ViewIssue Browse

3.4 Index Lists – Keywords

Logging is similar to that for Author Index Lists (above) with logging distributed across the ODB-II and the Isite log files.

3.4.1 Search

This entry appears in the ODB-II log file.

Index Keywords Retrieval Time (secs)
Index Keywords 5
Event Type Search Type
IndexKeywds KeywordsIndex

3.4.2 Search for Keyword's Articles

This entry appears in the Isite log file.

Search Keywords Database "Keyword Query" No. hits Retrieval Time (secs)
Search Keywords GODB "television" 11 5.0
Event Type Search Type
Query KeywordsIndex

3.4.3 View Abstract after Keyword Index Search

This entry appears in the Isite log file.

Keyword Database ABSTRACT SJAID
Keyword GODB ABSTRACT EJCV11I3A2
Event Type Search Type
ViewAbstract KeywordsIndex

3.4.4 View Full Text after Keyword Index Search

This entry appears in the Isite log file.

Keyword Article File path name
Keyword Article Sage/EJC/V11I3/art1.pdf
Event Type Search Type
ViewArticle KeywordsIndex

3.4.5 View Table of Contents after Keyword Index Search

This event occurs when the user selects the "View Table of Contents" option on the Isite search results screen, to view the contents of the journal issue containing the retrieved article. The logging is identical to the logging when the table of contents is viewed by browsing.

View "Issue Id" Current | Back
View "EJCV11I3" Current
View "EJCV11I1" Back
Event Type Search Type
ViewIssue Browse

3.5 NetAnswer

NetAnswer writes to its own log file using its own format in which fields are separated by commas. A single, continuously updated, log file (log.cfg) is generated in the relevant "logs" directory for each NetAnswer database, i.e. for each journals cluster. This file is recorded, and restarted, on a monthly basis. Each entry consists of the following fields:

Log Entry Field                          Example
"HTTP_daemon"                            ""
"Remote Machine Name"                    "aa.mcc.ac.uk"
"Remote IP Address"                      "130.88.201.22"
"Server Address"                         "midas.ac.uk"
"Port Number"                            "80"
"Request Start Time (YYYYMMDDhhmmss)"    "19961204100652"
"Request End Time"                       "19961204100653"
"BRS User Id"                            "anonymous"
"BRS database"                           "SJG0"
Function (see below)                     2
Error Code                               0
No. bytes                                7169
BRS Document Accession Number            264
Low End TOC                              1
High End TOC                             15
No. Documents Retrieved                  15
"Query"                                  "s1=&s2=television&...."
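Because NetAnswer entries are comma-separated with quoted strings, a CSV reader can split them. The Python sketch below assumes exactly the 17 fields listed above, in that order, on one line; the field key names are hypothetical. `skipinitialspace=True` is used in case a space follows each comma. The accession number is left as a string because pre-processing may replace it with an SJAID.

```python
import csv
from datetime import datetime

# Hypothetical key names for the 17 fields, in the order listed above.
NETANSWER_FIELDS = [
    "http_daemon", "machine", "ip", "server", "port", "start", "end",
    "brs_user", "brs_database", "function", "error_code", "n_bytes",
    "accession_number", "toc_low", "toc_high", "n_retrieved", "query",
]

def parse_netanswer(line):
    """Parse one comma-separated NetAnswer log entry into a dict."""
    values = next(csv.reader([line], skipinitialspace=True))
    if len(values) != len(NETANSWER_FIELDS):
        raise ValueError(f"expected {len(NETANSWER_FIELDS)} fields, got {len(values)}")
    record = dict(zip(NETANSWER_FIELDS, values))
    for key in ("start", "end"):
        # Request times are logged as YYYYMMDDhhmmss
        record[key] = datetime.strptime(record[key], "%Y%m%d%H%M%S")
    record["function"] = int(record["function"])
    return record
```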

Where the values of "Function" may be:

1 BRS Full Document Display (i.e. Header and Abstract)
2 Table of Contents (i.e. Search Result)
4 Help Page Request
5 Help Page Request
8 Download in tagged bibliographic format
9 Full Article Display from header

When "Function" is 9, "Query" gives the full path name of the article file, and the in-between fields are set to zero.

Note that functions 8 and 9 redefine the NetAnswer functions, but it seemed unlikely that any conflict would occur. NetAnswer definitions of these functions are:

The following examples of NetAnswer log entries show only the last 9 fields of each. The log file entries created by NetAnswer are manually edited during log file pre-processing to remove extraneous material, make them consistent with other log files, and make the search fields more readable (see [SJMC262]). In each example below, the entry as logged by NetAnswer is shown first, followed by the same entry after pre-processing.

3.5.1.1 Search

The first example shows no hits; the second shows 15.

"SJG0" 2 0 2038 0 0 0 0 "s1=abcd&s2=&..."
"SJG0" 2 0 7169 0 1 15 15 "s1=&s2=television&..."

The same examples after pre-processing:

"SJG0" 2 0 2038 0 0 0 0 "ANY=abcd"
"SJG0" 2 0 7169 0 1 15 15 "TITLE=television"

The final field, the search query, may contain any of the following field names, each at most once and in this order: ANY, TITLE, KEYWORD, ABSTRACT, AUTHOR, ADDRESS, JOURNAL, PUBLISHER. The number of these field names in the search query indicates how many fields were searched.
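Pre-processing of NetAnswer entries is a manual operation ([SJMC262]). Purely as an illustration of the query rewriting step, the Python sketch below assumes that parameter sN corresponds to the Nth field name in the list above. Only s1 to ANY and s2 to TITLE are supported by the examples, so the rest of the mapping, and the use of a space to separate several fields, are hypothetical.

```python
from urllib.parse import parse_qsl

# ASSUMPTION: parameter sN corresponds to the Nth name in this list. Only
# s1 -> ANY and s2 -> TITLE are supported by the logged examples.
QUERY_FIELDS = ["ANY", "TITLE", "KEYWORD", "ABSTRACT",
                "AUTHOR", "ADDRESS", "JOURNAL", "PUBLISHER"]

def readable_query(raw):
    """Rewrite e.g. 's1=&s2=television' as 'TITLE=television', keeping only
    non-empty search fields, in the fixed field order."""
    found = []
    for name, value in parse_qsl(raw):  # blank values are dropped by default
        if name.startswith("s") and name[1:].isdigit():
            index = int(name[1:]) - 1
            if 0 <= index < len(QUERY_FIELDS):
                found.append((index, f"{QUERY_FIELDS[index]}={value}"))
    # Joining several fields with a space is also an assumption.
    return " ".join(text for _, text in sorted(found))
```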

The specific cluster searched is deduced by the log file processing program from the BRS Database name:

BRS Database    Journal Cluster
SJG0            Communication and Cultural Studies
SJG1            Molecular Genetics and Proteins
SJG2            Political Science
SJG3            Materials Chemistry
Event Type Search Type
Query NetAnswer

3.5.1.2 View Abstract after NetAnswer Search

On "View Abstract", NetAnswer records the BRS document number for the abstract. This example shows a "View Abstract" entry for BRS document number 264 in BRS database SJG0.

"SJG0" 1 0 7169 264 1 15 15 "s1=&s2=television&..."

During pre-processing, the SuperJournal identifier of the article whose abstract has been viewed is ascertained, and edited into the log file. This is a manual operation, described in [SJMC262]. Current/back issue information is not recorded. The same example after pre-processing is:

"SJG0" 1 0 7169 "EJCV11I4A4" 1 15 15 "TITLE=television"
Event Type Search Type
ViewAbstract NetAnswer

3.5.1.3 View Full Article after NetAnswer Search

"View Full Article" following a NetAnswer search is logged via a SuperJournal cgi-script, the call to which is included in the URL for the full article shown on the NetAnswer "View Abstract" screen. It records the file path for the article viewed before displaying the article to the end-user. Current/back issue information is not recorded.

"SJG0" 9 0 0 0 0 0 1 "//sj/BRSpdf/Sage/EJC/V11I4/art4stat.pdf"
Event Type Search Type
ViewArticle NetAnswer

3.5.1.4 View NetAnswer Help

This is the logging generated when a user accesses "Help" from a NetAnswer screen. Logging of other "Help" pages is described in Section 3.10.5. (Note that occasionally NetAnswer logs "View Help" with function number 5 rather than 4.)

"SJG0" 4 0 7169 0 0 0 0 "TITLE=television"
Event Type Search Type
ViewHelp NetAnswer

3.5.1.5 Download Search Results in Tagged Bibliographic Format

SuperJournal allows a user to download the NetAnswer search results, or a selection of them, in bibliographic format. This is logged by NetAnswer as another "search".

"SJG0" 2 0 7169 0 1 15 15 "s1=&s2=television&..."

During pre-processing (see [SJMC262]), these entries are identified, by eye, and the function number changed to 8 so that later processing will record these entries correctly. The same example after pre-processing is:

"SJG0" 8 0 7169 0 1 15 15 "TITLE=television"

This user interaction will be noted in the generated statistics as a "Query" with a search type of "Tagged Download".

Event Type Search Type
Query TaggedDownload
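The Event Type and Search Type given for each NetAnswer "Function" value in Sections 3.5.1.1 to 3.5.1.5 can be collected into a single lookup. The following Python sketch does this, assuming function 8 has already been assigned to tagged downloads during pre-processing; the names are illustrative.

```python
# (Event Type, Search Type) for each NetAnswer "Function" value,
# after pre-processing has assigned 8 to tagged downloads.
NETANSWER_EVENTS = {
    1: ("ViewAbstract", "NetAnswer"),
    2: ("Query", "NetAnswer"),
    4: ("ViewHelp", "NetAnswer"),
    5: ("ViewHelp", "NetAnswer"),  # occasionally logged instead of 4
    8: ("Query", "TaggedDownload"),
    9: ("ViewArticle", "NetAnswer"),
}

def netanswer_event(function):
    """Return (event type, search type) for a NetAnswer function code."""
    try:
        return NETANSWER_EVENTS[function]
    except KeyError:
        raise ValueError(f"unexpected NetAnswer function {function}") from None
```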

3.6 RetrievalWare

The Excalibur RetrievalWare search engine logs to dated log files, one per day of use, in a "RetrievalWare logs" directory, using its own format. Note that the date within the log file name is always one day after the date of the logging information it contains. The RetrievalWare log files are converted by a program, described in [SJMC260] and [SJMC262], into a format consistent with the main SuperJournal log files, in a single file for each month. It is not possible to deduce whether an article viewed after a search is from a "current" or a "back" issue. RetrievalWare does not record the user's machine name or IP address. This information is filled in during the main log file processing; in the converted log files these fields are shown as "aa.aa.aa" and "0.0.0" respectively.

Note that logging of RetrievalWare use was included from May 1998 onwards. The versions of RetrievalWare installed previously did not log events satisfactorily, and it was not possible to relate users to log file entries.

3.6.1 Search

The SuperJournal format for a RetrievalWare logged search is:

Date Time Machine Name IP Address Email Id
98.07.27 11:01:49 aa.aa.aa 0.0.0 ANN.APPS@MCC.AC.UK

followed by:

Search QryType Cluster "Query" No. hits Retrieval Time (secs)
Search SQRY CS "abcd" 0 0.0
Search SQRY CS "television" 11 3.138

The "QryType" field, which is used to set the SPSS variable "RWare Type", may contain:

SQRY Smart Query
RSQRY Recurrent Smart Query
GEQRY Get Expert Query
EQRY Expert Query
BQRY Boolean Query
RBQRY Recurrent Boolean Query
QEQRY Query By Example

To allow for cross-cluster searching, the "Cluster" field may contain any of: CS, GP, PS, MC.

Event Type Search Type
Query RetrievalWare

3.6.1.1 RetrievalWare Format

The original RetrievalWare logging of the second of these two queries would be:

R: 11
U: "ANN.APPS@MCC.AC.UK"
T: 07/27/98 11:01:49.839
A: SQRY Started
Q: television
I: 129
R: 12
U: "ANN.APPS@MCC.AC.UK"
T: 07/27/98 11:01:49.839 -- 11:01:56.769 (6.930)
A: SQRY
Q: television
L: "ccs_abstracts_lib" (DOCS=493, QW=1, T=3.138)
I: 129
C: 493/500
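The conversion program itself is described in [SJMC260]. As a sketch only, the Python fragment below shows how the "X: value" lines of one completed RetrievalWare query record might be turned into the values needed for the SuperJournal-format entry: date and time in the converted format, user, query type, query, library and retrieval time. It assumes each record has already been separated into its own list of lines. It does not attempt the number of hits, which is not evident from the fields shown above. Function names are hypothetical.

```python
import re
from datetime import datetime

def parse_rw_record(lines):
    """Collect the 'X: value' lines of one RetrievalWare record into a dict."""
    record = {}
    for line in lines:
        key, _, value = line.partition(":")  # split at the first colon only
        record[key.strip()] = value.strip()
    return record

def rw_search_summary(record):
    """Extract SuperJournal-format values from a completed query record."""
    # "07/27/98 11:01:49.839 -- 11:01:56.769 (6.930)": keep the start stamp
    start = record["T"].split(" -- ")[0]
    stamp = datetime.strptime(start, "%m/%d/%y %H:%M:%S.%f")
    library = re.match(r'"([^"]+)"', record["L"]).group(1)
    retrieval = re.search(r"\bT=([\d.]+)", record["L"]).group(1)
    return {
        "date": stamp.strftime("%y.%m.%d"),   # e.g. 98.07.27
        "time": stamp.strftime("%H:%M:%S"),   # e.g. 11:01:49
        "email": record["U"].strip('"'),
        "query_type": record["A"],
        "query": record["Q"],
        "library": library,
        "retrieval_secs": float(retrieval),
    }
```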

3.6.2 View Abstract after RetrievalWare Search

GHIT Cluster Abstract "SJAID"
GHIT CS Abstract "EJCV11I3A2"
Event Type Search Type
ViewAbstract RetrievalWare

3.6.2.1 RetrievalWare Format

The original RetrievalWare logging of this "View Abstract" would be:

R: 13
U: "ANN.APPS@MCC.AC.UK"
T: 07/27/98 11:02:11.948 -- 11:02:16.726 (4.778)
A: GHIT
L: "ccs_abstracts_lib"
I: 129
D: 716 (0+16384)

There are several similar entries in the RetrievalWare log file for this one "GHIT" (Get Hits), the differences being in the times and the figures in parentheses in the "D" field. These entries are condensed into one entry in the SuperJournal format RetrievalWare log file.

The RetrievalWare document number (716, in the "D" field, in the example) is converted into the corresponding SuperJournal article identifier (SJAID) by a manual look-up described in [SJMC262].
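The condensing of several GHIT entries into one could be sketched as follows. This Python fragment assumes that the repeated entries for one "Get Hits" are consecutive and share the same user ("U"), library ("L") and document number (the first token of "D"), differing only in times and the parenthesised figures. Records are taken to be dicts keyed by field letter; `condense_ghits` is a hypothetical name.

```python
def condense_ghits(records):
    """Collapse runs of consecutive GHIT records for the same user, library
    and document number into the first record of each run."""
    condensed = []
    previous_key = None
    for rec in records:
        if rec.get("A") != "GHIT":
            condensed.append(rec)
            previous_key = None  # any other action ends a run
            continue
        doc_number = rec["D"].split()[0]  # "716 (0+16384)" -> "716"
        key = (rec["U"], rec["L"], doc_number)
        if key != previous_key:
            condensed.append(rec)
        previous_key = key
    return condensed
```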

3.6.3 View Full Text after RetrievalWare Search

GHIT Cluster Article "File path name"
GHIT CS Article "/superj1/Journals/Sage/EJC/V11I4/art4stat.pdf"

The RetrievalWare format for this log entry is similar to that for the "View Abstract" entry above, except that the RetrievalWare "library" (the "L" field) searched will be "ccs_pdf_lib".

Event Type Search Type
ViewArticle RetrievalWare

3.7 Multimedia

The logging of multimedia accesses from full article PDF files is in a separate multimedia log file, logged by a SuperJournal cgi-script when the multimedia item is accessed. This information is merged with that in the other log files during processing. The specification of the log file entry is similar to the ODB-II log file entries except that the user's email address is omitted because this information is not known.

Multimedia "File path name"
Multimedia "//sj/BRSpdf/Sage/EJC/V11I3/art1.xxx"

In fact, although multimedia accesses were logged, and could potentially be processed by the main log file processing program, multimedia access statistics are not produced. This was decided because there are so few multimedia items within SuperJournal that their usage was insignificant, and identifying access to them would identify a particular journal.

Event Type Search Type
ViewMultimedia Browse

3.8 Additional Functionality within HTML Articles

The logging of accesses from HTML articles and "Mini-Contents" is in a separate "minicontents" log file, logged by SuperJournal cgi-scripts when the article, etc. is accessed. This information is merged with that in the other log files during processing. The specification of the log file entry is similar to the ODB-II log file entries except that the user's email address is omitted because this information is not known.

3.8.1 Access MiniContents file from HTML Article

HTML MiniContents "File path name"
HTML MiniContents "/sj/BRSpdf/Springer/MG/V7I1/art1.minc.html"
Event Type Search Type
ViewMiniContents FromHTML

3.8.2 Access PDF file from HTML Article

HTML Article "File path name"
HTML Article "/sj/BRSpdf/Springer/MG/V7I1/art1.pdf"
Event Type Search Type
ViewArticle FromHTML

3.8.3 Access Article (HTML or PDF) from MiniContents

MiniContents Article "File path name"
MiniContents Article "/sj/BRSpdf/Springer/MG/V7I1/art1.[html|pdf]"
Event Type Search Type
ViewArticle FromMiniContents

3.8.4 Access next/previous MiniContents from MiniContents

MiniContents MiniContents "File path name"
MiniContents MiniContents "/sj/BRSpdf/Springer/MG/V7I1/art1.minc.html"
Event Type Search Type
ViewMiniContents FromMiniContents

3.8.5 Access Full Size Figure from Thumbnail, in either HTML article or MiniContents file

FullFigure "File path name"
FullFigure "/sj/BRSpdf/Springer/MG/V7I1/art1f1.gif"
Event Type Search Type
ViewFullFig FromThumbNail

3.8.6 Access Medline Abstract

It is possible to deduce from the preceding log file entries whether this was selected from within a "References" window, a PDF article, an HTML article, or an abstract (for the article itself). This deduction is made by program during the main processing of the SuperJournal log files, and the search type is set accordingly to "From References", "From PDF", "From HTML" or "From Abstract".
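The program's actual rules are not given here. The Python sketch below illustrates one plausible form of the deduction, assuming it looks only at the same user's immediately preceding entry: its log keyword and, for articles, the file extension. These rules and the name `medline_search_type` are illustrative assumptions.

```python
def medline_search_type(previous_keyword, previous_path=""):
    """Guess where a Medline link was followed from, using the keyword
    (and, for articles, the file path) of the user's preceding entry.
    Returns None when no rule applies. The rules are illustrative only."""
    if previous_keyword == "SJBib":
        return "From References"  # references shown in a separate window
    if previous_keyword == "Abstract":
        return "From Abstract"
    if previous_keyword == "Article" and previous_path.endswith(".pdf"):
        return "From PDF"
    if previous_keyword == "Article" and previous_path.endswith(".html"):
        return "From HTML"
    return None
```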

Medline
Medline
Event Type Search Type
ViewMedline see above

3.9 References (Forward and Backward)

Logging is in the "minicontents" log file. The user's email address is omitted.

3.9.1 View Article's References in a Separate Window

SJBib "File path name"
SJBib "/superj1/Journals/Springer/MG/V7I1/art1.fsj"
Event Type Search Type
ViewReferences Browse

3.9.2 View Abstract via Bibliographic Reference Link from References Window or HTML Article

This logging occurs when a user follows a link to an article (in reality to the header) in SuperJournal from a "References" list. Within the generated statistics it will be noted as "View Abstract" with a search type of "from References".

SJRef "File path name"
SJRef "/superj1/Journals/Springer/MG/V7I1/art1.sj"
Event Type Search Type
ViewAbstract FromReferences

3.9.3 View Article's "Cited By" List in a Separate Window

This event occurs when a user views a list of articles which cite a particular article by clicking on that article's "View Cited By" link.

SJCBList "File path name"
SJCBList "/superj1/Journals/Springer/MG/V7I1/art1.cit"

"Cited By" links, i.e. forward reference chaining, were introduced into the SuperJournal application in July 1998, but this facility was unused by any University library users before the end of the project so there are no instances in the log files.

Event Type Search Type
ViewCitedBy Browse

3.9.4 View Abstract Via a "Cited By" Link

This logging occurs when a user follows a link to an abstract from a "Cited By" list. Within the generated statistics it will be noted as "View Abstract" with a search type of "from CitedBy".

SJCitBy "File path name"
SJCitBy "/superj1/Journals/Springer/MG/V7I1/art1.sj"
Event Type Search Type
ViewAbstract FromCitedBy

3.9.5 View Article Following Internal Reference or "Cited-By" Link

This logging occurs when a user views a full article from an abstract accessed via either a "References" or a "Cited By" link. Within the generated statistics it will be noted as "View Article" with a search type of "from References" or "from CitedBy".

Article "File path name"
Article "/superj1/Journals/Springer/MG/V7I1/art1.[pdf|html]"
Event Type Search Type
ViewArticle FromReferences / FromCitedBy

3.10 Other SuperJournal Functionality

3.10.1 Tagged Bibliographic Format

Logging is in the "minicontents" log file. The user's email address is omitted.

3.10.1.1 Download Abstract in Tagged Bibliographic Format

RefTag "File path name"
RefTag "/superj1/Journals/Springer/MG/V7I1/art1.sj"
Event Type Search Type
ViewAbstract TaggedDownload

3.10.1.2 Email Abstract in Tagged Bibliographic Format

EmailTag "File path name"
EmailTag "/superj1/Journals/Springer/MG/V7I1/art1.sj"
Event Type Search Type
ViewAbstract TaggedEmail

3.10.1.3 Download Article's References in Bibliographic Format

BibTag "File path name"
BibTag "/superj1/Journals/Springer/MG/V7I1/art1.fsj"
Event Type Search Type
ViewReferences TaggedDownload

3.10.2 Reading List

Reading List use is logged in the main ODB-II application log file. Abstracts may be added to, deleted from, or viewed from the reading list. They are referenced within the log file as a list of SJAIDs. During log file processing this list is split into its constituent items, so that the generated statistics contain a separate entry for each "reading list" article.

3.10.2.1 Add to Reading List

HotList "SJAID List" Add
HotList "MGV7I1A1 MGV7I1A6" Add
Event Type Search Type
ViewAbstract AddReadList

3.10.2.2 Delete from Reading List

HotList "SJAID List" Delete
HotList "MGV7I1A1 MGV7I1A6" Delete
Event Type Search Type
ViewAbstract RemoveReadList

3.10.2.3 View Abstract from Reading List

HotList "SJAID List" View
HotList "MGV7I1A1 MGV7I1A6" View
Event Type Search Type
ViewAbstract FromReadList
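The splitting of a HotList SJAID list into one statistics entry per article can be sketched as follows. This is a minimal illustration, not the log file processing program described in [SJMC260]; it assumes the common fields at the start of each log line (date, time, machine, email, etc.) have already been stripped, leaving only the event-specific part shown in the examples above. The function name `expand_hotlist` is hypothetical.

```python
import shlex

# Search types for each HotList action, from Sections 3.10.2.1 to 3.10.2.3.
HOTLIST_SEARCH_TYPES = {
    "Add": "AddReadList",
    "Delete": "RemoveReadList",
    "View": "FromReadList",
}


def expand_hotlist(content: str) -> list[tuple[str, str, str]]:
    """Split one HotList entry into one (event type, search type, SJAID) row per article.

    `content` is assumed to be only the event-specific part of the log line,
    e.g. 'HotList "MGV7I1A1 MGV7I1A6" Add'.
    """
    # shlex keeps the quoted SJAID list together as a single token.
    keyword, sjaid_list, action = shlex.split(content)
    if keyword != "HotList":
        raise ValueError(f"not a HotList entry: {content!r}")
    search_type = HOTLIST_SEARCH_TYPES[action]
    return [("ViewAbstract", search_type, sjaid) for sjaid in sjaid_list.split()]


for row in expand_hotlist('HotList "MGV7I1A1 MGV7I1A6" Delete'):
    print(row)
```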

3.10.3 Preferences

Logging is in the main ODB-II application log file.

3.10.3.1 Display Preferences Screen

View "Preferance Setting" (sic)
View "Preferance Setting"
Event Type Search Type
ViewPref Not applicable

3.10.3.2 Change Preferences

ChangePreference No. fields changed "Changed"
ChangePreference 3 "Home; SrchEng; "

The "Changed" field indicates which Preferences the user has changed. The quoted string may contain any of the following, in a semi-colon separated list: SrchEng; PrefCluster; StartScreen; Home; TimeOut; Email; Password; PAlert[CCS][PS][MGP][MC]. The cluster abbreviation(s) following "PAlert" indicate for which cluster(s) the user has set an alert. Although a change to "Preferred Cluster" is logged, the actual chosen cluster is not logged.

Note that logging of the actual preferences changed was included from July 1998 onwards. Previous logging indicated the number of preferences changed only.

Event Type Search Type
ChangePref Not applicable
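A sketch of how the "Changed" field might be decoded is given below. The exact textual form of the PAlert item is an assumption: the specification writes it as PAlert[CCS][PS][MGP][MC], so this sketch accepts cluster codes following "PAlert" with or without brackets. The function name `parse_changed` is hypothetical.

```python
import re

# Cluster abbreviations listed above. None is a prefix of another,
# so findall recovers them unambiguously from a run such as "CCSPS".
CLUSTER_RE = re.compile(r"CCS|MGP|MC|PS")


def parse_changed(changed: str) -> tuple[list[str], list[str]]:
    """Return (preference names, alert clusters) from a quoted "Changed" field."""
    prefs: list[str] = []
    alert_clusters: list[str] = []
    for item in changed.split(";"):
        item = item.strip()
        if not item:
            continue  # the logged string ends with "; ", leaving an empty item
        if item.startswith("PAlert"):
            prefs.append("PAlert")
            # Assumption: cluster codes follow "PAlert" directly, bracketed or not.
            alert_clusters.extend(CLUSTER_RE.findall(item[len("PAlert"):]))
        else:
            prefs.append(item)
    return prefs, alert_clusters


print(parse_changed("Home; SrchEng; "))
```

Note that the logged count of changed fields is not checked against the list length, since the example above logs a count of 3 with two named items.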

3.10.3.3 Change Email

The new Email address will appear in the "Email Id" field. The number of preferences changed includes the Email change. The contents of the "Changed" field are as specified above. The log file processing program takes particular note of this log file entry in order to keep track of the user in later sessions.

ChangePreference Email Old Email No. fields changed "Changed"
ChangePreference Email aa@mcc.ac.uk 1 "Email"
Event Type Search Type
ChangeEmail Not applicable
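Because users are identified within SuperJournal by their registered email address, a ChangeEmail entry must carry the user's anonymous code over to the new address. The sketch below illustrates this under stated assumptions: the register is modelled as a plain dictionary from email address to user code, and `apply_email_change` is a hypothetical name, not part of the actual processing program.

```python
import shlex


def apply_email_change(register: dict[str, str], email_id: str, content: str) -> bool:
    """Re-key the register when a ChangePreference Email entry is seen.

    register: maps registered email address -> anonymous user code (e.g. "C57").
    email_id: the entry's "Email Id" field, which holds the NEW address.
    content:  e.g. 'ChangePreference Email aa@mcc.ac.uk 1 "Email"'.
    Returns True if a user was re-keyed.
    """
    tokens = shlex.split(content)
    # Ordinary preference changes look like: ChangePreference <count> "<changed>"
    if len(tokens) < 3 or tokens[:2] != ["ChangePreference", "Email"]:
        return False
    old_email = tokens[2]
    user_code = register.pop(old_email, None)
    if user_code is None:
        return False  # old address not in the register; nothing to carry over
    register[email_id] = user_code
    return True
```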

3.10.4 Feedback

Logging is in the main ODB-II application log file.

3.10.4.1 Access to Feedback Form

FeedbackForm
FeedbackForm
Event Type Search Type
AccessFeedBack Not applicable

3.10.4.2 Feedback Sent

Feedback Sent
Feedback Sent
Event Type Search Type
SendFeedBack Not applicable

3.10.5 Help

Access to the "Help" pages is logged in the main ODB-II application log file. Log entries for access to the main top-level "Help" page include the user's email identifier, but log entries for accesses to lower level "Help" pages do not.

HelpType
Help

HelpType may be one of: Help; HelpIsite; HelpNA; HelpPrefs.

Note that the logging of access to "Help" was included from July 1998 onwards.

Event Type Search Type
ViewHelp Browse
ViewHelp Isite
ViewHelp NetAnswer
ViewHelpPreferences Browse
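The table above can be read as a lookup from the logged HelpType to the generated event. The sketch assumes the four HelpType values correspond, in order, to the four table rows (Help, HelpIsite, HelpNA, HelpPrefs); `classify_help` is a hypothetical name.

```python
# Assumed one-to-one correspondence between logged HelpType values
# and the (Event Type, Search Type) rows of the table above.
HELP_EVENTS = {
    "Help": ("ViewHelp", "Browse"),
    "HelpIsite": ("ViewHelp", "Isite"),
    "HelpNA": ("ViewHelp", "NetAnswer"),
    "HelpPrefs": ("ViewHelpPreferences", "Browse"),
}


def classify_help(help_type: str) -> tuple[str, str]:
    """Map a logged HelpType to its generated (event type, search type)."""
    try:
        return HELP_EVENTS[help_type]
    except KeyError:
        raise ValueError(f"unknown HelpType: {help_type!r}") from None
```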

3.10.6 Alerts

Alerts are sent out by the SuperJournal application, when new data is loaded into the application, to users who have requested them. The logged machine name and IP address are "cs6400.mcc.ac.uk" and "130.88.203.18". The times are typically in the early hours of the morning, which is generally when journal data is loaded. These logged events should be ignored when producing statistics of user actions, when calculating session lengths, and when considering user access location.
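A minimal sketch of excluding alert entries before user statistics are computed is shown below. The `LogEvent` record is hypothetical and holds only the fields needed here; the machine name and IP address are those given above, and the alert event type names are taken from Sections 3.10.6.1 and 3.10.6.2.

```python
from typing import NamedTuple

# Origin of alert log entries, as given in Section 3.10.6.
ALERT_MACHINE = "cs6400.mcc.ac.uk"
ALERT_IP = "130.88.203.18"
ALERT_EVENT_TYPES = {"EmailAlert", "PersonalAlert"}


class LogEvent(NamedTuple):
    """Hypothetical minimal event record; real log entries carry more fields."""
    machine: str
    ip: str
    event_type: str


def is_alert(event: LogEvent) -> bool:
    return (event.event_type in ALERT_EVENT_TYPES
            or event.machine == ALERT_MACHINE
            or event.ip == ALERT_IP)


def user_initiated(events: list[LogEvent]) -> list[LogEvent]:
    """Drop alert entries before computing user statistics, sessions or locations."""
    return [e for e in events if not is_alert(e)]
```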

3.10.6.1 Email Alert

An "Email Alert" is sent out to a user when a new journal issue is loaded if the user has requested an alert for that journal via the "Preferences" within SuperJournal.

EAlert "Issue Id"
EAlert "ONCV17I18"

Note that the logging of Email Alert was included from July 1998 onwards.

Event Type Search Type
EmailAlert Not applicable

3.10.6.2 Personal Alert

A "Personal Alert" is sent out to a user if a newly loaded article contains their "personal alert" search terms. A user may set "personal alert" search terms for each journal cluster via the "Preferences" within SuperJournal. The log entry contains a list of SJAIDs of the new articles which contain the search terms, within a particular cluster. There may potentially be a logged Personal Alert for each journal cluster, each one listing articles from possibly several journals. During log file processing this list will be split into its constituent items, so that in the generated statistics there will be a separate entry for each "alerted" article.

PAlert "Search Terms" "SJAID List"
PAlert "gene*" "ONCV17I18A2 ONCV18I17A4 ..."

Personal Alert was included in the SuperJournal application, with logging enabled, during July 1998. Use by university library users appears in the log files from October 1998 onwards.

Event Type Search Type
PersonalAlert PersonalAlert
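The splitting of a Personal Alert into one entry per alerted article, and the decomposition of each SJAID, might look like the sketch below. The SJAID layout (journal identifier, then V volume, I issue, A article number) is inferred from examples such as "MGV7I1A1" and "ONCV17I18A2" and is an assumption, as are the function names.

```python
import re
import shlex

# Assumed SJAID layout, inferred from examples such as "MGV7I1A1" and
# "ONCV17I18A2": journal id, V<volume>, I<issue>, A<article number>.
SJAID_RE = re.compile(r"([A-Z]+)V(\d+)I(\d+)A(\d+)")


def split_sjaid(sjaid: str) -> tuple[str, int, int, int]:
    m = SJAID_RE.fullmatch(sjaid)
    if m is None:
        raise ValueError(f"unrecognised SJAID: {sjaid!r}")
    jid, volume, issue, article = m.groups()
    return jid, int(volume), int(issue), int(article)


def expand_palert(content: str) -> list[tuple[str, str, str, str]]:
    """One (event type, search type, search terms, SJAID) row per alerted article."""
    keyword, terms, sjaid_list = shlex.split(content)
    if keyword != "PAlert":
        raise ValueError(f"not a PAlert entry: {content!r}")
    return [("PersonalAlert", "PersonalAlert", terms, s) for s in sjaid_list.split()]
```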

3.11 User Register

In order to keep track of users in the log file processing from month to month and to provide some user profile information on a monthly basis, some of the information about each user is preserved in a User Register file. Data in the previous month's version of this file is input to the log file processing program along with the current month's log file entries. At the end of log file processing a new updated version of the User Register for the current month is output.

The User Register file contains a single line entry for each user, which consists of the following fields (the second column showing an example):

SJUser SJUser
Registration Date 1997.01.18
Registration Time 12:31:42
Registration Machine pc56.cam.ac.uk
Registration IP 123.45.678.99
Email Identifier f.bloggs@cam.ac.uk
Name "Fred Bloggs"
Library "cambridge"
Academic status "researcher"
Address "Dept of Biology, Cambridge"
Library code letter C
User number within library 57
Academic status code number 2
Number of sessions in previous month 9
Mean session length last month 3.78444
Last month standard deviation 3.22095
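A User Register line could be read into a record as sketched below. The on-disk layout is an assumption: fields separated by whitespace, in the order listed above, with multi-word values enclosed in double quotes as in the example. The `user_code` property builds the anonymous identifier described in Section 3.11.1. Class and function names are hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass
class RegisteredUser:
    reg_date: str
    reg_time: str
    reg_machine: str
    reg_ip: str
    email: str
    name: str
    library: str
    academic_status: str
    address: str
    library_code: str
    user_number: int
    status_code: int
    sessions_last_month: int
    mean_session_length: float
    session_length_sd: float

    @property
    def user_code(self) -> str:
        # Anonymous identifier used in the generated statistics, e.g. "C57".
        return f"{self.library_code}{self.user_number}"


def parse_register_line(line: str) -> RegisteredUser:
    fields = shlex.split(line)  # honours the quoted multi-word fields
    if len(fields) != 16 or fields[0] != "SJUser":
        raise ValueError(f"unexpected User Register line: {line!r}")
    (_, date, time, machine, ip, email, name, library, status,
     address, lib_code, user_no, status_code, sessions, mean, sd) = fields
    return RegisteredUser(date, time, machine, ip, email, name, library, status,
                          address, lib_code, int(user_no), int(status_code),
                          int(sessions), float(mean), float(sd))


user = parse_register_line(
    'SJUser 1997.01.18 12:31:42 pc56.cam.ac.uk 123.45.678.99 f.bloggs@cam.ac.uk '
    '"Fred Bloggs" "cambridge" "researcher" "Dept of Biology, Cambridge" '
    'C 57 2 9 3.78444 3.22095')
print(user.user_code)
```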

3.11.1 User Identification

In the generated SuperJournal statistics, users are identified, for anonymity, by a user code, composed from the library code and the user number within the library. For example, the user registered in the above example will be known as "C57" in the generated statistics. Real user names and identifiers are included in "private" statistics only. These private pages include information for HUSAT's use, and a user registration information page for each library.

Note that the ODB-II database which controls the SuperJournal application is not involved in the SuperJournal statistics processing beyond the generation of log file entries for user interactions. It does not have knowledge of these user identification numbers, which are generated by the log file processing program and become persistent by their inclusion in the log file processing User Register. Within the SuperJournal application users are identified by their registered email address. Also the ODB-II SuperJournal database does not record registration date/time.

Some log file entries do not contain the user's email identifier, because this information is not available to the application performing the logging. This includes NetAnswer log file entries, and the "minicontents" (see Section 3.8, etc.) and "multimedia" (see Section 3.7) log file entries. In these cases, the user who has performed the interaction is deduced from the logged time and the logged IP address, as described in [SJMC260], by the main log file processing program.
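The actual deduction method is described in [SJMC260]; the sketch below shows only one plausible approach, attributing an anonymous entry to the user who most recently logged an identified interaction from the same IP address within a time window. The 30-minute window and the function name are hypothetical choices for illustration.

```python
import bisect
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical tolerance: an anonymous entry is attributed only if the same IP
# logged an identified interaction no more than this long before it.
MAX_GAP = timedelta(minutes=30)


def attribute_user(identified: dict[str, list[tuple[datetime, str]]],
                   ip: str, when: datetime) -> Optional[str]:
    """Guess the email of the user behind an entry that lacks one.

    identified: IP address -> time-sorted (timestamp, email) pairs taken
    from entries that do carry an email identifier.
    """
    entries = identified.get(ip, [])
    times = [t for t, _ in entries]
    i = bisect.bisect_right(times, when)
    if i == 0:
        return None  # no identified interaction from this IP before `when`
    last_time, email = entries[i - 1]
    return email if when - last_time <= MAX_GAP else None
```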

3.11.2 Library Codes

The single letter code for each library is given in the table below. All "libraries" other than the University libraries included in the SuperJournal evaluation research have the code "Z". This will include: publisher; manchester; husat; focus; editor; author; penguin; etc. Log file entries for users with a "Z" library code are included in the log file processing, but they are excluded from the generated statistics. "Z" library users are included in the User Register.

Library Code
Birmingham B
Bradford A
Cambridge C
De Montfort D
Durham E
Leeds F
LSE L
NIMR N
Oxford O
Sussex S
UCL U
Ulster V
Warwick W
Other Z
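The library code assignment and the exclusion of "Z" users from the generated statistics can be sketched as a lookup with a default. The sketch assumes the registered library string matches a name in the table case-insensitively (the User Register example stores "cambridge"); the real registered strings may differ, for instance for De Montfort. Function names are hypothetical.

```python
# Library codes from the table above; any other "library" is coded "Z".
LIBRARY_CODES = {
    "birmingham": "B", "bradford": "A", "cambridge": "C",
    "de montfort": "D", "durham": "E", "leeds": "F", "lse": "L",
    "nimr": "N", "oxford": "O", "sussex": "S", "ucl": "U",
    "ulster": "V", "warwick": "W",
}


def library_code(library: str) -> str:
    # Assumption: registered library names match the table case-insensitively.
    return LIBRARY_CODES.get(library.strip().lower(), "Z")


def in_generated_statistics(code: str) -> bool:
    # "Z" users stay in the User Register but are excluded from the statistics.
    return code != "Z"
```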

3.11.3 User Machine Location

For the purposes of the evaluation research, there was a requirement to identify the location of a user's machine, i.e. whether and when SuperJournal was accessed from a departmental machine, a public machine, a machine at home, etc. The log files record the IP address and machine name for every interaction. Some attempt was made to identify the location of these machines on an "educated guess" basis, because IP address information from the libraries was unavailable. Knowledge about machine location was built into the log file processing program gradually: each month the unidentified machines at each library were listed, location information was added where possible, and the log file processing program was re-run to include the location codes. Identification was made manually and was more successful for some libraries than others, but home accesses and accesses from abroad were generally identifiable. Machine location codes have not been added for every month's statistics because the process was time-consuming, so care should be taken in interpreting any statistics or deductions based on these location codes. A large number of accesses will be recorded as "unknown", either because they have not been processed or because that library's machine addresses are difficult to decipher. Also, Manchester's knowledge of Oxbridge colleges may not be complete. Note that a location identified as "Manchester" indicates either an intervention by Manchester staff to sort out user problems or a log file entry, such as an Email Alert, initiated from Manchester, so logged events with a Manchester location should be ignored.

The identified locations are:

Note that location codes have been added to the log files for February 1997 through May 1997 and July 1998 onwards. In the logs for all other months the locations will be set to "Unknown".

3.12 Journal Catalogue

During the SuperJournal Data Conversion Process, described in [SJMC140], a journal catalogue entry is created for each journal issue as it is loaded. This journal catalogue is read by the statistics generation program (see [SJMC260]) to provide information on issue load date and journal accesses. The journal catalogue contains an entry for each issue, named:

where <jid> is the SuperJournal journal identifier; vvv is the volume number as 3 digits; iii is the issue number as 3 digits.

The first line of the file is:

SJLoad Load date
SJLoad 98.03.07

Following this is a line for each article of the form:

SJArt Cluster Volume Issue Year Journal Id Article Number Base filename
SJArt CCS 12 2 1998 EJC 4 art4xyz

where "Cluster" is the SuperJournal cluster, i.e. CCS, MGP, PS or MC.
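Reading an issue's journal catalogue entry can be sketched as below, assuming whitespace-separated fields in the order shown above (the base filename in the example contains no spaces). The load date is kept as logged, since its year format ("98.03.07") is two-digit. Names are hypothetical.

```python
from dataclasses import dataclass

CLUSTERS = {"CCS", "MGP", "PS", "MC"}


@dataclass
class CatalogueArticle:
    cluster: str
    volume: int
    issue: int
    year: int
    journal_id: str
    article_number: int
    base_filename: str


def parse_catalogue_entry(text: str) -> tuple[str, list[CatalogueArticle]]:
    """Return (load date, articles) for one issue's journal catalogue entry."""
    load_date = ""
    articles: list[CatalogueArticle] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "SJLoad":
            load_date = fields[1]  # kept as logged, e.g. "98.03.07"
        elif fields[0] == "SJArt":
            cluster, volume, issue, year, jid, art, base = fields[1:8]
            if cluster not in CLUSTERS:
                raise ValueError(f"unknown cluster {cluster!r} in: {line!r}")
            articles.append(CatalogueArticle(cluster, int(volume), int(issue),
                                             int(year), jid, int(art), base))
    return load_date, articles
```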

3.12.1 Journal Masking

In order to make particular journals anonymous in the generated SuperJournal usage statistics, journal data names are "masked" by the log file processing program. This masking is performed using the data within the journal catalogue. A journal look-up table is available to project staff, and each publisher has been informed of the journal codes for its own journals. After masking, journal article identifiers take the following form (e.g. CCS2V1998I2A6):

Journal <cluster>nn
Volume Year
Issue Issue number within year
Article Article number within issue
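Masked identifiers of this form can be assembled as sketched below. Two things are assumptions: the journal look-up table (the mapping shown is hypothetical and does not reflect any real journal's code), and the derivation of "issue number within year" by ordering each year's issues by volume and issue number.

```python
from collections import defaultdict

# Hypothetical look-up table: real journal id -> masked "<cluster>nn" code.
JOURNAL_MASKS = {"XYZ": "CCS2"}


def issue_numbers_within_year(issues: list[tuple[int, int, int]]) -> dict[tuple[int, int], int]:
    """Number one journal's issues from 1 within each year.

    issues: (volume, issue, year) triples for a single journal.
    Assumption: issues are ordered within a year by (volume, issue).
    """
    by_year: dict[int, set[tuple[int, int]]] = defaultdict(set)
    for volume, issue, year in issues:
        by_year[year].add((volume, issue))
    numbering = {}
    for vol_issues in by_year.values():
        for n, key in enumerate(sorted(vol_issues), start=1):
            numbering[key] = n
    return numbering


def masked_id(journal_id: str, year: int, issue_in_year: int, article_number: int) -> str:
    return f"{JOURNAL_MASKS[journal_id]}V{year}I{issue_in_year}A{article_number}"


numbering = issue_numbers_within_year([(12, 1, 1998), (12, 2, 1998), (13, 1, 1999)])
print(masked_id("XYZ", 1998, numbering[(12, 2)], 6))
```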

3.12.2 Journal Data Spreadsheet

A SuperJournal Data Spreadsheet was maintained as part of the SuperJournal Data Handling process. This spreadsheet records information about the journal issues within SuperJournal including issue load date and the number of articles within each issue. This spreadsheet is detailed in an Appendix to "SuperJournal Production Process" [SJMC130].

3.13 Length of Session Time

It was not obvious how to record the end of a session, because access to SuperJournal is via a Web browser and the user is not required to log out. It has been assumed that a session ends at the time of the last recorded interaction before the next login. A session is defined as a login or registration followed by at least one significant interaction. Repeated logins, with no intervening interactions, are excluded during log file processing. Personal or Email Alerts, which are initiated from Manchester, are ignored when calculating session length.
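The session rules above can be sketched for a single user's time-ordered events as follows. The event type names "Login" and "Register" are assumptions, as is the treatment of every other non-alert event as a significant interaction; when logins repeat with nothing in between, the sketch starts the session at the last of them.

```python
from datetime import datetime, timedelta

SESSION_STARTS = {"Login", "Register"}           # assumed event type names
IGNORED = {"EmailAlert", "PersonalAlert"}        # initiated from Manchester


def sessions(events: list[tuple[datetime, str]]) -> list[tuple[datetime, datetime]]:
    """Split one user's time-ordered (timestamp, event type) list into sessions.

    Each session runs from a login/registration to the last interaction
    before the next login. A login with no following interaction is dropped.
    """
    result = []
    start = last = None
    for when, event_type in events:
        if event_type in IGNORED:
            continue
        if event_type in SESSION_STARTS:
            if start is not None and last is not None:
                result.append((start, last))
            start, last = when, None  # repeated login replaces the previous one
        elif start is not None:
            last = when
    if start is not None and last is not None:
        result.append((start, last))
    return result


t = lambda h, m: datetime(1998, 7, 1, h, m)
log = [(t(9, 0), "Login"), (t(9, 1), "Login"), (t(9, 5), "ViewAbstract"),
       (t(9, 20), "ViewArticle"), (t(10, 0), "Login"), (t(10, 30), "EmailAlert"),
       (t(11, 0), "Login"), (t(11, 2), "Search")]
for s, e in sessions(log):
    print(s, e, e - s)
```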

It was suggested that contiguous short sessions, e.g. sessions of less than five minutes, should be merged into one session. However, deciding the appropriate "short" session time and developing an algorithm for this within the log file processing program proved too problematic. The reason why a user logged in again, whether by choice, because of network problems, or through inexperience, was not ascertained.

3.14 Retrieval Time

The only retrieval times logged are for searches using a search engine. This is the machine retrieval time, rather than that experienced by the end-user. It is not possible to log retrieval times as seen by the end-user, because these depend on the network and the user's Web browser. The only indication available from the end-user's actions is the time at which their next interaction occurs.

3.15 Unlogged Information

3.15.1 Unlogged Events

Logging of events which occur on the user's machine was not possible during the course of the SuperJournal project. Printing and downloading are controlled by the Web browsers and readers on the user's machine.

3.15.2 Unlogged Journal Information

The multimedia content of journals has not been logged. There is so little multimedia content in SuperJournal that logging of its submission, inclusion and use was not pursued.

3.15.3 Unlogged User Information

3.16 Other Issues

3.16.1 Manual Intervention

Some manual intervention is necessary in processing the application log files to generate the statistics:

3.16.2 SPSS String Lengths

The SPSS data format has an inflexible line length limit of 80 characters. This has necessitated fixing the string length of various information fields within the generated SPSS log files, with consequent truncation of information. The field where truncation is most likely to be a problem is the search query. The SPSS fields affected by this possible truncation are:

Other string fields where a truncation problem is not envisaged are:

4.  Usage Statistics Reports

5.  SPSS Specification


This web site is maintained by epub@manchester.ac.uk
Last modified: July 07, 1999