SuperJournal Log File Specification
Ann Apps, Manchester Computing, University of Manchester
SuperJournal Technical Report SJMC261
Contents:
1. Overall Approach
2. Definitions
3. Log Files Specification
4. Usage Statistics Reports
5. SPSS Specification
SuperJournal usage statistics are taken from logging within the SuperJournal application. These statistics are fed into the SuperJournal Evaluation Research undertaken by the HUSAT Research Institute at the University of Loughborough. They are also displayed in the "members only" section of the SuperJournal Web site. The statistics are processed on a monthly basis, covering February 1997, when the SuperJournal application was launched, through November 1998.
This document specifies the log files produced by the various parts of the SuperJournal application (in Section 3), the Web pages generated each month which display usage statistics in tabular form (in Section 4), and the SPSS data file of SuperJournal usage from which further statistics reports may be produced (in Section 5). This specification was originally defined from requirements contained in "The SuperJournal Evaluation Plan" prepared by HUSAT (August 1996), but has since been refined, both to incorporate new features within SuperJournal, and to include usage report requirements from other SuperJournal stakeholders.
The processing of the various SuperJournal log files to generate the usage statistics in both HTML table and SPSS format is described in [SJMC260] and in a "User Guide" form in [SJMC262].
Users find articles within SuperJournal either by searching with one of the Search Engines or by browsing through the hierarchical journal structure. All logged events have a search type set in the generated statistics, "browse" being regarded as a search type. Events which are not concerned with article discovery have a search type set to "Not Applicable". These are: Register; Login; View/Change Preferences; Access/Send Feedback; Email Alert. Note that access to "Help" is treated as a special case, with access to general "help" having a search type of "browse" and access to the "help" page for a search engine having that search type.
Article discovery by searching employs one of the SuperJournal search engines: Isite; NetAnswer; RetrievalWare; Author Index (via Isite); Keyword Index (via Isite); Personal Alert (via Isite). The search type is set appropriately in the generated statistics. It is possible for a user to access articles within SuperJournal without performing any "browse" action, by selecting "Search" immediately after Login.
Simplistically, "Browsing" is article discovery which does not utilise one of the SuperJournal search engines. A user may find an article of interest by clicking on hypertext links through the journal hierarchy of: cluster list; journal list; issue list; issue table of contents; abstract or full article. At all levels there are options to move up, down, or sideways to next or previous, or the user may employ the Web browser's "Back" button. Within the SuperJournal application there are other hypertext links whose activation is recorded as browsing, for example a link to the journal issue Table of Contents on the Isite search results page. Browses will also be recorded when a user returns from a non-search option such as "Preferences" or "Feedback" if this feature was accessed from a "browse" screen. In all these cases, events in the generated statistics have a search type set to "Browse".
There are also methods of article, and sub-article, discovery, again by following hyperlinks, which may be employed from the abstract or article display screens. These methods, and some orthogonal methods of article discovery, which could be regarded as "low level browsing", are described below.
2.1.2.1 Low Level Browsing
Low level browsing methods, and their appropriate search types in the generated statistics, are listed. Note that a few of these methods have a search type of "browse" but the majority do not. Logging of these interactions is defined in more detail later in this document. Within the reports produced by HUSAT during the evaluation research most of this low level browsing will probably be recorded as use of "special features".
Within the SuperJournal application, whether browsing or searching, the user is generally given the option of either "View Abstract" or "View Article". The second of these options provides the user with the full text of the article in either PDF or HTML, depending on the particular journal's data supply format. "View Abstract" provides the user with a page of article header information, including the title, authors, and abstract (if available). Pedantically, these links should have been named "View Article Header", but that would probably have been less easily understood by the end-user. All articles have this header information, and all articles within SuperJournal have an abstract (this was a requirement of the implementation of the SuperJournal application), but where an abstract was not supplied by a publisher, the wording of this abstract will be "Abstract unavailable". Potentially this lack of "real" abstracts for some articles may have affected end-user behaviour, and encouraged users always to go straight to the full article. Within this document wherever "View Abstract" is logged, the action logged is actually "View Article Header".
Below are the specifications of the log file entries for all recorded events. In each case the specification is followed by an example. Keywords within the log file are shown in Italics on the specification line. In general, user events within SuperJournal are recorded in a consistent format of the form:
| Date | Time | Machine name | IP address | Email Id | Interaction type | Further Information |
Separate parts of the SuperJournal application log to separate files. The information in these files is merged during the log file processing which generates the various reports. In some cases, as indicated, some of this information is unavailable at the time of logging. The search engines NetAnswer and RetrievalWare produce log files according to their own defined format. Missing information is deduced during the processing of the log files. In particular some log file entries do not contain the user's email identifier, because this information is not available to the application performing the logging. This includes NetAnswer log file entries, and the "minicontents" (see Section 3.8, etc.) and "multimedia" (see Section 3.7) log file entries. In these cases, the user who has performed the interaction is deduced from the logged time and the logged IP address by the main log file processing program, as described in [SJMC260].
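The deduction of a user from the logged time and IP address can be sketched as follows. The actual algorithm is the one described in [SJMC260] and is not reproduced here; this sketch simply assumes that the user is the one named in the most recent identified entry from the same IP address. The names `Entry` and `deduce_user`, and the 30 minute window, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Entry:
    when: datetime          # logged date and time
    ip: str                 # logged IP address
    email: Optional[str]    # None when the logging application did not know the user


def deduce_user(anonymous: Entry, identified: List[Entry],
                window: timedelta = timedelta(minutes=30)) -> Optional[str]:
    """Return the email of the latest identified entry from the same IP
    address at or before the anonymous entry, within `window` (an assumption)."""
    best: Optional[Entry] = None
    for entry in identified:
        if entry.email is None or entry.ip != anonymous.ip:
            continue
        if entry.when > anonymous.when or anonymous.when - entry.when > window:
            continue
        if best is None or entry.when > best.when:
            best = entry
    return best.email if best else None
```

Returning `None` rather than guessing leaves unmatched entries visible, so that they can be resolved by hand.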
The values of "Event Type" and "Search Type" SPSS variables in the SPSS data file generated during the log file processing are indicated along with each event log file specification.
The main SuperJournal application is controlled by the ODB-II database. It logs to dated log files, one per day, in a "logs" directory. The first five fields of each entry will always be the same (except in cases where the Email Id is not yet known), and so these fields are omitted from the individual specifications below. For example:
| Date | Time | Machine name | IP address | Email Id |
| 96.12.10 | 11.55.23 | aa.mcc.ac.uk | 130.88.201.22 | ann.apps@mcc.ac.uk |
The "Email Id" provides a unique identifier for each SuperJournal user.
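As an illustration, the five common fields might be read as below. This is a minimal sketch which assumes that an entry is delimited by "|" exactly as it is laid out in this document; the delimiter used in the log files themselves is not stated here. The names `split_fields`, `parse_prefix` and `Prefix` are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple


class Prefix(NamedTuple):
    when: datetime
    machine: str
    ip: str
    email: str


def split_fields(line: str) -> List[str]:
    """Split a pipe-delimited row (as laid out in this document) into fields."""
    return [field.strip() for field in line.strip().strip("|").split("|")]


def parse_prefix(line: str) -> Prefix:
    """Read Date, Time, Machine name, IP address and Email Id from an entry."""
    date, time, machine, ip, email = split_fields(line)[:5]
    # Dates are logged as YY.MM.DD and times as hh.mm.ss in the ODB-II example.
    when = datetime.strptime(f"{date} {time}", "%y.%m.%d %H.%M.%S")
    return Prefix(when, machine, ip, email)
```

A quoted field which itself contained a "|" would defeat this simple split; none of the examples in this specification does.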
3.1.1.1 Unsuccessful Registration - Invalid Library Name
| unknown | Registration | "Invalid Library Name" | "Library Name" | "Password" | "User's Name" |
| unknown | Registration | "Invalid Library Name" | "bham" | "lbypasswd" | "Joe Bloggs" |
3.1.1.2 Unsuccessful Registration - Invalid Library Password
| unknown | Registration | "Invalid Library Password" | "Library Name" | "Password" | "User's Name" |
| unknown | Registration | "Invalid Library Password" | "Birmingham" | "lbypasswd" | "Joe Bloggs" |
3.1.1.3 Unsuccessful Registration - Invalid Email
This occurs when a user provides an Email Identifier which cannot be a valid email address, i.e. it contains white space, or it doesn't contain an "@". The invalid email address will be in the "Email Id" field of the log entry.
| unknown | Registration | "Invalid Email" | "Library Name" | "Password" | "User's Name" |
| unknown | Registration | "Invalid Email" | "Manchester" | "lbypasswd" | "Joe Bloggs" |
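The "Invalid Email" test described above amounts to two checks, sketched here. Only the two conditions named in the text are applied; the function name `is_plausible_email` is hypothetical, and the SuperJournal application may have applied further checks which are not documented here.

```python
def is_plausible_email(email_id: str) -> bool:
    """Apply the two checks named in the text: no white space, and an "@" present."""
    has_whitespace = any(ch.isspace() for ch in email_id)
    return "@" in email_id and not has_whitespace
```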
3.1.1.4 Unsuccessful Registration - Invalid Domain
This occurs when a user's machine domain does not match the library.
| unknown | Registration | "Invalid Domain" | "Library Name" | "Password" | "Domain" | "User's Name" |
| unknown | Registration | "Invalid Domain" | "Manchester" | "lbypwd" | "ab.co.uk" | "Joe Bloggs" |
3.1.1.5 Successful registration
| Registration | Library | "Name" | "Status" | "Address" |
| Registration | Manchester | "Ann Apps" | "researcher" | "MC, 0161 275 6039" |
User academic status, which is selected by the user from a supplied list, may be:
| Academic | lecturer |
| Researcher | researcher |
| Postgraduate student | postg |
| Undergraduate student | underg |
| Librarian | librarian |
| Other | User supplied |
In some cases, where thought appropriate, an "other" status may be manually edited to one of the specific types before the log files are processed to generate the usage statistics. Some users determinedly type in their job title. Any "computer staff" are included with librarians. Visiting research staff become "researchers".
Note that the content of some of the registration fields (Name, Status, Address) may be an empty string because the user may have chosen not to fill in the details.
| Event Type | Search Type |
| Register | Not applicable |
3.1.2.1 Unsuccessful Login - Invalid User Name
| Login | "Invalid User Name" |
| Login | "Invalid User Name" |
3.1.2.2 Unsuccessful Login - Invalid Password
| Login | "Invalid Password" | Invalid Password |
| Login | "Invalid Password" | anne |
3.1.2.3 Successful Login
| Login | "Software Viewer" |
| Login | "Mozilla_2.01 (Win16; I)" |
| Event Type | Search Type |
| Login | Not applicable |
3.1.2.4 Successful Login Direct to Journal Screen Using ISSN
| LoginEx | "ISSN" | "Software Viewer" |
| LoginEx | "0957-9265" | "Mozilla_2.01 (Win16; I)" |
This is shown in the generated statistics as a Login, but with the journal and cluster name set. The following action in the log will be "View Journal" for the same journal. This feature was introduced into the SuperJournal application in July 1998, but it was unused before the end of the project, so there are no instances of its use in the log files.
| Event Type | Search Type | Cluster | Journal |
| Login | Not applicable | Cluster code | Journal code |
Where appropriate, logging of a user's browsing actions indicates whether an article viewed is from a current or a back issue. The SuperJournal application regards the "current" issue of a journal as being the loaded issue with the most recent cover date. Journal issue and article identifiers in the log files are:
3.1.3.1 SuperJournal Clusters Page
This event occurs when the user views the list of journal clusters within SuperJournal.
| View | "SuperJournal Clusters" |
| View | "SuperJournal Clusters" |
| Event Type | Search Type |
| ViewSJC | Browse |
3.1.3.2 Cluster Screen
This event occurs when the user views the list of journals within a cluster.
| View | "Cluster" |
| View | "Communication and Cultural Studies" |
| Event Type | Search Type |
| ViewCluster | Browse |
3.1.3.3 Journal Screen
This event occurs when the user views the list of issues available for a particular journal.
| View | "Journal" |
| View | "Cultural Critique" |
| Event Type | Search Type |
| ViewJournal | Browse |
3.1.3.4 Issue (Table of Contents) Screen
This event occurs when the user views the table of contents of a particular journal issue.
| View | "Issue Id" | Current | Back |
| View | "EJCV11I3" | Current |
| View | "EJCV11I1" | Back |
| Event Type | Search Type |
| ViewIssue | Browse |
3.1.3.5 View Abstract
| Abstract | "SJAID" | Current | Back |
| Abstract | "EJCV11I3A2" | Current |
| Abstract | "EJCV11I1A2" | Back |
| Event Type | Search Type |
| ViewAbstract | Browse |
3.1.3.6 View Full Text
The file format of the full article may be deduced from the file extension.
| Article | "File path name" | Current | Back |
| Article | "Sage/EJC/V11I3/art1.pdf" | Current |
| Article | "Sage/EJC/V11I1/art1.pdf" | Back |
| Event Type | Search Type |
| ViewArticle | Browse |
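Deducing the article format from the logged file path could be done as in this small sketch. The function name `article_format` is hypothetical, and the path is treated as a "/"-separated path as in the examples.

```python
import posixpath


def article_format(file_path: str) -> str:
    """Return the lower-cased file extension without the dot, e.g. "pdf"."""
    return posixpath.splitext(file_path)[1].lstrip(".").lower()
```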
The Isite search engine logs to dated log files, one per day of use, in an "Isite logs" directory. The format of the log files is the same as that for the main SuperJournal application, the Isite search engine being sufficiently integrated for user information to be available. It is not possible to deduce whether an article viewed after a search is from a "current" or a "back" issue.
| Search | Isite | Database | "Query" | No. hits | Retrieval Time (secs) |
| Search | Isite | NEWAPPDB | "abcd:1" | 0 | 1.0 |
| Search | Isite | NEWAPPDB | "television:1" | 11 | 4.0 |
Field names within the search query are included as e.g. "ABSTRACT/" or "TITLE/". If no field name is included the query was in "ANY/" field. The number of field names included in the search query indicates across how many fields the search was made.
Weightings are included within the search query following the colon, e.g. ":3"
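A logged Isite query could be broken into field names, terms and weightings along the following lines. This is a sketch only: it assumes that a query consists of white space separated terms, each of the form "FIELD/term:weight" with the field name and the weighting both optional. The exact query syntax for multi-term queries is not given in this specification, and the names `Term`, `parse_isite_query` and `fields_searched` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Set


class Term(NamedTuple):
    field: str              # "ANY" when no field name was logged
    text: str
    weight: Optional[int]   # None when no ":n" weighting was logged


# Optional "FIELD/" prefix, the term itself, optional ":weight" suffix.
_TERM = re.compile(r"^(?:(?P<field>[A-Za-z]+)/)?(?P<text>.+?)(?::(?P<weight>\d+))?$")


def parse_isite_query(query: str) -> List[Term]:
    terms = []
    for token in query.split():
        match = _TERM.match(token)
        if match is None:
            continue
        field = (match.group("field") or "ANY").upper()
        weight = int(match.group("weight")) if match.group("weight") else None
        terms.append(Term(field, match.group("text"), weight))
    return terms


def fields_searched(query: str) -> Set[str]:
    """The distinct fields named in the query; its size is the field count."""
    return {term.field for term in parse_isite_query(query)}
```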
The cluster searched, which is either "all" clusters or one specific cluster, is deduced by the log file processing program from the Isite Database name (note that the Isite database names were changed during the course of the project when the SuperJournal application was updated):
| Isite Database | Old Name | Journal Cluster |
| NEWAPPDB | GODB | All |
| CCSNEWAPPDB | CCSGODB | Communication and Cultural Studies |
| MGPNEWAPPDB | MGPGODB | Molecular Genetics and Proteins |
| PSNEWAPPDB | PSGODB | Political Science |
| MCNEWAPPDB | MCGODB | Materials Chemistry |
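The table above translates directly into a lookup which accepts either the new or the old database name. The names `ISITE_CLUSTERS` and `cluster_for_database` are hypothetical; the processing program described in [SJMC260] may be organised differently.

```python
# Both the new and the old Isite database names map to the same cluster.
ISITE_CLUSTERS = {
    "NEWAPPDB": "All",
    "GODB": "All",
    "CCSNEWAPPDB": "Communication and Cultural Studies",
    "CCSGODB": "Communication and Cultural Studies",
    "MGPNEWAPPDB": "Molecular Genetics and Proteins",
    "MGPGODB": "Molecular Genetics and Proteins",
    "PSNEWAPPDB": "Political Science",
    "PSGODB": "Political Science",
    "MCNEWAPPDB": "Materials Chemistry",
    "MCGODB": "Materials Chemistry",
}


def cluster_for_database(name: str) -> str:
    """Return the journal cluster searched; raises KeyError for an unknown name."""
    return ISITE_CLUSTERS[name.strip().upper()]
```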
| Event Type | Search Type |
| Query | Isite |
| Isite | Database | ABSTRACT | SJAID |
| Isite | NEWAPPDB | ABSTRACT | EJCV11I3A2 |
| Event Type | Search Type |
| ViewAbstract | Isite |
| Isite | Article | File path name |
| Isite | Article | Sage/EJC/V11I3/art1.pdf |
| Event Type | Search Type |
| ViewArticle | Isite |
This event occurs when the user selects the "View Table of Contents" option on the Isite search results screen, to view the contents of the journal issue containing the retrieved article. The logging is identical to the logging when the table of contents is viewed by browsing.
| View | "Issue Id" | Current | Back |
| View | "EJCV11I3" | Current |
| View | "EJCV11I1" | Back |
| Event Type | Search Type |
| ViewIssue | Browse |
Searching via "index lists" is implemented by an initial listing of the authors or keywords by an ODB-II search, followed by a specific search via Isite. Thus the logging is distributed across the main application (ODB-II) log file and the Isite log file.
This entry appears in the ODB-II log file.
| Index | Authors | Retrieval Time (secs) |
| Index | Authors | 5 |
| Event Type | Search Type |
| IndexAuthors | AuthorIndex |
This entry appears in the Isite log file.
| Search | Author | Database | "Author Query" | No. hits | Retrieval Time (secs) |
| Search | Author | GODB | "Joe Bloggs" | 4 | 2.0 |
| Event Type | Search Type |
| Query | AuthorIndex |
This entry appears in the Isite log file.
| Author | Database | ABSTRACT | SJAID |
| Author | GODB | ABSTRACT | EJCV11I3A2 |
| Event Type | Search Type |
| ViewAbstract | AuthorIndex |
This entry appears in the Isite log file.
| Author | Article | File path name |
| Author | Article | Sage/EJC/V11I3/art1.pdf |
| Event Type | Search Type |
| ViewArticle | AuthorIndex |
This event occurs when the user selects the "View Table of Contents" option on the Isite search results screen, to view the contents of the journal issue containing the retrieved article. The logging is identical to the logging when the table of contents is viewed by browsing.
| View | "Issue Id" | Current | Back |
| View | "EJCV11I3" | Current |
| View | "EJCV11I1" | Back |
| Event Type | Search Type |
| ViewIssue | Browse |
Logging is similar to that for Author Index Lists (above) with logging distributed across the ODB-II and the Isite log files.
This entry appears in the ODB-II log file.
| Index | Keywords | Retrieval Time (secs) |
| Index | Keywords | 5 |
| Event Type | Search Type |
| IndexKeywds | KeywordsIndex |
This entry appears in the Isite log file.
| Search | Keywords | Database | "Keyword Query" | No. hits | Retrieval Time (secs) |
| Search | Keywords | GODB | "television" | 11 | 5.0 |
| Event Type | Search Type |
| Query | KeywordsIndex |
This entry appears in the Isite log file.
| Keyword | Database | ABSTRACT | SJAID |
| Keyword | GODB | ABSTRACT | EJCV11I3A2 |
| Event Type | Search Type |
| ViewAbstract | KeywordsIndex |
This entry appears in the Isite log file.
| Keyword | Article | File path name |
| Keyword | Article | Sage/EJC/V11I3/art1.pdf |
| Event Type | Search Type |
| ViewArticle | KeywordsIndex |
This event occurs when the user selects the "View Table of Contents" option on the Isite search results screen, to view the contents of the journal issue containing the retrieved article. The logging is identical to the logging when the table of contents is viewed by browsing.
| View | "Issue Id" | Current | Back |
| View | "EJCV11I3" | Current |
| View | "EJCV11I1" | Back |
| Event Type | Search Type |
| ViewIssue | Browse |
NetAnswer writes to its own log file using its own format in which fields are separated by commas. A single, continuously updated, log file (log.cfg) is generated in the relevant "logs" directory for each NetAnswer database, i.e. for each journal cluster. This file is recorded, and restarted, on a monthly basis. Each entry consists of the following fields:
| Log Entry Field | Example |
| "HTTP_daemon" | "" |
| "Remote Machine Name" | "aa.mcc.ac.uk" |
| "Remote IP Address" | "130.88.201.22" |
| "Server Address" | "midas.ac.uk" |
| "Port Number" | "80" |
| "Request Start Time (YYYYMMDDhhmmss)" | "19961204100652" |
| "Request End Time" | "19961204100653" |
| "BRS User Id" | "anonymous" |
| "BRS database" | "SJG0" |
| Function (see below) | 2 |
| Error Code | 0 |
| No. bytes | 7169 |
| BRS Document Accession Number | 264 |
| Low End TOC | 1 |
| High End TOC | 15 |
| No. Documents Retrieved | 15 |
| "Query" | "s1=&s2=television&...." |
Where the values of "Function" may be:
| 1 | BRS Full Document Display (i.e. Header and Abstract) |
| 2 | Table of Contents (i.e. Search Result) |
| 4 | Help Page Request |
| 5 | Help Page Request |
| 8 | Download in tagged bibliographic format |
| 9 | Full Article Display from header |
When "Function" is 9, "Query" gives the full path name of the article file, and the in-between fields are set to zero.
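A NetAnswer entry could be read with a standard comma-separated parser, as in the sketch below. It assumes that string fields are enclosed in double quotes as shown in the "Example" column above and that all seventeen fields are always present. The short field names, `parse_netanswer_line` and `request_seconds` are hypothetical.

```python
import csv
from datetime import datetime
from typing import Dict

# Short names (hypothetical) for the seventeen fields, in logged order.
FIELDS = [
    "http_daemon", "remote_machine", "remote_ip", "server_address", "port",
    "start_time", "end_time", "brs_user", "brs_database", "function",
    "error_code", "bytes", "accession_number", "low_toc", "high_toc",
    "documents_retrieved", "query",
]


def parse_netanswer_line(line: str) -> Dict[str, str]:
    """Split one comma-separated entry into a field-name to value mapping."""
    row = next(csv.reader([line], skipinitialspace=True))
    if len(row) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, found {len(row)}")
    return dict(zip(FIELDS, row))


def request_seconds(entry: Dict[str, str]) -> float:
    """Elapsed time between the request start and end times."""
    fmt = "%Y%m%d%H%M%S"
    start = datetime.strptime(entry["start_time"], fmt)
    end = datetime.strptime(entry["end_time"], fmt)
    return (end - start).total_seconds()
```

Rejecting an entry with the wrong number of fields, rather than padding it, keeps malformed lines from being silently misread during later processing.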
Note that functions 8 and 9 redefine the NetAnswer functions, but it seemed unlikely that any conflict would occur. NetAnswer definitions of these functions are:
The following examples of NetAnswer log entries show only the last 9 fields of each. The log file entries created by NetAnswer are manually edited during log file pre-processing to remove extraneous material, make them consistent with other log files, and make the search fields more readable (see [SJMC262]). In each example below, the entry as logged by NetAnswer is shown first, followed by the same entry after pre-processing.
3.5.1.1 Search
The first example shows no hits, the second shows 15.
| "SJG0" | 2 | 0 | 2038 | 0 | 0 | 0 | 0 | "s1=abcd&s2=&..." |
| "SJG0" | 2 | 0 | 7169 | 0 | 1 | 15 | 15 | "s1=&s2=television&..." |
The same examples after pre-processing:
| "SJG0" | 2 | 0 | 2038 | 0 | 0 | 0 | 0 | "ANY=abcd" |
| "SJG0" | 2 | 0 | 7169 | 0 | 1 | 15 | 15 | "TITLE=television" |
The final field, which is the search query, may contain any of the following field names, each at most once, in this order: ANY, TITLE, KEYWORD, ABSTRACT, AUTHOR, ADDRESS, JOURNAL, PUBLISHER. The number of these query field names included in the search query indicates across how many fields the search was made.
The specific cluster searched is deduced by the log file processing program from the BRS Database name:
| BRS Database | Journal Cluster |
| SJG0 | Communication and Cultural Studies |
| SJG1 | Molecular Genetics and Proteins |
| SJG2 | Political Science |
| SJG3 | Materials Chemistry |
| Event Type | Search Type |
| Query | NetAnswer |
3.5.1.2 View Abstract after NetAnswer Search
On "View Abstract", NetAnswer records the BRS document number for the abstract. This example shows a "View Abstract" entry for BRS document number 264 in BRS database SJG0.
| "SJG0" | 1 | 0 | 7169 | 264 | 1 | 15 | 15 | "s1=&s2=television&..." |
During pre-processing, the SuperJournal identifier of the article whose abstract has been viewed is ascertained, and edited into the log file. This is a manual operation, described in [SJMC262]. Current/back issue information is not recorded. The same example after pre-processing is:
| "SJG0" | 1 | 0 | 7169 | "EJCV11I4A4" | 1 | 15 | 15 | "TITLE=television" |
| Event Type | Search Type |
| ViewAbstract | NetAnswer |
3.5.1.3 View Full Article after NetAnswer Search
"View Full Article" following a NetAnswer search is logged via a SuperJournal cgi-script, the call to which is included in the URL for the full article shown on the NetAnswer "View Abstract" screen. It records the file path for the article viewed before displaying the article to the end-user. Current/back issue information is not recorded.
| "SJG0" | 9 | 0 | 0 | 0 | 0 | 0 | 1 | "//sj/BRSpdf/Sage/EJC/V11I4/art4stat.pdf" |
| Event Type | Search Type |
| ViewArticle | NetAnswer |
3.5.1.4 View NetAnswer Help
This is the logging generated when a user accesses "Help" from a NetAnswer screen. Logging of other "Help" pages is described in Section 3.10.5. (Note that occasionally NetAnswer logs "View Help" with function number 5 rather than 4.)
| "SJG0" | 4 | 0 | 7169 | 0 | 0 | 0 | 0 | "TITLE=television" |
| Event Type | Search Type |
| ViewHelp | NetAnswer |
3.5.1.5 Download Search Results in Tagged Bibliographic Format
SuperJournal allows a user to download the NetAnswer search results, or a selection of them, in bibliographic format. This is logged by NetAnswer as another "search".
| "SJG0" | 2 | 0 | 7169 | 0 | 1 | 15 | 15 | "s1=&s2=television&..." |
During pre-processing (see [SJMC262]), these entries are identified, by eye, and the function number changed to 8 so that later processing will record these entries correctly. The same example after pre-processing is:
| "SJG0" | 8 | 0 | 7169 | 0 | 1 | 15 | 15 | "TITLE=television" |
This user interaction will be noted in the generated statistics as a "Query" with a search type of "Tagged Download".
| Event Type | Search Type |
| Query | TaggedDownload |
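Taken together, the NetAnswer function numbers and BRS database names above determine the event type, search type and cluster of each pre-processed entry. A minimal sketch of that classification follows, assuming the last nine fields of an entry are supplied as a list of strings in logged order. The names `NETANSWER_EVENTS`, `BRS_CLUSTERS` and `classify_netanswer` are hypothetical.

```python
from typing import List, Tuple

# Function number -> (Event Type, Search Type), after pre-processing.
NETANSWER_EVENTS = {
    1: ("ViewAbstract", "NetAnswer"),
    2: ("Query", "NetAnswer"),
    4: ("ViewHelp", "NetAnswer"),
    5: ("ViewHelp", "NetAnswer"),      # occasionally logged instead of 4
    8: ("Query", "TaggedDownload"),    # set by hand during pre-processing
    9: ("ViewArticle", "NetAnswer"),
}

BRS_CLUSTERS = {
    "SJG0": "Communication and Cultural Studies",
    "SJG1": "Molecular Genetics and Proteins",
    "SJG2": "Political Science",
    "SJG3": "Materials Chemistry",
}


def classify_netanswer(last_nine: List[str]) -> Tuple[str, str, str]:
    """Return (event type, search type, cluster) for a pre-processed entry."""
    database = last_nine[0].strip('"')
    function = int(last_nine[1])
    event_type, search_type = NETANSWER_EVENTS[function]
    return event_type, search_type, BRS_CLUSTERS[database]
```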
The Excalibur RetrievalWare search engine logs to dated log files, one per day of use, in a "RetrievalWare logs" directory, using its own format. Note that the date within the log file name is always one day after the contained logging information. The RetrievalWare log files are converted by a program, described in [SJMC260] and [SJMC262], into a format consistent with the main SuperJournal log files, in a single file for each month. It is not possible to deduce whether an article viewed after a search is from a "current" or a "back" issue. RetrievalWare does not record the Machine and IP address of the user. This information will be filled in during the main log file processing, but in these log files they are shown as "aa.aa.aa" and "0.0.0" respectively.
Note that logging of RetrievalWare use was included from May 1998 onwards. The versions of RetrievalWare installed previously did not log events satisfactorily, and it was not possible to relate users to log file entries.
The SuperJournal format for a RetrievalWare logged search is:
| Date | Time | Machine Name | IP Address | Email Id |
| 98.07.27 | 11:01:49 | aa.aa.aa | 0.0.0 | ANN.APPS@MCC.AC.UK |
followed by:
| Search | QryType | Cluster | "Query" | No. hits | Retrieval Time (secs) |
| Search | SQRY | CS | "abcd" | 0 | 0.0 |
| Search | SQRY | CS | "television" | 11 | 3.138 |
The "QryType" field, which is used to set the SPSS variable "RWare Type", may contain:
| SQRY | Smart Query |
| RSQRY | Recurrent Smart Query |
| GEQRY | Get Expert Query |
| EQRY | Expert Query |
| BQRY | Boolean Query |
| RBQRY | Recurrent Boolean Query |
| QEQRY | Query By Example |
To allow for cross-cluster searching, the "Cluster" field may contain any of: CS, GP, PS, MC.
| Event Type | Search Type |
| Query | RetrievalWare |
3.6.1.1 RetrievalWare Format
The original RetrievalWare logging of the second of these two queries would be:
| R: | 11 |
| U: | "ANN.APPS@MCC.AC.UK" |
| T: | 07/27/98 11:01:49.839 |
| A: | SQRY Started |
| Q: | television |
| I: | 129 |
| R: | 12 |
| U: | "ANN.APPS@MCC.AC.UK" |
| T: | 07/27/98 11:01:49.839 -- 11:01:56.769 (6.930) |
| A: | SQRY |
| Q: | television |
| L: | "ccs_abstracts_lib" (DOCS=493, QW=1, T=3.138) |
| I: | 129 |
| C: | 493/500 |
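The conversion of such a record into the SuperJournal format might begin as sketched here. The conversion program itself is described in [SJMC260] and [SJMC262]; this sketch assumes that each field of the original record appears on its own line as a single letter, a colon and a value, and it takes the retrieval time from the "T=" figure in the "L" field, which matches the example. Which field supplies "No. hits" is not stated in this specification, so that value is not derived. The names `parse_rware_record` and `to_superjournal` are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, Optional, Union


def parse_rware_record(text: str) -> Dict[str, str]:
    """Collect the single-letter fields (R, U, T, A, Q, L, I, C) of one record."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and len(key.strip()) == 1:
            record[key.strip()] = value.strip()
    return record


def to_superjournal(record: Dict[str, str]) -> Dict[str, Union[str, float, None]]:
    """Extract the values needed for a SuperJournal format "Search" entry."""
    # The "T" field holds the start time, optionally followed by " -- end (elapsed)".
    started = datetime.strptime(record["T"].split(" -- ")[0], "%m/%d/%y %H:%M:%S.%f")
    timing = re.search(r"\bT=(\d+(?:\.\d+)?)", record.get("L", ""))
    retrieval: Optional[float] = float(timing.group(1)) if timing else None
    return {
        "date": started.strftime("%y.%m.%d"),
        "time": started.strftime("%H:%M:%S"),
        "email": record["U"].strip('"'),
        "qry_type": record["A"].split()[0],
        "query": record.get("Q", ""),
        "retrieval_secs": retrieval,
    }
```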
| GHIT | Cluster | Abstract | "SJAID" |
| GHIT | CS | Abstract | "EJCV11I3A2" |
| Event Type | Search Type |
| ViewAbstract | RetrievalWare |
3.6.2.1 RetrievalWare Format
The original RetrievalWare logging of this "View Abstract" would be:
| R: | 13 |
| U: | "ANN.APPS@MCC.AC.UK" |
| T: | 07/27/98 11:02:11.948 -- 11:02:16.726 (4.778) |
| A: | GHIT |
| L: | "ccs_abstracts_lib" |
| I: | 129 |
| D: | 716 (0+16384) |
There are several similar entries in the RetrievalWare log file for this one "GHIT" (Get Hits), the differences being in the times and the figures in parentheses in the "D" field. These entries are condensed into one entry in the SuperJournal format RetrievalWare log file.
The RetrievalWare document number (716, in the "D" field, in the example) is converted into the corresponding SuperJournal article identifier (SJAID) by a manual look-up described in [SJMC262].
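The condensing of repeated "GHIT" entries could be sketched as below. It assumes that each record has already been read into a mapping of its single-letter fields, and that consecutive "GHIT" records with the same user, library and document number belong to one "View Abstract". That grouping rule is an assumption made for illustration, and the name `condense_ghits` is hypothetical.

```python
from typing import Dict, Iterable, List


def condense_ghits(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the first of each run of equivalent consecutive GHIT records."""
    condensed: List[Dict[str, str]] = []
    previous_key = None
    for record in records:
        if record.get("A") == "GHIT":
            # The document number is the first token of the "D" field;
            # the figures in parentheses differ between repeats and are ignored.
            key = (record["U"], record["L"], record["D"].split()[0])
            if key == previous_key:
                continue
            previous_key = key
        else:
            previous_key = None
        condensed.append(record)
    return condensed
```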
| GHIT | Cluster | Article | "File path name" |
| GHIT | CS | Article | "/superj1/Journals/Sage/EJC/V11I4/art4stat.pdf" |
The RetrievalWare format for this log entry is similar to that for the "View Abstract" entry above, except that the RetrievalWare "library" (the "L" field) searched will be "ccs_pdf_lib".
| Event Type | Search Type |
| ViewArticle | RetrievalWare |
The logging of multimedia accesses from full article PDF files is in a separate multimedia log file, logged by a SuperJournal cgi-script when the multimedia item is accessed. This information is merged with that in the other log files during processing. The specification of the log file entry is similar to the ODB-II log file entries except that the user's email address is omitted because this information is not known.
| Multimedia | "File path name" |
| Multimedia | "//sj/BRSpdf/Sage/EJC/V11I3/art1.xxx" |
In fact, although multimedia accesses were logged, and could potentially be processed by the main log file processing program, multimedia access statistics are not produced. This was decided because there are so few multimedia items within SuperJournal that their usage was insignificant, and identifying access to them would identify a particular journal.
| Event Type | Search Type |
| ViewMultimedia | Browse |
The logging of accesses from HTML articles and "Mini-Contents" is in a separate "minicontents" log file, logged by SuperJournal cgi-scripts when the article, etc. is accessed. This information is merged with that in the other log files during processing. The specification of the log file entry is similar to the ODB-II log file entries except that the user's email address is omitted because this information is not known.
| HTML MiniContents | "File path name" |
| HTML MiniContents | "/sj/BRSpdf/Springer/MG/V7I1/art1.minc.html" |
| Event Type | Search Type |
| ViewMiniContents | FromHTML |
| HTML Article | "File path name" |
| HTML Article | "/sj/BRSpdf/Springer/MG/V7I1/art1.pdf" |
| Event Type | Search Type |
| ViewArticle | FromHTML |
| MiniContents Article | "File path name" |
| MiniContents Article | "/sj/BRSpdf/Springer/MG/V7I1/art1.[html|pdf]" |
| Event Type | Search Type |
| ViewArticle | FromMiniContents |
| MiniContents MiniContents | "File path name" |
| MiniContents MiniContents | "/sj/BRSpdf/Springer/MG/V7I1/art1.minc.html" |
| Event Type | Search Type |
| ViewMiniContents | FromMiniContents |
| FullFigure | "File path name" |
| FullFigure | "/sj/BRSpdf/Springer/MG/V7I1/art1f1.gif" |
| Event Type | Search Type |
| ViewFullFig | FromThumbNail |
It is possible to deduce from the previous log file entries whether this was selected from: within a "References" window; a PDF article; an HTML article; or an abstract (for the article itself). This deduction is made by the program during the main processing of the SuperJournal log files, and the search type is set accordingly to: "From References"; "From PDF"; "From HTML"; "From Abstract".
| Medline |
| Medline |
| Event Type | Search Type |
| ViewMedline | see above |
Logging is in the "minicontents" log file. The user's email address is omitted.
| SJBib | "File path name" |
| SJBib | "/superj1/Journals/Springer/MG/V7I1/art1.fsj" |
| Event Type | Search Type |
| ViewReferences | Browse |
This logging occurs when a user follows a link to an article (in reality to the header) in SuperJournal from a "References" list. Within the generated statistics it will be noted as "View Abstract" with a search type of "from References".
| SJRef | "File path name" |
| SJRef | "/superj1/Journals/Springer/MG/V7I1/art1.sj" |
| Event Type | Search Type |
| ViewAbstract | FromReferences |
This event occurs when a user views a list of articles which cite a particular article by clicking on that article's "View Cited By" link.
| SJCBList | "File path name" |
| SJCBList | "/superj1/Journals/Springer/MG/V7I1/art1.cit" |
"Cited By" links, i.e. forward reference chaining, were introduced into the SuperJournal application in July 1998, but this facility was unused by any University library users before the end of the project so there are no instances in the log files.
| Event Type | Search Type |
| ViewCitedBy | Browse |
This logging occurs when a user follows a link to an abstract from a "Cited By" list. Within the generated statistics it will be noted as "View Abstract" with a search type of "from CitedBy".
| SJCitBy | "File path name" |
| SJCitBy | "/superj1/Journals/Springer/MG/V7I1/art1.sj" |
| Event Type | Search Type |
| ViewAbstract | FromCitedBy |
This logging occurs when a user views a full article from an abstract accessed via either a "References" or a "Cited By" link. Within the generated statistics it will be noted as "View Article" with a search type of "from References" or "from CitedBy".
| Article | "File path name" |
| Article | "/superj1/Journals/Springer/MG/V7I1/art1.[pdf|html]" |
| Event Type | Search Type |
| ViewArticle | FromReferences / FromCitedBy |
Logging is in the "minicontents" log file. The user's email address is omitted.
3.10.1.1 Download Abstract in Tagged Bibliographic Format
| RefTag | "File path name" |
| RefTag | "/superj1/Journals/Springer/MG/V7I1/art1.sj" |
| Event Type | Search Type |
| ViewAbstract | TaggedDownload |
3.10.1.2 Email Abstract in Tagged Bibliographic Format
| EmailTag | "File path name" |
| EmailTag | "/superj1/Journals/Springer/MG/V7I1/art1.sj" |
| Event Type | Search Type |
| ViewAbstract | TaggedEmail |
3.10.1.3 Download Article's References in Bibliographic Format
| BibTag | "File path name" |
| BibTag | "/superj1/Journals/Springer/MG/V7I1/art1.fsj" |
| Event Type | Search Type |
| ViewReferences | TaggedDownload |
Reading List use is logged in the main ODB-II application log file. Abstracts may be added to, deleted from, or viewed from the reading list. They are referenced within the log file as a list of SJAIDs. During log file processing this list will be split into its constituent items, so that in the generated statistics there will be a separate entry for each "reading list" article.
3.10.2.1 Add to Reading List
| HotList | "SJAID List" | Add |
| HotList | "MGV7I1A1 MGV7I1A6" | Add |
| Event Type | Search Type |
| ViewAbstract | AddReadList |
3.10.2.2 Delete from Reading List
| HotList | "SJAID List" | Delete |
| HotList | "MGV7I1A1 MGV7I1A6" | Delete |
| Event Type | Search Type |
| ViewAbstract | RemoveReadList |
3.10.2.3 View Abstract from Reading List
| HotList | "SJAID List" | View |
| HotList | "MGV7I1A1 MGV7I1A6" | View |
| Event Type | Search Type |
| ViewAbstract | FromReadList |
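The splitting of a reading list entry into one statistics entry per article can be sketched as follows. This is an illustration only, not the project's processing program: the function name `split_hotlist` is hypothetical, and it assumes the quoted "SJAID List" field has already been extracted from the log line and that the SJAIDs are separated by spaces, as in the examples above.

```python
# Search type recorded for each reading list action (taken from the tables above).
READ_LIST_SEARCH_TYPES = {
    "Add": "AddReadList",
    "Delete": "RemoveReadList",
    "View": "FromReadList",
}


def split_hotlist(sjaid_list: str, action: str) -> list[tuple[str, str, str]]:
    """Expand one HotList entry into (event type, search type, SJAID) rows."""
    search_type = READ_LIST_SEARCH_TYPES[action]
    # One row per article; every reading list action is reported as ViewAbstract.
    return [("ViewAbstract", search_type, sjaid) for sjaid in sjaid_list.split()]


rows = split_hotlist("MGV7I1A1 MGV7I1A6", "Add")
for row in rows:
    print(row)
```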
Logging is in the main ODB-II application log file.
3.10.3.1 Display Preferences Screen
| View | "Preferance Setting" (sic) |
| View | "Preferance Setting" |
| Event Type | Search Type |
| ViewPref | Not applicable |
3.10.3.2 Change Preferences
| ChangePreference | No. fields changed | "Changed" |
| ChangePreference | 3 | "Home; SrchEng; " |
The "Changed" field indicates which Preferences the user has changed. The quoted string may contain, in a semicolon-separated list, any of: SrchEng; PrefCluster; StartScreen; Home; TimeOut; Email; Password; PAlert[CCS][PS][MGP][MC]. The cluster abbreviation(s) following "PAlert" indicate for which cluster(s) the user has set an alert. Although a change to "Preferred Cluster" is logged, the actual chosen cluster is not logged.
Note that logging of the actual preferences changed was included from July 1998 onwards. Previous logging indicated the number of preferences changed only.
| Event Type | Search Type |
| ChangePref | Not applicable |
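As an illustration of how the "Changed" field might be decoded, the sketch below splits the semicolon-separated list and extracts the cluster abbreviations that follow "PAlert". The function name `parse_changed` is hypothetical. The exact way the abbreviations are written after "PAlert" is an assumption; the sketch accepts them with or without surrounding square brackets.

```python
import re

# Cluster abbreviations used by SuperJournal.
CLUSTER_PATTERN = re.compile(r"CCS|MGP|PS|MC")


def parse_changed(changed: str) -> tuple[list[str], list[str]]:
    """Return (preference names changed, clusters for which PAlert was set)."""
    names: list[str] = []
    alert_clusters: list[str] = []
    for item in changed.split(";"):
        item = item.strip()
        if not item:
            continue  # the logged string may end with a trailing "; "
        if item.startswith("PAlert"):
            names.append("PAlert")
            alert_clusters.extend(CLUSTER_PATTERN.findall(item[len("PAlert"):]))
        else:
            names.append(item)
    return names, alert_clusters


print(parse_changed("Home; SrchEng; "))
print(parse_changed("Email; PAlertCCSPS"))
```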
3.10.3.3 Change Email
The new Email address will appear in the "Email Id" field. The number of preferences changed includes the Email change. The contents of the "Changed" field are as specified above. The log file processing program will take particular note of this log file entry in order to keep track of the user in later sessions.
| ChangePreference | Old Email | No. fields changed | "Changed" |
| ChangePreference | aa@mcc.ac.uk | 1 | "Email" |
| Event Type | Search Type |
| ChangeEmail | Not applicable |
Logging is in the main ODB-II application log file.
3.10.4.1 Access to Feedback Form
| FeedbackForm |
| FeedbackForm |
| Event Type | Search Type |
| AccessFeedBack | Not applicable |
3.10.4.2 Feedback Sent
| Feedback | Sent |
| Feedback | Sent |
| Event Type | Search Type |
| SendFeedBack | Not applicable |
Access to the "Help" pages is logged in the main ODB-II application log file. Log entries for access to the main top-level "Help" page include the user's email identifier, but log entries for accesses to lower level "Help" pages do not.
| HelpType |
| Help |
HelpType may be one of: Help; HelpIsite; HelpNA; HelpPrefs.
Note that the logging of access to "Help" was included from July 1998 onwards.
| Event Type | Search Type |
| ViewHelp | Browse |
| ViewHelp | Isite |
| ViewHelp | NetAnswer |
| ViewHelpPreferences | Browse |
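The correspondence between the logged HelpType and the reported event can be expressed as a simple look-up. The pairing below is inferred from the names and the order of the two lists above, so it should be read as an assumption rather than as the definitive mapping; `classify_help` is a hypothetical name.

```python
# Logged HelpType -> (event type, search type); pairing inferred from the tables above.
HELP_EVENTS = {
    "Help": ("ViewHelp", "Browse"),
    "HelpIsite": ("ViewHelp", "Isite"),
    "HelpNA": ("ViewHelp", "NetAnswer"),
    "HelpPrefs": ("ViewHelpPreferences", "Browse"),
}


def classify_help(help_type: str) -> tuple[str, str]:
    """Return the (event type, search type) reported for a logged HelpType."""
    try:
        return HELP_EVENTS[help_type]
    except KeyError:
        raise ValueError(f"unknown HelpType: {help_type!r}") from None


print(classify_help("HelpNA"))
```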
Alerts are sent out by the SuperJournal application to users who have requested them when new data is loaded into the application. The logged machine name and IP address are "cs6400.mcc.ac.uk" and "130.88.203.18". The times are typically in the early hours of the morning which is generally when journal data is loaded. These logged events should be ignored when any statistics of user actions are produced, when calculating session lengths, and when considering user access location.
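A minimal sketch of how alert entries might be excluded before user statistics are calculated is given below. The dictionary keys (`event_type`, `machine`, `ip`) and the function names are hypothetical; only the machine name, the IP address and the two alert event types come from this specification.

```python
# Origin of all alert entries, as stated above.
ALERT_MACHINE = "cs6400.mcc.ac.uk"
ALERT_IP = "130.88.203.18"
ALERT_EVENTS = {"EmailAlert", "PersonalAlert"}


def is_alert(event: dict) -> bool:
    """True if the entry is an application-generated alert, not a user action."""
    return (
        event["event_type"] in ALERT_EVENTS
        or event["machine"] == ALERT_MACHINE
        or event["ip"] == ALERT_IP
    )


def user_actions(events: list[dict]) -> list[dict]:
    """Keep only the entries that should count towards user statistics."""
    return [event for event in events if not is_alert(event)]


events = [
    {"event_type": "ViewAbstract", "machine": "pc56.cam.ac.uk", "ip": "192.0.2.10"},
    {"event_type": "EmailAlert", "machine": ALERT_MACHINE, "ip": ALERT_IP},
]
print(user_actions(events))
```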
3.10.6.1 Email Alert
An "Email Alert" is sent out to a user when a new journal issue is loaded if the user has requested an alert for that journal via the "Preferences" within SuperJournal.
| EAlert | "Issue Id" |
| EAlert | "ONCV17I18" |
Note that the logging of Email Alert was included from July 1998 onwards.
| Event Type | Search Type |
| EmailAlert | Not applicable |
3.10.6.2 Personal Alert
A "Personal Alert" is sent out to a user if a newly loaded article contains their "personal alert" search terms. A user may set "personal alert" search terms for each journal cluster via the "Preferences" within SuperJournal. The log entry contains a list of SJAIDs of the new articles which contain the search terms, within a particular cluster. There may potentially be a logged Personal Alert for each journal cluster, each one listing articles from possibly several journals. During log file processing this list will be split into its constituent items, so that in the generated statistics there will be a separate entry for each "alerted" article.
| PAlert | "Search Terms" | "SJAID List" |
| PAlert | "gene*" | "ONCV17I18A2 ONCV18I17A4 ..." |
| Event Type | Search Type |
| PersonalAlert | PersonalAlert |
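The expansion of a Personal Alert entry into one entry per alerted article might look like the sketch below. It assumes the quoted fields have already been extracted from the log line and that SJAIDs are space-separated; the function name and the dictionary keys are hypothetical.

```python
def split_personal_alert(search_terms: str, sjaid_list: str) -> list[dict]:
    """Expand one PAlert entry into one statistics entry per alerted article."""
    return [
        {
            "event_type": "PersonalAlert",
            "search_type": "PersonalAlert",
            "search_terms": search_terms,
            "sjaid": sjaid,
        }
        for sjaid in sjaid_list.split()
    ]


entries = split_personal_alert("gene*", "ONCV17I18A2 ONCV18I17A4")
for entry in entries:
    print(entry["sjaid"], entry["search_terms"])
```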
In order to keep track of users in the log file processing from month to month and to provide some user profile information on a monthly basis, some of the information about each user is preserved in a User Register file. Data in the previous month's version of this file is input to the log file processing program along with the current month's log file entries. At the end of log file processing a new updated version of the User Register for the current month is output.
The User Register file contains a single line entry for each user, which consists of the following fields (the second column showing an example):
| SJUser | SJUser |
| Registration Date | 1997.01.18 |
| Registration Time | 12:31:42 |
| Registration Machine | pc56.cam.ac.uk |
| Registration IP | 123.45.678.99 |
| Email Identifier | f.bloggs@cam.ac.uk |
| Name | "Fred Bloggs" |
| Library | "cambridge" |
| Academic status | "researcher" |
| Address | "Dept of Biology, Cambridge" |
| Library code letter | C |
| User number within library | 57 |
| Academic status code number | 2 |
| Number of sessions in previous month | 9 |
| Mean session length last month | 3.78444 |
| Last month standard deviation | 3.22095 |
In the generated SuperJournal statistics, users are identified, for anonymity, by a user code, composed from the library code and the user number within the library. For example, the user registered in the above example will be known as "C57" in the generated statistics. Real user names and identifiers are included in "private" statistics only. These private pages include information for HUSAT's use, and a user registration information page for each library.
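The composition of the anonymous user code can be illustrated with a small record type. This sketch holds only three of the User Register fields; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    """A subset of the User Register fields listed above."""

    email: str
    library_code: str  # single letter, e.g. "C" for Cambridge
    user_number: int   # user number within the library

    @property
    def user_code(self) -> str:
        """Anonymous identifier used in the generated statistics."""
        return f"{self.library_code}{self.user_number}"


entry = RegisterEntry(email="f.bloggs@cam.ac.uk", library_code="C", user_number=57)
print(entry.user_code)
```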
Note that the ODB-II database which controls the SuperJournal application is not involved in the SuperJournal statistics processing beyond the generation of log file entries for user interactions. It does not have knowledge of these user identification numbers, which are generated by the log file processing program and become persistent by their inclusion in the log file processing User Register. Within the SuperJournal application users are identified by their registered email address. Also the ODB-II SuperJournal database does not record registration date/time.
Some log file entries do not contain the user's email identifier, because this information is not available to the application performing the logging. This includes NetAnswer log file entries, and the "minicontents" (see Section 3.8, etc.) and "multimedia" (see Section 3.7) log file entries. In these cases, the user who has performed the interaction is deduced from the logged time and the logged IP address, as described in [SJMC260], by the main log file processing program.
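The actual deduction rule is the one given in [SJMC260] and is not reproduced here. Purely as an illustration of the idea, one plausible heuristic is to attribute an anonymous entry to the user who most recently produced an identified entry from the same IP address. The function name and the data layout below are assumptions.

```python
from datetime import datetime
from typing import Optional


def attribute_user(
    when: datetime,
    ip: str,
    identified: list[tuple[datetime, str, str]],
) -> Optional[str]:
    """Guess the user behind an entry that has no email identifier.

    `identified` holds (time, IP address, email) for entries that do carry an
    email identifier. The most recent one from the same IP address at or
    before `when` is chosen; None is returned if there is no candidate.
    """
    candidates = [(t, email) for t, addr, email in identified if addr == ip and t <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate[0])[1]


known = [
    (datetime(1998, 7, 1, 10, 0), "192.0.2.10", "aa@mcc.ac.uk"),
    (datetime(1998, 7, 1, 10, 30), "192.0.2.10", "f.bloggs@cam.ac.uk"),
]
print(attribute_user(datetime(1998, 7, 1, 10, 20), "192.0.2.10", known))
```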
The single letter code for each library is given in the table below. All "libraries" other than the University libraries included in the SuperJournal evaluation research have the code "Z". This will include: publisher; manchester; husat; focus; editor; author; penguin; etc. Log file entries for users with a "Z" library code are included in the log file processing, but they are excluded from the generated statistics. "Z" library users are included in the User Register.
| Library | Code |
| Birmingham | B |
| Bradford | A |
| Cambridge | C |
| De Montfort | D |
| Durham | E |
| Leeds | F |
| LSE | L |
| NIMR | N |
| Oxford | O |
| Sussex | S |
| UCL | U |
| Ulster | V |
| Warwick | W |
| Other | Z |
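The table above can be applied as a look-up in which any unlisted "library" falls back to "Z". In this sketch the lower-case keys and the case-insensitive matching are assumptions (the User Register example shows the library as "cambridge"); the function names are hypothetical.

```python
# Library name (lower case) -> single letter code, from the table above.
LIBRARY_CODES = {
    "birmingham": "B",
    "bradford": "A",
    "cambridge": "C",
    "de montfort": "D",
    "durham": "E",
    "leeds": "F",
    "lse": "L",
    "nimr": "N",
    "oxford": "O",
    "sussex": "S",
    "ucl": "U",
    "ulster": "V",
    "warwick": "W",
}


def library_code(library: str) -> str:
    """Return the library code letter, or "Z" for any other "library"."""
    return LIBRARY_CODES.get(library.strip().lower(), "Z")


def in_generated_statistics(library: str) -> bool:
    """"Z" users are processed and registered but excluded from the statistics."""
    return library_code(library) != "Z"


print(library_code("cambridge"), library_code("publisher"))
```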
For the purposes of the evaluation research, there was a requirement to identify the location of a user's machine, i.e. whether and when SuperJournal was accessed from a departmental machine, a public machine, a machine at home, etc. The log files record the IP address and machine name for every interaction. Some attempt was made to identify the location of these machines, on an "educated guess" basis because IP address information from the libraries was unavailable. Knowledge about machine location was gradually built into the log file processing program, by listing each month the unidentified machines at each library, attempting to add location information, and then re-running the log file processing program to include location codes. Identification was made manually and was more successful for some libraries than others, but generally home accesses were identifiable as were accesses from abroad.
Machine location codes have not been added for every month's statistics because the process was time-consuming. So care should be taken in interpreting any statistics and deductions made from these location codes. A large number will be recorded as "unknown" either because they have not been processed, or because that particular library's machine addresses are difficult to decipher. Also Manchester's knowledge of Oxbridge colleges may not be complete.
Note that a location identified as "Manchester" indicates either an intervention by Manchester staff to sort out user problems or a log file entry such as an Email Alert which was initiated from Manchester, so logged events with a Manchester location should be ignored.
The identified locations are:
Note that location codes have been added to the log files for February 1997 through May 1997 and July 1998 onwards. In the logs for all other months the locations will be set to "Unknown".
During the SuperJournal Data Conversion Process, described in [SJMC140], a journal catalogue entry is created for each journal issue as it is loaded. This journal catalogue is read by the statistics generation program (see [SJMC260]) to provide information on issue load date and journal accesses. The journal catalogue contains an entry for each issue, named:
where <jid> is the SuperJournal journal identifier; vvv is the volume number as 3 digits; iii is the issue number as 3 digits.
The first line of the file is:
| SJLoad | Load date |
| SJLoad | 98.03.07 |
Following this is a line for each article of the form:
| SJArt | Cluster | Volume | Issue | Year | Journal Id | Article Number | Base filename |
| SJArt | CCS | 12 | 2 | 1998 | EJC | 4 | art4xyz |
where "Cluster" is the SuperJournal cluster, i.e. CCS, MGP, PS or MC.
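Reading the two kinds of catalogue line can be sketched as follows. The field separator used in the catalogue file is not stated here, so the functions take a list of fields that has already been split; the two-digit-year date format is inferred from the example load date. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class CatalogueArticle:
    cluster: str
    volume: int
    issue: int
    year: int
    journal_id: str
    article_number: int
    base_filename: str


def parse_sjload(fields: list[str]) -> date:
    """Parse the first catalogue line, e.g. ["SJLoad", "98.03.07"]."""
    tag, load_date = fields
    if tag != "SJLoad":
        raise ValueError(f"expected SJLoad, got {tag!r}")
    # Assumed format: two-digit year, month, day.
    return datetime.strptime(load_date, "%y.%m.%d").date()


def parse_sjart(fields: list[str]) -> CatalogueArticle:
    """Parse one article line of the catalogue."""
    tag, cluster, volume, issue, year, journal_id, article, base = fields
    if tag != "SJArt":
        raise ValueError(f"expected SJArt, got {tag!r}")
    return CatalogueArticle(
        cluster, int(volume), int(issue), int(year), journal_id, int(article), base
    )


print(parse_sjload(["SJLoad", "98.03.07"]))
print(parse_sjart(["SJArt", "CCS", "12", "2", "1998", "EJC", "4", "art4xyz"]))
```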
In order to make particular journals anonymous in the generated SuperJournal usage statistics, journal data names are "masked" by the log file processing program. This masking is performed using the data within the journal catalogue. A journal look-up table is available to project staff, and each publisher has been informed of the journal codes for their own journals. After masking, a journal article identifier is composed of the following parts (e.g. CCS2V1998I2A6):
| Journal | <cluster>nn |
| Volume | Year |
| Issue | Issue number within year |
| Article | Article number within issue |
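The composition of a masked identifier from these four parts can be sketched as below. The journal number within the cluster comes from the confidential journal look-up table, so it is passed in as a parameter; the function name is hypothetical.

```python
def mask_article(cluster: str, journal_number: int, year: int, issue: int, article: int) -> str:
    """Compose a masked article identifier of the form <cluster>nnV<year>I<issue>A<article>."""
    return f"{cluster}{journal_number}V{year}I{issue}A{article}"


print(mask_article("CCS", 2, 1998, 2, 6))
```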
A SuperJournal Data Spreadsheet was maintained as part of the SuperJournal Data Handling process. This spreadsheet records information about the journal issues within SuperJournal including issue load date and the number of articles within each issue. This spreadsheet is detailed in an Appendix to "SuperJournal Production Process" [SJMC130].
It was not obvious how to record the end of a session because access to SuperJournal is via a Web browser, and the user is not required to log out. It has been assumed that a session ends at the time of the last recorded interaction before the next login. A session is defined as at least one significant interaction after login, or registration. Repeated logins, with no intervening interactions, are excluded during log file processing. Personal or Email Alerts, which are initiated from Manchester, are ignored when calculating session length.
It was suggested that contiguous short sessions should be merged into one session, e.g. sessions of less than five minutes. But deciding the appropriate "short" session time and developing an algorithm within the log file processing program was found to be too problematic. The reason why a user logged in again, whether by choice, because of network problems, or through inexperience, was not ascertained.
The only retrieval times logged are for searches using a search engine. This is the machine retrieval time, rather than that experienced by the end-user. It is not possible to log retrieval times as seen by the end user, because these depend on the network and the user's Web browser. The only timing information that can be ascertained from the end-user's actions is when their next interaction occurs.
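The session rules above can be sketched for a single user's time-ordered events as follows. This is a minimal illustration, not the project's algorithm: it treats every event other than a login, registration or alert as a "significant interaction", and when logins are repeated with nothing in between it keeps the latest one. Both choices are assumptions, as are the function name and the data layout.

```python
from datetime import datetime

SESSION_STARTS = {"Login", "Register"}
IGNORED_EVENTS = {"EmailAlert", "PersonalAlert"}  # initiated from Manchester


def find_sessions(events: list[tuple[datetime, str]]) -> list[tuple[datetime, datetime]]:
    """Return (start, end) for each session in one user's time-ordered events."""
    sessions = []
    start = None  # time of the login or registration that opened the session
    last = None   # time of the last interaction seen in the open session
    for when, event_type in events:
        if event_type in IGNORED_EVENTS:
            continue
        if event_type in SESSION_STARTS:
            # A new login closes the previous session, if it had any interaction.
            if start is not None and last is not None:
                sessions.append((start, last))
            start, last = when, None
        elif start is not None:
            last = when
    if start is not None and last is not None:
        sessions.append((start, last))
    return sessions


log = [
    (datetime(1998, 7, 1, 10, 0), "Login"),
    (datetime(1998, 7, 1, 10, 1), "Login"),          # repeated login, no interaction between
    (datetime(1998, 7, 1, 10, 5), "ViewAbstract"),
    (datetime(1998, 7, 1, 10, 10), "ViewArticle"),
    (datetime(1998, 7, 2, 3, 0), "EmailAlert"),      # ignored
    (datetime(1998, 7, 2, 9, 0), "Login"),           # no interaction follows: not a session
]
for start, end in find_sessions(log):
    print(start, end, end - start)
```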
Logging of events which occur on the user's machine was not possible during the course of the SuperJournal project. Printing and downloading are controlled by the Web browsers and readers on the user's machine.
The multimedia content of journals has not been logged. There is so little multimedia content in SuperJournal that logging of its submission, inclusion and use was not pursued.
Some manual intervention is necessary in processing the application log files to generate the statistics:
The SPSS data format has an inflexible line length limit of 80 characters. This has necessitated fixing the string length of various information fields within the generated SPSS log files, with consequent truncating of information. The information field where this is most likely to be a problem is a search query. The SPSS fields affected by this possible truncation are:
Other string fields where a truncation problem is not envisaged are:
Last modified: July 07, 1999