SuperJournal Application: Design



Ross MacIntyre, Manchester Computing, University of Manchester

SuperJournal Technical Report SJMC210

Contents:
1. Purpose of the Report
2. Overview
3. Requirements
4. Constraints
5. Approach
6. Acquired Software
7. Database Design
8. Implementation
9. Conclusions
Appendix A – High Level Functionality Requirements
Appendix B – Criteria for Acquiring Software
Appendix C – Example of Feature Specification
Appendix D – Features Sought from Search Engines
Appendix E – Example of NetAnswer PTF
Appendix F – Rollout of Functionality

1. Purpose of the Report

2. Overview

The objective of the SuperJournal Project research was to answer the question: "What do readers and authors really want from electronic journals?" and to explore the implications for other participants in the publishing process, i.e. publishers, universities, their libraries, and academic researchers.

"What do you want from electronic journals" is an easy question to ask, but difficult for readers and authors to answer unless they have hands-on experience using electronic journals to provide a context for their views and opinions. The project therefore needed to develop an application for the delivery of the journals, but the development had to support particular requirements.

This report documents how the requirements were turned into application functionality, and how that functionality was implemented. The report does not document application features themselves in any detail. For information on the features, see SJMC230 – SuperJournal Application Special Features.

3. Requirements

The purpose of the electronic journal application was to deliver features and functionality so that readers could identify, by experience, those they value most. The application was therefore a test-bed, rather than a true electronic journal service. A key design consideration was that the features offered and the method of delivery should change over time, the incremental development incorporating feedback from users.

A list was developed of value-added features thought to be important and worth testing out. These value-added features were in the following areas:

In order to guide the development work, a rollout plan was developed for implementing the value-added features and enhancing them over time. The first release of the application, available with the first journal cluster, had basic functionality. As each successive cluster was launched, the application was upgraded with new features. By phasing each release with the launch of a new cluster, response to new features could be monitored as part of the evaluation studies.

See Appendix A for the list of the value-added features developed at the start of the Project.

4. Constraints

A number of factors constrained the application development from the outset:

The project was thus constrained to 3 full-time staff, reducing at the end of Year 3.

Contacts were maintained with other related research efforts, especially those within the eLib Programme. External advice was only to be sought where absolutely necessary. Possible areas were acknowledged as: multimedia functionality and data handling, software-specific technical advice, SGML, conversion processing and object-oriented design.

The application environment used a single, central server: Sun CS6400, running Solaris, housed at Manchester Computing. A move to multiple/distributed server architecture was planned during the project, the time-scale to be indicated in the Functionality Rollout. If implemented, the remote server(s) would be based at publisher, agent or other third party sites.

No additional effort was specifically planned to deal with accesses from outside the intended end-user communities. However, problems experienced were brought to the attention of the relevant Network Support units.

The application was designed assuming a graphics-capable browser was being used (typically Netscape or Microsoft Internet Explorer) in a Windows, Macintosh or X-terminal environment. While vendor specific extensions were to be avoided wherever possible, where they were exploited, they would be clearly identified within the application. The same holds true where a certain level of functionality was required within the browser, e.g. Frames support. Helper applications and plug-ins might need to be included, by the end-user, to explore multimedia elements.

(See the eLib Standards Guidelines: http://ukoln.bath.ac.uk/elib/wk_papers/stand.html.)

5. Approach

Throughout, an attempt was made to identify at least two ways of doing things, i.e. to create choice not just at the visible top-level interface, but also within the application architecture. So we looked for solutions that were new versus traditional, commercial versus freeware, external versus embedded, "simple" versus complex, explicit versus implicit, and so on.

Very little time was devoted to appearance, other than to avoid/lessen ambiguity.

5.1 Functionality

In designing an application to test these features, key considerations were time and resources. As the project was three years in length, an application had to be up and running, and available to all user sites within one year. That meant that developing application software from scratch was not a viable option. Instead the approach used was to find "off-the-shelf" software which provided the functionality wanted, and to assemble the different components in a way that would be seamless from the user's point of view.

The incremental nature of the application development required a flexible and accommodating method for identifying enhancements. The following is a schematic of the Application Design process:

image32.gif (4733 bytes)

(SC = Steering Committee, PPC = Project Planning Committee, PM = Project Manager, TPM = Technical Project Manager)

The following list does not imply strict chronological sequence:

5.2 Software Acquisition

In February 1996, a "Call for Software Vendor Participation" was issued to around 40 companies with software products in relevant areas. The call for software was intended to inform software developers and vendors about the SuperJournal project and interest them in contributing applications and tools to build network electronic journals with multimedia features. Because the project was innovative and experimental, an informal approach was adopted to identify software that might be of interest. It was not a formal RFP with detailed specifications, evaluation criteria, and a long timescale for decisions.

SuperJournal sought software in four areas:

All respondents were initially matched against the evaluation criteria established for use throughout the project. See Appendix B – Criteria for Acquiring Software.

As a result of the call, the following offers were progressed:

A further three offers were examined but not pursued.

During the course of the project, a number of software products were examined and tested. Some were subsequently purchased, though at an academic or negotiated discount.

6. Acquired Software

6.1 Datastores

As stated in the approach, throughout we attempted to find at least two ways of doing things, all the better for contrast. The Project was going to deal with electronic text, plus (potentially) multimedia of unspecified type. It therefore chose to use a "state-of-the-art" object-oriented database management system (OODBMS) together with a more "traditional" text retrieval database system.

The above raises the question: "Why was a database required at all?" (Definition: A database is a collection of data that is durable, shared, and accessible, and whose integrity is maintained.)

During the capacity planning exercise conducted during the initial planning period, it was evident that the amount of data to be transferred would be large (many hundreds of files). It was decided that the most immediate way of implementing an adequate configuration management facility would be via a database management system. Otherwise, the configuration management would have to be done manually or code would need to be developed. Both these options were undesirable, especially considering the Project's resource constraints.

Further to this, the formats and types of files submitted varied among publishers. Since the journal contents would need to be extracted and presented to the user in a variety of ways (this being one of the Project's aims), this flexibility too could most readily be provided via a database management system.

This then raised the question: "Object-oriented or relational?" Although relational databases were the accepted way of storing business data, they still had limitations due to their strictly tabular information model.

The SuperJournal data potentially contained PDF, HTML, SGML and multimedia (e.g. sound and video) files; these were too varied and complex to be stored in tabular form, so a relational database management system was not the right choice. Also, because the aim of the Project was to explore ways of making the electronic journal useful to the academic community, the application needed to absorb user feedback continually. In other words, the application was intended to undergo frequent changes to meet users' needs, potentially affecting data descriptions and behaviour, as well as the application functionality.

Object technology was based on the ideas of controlling complex systems. The objects were independent of each other, so there were clean interfaces between the objects which maximised the potential for change in the system as a whole. To handle the complexity and the variation of the SuperJournal data, an object-oriented database management system was suitable.

6.1.1 ODB-II

This section gives an overview of the object-oriented database management system used (ODB-II). Fujitsu's ODB-II had the following key features which were needed by SuperJournal:

ODB-II provided a number of facilities for designing, building and testing databases and also for using and modifying them. The diagram below shows the available facilities and the relationship to the ODB-II base and other components of ODB-II object management system used by SuperJournal.

image33.gif (4518 bytes)

ODQL stands for "Object Database Query Language"; it was a complete database programming language which provided a number of general-purpose programming facilities. Most significant as far as the application was concerned was the presence of an ODQL pre-processor, which allowed ODQL statements to be embedded in a host language, such as C. It also allowed dynamic ODQL statements to be executed at run-time.

There was a set of predefined reusable classes that provided multimedia support. They could be either used as defined or tailored to meet different requirements by using the properties of OO, i.e. inheritance and polymorphism.

Note, ODB-II version 1.1 was used throughout the project, despite it becoming an unsupported release by ICL. It was not possible to upgrade, to version 2 or greater, due to licensing issues. Also, the "Model Work" facility, which was a visual tool for modelling and schema design, was not included in the software received.

6.1.2 BRS/Search

In contrast to the generic nature of OO DBMS, the Project chose to use a DBMS specifically designed to deal with documents and one which was part of an integrated product range. Dataware's BRS/Search was selected.

BRS/Search was described by Dataware as being "a Full & Free Text Retrieval Database System".

The components of a BRS/Search database:

image34.gif (4376 bytes)

6.2 Application

The application was to be constructed using "off-the-shelf" software where possible, with development effort focusing on "gluing" the components together to construct the application. The gluing was performed using ODB-II as the development environment and for the main application itself. There had to be some unification, otherwise the project would have been implementing two or more completely separate applications.

The following "off-the-shelf" software was used in the SuperJournal application:

Search engines

Whiteboard

It was planned to use the following software during the Project, but neither was implemented:

6.2.1 Isite

Isite was part of an internet publishing software package containing a text indexer/search system (Isearch) and Z39.50 communication tools. It was developed by the CNIDR (Centre for Networked Information Discovery and Retrieval) and freely distributed on the Web. Other users include NASA, Library of Congress, U.S. Patent and Trademark Office and the American Astronomical Society.

Features of Isite:

6.2.2 NetAnswer

NetAnswer was the Web interface for the BRS/Search engine (again from Dataware).

The features supported by the search engine included Boolean, Positional, Fielded, Comparison searches and the use of wild cards. Additional extensions, such as ANSI standard thesaurus capabilities, automatic search expansion and pluralisation were also offered.

The presentation of search results was tailorable by the use of "Print Time Format" files, which defined what should be shown where and how.

An important feature of NetAnswer was the fact that all database accesses were logged, so usage analysis reports could be created containing statistics on usage patterns, connection time, searches performed and documents accessed – all vital for evaluation purposes.
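The kind of usage analysis described can be derived from such logs with a short script. A minimal Python sketch (the log format shown is invented for illustration and is not NetAnswer's actual format):

```python
from collections import Counter

# Hypothetical log lines: date, time, user, action, target
log_lines = [
    "1998-01-12 10:01 user42 SEARCH polymer",
    "1998-01-12 10:02 user42 DISPLAY doc/123",
    "1998-01-12 10:05 user07 SEARCH crystallography",
]

# Tally the action field to count searches performed and documents accessed.
actions = Counter(line.split()[3] for line in log_lines)
print(actions["SEARCH"], "searches,", actions["DISPLAY"], "documents displayed")
```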

6.2.3 RetrievalWare

RetrievalWare from Excalibur Technologies actually constituted a product range, including a search engine. The core technology used within the search engine was claimed to be unique, supposedly modelled on the way biological systems use neural networks to process information.

The features offered by the search engine were extensive and included word-meaning and pattern recognition-based searching as well as fuzzy, natural language, statistical and boolean searching.

The presentation interface was tailorable and full audit trails appeared to be available. A requirement of the application was that full article searching be supported at an early stage and this meant being able to index multiple formats, including SGML and PDF, which was possible with RetrievalWare.

Indexes were built without requirement to load data into another product-specific datastore, so the product could rightly be viewed as "bolt-on".

At the time, Excalibur was releasing a major new version of RetrievalWare (v6.0), which put it ahead of the others considered (Verity – Search97, PLS – PLWeb, OpenText – LiveLink Search, EBT – DynaWeb, Muscat – FX). See Appendix D – Features Sought from Search Engines.

6.2.4 WWWBoard

To offer the user some means of communicating with other users, an electronic bulletin board was selected. The script for WWWBoard (by Matt Wright) was freely distributed on the WWW and in widespread use. (It is still often recommended on "Web-support" type mailing lists.) It offered the facility to post a new message or respond to an earlier posted message. Each response to a particular message was stored immediately following it, indented to indicate it was a reply.
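The threaded layout described, with each response stored immediately after its parent and indented one level, can be modelled in a few lines. A purely illustrative Python sketch (WWWBoard itself was a CGI script working on flat HTML files):

```python
class Message:
    """A bulletin-board posting with its replies."""
    def __init__(self, subject):
        self.subject = subject
        self.replies = []

    def reply(self, subject):
        # Each response is attached directly to the message it answers.
        msg = Message(subject)
        self.replies.append(msg)
        return msg

def render(msg, depth=0):
    # A reply appears immediately after its parent, indented one level.
    lines = ["  " * depth + msg.subject]
    for r in msg.replies:
        lines.extend(render(r, depth + 1))
    return lines

root = Message("Welcome to SuperJournal")
root.reply("Re: Welcome").reply("Re: Re: Welcome")
print("\n".join(render(root)))
```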

7. Database Design

7.1 ODB-II

The files received from publishers contained electronic versions of the contents of journals. Working from the position that the contents of the header files, marked up in SGML, consisted of objects, represented by tagged fields, an objectbase was defined which reflected the various objects and their relationships with each other. The design of the database was fundamental, the efficiency of the database design influencing the speed of the application to a large degree.

The following steps were performed:

The following is a simplified summary of the process.

Identify the Classes

The prime business object was the User, representing the end-user at a participating library, which was also a business object.

Consider the inherent structure of a collection of journals and how this could relate to browsing, defining a vertical hierarchy. Users would browse the application from:

SuperJournal top page = the collection of the four subject clusters

Cluster page = the collection of Journals

Journal page = the collection of Issues

Table of Contents page for an Issue

Article Abstract page (displays the header information of the article, with a hyperlink to the full text.)

Also, articles were written by authors, with full text presented in PDF, HTML, or both formats. Therefore the business objects of SuperJournal: Publisher, Cluster, Journal, JournalIssue, Article, Author, External (to represent files in PDF, HTML, SGML or other format) were identified.

As SuperJournal was to be a Web application, the pages had to be presented in HTML format. The objects which formed each HTML page were to be "fetched" from the database and written in the required format. So the HTML-type elements such as Text, Form, NameField, PasswordField, HiddenField, SelectField, CheckBox, SubmitButton etc. were recognised as business objects.

A Toolbar was defined, which was to be displayed at the top of each page to assist with navigation within the application. The Toolbar consisted of the following functions or Tools: Home, Up, Search, First, Previous, Next, Last, Preference, Feedback and Help; each of which was defined as an object.

Having identified the above business objects, classes were identified by three methods: direct mapping, abstraction and decomposition.

Direct Mapping

Some of the classes, such as User, Library, Publisher, Journal, etc. were identified directly from the corresponding business objects.

Abstraction

The abstraction method was used where a group of classes had something in common. The classes used to store the raw data or derived information all shared certain needs: for instance, they needed to be representable in HTML, and to be able to return their object identifier. A superclass (Media), which held those common properties, was therefore defined.

The Section (i.e. a section of an HTML document), SuperJournal, Cluster, Journal, JournalIssue and Publisher classes had some common characteristics, e.g. they each contained a list of other objects. A Cluster and a Publisher both contained a list of Journals, a Journal contained a list of JournalIssues, and a JournalIssue contained a list of Articles. A superclass (Container) was defined to own those common characteristics (imagine this superclass represents a branch on a tree which carries leaves and/or smaller branches). Again, classes such as Cluster, Journal and JournalIssue had a permanent relationship with the object which contained them (upLevel), and their "upLevel" needed to be linked within the toolbar displayed at the top of their HTML page (objects of the class Section did not). Therefore a superclass (ContainerList) was defined for them.

The classes which did not contain lists of other objects (plain leaves on the tree) had some common behaviour, for example they needed to know how to write their own HTML page. For the purposes of clean classification, an abstract class (MediaItem) was defined to be the superclass of those leaf classes.

Unfortunately the tree analogy could only be taken so far. While the structure of the Container objects was similar to tree branches, the structure of the objectbase as a whole differed from a tree. All objects "floated" freely in the database, i.e. they were not fixed to any position. To form a Container object required not only a list of objects, but also a position for each object. A separate class (Node) was therefore needed, which had an "index" property and a reference to the object contained in the Container object.

One exception: because the fields which made up a Form object were most likely unique to that particular form, they could be fixed with the "index" property, so they were classified into the Node class.

User and Author had some common information, such as surname and forename, since users and authors could both be considered types of person. A class Person was therefore defined as the superclass of both the User and Author classes, storing the common attributes, relationships and methods.
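The abstractions above can be sketched in outline. The following Python is a latter-day analogue of the ODB-II schema (the class names follow the report; the bodies are illustrative only):

```python
class Media:
    """Superclass owning behaviour common to all stored objects,
    e.g. being representable in HTML."""
    def write_html(self):
        raise NotImplementedError

class MediaItem(Media):
    """A 'plain leaf' on the tree: writes its own HTML page."""
    pass

class Node:
    """Positions one object within a Container, since objects
    float freely in the database."""
    def __init__(self, index, ref):
        self.index = index  # position within the container
        self.ref = ref      # the contained object

class Container(Media):
    """A 'branch': holds an ordered list of Nodes."""
    def __init__(self):
        self.nodes = []

class ContainerList(Container):
    """A Container with a permanent link to its parent (upLevel),
    used by the toolbar."""
    def __init__(self, up_level=None):
        super().__init__()
        self.up_level = up_level

class Journal(ContainerList): pass
class JournalIssue(ContainerList): pass

class Person:
    """Superclass of User and Author, holding shared attributes."""
    def __init__(self, surname, forename):
        self.surname = surname
        self.forename = forename

class User(Person): pass
class Author(Person): pass
```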

Decomposition

Decomposition was used to identify additional classes connected to the original classes by relationships. For example, an HTML form could contain INPUT elements, such as checkbox, hidden, password, submit, text and file, or SELECT elements, etc.

Define the Class Hierarchy

When defining the hierarchy, the inclusion principle was employed, i.e. the "is-a" rule was used to check that every "X was a Y" wherever we wanted to make X a subclass of Y.

It was important to take advantage of polymorphism. (An operation is polymorphic if it is implemented in different ways for different classes of object.) For instance, the operation of writing HTML was implemented in different ways for different Media objects, such as objects in Journal class and objects in JournalIssue class. Polymorphism was one of the main strengths of object technology because it gave the potential for change. The complex conditional code was separated into each class of object. When the method code for a class of object needed to be amended, only this particular class was affected. When a new class was introduced, new methods were added to cater for it.

To exploit polymorphism, subclasses of a class were sometimes identified even though those subclasses had the same visible characteristics. For instance, a class named HotListField was identified as a subclass of the class SelectField (which represented the SELECT HTML element); the only reason was that a HotList-style object needed a different way of writing its HTML text from a normal SelectField object.
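A Python sketch of this HotListField case (a latter-day analogue of the ODB-II methods; the HTML produced is invented for illustration):

```python
class SelectField:
    """Represents the SELECT HTML element."""
    def __init__(self, name, options):
        self.name = name
        self.options = options

    def write_html(self):
        opts = "".join(f"<option>{o}</option>" for o in self.options)
        return f'<select name="{self.name}">{opts}</select>'

class HotListField(SelectField):
    """Identical data to SelectField; exists only to write its HTML
    differently, exploiting polymorphism."""
    def write_html(self):
        items = "".join(f"<li>{o}</li>" for o in self.options)
        return f"<ul>{items}</ul>"

def render(field):
    # No conditional code here: the object's class selects the method.
    return field.write_html()
```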

image35.gif (8278 bytes)

The class hierarchy of the SuperJournal database

Define the Attributes

The attributes of classes were "roughly" defined in the requirements analysis process, e.g. for the User class, the surname, forename and email, were obviously basic attributes required. The attributes associated with the journal articles were identified in the data mapping exercise using the SGML DTDs for the headers.

More attributes were recognised during the implementation stage, e.g. when implementing a method, it was sometimes found that a new attribute was required to store the necessary information. Also, as the business process changed, new attributes were recognised. For example, when a new feature allowing a user to collect together a group of articles (creating a "hot list") was added to the application, a new attribute recording the user's hotlist setting was required by the class User.

Attributes could sometimes be embedded in methods. For instance, the User class had only one attribute, "fullname", replacing separate "surname" and "forename" attributes. The full name was in a format such as "Smith, John". Methods getSurname() and getForename() did the job of extracting the surname and forename from the fullname attribute. This provided much more flexibility regarding how the underlying data was stored. The disadvantage was degraded performance, since retrieving data directly from an attribute (especially an indexed attribute) was much quicker than retrieving it through a method.
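The fullname example can be sketched as follows (Python standing in for the original ODQL methods; the parsing shown assumes the "Surname, Forename" format described above):

```python
class User:
    """Stores a single 'fullname' attribute; the name parts are
    derived by methods rather than stored separately."""
    def __init__(self, fullname):
        self.fullname = fullname  # e.g. "Smith, John"

    def get_surname(self):
        # Everything before the comma is the surname.
        return self.fullname.split(",", 1)[0].strip()

    def get_forename(self):
        # Everything after the comma (if any) is the forename.
        parts = self.fullname.split(",", 1)
        return parts[1].strip() if len(parts) > 1 else ""

u = User("Smith, John")
print(u.get_surname(), "/", u.get_forename())
```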

Define the Relationship

The way object technology dealt with relationships was different from a relational database. Each object in the database had a unique identifier, called an object reference. Relationships were represented directly by references from one object to another. This was different from the way primary keys were used to implement relationships in relational databases, requiring the joining of tables and matching of keys.

For a "one-to-many" relationship, the relationship could be stored in different ways:

1). User contained object identifier of "owning" library:

image36.gif (1166 bytes)

2). Both sides kept the references

image37.gif (1610 bytes)

3). Only "one" side kept the reference of "many".

image38.gif (1580 bytes)

4). Index created on the "many-to-one" reference

The "many" side kept the reference of the "one" and an index created on that reference. For example, the user had an attribute called "library" which referenced the library object, so an index could be created on the "library" attribute. When a query tried to retrieve all the user objects which belonged to a library object, the retrieval time was much shorter if the "library" attribute had been indexed if the number of user objects was large.

All four methods had their own characteristics, and each was considered before a design choice was made. For example, JournalIssue and Article have a one-to-many relationship. Method 2 was used to store it: a JournalIssue needed to keep the references of its Articles to write its HTML page efficiently; otherwise, it would have had to find its Articles via a query. Conversely, as the Toolbar needed to be able to identify each Article's "upLevel", the Issue, it was more efficient to keep the reference than to execute a query to find which issue contained it.
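Method 2, as chosen for JournalIssue and Article, can be sketched like this (Python in place of ODQL; the method names are illustrative):

```python
class Article:
    def __init__(self, title):
        self.title = title
        self.up_level = None  # back-reference to the owning JournalIssue

class JournalIssue:
    def __init__(self, label):
        self.label = label
        self.articles = []  # forward references to contained Articles

    def add_article(self, article):
        # Maintain both directions of the relationship together, so
        # neither side ever needs a query to find the other.
        self.articles.append(article)
        article.up_level = self
```

The table of contents page then simply iterates over the issue's articles, and the Toolbar's "Up" function simply follows the article's back-reference, with no query in either direction.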

Define the Methods

In defining the methods, the following factors were considered:

Define the Objectbase

An objectbase was a physical container for stored objects. Only one class family could map to one objectbase. The reasons for having multiple objectbases may be:

For SuperJournal, only one class family was needed and the disk space was adequate, therefore there was only one objectbase. It was felt that the host machine was adequately resilient for the purposes of providing a service for this project.

7.2 BRS/Search

Defining the BRS/Search database was straightforward, as might be expected from a text retrieval database. Using the results of the data mapping exercise, which identified which tagged data was to be stored, the database definition file was created. Each field to be stored was given a name and also classified into "searchable" and "displayable".

The following lists the tagged items to be stored:

8. Implementation

The rollout of functionality was based on incremental development and implementation. A primary requirement of the application was that it should provide the end-user with choices. For a systematic evaluation to be supported, it had to be clear what choices were available to users at any point in time. Therefore, the introduction of new functionality was linked to other "firm" events, e.g. new subject cluster and new user community. The development consisted of a series of clearly visible "step increases" in functionality, data and user communities. See Appendix F – Rollout of Functionality.

The following sections document the implementation activities undertaken for the datastores and main components of the application, with screen shots to illustrate. More detailed documentation concerning the various "special features" developed is contained in SJMC230 – Special Features. The section dealing with the creation of the objectbase is lengthy compared to BRS/Search, but this accurately reflects the amount of implementation effort expended.

8.1 Datastores

8.1.1 ODB-II

Database Implementation

The database implementation was started as soon as the design was finalised. For each class there were two types of file containing ODQL statements, defining the new class and its methods, called <classname>.class and <classname>.methods respectively.

The code which defined the new class consisted of straight ODQL statements. The methods code was written in a host language, C, with embedded ODQL statements.

The files could be loaded into the database from the command line.

For methods, after the code was loaded, CompileProcedure was used to link the object codes together to make executable method code.

Setting Up the Database

Defining an ODB-II objectbase involved the following steps:

Defining the Objectbase

Defining the objectbase reserved the file space to be used to store user data. (The objectbase was created using the system command mkob.) The newly created objectbase could be expanded or deleted afterwards. As a rule-of-thumb, an objectbase should be expanded when it becomes more than 70% full.

Defining a Class Family

A class family consisted of a collection of related classes. Each class family was contained entirely within one objectbase, and each objectbase held just one class family. The class family CFmedia was defined using the system command newcf. (The corresponding deletion command was delcf.)

Defining Classes

Classes located in the same class family needed to have unique names. Classes were defined in ODQL, using the defineClass command. The class definitions contained:

The following ODQL statements define the class Person:

defineClass CFmedia::Person
super: Composite
{
maxInstanceSize: 4;
class:
String shortName;
instance:
String forename;
String surname;
String title;
String status;
String postalAdd;
String email;
String url;
String getName();
};

The class Person was within the class family CFmedia. Its superclass was Composite, a system predefined class. The maximum size of instances of this class was 4 kbytes (the permitted range was 4 kbytes to 256 kbytes; the default was 4 kbytes). Person had a class property shortName; instance properties forename, surname, title, status, postalAdd, email and url; and an instance-level method getName(). All instance and class properties had the data type String. The method getName() returned a String and took no parameters.

The final stage of defining the class was to build it using the buildClass system command. This validated that all the definitions were consistent with each other. After a class had been built, instances of the class could be created and the properties of the class were accessible. At the time of building a class, classes referred to in the class definition did not necessarily have to have been built, just defined.

Methods code did not have to be entered before buildClass was executed. The process could follow any of these sequences:

buildClass -> enter methods code -> compile the code,
or
enter methods code -> compile -> buildClass,
or
enter methods code -> buildClass -> compile,

but the instances were not able to invoke the methods until the methods code had been compiled. buildClass could be used to rebuild classes that had already been built, without an error being reported. It not only built the classes specified, but also built the superclasses of the specified classes if those superclasses had not yet been built.

Implement the Methods

Methods most strongly distinguished an object database from other technologies such as relational databases, since the database could be used not only to store and retrieve information but also to process it.

A method was an operation which ran on the instances of a class or on the class itself. It was defined using the defineProcedure ODQL statement. The method code was a combination of a host language (C or C++) and embedded ODQL. Typically the host language was used for complex computations and for access to the external environment, and ODQL for access to the ODB-II database. The following is an example of defining the getName() method for the Person class using C and embedded ODQL.

defineProcedure String
CFmedia::Person::instance:getName()
{
$defaultCF CFmedia;
$String forename, surname, name;
$forename = self.forename;
$surname = self.surname;
$name = forename.stringCat(" ").stringCat(surname);
$return(name);
};

defineProcedure checked the basic syntax, such as whether the method being defined conformed to the prototype declared in the class definition, whether there was a "return" ODQL statement at the end, etc.

Before the entered method code could be invoked, the method had to be compiled using the compileProcedure ODQL statement, which compiled all the methods for a specified class. Thus all methods code for a class had to have been defined before compileProcedure was invoked.

Both class definitions and method implementation code could be entered into the database either from a file or by typing the ODQL code interactively at the terminal.

It was much better to enter class or method definitions from a UNIX file than from the terminal, since the file preserved the ODQL statements for reloading and for reuse in other applications. This was especially true of methods code, which needed to be changed whenever the compiler found syntax errors, a frequent occurrence during development.

For the SuperJournal database code, each class definition was stored in a file named <classname>.class.nel and methods code for each class was stored in a file named <classname>.methods.nel. All database code was entered from files.

For ease of subsequent maintenance, the setup process was automated: to set up a new version of the application, once positioned in an appropriate directory, one simply typed the command make.

8.1.2 BRS/Search

BRS/Search consisted of nine main files:

  1. Database Configuration Table – Contained the location and access details of the database files.
  2. Form – Equivalent to a database definition file, contained the specification of each paragraph in the database, as to what its search and display characteristics should be.
  3. Dictionary – Contained the document and occurrence counts for each search term. The dictionary was in effect an alphabetically sorted list of every searchable word in the database. Each entry contained pointers to the inverted file.
  4. Reverse Dictionary – May optionally be created for performance reasons, but was not used.
  5. Inverted – Contained the precise document, paragraph and sentence location of every searchable word in the database.
  6. Text – Contained the text of every document in the database.
  7. Text Index – Was a list of every document in the database with a series of pointers to the Text file and was used for display purposes.
  8. Information – Was the compiled version of the form file and was reloaded every time the form file was changed.
  9. Status – May optionally be used to keep track of changes.

The implementation consisted of creating the Database Configuration Table, which was done via the BRS/Maint system administration facility, and the Form file, using the design prepared. Separate databases were created for each subject cluster to support cluster-specific searching. BRS/Search allowed searching across databases, so this was not a limitation.

The entries within the Form file looked like:

Shortname = ISSN
Longname = ISSN Number
Inputname = <issn>
Flags = NO-ABBREV DOUBLPOST COMPR-F (No abbreviation recognised, Hyphenated terms can be searched for as both a single expression and the individual words in the term, Compress the number as for Numbers and Maths symbols)
Verify = NUMBERS AMOUNTS MIN-CHARS=9 MASK=9999"-"9999 (Numbers allowed, Numbers plus symbols -+,.#$ allowed, Nine characters, Four numbers followed by a dash and another four numbers expected)

Shortname = ATL
Longname = Article Title
Inputname = <atl>
Subpars = 5 (Up to 5 titles allowed)
Verify = ANYTHING DATA-REQUIRED LINELEN = 250 (Any text accepted, Mandatory field, Up to 250 characters)
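
The ISSN verification rule above (nine characters: four digits, a hyphen, four more digits) can be expressed as a simple pattern check. The Python sketch below is illustrative and not part of BRS/Search; note that a real ISSN check digit may also be "X", which the mask shown would reject:

```python
import re

# Pattern equivalent to MASK=9999"-"9999 with MIN-CHARS=9:
# four digits, a hyphen, then four digits (nine characters in all).
ISSN_MASK = re.compile(r"\d{4}-\d{4}")

def verify_issn(value: str) -> bool:
    """Return True if the value matches the BRS-style ISSN mask."""
    return bool(ISSN_MASK.fullmatch(value))

print(verify_issn("0028-0836"))   # True
print(verify_issn("00280836"))    # False: the hyphen is required by the mask
```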

The NetAnswer implementation did not require any modification of the BRS/Search components.

8.2 Main Application

8.2.1 Initial Access

Access to the system was subject to authentication at individual level; the user's email address was selected as an identifier. (For full description see SJMC240 – SuperJournal Registration and Login.) When a person entered their identifier, an ODB-II session commenced, connected to the objectbase and established the user's application preferences.

By default, they were shown the top-level page, which consisted of a list of the four subject clusters. Here the user also encountered, for the first time, the "toolbar", the main navigation mechanism. It is described below:

The SuperJournal Application "Toolbar"

A toolbar was defined to be displayed at the top of each page to assist with navigation within the application. The tool bar consisted of the following functions:


home up search first previous next last preference feedback help

The sections that follow cover the core browsing facility, the search engines and the references a user could set within the application.

Preferences

SuperJournal allowed the user to set various "preferences" to customise the way the system looked and behaved. The choices persisted between sessions, i.e. SuperJournal "remembered" each user's preferences. Choices could be set for the following:

Additionally, the Preferences page was how a user would access and modify their "Reading list" of chosen articles.

Browsing

When the user logged in they saw a list of the four journal clusters:

image49.gif (45590 bytes)

This was the default "Home Page". Browsing could then take place in a "vertical" fashion from the list of clusters down to the level of individual articles. Clicking on a cluster showed the list of journals:

image50.gif (41166 bytes)

Clicking on a journal showed the list of issues (alternatively, the user could click straight on the most recent issue):

image51.gif (50561 bytes)

Clicking on an issue showed the table of contents:

image52.gif (50044 bytes)

Clicking on "Abstract" showed information about the article, e.g. title, authors, affiliations, full reference, keywords (where the journal included them) and abstract (where supplied by the publisher).

image53.gif (33498 bytes)

Clicking on "Full Article" showed the article itself:

image54.gif (63950 bytes)

Browsing "horizontally", e.g. from issue to issue, abstract to abstract, etc, was supported via the toolbar, using "first", "next", "previous" and "last" icons.

In simple terms, at each stage the application assembled the objects that constituted the page. Each object contained an ordered set of object identifiers for its constituents, or "members". For example, when looking at a page of journals, if a particular journal was selected, the application accessed the relevant Journal object and used the property containing the ordered set of object identifiers for its issues.
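
The page-assembly step can be modelled as follows. This is a hypothetical Python sketch (the object identifiers and property names are invented), showing how selecting an object resolves its ordered set of member identifiers into a page:

```python
# Hypothetical model of the containment hierarchy: every object holds an
# ordered list of the object identifiers (OIDs) of its members.
objectbase = {
    "oid-journal-1": {"title": "Journal A",   "members": ["oid-issue-1", "oid-issue-2"]},
    "oid-issue-1":   {"title": "Vol 1 Iss 1", "members": ["oid-article-1"]},
    "oid-issue-2":   {"title": "Vol 1 Iss 2", "members": []},
    "oid-article-1": {"title": "Some Article", "members": []},
}

def page_members(oid):
    """Assemble the titles of the member objects that make up a page."""
    obj = objectbase[oid]
    return [objectbase[m]["title"] for m in obj["members"]]

print(page_members("oid-journal-1"))  # ['Vol 1 Iss 1', 'Vol 1 Iss 2']
```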

It should be noted that the application converted the extracted text to HTML on the fly. The URL displayed as the browser's location was a "one-time" URL, derived from the current session information, including the user identifier. This was very flexible, but entailed more processing than simply retrieving a static HTML file.
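
One way to derive such a session-specific URL is to fold the user identifier, session details and target object into an opaque token. The scheme below is a guess at the general idea, not the project's actual algorithm; all names are invented:

```python
import hashlib

def one_time_url(user_id: str, session_id: str, object_oid: str) -> str:
    """Derive an opaque, session-specific URL path (illustrative scheme only)."""
    token = hashlib.sha1(f"{user_id}:{session_id}:{object_oid}".encode()).hexdigest()
    return f"/cgi-bin/sj/{token}"

# The same user viewing the same article in a new session gets a new URL:
url1 = one_time_url("reader@example.ac.uk", "sess-001", "oid-article-1")
url2 = one_time_url("reader@example.ac.uk", "sess-002", "oid-article-1")
print(url1 != url2)  # True
```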

Searching

As mentioned above, selecting the search icon prompted the user to select a search engine. If the user always wanted to use a particular search tool, the choice could be saved via the "Preferences" feature. The choices were Isite, NetAnswer and RetrievalWare.

image55.gif (39048 bytes)

The main features of each search engine are described in separate sections below.

8.2.2 Isite

The Isearch CGI scripts were modified to conform to the SuperJournal application presentation. The SuperJournal toolbar was put on each Isite page, which, as well as providing a common "look and feel" within the application, also gave the facility to go to the first, previous, next and last search results. Also included on the search hit list were links to the Journal page and Issue page (which were objects in the ODB-II database). This allowed a user to see the rest of the contents of an issue containing a particular article, or to see what other issues were available.

There were five SuperJournal Isite databases, one for each of the four subject clusters and one global database to support cross-database searching. The Isite files were "SGML-like" tagged text. They were extracted from the ODB-II database with the necessary information for the Isite search. The "SGML-like" tags enabled fielded searches to be performed.
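
A record in the extracted files might have looked something like the following. This is an illustrative reconstruction using field tags from the BRS/Search Form file described earlier (issn, atl); the exact tag set used for the Isite extracts is not documented here:

```
<issn>0028-0836</issn>
<atl>Mapping the mouse genome</atl>
<aus>A. N. Author</aus>
<abs>...</abs>
```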

There were three types of search offered by Isite: simple, boolean and advanced.

The Boolean search was used by the project to provide a "simple" interface, similar to those then available on the WWW.

The screen image below shows a search for the terms "mouse" and "genome", both of which were to appear in the article title:

image56.gif (40219 bytes)

The search resulted in 9 hits.

image57.gif (30271 bytes)

The hit list was created from the Isite index database itself, but the links relied upon the object identifiers allocated by ODB-II. This illustrates how Isite was integrated with ODB-II, in contrast with the other two search engines.

8.2.3 NetAnswer

As already stated, NetAnswer was the Web interface for the BRS/Search engine from Dataware. It was implemented as a means of searching the data which had been stored in the BRS/Search database. So, although it was called from the ODB-II application, it did not have any contact with the ODB-II objectbase itself.

The version of NetAnswer used was version 1.1, which was stateless: it could not remember the relationship between calls made by the same user, hence each search request sent from the Web needed to launch an independent BRS/Search session.

NetAnswer implementation was controlled by a configuration file (neta.cfg) which determined:

The print-time format (PTF) files allowed the results of a search to be formatted for display. This included the insertion of HTML tags for headings, line breaks, etc., and conditional statements to deal with eventualities such as missing data. See Appendix E – Example of NetAnswer PTF File.

The screen shots below show a search for "mouse" and "genome", both to appear within the article title.

image58.gif (43183 bytes)

Ten hits were found, and the fields defined within the "toc" PTF file (Title, Author(s) and Journal Issue) were shown. Each hit listed was linked to the equivalent of a header record.

image59.gif (45185 bytes)

The header record, or "Full Document Display", showed the contents of the fields held within the BRS/Search database. Note that the full article was accessed via a link to an external file, i.e. the contents were not stored within the database.

image60.gif (38240 bytes)

The application toolbar was simulated within NetAnswer as far as possible. The successive, or horizontal, browsing allowed the user to move directly from one hit's header record to the next.

A feature allowed users to choose whether search results were displayed in brief or tagged format, and to download selected search results in tagged format. A set of Perl programs filtered the search output from the NetAnswer CGI (netacgi) executable to tailor the search results pages.
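
The filtering step can be pictured as a small stream filter sitting between netacgi and the browser. The Python sketch below is illustrative only: the project used Perl, and the rewrite rule shown (injecting the toolbar after the opening BODY tag) is an invented example of the kind of tailoring performed:

```python
# Illustrative filter: read the CGI output line by line and tailor it,
# here by injecting an application toolbar after the opening <BODY> tag.
TOOLBAR = '<P>[SuperJournal toolbar]</P>'

def tailor(cgi_output: str) -> str:
    out = []
    for line in cgi_output.splitlines():
        out.append(line)
        if line.upper().startswith("<BODY"):
            out.append(TOOLBAR)
    return "\n".join(out)

page = '<BODY BGCOLOR="#FFFFFF">\n<H2>Search Results</H2>'
print(tailor(page))
```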

8.2.4 RetrievalWare

RetrievalWare was a commercial search engine from Excalibur Technologies. Between January 1996 and September 1998 it was upgraded from version 6.0 through versions 6.0.1 and 6.5 to version 6.5.1. Finally, patches were applied for bugs in version 6.5.1.

RetrievalWare (RW) built indexes for particular sets of documents or files, called libraries. Within SuperJournal, eight RW libraries were created: one for header data and one for the full text of the articles, for each of the four clusters. A user was able to choose which libraries to search. The libraries could contain files in multiple formats; for example, in the case of Molecular Genetics & Proteins (MGP), PDF, HTML and SGML files were searched.

The following screen shows MGP full text and abstract libraries having been selected.

image61.gif (39839 bytes)

The figure below shows the Query screen. Search terms were entered in the textbox ("natural language" querying was supported). The remaining items allowed the user to tune the behaviour of the search engine.

image62.gif (39056 bytes)

Using the fields on the left, limits could be set for the number of documents to retrieve, and for the maximum number of wildcard words and spelling variations to be used during the search. The fields at the foot of the screen allowed "fielded searches"; they were derived dynamically from the selected library or libraries, not defined in the HTML screen display.

A concept search for "mouse" and "genome", anywhere in the header or the text of the article itself, was then initiated.

image63.gif (50091 bytes)

This shows the results for the query "mouse genome". The top half of the screen contains a list of the hits, one of which, number 2, is shown in the bottom frame, with hit terms highlighted. In this example, the hit term highlighting method was set to display the "score" for each hit term and to create a link allowing navigation from one hit to the next, in rank order. The hits shown in the top half could be displayed in the bottom frame by clicking directly on the entry, and the "up" and "down" pointers on the centre bar allowed the user to move to the next or previous hit. The centre bar showed the hit currently being viewed; it was also a link, which displayed the document in a new window.

RetrievalWare used a Dictionary and Thesaurus to assist in searching. The "Expert" tab allowed the user to adjust the use made of them, for example limiting the interpreted meanings to a subset of the many displayed. "Weight" could be used to indicate that one of the terms was of more importance in the query than the other(s).

This example shows the various meanings found for "mouse", but no meanings were found for "genome", highlighting where the use of a subject-specific thesaurus might have added value.

image64.gif (43162 bytes)

8.2.5 WWWBoard

The message board used by SuperJournal was Matt Wright's WWWBoard, downloaded from http://www.worldwidemart.com/scripts (scripts archive). It was implemented to provide an online discussion area for SuperJournal users.

A WWWBoard was created for each of the four clusters, each in its own sub-directory: CCSBoard, MGPBoard, PSBoard and MCBoard.

Each WWWBoard instance consisted of the following main files and directories:

Inside the SuperJournal application, at the bottom of each cluster page, a hyperlink pointed to the corresponding WWWBoard HTML page. Each WWWBoard message was also sent to superjournal@mcc.ac.uk, so that SuperJournal staff could check and respond to messages.

On the HTML page of a WWWBoard, all messages were displayed sorted by message number. When a new message was submitted, the Perl CGI script for the WWWBoard read the number recorded in the file data.txt, named the new message <number obtained from data.txt>.html, and incremented the number in the data.txt file.

The hyperlink to the new message file was appended to the message area in the wwwboard.html file. Messages responding to an existing message were displayed in an appropriate layered fashion.
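
The numbering scheme described above can be sketched as follows. This is illustrative Python rather than the original Perl; the file layout is as described, and error handling and locking are omitted:

```python
import os
import tempfile

def post_message(board_dir: str, html_body: str) -> str:
    """Name a new message after the counter in data.txt, then increment it."""
    counter_file = os.path.join(board_dir, "data.txt")
    with open(counter_file) as f:
        number = int(f.read().strip())
    message_file = os.path.join(board_dir, f"{number}.html")
    with open(message_file, "w") as f:
        f.write(html_body)
    with open(counter_file, "w") as f:
        f.write(str(number + 1))
    return message_file

# Example with a throwaway board directory:
board = tempfile.mkdtemp()
with open(os.path.join(board, "data.txt"), "w") as f:
    f.write("1")
first = post_message(board, "<html>Hello</html>")
second = post_message(board, "<html>Reply</html>")
print(os.path.basename(first), os.path.basename(second))  # 1.html 2.html
```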

image65.gif (47594 bytes)

9. Conclusions

A rapid application development approach was successfully applied during the project. Considering where and how the feature requirements were established, it would have been impractical to work in any other way. The application was being developed as a means to an end, and one of the project outputs was a requirements list for future systems, so effectively its output was also an input.

In terms of what was created, it was encouraging that positive feedback was received from users concerning the application and content. (It is an old adage that people normally get in touch only to tell you when something doesn't work or when they see something they don't like.) This was the acid test of the application design.

Some specific observations:

Appendices

Appendix A – High Level Functionality Requirements
Appendix B – Criteria for Acquiring Software
Appendix C – Example of Feature Specification
Appendix D – Features Sought from Search Engines
Appendix E – Example of NetAnswer PTF
Appendix F – Rollout of Functionality

Appendix A. High Level Functionality Requirements

Availability

Functionality

Multimedia

Content

Timeliness

Convenience

Performance

Presentation

Appendix B. Criteria for Acquiring Software

The following criteria were applied during the software selection process.

The Vendor

Vendor's Software

Terms

Appendix C. Example of Feature Specification

SuperJournal Functionality

Title: Forward chaining via abstracts

Description: The articles within SuperJournal will be enhanced to include a "cited-by" list of references.

Priority: 2

Method:

Dependencies/Constraints:

Test Process: Validation via beta-test application.

Implementation Process: The data will require 2 passes, as it would be overly complex to try and keep each journal in sync with other titles. The first pass would produce a "candidate list" which would then be resolved during the second pass and the "cited-by" links created.

Effort: 5 days Data Handler, 3 days Software Developer to amend application

Elapsed Time: 1 month

Estimated Start Date: 9 March 1998

Target Implementation Date: 6 April 1998

Appendix D. Features Sought from Search Engines

The table below compares the features sought across the candidate products. "y" = feature present, "n" = absent, "?" = unknown, "-" = no information given.

                                          LiveLink   PLWeb      SearchPDF  Topic IS   RetrievalWare
                                          Search
Vendor                                    OpenText   PLS        Verity     Verity     Excalibur

Search Features
Wildcard/Stemming                         y          y          y          y          y
Proximity                                 y          y          n          y          -
Boolean                                   y          y          y          y          y
Natural Language                          n          y          n          y          y
Numeric/Date Ranges                       y          y          n          y          y
Pluralisation                             y          y          n          y          y
Fuzzy Logic                               y          y          n          y          y
Phrase/Idiom                              y          y          n          y          y
Morphology (ie root level recognition)    y          y          n          y          y
Dictionary                                n          y          n          y          y
Thesaurus                                 n          y          n          y          y
User Weighting                            y          y          n          y          y
Search Scoping/Constraining               y          y          n          y          y
Search Term Browsing (viewable indexes)   n          n          n          y          y
Intuitive Searching ("more like this")    y          y          n          y          y
User Customisable Search Screens          y          n          n          n          n

Display Features
Auto Summary                              y          y          n          y          y
Auto Result Ranking                       y          y          y          y          y
Search Term Highlighting                  y          y          y          y          y
User Configurable Display (eg sorting)    y          ?          n          n          n
Implementor Configurable Display          y          y          y          y          y
Convert-to-view (ie HTML-on-the-fly)      y          y          n          y          y

System Features
Dynamic Indexing                          y          y          n          y          y
Secure/Authorisation                      y          y          n          y          y
Audit Trail                               y          y          n          y          y
Online Help                               y          y          y          y          y
Full-featured API                         y          y (CPI)    n          y          y (SDK)
Batch Processing                          y          y          n          y          y
Creates Own Indexes                       y          y          n          y          y
Profile/Agent Software                    n          y          n          y          Profiling
Distributed/Concurrent Index Searching    y          y          n          y          y
Extensible – Other Software               y          y          n          y          y
Separate Datastores – Open                y          ?          n          y          y

Data Formats
HTML                                      y          y          n          y          y
PDF                                       y          y (Solaris2.3)  y     y          y
SGML                                      y          -          n          y          y
ASCII                                     y          y          n          y          y
Other                                     40         -          n          y          y

Appendix E. Example of NetAnswer PTF

The following is part of a print-time format file used by NetAnswer to format search results for display.

:VALID S* SELECT-PARAS
\~ This is the full record display.
\~ File name = /somewhere/Search/Config/Saves/Common/DOC.fmt
\~
:IF DELETED NODOC
: BREAK
:ENDIF
\~:SET RIGHT=77
\~:SET FMT=ON
<TITLE>SuperJournal MGP Database -- Document Display</TITLE>
<BODY BGCOLOR="#FFFFFF">
<H2>Search Results: NetAnswer - Full Record Display</H2>
<I>Document :| </I>| #DOC of #RSLT<BR>
<HR>
\~
:IF ATL
\o1Title: \f1
<B>[ATL]</B>
<BR><BR>
:ENDIF
\~
:IF AUS
\o1Author(s): \f1
<B>[AUS]</B>
:ENDIF
\~
:IF AFF
\~\o1Affiliation(s):\f1
:SET BUF2="<BR>"
:SET SUBLD=#BUF2
[AFF]
:ENDIF
<BR><BR>
\~
:IF JTL
\o1Journal:\f1
<B>[JTL]</B>|Vol|[VOL]Iss|[INO]|[CD:DE "No Date Avail."]pp[SPG]-[EPG]
<BR><BR>
:ENDIF
\~
:IF ART
:SET BUF1=ART:1L
<HR>
<A HREF="http://www.superjournal.ac.uk/cgi-bin/netacgi/go/reftex.pl?/superjournal/Journals/#BUF1">Download in Tagged Format</A>
<HR>
:ENDIF
\~
:IF ABS
\o1Abstract:\f1
[ABS]
:ENDIF
<BR>
\~
:IF ART
\o1Full Article: \f1
:SET BUF1=ART:1L
<A HREF="http://www.superjournal.ac.uk/cgi-bin/superjournal/logs/sjdisplay//sj/BRSpdf/#BUF1"> [Full Article in PDF] </A>
<BR><BR>
:ENDIF
\~

Appendix F. Rollout of Functionality of the SuperJournal Application

Release 1 (November 1996)

Release 2 (May 1997)

Release 3 (April 1998)

Release 4 (August 1998)

 

This web site is maintained by epub@manchester.ac.uk
Last modified: April 30, 1999