SuperJournal Application: Special Features


Ross MacIntyre, Manchester Computing, University of Manchester

SuperJournal Technical Report SJMC 230

Contents:
1.  Purpose of the Report
2.  Overview
3.  Special Features Developed for the SuperJournal Application
4.  Features Not Implemented
5.  Conclusions

1. Purpose of the Report

2. Overview

The project required the development of a variety of features. Some were included in the software acquired; others required code to be developed. The features were introduced in a succession of scheduled releases during the project, harmonised with the evaluation activities.

The following lists the releases and the features introduced, and highlights with an asterisk (*) those features that were "home-grown", as opposed to acquired.

Release 1 (November 1996)

Release 2 (May 1997)

Release 3 (April 1998)

Release 4 (August 1998)

The "home-grown" special features are the subject of the detailed section below, which briefly describes how each feature was developed and illustrates how it appeared within the application.

3. Special Features Developed for the SuperJournal Application

The features have been grouped into the following areas of functionality:

3.1 Persistent Personal Preferences
3.2 Alerting
3.3 Browsing
3.4 Bibliographic References
3.5 Linking

3.1 Persistent Personal Preferences

A key element of the project was the provision of choice to the user. This covered a number of areas and for some features the user could register a preference that would persist. This prevented the user from having to repeatedly make the same choice each time they used the application. Choices could be set for the following:

Additionally, the Preferences page was how a user would access and modify their "Reading list" of chosen articles. See Section 3.3 for further detail.

pref1bw.gif (19315 bytes)

 

pref2bw.gif (16569 bytes)

The preferences were stored as properties of the user, that is, within the instance of the User object held in the objectbase. The example printout below shows the types of properties, with some values included for illustration:

username; (=email address)
password; (=PID)
sessionKey;
servername;
loginTime;
lastAccessTime;
timeOut, default: 7200; (unit =seconds)
libName;
ipAddress;
previousPage = CFmedia::Journal::35;
searchEngine, default:"No Preference";
preferedCluster = "Molecular Genetics and Proteins";
startScreen, "Previous Session";
homePage, default:"SuperJournal Clusters";
set ccsNotifyNewIssue, default:EMPTY;
set mgpNotifyNewIssue = {"Mammalian Genome", "Transgenic Research"};
set psNotifyNewIssue, default:EMPTY;
set ppNotifyNewIssue, default:EMPTY;
set preferredReading = {CFmedia::Article::2570, CFmedia::Article::2571, CF::Article::2733};
ccsSearchword = "genetic engineering, medical ethics"
mgpSearchword = "mouse gen*";
psSearchword = NIL;
mcSearchword= NIL

After the user submits any preferences, a confirmation screen is displayed. Once confirmed, the new User object is "committed", i.e. made permanent.
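The submit, confirm and commit sequence can be sketched as follows. This is a minimal illustration only, assuming a toy in-memory store in place of the objectbase; the names UserPrefs, ObjectBase and submit_preferences are hypothetical and only a few of the listed properties are modelled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UserPrefs:
    """A cut-down, hypothetical version of the User properties listed above."""
    username: str
    time_out: int = 7200                      # seconds
    search_engine: str = "No Preference"
    home_page: str = "SuperJournal Clusters"
    preferred_reading: tuple = ()             # article identifiers


class ObjectBase:
    """Toy in-memory stand-in for the objectbase (an assumption)."""

    def __init__(self):
        self._committed = {}

    def get(self, username):
        return self._committed[username]

    def commit(self, prefs):
        # Make the new User object permanent.
        self._committed[prefs.username] = prefs


def submit_preferences(store, username, confirmed, **changes):
    """Build a new User object from the changes; commit only if confirmed."""
    pending = replace(store.get(username), **changes)
    if confirmed:
        store.commit(pending)
    return pending


store = ObjectBase()
store.commit(UserPrefs(username="reader@example.org"))

# Not confirmed: the stored object is left untouched.
submit_preferences(store, "reader@example.org", confirmed=False, time_out=600)

# Confirmed: the new object replaces the old one.
submit_preferences(store, "reader@example.org", confirmed=True,
                   search_engine="Isite")
```

Building a new object and swapping it in on confirmation keeps the stored preferences unchanged if the user abandons the confirmation screen.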

3.2 Alerting

Email alerting service based on tables of contents

As indicated above in the "NotifyNewIssue" property, a user may choose to receive, via email, the table of contents for new issues subsequently loaded.

emailtocbw.gif (12373 bytes)

Alerting service based on personal profile

The user can enter search terms for any of the clusters. When new data is loaded, these terms are used in a search across the new data, and the user is sent an email highlighting any new hits. The example below shows the results after an issue of Mammalian Genome was loaded, where a search for "mouse", "cDNA" or "gas5" matched 2 articles:

emailbw.gif (10608 bytes)
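Profile matching of this kind can be sketched as below. This is an illustration under stated assumptions rather than the project's code: the real searches were run through the application's search engines, whereas this sketch matches terms against titles with regular expressions, treats "*" as a word-ending wildcard, and uses invented article titles.

```python
import re


def term_to_regex(term):
    """Compile one profile term; '*' matches the remainder of a word."""
    body = r"\w*".join(re.escape(piece) for piece in term.split("*"))
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def parse_profile(searchword):
    """Split a comma-separated profile string into individual terms."""
    return [t.strip() for t in searchword.split(",") if t.strip()]


def new_hits(searchword, new_articles):
    """Return the titles of newly loaded articles matching any term."""
    patterns = [term_to_regex(t) for t in parse_profile(searchword)]
    return [a["title"] for a in new_articles
            if any(p.search(a["title"]) for p in patterns)]


# Invented sample data, not taken from any real issue.
new_issue = [
    {"title": "Mapping of a mouse gene cluster"},
    {"title": "Isolation of a cDNA clone"},
    {"title": "A survey of plant proteins"},
]
hits = new_hits("mouse gen*, cDNA", new_issue)
```

The list in hits would form the body of the alert email; an empty list means no email needs to be sent.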

3.3 Browsing

Browse author and keyword indexes

In order to assist users who were unsure as to what keywords may have been declared for an article, or perhaps wanted to see a rough guide to the contents, viewable keyword indexes were designed. The same was done for author names, where spelling variations could result in failed searches. The viewable lists could be seen as a means to search and be guaranteed a hit. Because of the numbers involved, the interface required the user to specify one or more starting letters for either keyword or author together with the subject cluster of interest.

keybw.gif (9088 bytes)

The selections were then used in a database query that located all keyword or author objects within the chosen cluster that began with the chosen letter(s). Each object stored, as a property, the set of identifiers of its associated articles, so a count of articles could be taken during extraction. In this way the number of articles was established for each distinct keyword or author.

The above keyword selection from CCS would include:

keylistbw.gif (11419 bytes)
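The index query and article count described above can be sketched as follows. This is a simplified illustration, assuming the keyword and author objects are available as an in-memory list; the IndexEntry type, the "MGP" cluster code and all sample entries other than "critical discourse analysis" are hypothetical.

```python
from collections import namedtuple

# Each index object carries the set of identifiers of its associated articles.
IndexEntry = namedtuple("IndexEntry", "text cluster article_ids")


def browse_index(entries, cluster, prefix):
    """List (text, article count) for entries in a cluster starting with prefix."""
    prefix = prefix.lower()
    rows = [(e.text, len(e.article_ids)) for e in entries
            if e.cluster == cluster and e.text.lower().startswith(prefix)]
    return sorted(rows, key=lambda row: row[0].lower())


entries = [
    IndexEntry("critical discourse analysis", "CCS", {101, 102}),
    IndexEntry("cultural studies", "CCS", {103}),
    IndexEntry("cDNA", "MGP", {201}),
    IndexEntry("media policy", "CCS", {104}),
]
rows = browse_index(entries, "CCS", "c")
```

Because every entry shown has at least one article attached, following a link from the list cannot produce an empty result.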

See below a small part of the Political Science cluster author index, where author names begin with "M". Note that "Dennis MacShane" also appears with the forename spelt "Denis".

authlistbw.gif (4598 bytes)

The keywords are actually links, which perform a search using the keyword(s) directly (using the search engine Isite). So in the case of "critical discourse analysis", the user would see:

keyresbw.gif (20796 bytes)

Because the keywords and author names were retrieved dynamically, there was a performance overhead. It would be preferable to create indexes as data gets loaded, as this would reduce response times.
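One way such an index could be maintained as data is loaded is sketched below. This is a suggestion only, not something the project built: the KeywordIndex class is hypothetical, it buckets entries by cluster and first letter, and it handles a single starting letter rather than several.

```python
from collections import defaultdict


class KeywordIndex:
    """Hypothetical index, updated at load time instead of at query time."""

    def __init__(self):
        # (cluster, first letter) -> {keyword: set of article identifiers}
        self._buckets = defaultdict(dict)

    def add_article(self, cluster, article_id, keywords):
        """Called once per article as it is loaded."""
        for kw in keywords:
            if kw:
                bucket = self._buckets[(cluster, kw[0].lower())]
                bucket.setdefault(kw, set()).add(article_id)

    def browse(self, cluster, letter):
        """Return (keyword, article count) pairs without scanning all objects."""
        bucket = self._buckets.get((cluster, letter.lower()), {})
        return sorted((kw, len(ids)) for kw, ids in bucket.items())


index = KeywordIndex()
index.add_article("CCS", 101, ["critical discourse analysis", "media policy"])
index.add_article("CCS", 102, ["critical discourse analysis"])
```

A browse request then only touches one bucket, so response time no longer grows with the total number of keyword or author objects.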

Personal Reading List

A user could create a personal reading list of articles of interest and get direct access to them on future occasions. From any abstract the user can hit the "Add to Reading List" button:

read1bw.gif (9590 bytes)

or use the checkboxes from a Table of Contents. The article is then remembered for quick access subsequently.

read2bw.gif (12651 bytes)

To look at the Reading List, the user goes to the Preferences screen and the Reading List can be seen as a pull-down menu. (The navigation buttons: first, previous, next & last, also work within the context of the reading list, but the user was reminded in case this was initially confusing.) There are corresponding buttons to remove articles from the reading list if they are no longer of interest.

read3bw.gif (7417 bytes)

Selecting to read the "Sooty foot..." article would next show:

sootybw.gif (16290 bytes)

"Mini-contents"

In an attempt to support faster browsing of articles for the user, a "shorthand" version has been created for those titles received in SGML format. These "mini-contents" files contain:

The mini-contents files are linked together with "next" and "previous" links, allowing the user to move quickly between them.

minicont.gif (17546 bytes)

3.4 Bibliographic References

The user can obtain bibliographic references by the following means and for the following items:

Article References Included in Header Data

Abstracts contain a direct link to the list of bibliographic references, where these can be extracted from the text of the article. So in the case of the article viewed from the reading list shown above, the References link would take the user to:

sooty2bw.gif (10165 bytes)

Save the Reference for a Particular Article

From any Abstract, a user can save the information in a standard tagged format, so that it can be put into a database or "reference manager" and used when compiling bibliographies. The record can be obtained either directly as a text file to be downloaded and stored on their PC, or it can be sent via email. This may be more convenient if the user does not wish to, or is unable to, store files locally on the computer being used.

sootyemail.gif (5304 bytes)
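The report does not say which tagged format was used. Purely as an illustration, the sketch below writes a record using RIS-style two-letter tags, which is an assumption; the function name and the sample article are also invented.

```python
def to_tagged_record(article):
    """Render one article as a tagged text record (RIS-style tags assumed)."""
    lines = ["TY  - JOUR"]
    lines += [f"AU  - {name}" for name in article["authors"]]
    lines += [
        f"TI  - {article['title']}",
        f"JO  - {article['journal']}",
        f"VL  - {article['volume']}",
        f"SP  - {article['first_page']}",
        f"PY  - {article['year']}",
        "ER  - ",
    ]
    return "\n".join(lines)


# Invented sample article, for illustration only.
sample = {
    "authors": ["Smith, A.", "Jones, B."],
    "title": "An example article",
    "journal": "Example Journal",
    "volume": "9",
    "first_page": "123",
    "year": "1998",
}
record = to_tagged_record(sample)
```

The same text can be offered as a file download or placed in the body of an email, which matches the two delivery routes described above.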

Save the Bibliographic References from an Article

The entire bibliographic reference section of an article can be downloaded in the same standard tagged format and, as before, obtained either on-screen or by email.

Save Search Results from NetAnswer

The results from a NetAnswer search (one of the three search engines implemented in the application) can also be saved in the same standard format.

netansbw.gif (10645 bytes)

3.5 Linking

The following types of linking were developed and included within the application.

Links within Article or to External URLs

Where an article includes a reference to a Web address, whether tagged explicitly as a URL or found via a string search and parse, the SuperJournal header contains it as an active hypertext link.
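The "string search and parse" case can be sketched as below. This is a minimal illustration, assuming a simple regular expression is good enough to spot Web addresses; the pattern and the link_urls function are hypothetical and would miss addresses written without a scheme.

```python
import re
from html import escape

# Assumed pattern: a scheme followed by any run of non-space, non-quote characters.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']+")


def link_urls(text):
    """Return HTML in which every detected URL is an active hypertext link."""
    out, pos = [], 0
    for match in URL_RE.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:)")
        out.append(escape(text[pos:match.start()]))
        out.append(f'<a href="{escape(url, quote=True)}">{escape(url)}</a>')
        pos = match.start() + len(url)
    out.append(escape(text[pos:]))
    return "".join(out)


linked = link_urls("See http://www.doi.org/white-paper-3.pdf.")
```

Text outside the detected addresses is HTML-escaped so that characters such as "&" or "<" in the article cannot break the generated page.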

Links from Bibliographic References to Medline Abstracts

medlinebw.gif (14436 bytes)

Links between SuperJournal Articles

Where an article's bibliographic references are accessible (i.e. the article is in HTML) and cite articles already loaded in SuperJournal (either HTML or PDF), links have been created to take the user there directly. A virtual catalogue was created of all SuperJournal contents and this was used for matching during data processing. Any successful matches resulted in a "SuperJournal" link being inserted in the citing article. In most cases, the SuperJournal link sits alongside a MEDLINE link, but the advantage is that the full text of the article is available within SuperJournal.

dualrefbw.gif (4198 bytes)
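Matching against a catalogue of loaded content can be sketched as follows. The report does not state which fields were compared, so the key used here (normalised journal title, volume and first page) is an assumption, as are the function names, identifiers and sample data.

```python
def norm(title):
    """Normalise a journal title so minor punctuation differences do not matter."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def build_catalogue(articles):
    """Map (journal, volume, first page) to the identifier of a loaded article."""
    return {(norm(a["journal"]), a["volume"], a["first_page"]): a["id"]
            for a in articles}


def match_reference(catalogue, ref):
    """Return the identifier of the cited article if it is loaded, else None."""
    key = (norm(ref["journal"]), ref["volume"], ref["first_page"])
    return catalogue.get(key)


# Invented sample data, for illustration only.
loaded = [
    {"id": "A1", "journal": "Example Journal", "volume": "9", "first_page": "123"},
    {"id": "A2", "journal": "Another Journal", "volume": "4", "first_page": "55"},
]
catalogue = build_catalogue(loaded)
target = match_reference(
    catalogue, {"journal": "Example journal.", "volume": "9", "first_page": "123"})
```

A non-empty result is the point at which a "SuperJournal" link would be inserted next to the reference in the citing article.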

Links from Bibliographic References to Abstracts (Articles in PDF Format)

While it was straightforward to implement the above linking mechanisms where the article was received in SGML format, it was more involved implementing this for PDF articles. The required functionality was specified by Manchester Computing and developed by the PDF Research Group at Nottingham University. A tailored text extraction program was run against the PDF file, to obtain the text of the reference section. This text was then parsed and passed to the relevant reference database, in this case MEDLINE. The resultant reference identifiers were included in a URL and inserted into a file per article. Each file was then used by another script to enable the link in the PDF article. Note that the list of references was retained and served as HTML. Below is an example of an article in PDF format which has the reference information available:

pdfrefbw.gif (16729 bytes)

Within the PDF article itself, the links were created and indicated by means of a "black box" outline (see below). Note the cursor shape is a pointed finger, denoting the presence of a weblink:

pdfref2bw.gif (80297 bytes)

Forward Chaining

Using the SuperJournal "Table of Contents" files (see SJMC140 – SuperJournal Data Conversion Process for description), the citation matching process produced both links to cited articles within SuperJournal and a list of "cited-by" links, which were stored with the article and could be viewed in addition to an article's abstract data.

The list of links displayed in "cited-by" was produced dynamically, to ensure it used the latest data, rather than relying on retrospective changes being made to each article.

citedby.gif (9945 bytes)

Only references from SGML articles could be processed initially, so the feature was restricted to a subset of the science clusters; however, the cited article itself could be in any format. To deal with backfile problems during take-on, a "candidate citation list" was created. If matches were found during take-on, a file of updates was created. A batched approach was acceptable here.

The cited-by links could be pictured as a Table of Contents, with links to the Abstract information of the citing articles.

The data required two passes, as it would have been overly complex to try and keep each journal "in sync" with other titles. The first pass produced a "candidate list" which was then resolved during the second pass and the "cited-by" links created.
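The two-pass approach can be sketched as below. This is an illustration under assumptions: references are taken to be already reduced to a comparable key, the catalogue is a plain dictionary, and the function names and sample data are invented.

```python
from collections import defaultdict


def first_pass(articles):
    """Pass 1: collect a candidate list of (citing id, reference key) pairs."""
    return [(a["id"], key) for a in articles for key in a["reference_keys"]]


def second_pass(candidates, catalogue):
    """Pass 2: resolve candidates against the catalogue into cited-by lists."""
    cited_by = defaultdict(list)
    for citing_id, key in candidates:
        cited_id = catalogue.get(key)
        if cited_id is not None and citing_id not in cited_by[cited_id]:
            cited_by[cited_id].append(citing_id)
    return dict(cited_by)


# Invented sample data: keys are (journal, volume, first page).
catalogue = {("j1", "9", "1"): "A", ("j2", "4", "55"): "B"}
articles = [
    {"id": "A", "reference_keys": [("j2", "4", "55")]},
    {"id": "B", "reference_keys": [("j3", "1", "10")]},   # cites unloaded content
    {"id": "C", "reference_keys": [("j2", "4", "55"), ("j1", "9", "1")]},
]
links = second_pass(first_pass(articles), catalogue)
```

Because resolution happens only after every candidate has been collected, the order in which journals are loaded does not matter, which avoids keeping titles "in sync".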

External Linking to SuperJournal

Participating sites could build links direct to each journal, e.g. from their OPACs or electronic journal Web pages, so registered users could bypass the "Welcome" screen and get to a journal more directly.

The library issues a call to a script including the journal's ISSN. When the user invokes the call, a login screen is displayed, the user logs in as usual, and then the user sees the list of journal issues available as the first screen once in the application.
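A library-side link of this kind might be built as sketched below. The script address and the "issn" parameter name are hypothetical, since the report does not give them; the check-digit test is the standard ISSN modulus-11 calculation, and the ISSN shown is sample input only.

```python
from urllib.parse import urlencode

# Hypothetical script address; the real one is not given in the report.
BASE = "https://superjournal.example/cgi-bin/journal"


def issn_is_valid(issn):
    """Check the ISSN check digit (weights 8 down to 2, modulus 11)."""
    digits = issn.replace("-", "").upper()
    if len(digits) != 8 or not digits[:7].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))


def journal_link(issn):
    """Build the URL a library page or OPAC record would point to."""
    if not issn_is_valid(issn):
        raise ValueError(f"not a valid ISSN: {issn}")
    return BASE + "?" + urlencode({"issn": issn})


link = journal_link("0378-5955")
```

Validating the ISSN before publishing the link catches transcription errors in catalogue records, so users are not sent to a login screen for a journal that cannot be resolved.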

4. Features Not Implemented

A number of features were explored, but never implemented.

"Flick-book"

There was an identified requirement to mimic the effect of a person flicking through a journal until "something" makes them stop. The trigger that makes a person stop will often be a non-text item, such as an image or an equation, though some may react to headings, etc. A version of this for text can be seen at the Cornix Web site (http://www.vallier.com/tenax/cornix.html). The idea here was to display images (GIF, JPEG, etc.) that were themselves hypertext links. A figure would link to the document containing it, with the positioning within the document depending on the precision of the link. As the user may well have "flicked" past the item of interest, the applet needed to be stoppable and to allow the user to backtrack, an image at a time. It should also allow them to restart, should they not wish to follow the link, or return afterwards. The user was to be able to adjust the notional speed of display, although the actual speed of rendering affected the display speed.

This feature was developed as a Java applet, but suffered from the security restrictions of the language. Once downloaded, the images displayed as expected, but hyperlinking was only possible to another Java applet, in this case a browser applet. It was then not possible to invoke any further external links from that browser applet. See below for an example of the applet and a linked article after download.

fullflickbw.gif (37001 bytes)

The feature was not developed further, though it was suggested that the use of Java servlets might have resolved some of the security restrictions. The feature was also developed in parallel in JavaScript, but this was dogged with problems and was dropped fairly quickly.

HTML from PDF

As many of the articles were submitted only in PDF format, methods of creating other formats were examined. The most obvious was the creation of HTML from PDF, where the PDF had been created from text rather than from an image, e.g. a TIFF page scan.

A PDF-to-text utility was already in use, as part of the data conversion process, and this was initially used in conjunction with a text to HTML utility, to create some "mock-ups" for comment. Work focused on text-only journals. There was some interest amongst Publishers, but the quality was not acceptable.

More recently, a PDF-to-HTML plug-in for Acrobat Exchange called GENUS from ICENI was tested. Though the quality was much higher, it was still not acceptable, especially for scientific papers, where there were problems with tables and annotated images.

The images below show how a table appears in the PDF article, followed by the extracted version, which was unable to recreate the layout of the table.

tablepdf.gif (15473 bytes)

 

tableiceni.gif (6770 bytes)

"Thumbnail Extraction"

Code was developed to extract thumbnails from PDF articles and then convert them to GIF format. They could then be displayed without the need to start Acrobat.

It was felt that this may be useful if a user wanted to locate a particular illustration, but could not recall in which article within which issue it appeared. The alternative would be to download each article file individually. Linking to a position in the PDF document would also have been possible, but the positions (PDFmarks) would need to have been created.

The images could also have been used in the "Flick Book", described earlier, as a way to quickly scan PDF files.

This facility was developed (see below) but not implemented, due to other priorities.

thumbsbw.gif (11861 bytes)

5. Conclusions

The project was able to develop and successfully implement the required functionality with few exceptions, already noted. At various stages, certain enhancements were added to the "wish list" (see SJMC210 – SuperJournal Application Design) but not taken further. They include:

PDF Links – To Dynamic Page Containing Options

The links inserted into the PDF articles actually called a script, rather than going direct to the reference. This was done for the purposes of logging who was invoking the link; however, a natural extension would be to use this method to support multiple linking from the PDF document. The link target could be a page that was produced dynamically, or could offer a choice of external A&I databases, etc.

This concept could be viewed in the same vein as a "metadata response page" discussed in The DOI initiative: current position and way forward by Norman Paskin (http://www.doi.org/white-paper-3.pdf).

PDF References for and from All Titles

The extraction of the references from PDF articles was only implemented for a small number of titles within the project. The mechanism does offer scope for "opening up" larger numbers of files in a production environment. While the conversion of files from PDF to HTML is still imperfect, the extraction of references does work, as it is far simpler.

Bibliographic References, Offer Pick-list, Rather than "All-or-Nothing"

The facility to allow the saving of the reference section for an article should allow the user to select which individual references to download, similar to the way search results can be downloaded from NetAnswer. This would save the user downloading more than required.

Also, other bibliographic formats could be supported, rather than just plain tags, e.g. Papyrus, ProCite, Idealist, EndNote, Reference Manager and BibTeX.

Keywords/Author Indexes

The lists of authors and keywords should be produced in batch mode as part of the data processing and be indexed. The number of authors, in particular, soon becomes onerous to extract in real time.

External Linking

The facility was introduced to enter the application at issue list level, but it could easily be extended to support access at table of contents (TOC) and article levels. The method was coded to support this, as the project was collaborating with SilverPlatter at the time, but the linking was never enabled. Access at TOC level would clearly be appropriate from the email TOC alerts sent to registered users. Access at article level would be an appropriate extension to the personal alert feature, linking the new hits to the full text within SuperJournal.

 

This web site is maintained by epub@manchester.ac.uk
Last modified: April 30, 1999