ETI/UNESCO Meeting Review (posted for D. Lazarus)

A summary of the recent meeting:

'Disseminating Biodiversity Information', cosponsored by ETI/UNESCO,
ESF, and the University of Amsterdam, held in Amsterdam March
24th-27th.

(reported by Dave Lazarus, Museum fuer Naturkunde, Invalidenstrasse 43,
D-10115 Berlin, Germany. email: lazarus@fub46.zedat.fu-berlin.de).

This meeting was attended by approx. 100 people, who primarily
discussed taxonomic databases.  Attendees from western Europe
dominated, but there was a very good representation from outside
western Europe as well, particularly eastern Europe and the US, and
some participants from widely scattered lands - Russia, Australia and
Costa Rica among them.  Unfortunately there were no participants from
the Middle East, Africa or Asia. Most participants were biologists, from
academia or government funded agencies.

After an opening Plenary session, the meeting broke into 2 parallel
sessions of talks, with afternoon breaks to look at posters and see
computer demos of programs.  Workshops (discussion groups) were held on
the 26th on selected general themes.  The results of the meeting will
eventually be published, and it is expected that these will also play a
role in determining future funding priorities within the European
Community.

Many of the talks/demos described or showed individual efforts to
design and input data into taxonomic databases.  These ranged from
simple flat file structures (lists), through hypertext documents
(including Web versions), to relational database systems of varying
complexity. Other
talks concentrated on more specific aspects of taxonomic databases,
such as general data models, representation of synonymies within
databases, and data exchange; while several talks described
international programs that support taxonomic database research and
development.

The majority of talks were concerned with research databases for a
specific biologic group, biological surveys for conservation, or the
management of biological collections in Museums.  While there were a
couple of talks on gene databases and paleontologic databases, the
large majority were focused on living organisms, including a couple on
living culture collections.

Although I cannot claim to have seen more than about half the
presentations, I think that some generalizations are permissible.

First, there was a striking similarity in the design of many of the
databases, in the way they partitioned data, stored it, and in the way
they interacted with the user.  This appears to be convergence on
common design principles rather than imitation.

The most general expression of these common design principles is
probably best seen in the newly developed (indeed, it is still in Beta
test) Museum collections management database 'Zoe'.  This database's
development was headed by Dr. Julian Humphries at Cornell, and was
cosponsored by the American Association of Systematics Collections and
the MUSE project.  Key features include: an internal relational
database engine modelled on the 1992 ASC Workshop Data Model; an easy
to use graphical user interface; separation of these two parts
permitting implementation either as a single user database, or as a
large client server application; ability to handle diverse biologic
groups, and paleontological collections; comprehensive supporting
features, such as loan management, label generation, bibliographies,
and security features. An interface to the WWW is also supported.

Several other databases were described that share many, though not all
of these features, such as the INBIO database (Barrientos, Costa Rica),
BIODAT (Lampe, Bonn), CROFlora (Nikolic et al., Zagreb), and Platypus
(Houston and Shattuck, Australia).  Even at the more abstract level
many similarities were seen, as for example between the CDEFD data
model for botanical collections (Berendsohn et al., Berlin) and the ASC
model used in 'Zoe'.  Thus, given the will to do so, there appears to
be no great technical problem in developing true international
standards for taxonomic databases (more on this theme later!).

Another (predictable) trend at the meeting was interest in publishing
databases on the WWW.  Net aware traditional databases and html based
databases were the most common approaches to this.  One particularly
interesting set of globally linked databases was described by Jack
Leunissen (EMBnet) for DNA sequences.  This system, consisting of more
than 100 globally distributed WWW nodes, uses custom (Unix based)
search software that provides at least some of the relational
capability of a traditional relational database system, but is based on
a primary data set of individual 'flat files' or data tables
distributed across the entire net of nodes.

Most of the software described at the meeting is Microsoft Windows
based, although some Mac, Unix, and DOS software was also described.
Many of the programs are available from the authors, although sometimes
there are some restrictions or technical complications, as many
programs were not originally written with the idea of standalone
portability in mind. BIODAT, a DOS based program best suited to smaller
collections and single users, is however particularly good in this
respect, as it was specifically designed for wide distribution (it is for
example multilingual), and is completely free.  Zoe, although being
marketed as a commercial product, is probably going to be quite
reasonably priced (preliminary estimates are about US$100/year for a
single user licence, on up to $10K/year or more - but only for very
large users, such as some of the world's largest Museums).  Zoe will
also be available in single user form for free - tho without
documentation or tech support.

On the organisational side, the importance of the ASC, CDEFD and the
Taxonomic Database Working Group (TDWG) in setting standards became
apparent.  It is to be hoped that these groups will work even more
closely with each other in the future, and that awareness of their
existence and purpose will become more widespread among taxonomists of
all specialties.  One positive note was struck by the discussion group
on Data Models and Exchange Standards, which concluded that another new
committee on this topic was not needed, and instead recommended that
the efforts of the TDWG be supported and strengthened.  This sort of
cooperation and a willingness to avoid duplication of effort will go a
long way to achieving the goals of universally available quality
database software, easy data exchange, etc.

Some things about the meeting were notable by their absence.  I have
already commented on the absence of participants from Asia and Africa.
And, considering the number of paleontologists there are in the world,
and the strong taxonomic component of their field, their presence at the
meeting was also rather sparse. Hopefully these gaps in attendance will
be remedied at any future meeting.

Also surprising (at least to me) was the absence of any presentation
describing what I would call a 'research taxonomist's database' - that
is, a database that can handle complex relationships between different
taxonomic levels, and between individual elements of different authors'
usage of taxonomic concepts, e.g. 'Xus yus as used by author A is
correct, only photographs 2 and 3 of author B's usage are accepted,
while only the late Pliocene portion of author C's usage is accepted'.
I was unable to find anyone who knew of such software, and found at
least one seemingly knowledgeable expert who thought that such software
simply did not exist. With the possible exception of Paleobank (see
below) I also do not know of any such software.  Anyone with suggestions
please contact me!
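
As an illustration only, the kind of record such a program would need
can be sketched in a few lines of Python. This is not taken from any
existing package: the class name, field names and the rejected entries
are all hypothetical, and a real design would need far more (taxonomic
levels, specimens, literature references). The point is simply that
acceptance is recorded per part of an author's usage, not per name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageJudgement:
    """A reviser's verdict on one part of another author's usage of a name.

    All field names are hypothetical, chosen for this sketch only.
    """
    name: str       # the taxon name as published, e.g. "Xus yus"
    author: str     # the author who used the name
    scope: str      # which part of that usage is judged ("all" = the whole)
    accepted: bool  # True if the reviser accepts this part of the usage


# The accepted rows follow the example in the text; the rejected rows
# are hypothetical filler added to show that partial rejection fits too.
judgements = [
    UsageJudgement("Xus yus", "A", "all", True),
    UsageJudgement("Xus yus", "B", "photograph 1", False),
    UsageJudgement("Xus yus", "B", "photograph 2", True),
    UsageJudgement("Xus yus", "B", "photograph 3", True),
    UsageJudgement("Xus yus", "C", "late Pliocene", True),
    UsageJudgement("Xus yus", "C", "early Pliocene", False),
]


def accepted_scopes(name, rows):
    """Return {author: [accepted scopes]} for the given taxon name."""
    result = {}
    for row in rows:
        if row.name == name and row.accepted:
            result.setdefault(row.author, []).append(row.scope)
    return result
```

Calling accepted_scopes("Xus yus", judgements) then lists, author by
author, only those parts of each usage that the reviser accepts.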

Some closing thoughts and personal opinions:

Anyone needing a collections oriented database should check out the Zoe
package when it becomes available (they are giving a May 1996 estimate
for version 1.0).  If you have only a DOS capable computer, look at
BIODAT.  If you are using a Mac or Unix system, you may need to develop
your own (although Dr. Humphries thinks that Zoe, which was developed
in Microsoft Access, might - just - run acceptably fast as a client to
a server located elsewhere, even using an emulator like Softwindows. It
hasn't yet been tested this way however).  Also try asking about the
several programs already in use at Museums like Berkeley, London, and
Paris.  These latter were not (so far as I know) primarily developed to
be portable standard applications, nor do I know of their availability.
But in my experience, these institutions have been very open and
helpful to anyone inquiring.

One major data model and software package whose absence at the meeting
was regrettable was Paleobank, developed at Kansas by Chang, Kaesler
and others.  Their data model is very comprehensive (I would estimate
it at about 100 tables, including the correlation tables used to
resolve n X n relationships), and covers many areas in more detail than
do the ASC or CDEFD documents.  Paleobank is particularly detailed in
handling nomenclature and publications, and as such may be usable as a
research taxonomist's program (cf comments above).  However, it is not
a collections database, nor does it handle field collecting and
biological data in anything more than a rudimentary fashion.
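
For readers unfamiliar with correlation tables, the following is a
minimal sketch of the idea using Python's built-in sqlite3 module. It
is not Paleobank's actual schema: the table and column names, and the
second name 'Xus zus', are invented for illustration. One name can
appear in many publications and one publication can treat many names,
so the pairing is stored in a third table with one row per pair.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_name (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE publication (id INTEGER PRIMARY KEY, citation TEXT NOT NULL);

-- Correlation table: one row per (name, publication) pairing.
CREATE TABLE name_in_publication (
    name_id        INTEGER NOT NULL REFERENCES taxon_name(id),
    publication_id INTEGER NOT NULL REFERENCES publication(id),
    PRIMARY KEY (name_id, publication_id)
);

INSERT INTO taxon_name VALUES (1, 'Xus yus'), (2, 'Xus zus');
INSERT INTO publication VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO name_in_publication VALUES (1, 1), (1, 2), (2, 2);
""")


def publications_for(name):
    """Return the citations of all publications that use the given name."""
    rows = conn.execute(
        """SELECT p.citation
           FROM publication p
           JOIN name_in_publication np ON np.publication_id = p.id
           JOIN taxon_name n ON n.id = np.name_id
           WHERE n.name = ?
           ORDER BY p.citation""",
        (name,),
    )
    return [row[0] for row in rows]


def names_in(citation):
    """Return all names treated in the publication with the given citation."""
    rows = conn.execute(
        """SELECT n.name
           FROM taxon_name n
           JOIN name_in_publication np ON np.name_id = n.id
           JOIN publication p ON p.id = np.publication_id
           WHERE p.citation = ?
           ORDER BY n.name""",
        (citation,),
    )
    return [row[0] for row in rows]
```

The same correlation table answers the question in both directions,
which is why a comprehensive model accumulates so many of them.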

If, for whatever reason, you are interested in developing new
taxonomically oriented databases you should find out as much as
possible about the ASC, CDEFD, Paleobank and TDWG efforts before
beginning your own design.  It can save you a lot of effort, prevent
design errors, and make your data sets much more compatible with the
rest of the world's taxonomic databases. I for example will be
programming in the next few months a very limited version of a
collections database for the paleontology institute at my Museum using
4th Dimension, a cross platform relational database package. I hope
(eventually) to make it largely ASC/CDEFD/Paleobank/Zoe compatible.

Indeed, anyone doing taxonomic database development should consider
converging on these 'standards'.  These models are at the moment
partially expressed in the rather arcane language of 'data entity
diagrams' etc, rather than entirely in a language of actual data
tables, relational links etc. However, one can easily translate this
terminology into the terms you find in your relational database
programmer's book - both the ASC and CDEFD documents give you the
definitions you need to do this, while Paleobank's documentation is
very complete, with all tables, fields, and even field characteristics
carefully spelled out and illustrated.

A more general observation is that these various 'standards', although
very similar in general approach, nonetheless stress very different
areas of taxonomy database design.  They tend to complement each other,
so I recommend looking at them all.  The ASC model is (for me at least)
an easy read, and is the only one to explicitly combine collections
management with paleontology. The CDEFD model however covers some areas
- such as collections management - in more detail, and also has more
complete field definitions. Paleobank, as pointed out above, has very
detailed modelling of nomenclature and publications. (Some contact
addresses follow at the bottom of this review).

Many people want to use a computer program simply to summarize and/or
publish taxonomic information - a sort of digital taxonomic catalog.
The easiest to use software I have seen for this purpose so far
(although the internal data model may not be very compatible with the
above 'standards') is ETI's Linnaeus package, which is available for
Mac and Windows.  It is really more a hypertext application than a
database. Linnaeus does, as a result, have some limitations, even as a
catalog program, such as having only a rudimentary synonym tracking
capability. It is also not directly WWW aware. One interesting hypertext
alternative that is WWW aware was presented by Ewald Langer
(Tuebingen), who described an html based hypertext system (the Digital
Exsiccate of Fungi, abbreviated here as DEF).  It should be possible to
replace the actual content of his system (tho perhaps not too easily)
with data for another taxonomic group. Such html documents are of
course inherently WWW accessible, but also can be browsed in local mode
with most Web browsers.

Lastly, there are several other, more specialised database programs out
there for taxonomists etc., which alas were not described at the
Amsterdam meeting, and about which I for the most part do not know
enough to comment.  There is for example (tho I have yet to
see a full copy or true demo of it) the Nannostrat program
developed by Dr. Bonnemaison, and being sold by the Micropaleontology
Press/AMNH. Nannostrat, I believe, is more specifically aimed at
micropaleontologists. I would love to hear from others who are
developing/using other paleontology-taxonomy database programs.


- dave lazarus

contacts:

about the meeting and/or the proceedings:

          Dr. Walter Los: los@bio.uva.nl

general working groups:

ASC -     Dr. Julian Humphries: jmh3@cornell.edu
CDEFD -   Dr. Walter Berendsohn: wgb@zedat.fu-berlin.de
TDWG -    Dr. F. Pando (TDWG secretary): pando@ma-rjb.csic.es or
          Dr. Susan Hollis: sue@soton.ac.uk

specific programs:

BIODAT -  Dr. K.-H. Lampe, Zool. Forschungsinst. Alexander Koenig,
          Bonn, Germany. fax: 0049-0228-216-979

CROFlora -Dr. Toni Nikolic, Dept. Botany, Uni. Zagreb, Croatia
          fax: +385-141-9295

DEF -     Dr. Ewald Langer: ewald.langer@uni.tuebingen.de

EMBnet -  Dr. Jack Leunissen: CAOS/CAM Center, Univ. Nijmegen, NL
          fax: +31-243-652-977

INBIO -   Dr. Herbert Barrientos: nbarr@lantana.inbio.ac.cr

Linnaeus- ETI, University of Amsterdam, info@eti.bio.uva.nl

Paleobank-Univ. Kansas Paleontological Inst.
                  http://ukanaix.cc.ukans.edu/~paleo/

Platypus- Dr. William Houston: khouston@anca.gov.au  or
          Dr. Steven Shattuck:steves@ento.csiro.au

Zoe -     Dr. Julian Humphries: jmh3@cornell.edu