Paleo21 - Computers, databases, etc.

The white paper by Norm MacLeod et al. does a good job of outlining issues
of quantification and databases in paleontology, but I would like to see
consideration given also to another facet of computer applications in our
field, one that might be characterized as "embedded intelligence" or
"expertness".  This also bears on an issue raised in the white
papers on biostratigraphy and systematics - namely, the growing shortage of
experts in those fields.

AN  EXAMPLE  PROGRAM

To set the stage for this memo, I'll briefly describe a program called
COREXPERT that we developed at Scripps Institution to help technicians and
students with limited experience to make reliable descriptions of our
sediment cores.  This was an appropriate field for experimentation with an
expert program, because Cenozoic pelagic sediment sequences contain only a
limited number of fossil groups, mineral species, types of sedimentary
structures and bedding contacts, etc.  When data on sediment samples are
being entered using this program, mineral and fossil constituents are
entered from menus, each item of which has a hypertext link to textual and
graphic information to assist in its identification.  Additionally, there
is information on where and under what conditions that constituent commonly
occurs.

As the user inputs percentages of microfossil groups and mineral
constituents in a sample, the program checks a set of rules to see whether
there are any anomalies.  For example, if the user enters an unusually high
or low percentage for a constituent, the program displays a warning and
explains any special circumstances under which such extreme values can
occur.  The user can then adjust the estimated percentage to conform to
expectations, or keep the original entry.
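A range check of this kind might be sketched as follows.  This is only an
illustration, not the COREXPERT code: the names RangeRule, RANGE_RULES and
check_range, the threshold values, and the explanatory note are all
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RangeRule:
    low: float   # percent below which an entry is flagged
    high: float  # percent above which an entry is flagged
    note: str    # explanation shown to the user with the warning


# Hypothetical thresholds and note text, for illustration only.
RANGE_RULES = {
    "sponge spicules": RangeRule(
        low=0.0,
        high=20.0,
        note="(An explanation of when extreme values occur would go here.)",
    ),
}


def check_range(constituent, percent, rules=RANGE_RULES):
    """Return a warning string if percent is outside the usual range, else None."""
    rule = rules.get(constituent)
    if rule is None:
        return None  # no rule recorded for this constituent
    if percent < rule.low or percent > rule.high:
        return (
            f"Warning: {percent}% {constituent} is outside the usual range "
            f"{rule.low}-{rule.high}%. {rule.note}"
        )
    return None
```

The function only reports the anomaly.  The decision to adjust the estimate
or keep it stays with the user, as described above.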

When all the constituent percentages have been entered, the program checks
the input for unexpected and expected associations of constituents.  For
example, if sponge spicules have been recorded the program checks to see
whether radiolarians have also been recorded, since the latter almost
always accompany the former in pelagic sediments.  The user can either
modify the entry to conform to the "expert" expectation, or retain the
anomalous record - which can then become a target for special investigation.
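The association check could be sketched along these lines.  Again this is an
illustration under assumptions rather than the program itself: the rule table
holds only the sponge spicule example given above, and the names
ASSOCIATION_RULES and check_associations are hypothetical.

```python
# Each rule reads: if the first constituent is recorded, the second is
# expected to be recorded as well.
ASSOCIATION_RULES = [
    ("sponge spicules", "radiolarians"),
]


def check_associations(sample, rules=ASSOCIATION_RULES):
    """sample maps constituent name to percent; returns a list of messages."""
    present = {name for name, percent in sample.items() if percent > 0}
    messages = []
    for trigger, expected in rules:
        if trigger in present and expected not in present:
            messages.append(
                f"{trigger} recorded without {expected}; "
                "confirm or revise the entry."
            )
    return messages
```

An empty list means no unexpected association was found.  A non-empty list
is what would be shown to the user, who can revise the entry or keep the
anomalous record for later investigation.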

Besides checking the validity of data entry, the program also allows
intelligent inference of information not explicitly stored in the database.
For example, heteropods are an uncommon sedimentary constituent not stored
in a separate field in our database.  But the program can infer that
heteropods are probably present in the sediment if the sample contains 15%
or more each of foraminifera and coccolithophorids, and the water depth is
less than 4,000 meters.  This rule can be invoked when the database is
searched for occurrences of heteropods.
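As a rough sketch, the heteropod rule could be applied at search time like
this.  I am reading the threshold as 15% or more of each constituent, and the
record layout (the keys "id", "constituents" and "water_depth_m") and the
function names are assumptions made for the example.

```python
def heteropods_probable(constituents, water_depth_m):
    """Apply the inference rule: both groups at 15% or more, depth under 4,000 m."""
    return (
        constituents.get("foraminifera", 0) >= 15
        and constituents.get("coccolithophorids", 0) >= 15
        and water_depth_m < 4000
    )


def search_heteropods(records):
    """Return ids of records where heteropods are inferred to be present."""
    return [
        record["id"]
        for record in records
        if heteropods_probable(record["constituents"], record["water_depth_m"])
    ]


# Hypothetical records, for illustration only.
records = [
    {"id": "A", "water_depth_m": 3200,
     "constituents": {"foraminifera": 40, "coccolithophorids": 30}},
    {"id": "B", "water_depth_m": 4800,
     "constituents": {"foraminifera": 40, "coccolithophorids": 30}},
    {"id": "C", "water_depth_m": 2500,
     "constituents": {"foraminifera": 10, "coccolithophorids": 50}},
]
```

The point is that nothing about heteropods is stored in the records.  The
search evaluates the rule against fields that are stored.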

A non-technical account of this software has been published by Tway and
Riedel in the Jan/Feb issue of  PC AI.  It illustrates how relatively easy
it is to embed some "intelligence" into data entry and retrieval software.

EXTENSION  TO  STRATIGRAPHY AND  PALEOENVIRONMENT

It would be a straightforward matter to incorporate a similar level of
expertise into a system for entry of taxa into a database to be used for
biostratigraphic and paleoenvironmental purposes.  The user could have
access to textual and graphic aids to species identification.  The software
could recognize anomalous presences or absences in each assemblage
recorded, and it could look at samples in a sequence up and down a sediment
column and interpret paleoenvironmental changes and zonal boundaries,
explaining itself as it proceeded.  In this way, expertise could be passed
on from seasoned veterans to inexperienced beginners.

AND  TO  STRATIGRAPHIC  SYNTHESES

It is possible to envisage an expert system of a higher level of complexity
to make "intelligent" stratigraphic syntheses from large groups of
sequences described in terms of fossil occurrences, sedimentological
characters, paleomagnetism, isotopic data and so on.

It has always bothered me that probabilistic methods of stratigraphic
correlation ignore a large amount of information on the  RELATIVE
RELIABILITY  of each earliest and latest occurrence of a taxon in a single
sequence, and similarly each interpretation of a magnetic reversal,
isotopic shift, etc.  

Take, for example, limits of stratigraphic range of a fossil taxon in a
sequence.  The reliability of such an observed limit can range from very
poor to very good, depending on a number of easily determined factors.  The
reliability will be greater for a taxon which is present as tens of
individuals per sample, than for one present as one or two individuals per
sample.   It will be greater for a taxon easily distinguished from all
co-occurring taxa, than for one that is distinguished with difficulty.  It
will be greater for a locality well within the area of distribution of the
taxon, than for a locality near the periphery of the distribution.  And
there are a number of other factors involved in determining this
reliability - see Riedel and Westberg, 1982, DSDP vol. LXVII, p. 289 [Please
excuse the egocentrism].

Rules could be written into a program that calculated an "index of
reliability" for each determination of an earliest and latest occurrence in
each sequence, on the basis of such factors as these.  And an appropriate
set of reliability-determining factors could be developed also for
non-fossil characters used in stratigraphic correlation.  Automatically
calculated indices of reliability could greatly improve the quality of
correlations between numbers of sequences, by using some "intelligent"
weightings rather than applying simple majority rules.
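To make the idea concrete, here is one possible sketch, not a worked-out
method.  It assumes that distinctiveness and position within the area of
distribution are each scored from 0 to 1 by the investigator, that abundance
saturates at ten individuals per sample, and that the factors are simply
multiplied together.  Those choices, and the names reliability_index and
weighted_order, are assumptions made for the example.

```python
def reliability_index(individuals_per_sample, distinctiveness, centrality):
    """Combine three factors into an index between 0 and 1.

    distinctiveness: 0 (hard to distinguish) to 1 (easily distinguished).
    centrality: 0 (periphery of the distribution) to 1 (well within it).
    """
    abundance = min(individuals_per_sample / 10.0, 1.0)
    return abundance * distinctiveness * centrality


def weighted_order(observations):
    """observations: list of (order, weight); return the order with most weight."""
    totals = {}
    for order, weight in observations:
        totals[order] = totals.get(order, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical case: two poorly constrained sequences place event B before
# event A, and one well constrained sequence places A before B.
observations = [
    ("B<A", reliability_index(2, 0.5, 0.5)),
    ("B<A", reliability_index(2, 0.5, 0.5)),
    ("A<B", reliability_index(30, 0.9, 0.9)),
]
preferred = weighted_order(observations)
```

A simple majority rule would accept "B<A" here, two sequences to one.  With
these weights the single well constrained observation carries more weight
than the other two together.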


Bill R.



W. Riedel
Scripps Institution of Oceanography
UCSD
La Jolla, CA 92093-0220

wriedel@ucsd.edu
phone (619) 534-4386
fax   (619) 534-0784

. . . .  May the Force be with you . . . .