THE TASKS OF IDENTIFICATION

Basic Rule and Practical Work

Many different approaches to identification have been described (e.g., Fortuner, 1993) and categorized in various ways (e.g., deterministic vs. probabilistic methods). However, some biologists believe that all of these approaches are avatars of a single basic identification rule. For example, a deterministic approach can be regarded as a probabilistic approach with a probability of 100%.
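The remark about deterministic and probabilistic approaches can be illustrated with a minimal sketch. It assumes a simple product-of-probabilities score, which is only one of several possible probabilistic rules; the taxa, characters, and probability values are hypothetical.

```python
from math import prod

def likelihood(observed, profile):
    """Product of P(state | taxon) over the observed characters.

    A character or state missing from the profile counts as probability 0.
    """
    return prod(profile.get(ch, {}).get(state, 0.0) for ch, state in observed.items())

# Hypothetical deterministic descriptions, written as probabilities of exactly 1.0.
DETERMINISTIC = {
    "species_a": {"tail": {"long": 1.0}, "lip": {"flat": 1.0}},
    "species_b": {"tail": {"short": 1.0}, "lip": {"flat": 1.0}},
}

# Hypothetical probabilistic descriptions of the same two taxa.
PROBABILISTIC = {
    "species_a": {"tail": {"long": 0.9, "short": 0.1}, "lip": {"flat": 1.0}},
    "species_b": {"tail": {"long": 0.2, "short": 0.8}, "lip": {"flat": 1.0}},
}

observed = {"tail": "long", "lip": "flat"}
det = {name: likelihood(observed, p) for name, p in DETERMINISTIC.items()}
prob = {name: likelihood(observed, p) for name, p in PROBABILISTIC.items()}

print(det)  # species_b scores 0.0, which amounts to outright elimination
```

With probabilities restricted to 0 and 1, the same scoring rule behaves like a key: a taxon either survives or is eliminated. With intermediate values, it only ranks the taxa.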

We believe that these competing views can be reconciled by making a clear distinction between the underlying rule and the practical implementation of that rule. To make this distinction clear, we will talk of the practical tasks of identification, as experienced by the people who do identifications, whom we call identifiers.

When an identifier uses a dichotomous key, he is not interested in the fact that the key is, in theory, a probabilistic system with 100% probability. What interests him is eliminating the obviously different species. We call this elimination process a task of identification. Theoreticians are welcome to analyze these tasks and try to implement them according to some fundamental rule, and indeed they must do so if any system is to be optimized. Beyond theories, however, the rules defined must help identifiers accomplish the various tasks needed for identification.

List of Identification Tasks

Most efforts to develop identification aids start and end with the dichotomous key and its computer spin-offs, including the multiple entry key (a.k.a. polyclave, interactive key, etc.). There are many other identification tasks, starting with the most commonly used: instant recognition. Here is a preliminary list of identification tasks:

- recognition
- elimination
- comparison
- browsing
- selection

Each is briefly described below, including a definition taken from Webster's Unabridged Dictionary.

Recognition

"The identification of something as being of a certain kind"
The most direct recognition is to know at first glance the species to which a specimen belongs (as when you see a small four-legged furry animal with whiskers and you say: "This is a cat"). Recognition can also be the identification of a specimen as a member of a promorph or of a particular couplet in a well-thumbed key. A good identification system must take advantage of this very powerful task and allow the user to enter its result and jump to the relevant species or list of species.

Elimination

"To leave out of consideration"
When the identifier is faced with a list of taxa, he often starts by eliminating those that are obviously different from the unknown specimen. The operative word here is "obviously" and we believe that the very drastic task of elimination should stop when the remaining differences are not "obvious." (See Primary Identification Criteria.) With this caveat in mind, the best way to accomplish the task of elimination is to use a dichotomous key (or one of its computer equivalents), as long as the job ends at the first questionable couplet.
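A minimal sketch of elimination that stops at the first questionable couplet is given here. The data layout, the `is_obvious` flag supplied by the identifier, and all taxon and character names are assumptions made for illustration, not part of any existing key.

```python
# Hypothetical data: each taxon maps character names to states.
TAXA = {
    "species_a": {"tail": "long", "stylet": "short"},
    "species_b": {"tail": "short", "stylet": "short"},
    "species_c": {"tail": "long", "stylet": "long"},
}

def eliminate(candidates, observations):
    """Drop taxa contradicted by observations the identifier marked as obvious.

    observations: list of (character, state, is_obvious) tuples, in key order.
    Elimination stops at the first observation that is not obvious.
    """
    remaining = dict(candidates)
    for character, state, is_obvious in observations:
        if not is_obvious:
            break  # first questionable couplet: stop eliminating
        remaining = {
            name: states
            for name, states in remaining.items()
            # A taxon with no recorded state for this character is never eliminated.
            if states.get(character, state) == state
        }
    return remaining

observations = [
    ("tail", "long", True),
    ("stylet", "long", False),  # not obvious: left for a later comparison
]
survivors = eliminate(TAXA, observations)
print(sorted(survivors))  # ['species_a', 'species_c']
```

The second observation would have separated the two survivors, but because it is not obvious it is not allowed to eliminate anything.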

Comparison

"The act of considering the relation between things in order to estimate their similarities or differences"
Another task the identifier may want to do is to measure how similar the unknown is to each of the remaining species, or how dissimilar it is from them. A major difference between comparison and elimination is that no species is left out of consideration, which makes it safer to use differences that are not obvious. There are many different ways to do comparisons, using various coefficients of similarity or dissimilarity (see review in Sneath and Sokal, 1973).
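As one possible illustration of comparison, the sketch below ranks taxa with a simple matching coefficient (the proportion of compared characters on which two descriptions agree). The choice of coefficient is an assumption, and the data and function names are hypothetical.

```python
def simple_matching(unknown, taxon):
    """Fraction of characters recorded in both descriptions on which they agree."""
    shared = [c for c in unknown if c in taxon]
    if not shared:
        return 0.0
    matches = sum(1 for c in shared if unknown[c] == taxon[c])
    return matches / len(shared)

def rank_by_similarity(unknown, taxa):
    """Return (name, score) pairs, most similar first; no taxon is dropped."""
    scores = [(name, simple_matching(unknown, states)) for name, states in taxa.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical descriptions.
TAXA = {
    "species_a": {"tail": "long", "stylet": "short", "lip": "flat"},
    "species_b": {"tail": "short", "stylet": "short", "lip": "round"},
    "species_c": {"tail": "long", "stylet": "long", "lip": "flat"},
}
unknown = {"tail": "long", "stylet": "long", "lip": "flat"}

ranking = rank_by_similarity(unknown, TAXA)
print([name for name, _ in ranking])  # ['species_c', 'species_a', 'species_b']
```

Every taxon stays in the ranking, including the one with a score of zero, which is what distinguishes this task from elimination.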

Browsing

"To glance through [taxon descriptions] in a leisurely way"
In practice, this task is often the last-ditch, frantic attempt of the hapless identifier who has exhausted all the other approaches and failed to reach an answer. By glancing through descriptions, preferably illustrated ones, he hopes to stumble upon something that looks somewhat similar to the unknown. A better type of browsing is directed browsing, where the identifier does not rely on chance alone but gives the system some indications as to what the unknown looks like.

Selection

"To choose in preference of others"
Selection is the opposite of elimination. Instead of starting with a particular group (e.g., a promorph) and eliminating all the members of the group that are obviously different, selection builds up an ad-hoc group using a set of conditions, for example, all the members of promorph P that possess character C. This is a good example of different tasks based on the same basic rule (because selecting some taxa is the same as eliminating the other taxa), but used quite differently by the identifier.
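The equivalence between selecting some taxa and eliminating the others can be shown in a few lines. This is a sketch under the assumption that each taxon is described by the set of characters it possesses; promorph P, character C, and the species names are hypothetical.

```python
# Hypothetical promorph P: taxon -> set of characters it possesses.
PROMORPH_P = {
    "species_a": {"C", "D"},
    "species_b": {"D"},
    "species_c": {"C"},
}

def select(group, character):
    """Build an ad hoc group from the members that possess the character."""
    return {name for name, chars in group.items() if character in chars}

def eliminate(group, character):
    """Start from the whole group and drop the members lacking the character."""
    remaining = set(group)
    for name, chars in group.items():
        if character not in chars:
            remaining.discard(name)
    return remaining

selected = select(PROMORPH_P, "C")
survivors = eliminate(PROMORPH_P, "C")
print(sorted(selected))       # ['species_a', 'species_c']
print(selected == survivors)  # True
```

The two functions return the same set, yet the identifier experiences them differently: one builds a list up from nothing, the other whittles an existing list down.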

The tasks defined above are examples of broad, basic tasks that are very commonly done by identifiers. Each one can be accomplished in different ways. For example, another possible selection task would be to define the habitat of the specimens to be identified and select the species that are the most likely to be present in this habitat. This could be linked to probabilistic algorithms.

Besides tasks that are part of the identification process itself, other tasks need to be supported too, for example data entry or verification of a prospective answer.

The Perfect Identification System

The best multiple entry computerized key in the world is useless when the identifier wants to compare the unknown with a particular species. Identification is not a mechanical process. Each identifier wants to do many different things depending on each particular identification session and depending on the various phases of a session. A typical identification might start when the identifier recognizes a promorph and selects a number of species in this promorph according to a character particularly obvious in the unknown. He might then eliminate some of the selected species using other primary identification criteria. Once the list is narrowed to a few manageable species, he might want to do a comparison using a similarity coefficient. He may end the identification session by verifying that all the characters of the selected species are present in the unknown. Other sessions might require different tasks.
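The typical session described above can be sketched as a chain of small, independent task functions. Everything here is a hypothetical illustration: the catalogue, the promorph labels, the function names, and the use of a simple matching score for the comparison step are all assumptions.

```python
# Hypothetical catalogue: taxon -> (promorph, character states).
CATALOGUE = {
    "species_a": ("P", {"tail": "long", "stylet": "short", "lip": "flat"}),
    "species_b": ("P", {"tail": "short", "stylet": "short", "lip": "round"}),
    "species_c": ("P", {"tail": "long", "stylet": "long", "lip": "flat"}),
    "species_d": ("Q", {"tail": "long", "stylet": "long", "lip": "flat"}),
    "species_e": ("P", {"tail": "long", "stylet": "short", "lip": "round"}),
}

def recognize(catalogue, promorph):
    """Recognition: jump straight to the members of a recognized promorph."""
    return {n: s for n, (p, s) in catalogue.items() if p == promorph}

def select(group, character, state):
    """Selection: keep the members that show the given state."""
    return {n: s for n, s in group.items() if s.get(character) == state}

def eliminate(group, character, state):
    """Elimination: drop members that obviously differ; missing data never eliminates."""
    return {n: s for n, s in group.items() if s.get(character, state) == state}

def similarity(unknown, states):
    """Comparison: proportion of shared characters on which the two agree."""
    shared = [c for c in unknown if c in states]
    if not shared:
        return 0.0
    return sum(unknown[c] == states[c] for c in shared) / len(shared)

def verify(unknown, states):
    """Verification: every character of the candidate is present in the unknown."""
    return all(unknown.get(c) == v for c, v in states.items())

unknown = {"tail": "long", "stylet": "long", "lip": "flat"}

group = recognize(CATALOGUE, "P")        # species_a, _b, _c, _e
group = select(group, "tail", "long")    # species_a, _c, _e
group = eliminate(group, "lip", "flat")  # species_a, _c
best = max(group, key=lambda name: similarity(unknown, group[name]))
confirmed = verify(unknown, group[best])

print(best, confirmed)  # species_c True
```

Because each task is a separate function, another session could call them in a different order or skip some of them, which is the point of a multi-task design.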

We believe that the relative lack of success of computerized identification methods is due in part to the fact that most (all?) restrict the user to a single task.

We advocate a multi-task identification system based on the concept of an expert workstation, as defined in Diederich and Milton (1993) (see bibliography).

_____________________
Sneath, P.H.A., and Sokal, R.R. (1973). Numerical Taxonomy. San Francisco: Freeman.