Report of the
Biological Collections Data Standards Workshop
August 18-24, 1992
TABLE OF CONTENTS
I. Introduction
II. An Information Model for Biological Collections
A. A Context for Information Modeling
B. Components of the Information Model
C. List of Entities
D. Entity Descriptions and Their Attributes
E. List of Relationships
F. Relationship Descriptions
APPENDIX A Information Model - Definitions and Conventions
APPENDIX B Data Standards Workshop Participants
I. Introduction
The Association of Systematics Collections Committee on
Computerization and Networking met at Cornell University in
Ithaca, New York, from August 18-24, 1992. Co-chairs Julian
Humphries and Janet Gomon organized the meeting with
assistance from Elaine Hoagland of ASC. The focus of the
workshop was to initiate the process of establishing data
standards for biological collection information.
The opportunity exists for natural history museums to be at
the forefront of digital access to information about
specimens, taxa, and organismal biology. Conservation
biologists, molecular geneticists, ecologists, functional
morphologists, law enforcement officials -- the list is very
long of those who are potential users of natural history
collection information. But the traditional means of access,
personal visits to collections and long, diligent searches of
paper records, guarantee that most of these people will find
other means of acquiring the information they need or act
with insufficient information. If these researchers could
have simple, rapid access to the huge amount of knowledge
that our collections represent, then natural history
collection institutions could be at the vanguard of
information providers.
In order to support a broad audience accessing our
collections, as well as ensure efficient access for
traditional users, certain guidelines and rules by which we
record information about collections will need to be
established. Workshop participants agreed that previous
standardization efforts had primarily focused on individual
elements of collection information and that no
interdisciplinary model of this information existed. It was
decided that for a cross-disciplinary effort to succeed, a
high-level description of biological collections was
required. Workshop participants undertook this effort by
producing a draft information model for biological
collections, described in this report.
The draft model is being circulated to scientific societies
and made available via Taxacom. Comments are welcome.
II. An Information Model for Biological Collections
A. A Context for Information Modeling
The purpose of most databases is to describe things and
processes in the real world. Descriptions of the real world
are maintained in data structures reflecting the categories
of information that are of interest to users. An
information model is a tool for designing databases and
represents a highly structured description of information in
the real world. It contains specifications for both the
data structures and the rules that must be followed to keep
the data internally consistent within a database. It allows
problem domain experts to describe the domain without
becoming programmers. Because information models describe
the real world, they are independent of the hardware or
software tools used in a particular database implementation.
They evolve or change only as the problem domain changes or
the needs of users change. Information modeling has
proven effective in the development of numerous business and
scientific databases, and information management
professionals have come to regard modeling as the basis for
designing correct, consistent, sharable, and flexible
databases (Fleming & von Halle, 1989).
Models are particularly useful when the problem domain is
large, and when the desired database is intended to serve a
diverse community of users. In such cases, the database
design almost certainly will require input from several
experts. One of the most important reasons for building an
information model is that it allows multiple experts to
contribute to the problem description, and allows the
description to be validated or revised as necessary by
additional reviewers. Once completed, the model also serves
as a communication tool between domain experts and database
programmers.
B. Components of an Information Model
Information models typically have two components: a highly
structured textual description and one or more
illustrations that summarize the model. The illustrations
are called entity-relationship diagrams (ERDs), and depict
the principal information entities of the problem domain, as
well as the interrelationships among them. The structural
components of an information model, including the
diagramming conventions, are defined and explained in
Appendix A.
Figure 1 represents a "first cut" at a high-level
information model for biological collection catalogs. (It
spans four pages, so we encourage readers to remove these
pages and paste them together.) The textual description of
the model follows the summary illustration and contains an
alphabetical list of entities, a definition and description
of each entity (including example data elements in some
cases), an alphabetical list of relationships, and the
relationship descriptions.
C. List of Entities (Supertypes and Subtypes in
Alphabetical Order):
AGENT
ASSOCIATED-COLLECTING-EVENT
CITATION
COLLECTING-EVENT
COLLECTING-EVENT-CITATION
COLLECTING-METHOD
COLLECTING-UNIT
COLLECTING-UNIT-CITATION
COLLECTION
COLLECTOR
DERIVED-OBJECT
DERIVED-OBJECT-TYPE
DETERMINATION
DETERMINER
ELEVATION (delete?)
ESTUARINE-HABITAT-DESCRIPTION
FRESHWATER-HABITAT-DESCRIPTION
GAZETTEER-CITATION
GEOLOGICAL-TIME-SCALE
GEOMETRIC-LOCALITY
HABITAT-DESCRIPTION
HABITAT-TYPE
LINE
LOCALITY
LOCALITY-CITATION
LOT
MARINE-HABITAT-DESCRIPTION
NAME
NAME-USE
NODE
NODE-USE
ORGANIZATION
PALEO-COLLECTING-EVENT
PERSON
PLATFORM
POINT
POLYGON
PREPARATION-TECHNIQUE
PREPARATOR
REAL-WORLD
RECENT-COLLECTING-EVENT
REFERENCE
SPECIMEN
SPECIMEN-ASSOCIATION
SPECIMEN-ASSOCIATION-TYPE
SPECIMEN-COMPONENT
SPECIMEN-COMPONENT-TYPE
STORAGE-LOCATION
STORAGE-MEDIUM
STORAGE-REGIME
TERRESTRIAL-HABITAT-DESCRIPTION
TIME
TRANSACTION
TRANSACTOR
UNSORTED-LOT
D. Entity Descriptions
Entity Name: AGENT (supertype)
Definition:
A PERSON, ORGANIZATION, or PLATFORM that performs actions on
various biological and collection entities.
Subtypes:
PERSON
ORGANIZATION
PLATFORM
Primary Key:
AGENT-ID
Foreign Keys:
Target Entity: none
Data Elements:
Example Data Elements:
AGENT-ID
AGENT-TYPE-CD
Remarks:
The subtype entities are collected into the AGENT supertype
because more than one of the AGENT subtypes may play the
same role in relationships with other entities. For
example, any combination of PERSON, ORGANIZATION and
PLATFORM may serve as a COLLECTOR in a COLLECTING-EVENT.
Entity Name: ASSOCIATED-COLLECTING-EVENT
Definition:
Establishes and describes a (recursive) relationship between
two COLLECTING-EVENTs.
Primary Key:
ASSOCIATED-COLLECTING-EVENT-ID
COLLECTING-EVENT-ID
Foreign Keys:
Target Entity: COLLECTING-EVENT
Data Elements: COLLECTING-EVENT-ID
Target Entity: COLLECTING-EVENT
Data Elements: ASSOCIATED-COLLECTING-EVENT-ID
Example Data Elements:
ASSOCIATED-COLLECTING-EVENT-ID
COLLECTING-EVENT-ID
ASSOCIATION-NAM
An arbitrary name given to an association of
COLLECTING-EVENTs.
Examples: Albatross; International Indian Ocean
Expedition; Bill & Ted's Excellent Adventure.
ASSOCIATION-DESCRIPTION-TXT
Describes how COLLECTING-EVENTs are related.
Examples: Same expedition; same cruise; same locality;
replicate sampling protocol.
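The recursive relationship described above could be sketched as a table whose two key columns both point back to COLLECTING-EVENT. The fragment below is only an illustration using Python's built-in sqlite3 module. The table and column names are assumptions (the model's hyphenated names rewritten with underscores), and the station names are invented sample data.

```python
import sqlite3

# Hypothetical schema: hyphens in the model's names become underscores
# so that they are valid SQL identifiers.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE collecting_event (
    collecting_event_id INTEGER PRIMARY KEY,
    stated_locality_txt TEXT
);
CREATE TABLE associated_collecting_event (
    collecting_event_id            INTEGER NOT NULL
        REFERENCES collecting_event (collecting_event_id),
    associated_collecting_event_id INTEGER NOT NULL
        REFERENCES collecting_event (collecting_event_id),
    association_nam                TEXT,
    association_description_txt    TEXT,
    PRIMARY KEY (collecting_event_id, associated_collecting_event_id)
);
""")

# Invented sample data: two events linked as part of the same cruise.
conn.executemany(
    "INSERT INTO collecting_event VALUES (?, ?)",
    [(1, "Station 12"), (2, "Station 13")],
)
conn.execute(
    "INSERT INTO associated_collecting_event VALUES (?, ?, ?, ?)",
    (1, 2, "Albatross", "same cruise"),
)

# Find every event associated with event 1, and how it is related.
rows = conn.execute(
    "SELECT associated_collecting_event_id, association_description_txt "
    "FROM associated_collecting_event WHERE collecting_event_id = ?",
    (1,),
).fetchall()
```

Both foreign keys target the same entity, which is what makes the relationship recursive. The composite primary key mirrors the two-part key listed for this entity.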
Entity Name: CITATION (supertype)
Definition:
Subtypes:
SPECIMEN-CITATION
LOCALITY-CITATION
GAZETTEER-CITATION
COLLECTING-EVENT-CITATION
ETC.
Primary Key:
CITATION-ID
Foreign Keys:
Target Entity: REFERENCE
Data Elements: REFERENCE-ID
Example Data Elements:
CITATION-ID
CITATION-TYPE-CD
REFERENCE-ID
Entity Name: COLLECTING-EVENT (supertype)
Definition:
The act of collecting zero or more COLLECTING-UNITs at a
particular LOCALITY and TIME.
Subtypes:
PALEO-COLLECTING-EVENT
RECENT-COLLECTING-EVENT
Primary Key:
COLLECTING-EVENT-ID
A unique tag (surrogate key) to allow other entities to
connect to COLLECTING-EVENT.
Foreign Keys:
Target Entity: LOCALITY
Data Elements: LOCALITY-ID
Example Data Elements:
COLLECTING-EVENT-ID
A unique tag to allow other entities to connect to
COLLECTING-EVENT.
COLLECTING-EVENT-TYPE-CD
A classification attribute, indicating the type (kind)
of COLLECTING-EVENT.
STATED-TIME-TXT
Specification of points and/or intervals of time in
absolute or indefinite units, or relative to each
other.
Examples: evening; late 1980s; spring; March or June;
17:56:01, 12 JUN 1992; three hours after second dredge
haul.
STATED-LOCALITY-TXT
Original statement (literal quotation) of the location
of the COLLECTING-EVENT.
COLLECTING-EVENT-COMMENTS-TXT
(Unstructured text.)
Example: Nothing was collected at this station; dredge
not adequately cleaned between hauls.
Entity Name: COLLECTING-EVENT-CITATION
Definition:
A subtype of CITATION, which associates a REFERENCE work
with a COLLECTING-EVENT.
Primary Key:
CITATION-ID
Foreign Keys:
Target Entity: COLLECTING-EVENT
Data Elements: COLLECTING-EVENT-ID
Example Data Elements:
Entity Name: COLLECTING-METHOD
Definition:
A description of the technique(s), equipment, and/or
process(es) by which COLLECTING-UNITs are collected.
Primary Key:
COLLECTING-METHOD-ID
Foreign Keys: none
Example Data Elements:
COLLECTING-METHOD-ID
COLLECTING-METHOD-DESCRIPTION-TXT
Entity Name: COLLECTING-UNIT (supertype)
Definition:
An operational sample, typically, but not necessarily, a
SPECIMEN, LOT, or DERIVED-OBJECT from a single
COLLECTING-EVENT. A COLLECTING-UNIT may be an UNSORTED-LOT, LOT,
SPECIMEN, SPECIMEN-COMPONENT, or DERIVED-OBJECT.
Subtypes:
UNSORTED-LOT
LOT
SPECIMEN
SPECIMEN-COMPONENT
DERIVED-OBJECT
Primary Key:
COLLECTING-UNIT-ID
Foreign Keys:
Target Entity: COLLECTING-EVENT
Data Elements: COLLECTING-EVENT-ID
Target Entity: HABITAT-DESCRIPTION
Data Elements: HABITAT-DESCRIPTION-ID
Example Data Elements:
COLLECTING-UNIT-ID
COLLECTING-UNIT-TYPE-CD
NUMBER-OF-ITEMS-CNT
HABITAT-DESCRIPTION-ID
(or SPECIMEN-RELATED-HABITAT-DESCRIPTION-ID if a
separate entity is used for finer-scale descriptions)
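One way to picture the supertype/subtype arrangement is as a base record that every subtype extends, so that a subtype's key is the same value as its COLLECTING-UNIT-ID. The following Python sketch assumes dataclass inheritance as the implementation device. The class names, the type-code strings, and the sample values are illustrative assumptions, not part of the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectingUnit:
    """Supertype: attributes shared by every kind of collecting unit."""
    collecting_unit_id: int
    collecting_event_id: int
    collecting_unit_type_cd: str          # hypothetical codes, e.g. "LOT"
    number_of_items_cnt: Optional[int] = None
    habitat_description_id: Optional[int] = None


@dataclass
class Lot(CollectingUnit):
    """Subtype: LOT-ID is the same value as COLLECTING-UNIT-ID."""
    unsorted_lot_id: Optional[int] = None


@dataclass
class Specimen(CollectingUnit):
    """Subtype: SPECIMEN-ID is the same value as COLLECTING-UNIT-ID."""
    lot_id: Optional[int] = None
    specimen_sex_cd: Optional[str] = None


# Invented sample data: a lot of 25 items and one specimen sorted from it.
lot = Lot(10, 1, "LOT", number_of_items_cnt=25)
specimen = Specimen(11, 1, "SPECIMEN", lot_id=lot.collecting_unit_id)
```

Because both subtypes are also CollectingUnit records, anything that accepts a COLLECTING-UNIT (a DETERMINATION, for example) can accept either one.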
Entity Name: COLLECTION
Definition:
An assemblage of biological specimens maintained by an
educational or research institution to be used as a research
resource in biological systematics and/or ecology.
Primary Key:
COLLECTION-ID
Foreign Keys:
ORGANIZATION-ID
Example Data Elements:
COLLECTION-ID
COLLECTION-NAM
ORGANIZATION-ID
Entity Name: COLLECTOR
Definition:
A person, platform, or organization (AGENT) that collects
biological collecting units.
This isn't a real entity; it duplicates the AGENT entity
Primary Key:
COLLECTOR-ID
Foreign Keys: none
Example Data Elements:
Entity Name: DERIVED-OBJECT
Definition:
A COLLECTING-UNIT of one or more observations, images, or
representations of UNSORTED-LOTs, LOTs, SPECIMENs,
ASSOCIATED SPECIMENs, or SPECIMEN-COMPONENTs.
Primary Key:
DERIVED-OBJECT-ID (=COLLECTING-UNIT-ID)
Foreign Keys:
Target Entity: COLLECTING-UNIT
Data Elements: ORIGINAL-COLLECTING-UNIT-ID
Target Entity: DERIVED-OBJECT-TYPE
Data Elements: DERIVED-OBJECT-TYPE-ID
Example Data Elements:
DERIVED-OBJECT-ID (=COLLECTING-UNIT-ID)
DERIVED-OBJECT-TYPE-ID
ORIGINAL-COLLECTING-UNIT-ID (=COLLECTING-UNIT-ID)
Entity Name: DERIVED-OBJECT-TYPE
Definition:
The class of DERIVED-OBJECTs obtained from a COLLECTING-
UNIT.
Primary Key:
DERIVED-OBJECT-TYPE-ID
Foreign Keys:
Target Entity:
Data Elements:
Example Data Elements:
DERIVED-OBJECT-TYPE-ID
DERIVED-OBJECT-TYPE-NAME
DERIVED-OBJECT-TYPE-DESCRIPTION-TXT
Entity Name: DETERMINATION
Definition:
An association of a COLLECTING-UNIT with a NAME by an
authority at a particular time.
Primary Key:
COLLECTING-UNIT-ID
NAME-ID
DETERMINATION-DAT
Foreign Keys:
Target Entity: COLLECTING-UNIT
Data Elements: COLLECTING-UNIT-ID
Target Entity: NAME
Data Elements: NAME-ID
Example Data Elements:
COLLECTING-UNIT-ID
NAME-ID
DETERMINER-ID (=AGENT-ID)
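Because the key combines COLLECTING-UNIT-ID, NAME-ID, and DETERMINATION-DAT, a single COLLECTING-UNIT can carry a history of determinations. The sketch below shows, under that reading, how the most recent determination might be selected. The function name and the sample identifiers and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Determination:
    collecting_unit_id: int
    name_id: int
    determination_dat: date
    determiner_id: int  # same value as an AGENT-ID

    @property
    def key(self):
        """Composite primary key as listed for the entity."""
        return (self.collecting_unit_id, self.name_id, self.determination_dat)


def latest_determination(determinations, collecting_unit_id):
    """Return the most recent determination for a unit, or None if it has none."""
    candidates = [d for d in determinations
                  if d.collecting_unit_id == collecting_unit_id]
    return max(candidates, key=lambda d: d.determination_dat, default=None)


# Invented sample data: unit 10 was re-determined in 1991.
history = [
    Determination(10, 7, date(1985, 5, 1), 3),
    Determination(10, 8, date(1991, 9, 15), 4),
    Determination(11, 7, date(1990, 1, 1), 3),
]
current = latest_determination(history, 10)
```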
Entity Name: DETERMINER
Definition: An authority (person) that makes the
association between a taxon name and a
COLLECTING-UNIT
Primary Key: DETERMINER-ID (=AGENT-ID)
Foreign Keys: none
Example Data Elements:
Same as PERSON
Entity Name: ELEVATION
Definition: Is this a real entity; or is it the Z
coordinate of any point contained in one of
the 3 subtypes of geometric locality subtypes
(point, line, or polygon)? -- expressed as
deviation from sea-level in meters;
different from STATED-ELEVATION, which may be
in other units, or expressed as a range.
Primary Key:
Foreign Keys:
Example Data Elements:
Entity Name: ESTUARINE-HABITAT-DESCRIPTION
Definition:
Primary Key:
HABITAT-DESCRIPTION-ID
Foreign Keys: none
Example Data Elements:
Entity Name: FRESHWATER-HABITAT-DESCRIPTION
Definition:
Primary Key:
HABITAT-DESCRIPTION-ID
Foreign Keys: none
Example Data Elements:
Entity Name: GAZETTEER-CITATION
Definition: A named place.
Primary Key:
GAZETTEER-CITATION-ID
Foreign Keys:
Target Entity: GAZETTEER-CITATION
Data Elements: CONTAINING-GAZETTEER-CITATION-ID
Example Data Elements:
GAZETTEER-CITATION-ID
CONTAINING-GAZETTEER-CITATION-ID
PLACE-NAM
PLACE-TYPE-CD
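The self-referencing foreign key lets each named place point to the place that contains it. As an illustration only, the Python fragment below walks that chain upward. The dictionary layout, the function name, the place-type codes, and the use of None for the top-level place are assumptions made for this sketch.

```python
def containing_places(gazetteer, citation_id):
    """List the names of the places that contain the given place, innermost first."""
    chain = []
    seen = set()
    current = gazetteer[citation_id]["containing_id"]
    while current is not None:
        if current in seen:
            raise ValueError("containment cycle detected")
        seen.add(current)
        chain.append(gazetteer[current]["place_nam"])
        current = gazetteer[current]["containing_id"]
    return chain


# Sample rows keyed by a hypothetical GAZETTEER-CITATION-ID.
gazetteer = {
    1: {"place_nam": "United States", "place_type_cd": "COUNTRY", "containing_id": None},
    2: {"place_nam": "New York", "place_type_cd": "STATE", "containing_id": 1},
    3: {"place_nam": "Ithaca", "place_type_cd": "CITY", "containing_id": 2},
}
places = containing_places(gazetteer, 3)
```

A structure like this captures only containment. Places that overlap or adjoin one another would need a different representation.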
Entity Name: GEOLOGICAL-TIME-SCALE
Definition:
Primary Key:
Foreign Keys:
Target Entity:
Data Elements:
Example Data Elements:
Entity Name: GEOMETRIC-LOCALITY
Definition:
A geographical location defined (in a standard coordinate
system) by a point, line, or polygon.
Primary Key: LOCALITY-ID
Foreign Keys: none
Example Data Elements:
Entity Name: HABITAT-DESCRIPTION
Definition:
A description of the physical and biotic environment at the
time and place of a COLLECTING-EVENT.
Subtypes:
TERRESTRIAL-HABITAT-DESCRIPTION
FRESHWATER-HABITAT-DESCRIPTION
MARINE-HABITAT-DESCRIPTION
ESTUARINE-HABITAT-DESCRIPTION
ETC.
Primary Key:
HABITAT-DESCRIPTION-ID
Foreign Keys: none
Example Data Element Groups:
Geomorphic Features
Physical/Chemical Measurements
Sampling Scale
Meteorological Data
Life Zone
Vegetation Type
Soil Type
Entity Name: HABITAT-TYPE
Definition: An
Primary Key:
HABITAT-TYPE-ID
Foreign Keys:
Example Data Elements:
HABITAT-TYPE-ID
HABITAT-TYPE-NAM
HABITAT-TYPE-DEFINITION-TXT
Entity Name: LINE
Definition:
A complex data type including latitude, longitude, and
direction.
(Are we really talking about a chain of points, each with a
three dimensional location? We might consult David Mark
about this.)
Primary Key:
LOCALITY-ID
Foreign Keys: none
Example Data Elements: (as represented in a GIS)
Entity Name: LOCALITY
Definition:
A geographically mappable location.
Primary Key:
LOCALITY-ID
Foreign Keys:
Example Data Elements:
LOCALITY-ID
LOCALITY-NAME
Entity Name: LOCALITY-CITATION
Definition:
An association between a reference and a locality.
Primary Key:
CITATION-ID
Foreign Keys:
Target Entity: LOCALITY
Data Elements: LOCALITY-ID
Example Data Elements:
Entity Name: LOT
Definition:
A COLLECTING-UNIT of one or more individuals of the same
taxon from the same COLLECTING-EVENT.
Primary Key:
LOT-ID (=COLLECTING-UNIT-ID)
Foreign Keys:
Target Entity: UNSORTED-LOT
Data Elements: UNSORTED-LOT-ID (=COLLECTING-UNIT-ID)
Example Data Elements:
COLLECTING-UNIT-ID
AGE-RANGE
SIZE-RANGE
DISTRIBUTION-OF-DUPLICATES
STAGES
SEXES
Entity Name: MARINE-HABITAT-DESCRIPTION
Definition:
Primary Key:
MARINE-HABITAT-DESCRIPTION-ID (=HABITAT-DESCRIPTION-ID)
Foreign Keys: none
Example Data Elements:
Entity Name: NAME
Definition:
Literature citation for the original source of a name.
Primary Key:
NAME-ID
Foreign Keys:
Target Entity: REFERENCE
Data Elements: REFERENCE-ID
Example Data Elements:
NAME-ID
NAME-NAM
AUTHOR-NAM
PAGE-CNT
REFERENCE-ID
(Original-Rank) not needed if we assume every name is
represented by a name use (classification).
Entity Name: NAME-USE
Definition:
The application of a NAME (including a synonym) to a NODE-
USE.
Primary Key:
REFERENCE-ID
NODE-ID
NAME-ID
Foreign Keys:
Target Entity:
Data Elements:
Example Data Elements:
RANK-CD
NAME-STATUS-CD
Entity Name: NODE
Definition:
A vertex on a directed acyclic or cyclic graph; a tip or
junction in a classification hierarchy or overlapping
classification hierarchies.
Primary Key:
NODE-ID
Foreign Keys: none
Example Data Elements:
NODE-ID
Entity Name: NODE-USE
Definition:
Any particular instance (a published reference) of a NODE is
a placeholder for the name of a taxon in a classification
hierarchy.
Primary Key:
REFERENCE-ID
NODE-ID
Foreign Keys:
Target Entity: NODE-USE
Data Elements: PARENT-NODE-REFERENCE-ID
PARENT-NODE-ID
Example Data Elements:
REFERENCE-ID
NODE-ID
PARENT-NODE-REFERENCE-ID
PARENT-NODE-ID
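Since a NODE-USE is keyed by both REFERENCE-ID and NODE-ID, the same NODE can sit under different parents in different published classifications. The following sketch illustrates that idea. The dictionary layout, the function name, the identifiers, and the use of None for a root are assumptions, not part of the model.

```python
def lineage(node_uses, reference_id, node_id):
    """Return NODE-IDs from the given node up to the root within one classification."""
    path = []
    seen = set()
    key = (reference_id, node_id)
    while key is not None:
        if key in seen:
            raise ValueError("cycle detected in classification")
        seen.add(key)
        path.append(key[1])
        key = node_uses[key]  # (PARENT-NODE-REFERENCE-ID, PARENT-NODE-ID) or None
    return path


# Invented data: references 100 and 200 place node 3 under different parents.
node_uses = {
    (100, 1): None,
    (100, 2): (100, 1),
    (100, 3): (100, 2),
    (200, 1): None,
    (200, 5): (200, 1),
    (200, 3): (200, 5),
}
path_in_100 = lineage(node_uses, 100, 3)
path_in_200 = lineage(node_uses, 200, 3)
```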
Entity Name: ORGANIZATION
Definition:
Primary Key:
ORGANIZATION-ID (=AGENT-ID)
Foreign Keys: none
Example Data Elements:
ORGANIZATION-ID (=AGENT-ID)
ACRONYM-CD
DEPARTMENT-NAM
INSTITUTION-NAM
Entity Name: PALEO-COLLECTING-EVENT
Definition:
Primary Key:
PALEO-COLLECTING-EVENT-ID (=COLLECTING-EVENT-ID)
Foreign Keys:
Target Entity: LOCALITY
Data Elements: LOCALITY-ID
Example Data Elements:
Bed
Stated-Age
Dating-Method
Lithology
Entity Name: PERSON
Definition:
Primary Key:
PERSON-ID (=AGENT-ID)
Foreign Keys: none
Example Data Elements:
PERSON-ID (=AGENT-ID)
LAST-NAM
FIRST-NAM
TITLE-TXT
Entity Name: PLATFORM
Definition:
Primary Key:
PLATFORM-ID (=AGENT-ID)
Foreign Keys:
Target Entity: (We may wish to record a relationship between
PLATFORM and ORGANIZATION.)
Data Elements:
Example Data Elements:
PLATFORM-ID (=AGENT-ID)
PLATFORM-NAM
Entity Name: POINT
Definition:
Latitude, longitude, elevation.
Primary Key:
LOCALITY-ID
Foreign Keys: none
Example Data Elements:
LATITUDE-DEGREES
LATITUDE-MINUTES
LATITUDE-SECONDS
LATITUDE-DIRECTION
LONGITUDE-DEGREES
LONGITUDE-MINUTES
LONGITUDE-SECONDS
LONGITUDE-DIRECTION
ACCURACY
ELEVATION
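Coordinates stored as separate degree, minute, second, and direction elements are often converted to a single signed value for mapping. The Python sketch below shows one such conversion. The function name, the direction codes "N", "S", "E", and "W", and the convention that south and west are negative are assumptions. The model itself does not specify them.

```python
def to_decimal_degrees(degrees, minutes, seconds, direction):
    """Convert degrees/minutes/seconds plus a direction code to signed decimal degrees."""
    if direction not in ("N", "S", "E", "W"):
        raise ValueError("direction must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    # Assumed convention: southern latitudes and western longitudes are negative.
    return -value if direction in ("S", "W") else value


# Invented sample point.
latitude = to_decimal_degrees(42, 30, 0, "N")
longitude = to_decimal_degrees(76, 30, 0, "W")
```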
Entity Name: POLYGON
Definition:
A complex data type with an array of latitude and longitude
data.
Primary Key:
LOCALITY-ID
Foreign Keys: none
Example Data Elements:
Accuracy
Entity Name: PREPARATION-TECHNIQUE
Definition:
An action taken to develop or preserve a specimen that
departs from, or goes beyond the standard processing of a
specimen. A preparation technique may produce a derived
object.
Primary Key:
COLLECTING-UNIT-ID
PREPARATION-TECHNIQUE-ID
Foreign Keys:
Target Entity: COLLECTING-UNIT
Data Elements: COLLECTING-UNIT-ID
Target Entity: PREPARATOR (=AGENT)
Data Elements: PREPARATOR-ID (=AGENT-ID)
Example Data Elements:
COLLECTING-UNIT-ID
PREPARATION-TECHNIQUE-ID
PREPARATOR-ID
Entity Name: PREPARATOR
Definition:
An agent that performs a preparation technique.
Primary Key:
PREPARATOR-ID (=AGENT-ID)
Foreign Keys: none
Example Data Elements:
PREPARATOR-ID (=AGENT-ID)
Entity Name: REAL-WORLD
Definition: Biological entities subject to a COLLECTING-EVENT,
DETERMINATION or other actions.
Primary Key:
REAL-WORLD-ID
Foreign Keys:
Target Entity:
Data Elements:
Example Data Elements:
Entity Name: RECENT-COLLECTING-EVENT
Definition:
Primary Key:
RECENT-COLLECTING-EVENT-ID (=COLLECTING-EVENT-ID)
Foreign Keys:
Target Entity: HABITAT-DESCRIPTION
Data Elements: RECENT-HABITAT-DESCRIPTION-ID
(=HABITAT-DESCRIPTION-ID)
Example Data Elements:
Entity Name: REFERENCE
Definition:
A published or unpublished work that contains information
about a biological collection entity.
Examples: an article, book, occasional report, field notes,
map, catalog, etc.
Primary Key:
REFERENCE-ID
Foreign Keys:
None
Example Data Elements:
REFERENCE-KIND-CODE
REFERENCE-DESCRIPTION-TEXT
REFERENCE-AUTHOR-NAME
REFERENCE-PUBLISHED-DATE
REFERENCE-TITLE-TEXT
REFERENCE-JOURNAL-NAME
REFERENCE-VOLUME-IDENTIFIER
REFERENCE-ISSUE-IDENTIFIER
REFERENCE-PAGES-IDENTIFIER
REFERENCE-PUBLISHER-NAME
REFERENCE-PUBLISHER-CITY-NAME
Entity Name: SPECIMEN
Definition:
A COLLECTING-UNIT of one or more individuals or parts of
individuals from a single COLLECTING-EVENT.
Primary Key:
SPECIMEN-ID (=COLLECTING-UNIT-ID)
Foreign Keys:
Target Entity: LOT
Data Elements: LOT-ID (=COLLECTING-UNIT-ID)
Example Data Elements:
SPECIMEN-ID
SPECIMEN-SEX-CD
SPECIMEN-PHENOLOGY-CD
SPECIMEN-LIFE-STAGE-CD
SPECIMEN-STANDARD-LENGTH-DMSN
SPECIMEN-AGE-QTY
Entity Name: SPECIMEN-ASSOCIATION
Definition:
An association of COLLECTING-UNITs, whether SPECIMENs, LOTs,
or UNSORTED-LOTs.
Primary Key:
COLLECTING-UNIT-ID
ASSOCIATED-COLLECTING-UNIT-ID
Foreign Keys:
Target Entity: COLLECTING-UNIT
Data Elements: COLLECTING-UNIT-ID
Target Entity: ASSOCIATED-COLLECTING-UNIT
(=COLLECTING-UNIT)
Data Elements: ASSOCIATED-COLLECTING-UNIT-ID
(=COLLECTING-UNIT-ID)
Target Entity: SPECIMEN-ASSOCIATION-TYPE
Data Elements: SPECIMEN-ASSOCIATION-TYPE-ID
Example Data Elements:
Entity Name: SPECIMEN-ASSOCIATION-TYPE
Definition:
A kind of SPECIMEN-ASSOCIATION.
Primary Key:
SPECIMEN-ASSOCIATION-TYPE-ID
Foreign Keys: none
Example Data Elements:
Entity Name: SPECIMEN-CITATION
Definition:
Associates a specimen and a reference work, and describes
the relationship between them.
Subtypes:
TYPE-SPECIMEN-CITATION
Primary Key:
CITATION-ID
Foreign Keys:
Target Entity: SPECIMEN
Data Elements: SPECIMEN-ID
Example Data Elements:
SPECIMEN-CITATION-KIND-CODE
SPECIMEN-CITATION-PAGE-ID
SPECIMEN-CITATION-PLATE-ID
SPECIMEN-CITATION-FIGURE-ID
SPECIMEN-CITATION-REMARKS-TXT
Entity Name: SPECIMEN-COMPONENT
Definition:
A COLLECTING-UNIT of individual organisms or parts of
individual organisms from a single COLLECTING-EVENT.
Primary Key:
SPECIMEN-COMPONENT-ID (=COLLECTING-UNIT-ID)
Foreign Keys:
Target Entity: SPECIMEN (=COLLECTING-UNIT)
Data Elements: SPECIMEN-ID (=COLLECTING-UNIT-ID)
Target Entity: SPECIMEN-COMPONENT-TYPE
Data Elements: SPECIMEN-COMPONENT-TYPE-ID
Example Data Elements:
SPECIMEN-COMPONENT-ID (=COLLECTING-UNIT-ID)
SPECIMEN-ID (=COLLECTING-UNIT-ID)
SPECIMEN-COMPONENT-TYPE-ID
Entity Name: SPECIMEN-COMPONENT-TYPE
Definition:
A class or kind of SPECIMEN-COMPONENT.
Primary Key:
SPECIMEN-COMPONENT-TYPE-ID
Foreign Keys: none
Example Data Elements:
SPECIMEN-COMPONENT-TYPE-ID
SPECIMEN-COMPONENT-TYPE-NAM
Entity Name: STORAGE-LOCATION
Definition:
The physical location of a COLLECTING-UNIT in relation to a
COLLECTION.
Primary Key:
STORAGE-LOCATION-ID
Foreign Keys: none
Example Data Elements:
STORAGE-LOCATION-ID
STORAGE-LOCATION-DESCRIPTION (a local issue)
Entity Name: STORAGE-MEDIUM
Definition:
The physical medium, container, or mount used for the
STORAGE-REGIME of COLLECTING-UNITs.
Primary Key:
STORAGE-MEDIUM-ID
Foreign Keys: none
Example Data Elements:
STORAGE-MEDIUM-ID
STORAGE-MEDIUM-CD
e.g.: sheet, packet, box, jar, case, shelf, vial
Entity Name: STORAGE-REGIME
Definition:
The physical location, kind of storage and availability of a
COLLECTING-UNIT in relation to a COLLECTION.
Primary Key:
STORAGE-REGIME-ID
Foreign Keys:
Target Entity: STORAGE-LOCATION
Data Elements: STORAGE-LOCATION-ID
Target Entity: STORAGE-MEDIUM
Data Elements: STORAGE-MEDIUM-ID
Example Data Elements:
STORAGE-REGIME-ID
STORAGE-LOCATION-ID
STORAGE-MEDIUM-ID
START-DAT
END-DAT
AUTHORITY-NAM
COMMENTS-TXT
Entity Name: TERRESTRIAL-HABITAT-DESCRIPTION
Definition:
Primary Key:
Foreign Keys:
Target Entity: RECENT-COLLECTING-EVENT
Data Elements: RECENT-COLLECTING-EVENT-ID
(=COLLECTING-EVENT-ID)
Example Data Elements:
Entity Name: TIME
Definition:
Translation of Stated-Time into instant(s) and/or
duration(s) of calendar time, where such can be made
unambiguously.
Primary Key:
TIME-ID
Foreign Keys: (Some confusion exists around cardinalities in the
relationship between TIME and COLLECTING-EVENT,
and therefore, placement of the foreign key)
Example Data Elements:
CLOCK-TIME-QTY (point)
START-CLOCK-TIME-QTY
END-CLOCK-TIME-QTY
CLOCK-TIME-QUALIFIER-CD
TIME-ZONE-CD (determined from location and date)
DATE
START-DAT
END-DAT
DATE-QUALIFIER-CD
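The definition limits TIME to those stated times that can be translated unambiguously. A minimal sketch of that rule follows. It recognizes only one layout, the one used in the STATED-TIME-TXT example "17:56:01, 12 JUN 1992", and returns None for everything else. The function name, the pattern, and the month table are assumptions made for illustration.

```python
import re
from datetime import datetime

# Month abbreviations as they appear in the stated-time example.
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

# Matches text such as "17:56:01, 12 JUN 1992" and nothing looser.
PATTERN = re.compile(r"^(\d{1,2}):(\d{2}):(\d{2}), (\d{1,2}) ([A-Z]{3}) (\d{4})$")


def translate_stated_time(stated_time_txt):
    """Return a datetime when the stated time is unambiguous, otherwise None."""
    match = PATTERN.match(stated_time_txt.strip())
    if match is None:
        return None
    hour, minute, second, day, month_txt, year = match.groups()
    month = MONTHS.get(month_txt)
    if month is None:
        return None
    try:
        return datetime(int(year), month, int(day),
                        int(hour), int(minute), int(second))
    except ValueError:  # e.g. day 31 in a 30-day month
        return None


instant = translate_stated_time("17:56:01, 12 JUN 1992")
vague = translate_stated_time("late 1980s")
```

Returning None rather than guessing keeps the original STATED-TIME-TXT as the only record for vague statements such as "evening" or "spring".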
Entity Name: TRANSACTION
Definition:
An action that changes the location, physical custody or
ownership status of a COLLECTING-UNIT.
Primary Key:
Foreign Keys: none
Example Data Elements:
Entity Name: TRANSACTOR
Definition:
An agent that performs TRANSACTIONs.
Primary Key:
TRANSACTOR-ID (=AGENT-ID)
Foreign Keys: none
Example Data Elements:
Entity Name: UNSORTED-LOT
Definition:
A COLLECTING-UNIT of mixed taxa from a single COLLECTING-
EVENT.
Primary Key:
UNSORTED-LOT-ID (=COLLECTING-UNIT-ID)
Foreign Keys:
Target Entity: ORIGINAL-UNSORTED-LOT (=COLLECTING-UNIT)
Data Elements: ORIGINAL-UNSORTED-LOT-ID (=COLLECTING-UNIT-
ID)
Example Data Elements:
UNSORTED-LOT-ID (=COLLECTING-UNIT-ID)
ORIGINAL-UNSORTED-LOT-ID (=COLLECTING-UNIT-ID)
E. LIST OF ENTITY RELATIONSHIPS
ENTITY                     relationship                    ENTITY
AGENT                      writes, edits, or publishes     REFERENCE
COLLECTING-EVENT           refers to                       COLLECTING-EVENT
COLLECTING-EVENT           takes place at                  LOCALITY
COLLECTING-EVENT           involves                        REAL-WORLD
COLLECTING-EVENT           occurs in                       TIME
COLLECTING-EVENT-CITATION  refers to                       COLLECTING-EVENT
COLLECTING-METHOD          is used in                      COLLECTING-EVENT
COLLECTING-UNIT            results from                    COLLECTING-EVENT
COLLECTING-UNIT            refers to                       COLLECTING-UNIT
COLLECTING-UNIT            has                             STORAGE-REGIME
COLLECTING-UNIT-CITATION   refers to                       COLLECTING-UNIT
COLLECTOR (=AGENT)         participates in                 COLLECTING-EVENT
DERIVED-OBJECT             is derived from                 COLLECTING-UNIT
DERIVED-OBJECT-TYPE        validates                       DERIVED-OBJECT
DETERMINATION              is made on                      COLLECTING-UNIT
DETERMINATION              involves                        NAME-USE
DETERMINER (=AGENT)        makes                           DETERMINATION
GAZETTEER-CITATION         is contained within             GAZETTEER-CITATION
LOCALITY                   is closest to/contained within  GAZETTEER-CITATION
LOCALITY                   is bounded by                   GEOLOGICAL-TIME-SCALE
LOCALITY-CITATION          refers to                       LOCALITY
LOT                        is sorted from                  UNSORTED-LOT
NAME                       is based on                     COLLECTING-UNIT
NAME                       is used in                      NAME-USE
NAME-USE                   applies to                      NODE-USE
NODE-USE                   is contained in                 (parent-)NODE-USE
NODE-USE                   involves                        NODE
PALEO-COLLECTING-EVENT     is bounded by                   GEOLOGICAL-TIME-SCALE
PREPARATION-TECHNIQUE      is applied to                   COLLECTING-UNIT
PREPARATOR (=AGENT)        uses                            PREPARATION-TECHNIQUE
RECENT-COLLECTING-EVENT    is described by                 HABITAT-DESCRIPTION
REFERENCE                  contains                        CITATION
REFERENCE                  establishes                     NAME
REFERENCE                  contains                        NAME-USE
REFERENCE                  contains                        NODE-USE
SPECIMEN                   is sorted from                  LOT
SPECIMEN-ASSOCIATION-TYPE  validates                       SPECIMEN-ASSOCIATION
SPECIMEN-COMPONENT-TYPE    validates                       SPECIMEN-COMPONENT
SPECIMEN-COMPONENT         is derived from                 SPECIMEN
STORAGE-REGIME             uses                            STORAGE-MEDIUM
STORAGE-REGIME             has                             STORAGE-LOCATION
TRANSACTION                involves                        COLLECTING-UNIT
TRANSACTOR (=AGENT)        ?                               COLLECTION
TRANSACTOR (=AGENT)        participates in                 TRANSACTION
F. RELATIONSHIP DESCRIPTIONS
Relationship: AGENT <> REFERENCE
Each AGENT writes, edits, or publishes zero to many
REFERENCEs.
Each REFERENCE is written, edited, or published by one to
many AGENTs.
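Cardinality statements of this kind can be read as checks on the data. As a hedged illustration, the Python fragment below tests the "one to many" side of this relationship by listing any REFERENCE that has no AGENT. The function name, the pair-list representation, and the sample identifiers are assumptions and not part of the model.

```python
def references_without_agents(reference_ids, agent_reference_pairs):
    """Return REFERENCE ids that break the 'one to many AGENTs' rule by having none."""
    referenced = {reference_id for _agent_id, reference_id in agent_reference_pairs}
    return sorted(set(reference_ids) - referenced)


# Invented sample data: (agent_id, reference_id) pairs.
# Reference 100 has two agents, 101 has one, and 102 has none.
pairs = [(1, 100), (2, 100), (1, 101)]
missing = references_without_agents([100, 101, 102], pairs)
```

No check is needed in the other direction, because an AGENT with zero REFERENCEs is allowed.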
Relationship: COLLECTING-EVENT <> COLLECTING-EVENT
Each COLLECTING-EVENT is associated with zero to many
COLLECTING-EVENTS.
Relationship: COLLECTING-EVENT <> LOCALITY
Each COLLECTING-EVENT takes place at one and only one
LOCALITY.
Each LOCALITY may be the subject of one or more
COLLECTING-EVENTs.
Relationship: COLLECTING-EVENT <> REAL-WORLD
Each COLLECTING-EVENT involves one and only one REAL-WORLD.
Each (the?) REAL-WORLD is subject to zero to many
COLLECTING-EVENTs.
Relationship: COLLECTING-EVENT <> TIME
Each COLLECTING-EVENT occurs at, or spans a range of, one
and only one TIME.
Each point or range of TIME may encompass zero to many
COLLECTING-EVENTS.
Relationship: COLLECTING-EVENT-CITATION <> COLLECTING-EVENT
Each COLLECTING-EVENT-CITATION refers to one and only one
COLLECTING-EVENT.
Each COLLECTING-EVENT is referenced in zero to many
COLLECTING-EVENT-CITATIONS.
Relationship: COLLECTING-METHOD <> COLLECTING-EVENT
Each COLLECTING-METHOD is used in zero to many COLLECTING-
EVENTS.
Each COLLECTING-EVENT is conducted using one to many
COLLECTING-METHODS.
Relationship: COLLECTING-UNIT <> COLLECTING-EVENT
Each COLLECTING-UNIT results from one and only one
COLLECTING-EVENT.
Each COLLECTING-EVENT produces zero to many COLLECTING-
UNITs.
In several disciplines there is no distinction between a
"biological" individual and specimen; a single individual
can be collected only once. In other disciplines (e.g.,
Botany, Vertebrate Paleontology, Invertebrate Zoology), it
is possible to collect only part of an individual in one
event, and return to collect additional parts or samples
later. Whether a SPECIMEN may be collected in more than one
COLLECTING-EVENT will depend on the definition (scope) of
COLLECTING-UNIT.
Relationship: COLLECTING-UNIT <> COLLECTING-UNIT
Each COLLECTING-UNIT is associated with zero to many
COLLECTING-UNITS (through the SPECIMEN-ASSOCIATION entity).
Relationship: COLLECTING-UNIT <> STORAGE-REGIME
Each COLLECTING-UNIT has one and only one STORAGE-REGIME.
Each STORAGE-REGIME may involve zero to many COLLECTING-
UNITS.
Relationship: COLLECTING-UNIT-CITATION <> COLLECTING-UNIT
Each COLLECTING-UNIT-CITATION refers to one and only one
COLLECTING-UNIT.
Each COLLECTING-UNIT may be referred to by zero to many
COLLECTING-UNIT-CITATIONs.
Relationship: COLLECTOR (=AGENT) <> COLLECTING-EVENT
Each COLLECTOR participates in one or more COLLECTING-
EVENTs.
Each COLLECTING-EVENT is conducted by one or more
COLLECTORs.
Relationship: DERIVED-OBJECT <> COLLECTING-UNIT
Each DERIVED-OBJECT is made from one and only one
COLLECTING-UNIT.
Each COLLECTING-UNIT produces zero to many DERIVED-OBJECTS.
Relationship: DERIVED-OBJECT-TYPE <> DERIVED-OBJECT
Each DERIVED-OBJECT-TYPE validates zero to many DERIVED-
OBJECTs.
Each DERIVED-OBJECT is validated by one and only one
DERIVED-OBJECT-TYPE.
Relationship: DETERMINATION <> COLLECTING-UNIT
Each DETERMINATION is made on one and only one COLLECTING-UNIT.
Each COLLECTING-UNIT may have one to many DETERMINATIONs.
Relationship: DETERMINATION <> NAME-USE
Each DETERMINATION involves one and only one NAME-USE.
Each NAME-USE is involved in zero to many DETERMINATIONS.
Relationship: DETERMINER (=AGENT, PERSON) <> DETERMINATION
Each DETERMINER makes zero to many DETERMINATIONs.
Each DETERMINATION is made by one and only one DETERMINER.
Relationship: GAZETTEER-CITATION <> GAZETTEER-CITATION
Each GAZETTEER-CITATION is contained in one and only one
GAZETTEER-CITATION.
Each GAZETTEER-CITATION contains zero to many
GAZETTEER-CITATIONs.
This relationship represents a set-subset hierarchy among
named places in a gazetteer. A strict hierarchical
representation may not be adequate for our purposes as it
represents only the "contains/is contained in" relationship
between places. The relationship does not encompass the
"overlaps" and "is adjacent to" relationships.
Relationship: LOCALITY <> GAZETTEER-CITATION
Each LOCALITY is closest to or contained within one and only
one GAZETTEER-CITATION.
Each GAZETTEER-CITATION is close to or contains zero to many
LOCALITYs.
The "close to" and "contains" relationships are semantically
distinct, not mutually exclusive, and the related objects
may be different. Therefore, they should be represented in
the model as two distinct relationships between the same
entities (this is allowed).
Relationship: LOCALITY <> GEOLOGICAL-TIME-SCALE
Each LOCALITY is bounded by zero to many (points on the)
GEOLOGICAL-TIME-SCALE.
Each point on the GEOLOGICAL-TIME-SCALE limits (the range
of) zero to many LOCALITYs.
Relationship: LOCALITY-CITATION <> LOCALITY
Each LOCALITY is cited in zero to many LOCALITY-CITATIONs.
Each LOCALITY-CITATION refers to one and only one LOCALITY.
Relationship: LOT <> UNSORTED-LOT
Each LOT is sorted from zero or one UNSORTED-LOT.
Each UNSORTED-LOT is sorted into zero to many LOTs.
Relationship: NAME <> COLLECTING-UNIT
Each NAME is based on zero to many COLLECTING-UNITS.
Each COLLECTING-UNIT is the type specimen(s) for zero to
many NAMEs. (Names based on the same type are objective
synonyms.)
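The parenthetical remark suggests a simple derivation: NAMEs that share a type COLLECTING-UNIT are objective synonyms. A minimal Python sketch of that grouping follows. The function name, the pair-list input, and the identifiers are illustrative assumptions.

```python
from collections import defaultdict


def objective_synonym_groups(name_type_pairs):
    """Group NAME ids by type COLLECTING-UNIT id, keeping only shared types."""
    names_by_type = defaultdict(set)
    for name_id, collecting_unit_id in name_type_pairs:
        names_by_type[collecting_unit_id].add(name_id)
    # A type with a single name has no synonyms to report.
    return {unit_id: sorted(name_ids)
            for unit_id, name_ids in names_by_type.items()
            if len(name_ids) > 1}


# Invented sample data: (name_id, collecting_unit_id) pairs.
# Names 1 and 2 are both based on unit 500, and name 3 is based on unit 501.
groups = objective_synonym_groups([(1, 500), (2, 500), (3, 501)])
```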
Relationship: NAME <> NAME-USE
Each NAME is used in one to many NAME-USEs.
Each NAME-USE uses one and only one NAME.
Relationship: NAME-USE <> NODE-USE
Each NAME-USE applies to one and only one NODE-USE.
Each NODE-USE has one to many NAME-USEs.
Relationship: NODE-USE <> PARENT-NODE-USE
Each NODE-USE is contained in one and only one
(parent-)NODE-USE.
Each (parent-)NODE-USE contains zero to many NODE-USEs.
Relationship: NODE-USE <> NODE
Each NODE-USE involves one and only one NODE.
Each NODE may be involved in one to many NODE-USEs.
Relationship: PALEO-COLLECTING-EVENT <> GEOLOGICAL-TIME-SCALE
Each PALEO-COLLECTING-EVENT is bounded by zero to many
(points on the) GEOLOGICAL-TIME-SCALE.
Each (point on the) GEOLOGICAL-TIME-SCALE limits (the range
of) zero to many PALEO-COLLECTING-EVENTs.
Relationship: PREPARATION-TECHNIQUE <> COLLECTING-UNIT
Each PREPARATION-TECHNIQUE is performed on one and only one
COLLECTING-UNIT.
Each COLLECTING-UNIT is prepared in one to many PREPARATION-
TECHNIQUEs.
Relationship: PREPARATOR (=AGENT) <> PREPARATION-TECHNIQUE
Each PREPARATOR (=AGENT) uses one and only one PREPARATION-
TECHNIQUE.
Each PREPARATION-TECHNIQUE is performed by one and only one
PREPARATOR.
Relationship: RECENT-COLLECTING-EVENT <> HABITAT-DESCRIPTION
Each RECENT-COLLECTING-EVENT is described by zero to many
HABITAT-DESCRIPTIONs.
Each HABITAT-DESCRIPTION describes one and only one RECENT-
COLLECTING-EVENT.
Relationship: REFERENCE <> CITATION
Each REFERENCE contains one to many CITATIONs.
Each CITATION is contained in one and only one REFERENCE.
Relationship: REFERENCE <> NAME
Each REFERENCE contains zero to many NAMEs.
Each NAME is contained in one and only one REFERENCE.
Relationship: REFERENCE <> NAME-USE
Each REFERENCE contains zero to many NAME-USEs.
Each NAME-USE is contained in one and only one REFERENCE.
Relationship: REFERENCE <> NODE-USE
Each REFERENCE contains zero to many NODE-USEs.
Each NODE-USE is contained in one and only one REFERENCE.
Relationship: SPECIMEN <> LOT
Each SPECIMEN is sorted from zero to one LOT.
Each LOT is sorted into zero to one SPECIMENs.
Relationship: SPECIMEN-ASSOCIATION-TYPE <> SPECIMEN-ASSOCIATION
Each SPECIMEN-ASSOCIATION-TYPE validates zero to many
SPECIMEN-ASSOCIATIONs.
Each SPECIMEN-ASSOCIATION is validated by one and only one
SPECIMEN-ASSOCIATION-TYPE.
Relationship: SPECIMEN-COMPONENT-TYPE <> SPECIMEN-COMPONENT
Each SPECIMEN-COMPONENT-TYPE validates zero to many
SPECIMEN-COMPONENTs.
Each SPECIMEN-COMPONENT is validated by one and only one
SPECIMEN-COMPONENT-TYPE.
Relationship: SPECIMEN-COMPONENT <> SPECIMEN
Each SPECIMEN-COMPONENT is derived from one and only one
SPECIMEN.
Each SPECIMEN is represented by zero to many
SPECIMEN-COMPONENTs.
Relationship: STORAGE-REGIME <> STORAGE-MEDIUM
Each STORAGE-REGIME uses one and only one STORAGE-MEDIUM.
Each STORAGE-MEDIUM is used in zero to many STORAGE-REGIMEs.
Relationship: STORAGE-REGIME <> STORAGE-LOCATION
Each STORAGE-REGIME has one and only one STORAGE-LOCATION.
Each STORAGE-LOCATION may be involved in one to many
STORAGE-REGIMEs.
Relationship: TRANSACTION <> COLLECTING-UNIT
Each TRANSACTION involves one to many COLLECTING-UNITs.
Each COLLECTING-UNIT is involved in one to many
TRANSACTIONs.
Relationship: TRANSACTOR (=AGENT) <> COLLECTION
TRANSACTOR ? COLLECTION
Relationship: TRANSACTOR (=AGENT) <> TRANSACTION
Each TRANSACTOR participates in zero to many TRANSACTIONs.
Each TRANSACTION is conducted by zero to many TRANSACTORs.
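The cardinality phrases used throughout the relationship
descriptions in this section can be read as (minimum, maximum)
pairs. The following Python fragment is a minimal sketch of that
reading, assuming only the phrasings that appear in this section;
the names CARDINALITY and cardinality_of are hypothetical and are
not part of the model.

```python
# Hypothetical lookup: cardinality phrase -> (minimum, maximum).
# A maximum of None stands for "many".
CARDINALITY = {
    "zero or one": (0, 1),
    "zero to one": (0, 1),
    "one and only one": (1, 1),
    "zero to many": (0, None),
    "one to many": (1, None),
}


def cardinality_of(sentence):
    """Return (min, max) for the first known phrase in a relationship sentence."""
    # Collapse line breaks and repeated spaces so wrapped sentences still match.
    text = " ".join(sentence.lower().split())
    for phrase, bounds in CARDINALITY.items():
        if phrase in text:
            return bounds
    raise ValueError("no known cardinality phrase in: " + sentence)
```

For example, "Each NAME is used in one to many NAME-USEs." yields a
minimum of one and no fixed maximum.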
APPENDIX A
INFORMATION MODEL - DEFINITIONS AND CONVENTIONS
Information models typically have two components, a structured
textual description, and one or more illustrations that summarize
the model. The illustrations are called entity-relationship
diagrams (ERDs), and depict the principal entities of the problem
domain, as well as the interrelationships among them.
Figure 2 illustrates the two basic components of an ERD:
entities (boxes) and relationships (the lines between the boxes).
An entity is a grouping of people, places, physical objects,
events, actions, or even concepts that can be described by the
same information categories or attributes. Example entities from
biological collections might include SPECIMEN, COLLECTING-EVENT,
and LOCALITY. Individual things or events, etc., that comprise an
entity are called instances (not depicted because a model focuses
on generalities). In a relational database implementation of
an information model, entities and attributes translate into data
tables and their associated data fields; instances translate into
the rows of a table.
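The translation from entity to table can be sketched as follows.
This is an illustration only, assuming Python's built-in sqlite3
module as the relational database; the table name, column names,
and the two rows of data are invented for the example and are not
taken from the model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The LOCALITY entity becomes a table; its attributes become columns.
conn.execute(
    "CREATE TABLE locality ("
    " locality_id INTEGER PRIMARY KEY,"
    " country_name TEXT,"
    " description_text TEXT)"
)

# Each instance of the entity becomes one row (invented sample data).
conn.executemany(
    "INSERT INTO locality VALUES (?, ?, ?)",
    [
        (1, "Australia", "roadside near Canberra"),
        (2, "United States", "creek bank near Ithaca"),
    ],
)

rows = conn.execute(
    "SELECT locality_id, country_name FROM locality ORDER BY locality_id"
).fetchall()
```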
The attributes of an entity are the place holders for data. They
are important not only because they flesh out the information of
interest, but because restrictions on the values attributes may
or must contain ultimately affect the scope and definition of
the entity. The first restriction on attributes is that they
must be single-valued at any given time. If an attribute
legitimately may have many values simultaneously (that is, if a
list of values needs to be recorded at a given time for a given
instance), the supposed attribute probably isn't an attribute in
the context of an information model, but is rather another entity
or a relationship. The convention followed in information
modeling is to remove multi-valued attributes into their own
entities.
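As a rough illustration of this convention, the Python sketch
below moves a list-valued attribute out of each instance and into
rows of a separate entity. The function name split_multivalued
and the PREPARATION attribute are hypothetical, chosen only for
the example.

```python
def split_multivalued(instances, key, attribute):
    """Move a list-valued attribute out of each instance into a new entity.

    Returns (instances without the attribute, rows of (key value, single value)).
    """
    parents, children = [], []
    for inst in instances:
        # The original instance keeps everything except the multi-valued attribute.
        parents.append({k: v for k, v in inst.items() if k != attribute})
        # Each value becomes its own single-valued instance in the new entity.
        for value in inst.get(attribute, []):
            children.append((inst[key], value))
    return parents, children


# Invented sample data: two specimens with one or more preparations each.
specimens = [
    {"SPECIMEN-ID": 1, "PREPARATION": ["skin", "skull"]},
    {"SPECIMEN-ID": 2, "PREPARATION": ["fluid"]},
]
parents, preparations = split_multivalued(specimens, "SPECIMEN-ID", "PREPARATION")
```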
Another important aspect of entity attributes is that, for each
entity, a combination of attributes must be identified or chosen
that distinguishes every instance in the entity. Every instance
must be uniquely identified. This rule grounds the model in
reality. If information is to be recorded about a thing in the
real world, the thing must be identifiable. The identifying
information and descriptive information must always be
associated. The identifying attributes of an entity are called
its primary key. Any attribute that is part of the primary key
must always be populated with data for every valid instance; it
can never be blank.
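The two rules for a primary key (no blank values, no duplicates)
can be checked mechanically. The following Python sketch assumes
instances are held as dictionaries; the function name
check_primary_key and the attribute names in the sample data are
hypothetical.

```python
def check_primary_key(instances, key_attributes):
    """Return a list of (index, problem) for blank or duplicate key values."""
    problems, seen = [], set()
    for i, inst in enumerate(instances):
        key = tuple(inst.get(a) for a in key_attributes)
        if any(v is None or v == "" for v in key):
            # Every attribute of the primary key must be populated.
            problems.append((i, "blank key attribute"))
        elif key in seen:
            # The combination of key attributes must be unique.
            problems.append((i, "duplicate key"))
        else:
            seen.add(key)
    return problems


# Invented sample data with a two-attribute (composite) key.
sample = [
    {"COLLECTION-CODE": "A", "CATALOG-NUMBER": 1},
    {"COLLECTION-CODE": "B", "CATALOG-NUMBER": 1},
    {"COLLECTION-CODE": "A", "CATALOG-NUMBER": 1},
    {"COLLECTION-CODE": "A", "CATALOG-NUMBER": None},
]
problems = check_primary_key(sample, ["COLLECTION-CODE", "CATALOG-NUMBER"])
```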
Repeated interactions or associations among the things in the
real world are represented in the model as relationships.
Relationships are depicted as lines between entities. Note that
while relationships connect entities in the diagram, they are
understood to represent possible associations between instances
contained in the entities. Relationships are instance-to-
instance, rather than group-to-group. Example relationships from
biological collections might include (expressed in words):
1) a SPECIMEN is collected in a COLLECTING-EVENT, and
2) a COLLECTING-EVENT occurs at a LOCALITY.
Relationships between instances are not always one to one. For
example, a single COLLECTING-EVENT may produce more than one
SPECIMEN. The symbols on the line next to an entity (a circle,
cross-hatch, or crow's foot) depict the cardinality of the
relationship; the number of individuals in the entity that may be
related to a single individual in the other entity (at the
opposite end of the line), zero, one, or many, respectively.
Note that relationships are directional, and the symbols at
opposite ends of a line are usually different. (The words
describing a relationship may also change with direction.) The
outer symbol (closer to the entity) indicates the maximum, and
may be either a cross-hatch or a crow's foot. A cross-hatch
indicates that, at most, one instance in that entity may be
related to a single instance in the other. A crow's foot
indicates that many instances may be related to a given instance
in the other entity. In Figure 2 A, the relationship between the
COLLECTING-EVENT and SPECIMEN entities is one-to-many; a single
COLLECTING-EVENT may produce many SPECIMENs. From the
perspective of the SPECIMEN entity, the relationship is many-to-
one; a SPECIMEN is collected in one and only one COLLECTING-
EVENT.
The inner symbol (further from the entity) may be either a zero
or a one, and indicates the minimum number of individuals that
must be present in that entity for a given instance in the other.
For example, the relationship between the COLLECTING-EVENT and
SPECIMEN entities (Figure 2 A) indicates that a COLLECTING-EVENT
must exist for every SPECIMEN. In other words, the existence of
a SPECIMEN is predicated on the existence of a COLLECTING-EVENT.
The zero by the SPECIMEN entity indicates that there may be no
corresponding instance for a given COLLECTING-EVENT. This
implies that there is a reason for keeping information about a
COLLECTING-EVENT even though no SPECIMENs were collected. (This
is just a pedagogical example and not intended to bias the reader
one way or the other about the correctness of this
representation.)
A relationship may also be one-to-one, or many-to-many. One-to-
one relationships are relatively uncommon, except in the
depiction of supertype-subtype hierarchies, discussed below.
Many-to-many relationships are more common, and can be depicted
in two ways, depending on whether or not additional information
needs to be captured about the relationship beyond its existence.
If only the existence of the relationship needs to be recorded, a
many-to-many relationship can be drawn as in Figure 2 B. The
relationship between the SPECIMEN and REFERENCE entities is many-
to-many. Crow's feet are present at both ends of the line. A
single SPECIMEN may be cited by many REFERENCEs. A single REFERENCE
can cite many SPECIMENs. If the relationship itself needs to be
described, the relationship should be drawn as an associative
entity, (a box with a diamond in it) as in Figure 2 C, and
labeled so that it can be populated with descriptive attributes.
Note that all many-to-many relationships imply an associative
entity, whether or not one is drawn in the ERD.
The placement of the crow's feet around the associative entity in
a many-to-many relationship may seem counterintuitive at first,
but can be explained as follows. An associative entity records
each instance of a relationship. If an individual SPECIMEN is
cited in several REFERENCEs, each of these individual
relationships is
recorded in the associative entity, CITATION. The "relationship"
between the SPECIMEN entity and the CITATION entity is then one-
to-many. The same one-to-many "relationship" exists between the
REFERENCE and CITATION entities. Note that there are no zeroes
by the main entities; each and every instance of the relationship
(an instance in the CITATION entity) is existence-dependent on
each of the "target" entities. A CITATION, in this case, cannot
exist without both a SPECIMEN and REFERENCE.
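The role of the associative entity can be sketched in Python as
follows. This is an illustration under stated assumptions: the
identifiers and the PAGE attribute are invented, and the helper
is_existence_dependent is a hypothetical name.

```python
# Invented instances of the two main entities.
specimens = {"S1", "S2"}
references = {"R1", "R2"}

# Each row of the associative entity CITATION records one SPECIMEN-REFERENCE
# pair, plus an attribute describing the relationship itself (a page number).
citations = [
    {"SPECIMEN-ID": "S1", "REFERENCE-ID": "R1", "PAGE": 12},
    {"SPECIMEN-ID": "S1", "REFERENCE-ID": "R2", "PAGE": 3},
    {"SPECIMEN-ID": "S2", "REFERENCE-ID": "R1", "PAGE": 14},
]


def is_existence_dependent(citation):
    """A CITATION is valid only if both of its target instances exist."""
    return (citation["SPECIMEN-ID"] in specimens
            and citation["REFERENCE-ID"] in references)


# Reading the many-to-many relationship in each direction.
refs_citing_s1 = sorted(c["REFERENCE-ID"] for c in citations
                        if c["SPECIMEN-ID"] == "S1")
specimens_in_r1 = sorted(c["SPECIMEN-ID"] for c in citations
                         if c["REFERENCE-ID"] == "R1")
```

Seen from either main entity, the CITATION rows form a simple
one-to-many relationship, which is why the crow's feet sit next
to the associative entity.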
A recursive relationship is used to indicate relationships
between individuals of the same entity. The TAXON entity (Figure
2 D) shows a recursive relationship. A TAXON may contain other
TAXONs (also an illustration of how the entity naming convention
has priority over grammar in modeling). Recursive relationships
are particularly important in biology because they model
hierarchies (individuals that are related to each other in a
potentially large and indefinite tree or network structure).
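A recursive relationship can be sketched in Python by letting
each instance point to another instance of the same entity. The
structure below is an assumption made for illustration (a simple
parent pointer per TAXON); the names PARENT, lineage, and
contained are hypothetical.

```python
# Each TAXON instance maps to the TAXON that contains it (None at the root).
PARENT = {
    "Animalia": None,
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Aves": "Chordata",
}


def lineage(taxon):
    """Follow the recursive relationship upward, from a TAXON to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def contained(taxon):
    """Return the TAXONs directly contained in the given TAXON."""
    return sorted(t for t, p in PARENT.items() if p == taxon)
```

A single parent pointer models a tree; a network structure would
need an associative entity instead.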
The last kind of relationship commonly used in information
modeling is that depicting a superset - subset relationship
between entities known as supertypes and subtypes, respectively.
The supertype-subtype concept is used to portray important
commonalities and distinctions between groups of similar things
in the real world. An entity (supertype) may have zero to many
subtypes. A subtype inherits all of the attributes of its
supertype, but also has additional attributes. The additional
attributes of one subtype are different from the additional
attributes of another subtype.
A common example from the business world concerns employees,
which may be full-time or part-time. A business typically
records certain information (attributes) about all employees, but
then records additional information for full-time employees that
it does not for part-time employees, and vice versa. (Some
authors refer to the entities in a supertype-subtype relationship
as an "Is A" hierarchy; e.g., a part-time employee "is a" kind of
employee.) Subtypes may or may not be mutually exclusive.
The supertype-subtype relationship places two requirements on the
attributes of related entities. First, the primary key of a
subtype must be exactly the same as its supertype. If EMPLOYEE-
ID is the primary key of EMPLOYEE, then the primary key of PART-
TIME-EMPLOYEE must also be EMPLOYEE-ID. This restriction
maintains a one-to-one relationship between instances of the
subtype and supertype, and therefore ensures the inheritance of
attributes. Second, the supertype must contain one or more
classification variables to indicate explicitly that a given
instance of the supertype is also none, one, or many of the
possible subtypes. (A supertype can also be partitioned into more
than one hierarchy of subtypes.) Using the business example
again, an attribute called EMPLOYEE-TYPE-CODE, with the possible
values 'Full-Time' and 'Part-Time', could be added to the
EMPLOYEE entity to serve as the classification variable.
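The two requirements can be sketched in Python using the employee
example. The data and the attribute names other than EMPLOYEE-ID
and EMPLOYEE-TYPE-CODE are invented, and full_record is a
hypothetical helper.

```python
# Supertype EMPLOYEE: attributes common to all employees, keyed by EMPLOYEE-ID.
# EMPLOYEE-TYPE-CODE is the classification variable.
employee = {
    101: {"EMPLOYEE-NAME": "A", "EMPLOYEE-TYPE-CODE": "Full-Time"},
    102: {"EMPLOYEE-NAME": "B", "EMPLOYEE-TYPE-CODE": "Part-Time"},
}

# Subtypes share the supertype's primary key and add their own attributes.
full_time_employee = {101: {"ANNUAL-SALARY-AMOUNT": 30000}}
part_time_employee = {102: {"HOURLY-WAGE-AMOUNT": 12}}

SUBTYPES = {"Full-Time": full_time_employee, "Part-Time": part_time_employee}


def full_record(employee_id):
    """Join a supertype instance with its subtype instance via the shared key."""
    base = employee[employee_id]
    extra = SUBTYPES[base["EMPLOYEE-TYPE-CODE"]][employee_id]
    # The subtype inherits the supertype's attributes and adds its own.
    return {**base, **extra}
```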
The diagramming conventions used here for super- and subtypes are
illustrated in Figure 2 E. A single cross-hatch with the words
"Is A" to the side indicates that the entity (or entities) below
is a subtype of the entity above. The branching relationship
line in the employee example illustrates a mutually exclusive
relationship; in this case, mutually exclusive subtypes.
Completely separate lines would have been used if the subtypes
were not mutually exclusive.
Although it is possible to build models that don't include
subtypes, they are useful for indicating commonalities and
distinctions between entities. This is especially important when
one subtype can participate in relationships that the supertype
or other subtypes cannot.
The last modeling tool that should be understood is the use of
primary key attributes in the specification of a relationship.
Recall two points made above: 1) attributes of the primary key
uniquely identify instances in an entity, and 2) relationships
exist between instances, not entities. To relate any two
instances, the identifying information (primary keys) for both
must be placed in the same entity (information cannot exist in
the model outside an entity). Because all attributes must be
single-valued, one-to-many relationships are implemented by
distributing (copying) the primary key attributes of the "one"
entity into the "many" entity. In the SPECIMEN and COLLECTING-
EVENT example, a place-holder is created in the SPECIMEN entity
for the primary key of COLLECTING-EVENT (e.g., COLLECTING-EVENT-
ID). Within the SPECIMEN entity, the new attribute COLLECTING-
EVENT-ID is called a foreign key. In the textual portion of the
model, foreign keys are typically listed among the attributes of
an entity.
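The SPECIMEN and COLLECTING-EVENT example might be implemented
along the following lines. This is a sketch, assuming Python's
built-in sqlite3 module; the table and column names are
hypothetical stand-ins for the dictionary names, and the
identifier values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE collecting_event (collecting_event_id INTEGER PRIMARY KEY)"
)
# The primary key of the "one" entity is copied into the "many" entity,
# where it is a foreign key. NOT NULL expresses the minimum of one.
conn.execute(
    "CREATE TABLE specimen ("
    " specimen_id INTEGER PRIMARY KEY,"
    " collecting_event_id INTEGER NOT NULL"
    "   REFERENCES collecting_event (collecting_event_id))"
)

conn.executemany("INSERT INTO collecting_event VALUES (?)", [(10,), (11,)])
conn.executemany("INSERT INTO specimen VALUES (?, ?)", [(1, 10), (2, 10)])

# A SPECIMEN that refers to a nonexistent COLLECTING-EVENT is rejected.
try:
    conn.execute("INSERT INTO specimen VALUES (3, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Count SPECIMENs per COLLECTING-EVENT; an event may have none.
counts = conn.execute(
    "SELECT e.collecting_event_id, COUNT(s.specimen_id)"
    " FROM collecting_event e LEFT JOIN specimen s"
    "   ON e.collecting_event_id = s.collecting_event_id"
    " GROUP BY e.collecting_event_id ORDER BY e.collecting_event_id"
).fetchall()
```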
In many-to-many relationships, both foreign keys are placed in
the associative entity. Even though information cannot exist in
the model without being placed in an entity, modelers sometimes
allow themselves the shorthand of not listing
associative entities. In these cases, the foreign keys that
specify the relationship are not shown.
Conventions of the Textual Description
The textual portion of the model provides the definitions and
descriptions of entities, the attributes of each entity, and
descriptions of the relationships. The text should also describe
briefly the outstanding issues involving a particular concept in
the model. Additional sections may be added as necessary in
later iterations.
As many individuals will be contributing to development of this
model, it is important that descriptive standards be adopted to
ensure that the description is complete and consistent. We
propose the following information be recorded in descriptions of
all data entities.
Entity Name
Definition
Primary Key
Foreign Keys
     Target Entity
     Data Elements
Data Elements
Remarks
In addition, we propose to name all data objects, entities and
data elements, according to a naming convention. A naming
convention is a system for translating a data concept (e.g., a
data element or data entity) into a name. Naming conventions are
used in data administration to facilitate the development and use
of a standard reference (e.g., model or data dictionary) by
reducing ambiguity and redundancy among data objects. The name
of a data object should suggest its definition and possible
content. Armed with a knowledge of the naming convention, a
person looking in a dictionary for a data object should find it
easier to locate the corresponding object, or determine that the
object does not exist in the dictionary. Note that the names for
data objects used in a dictionary are not intended to be used as
the names for corresponding tables or data elements in a
database. The naming conventions used here conform to guidelines
set forth by the National Institute of Standards and Technology,
formerly the National Bureau of Standards (Newton, 1987; Rosen &
Law, 1989).
All data object names are written in upper case.
Different methods are used to derive names for entities and data
elements.
Entity Names. Entities are the primary subjects or concepts of
interest to an enterprise. To make an entity name meaningful, it
should always contain at least one noun. Adjectives or modifiers
are used to clarify and restrict the meaning of the noun. The
format of an entity name follows the English convention, in which
modifiers are placed before the noun. Modifiers are optional,
and used only to clarify the scope of the entity and eliminate
ambiguity. Hyphens are used to join words in a name.
Entity Name = [Modifiers] + Noun
Examples:
SPECIMEN
DERIVED-OBJECT
TYPE-SPECIMEN-CITATION
Entities are always named in the singular, to represent a typical
instance of the entity, except in cases where the instance itself
is a plural concept.
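The entity naming rule can be expressed as a short Python sketch.
The function entity_name and the pattern ENTITY_NAME are
hypothetical; the pattern assumes names consist only of upper
case letters and hyphens, as in the examples above.

```python
import re

# Upper-case words joined by single hyphens.
ENTITY_NAME = re.compile(r"[A-Z]+(-[A-Z]+)*")


def entity_name(noun, *modifiers):
    """Build an entity name as [Modifiers] + Noun, upper case, hyphen-joined."""
    name = "-".join(word.upper() for word in (*modifiers, noun))
    if not ENTITY_NAME.fullmatch(name):
        raise ValueError("not a valid entity name: " + name)
    return name
```

For example, entity_name("citation", "type", "specimen") returns
TYPE-SPECIMEN-CITATION. The sketch does not check that a name is
singular.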
Data Element Names. Data element names are composed of two
parts, a prime term and a class term. The prime term is simply
an existing entity name, and the class term is composed of
optional modifiers plus a data class name. Note that modifiers
may be nouns as well as adjectives, and again are used to clarify
meaning. In some cases, an entity name and a class name are
sufficient to convey the meaning of the data element, and no
modifiers are required. A data class is used to describe the
content of the data element. A provisional list of class names
(after a document describing the naming conventions used by a
federal agency) is included below. (Other documents on data
administration contain similar lists of data classes.)
Data Element Name = Entity Name + [Modifiers] + Class Term
Examples:
SPECIMEN-IDENTIFIER
LOCALITY-COUNTRY-NAME
COLLECTING-EVENT-EQUIPMENT-NAME
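The composition rule for data element names can be sketched in
Python as below. The function data_element_name is hypothetical;
it simply joins the parts in the order given by the formula and
does not check the class term against the list of class names.

```python
def data_element_name(entity, class_term, *modifiers):
    """Compose Entity Name + [Modifiers] + Class Term, upper case, hyphen-joined."""
    parts = [entity, *modifiers, class_term]
    # Spaces inside a part are replaced by hyphens so multi-word parts still join.
    return "-".join(p.strip().upper().replace(" ", "-") for p in parts)
```

For example, data_element_name("LOCALITY", "Name", "Country")
returns LOCALITY-COUNTRY-NAME.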
Table of Class Terms
Class Name    Abbrev   Definition
Amount        AMT      A monetary value. (May include average,
                       balance, and other derived values.)
Angle         ANGL     The rotational measurement between two
                       lines or planes, diverging from a common
                       point or line, respectively.
Area          AREA     The measurement of a surface.
Count         CNT      An integer value representing the number
                       of items.
Code          CD       A combination of one or more numbers,
                       letters, or special characters which is
                       substituted for a specific meaning.
                       Indicates the existence of a
                       predetermined, finite set of values.
Coordinate    COORD    The designation of location by a line or
                       plane. (Includes latitude and longitude.)
Date          DT       The notation of a specific period of
                       time.
Dimension     DMSN     A measured linear distance. (Includes:
                       altitude, depth, diameter, distance,
                       elevation, height, length, radius, width.)
Flag          FLG      A boolean variable for recording a yes/no
                       or on/off state.
Identifier    ID       A combination of one or more numbers,
                       letters, or special characters that
                       designates the identity of a specific
                       object, entity, or instance, but has no
                       other meaning.
Mass          MASS     The measure of inertia of a body.
Name          NM       A designation of an object, entity, or
                       instance, expressed as a word or phrase.
Quantity      QTY      A non-monetary value. (Includes: count,
                       average, balance, deviation, factor,
                       index, and scale.)
Rate          RT       A quantity, amount, or degree of
                       something in relation to a unit of
                       something else. (Includes: acceleration,
                       density, flow, speed, force, frequency,
                       humidity, etc.)
Temperature   TP       The measure of heat in an object or
                       ambient medium.
Text          TXT      An unformatted character string,
                       generally in the form of words.
Time          TM       A notation of a specified chronological
                       point within a period.
Volume        VOL      The measurement of space occupied by a
                       three-dimensional object.
Weight        WT       The force with which an object is
                       attracted toward the earth by
                       gravitation.
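Because the class term is the last word of a data element name,
the table of class terms can serve as a simple check on names.
The Python sketch below copies the class names and abbreviations
from the table; the names CLASS_TERMS and class_term_of are
hypothetical.

```python
# Class names (upper case) and abbreviations from the Table of Class Terms.
CLASS_TERMS = {
    "AMOUNT": "AMT", "ANGLE": "ANGL", "AREA": "AREA", "COUNT": "CNT",
    "CODE": "CD", "COORDINATE": "COORD", "DATE": "DT", "DIMENSION": "DMSN",
    "FLAG": "FLG", "IDENTIFIER": "ID", "MASS": "MASS", "NAME": "NM",
    "QUANTITY": "QTY", "RATE": "RT", "TEMPERATURE": "TP", "TEXT": "TXT",
    "TIME": "TM", "VOLUME": "VOL", "WEIGHT": "WT",
}


def class_term_of(data_element_name):
    """Return the class term (last hyphen-separated word) of a data element name."""
    term = data_element_name.rsplit("-", 1)[-1]
    if term not in CLASS_TERMS:
        raise ValueError("unknown class term: " + term)
    return term
```

For example, class_term_of("LOCALITY-COUNTRY-NAME") returns NAME,
whose abbreviation in the table is NM.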
APPENDIX B
WORKSHOP PARTICIPANTS
  James H. Beach       Harvard University, MCZ
  Stan Blum            Smithsonian Institution, NMNH
  David Cannatella     Texas Memorial Museum, Univ. Texas
  Jim R. Croft         Australian National Botanic Garden
  John Damuth          Univ. California, Santa Barbara
* Janet Gomon          Smithsonian Institution, NMNH
  Bruce Gritton        Monterey Bay Aquarium Research Institute
  Ronald Hellenthal    Notre Dame University
  Elaine Hoagland      Association of Systematics Collections
* Julian Humphries     Cornell University
  David Mark           State University of New York, Buffalo
  Sue McLaren          Carnegie Museum of Natural History
  Richard Mooi         California Academy of Sciences
  Peter Rauch          Univ. California, Berkeley
  Gary Rosenburg       Academy of Natural Sciences, Phila.
  Wayt Thomas          New York Botanical Garden
* Workshop Chairs