Quality Measuring in the Production of Databases

M. Rittberger, W. Rittberger
Universität Konstanz
Informationswissenschaft
Postfach 5560 D87
D–78434 Konstanz
Tel: +49–7531–883595
email: [email protected]
Abstract
Quality, quality control and assurance, and especially quality management are becoming
more and more important in using online information services. In this paper we will
focus on the production of online bibliographic databases and discuss possible attributes
responsible for the quality of this kind of online database. Starting with the acquisition of
the original document, the quality of document analysis in the selection of the document,
subject and bibliographic analysis are considered, whereby we give examples of how
the quality attributes of the different production steps may be measured. Finally we
describe, as an example, a database recording and production system with its testing
routines and reference data.
1 Introduction
In recent years, there has been a dramatic increase in the number of publications dealing
with information services, information brokers, online databases and similar topics.
Many of these studies have dealt with quality, quality control and assurance, and
especially with quality management.
The transition from the production of printed "manuscripts" to that of electronic
databases - which in all subject areas, and also in commercial activities, coincides
with an enormous increase in the available literature - has pushed quality and all its
aspects into the foreground. Simply for content- and time-related reasons, online services of the sort common today facilitate much more effective and efficient searching
than has been and is possible using conventional printed information services (abstracts,
journals, bibliographies, handbooks, etc.). The much deeper and more comprehensive
view of the individual parts of an information unit provided by electronic versions of
the aforementioned and new information services leads to new demands on and standards for the information chain¹ [4, 5, 6]. This "information chain" thereby affects
all participants, from the creators of information (authors, patent applicants and other
knowledge producers) to the producers (database producers, publishers, patent offices),
to the distributors (hosts, network managers, information brokers and services) and,
finally, to the end-users. The different links which form the information chain are mutually dependent, especially in the case of adjacent ones, so that every partial link is
dependent on the fault–free service of its providers.
¹ According to [1], probably drawing on the Porterian value chain [2, 3]; also called a value creation chain.
The overall quality which information searchers expect, and which is to be maintained
within the information chain, consists of aspects of the following domains:
- the construction of the information unit itself and the structural and organizational
  compilation of a database;
- the system which makes data available: a database design is developed and the
  database is implemented on a computer (file building);
- the system with which data are selected and employed: performance in this domain
  is affected by the retrieval system, by support from a host, and by the qualifications
  of the user with regard to technology, methodology and subject knowledge.
In recent years the information searcher has increasingly been the one who defines quality attributes and criteria (see section 2). He is thereby interested only in the end product, the online service, and seldom (if ever) differentiates among the aforementioned areas. For the evaluation of an online service it is, however, always necessary to
consider the quality of the individual domains separately, i.e., database production, computer implementation, online retrieval and use. In the given case, an overall judgment
should be based on the quality of the different domains (e.g., without good database
design and good implementation, even the best database can be unsatisfactory). Thus
in the future quality evaluations will be necessary for each of the individual domains.
An essential point for the information searcher is knowledge of which added values are
created in the individual domains. Drawing on [7, p. 90], product-related added value
creation has the following aspects:
- The comparative added value is obvious from the electronically available version
  of the online database, as opposed to the usual printed version;
- The inherent added value is produced by the analysis of the data sets and their
  electronic selectability in the database;
- An aggregative added value is given by the collection of data sets in a database.
Quality and added value creation in the production of databases occupy the center of our
study. We will make a few general observations on the quality of information in online
databases and then concretely suggest standard values and scalings (as called for in
[8]). Further to be determined is the extent to which tools, such as rules, standards,
norms, guidelines and manuals, are available for the production of a database and what
guidelines for quality specifications can be derived from them. As well, quality attributes
and criteria of the information user should thereby also be taken into account.
As an example, we will discuss reference databases with literature references,² since they
currently play an important role in terms of numbers, size and significance [9, 10]. Most
of our conclusions can, however, also be applied to other information sources similar
to bibliographic databases. Rather than focusing on user perspectives, which ultimately
stand in the foreground, this article will devote more attention to system perspectives.
We thereby find ourselves in agreement with Kuhlen, who also prefers not to equate
quality with added value from a user-oriented perspective. Especially for information
products and services, he favors evaluating quality by means of a combination of user
perspectives and system attributes. He calls for norms and the definition of standards,
particularly in the case of information products [7, p. 93].
² Also called literature or bibliographic databases.
2 Quality
In the relevant literature one finds a large variety of different descriptions, definitions and
notions for the concept of quality [11, 12]. In the narrower environment of information
science, [13] states that "quality is, like ethics, situational - at least in my universe - and
I suspect in that of most search professionals." Arnold believes that "quality is electronic
publishing’s golden idol," and for him quality is a question with many answers: "Toyota
Motors defined quality as products that conform to the specifications" [14].
From the producer’s side, various authors have described quality attributes and criteria
for the production of databases and given details on not only individual parts of an
information unit, but also on individual work steps [15, 16, 17, 18, 19]. Despite the
range of interpretations of quality by database producers [20, 21, 22, 23, 24], literature
shows that over the last few years there have been increasing calls for higher standards
regarding the quality of various attributes and aspects of information services. The
transition from printed sources containing a few thousand information units to electronic
manipulation of hundreds of thousands or millions of information units in a database and
the direct use by information services and end-users have created a new situation which
gives users the opportunity to exert direct influence on database producers and database
providers. But even today it can still be maintained that there is no comprehensive and
objective concept for the quality of databases, and that the major unsolved problem
in regard to the quality of information services consists in the development of usable
performance criteria [25].
End-users, and especially information brokers, have increasingly defined their requirements in terms of the services which they expect from information providers [26, 27, 28,
29, 30, 5, 31, 32, 33]. In an opinion survey of European information specialists from 12
different countries, Wilson asked the specialists to rank ten quality criteria for databases,
selected from SCOUG 1990 (see [34]) [9]. He found the following
rank order, based on the significance of the criteria: coverage, accessibility, timeliness,
consistency, accuracy, value, documentation, harmonization, output, support.
[34] discusses the requirements set by the "Southern California User Group" (SCOUG),
generalized in [26]:
- Ability to set limits on the basis of geography, language and contents;
- Top-to-bottom indexing;
- Coding of contents and document type;
- Connection between related data, e.g., connections between conference proceedings
  and the corresponding addresses;
- Accurate listing of authors and titles;
- No abbreviation of journal titles;
- Author affiliations completely searchable;
- At least the following fields must be present: author, affiliation, title, source and
  country of origin, publication date, summary, indexing.
Our summary of quality requirements, attributes and criteria contains demands on not
only the database producer, but also the host, although users do not wish to accept this
distinction [16]. From the above discussion, five quality requirements can be derived
for the production of databases which are especially significant:
- Scope and coverage of the subject area: By scope we mean the subject-related
  contents of an area which is touched by the database. All relevant documents
  (publications), i.e., all those classifiable as dealing with the subject matter of an
  area, are to be described in their full extent as information units. The area can be
  subject-matter oriented, multidisciplinary, or mission-oriented. Geographic
  location, linguistic region and time period are further criteria of coverage.
- Comprehensiveness: This means the inclusion and presence of all sorts of documents
  (publications): monographs, chapters and articles in monographs; journals;
  journal articles; reports; articles and/or chapters from reports; conference papers
  and conference proceedings; grey literature; dissertations; patents; norms.
  Comprehensiveness can be international or limited on the basis of geographic,
  temporal or linguistic viewpoints (e.g., dissertations only from the English-speaking world).
- Currency and timeliness: This means the time period between the publication of a
  text (publication date) and the appearance of the information unit of this publication
  in a database. Also usable as a currency indicator is the share of information units
  from the year of publication included out of all the information units processed in
  that year.
- Accuracy: This means the avoidance of errors in all stages of creating an information
  unit:
  a. in document analysis;
  b. during entry in the data fields;
  c. in orthography.
- Consistency: This is uniformity and agreement in the processing of all information
  units. In order to fulfill the requirements for a high level of consistency, strict
  compliance with rules and working instructions is necessary:
  a. in the choice of documents (scanning);
  b. in classification and indexing (e.g., classificatory schema, thesaurus, indexing
     rules);
  c. in cataloguing (e.g., cataloguing rules, category schema).
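The currency indicator sketched under "Currency and timeliness" can be computed directly. The following is a minimal Python sketch with hypothetical dates and an assumed record layout (pairs of publication date and database-entry date), not taken from any real database:

```python
from datetime import date

# Hypothetical information units: (publication date, date of entry in the database)
units = [
    (date(1994, 3, 1), date(1994, 7, 15)),
    (date(1994, 11, 2), date(1995, 2, 10)),
    (date(1993, 6, 20), date(1994, 1, 5)),
]

# Currency as the average lag in days between publication and database entry
avg_lag = sum((entered - published).days for published, entered in units) / len(units)

def same_year_share(units, year):
    """Share of units from a given publication year among all units processed that year."""
    processed = [u for u in units if u[1].year == year]
    if not processed:
        return 0.0
    return sum(1 for u in processed if u[0].year == year) / len(processed)

print(round(avg_lag))             # → 145
print(same_year_share(units, 1994))  # → 0.5
```

Both numbers can be recomputed per production period (e.g., yearly) and compared across databases.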
These quality requirements show, on the one hand, the great interest of various groups
of persons involved with online retrieval in qualitatively excellent databases, and on
the other, a demand for the realization of a comprehensive and concrete treatment of
the quality of databases [35, 36]. To this end we will develop a model for the "quality
profile" of a database which presents qualitative and quantitative statements on "quality
indicators" for the individual parts and elements of an information unit or database. This
model will be oriented towards the production of databases. To illustrate and clarify
these concepts, we will provide quantitative and qualitative details of two hypothetical
databases (db1 and db2) in various tables. These demonstrate the different possibilities
for the analysis of data, both formal and content-based, as well as for the document
types. In addition, where possible, the connection between user requirements and
quality will be shown, as well as the relationships to the information chain.
3 Production of Databases
In the following discussion we consider in greater detail the production process for a
Type of Procurement                                            db1     db2
Conventional, acquisition by purchase, exchange or gift
  of single orders for the sources                             55%     15%
Preordering of document series for the sources                 25%     15%
Direct submission of documents by the publishers on the
  basis of specific agreements                                 15%     15%
Direct submission of galley proofs of documents by the
  publishers on the basis of specific agreements                5%     15%
Direct submission of analyzed documents by the publishers
  on the basis of specific agreements (e.g., as worksheets,
  machine-readable texts)                                       -      40%

Table 1  Forms of acquisition for bibliographic databases (%-share³).
database and distinguish three production steps, the acquisition of the original document,
the analysis of the document (selection, subject analysis, bibliographic analysis) and the
data recording and production system.
3.1 Acquisition of the original document
The acquisition of a document requires three steps:
Discovering and monitoring the publication and offering of literature;
preselection of the relevant sources;
actual acquisition.
Literature surveillance and scanning require, on the one hand, subject knowledge, in
order to be able to determine the relevance of a source. The actual acquisition, on
the other hand, requires documentary or bibliographic knowledge, in order to assure
that relevant available documents are identified, and that selected documents will be
promptly ordered and delivered.
Table 1 shows the various ways of procuring a document for a database. It gives
information on whether a document has still to be obtained conventionally, or whether
"half-ready products" or even analyzed documents can already be delivered. The
percentage figures for the two databases (db1 and db2) are thus a measure for the
speed with which documents can be brought into the production process. As in the case
of db2, higher percentages for the delivery of galley-proofed and analyzed documents
are thus currency indicators. A high share of conventionally procured documents (e.g.,
through exchange), as with db1, suggests a lower degree of currency for the database.
The methods of acquisition for db1 and db2 as described in table 1 are not yet
satisfactory. Table 1 shows that for db1 80% of the documents are still delivered
conventionally or by preordering, whereas for db2 40% of the documents are already
analyzed by the publisher and directly transferred to the database⁴, which indicates
high accuracy and authenticity. With an ongoing change from conventional acquisition
(i.e., with the database producer itself carrying out the whole process of document
analysis) to the delivery of bibliographically and content-analyzed documents, a
quicker and more efficient procurement of documents for their integration into
databases could be achieved.

³ Percentage shares are given as the percentage of the given value for the production of a database over a certain period of time (e.g., a year).
3.2 Document Analysis
The task of document analysis consists in making an accurate and comprehensive
description of the original document. For this, clear and unambiguous methods of
subject analysis and bibliographic processing are needed, based as much as possible
on rules, guidelines, norms, manuals, etc. In addition, it is necessary to have an
accurate and consistent description of the contents, formal structures and physical
characteristics of the data sets.
Selection of Documents
In the selection of a document it is decided whether a
specific document (publication) should be considered for processing and inclusion in a
database. The database producer, who bears the responsibility for the production, has to
make a clear statement on selection within the database policy, to give the customer an
exact overview of the subject-related and bibliographic content of the database.
For the selection, clear and unambiguous guidelines have to be defined which determine
database content, document types, delimitations, and other selection criteria. In
addition, tools such as subject classification schemes, thesauri, keyword lists,
document-type lists, category schemes, etc. have to be employed. Based on these
guidelines and the subject-related tools, the scope and coverage of the database are
determined; other guidelines and tools designate the types of documents to be treated
and thus determine the comprehensiveness of the database. Further guidelines determine
the limits of a database with respect to geographical area, language and other specific
elements.
Percentages, as numerical values, are indicators which permit an overview of the
distribution. Besides the indicators for the evaluation of a database named in tables
1 and 2, a further indicator is the number of documents present after acquisition and
selection in relation to the sum of all possible documents. The number of possible
documents can, of course, only be estimated.
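Such a coverage indicator is a plain ratio whose denominator can only be estimated. A minimal sketch with hypothetical yearly figures (the function name and numbers are illustrative assumptions):

```python
# Coverage indicator: documents present after acquisition and selection,
# relative to the (estimated) sum of all possible relevant documents.
def coverage(selected: int, estimated_total: int) -> float:
    if estimated_total <= 0:
        raise ValueError("estimated total must be positive")
    return selected / estimated_total

# Hypothetical figures for one production year
print(f"{coverage(18_500, 25_000):.0%}")  # → 74%
```

Since the denominator is an estimate, the indicator is best read as a rough bound rather than an exact measurement, and it should be recomputed whenever the estimate of the total literature is revised.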
Table 2 lists the elements which are to be considered in deciding what to include in
the databases db1 and db2.⁵ The first column of db1 and the first column of db2
indicate which subject classification areas, which document types and which delimitations
are used. The second column of db1 and the second column of db2 give the distribution
in percentages: they show, for example, that the size of database db1 is smaller than that
of db2, because fewer subareas of the subject field were included; db1 includes above
all books, reports and grey literature, while db2 is a database chiefly containing journal
articles and conference papers. This information clearly relates to the completeness
of a database. More types of publications were included in db2, even though the
focus is on journals and conference reports. In db1, by contrast, the nonconventional
⁴ The publishers Elsevier Science and the American Institute of Physics, for example, offer database producers the delivery of analyzed documents and thereby contribute to accelerating acquisition and document analysis. Other publishers offer galley proofs of documents, refereed and corrected by the author and finished by the publisher, but not yet distributed.
⁵ For an overview of the number of documents procured within a certain period, absolute values can also be given for the selection criteria in table 2, not only in terms of contents (e.g., with a classification according to the chief groups of a classificatory schema) but also in terms of document type.
Selection Criteria                              db1                  db2
Subject Area (e.g., ACM Classification)
  Subareas:
    A - general literature
    B - hardware
    C - comp. sys. org.
    D - software
    E - data
    F - theory of comp.
    G - math. of comp.
    H - inf. sys.
    I - comp. methodologies
    J - comp. applications
    K - comp. milieux
Type of Publication:
    Journal Article, Book, Report, Grey
    Literature, Dissertation, Patent, Norm,
    Conference Contribution
  [✓/- marks and %-shares per subarea and
  publication type not legibly recoverable]
Delimitation:
    Geographical                                EU countries         International
    Time Period                                 Last five years      None
    Language                                    EU languages         All languages
Availability of the Original                    Yes                  No
Processing Priority                             Books                Core journals

Table 2  Specific selection criteria for the choice of literature.
publications of grey literature and reports play a more important role. Further important
information concerning comprehensiveness can be inferred by studying the delimitations
of a database. Included in db1 are documents which were published in EU countries
over the last five years, while db2 includes journal articles and conference papers which
were produced internationally. Inferences can be made from the processing priority about
the currency and topicality of the contents of a database. From db2's preference for
journals, in contrast to db1's favoring of books, we can assume that the contents
of db2 are more current and topical than those of db1, even though reports and grey
literature are included in db1, where they are not highly valued in the processing priority.
Db1 has a more specialized coverage, holds a high percentage of non-conventional
literature and covers a smaller regional scope. Db2, by contrast, is more international and
has a wide content-related range. It contains many journal and conference publications,
which are quickly available, but only a small number of other document types. Thus
neither database is complete in certain respects.
Subject Analysis  Subject analysis serves as a means of describing a publication's scientific contents.
The nine points listed for the subject analysis (see table 3) essentially specify the
scientific contents and thus the value of a database. The number of points dealt with
establishes the breadth of an analysis, and the individual numerical values and percentages
suggest the worth and depth of the evaluation. Surprisingly, despite its significance
in the production process, subject analysis was not listed in the enumerations of user
requirements [34, 9].
Table 3 summarizes the steps involved in subject analysis. They include abstracting,
classifying, various possibilities for indexing, and further elements such as main keywords, data identification, title specification and indexing for special areas. The first
column of db1 and the first column of db2 show, for example, which steps were carried
out for the two databases. The second column of db1 and the second column of db2
give typical values for these steps.
Db1 and db2 differ strongly in subject analysis. In db1, abstracts are taken over from the
original, while in db2 new abstracts are composed. The take-over of abstracts in
db1 makes quicker processing of documents possible and therefore contributes to
the timeliness of the database; the composition of abstracts in db2, in contrast,
increases the consistency of the abstracts, since uniform standards, such as the
'Instructions for submitting abstracts' [38], are employed for their creation. In
db1 only supplementary keywords are given and a title augmentation is made to better
identify the contents, while in db2 the type of contents (e.g., experimental, theoretical,
etc.) is established, a thesaurus is used, and 11.8 descriptors are assigned per document.
Such an extensive analysis gives users of db2 an enormous advantage in later searches.
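Indicators such as the 11.8 descriptors per document in db2 are simple averages over processed information units. A minimal sketch of how such an indexing-depth indicator might be tracked (the descriptor counts below are hypothetical, not taken from db1 or db2):

```python
from statistics import mean, stdev

# Hypothetical numbers of descriptors assigned per information unit
counts = [12, 10, 14, 11, 12, 13, 10, 12]

depth = mean(counts)    # average indexing depth per information unit
spread = stdev(counts)  # low spread suggests consistent indexing practice

print(round(depth, 1))  # → 11.8
print(spread < 2)       # → True
```

Tracking the spread alongside the average is a way of watching consistency as well as depth: a stable average with a growing spread would indicate diverging indexing practice among indexers.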
For the evaluation of subject analysis, it is also necessary to know which rules,
guidelines, norms, classifications, thesauri and manuals are available for the preparation
of the different elements, and what competence they have, not only on the internal
level, but also on the national or international levels. The following enumeration gives
examples of (a) internal, (b) national or (c) international instruments of this sort:
Abstracting and Indexing:
a. Manual for subject indexing [39];
b. JICST Thesaurus English Version [40];
c. Instructions for submitting abstracts [38];
Classification:
c. Subject categories and scope description [37]
Statements of which rules, etc., are employed for subject analysis are likewise quality
indicators for the evaluation of an information unit or database.
Subject analysis has great influence on the information chain, since its excellence and
comprehensiveness strongly affect the relevance of search results.
Subject Analysis                                          db1       db2
On the basis of the Original Document                     90%       100%
Abstract:
  Creation                                                30%       90%
  Inclusion (Take-over)                                   70%       10%
  Improvement                                             30%       20%
  Translation                                             60%       35%
Subject Classification:
  Number of all Terms within a Classification
    Scheme (e.g., [37])                                   157       570
  Number of Terms per Information Unit                    1.2       2.4
Type of Content:
  Number of all Codes within a Type-of-Content Scheme     -         10
  Number of Codes per Information Unit                    -         1.7
Indexing:
  Thesaurus:
    Total Number of Available Descriptors                 -         20,000
    Number of Descriptors per Information Unit            -         11.8
  Controlled Vocabulary:
    Total Number of Available Descriptors                 -         -
    Number of Descriptors per Information Unit            -         -
  Supplementary Keywords:
    Number of Descriptors per Information Unit            5.5       -
  Main Keywords and Qualifier Pairs (M-Q):
    Number of Terms per Information Unit                  -         2.4
Data Identification (Data Flagging and Tagging)           -         -
Title Augmentation                                        ✓         -
Indexing for Special Areas (e.g., chemistry, astronomy)   -         -

Table 3  Values given for the subject analysis of a bibliographic database
(✓/- marks, %-shares or numerical values per database).
Bibliographic Analysis Bibliographic analysis contributes to the description of the
formal elements of a publication or of an information unit.
Tables 4 and 5 include the key elements which are drawn on in processing⁶. They
include document types, author, title, publisher data, conference elements and further
specific elements, as for example the International Standard Serial Number (ISSN) for
journals, report numbers and corporate bodies for reports or the International Patent
⁶ There can be further data elements which are necessary to fulfill the goals the database producer wants to achieve (e.g., citation data, URLs, pricing information, physical properties, etc.).
Data Element                              db1              db2
Title of Publication:
  Original                                70%              10%
  English                                 30%              90%
  Carrier Language of the Database        100%             100%
Authors                                   1 (first only)   all
Affiliation                               -                30%
Country of Affiliation                    -                100%
Collaborators                             -                -
Editors                                   max. 3           all
Publication Date                          60%              >90%
Place of Publication                      -                100%
Collation                                 70%              80%
Original Language                         -                60%
Availability Note                         10%              -
Contract Number                           -                -
Conference Elements:
  Title of Conference                     -                100%
  Place of Conference                     -                100%
  Date of Conference                      -                100%
Type of Document (journal, book, article
  in book, report, grey literature,
  dissertation, patent, standards,
  conference article, preprint)           ✓                ✓

Table 4  Elements of a bibliographic description.
Classification (IPC) for patents. As with subject analysis, it is established here
which rules, etc. were used for the inclusion and what competence they have (a) internally,
(b) nationally and (c) internationally. Examples are:
Cataloguing:
b. Guidelines for the cataloguing of documents [41]
Data Element                              db1              db2
Journal:
  Title                                   100%             100%
  ISSN                                    50%              100%
  CODEN                                   -                100%
  Date of Publication                     60%              100%
  Collation: Volume and Number            60%              100%
Book:
  ISBN                                    -                100%
  Publisher                               100%             100%
  Place of Publisher                      100%             100%
  Information on Monographic Series       -                100%
Report:
  Report Number                           100%             100%
  Corporate Entry                         100%             100%
Grey Literature and Dissertations:
  Corporate Entry                         100%             50%
Patents:
  Country                                 100%             -
  Patent Number                           100%             -
  International Patent Classification     30%              -
Norm:
  Country                                 -                -
  Norm Number                             -                -

Table 5  Elements of a bibliographic description of individual types of publications.
Country codes:
c. Codes for the representation of names of countries [42]
c. Terminology and codes for countries and international organizations [43];
Journal Title:
a. List of journals and serial publications [44];
The first column of db1 and the first column of db2 show which of the elements
were obtained for the database. The percentages and numerical values given in the
second column of db1 and the second column of db2 show the extent to which the
requirements were fulfilled.
As before, the two databases differ considerably. Thus, for example, only the first author of
a publication is listed in db1, whereas all authors are named in db2. Failure to include
all authors naturally detracts from the precision of the database, since the document
is not completely described. The weight which should be assigned to this inadequacy
when evaluating a database depends on whether it is common in a specialized area for
several or even many authors to publish jointly. While conference details are not included in
db1, conference titles, locations and dates are given in db2. These details are helpful in
identifying conference publications (tables 2 and 4) and present an informational value
of their own for conferences. In db2, the affiliation is listed, an element which also has
increasing significance for users.
The complete and accurate inclusion of all formal attributes helps the user in selecting
a document in a larger document collection and thereby influences the quality of the
retrieval and its results.
The different elements of a bibliographic analysis are needed for the further links within
the information chain. For example, the greatest possible degree of fanning (fine
subdivision into separate data fields) of the information unit during bibliographic
analysis is useful in database design, in order to improve retrieval possibilities. Aside
from the selection of the subject, bibliographic data are necessary in information use
for limiting on the formal level, e.g., limiting the selection to information on patents
obtained after 1987. Further, bibliographically error-free
processed data are assuming increasing significance for information users, since the
automation of document ordering and delivery requires correct data [45, 46]. As
well, with regard to data exchange in international networks like the Internet, highly
accurate bibliographic data are increasingly needed, which must be produced according
to international standards.
In regard to the aforementioned quality requirements of users, accuracy and consistency
play an especially great role in bibliographic analysis. The accurate, correct and
consistent application of rules and the accurate, correct and consistent production of
data elements can greatly increase not only the accuracy of a database, but also the
consistency of its data. Furthermore, a contribution can be made to the currency and
high-speed processing of a database, since through the avoidance of errors at this
production stage, expensive and time-consuming corrections at a later date become
unnecessary.
3.3 Data Recording and Production System
Computer-supported production methods are being increasingly employed in the production of databases, especially because of the rapidly growing volume of available data. As an example, COMPINDAS [19], which uses computer-supported methods in all production phases, will be described here. COMPINDAS
(COMputer-supported and INtegrated DAtabase production System) includes functions
for the acquisition and analysis of documents, the employment of reference data, and the
statistical evaluation and creation of machine-readable end products. The COMPINDAS
data-recording scheme makes possible a very specific and detailed entry and structuring
of data elements. For the entry of data, a comprehensive character set exists with which
special symbols and formulas can be represented.
Autonomous systems like METAL (Machine Evaluation and Translation of natural
Language) [47], AIR (Automated Indexing and Retrieval) [48, 49] and Kurzweil
Discover 7320 [50] support the production process.
Errors and their avoidance (see also [22]) play a major role in automatic procedures,
since through consistent, automatic checking of the entered data a subsequent correction
Reference Files                        db1      db2
Thesaurus                              -        ✓
Classification                         ✓        -
Author                                 ✓        ✓
Institution/Affiliation                -        ✓
Conferences                            -        ✓
Journal Title and Abbreviations        -        ✓
Countries and Country Codes            -        ✓
Location                               ✓        ✓
Language Designations                  -        ✓
Dictionaries                           ✓        ✓
Character Set                          ✓        ✓

Table 6  Reference files for bibliographic databases.
procedure can be avoided. The error rate (number of errors per 1,000 entered symbols,
or number of errors in a specific data field) can be used as a statement for a quality
evaluation.
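The error rate just described is a simple ratio. The following sketch uses hypothetical figures and an assumed function name, merely to make the computation concrete:

```python
# Error rate as the number of errors per 1,000 entered symbols (characters),
# usable as a quality statement for the data recording process.
def error_rate_per_1000(errors: int, symbols_entered: int) -> float:
    if symbols_entered == 0:
        raise ValueError("no symbols entered")
    return errors / symbols_entered * 1000

# Hypothetical recording session: 7 errors found in 14,000 entered symbols
print(error_rate_per_1000(7, 14_000))  # → 0.5
```

The same ratio can be kept per data field instead of per symbol, which localizes error sources (e.g., a high rate in the ISSN field points at a specific recording step).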
Testing routines and reference data are employed in the production process. According
to [51] five types of tests can be made:
Consistency Test: The entered data are compared with standardized lists using a
text-analysis procedure.
Plausibility Test: A matrix of elements, dependent on the type of document, specifies
which predefined rules must be fulfilled in an information unit. Missing fields and
errors in the dependencies between data fields are indicated.
Syntax Test: This test ensures that data conform to defined formats so that they can
be processed further. Errors can be avoided through the greatest possible fanning
out of the data elements.
Duplication Test: This test ensures that there are no duplicate entries; connections
between individual entries are also indicated.
Creation of Registers: Data elements are summarized in registers so that irregularities
can be detected through the use of structured overviews.
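Two of these testing routines can be sketched in a few lines. The following Python fragment is a hypothetical illustration of a consistency test against a standardized list and a duplication test over entered information units; the field names, the reference list and the sample records are our own assumptions, not taken from COMPINDAS:

```python
# Standardized reference list for the consistency test (illustrative)
VALID_COUNTRY_CODES = {"DE", "FR", "US", "JP"}

def consistency_test(record: dict) -> list[str]:
    """Compare entered data with a standardized list; return error messages."""
    errors = []
    if record.get("country") not in VALID_COUNTRY_CODES:
        errors.append(f"unknown country code: {record.get('country')!r}")
    return errors

def duplication_test(records: list[dict]) -> list[tuple[int, int]]:
    """Report pairs of entries with identical (normalized) title and year."""
    seen, duplicates = {}, []
    for i, rec in enumerate(records):
        key = (rec["title"].strip().lower(), rec["year"])
        if key in seen:
            duplicates.append((seen[key], i))  # connection between entries
        else:
            seen[key] = i
    return duplicates

records = [
    {"title": "Quality of Databases", "year": 1993, "country": "DE"},
    {"title": "quality of databases ", "year": 1993, "country": "XX"},
]
print(consistency_test(records[1]))  # flags the unknown country code 'XX'
print(duplication_test(records))     # [(0, 1)]
```

A production system would of course apply such checks to every data field and report the results into the error-rate statistics.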
In the production process, the extent to which reference files are employed is also
a criterion for evaluation. Table 6 lists typical reference data which are used in
creating a bibliographic database. For db1 and db2 it was recorded which of these
data were used: thus, e.g., in db1 a classification was used for control, and in db2
a thesaurus [52]. For both databases a standard geographical dictionary is used for
locations [53], and the AACR2 [54] for the authors.
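The use of such a reference file during data entry can be sketched as a simple lookup. The following Python fragment is a hypothetical illustration of normalizing journal titles against a file of titles and standard abbreviations (cf. Table 6); the titles, abbreviations and function name are our own assumptions:

```python
# Illustrative reference file: journal title -> standard abbreviation
JOURNAL_REFERENCE = {
    "nachrichten fuer dokumentation": "Nachr. Dok.",
    "harvard business review": "Harv. Bus. Rev.",
}

def normalize_journal(title: str) -> str:
    """Return the standard abbreviation, or raise for titles not on file."""
    key = title.strip().lower()
    if key not in JOURNAL_REFERENCE:
        raise ValueError(f"journal not in reference file: {title!r}")
    return JOURNAL_REFERENCE[key]

print(normalize_journal("Nachrichten fuer Dokumentation"))  # Nachr. Dok.
```

Rejecting unknown titles at entry time is what makes the later, expensive correction procedure unnecessary.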
Of the different quality requirements, accuracy and consistency are especially important
in the use of a data recording and production system. The numerical values for error
rates, the testing routines employed, and the reference data employed together with
their competence level (internal, national, international) are indicators of the quality
of the production process.
Essential for file building and for the adjacent links in the information chain is
knowledge of the fanning out, the structuring and formatting of the data elements and
of the character set employed; likewise, high consistency, accuracy and reliability
of the data sets and of the data themselves simplify the production of databases.
4 Concluding Remarks
Databases are today conceived and produced as original products. They are no longer
byproducts of the production of printed services, but in themselves the basis for the
production of electronic products and the offering of electronic added value services.
According to [14], between 1965 and 2001 there will be four phases in the production
of electronic information services. From 1992 to 1997 we will find ourselves in the "reconstruction" phase, in which high-quality databases will be reconstructed (redesigned)
using technologically advanced systems and software – as Kuhlen [55] already called
for in 1986. In this phase, users will be better able to express their wishes, needs and
demands.
As was the case earlier with the production of abstract journals in printed form, till now
very little has been reported on the production of bibliographic databases. The view is
also widespread [24, 56, 57] that there is a lack of standards, and demands are being
made to produce them on the international level. This view cannot be supported for
the production of databases. It has been demonstrated in this paper that the guidelines,
rules, norms, instructions and manuals needed for the production of a database are
already by and large available.
To be sure, in our opinion there is still a need for a systematic and comprehensive
overview of these instruments, as well as for information on their competence. The
production of a catalogue of all rules, guidelines, norms and manuals should therefore
be undertaken as soon as possible.
In tables 1 - 6 of the above sections we have summarized the steps and data elements
necessary for the production of information units and databases. Using as examples
two hypothetical databases (db1 and db2), we have characterized the individual values
which result from the steps of the production process using numerical values, percentage
figures or descriptive statements. These values were designated by us as "indicators"
making up the "quality profile" of a database. This quality profile should be used in
the evaluation of databases according to the ISO 9000 series.
Our research demonstrates that the quality of databases can be evaluated using the
indicators for the different work steps and data elements, and also using indicators
for work instruments. The weighting of the individual indicators for the choice of
a database depends essentially on the concrete approach and needs of the user. But
objective tests of individual databases for typical application situations can also be
developed and employed using the indicators named above. Through further research,
supplements to and improvements in individual indicators can be achieved. To this
end standards with defined values and tolerances must be developed (e.g., for authors
or in indexing). Initial attempts are found in [58, 18], who propose optimal values for
indexing depth.
Research aimed at achieving standards and tolerances is urgently needed, and simultaneously database tests should be undertaken, e.g., with user organizations or information
science institutes, as with [59], who calls for the introduction of testing offices.
For users, the quality profile is of the greatest value. One can thereby make comparisons
of the extent to which a database satisfies one’s own wishes and requirements. A
database with an indexing depth of 2 descriptors and only the obligatory bibliographic
statements does not permit the same level of retrieval that is possible with a database
featuring 11 descriptors and all formal elements including conference information.
Decisions on which database should be used can thus be influenced by the quality
indicators at the various levels.
Within the information chain, creating a quality profile for a database with indicators
capable of supporting evaluation is an important intermediate step toward the overall
evaluation of an online information service. For the further links in the information chain,
namely for file building (database design and computer implementation), as well as for
retrieval and use, quality profiles must likewise be worked out in order to achieve this
overall evaluation.
Within the framework of the Konstanz Hypertext System [60, 61], a database choice
component [62], which contains descriptions of individual databases, was integrated
into the system. There are plans to evaluate databases with the above-described
indicators, whereupon the presentation of databases in the Konstanz Hypertext System
will be enhanced using the test data. The Konstanz Hypertext System offers very
flexible forms of interaction and presentation, so that the data presented in Tables
1 - 6 can be presented and used in an appropriate manner.
The authors wish to thank Prof. R. Kuhlen (Information Science, University of
Konstanz) for valuable discussions, and D. Marek (FIZ Karlsruhe) for numerous
suggestions and pointers which were useful in the writing of this article.
References
[1] W.G. Stock. Der Markt für elektronische Informationsdienstleistungen. Ifo-Schnelldienst, (14):22–31, 1993.
[2] M.E. Porter. Wettbewerbsvorteile: Spitzenleistungen (competitive advantage) erreichen und behaupten. Campus: Frankfurt, 1986.
[3] M.E. Porter and V.E. Millar. How information gives you competitive advantage. Harvard Business Review, (July-August):149–160, 1985.
[4] W. Schwuchow, editor. Qualität von Informationsdiensten. 7. Internationale Fachkonferenz der Kommission Wirtschaftlichkeit der Information und Dokumentation KWID in der Deutschen Gesellschaft für Dokumentation e.V. DGD in Zusammenarbeit mit der Gesellschaft für Informatik e.V. GI und der International Federation for Information and Documentation FID, Garmisch-Partenkirchen, 2.-4. Mai 1993. Deutsche Gesellschaft für Dokumentation: Frankfurt, 1993.
[5] C. Tenopir. Database quality revisited. Library Journal, (1):64–67, 1990.
[6] U. Hanson. The hidden quality of the database: some (re-)liability aspects. In I. Wormell, editor, Information Quality: definitions and dimensions, pages 91–121. Taylor Graham: London, 1990.
[7] R. Kuhlen. Informationsmarkt. Chancen und Risiken der Kommerzialisierung von Wissen. Number 15 in Schriften zur Informationswissenschaft. Universitätsverlag Konstanz: Konstanz, 1995.
[8] W. Stock. Qualitätsmanagement von Informationsdienstleistungen. In W. Rauch, F. Strohmeier, H. Hiller, and C. Schlögl, editors, Mehrwert von Information - Professionalisierung der Informationsarbeit. Proceedings des 4. Internationalen Symposiums für Informationswissenschaft (ISI'94), number 16 in Schriften zur Informationswissenschaft, pages 21–32. Universitätsverlag Konstanz: Konstanz, 1994.
[9] T. Wilson. EQUIP: A European survey of quality criteria for the evaluation of databases: report on the questionnaire survey. European quality management programme for the information sector, 1994.
[10] M.E. Williams. The state of databases today: 1994. In K.Y. Marcaccio, editor, Gale Directory of Databases, volume 1, pages XIX–XXX. Gale Research, 1994.
[11] D.A. Garvin. Managing quality. The strategic and competitive edge. Free Press: New York, 7th edition, 1988.
[12] International Organization for Standardization, Geneva. ISO 8402. Quality management and quality assurance - vocabulary, 2nd edition, 1994.
[13] R. Basch. Decision points for databases. Database, (August):46–50, 1992.
[14] S.E. Arnold. Information manufacturing: the road to database quality. Database, (October):32–39, 1992.
[15] K. Bürk and D. Marek. Produktion von wissenschaftlich-technischen Datenbanken. Handbuch der modernen Datenverarbeitung (HMD), 25(141):45–54, 1988.
[16] L. Granick. Assuring the quality of information dissemination: responsibilities of database producers. Information Services & Use, 11:117–136, 1991.
[17] B. Lawrence and T. Lenti. Application of TQM to the continuous improvement of database production. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, pages 69–87. Gower Publishing Limited: Hampshire, 1995.
[18] W. Lück. Qualität von bibliographischen Datenbanken: Die Datenbank PHYS. In 5. Österreichisches Online-Informationstreffen, Seggauberg, 1993.
[19] D. Marek. Integrated system support for the cooperative production of bibliographic, referral and numeric databases. In D.I. Raitt and B. Jeapes, editors, 17th International Online Information Meeting 1993, pages 347–357. Learned Information: Oxford, 1993.
[20] T.M. Aitchison. Aspects of quality. Information Services & Use, 8:49–61, 1988.
[21] E. Beutler. Assuring Data Integrity and Quality: A Database Producer's Perspective. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, pages 59–68. Gower Publishing Limited: Hampshire, 1995.
[22] E.T. O'Neill and D. Vizine-Goetz. Quality control in online databases. In M.E. Williams, editor, Annual Review of Information Science and Technology (ARIST), volume 23, pages 125–156. Elsevier: New York, 1988.
[23] P.L. Townsend. Commit to quality. John Wiley & Sons: New York, 1986.
[24] G.M. Wheeler. Securing product-service quality in large-scale bibliographic database production. Master thesis, University of Wales, 1988.
[25] A.L. Gilchrist. Quality management in information services - a perspective on European practice. In W. Schwuchow, editor, Qualität von Informationsdiensten. 7. Internationale Fachkonferenz der Kommission Wirtschaftlichkeit der Information und Dokumentation e.V. in Zusammenarbeit mit der Gesellschaft für Informatik e.V. GI und der International Federation for Information and Documentation FID, Garmisch-Partenkirchen, 2.-4. Mai 1993, pages 92–99, 1993.
[26] R. Basch. An overview of quality and value in information service. In R. Basch, editor, Electronic Information Delivery, pages 1–10. Gower Publishing: England, 1995.
[27] R. Fidel and D. Soergel. Factors affecting online bibliographic retrieval: a conceptual framework for research. Journal of the American Society for Information Science, 34(13):163–180, 1983.
[28] P. Jasco. Testing the Quality of CD-ROM Databases. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, pages 141–168. Gower Publishing Limited: Hampshire, 1995.
[29] A.P. Mintz. Quality control and the zen of database production. Online, (November):15–23, 1990.
[30] B. Quint. Better Searching Through Better Searchers. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, pages 99–116. Gower Publishing Limited: Hampshire, 1995.
[31] C. Tenopir. Priorities of Quality. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, pages 119–139. Gower Publishing Limited: Hampshire, 1995.
[32] S.A.E. Webber. Criteria for comparing news databases. In Online Information 92, 8-10 December 1992, London, England, pages 537–546. Learned Information: Oxford, 1992.
[33] U. Weber-Schäfer. Die Nachfrage und das Angebot von externen Informationen zu Unternehmensstrategien in einem Online-Informationssystem. Entscheidungsorientierte Analyse am Beispiel des europäischen Binnenmarktes, Anforderungen und Konzepte. Number 1660 in Europäische Hochschulschriften: 5, Volks- und Betriebswirtschaft. Lang: Frankfurt am Main, 1995.
[34] R. Basch. Measuring the quality of the data: report on the fourth annual SCOUG Retreat. Database Searcher, (October):18–23, 1990.
[35] C.J. Armstrong. The Eye of the Beholder. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, pages 221–244. Gower Publishing Limited: Hampshire, 1995.
[36] R. Juntunen, E. Mickos, and T. Jalkanen. Evaluating the Quality of Finnish Databases. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, pages 201–219. Gower Publishing Limited: Hampshire, 1995.
[37] International Atomic Energy Agency (IAEA), Vienna (Austria). INIS: Subject categories and scope descriptions, 1991. IAEA-INIS-3 (Rev.7).
[38] International Atomic Energy Agency (IAEA), Vienna (Austria). INIS: Instructions for submitting abstracts, 1988. IAEA-INIS-4 (Rev.2).
[39] Fachinformationszentrum Karlsruhe. Manual for subject indexing, 1990. FIZ-KA-Serie 3-3, 160 pages.
[40] Japan Information Center of Science and Technology (JICST), Tokyo. JICST-Thesaurus, English version, Vol. 1, 1987.
[41] C. Hitzeroth, D. Marek, and J. Müller. Leitfaden für die Erfassung von Dokumenten in der Literaturdokumentation. Verlag Dokumentation: München, 1976.
[42] International Organization for Standardization, Geneva. ISO 3166. Codes for the representation of names of countries, 4th edition, 1993.
[43] International Atomic Energy Agency (IAEA), Vienna (Austria). INIS: Terminology and codes for countries and international organizations, 1987. IAEA-INIS-5 (Rev.6).
[44] Fachinformationszentrum Karlsruhe. List of journals and serials publications, 1992. FIZ-KA-Serie 3-8, 240 pages.
[45] M. Ockenfeld and E. Wetzel. Fachinformationsdatenbanken und Informationssysteme. Gesellschaft für Mathematik und Datenverarbeitung (GMD), Institut für Integrierte Publikations- und Informationssysteme (IPSI), 1990.
[46] A. Oßwald. Dokumentlieferung im Zeitalter Elektronischen Publizierens. Number 5 in Schriften zur Informationswissenschaft. Universitätsverlag Konstanz: Konstanz, 1992.
[47] C. Best, B. Gravemann, A. Jacobs, and O. Ruczka. Erste Erfahrungen mit dem automatischen Übersetzungssystem METAL. ABI-Technik, 13(1):41–44, 1993.
[48] P. Biebricher, N. Fuhr, G. Knorz, G. Lustig, and M. Schwantner. Entwicklung und Anwendung des automatischen Indexierungssystems AIR/PHYS. Nachrichten für Dokumentation, 39(3):135–143, 1988.
[49] W. Lück, W. Rittberger, and M. Schwantner. Der Einsatz des Automatischen Indexierungs- und Retrieval-Systems AIR im Fachinformationszentrum Karlsruhe. In R. Kuhlen, editor, Experimentelles und praktisches Information Retrieval. Festschrift für Gerhard Lustig, number 3 in Schriften zur Informationswissenschaft, pages 141–170. Universitätsverlag Konstanz: Konstanz, 1992.
[50] Lesesystem Discover 7320. Der Durchbruch. Sonderdruck: PC Magazin, 49, 1987.
[51] H. Behrens. Datenbanken und ihre Produktion. Foliensammlung zur Vorlesung im SS94. Universität Konstanz, Informationswissenschaft, 1994.
[52] International Atomic Energy Agency (IAEA), Vienna (Austria). INIS: Thesaurus, 1995. IAEA-INIS-13 (Rev.34).
[53] Webster's new geographical dictionary. Merriam: Springfield, MA, 1972.
[54] M. Gorman and P. Winkler, editors. Anglo-American cataloguing rules. American Library Association: Chicago, 2nd edition, 1988.
[55] R. Kuhlen. Information retrieval systems - a challenge for linguistic data processing. In R. Kuhlen, editor, Informationslinguistik: theoretische, experimentelle, curriculare und prognostische Aspekte einer informationswissenschaftlichen Teildisziplin, number 15 in Sprache und Information, pages 89–117. Niemeyer: Tübingen, 1986.
[56] R. Juntunen, R. Ahlgren, J. Jalkanen, R. Hagelin, P. Helander, T. Koivulu, I. Kivelä, E. Mickos, and A. Rautava. Quality requirements for databases - project for evaluating Finnish databases. In 15th International Online Information Meeting 1991, pages 351–359. Learned Information: Oxford, 1991.
[57] J.P. Lardy. Bibliometric treatments according to bibliographic errors and data heterogeneity: the end-user point of view. Pages 547–556. Learned Information: Oxford, 1992.
[58] H.D. White and B.C. Griffith. Quality of indexing in online data bases. 23(13):211–224, 1987.
[59] P. Cahn. Testing database quality. Database, (February):23–30, 1994.
[60] R. Hammwöhner and R. Kuhlen. Semantic control of open hypertext systems by typed objects. Journal of Information Science, 20(3):175–184, 1994.
[61] M. Rittberger, R. Hammwöhner, R. Aßfalg, and R. Kuhlen. A homogenous interaction platform for navigation and search in and from open hypertext systems. In RIAO 94 Conference Proceedings. Intelligent multimedia information retrieval systems and management, pages 649–663. Rockefeller University: New York, 1994.
[62] M. Rittberger. Support of online database selection in KHS. In M.E. Williams, editor, National Online Meeting '94, New York, 10-12 May, pages 379–387, 1994.