Quality Measuring in the Production of Databases
M. Rittberger, W. Rittberger
Universität Konstanz
Informationswissenschaft
Postfach 5560 D87
D–78434 Konstanz
Tel: +49–7531–883595
email: [email protected]

Abstract

Quality, quality control and assurance, and especially quality management are becoming more and more important in the use of online information services. In this paper we focus on the production of online bibliographic databases and discuss the attributes responsible for the quality of this kind of online database. Starting with the acquisition of the original document, we consider the quality of document analysis - the selection of the document, subject analysis and bibliographic analysis - and give examples of how the quality attributes of the different production steps may be measured. Finally, we describe, as an example, a database recording and production system with its testing routines and reference data.

1 Introduction

In recent years, there has been a dramatic increase in the number of publications dealing with information services, information brokers, online databases and similar topics. Many of these studies address quality, quality control and assurance, and especially quality management. The transition from the production of printed "manuscripts" to that of electronic databases - which in all subject areas, and also in commercial activities, coincides with an enormous increase in the available literature - has pushed quality and all its aspects into the foreground. For content- and time-related reasons alone, online services of the sort common today facilitate much more effective and efficient searching than has been and is possible using conventional printed information services (abstract journals, bibliographies, handbooks, etc.).
The much deeper and more comprehensive view of the individual parts of an information unit provided by electronic versions of the aforementioned and new information services leads to new demands on and standards for the information chain1 [4, 5, 6]. This "information chain" affects all participants: from the creators of information (authors, patent applicants and other knowledge producers) to the producers (database producers, publishers, patent offices), to the distributors (hosts, network managers, information brokers and services) and, finally, to the end-users. The different links which form the information chain are mutually dependent, especially adjacent ones, so that every partial link depends on the fault-free service of its providers.

1 According to [1], probably drawing on the Porterian value chain [2, 3]; also called a value creation chain.

The overall quality which information searchers expect, and which is to be maintained within the information chain, consists of aspects of the following domains:

- the construction of the information unit itself and the structural and organizational compilation of a database;
- the system which makes data available: a database design is developed and the database is implemented on a computer (file building);
- the system with which data are selected and employed; performance in this domain is affected by the retrieval system, the support given by a host, and the qualifications of the user with regard to technology, methodology and subject knowledge.

In recent years the information searcher has increasingly been the one who has defined quality attributes and criteria (see section 2). He is thereby interested only in the end-product, the online service, and never (or very seldom) differentiates among the aforementioned domains.
For the evaluation of an online service it is, however, always necessary to consider the quality of the individual domains separately, i.e., database production, computer implementation, online retrieval and use. In a given case, an overall judgment should be based on the quality of the different domains (e.g., without good database design and good implementation, even the best database can be unsatisfactory). Thus in the future quality evaluations will be necessary for each of the individual domains. An essential point for the information searcher is knowing which added values are created in the individual domains. Drawing on [7, p. 90], product-related added value creation has the following aspects:

- The comparative added value arises from the electronically available version of the online database, as opposed to the usual printed version;
- The inherent added value is produced by the analysis of the data sets and their electronic selectability in the database;
- An aggregative added value is given by the collection of data sets in a database.

Quality and added value creation in producing databases occupy the center of our study. We will make a few general observations on the quality of information in online databases and then concretely suggest standard values and scalings (as called for in [8]). Further to be determined is the extent to which tools such as rules, standards, norms, guidelines and manuals are available for the production of a database, and what guidelines for quality specifications can be derived from them. Quality attributes and criteria of the information user should thereby also be taken into account. As an example, we will discuss reference databases with literature references2, since they currently play an important role in terms of numbers, size and significance [9, 10]. Most of our conclusions can, however, also be applied to other information sources similar to bibliographic databases.
Rather than focusing on user perspectives, which ultimately hold the foreground, this article devotes more attention to system perspectives. We thereby find ourselves in agreement with Kuhlen, who also prefers not to equate quality with added value from a user-oriented perspective. Especially for information products and services, he favors evaluating quality by means of a combination of user perspectives and system attributes, and he calls for norms and the definition of standards, particularly in the case of information products [7, p. 93].

2 Also called literature or bibliographic databases.

2 Quality

In the relevant literature one finds a large variety of descriptions, definitions and notions for the concept of quality [11, 12]. In the narrower environment of information science, [13] states that "quality is, like ethics, situational - at least in my universe - and I suspect in that of most search professionals." Arnold believes that "quality is electronic publishing's golden idol," and for him quality is a question with many answers: "Toyota Motors defined quality as products that conform to the specifications" [14]. From the producer's side, various authors have described quality attributes and criteria for the production of databases and given details not only on individual parts of an information unit, but also on individual work steps [15, 16, 17, 18, 19]. Despite the range of interpretations of quality by database producers [20, 21, 22, 23, 24], the literature shows that over the last few years there have been increasing calls for higher standards regarding the quality of various attributes and aspects of information services.
The transition from printed sources containing a few thousand information units to the electronic manipulation of hundreds of thousands or millions of information units in a database, and the direct use by information services and end-users, have created a new situation which gives users the opportunity to exert direct influence on database producers and database providers. But even today it can still be maintained that there is no comprehensive and objective concept for the quality of databases, and that the major unsolved problem in regard to the quality of information services consists in the development of usable performance criteria [25]. End-users, and especially information brokers, have increasingly defined their requirements in terms of the services which they expect from information providers [26, 27, 28, 29, 30, 5, 31, 32, 33]. In an opinion survey of European information specialists from 12 different countries, Wilson asked the specialists to rank ten quality criteria for databases, selected from SCOUG 1990 (see [34]) [9]. He found the following rank order, based on the significance of the criteria: coverage, accessibility, timeliness, consistency, accuracy, value, documentation, harmonization, output, support. [34] discusses the requirements set by the "Southern California Online Users Group" (SCOUG), generalized in [26]:

- ability to set limits on the basis of geography, language and contents;
- top-to-bottom indexing;
- coding of contents and document type;
- connection between related data, e.g., connections between conference proceedings and the corresponding addresses;
- accurate listing of authors and titles;
- no abbreviation of journal titles;
- author affiliations completely searchable;
- at least the following fields must be present: author, affiliation, title, source and country of origin, publication date, summary, indexing.
Our summary of quality requirements, attributes and criteria contains demands not only on the database producer, but also on the host, although users do not wish to accept this distinction [16]. From the above discussion, five quality requirements can be derived which are especially significant for the production of databases:

Scope and coverage of the subject area: By scope we mean the subject-related contents of the area covered by the database. All relevant documents (publications) - i.e., all those classifiable as dealing with the subject matter of the area - are to be described in their full extent as information units. The area can be subject-matter oriented, multidisciplinary or mission-oriented. Geographic location, linguistic region and time period are further criteria of coverage.

Comprehensiveness: This means the inclusion and presence of all sorts of documents (publications): monographs, chapters and articles in monographs; journals; journal articles; reports; articles and/or chapters from reports; conference papers and conference proceedings; grey literature; dissertations; patents; norms. Comprehensiveness can be international or limited on geographic, temporal or linguistic grounds (e.g., dissertations only from the English-speaking world).

Currency and timeliness: This means the time period between the publication of a text (publication date) and the appearance of the information unit for this publication in a database. Also usable as a currency indicator is the share of information units from the current year of publication among all the information units processed in that year.

Accuracy: This means the avoidance of errors in all stages of creating an information unit: (a) in document analysis; (b) during entry in the data fields; (c) orthographical errors.

Consistency: This means uniformity and agreement in the processing of all information units.
In order to fulfill the requirement of a high level of consistency, strict compliance with rules and working instructions is necessary: (a) in the choice of documents (scanning); (b) in classification and indexing (e.g., classificatory schema, thesaurus, indexing rules); (c) in cataloguing (e.g., cataloguing rules, category schema). These quality requirements show, on the one hand, the great interest of the various groups of persons involved with online retrieval in qualitatively excellent databases, and on the other, the demand for a comprehensive and concrete treatment of the quality of databases [35, 36]. To this end we will develop a model for the "quality profile" of a database which presents qualitative and quantitative statements on "quality indicators" for the individual parts and elements of an information unit or database. This model is oriented towards the production of databases. To illustrate and clarify these concepts, we will provide quantitative and qualitative details of two hypothetical databases (db1 and db2) in a series of tables. These demonstrate the different possibilities for the analysis of data - both formal and content-based - as well as for the document types. In addition, where possible the connection between user requirements and quality will be shown, as well as the relationships to the information chain.
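The currency indicator just described - the share of information units stemming from the current year of publication among all units processed in that year - can be computed directly from production records. The following sketch is a minimal illustration with hypothetical record structures (pairs of publication date and database entry date); it also derives the median processing lag in days, another usable timeliness figure:

```python
from datetime import date
from statistics import median

def currency_indicators(records, processing_year):
    """Two currency indicators for one production year.

    records: iterable of (publication_date, entry_date) pairs
    (a hypothetical structure; real production data will differ).
    """
    processed = [r for r in records if r[1].year == processing_year]
    if not processed:
        return None
    # Share of units whose source was published in the processing year itself
    same_year = sum(1 for pub, _ in processed if pub.year == processing_year)
    same_year_share = same_year / len(processed)
    # Median lag (in days) between publication and appearance in the database
    lag_days = median((entry - pub).days for pub, entry in processed)
    return same_year_share, lag_days

# Invented example dates
records = [
    (date(1994, 3, 1), date(1994, 6, 1)),
    (date(1993, 11, 1), date(1994, 2, 1)),
    (date(1994, 1, 15), date(1994, 3, 15)),
]
share, lag = currency_indicators(records, 1994)
```

Comparable shares and lags, computed per production year, would let two databases be compared on the timeliness requirement on a purely numerical basis.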
3 Production of Databases

In the following discussion we consider in greater detail the production process for a database and distinguish three production steps: the acquisition of the original document, the analysis of the document (selection, subject analysis, bibliographic analysis) and the data recording and production system.

3.1 Acquisition of the original document

The acquisition of a document requires three steps: discovering and monitoring the publication and offering of literature; preselection of the relevant sources; the actual acquisition. Literature surveillance and scanning require, on the one hand, subject knowledge, in order to be able to determine the relevance of a source. The actual acquisition, on the other hand, requires documentary or bibliographic knowledge, in order to ensure that relevant available documents are identified and that selected documents are promptly ordered and delivered.

Type of Procurement                                                %-Share3
                                                                  db1    db2
Conventional: acquisition by purchase, exchange or gift,
  with single orders for the sources                              55%    15%
Preordering of document series for the sources                    25%    15%
Direct submission of documents by the publishers on the
  basis of specific agreements                                    15%    15%
Direct submission of galley proofs of documents by the
  publishers on the basis of specific agreements                   5%    15%
Direct submission of analyzed documents by the publishers
  on the basis of specific agreements (e.g., as worksheets,
  machine-readable texts)                                          -     40%

Table 1 Forms of acquisition for bibliographic databases.

Table 1 shows the various ways of procuring a document for a database. It gives information on whether a document still has to be obtained conventionally, or whether "half-ready products" or even analyzed documents can already be delivered. The percentage figures for the two databases (db1 and db2) are thus a measure of the speed with which documents can be brought into the production process.
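The %-shares of table 1 can be condensed into a single acquisition-speed score by weighting each procurement channel by how far along the production process a document arrives. The weights below are purely illustrative assumptions, not values from the paper:

```python
# Hypothetical weights: higher = document enters production further along.
SPEED_WEIGHTS = {
    "conventional": 0.0,        # purchase, exchange or gift, single orders
    "preordered_series": 0.25,
    "direct_submission": 0.5,
    "galley_proofs": 0.75,
    "analyzed_documents": 1.0,  # already analyzed by the publisher
}

def acquisition_speed(shares):
    """Weighted acquisition-speed score in [0, 1] from %-shares."""
    return sum(shares[k] / 100 * w for k, w in SPEED_WEIGHTS.items())

# The %-shares of table 1
db1 = {"conventional": 55, "preordered_series": 25, "direct_submission": 15,
       "galley_proofs": 5, "analyzed_documents": 0}
db2 = {"conventional": 15, "preordered_series": 15, "direct_submission": 15,
       "galley_proofs": 15, "analyzed_documents": 40}
```

Under these assumed weights db2 scores markedly higher than db1, which matches the qualitative reading of table 1 given below.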
As in the case of db2, higher percentages for the delivery of galley proofs and analyzed documents are thus currency indicators. A high share of conventionally procured documents (e.g., through exchange), as with db1, suggests a lower degree of currency for the database. The methods of acquisition for db1 and db2 as described in table 1 are not yet satisfactory. Table 1 shows that for db1 80% of the documents are delivered conventionally (by single orders or preordered series), while for db2 40% of the documents are already analyzed by the publisher and directly transferred to the database4, which indicates high accuracy and authenticity. With an ongoing change from conventional acquisition (i.e., with the database producer performing the whole process of document analysis) to the delivery of bibliographically and content-analyzed documents, a quicker and more efficient procurement of documents for integration into databases could be achieved.

3 Percentage shares are given as the percentage of the given value for the production of a database over a certain period of time (e.g., a year).

3.2 Document Analysis

The task of document analysis consists in making an accurate and comprehensive description of the original document. For this, clear and unambiguous methods of subject analysis and bibliographic processing are needed, based as much as possible on rules, guidelines, norms, manuals, etc. In addition, it is necessary to have an accurate and consistent description of the contents, the formal structures and the physical characteristics of the data sets.

Selection of Documents

In the selection of a document it is decided whether a specific document (publication) should be considered for processing and inclusion in a database. The database producer, who bears the responsibility for the production, has to give a clear statement on selection within the database policy, to allow the customer an exact overview of the subject-related and bibliographic content of the database.
For the selection, clear and unambiguous guidelines have to be defined which determine database content, document types, delimitations and other selection criteria. Tools such as subject classification schemes, thesauri, keyword lists, document type lists, category schemes, etc. also have to be employed. Based on these guidelines and the subject-related tools, the scope and coverage of the database are determined; other guidelines and tools designate the types of documents to be treated and thereby define the comprehensiveness of the database. Further guidelines determine the limits of a database with respect to geographical area, language and further specific elements. Percentages, as numerical values, are indicators which permit an overview of the distribution. Besides the indicators for the evaluation of a database named in tables 1 and 2, a further indicator is the number of documents present after acquisition and selection in relation to the sum of all possible documents. The number of possible documents can, of course, only be estimated. Table 2 lists the elements which are to be considered in deciding what to include in the databases db1 and db2.5 The first columns of db1 and db2 indicate which subject classification areas, which document types and which delimitations are used. The second columns give the distribution in percentages: they show, for example, that database db1 is smaller than db2, because fewer subareas of the subject field were included; db1 includes above all books, reports and grey literature, while db2 is a database chiefly containing journal articles and conference papers. This information clearly relates to the completeness of a database. More types of publications were included in db2, even though its focus is on journals and conference contributions.
4 The publishers Elsevier Science and the American Institute of Physics, for example, offer database producers the delivery of analyzed documents and thereby contribute to accelerating acquisition and document analysis. Other publishers offer galley proofs of documents, refereed and corrected by the author and finished by the publisher, but not yet distributed.

5 For an overview of the number of documents procured within a certain period, not only in terms of contents - e.g., with a classification according to the chief groups of a classificatory schema - but also in terms of document type, absolute values can be given for the selection criteria in table 2.

Table 2 Specific selection criteria for the choice of literature. [The table lists, with %-shares for db1 and db2: the subject area (e.g., the ACM Classification, with subareas A general literature, B hardware, C computer systems organization, D software, E data, F theory of computation, G mathematics of computing, H information systems, I computing methodologies, J computer applications, K computing milieux); the type of publication (journal article, book, report, grey literature, dissertation, patent, norm, conference contribution); the delimitations (geographical: EU countries for db1, international for db2; time period: last five years for db1, none for db2; language: EU languages for db1, all languages for db2); the availability of the original (yes for db1, no for db2); and the processing priority (books for db1, core journals for db2).]

In db1, by contrast, the nonconventional publications - grey literature and reports - play a more important role. Further important information concerning comprehensiveness can be inferred by studying the delimitations of a database. Included in db1 are documents which were published in EU countries over the last five years, while db2 includes journal articles and conference papers produced internationally.
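The selection indicators discussed above - the ratio of selected documents to the (estimated) sum of all possible relevant documents, and the %-share distributions over the selection criteria - are simple to compute. The following sketch assumes hypothetical records represented as dictionaries; field names are invented for illustration:

```python
from collections import Counter

def coverage_ratio(selected_count, estimated_possible):
    """Documents present after acquisition and selection, relative to the
    (necessarily estimated) number of all possible relevant documents."""
    return selected_count / estimated_possible

def distribution(records, key):
    """%-share distribution of selected records over one selection
    criterion, e.g. document type or classification main group."""
    counts = Counter(r[key] for r in records)
    total = len(records)
    return {k: 100 * c / total for k, c in counts.items()}

# Invented example records
records = [
    {"doc_type": "journal article"},
    {"doc_type": "journal article"},
    {"doc_type": "report"},
    {"doc_type": "book"},
]
dist = distribution(records, "doc_type")
```

Applied per classification subarea and per document type, such distributions yield exactly the kind of %-share columns shown for db1 and db2 in table 2.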
Inferences can be made from the processing priority about the currency and topicality of the contents of a database. From db2's preference for journals, in contrast to db1's favoring of books, we can assume that the contents of db2 are more current and topical than those of db1, even if reports and grey literature - which rank low in the processing priority of db1 - are included in it. Db1 has a more specialized coverage, holds a high percentage of non-conventional literature and covers a smaller regional scope. Db2, in contrast, is more international and has a wide content-related range; it contains many journals and conference publications, which are quickly available, but only a small number of other document types. Thus neither database is complete in certain respects.

Subject Analysis

Subject analysis serves as a means for describing a publication's scientific contents. The nine points listed for subject analysis (see table 3) essentially specify the scientific contents and thus the value of a database. The number of points dealt with establishes the breadth of an analysis, and the individual numerical values and percentages suggest the worth and depth of the evaluation. Surprisingly, despite its significance in the production process, subject analysis was not listed in the enumerations of user requirements [34, 9]. Table 3 summarizes the steps involved in subject analysis. They include abstracting, classifying, various possibilities for indexing, and further elements such as main keywords, data identification, title augmentation and indexing for special areas. The first columns of db1 and db2 show which steps were carried out for the two databases; the second columns give typical values for these steps. Db1 and db2 differ strongly in subject analysis. In db1, abstracts are taken over from the original, while in db2 new abstracts are composed.
The takeover of abstracts in db1 makes possible a quicker processing of documents and therefore contributes to the timeliness of the database; the composition of new abstracts in db2, in contrast, increases the consistency of the abstracts, since uniform standards, such as the 'Instructions for submitting abstracts' [38], are employed in their creation. In db1 only supplementary keywords are given and a title augmentation is made to better identify the contents, while in db2 the type of contents (e.g., experimental, theoretical, etc.) is established, a thesaurus is used, and 11.8 descriptors are assigned per document. Such an extensive analysis means an enormous advantage for users of db2 in a later search. For the evaluation of subject analysis, it is also necessary to know which rules, guidelines, norms, classifications, thesauri and manuals are available for the preparation of the different elements, and what competence they have, not only on the internal level, but also on the national or international level. The following enumeration gives examples of (a) internal, (b) national and (c) international instruments of this sort:

Abstracting and Indexing: (a) Manual for subject indexing [39]; (b) JICST Thesaurus English Version [40]; (c) Instructions for submitting abstracts [38];
Classification: (c) Subject categories and scope description [37].

Statements of which rules, etc. are employed for subject analysis are likewise quality indicators for the evaluation of an information unit or database. Subject analysis has great influence on the information chain, since its excellence and comprehensiveness strongly affect the relevance of search results.

Table 3 Values given for the subject analysis of a bibliographic database. [The table lists, with %-shares or numerical values for db1 and db2: analysis on the basis of the original document (90% for db1, 100% for db2); the abstract (creation, takeover, improvement, translation); the subject classification (number of terms in the classification scheme, e.g. [37], and number of terms per information unit); the type of content (number of codes in the scheme and number of codes per information unit); indexing by thesaurus (e.g., 20,000 available descriptors and 11.8 descriptors per information unit for db2), by controlled vocabulary and by supplementary keywords; main keyword and qualifier pairs (M-Q); data identification (data flagging and tagging); title augmentation; and indexing for special areas (e.g., chemistry, astronomy).]

Bibliographic Analysis

Bibliographic analysis contributes to the description of the formal elements of a publication or of an information unit. Tables 4 and 5 include the key elements which are drawn on in processing6. They include document types, authors, titles, publisher data, conference elements and further specific elements, such as the International Standard Serial Number (ISSN) for journals, report numbers and corporate bodies for reports, or the International Patent Classification (IPC) for patents. As with subject analysis, it is established here which rules, etc. were used for inclusion and what competence they have (a) internally, (b) nationally and (c) internationally. Examples are:

Cataloguing: (b) Guidelines for the cataloguing of documents [41];
Country codes: (c) Codes for the representation of names of countries [42]; (c) Terminology and codes for countries and international organizations [43];
Journal Title: (a) List of journals and serial publications [44].

6 There can be further data elements which are necessary to fulfill the goals the database producer wants to achieve (e.g., citation data, URL, pricing information, physical properties, etc.).

Table 4 Elements of a bibliographic description. [The table lists, with %-shares or numerical values for db1 and db2: the title of the publication (original, English, carrier language of the database); the authors (only the first author in db1, all authors in db2); affiliation and country of affiliation (given only in db2); collaborators; editors; publication date; place of publication; collation; original language; availability note; contract number; the conference elements title, place and date of conference (given only in db2); and the type of document (journal, book, article in book, report, grey literature, dissertation, patent, standards, conference article, preprint).]

Table 5 Elements of a bibliographic description of individual types of publications. [The table lists, with %-shares for db1 and db2, elements per document type: for journals, title, ISSN, CODEN, date of publication and collation (volume and number); for books, ISBN, publisher, place of publisher and information on monographic series; for reports, report number and corporate entry; for grey literature and dissertations, corporate entry; for patents, country, patent number and International Patent Classification; for norms, country and norm number.]

The first columns of db1 and db2 show which of the elements were obtained for the database. The percentages and numerical values given in the second columns show the extent to which the requirements were fulfilled.
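The %-shares reported in tables 4 and 5 - the share of information units in which a given bibliographic element is actually filled - can be measured directly on the production data. A minimal sketch, again with hypothetical records as dictionaries and invented field names:

```python
def element_completeness(records, elements):
    """%-share of records in which each bibliographic element is present
    and non-empty - the kind of figure reported in tables 4 and 5."""
    total = len(records)
    return {
        e: 100 * sum(1 for r in records if r.get(e)) / total
        for e in elements
    }

# Invented example: two records, one missing its ISSN
records = [
    {"title": "A", "authors": ["X", "Y"], "issn": "1234-5678"},
    {"title": "B", "authors": ["Z"], "issn": ""},
]
shares = element_completeness(records, ["title", "authors", "issn"])
```

Computed over a whole production period, such completeness shares make the accuracy and consistency of the bibliographic analysis directly comparable between databases.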
As before, the two databases differ thoroughly. Thus, for example, only the first author of a publication is listed in db1, whereas all authors are named in db2. Failure to include all authors naturally detracts from the precision of the database, since the document is not completely described. The weight which should be assigned to this inadequacy when evaluating a database depends on whether it is common in the specialized area for several or even many authors to publish jointly. While conference elements are not included in db1, conference titles, locations and dates are given in db2. These details are helpful in identifying conference publications (tables 2 and 4) and present an informational value of their own for conferences. In db2, the affiliation is listed, an indication which also has increasing significance for users. The complete and accurate inclusion of all formal attributes helps the user in selecting a document from a larger document collection and thereby influences the quality of the retrieval and its results. The different elements of a bibliographic analysis are needed for the further links within the information chain. For example, the greatest possible fanning of the information unit during bibliographic analysis is useful in database design, in order to improve retrieval possibilities. Aside from the selection by subject, bibliographic data are necessary in information use for limiting at the formal level - e.g., limiting the selection to information on patents obtained after 1987. Further, bibliographically error-free data are assuming increasing significance for information users, since the automation of document ordering and delivery requires correct data [45, 46]. With regard to data exchange in international networks like the Internet as well, highly accurate bibliographic data are increasingly needed, which must be produced according to international standards.
In regard to the aforementioned quality requirements of users, accuracy and consistency play an especially great role in bibliographic analysis. The accurate, correct and consistent application of rules and production of data elements can greatly increase not only the accuracy of a database, but also the consistency of its data. Furthermore, a contribution can be made to the currency and high-speed processing of a database, since by avoiding errors at this production stage, expensive and time-consuming corrections at a later date become unnecessary.

3.3 Data Recording and Production System

Computer-supported production methods are being increasingly employed in the production of databases, especially because of the rapidly growing volume of available data. We describe here COMPINDAS [19], which uses computer-supported methods in all phases of database production. COMPINDAS (COMputer-supported and INtegrated DAtabase production System) includes functions for the acquisition and analysis of documents, the employment of reference data, statistical evaluation and the creation of machine-readable end-products. The COMPINDAS data-recording scheme makes possible a very specific and detailed entry and structuring of data elements. For the entry of data, a comprehensive character set exists with which special symbols and formulas can be represented. Autonomous systems like METAL (Machine Evaluation and Translation of natural Language) [47], AIR (Automated Indexing and Retrieval) [48, 49] and Kurzweil Discover 7320 [50] support the production process.
Errors and their avoidance (see also [22]) play a major role in automatic procedures, since consistent, automatic checking of the entered data makes a subsequent correction procedure avoidable. The error rate - the number of errors per 1000 entered symbols, or the number of errors in a specific data field - can be used as a figure in a quality evaluation. Testing routines and reference data are employed in the production process. According to [51], five types of tests can be made:

- Consistency test: the included data are compared with standardized lists using a text-analysis procedure.
- Plausibility test: a matrix of elements, dependent on the type of document, sets down which predefined rules must be fulfilled in an information unit; the absence of fields or errors in the dependencies of data fields are indicated.
- Syntax test: this is made in order to be able to further process data with defined formats. Errors can be avoided through the greatest possible fanning of the data elements.
- Duplication test: this test ensures that there are no double entries; connections between individual entries are also indicated.
- Creation of registers: data elements are summarized in registers in order to detect irregularities through the use of structured overviews.

In the production process, the extent to which reference files are employed is also a measure for judgment. Table 6 lists typical reference files which are used in creating a bibliographic database - thesaurus, classification, authors, institutions/affiliations, conferences, journal titles and abbreviations, countries and country codes, locations, language designations, dictionaries and the character set - and shows which of these were used for db1 and db2. Thus, e.g., in db1 a classification was used for control, and in db2, a thesaurus [52].

Table 6 Reference files for bibliographic databases.
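Two of the five testing routines - the plausibility test with its document-type matrix of required elements, and the duplication test - can be sketched in a few lines. This is an illustrative sketch only; the matrix entries, field names and matching key are invented and do not describe the actual COMPINDAS implementation:

```python
# Hypothetical plausibility matrix: required data fields per document type.
REQUIRED_FIELDS = {
    "journal article": {"title", "authors", "journal", "issn", "year"},
    "report": {"title", "authors", "report_number", "corporate_entry"},
}

def plausibility_test(record):
    """Return the set of fields missing or empty for the record's type."""
    required = REQUIRED_FIELDS.get(record.get("doc_type"), set())
    return {f for f in required if not record.get(f)}

def duplication_test(records):
    """Flag pairs of records sharing a normalized (title, year) key."""
    seen, duplicates = {}, []
    for i, r in enumerate(records):
        key = (r.get("title", "").strip().lower(), r.get("year"))
        if key in seen:
            duplicates.append((seen[key], i))
        else:
            seen[key] = i
    return duplicates
```

Run over a batch before release, such tests yield exactly the per-field error counts from which the error rates discussed above can be derived.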
For both databases a standard English geographical dictionary is used for locations [53], and the AACR2 [54] for the authors. Of the different quality requirements, accuracy and consistency are especially important in the use of a data recording and production system. The numerical values for error rates, the testing routines employed, and the reference data employed, together with their level of competence (internal, national, international), are indicators of the quality of the production process. Essential for file building and for the adjacent links in the information chain is knowledge of the subdivision (fanning), structuring and formatting of the data elements and of the character set employed, just as high consistency, accuracy and reliability in data sets and in the data themselves simplify the production of databases.

4 Concluding Remarks

Databases are today conceived and produced as original products. They are no longer by-products of the production of printed services, but in themselves the basis for the production of electronic products and the offering of electronic added-value services. According to [14], the production of electronic information services passes through four phases between 1965 and 2001. From 1992 to 1997 we find ourselves in the "reconstruction" phase, in which high-quality databases are being reconstructed (redesigned) using technologically advanced systems and software, as Kuhlen [55] already called for in 1986. In this phase users will be better able to express their wishes, needs and demands. As was the case earlier with the production of abstract journals in printed form, very little has so far been reported on the production of bibliographic databases. The view is also widespread [24, 56, 57] that there is a lack of standards, and demands are being made to produce them at the international level. For the production of databases, this view cannot be supported.
It has been demonstrated in this paper that the guidelines, rules, norms, instructions and manuals needed for the production of a database are already by and large available. To be sure, in our opinion there is still a need for a systematic and comprehensive overview of these instruments, as well as for information on their competence. The production of a catalogue of all rules, guidelines, norms and manuals should therefore be undertaken as soon as possible. In tables 1 - 6 of the above sections we have summarized the steps and data elements necessary for the production of information units and databases. Using two hypothetical databases (db1 and db2) as examples, we have characterized the individual values resulting from the steps of the production process using numerical values, percentage figures or descriptive statements. We have designated these values as the available "indicators" of the "quality profile" of a database. This quality profile should be used in the evaluation of databases according to the ISO 9000 series. Our research demonstrates that the quality of databases can be evaluated using the indicators for the different work steps and data elements, and also using indicators for work instruments. The weighting of the individual indicators for the choice of a database depends essentially on the concrete approach and needs of the user. But objective tests of individual databases for typical application situations can also be developed and employed using the indicators named above. Through further research, individual indicators can be supplemented and improved. To this end, standards with defined values and tolerances must be developed (e.g., for authors or in indexing). Initial attempts are found in [58, 18], which propose optimal values for indexing depth.
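How the weighting of indicators for a concrete user might work can be sketched by combining normalized indicator values of a quality profile into a single score; the indicator names, value scales and weights below are invented for illustration and are not taken from tables 1 - 6:

```python
# Hedged sketch: user-specific weighting of quality-profile indicators.
# Each indicator value is assumed to be normalized to [0, 1] beforehand.

def weighted_score(profile, weights):
    """Weighted mean of the indicator values selected by the user."""
    total = sum(weights.values())
    return sum(profile[k] * w for k, w in weights.items()) / total

# Hypothetical quality profiles for two databases.
db1 = {"indexing_depth": 0.2, "accuracy": 0.9, "currency": 0.7}
db2 = {"indexing_depth": 0.9, "accuracy": 0.8, "currency": 0.6}

# A user who depends on deep subject access weights indexing depth heavily.
weights = {"indexing_depth": 3, "accuracy": 1, "currency": 1}

print(round(weighted_score(db1, weights), 2))  # 0.44
print(round(weighted_score(db2, weights), 2))  # 0.82
```

Under these assumed weights db2 clearly dominates, whereas a user who weighted accuracy most heavily would rank the two databases much closer together; this is the sense in which the choice depends on the concrete needs of the user.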
Research aimed at establishing standards and tolerances is urgently needed; at the same time, database tests should be undertaken, e.g., with user organizations or information-science institutes, as in [59], which calls for the introduction of testing offices. For users, the quality profile is of the greatest value. With it one can compare the extent to which a database satisfies one's own wishes and requirements. A database with an indexing depth of 2 descriptors and only the obligatory bibliographic statements does not permit the same level of retrieval that is possible with a database featuring 11 descriptors and all formal elements including conference information. Decisions on which database should be used can thus be influenced by the quality indicators at various levels. Within the information chain, creating a quality profile with evaluation-capable indicators for a database is an important intermediate step toward the overall evaluation of an online information service. For the further links in the information chain, namely for file building (database design and computer implementation) as well as for retrieval and use, quality profiles must likewise be worked out in order to achieve this overall evaluation. Within the framework of the Konstanz Hypertext System [60, 61], a database choice component [62] containing descriptions of individual databases was integrated into the system. There are plans to evaluate databases with the above-described indicators, after which the presentation of databases in the Konstanz Hypertext System will be enhanced using the test data. The Konstanz Hypertext System offers very flexible interaction and presentation forms for information, so that the data presented in tables 1 - 6 can be presented and used in an appropriate manner.

The authors wish to thank Prof. R. Kuhlen (Information Science, University of Konstanz) for valuable discussions, and D.
Marek (FIZ Karlsruhe) for numerous suggestions and pointers which were useful in the writing of this article.

References

[1] W.G. Stock. Der Markt für elektronische Informationsdienstleistungen. Ifo-Schnelldienst, (14):22–31, 1993.
[2] M.E. Porter. Wettbewerbsvorteile: Spitzenleistungen (competitive advantage) erreichen und behaupten. Campus: Frankfurt, 1986.
[3] M.E. Porter and V.E. Millar. How information gives you competitive advantage. Harvard Business Review, (July-August):149–160, 1985.
[4] W. Schwuchow, editor. Qualität von Informationsdiensten, 7. Internationale Fachkonferenz der Kommission Wirtschaftlichkeit der Information und Dokumentation KWID in der Deutschen Gesellschaft für Dokumentation e.V. DGD in Zusammenarbeit mit der Gesellschaft für Informatik e.V. GI und der International Federation for Information and Documentation FID, Garmisch-Partenkirchen, 2.-4. Mai 1993. Deutsche Gesellschaft für Dokumentation: Frankfurt, 1993.
[5] C. Tenopir. Database quality revisited. Library Journal, (1):64–67, 1990.
[6] U. Hanson. The hidden quality of the database: some (re-)liability aspects. In I. Wormsell, editor, Information Quality: definitions and dimensions, pages 91–121. Taylor Graham: London, 1990.
[7] R. Kuhlen. Informationsmarkt. Chancen und Risiken der Kommerzialisierung von Wissen. Number 15 in Schriften zur Informationswissenschaft. Universitätsverlag Konstanz: Konstanz, 1995.
[8] W. Stock. Qualitätsmanagement von Informationsdienstleistungen. In W. Rauch, F. Strohmeier, H. Hiller, and C. Schlögel, editors, Mehrwert von Information - Professionalisierung der Informationsarbeit, Proceedings des 4.
Internationalen Symposiums für Informationswissenschaft (ISI’94), number 16 in Schriften zur Informationswissenschaft, pages 21–32. Universitätsverlag Konstanz: Konstanz, 1994.
[9] T. Wilson. EQUIP: A European survey of quality criteria for the evaluation of databases: report on the questionnaire survey. European quality management programme for the information sector, 1994.
[10] M.E. Williams. The state of databases today: 1994. In K.Y. Marcaccio, editor, Gale Directory of Databases, volume 1, pages XIX–XXX. Gale Research, 1994.
[11] D.A. Garvin. Managing quality. The strategic and competitive edge. Free Press: New York, 7th edition, 1988.
[12] International Organization for Standardization, Geneva. ISO 8402. Quality management and quality assurance - vocabulary, 2nd edition, 1994.
[13] R. Basch. Decision points for databases. Database, (August):46–50, 1992.
[14] S.E. Arnold. Information manufacturing: the road to database quality. Database, (October):32–39, 1992.
[15] K. Bürk and D. Marek. Produktion von wissenschaftlich-technischen Datenbanken. Handbuch der modernen Datenverarbeitung (HMD), 25(141):45–54, 1988.
[16] L. Granick. Assuring the quality of information dissemination: responsibilities of database producers. Information Services & Use, 11:117–136, 1991.
[17] B. Lawrence and T. Lenti. Application of TQM to the continuous improvement of database production. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, Part I Database Production, pages 69–87. Gower Publishing Limited: Hampshire, 1995.
[18] W.
Lück. Qualität von bibliographischen Datenbanken: Die Datenbank PHYS. In 5. Österreichisches Online-Informationstreffen in Seggauberg, 1993.
[19] D. Marek. Integrated system support for the cooperative production of bibliographic, referral and numeric databases. In D.I. Raitt and B. Jeapes, editors, 17th International Online Information Meeting 1993, pages 347–357. Learned Information: Oxford, 1993.
[20] T.M. Aitchison. Aspects of quality. Information Services & Use, 8:49–61, 1988.
[21] E. Beutler. Assuring Data Integrity and Quality: A Database Producer’s Perspective. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, Part I Database Production, pages 59–68. Gower Publishing Limited: Hampshire, 1995.
[22] E.T. O’Neill and D. Vizine-Goetz. Quality control in online databases. In M.E. Williams, editor, Annual Review of Information Science and Technology (ARIST), volume 23, pages 125–156. Elsevier: New York et al., 1988.
[23] P.L. Townsend. Commit to quality. John Wiley & Sons: New York, 1986.
[24] G.M. Wheeler. Securing product-service quality in large-scale bibliographic database production. Master's thesis, University of Wales, 1988.
[25] A.L. Gilchrist. Quality management in information services - a perspective on European practice. In W. Schwuchow, editor, Qualität von Informationsdiensten. 7. Internationale Fachkonferenz der Kommission Wirtschaftlichkeit der Information und Dokumentation e.V. in Zusammenarbeit mit der Gesellschaft für Informatik e.V. GI und der International Federation for Information and Documentation FID. Garmisch-Partenkirchen, 2.-4. Mai 1993, pages 92–99, 1993.
[26] R. Basch. An overview of quality and value in information service. In R.
Basch, editor, Electronic Information Delivery, Introduction, pages 1–10. Gower Publishing: England, 1995.
[27] R. Fidel and D. Soergel. Factors affecting online bibliographic retrieval: a conceptual framework for research. Journal of the American Society for Information Science, 34(3):163–180, 1983.
[28] P. Jasco. Testing the Quality of CD-ROM Databases. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, Part III Quality Testing, pages 141–168. Gower Publishing Limited: Hampshire, 1995.
[29] A.P. Mintz. Quality control and the zen of database production. Online, (November):15–23, 1990.
[30] B. Quint. Better Searching Through Better Searchers. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, Part II Role of The Search Intermediary, pages 99–116. Gower Publishing Limited: Hampshire, 1995.
[31] C. Tenopir. Priorities of Quality. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, Part III Quality Testing, pages 119–139. Gower Publishing Limited: Hampshire, 1995.
[32] S.A.E. Webber. Criteria for comparing news databases. In Online Information 92, 8-10 December 1992, London, England, pages 537–546. Learned Information: Oxford, England, 1992.
[33] U. Weber-Schäfer. Die Nachfrage und das Angebot von externen Informationen zu Unternehmensstrategien in einem Online-Informationssystem. Entscheidungsorientierte Analyse am Beispiel des europäischen Binnenmarktes, Anforderungen und Konzepte. Number 1660 in Europäische Hochschulschriften: 5, Volks- und Betriebswirtschaft. Lang: Frankfurt am Main, 1995.
[34] R. Basch. Measuring the quality of the data: report on the fourth annual SCOUG Retreat.
Database Searcher, (October):18–23, 1990.
[35] C.J. Armstrong. The Eye of the Beholder. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, Part V The Role of User Groups, pages 221–244. Gower Publishing Limited: Hampshire, 1995.
[36] R. Juntunen, E. Mickos, and T. Jalkanen. Evaluating the Quality of Finnish Databases. In R. Basch, editor, Electronic information delivery: Ensuring quality and value, Part V The Role of User Groups, pages 201–219. Gower Publishing Limited: Hampshire, 1995.
[37] International Atomic Energy Agency (IAEA), Vienna (Austria). INIS: Subject categories and scope descriptions, 1991. IAEA-INIS-3 (Rev.7).
[38] International Atomic Energy Agency (IAEA), Vienna (Austria). INIS: Instructions for submitting abstracts, 1988. IAEA-INIS-4 (Rev.2).
[39] Fachinformationszentrum Karlsruhe. Manual for subject indexing, 1990. FIZ-KA-Serie 3-3, 160 pages.
[40] Japan Information Center of Science and Technology (JICST), Tokyo. JICST-Thesaurus, English version, Vol. 1, 1987.
[41] C. Hitzeroth, D. Marek, and J. Müller. Leitfaden für die Erfassung von Dokumenten in der Literaturdokumentation. Verlag Dokumentation: München, 1976.
[42] International Organization for Standardization, Geneva. ISO 3166. Codes for the representation of names of countries, 4th edition, 1993.
[43] International Atomic Energy Agency (IAEA), Vienna (Austria). INIS: Terminology and codes for countries and international organizations, 1987. IAEA-INIS-5 (Rev.6).
[44] Fachinformationszentrum Karlsruhe. List of journals and serials publications, 1992.
FIZ-KA-Serie 3-8, 240 pages.
[45] M. Ockenfeld and E. Wetzel. Fachinformationsdatenbanken und Informationssysteme. Gesellschaft für Mathematik und Datenverarbeitung (GMD), Institut für Integrierte Publikations- und Informationssysteme (IPSI), 1990.
[46] A. Oßwald. Dokumentlieferung im Zeitalter Elektronischen Publizierens. Number 5 in Schriften zur Informationswissenschaft. Universitätsverlag Konstanz: Konstanz, 1992.
[47] C. Best, B. Gravemann, A. Jacobs, and O. Ruczka. Erste Erfahrungen mit dem automatischen Übersetzungssystem METAL. ABI-Technik, 13(1):41–44, 1993.
[48] P. Biebricher, N. Fuhr, G. Knorz, G. Lustig, and M. Schwantner. Entwicklung und Anwendung des automatischen Indexierungssystems AIR/PHYS. Nachrichten für Dokumentation, 39(3):135–143, 1988.
[49] W. Lück, W. Rittberger, and M. Schwantner. Der Einsatz des Automatischen Indexierungs- und Retrieval-Systems AIR im Fachinformationszentrum Karlsruhe. In R. Kuhlen, editor, Experimentelles und praktisches Information Retrieval. Festschrift für Gerhard Lustig, number 3 in Schriften zur Informationswissenschaft, pages 141–170. Universitätsverlag Konstanz: Konstanz, 1992.
[50] Lesesystem Discover 7320. Der Durchbruch. Sonderdruck: PC Magazin, 49, 1987.
[51] H. Behrens. Datenbanken und ihre Produktion. Foliensammlung zur Vorlesung im SS 94. Universität Konstanz, Informationswissenschaft, 1994.
[52] International Atomic Energy Agency (IAEA), Vienna (Austria). INIS: Thesaurus, 1995. IAEA-INIS-13 (Rev.34).
[53] Webster’s new geographical dictionary. Merriam: Springfield, MA, 1972.
[54] M. Gorman and P. Winkler, editors. Anglo-American cataloguing rules. American Library Association: Chicago, 2nd edition, 1988.
[55] R. Kuhlen.
Information retrieval systems - a challenge for linguistic data processing. In R. Kuhlen, editor, Informationslinguistik: theoretische, experimentelle, curriculare und prognostische Aspekte einer informationswissenschaftlichen Teildisziplin, number 15 in Sprache und Information, pages 89–117. Niemeyer: Tübingen, 1986.
[56] R. Juntunen, R. Ahlgren, J. Jalkanen, R. Hagelin, P. Helander, T. Koivulu, I. Kivelä, E. Mickos, and A. Rautava. Quality requirements for databases - project for evaluating Finnish databases. In 15th International Online Information Meeting 1991, pages 351–359. Learned Information: Oxford, England, 1991.
[57] J.P. Lardy. Bibliometric treatments according to bibliographic errors and data heterogeneity: the end-user point of view. Pages 547–556. Learned Information: Oxford, 1992.
[58] H.D. White and B.C. Griffith. Quality of indexing in online data bases. 23(13):211–224, 1987.
[59] P. Cahn. Testing database quality. Database, (February):23–30, 1994.
[60] R. Hammwöhner and R. Kuhlen. Semantic control of open hypertext systems by typed objects. Journal of Information Science: Principles and Practice, 20(3):175–184, 1994. Amsterdam, NL: Elsevier.
[61] M. Rittberger, R. Hammwöhner, R. Aßfalg, and R. Kuhlen. A homogenous interaction platform for navigation and search in and from open hypertext systems. In RIAO 94 Conference Proceedings.
Intelligent multimedia information retrieval systems and management, pages 649–663, New York, 1994. Rockefeller University.
[62] M. Rittberger. Support of online database selection in KHS. In M.E. Williams, editor, National Online Meeting ’94, New York, 10-12 May, pages 379–387, 1994.