Information science - part 1

Brian Vickery

searching for information

Introduction

Throughout life, all people are engaged in activities – practical or mental – trying to solve problems, activities that themselves give rise to problems. To solve these problems, people need knowledge. They can acquire this personal knowledge in two main ways. First, they can interrogate the world – natural and social – by means of closer observation, deeper analysis, controlled experiment, all forms of cognitive interaction. Second, they can enrich their personal knowledge by communicative interaction with the stock of public knowledge that mankind has built up over the millennia, thus acquiring what we may call information. The activities in which people engage also produce two other kinds of knowledge: that embodied in people (skills) and that embodied in their artefacts. The links between all these elements of the process are suggested in Figure 1.

Figure 1: People, knowledge and information

The role of the information profession is to provide procedures and mechanisms that will aid people in their communicative interaction with public knowledge. The main objective of this essay is to present and explore the nature of the information science that has developed out of information practice. It covers the following topics:
Information in society
What is information science?
Early development of information practice
Emergence of information science
Themes in information science, including information retrieval and user domains
Some information science concepts
Problems facing information science

1. Information in society

Information science developed out of information practice during the twentieth century, and has begun to acquire a distinctive shape. Information science studies people, recorded knowledge, and technologies that can bring the two into contact. It is therefore necessarily multidisciplinary, and has no obvious models in other fields with which it may be compared. It is only slowly evolving a unified view of the phenomena it seeks to understand.

The practice of information provision is of long standing. Two key developments have driven it forward – first, printing in the fifteenth century, and then computers and the Internet in the recent period. Practice has only begun to absorb the impact of this second development. Information science, by contrast, is essentially a development of the twentieth century.

1.1 Pervasiveness of information

We live, we are told, in an information society. But “information” is a phenomenon that has existed in all human societies – and indeed in animal communities. People, like other animals, are in continual interaction with their environment, and each interaction involves an assimilation of information from and about that environment.

To be more specific, people interact in many different ways with natural objects (both inorganic and organic), including those that may bear “traces” of events that have occurred (e.g. footprints in the sand), and they assimilate information about the properties and behaviour of those objects; they likewise interact with man-made products of all kinds, and with the tools used to make those products; they interact with animals, including those that have communicative abilities (e.g. barking dogs); they interact directly with other people (cooperating and communicating), and with signs and signals emanating indirectly from other people (e.g. traffic lights); and they interact with linguistic and graphic records of all kinds.

Interactions often involve much more than information assimilation – e.g., eating a banana or building a house are not primarily informative acts – but they always include the reception of information, in these cases about the taste, texture and smell of the banana, or about the materials and processes involved in building. In relation to other people and to some animals, humans also actively provide information, in speech, signal or writing. This is also true in all instances of human productive and creative activity – the products, even if constructed for utilitarian or aesthetic purposes, are incidentally sources of information for other people. Signs, signals, speech and linguistic records are of course specifically intended to act as information sources. Human interaction is rarely passive – people actively search for information about the objects, organisms and people in their environment, in order to adapt their actions to that environment.

So information transfer is an all-pervasive accompaniment of human activity. Productive and creative labour, consciousness, social cooperation, language and communication may all be seen as interdependent aspects of the historical and evolutionary act of transition from animal to human. The development of labour necessarily helped to bring the members of society closer together by multiplying cases of mutual support and joint activity, and by making clear the advantage of this joint activity to each individual. In short, men in the making arrived at the point where they had something to say to one another. Information assimilated by each individual from his or her interactions with the environment could then be transferred from one person to another to aid their joint activity.

1.2 Uses of the term ‘information’

“Information” is used with various meanings. In the account so far given, it is used to mean “knowledge about some aspect of the environment as assimilated by an individual human mind, or transmitted (directly or indirectly) to another mind.”

It has become customary in biology to widen the scope of the term, and to apply it to situations where one biological element (e.g., a cell, an organ, the protein synthesis mechanism) behaves in a manner that depends on receiving some kind of signal from another element (e.g., a hormone emitted by another organ, a base sequence in the DNA code). “Information” here means “a signal that initiates a specific mode of behaviour in the receiver”: it is in principle the same concept as a signal such as a traffic light, but one not devised, emitted or received by a conscious mind. The term “information” may in a similar way be applied to the action of an automatic controller, such as a thermostat: a thermometer measures the ambient temperature, and its value is the signal that turns a heating element on or off. Here the mechanism has been consciously designed, but in actual operation it is automatic.
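The thermostat case can be made concrete with a short sketch. It is a minimal illustration, assuming a simple on/off controller; the class name `Thermostat`, the `setpoint` value and the switching `band` (a small hysteresis margin that stops the heater from cycling rapidly) are assumptions added here, not details from the text. What it shows is that the reading functions purely as a signal that selects one of two modes of behaviour, with no conscious mind involved.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """Hypothetical on/off controller: the temperature reading is the signal."""
    setpoint: float         # desired temperature (illustrative value)
    band: float = 0.5       # assumed switching margin to avoid rapid on/off cycling
    heater_on: bool = False

    def receive(self, temperature: float) -> bool:
        # The reading is not "understood"; it simply selects a mode of behaviour.
        if temperature < self.setpoint - self.band:
            self.heater_on = True
        elif temperature > self.setpoint + self.band:
            self.heater_on = False
        # Inside the band, the previous state is kept unchanged.
        return self.heater_on


t = Thermostat(setpoint=20.0)
states = [t.receive(reading) for reading in [18.0, 19.8, 20.7, 20.2, 19.0]]
print(states)
```

With a setpoint of 20 degrees these readings give the heater states `[True, True, False, False, True]`: the 19.8 and 20.2 readings fall inside the band, so the heater simply keeps its previous state.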

In the information service professions, on the other hand, the first-mentioned meaning is narrowed. When knowledge is humanly assimilated from the physical or natural or even social environment, the process is called cognition. Only when it is transferred, directly or indirectly, from one person to another is it thought of as an “information” process. 

1.3 Information, knowledge, data

Information is thus defined as a kind of knowledge. How, then, are the two to be distinguished? Knowledge is what people know, or think they know. It resides initially in an individual mind, perhaps as a set of models of different aspects of the environment, or as “action plans” to deal with those aspects, so that we may talk of a (very complex) personal knowledge structure. Some of it (and in particular all that can be explicitly communicated) exists in linguistic form. Other personal knowledge may be in tacit form – we know something, or how to do something, even though we cannot verbalise and communicate it.

When people interact with the environment they do so through the senses of sight, touch and so on. The sense organs transmit signals to the brain, which then behaves (or learns to behave through trial and error) in particular ways corresponding to the signals received. These responses may be physical actions – a burnt hand is withdrawn from a source of heat – but they can also be mental: some element of the personal knowledge structure may be modified, extended or reorganised as a result of the signals received. Information is thus that interpretation of sense experience by the brain which causes a modification of the mind’s knowledge. It is the new or modified knowledge that has been added to or, better, assimilated into the personal knowledge structure.

So when we say that “we have received information, have been informed” we mean in the first instance that sense experience has caused a modification to our knowledge structure, a modification that we can express linguistically. Another person’s linguistically expressible knowledge may be spoken or recorded, thus becoming public knowledge; so in the second instance when we say we have been informed we may be referring to information transmitted in these ways. In that case, we mean that receipt of a linguistic expression of another person’s knowledge (what she says or writes about what she thinks she knows) has caused a modification to our own linguistically encoded knowledge structure.

It is to be noted that none of these activities gives any guarantee about the truthfulness of the information received. The brain may misinterpret sense experience; linguistic information may be invalid, or may be incorrectly assimilated. The validity of knowledge can only be established by repeatedly carrying out action plans based on that knowledge, testing it to assess how true and reliable it is, and modifying it in the light of the experience arising from such repeated experiment.

Another term that occurs in the discussion of information is “data”. Data is often in linguistic (or at any rate, alphanumeric) form, such as tabulations; or it may exist as graphs, diagrams, drawings, maps, photographs, cinefilm, models, computer programs; or as instrumental recordings such as seismograms or stellar photographic plates. A data set typically arises from the specialised processing of observations of one restricted aspect of the environment – e.g. measurements of the melting points of chemical substances, or responses to a social survey of food habits. Very rarely does a data set as such become part of the mental furniture of an individual. But interpretations of, or generalisations from, a data set may be assimilated as information.

1.4 The need for information

Information is needed in order to carry out any human activity. Some of this will be obtained via cognition – by direct interaction with those elements of the world involved in the activity, the interrogation of nature or society. But in most cases this must be supplemented by information obtained from other people, either directly or through records.

Think of the range of information needed in industrial society. “The citizen in his or her daily life from time to time needs to know about the availability, quality, and cost of many things, such as consumer goods and services, health and welfare services, education and training facilities. For the daily running of a household he (or she) may need practical information on child care, cooking, gardening, house maintenance, and many other crafts. He will want all kinds of general information to satisfy intellectual curiosity, and to follow up any personal entertainment, sporting or cultural interest. He will keep up with current affairs, social and political events. He will want to learn about possible occupations and their prospects, about currently available jobs, and about local, national or even international associations and their current activities. He may seek legal, financial and other advice. In an occupational capacity each citizen needs technical information about work procedures – whether the work is manual, clerical, technological, supervisory, managerial, educational, scholarly or whatever; administrative information about the work environment, its rules and regulations; personal information about career and working conditions. Those working in product organisations have a wide range of information needs concerning the market, competing products, raw materials and equipment, new technical and administrative methods, legal and financial regulations, sources of finance, of labour, and of services such as power, water and transport” (Vickery and Vickery, 2004).

Product organisations produce consumer goods, tools, machines, utilities. But we can look at production activity far more widely. The service professions produce altered people: healthier, more educated, better groomed, better advised, better protected and so on. Breeders and trainers produce altered animals and plants. Journalists produce accounts of events occurring in the world. The arts produce paintings, sculpture, music, stories, poems, architecture. Some people produce what we may call spectacles: theatre, dance, opera, concerts, cinefilms, sporting events, mass meetings and the like. Scholars produce theoretical constructs and mental tools such as methods of work, mathematics, logic, knowledge schemas. All these activities, and many more, from time to time need information.

1.5 Contemporary characteristics of information provision

For most men and women, throughout most of mankind’s history, information has passed from one person to another only by word of mouth. Even after the invention of writing, and even after the invention of printing, literacy was for long restricted to the few, and word of mouth (perhaps supplemented by pictures) remained the only information channel for the majority. During the last century, the telephone has provided wider person-to-person communication, and the introduction of radio greatly extended the scope of the spoken word in illiterate parts of the world. Television has similarly extended the scope of both speech and pictures. Illiteracy has still not been overcome, but has considerably decreased in many parts of the world.

When few were literate, those few were members of or associated with the dominant groups in society, and public knowledge reflected the interests and opinions of the dominant groups. In the European Middle Ages, for example, there was very little written by or about the many crafts that contributed to the construction of the great technical achievement of the period – its cathedrals. Churchmen and feudal lords had other things to think and write about. But each new step in knowledge distribution – such as printing, the press, radio, television, the Internet – coupled with the spread of literacy, has widened the range of subjects publicly discussed and of social groups able to contribute to that discussion.

The volume of information now publicly available is immense, distributed through the periodical press, books and pamphlets, reports of all kinds, mailshots, radio, television, digital records, the Internet. On every conceivable subject some information has been put into the public domain. Who produces it, and why? Published, broadcast and Web-posted information is emitted for a variety of reasons, some of which are considered below.

The first motive is frankly commercial – to make money by selling something. Every commercial firm publishes advertisements, gets its products reviewed in trade and popular magazines, creates a website, sends out mailshots and email, provides data-sheets and user manuals, and so on. Even if money is not the main object, much publishing is for self-advertisement: this is one reason why academics write papers, and learned institutions of all kinds publish annual reports. Many people as well as academics create websites to publicise their achievements, their expertise, their hobby-horses. Further along this scale are those agencies and individuals who publish to propagate particular political, religious, and other social views and beliefs.

But academic institutions, professional bodies, government agencies, some commercial publishers and even some business firms often also have a genuine educational motive in publishing information materials in the subjects they cover. The advancement of knowledge within any particular profession usually requires the continuous publication of research reports and state-of-the-art reviews, as well as indexes and guides to its cumulative literature. Government and public administration publish legal and regulatory information to tell the public of rules that need to be followed. Journalists, both popular and technical, publish news and opinion, not only to earn a living, but also in the belief that their various readerships have a desire and even a genuine need to know what is happening in their fields of interest. The same motivation can be at work among amateur enthusiasts for films, cookery, hobbies, popular music, sports, etc.

1.6 Information and work

Work activity involves a work object, on whose parts or properties actions are performed, usually using tools, thus giving rise to a product, which has a particular use. (Later, alongside this material production, there develops mental production, with mental work objects, tools, actions and products.) To communicate about this production, man had to create a vocabulary – names, nomenclature, terminology – for all the elements of work activity. This development has been going on for many tens of thousands of years. By the time of the first recorded texts, five thousand years ago, the process was far advanced. Discussions of Mesopotamian technical texts repeatedly refer to word lists, to ‘a rich vocabulary relating to beer production’, to detailed descriptions of tanning, to ‘a wealth of technical terms’ related to weaving, and to texts describing the techniques of glass-making. Division of labour, giving rise to specialised workers and consequently specialised vocabulary, was already very evident in Mesopotamia: we know for instance of bakers, brewers, butchers, carpenters, cartwrights, fishermen, fowlers, joiners, potters, shipwrights, smiths, stonemasons, tanners, weavers – to say nothing of astronomers, musicians, physicians, priests and scribes.

From early times – in the hunt or the field – work has been cooperative, there have been work groups. Communication takes place within a group, to facilitate cooperative working, and in the training of new workers (at first, no doubt, by apprenticeship). But since division of labour leads to the exchange of products, communication also takes place between those trading their goods, i.e. between work groups. Within a group, there must be agreement on the exact meaning of the terms used. But exchange leads to a wider need for conventions shared by different work groups – not only with regard to names, but also to measures, of length, distance, area, volume, weight, time. Such units as the cubit (length), the hon (volume), the shekel (weight) and the month were developed about the same time as writing, if not before. Money – which can be exchanged for any other commodity – is another form of standard measure that must be accepted by all work groups who exchange products.

There are a number of instances of large work groups in pre-capitalist times – e.g. those who laboured to build pyramids, or slaved in mines, or manned large ships, or served in armies – and in such groups problems of work organisation and consequently of communication were undoubtedly present. But on the whole in Europe, until at least the seventeenth century, small-scale craft industry was the rule. Then factory production began to come in. By subdividing craft activities into the work of separate labourers, and by combining into one enterprise a number of different crafts that contributed to the construction of a complex product, factories created assemblies of very diverse sets of work activities. This necessitated the development, over and above those carrying out the basic activities, of employees whose tasks were to design, plan and superintend those activities. The introduction of machinery – ultimately, automation – gave greater importance to those involved in its design, planning and maintenance. Some idea of the complexity of a modern factory is given by Figure 2, where the arrowed lines represent flows of materials, information or money. Communication is involved in all these flows.

Figure 2: Flows in a factory

1.7 Social developments and information demand

So we see that most human activity, except the purely instinctive, is based on knowledge. We gain knowledge through experience – by trial and error action, accompanied by thinking about the results of our actions. Or we learn from other, more experienced people – we receive information. As the amount of knowledge has grown in the world – more and more of it recorded in some way – the ratio of what we know from our own experience to the total available knowledge becomes ever smaller. We become more and more dependent on existing information resources.

Figure 3: Social developments and information demand

Figure 3 attempts to give a sketchy picture of social developments leading to increased information demand. The coming together of people in ever larger groups creates a need for administrative information. This is not only a matter of administrators requiring information about the community, but also of the people in the community needing to know the laws, regulations, policies, and decisions of the administrators. These information needs have steadily increased, on the one hand as a result of the ever-greater involvement of governments in the life of the community (taxation, social welfare, planning, etc.), and on the other hand because of the growth of democratic participation in administrative decisions.

The growth of commerce – which is itself stimulated by technological innovation and improved transport – immediately creates information needs: traders have to identify potential markets and sources of supply, to be aware of new products and new consumer requirements, and to learn about the activity of competitors and the regulatory constraints that administrators may impose. With the diversification of trades and occupations, each specialist group develops its own information needs. In turn, the citizen as job-hunter needs to know about the services and opportunities that each trade offers.

Running parallel to this there has been the expansion and extension of education. This creates information needs among teachers and their administrators, but even more, of course, it lays a foundation for later information demand among those who have been educated.

The expansion of information need – and hence of information demand – leads to ever more effort being put into the activity of information supply – compilation of all kinds of information resource, publication, access to stores of information such as libraries, the book trade, active dissemination of information, until we reach its most recent vehicle – the Internet.

2. What is information science?

What is the goal of information science? There is a social activity that we may call information practice, aiming to help people to become better informed. Information science seeks principles and methods that will improve information practice.

Let us think of an analogy. There is a social activity known as medical practice, aiming to help people to become more healthy. Medical science seeks principles and methods that will improve medical practice. The practice, broadly speaking, consists of two activities: diagnosis and treatment. Diagnosis is identifying what is the most probable cause of a particular state of ill health. Treatment is deciding what method of acting is most likely to counteract the cause. Medical science seeks to understand what are the potential causes of ill health; to develop methods of correct diagnosis; to understand and expand the range of possible treatments; and to develop methods of deciding what treatment is most likely to be effective in a particular case.

Information practice, again broadly speaking, consists of two activities that we may call diagnosis and provision. Diagnosis is identifying what is the most probable information need in a particular state of information want. Provision is deciding what method of acting is most likely to meet that need. Information science seeks to understand the potential range of situations giving rise to information want and need; to develop methods of identifying the actual information needed; to understand and expand the range of possible ways of satisfying information need; and to develop methods of deciding what way is most likely to be effective in a particular case.

Information need arises in the course of some human activity, practical or mental. The activity reaches a point where it cannot be effectively continued without the filling of some information gap in the personal knowledge of the individual (or social group) carrying out the activity. It appears as a felt “information want”, and is expressed in words as an “information demand”. Such a demand may be a wholly adequate expression of the information need – as in “what is the time of the next train from Oxford to London?”. But it may be a very imperfect expression of the need, as when an enquirer asks for information about “bridge”: what meaning of bridge? what aspect of bridge is problematic? This is where diagnosis comes in: what further data about the enquiry or enquirer is most likely to lead to a correct expression of the underlying information need?

Information science needs to explore three areas to improve diagnosis: first, the semantics of language (so as to recognise the potential ambiguity of “bridge”); second, the characteristics and variations in human information-seeking behaviour (so as to react appropriately to the expression of an information want); and third, the characteristics of human activities that may give clues to actual information need (knowing that the enquirer is an engineer and that some engineers build bridges may suggest the meaning of “bridge” that the enquirer intends; though of course this particular enquirer may turn out to be a chemical engineer who is an avid card player).
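The third kind of clue can be illustrated with a toy diagnosis step. This is a hypothetical sketch, not a method described in the text: each sense of an ambiguous term is given a set of cue words (invented here for illustration), and the sense whose cues overlap most with what is known about the enquirer is proposed first. As the chemical-engineer example shows, the result is only a hypothesis to be checked with the enquirer.

```python
# Hypothetical sense inventory for "bridge"; the cue words are illustrative only.
SENSES = {
    "bridge (structure)": {"engineer", "civil", "river", "steel", "construction"},
    "bridge (card game)": {"cards", "bidding", "club", "tournament", "player"},
    "bridge (dentistry)": {"dentist", "tooth", "crown", "patient"},
}


def rank_senses(enquirer_context: set[str]) -> list[tuple[str, int]]:
    """Rank senses by how many cue words they share with the enquirer's context."""
    scores = [(sense, len(cues & enquirer_context)) for sense, cues in SENSES.items()]
    # Highest overlap first; equal scores keep dictionary order (the sort is stable).
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


print(rank_senses({"engineer", "civil"}))
print(rank_senses({"engineer", "chemical", "cards", "club"}))
```

For an enquirer known only to be a civil engineer, the structural sense is proposed first; for the chemical engineer who also mentions cards and a club, the card game ranks first. In practice the cue sets would have to come from somewhere, for example a thesaurus or the enquirer's own answers to follow-up questions.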

Information provision that is document-based presents a wide variety of aspects for information science to explore: for example, the overall structure and complexity of the universe of documents, in general and in each particular field of knowledge; how documents can be arranged for direct inspection and search; what features of documents can be used in representing them; how the representations may be organised for search; what ways there are of selecting and ranking retrieved representations; what alternatives there are in presenting the results of search to the enquirer; and how choices in any of these activities may be related to satisfaction of the information need.
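Three of these aspects (representing documents by features, organising the representations for search, and selecting and ranking what is retrieved) can be seen together in a minimal sketch. It assumes that a document is represented by its lower-cased words, that the representations are organised as an inverted index, and that retrieved documents are ranked by a simple tf-idf score. These are common choices in information retrieval, picked here for illustration, not the only possible ones; the document identifiers and texts are invented.

```python
import math
import re
from collections import Counter, defaultdict


def terms(text: str) -> list[str]:
    # Representation: the document reduced to its lower-cased words.
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    # Organisation for search: term -> {document id: term frequency}.
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(terms(text)).items():
            index[term][doc_id] = freq
    return index


def search(query: str, index: dict[str, dict[str, int]], n_docs: int) -> list[tuple[str, float]]:
    # Selection and ranking: sum tf * idf over the query terms each document contains.
    scores: dict[str, float] = defaultdict(float)
    for term in dict.fromkeys(terms(query)):  # de-duplicate, keeping query order
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": "Suspension bridge design for river crossings",
    "d2": "Contract bridge bidding for card players",
    "d3": "Steel design codes for bridge and building construction",
}
index = build_index(docs)
print(search("bridge design", index, len(docs)))
```

Because “bridge” occurs in every document, its idf is zero and it contributes nothing to the ranking: only “design” separates d1 and d3 from the card-game document d2, which is still retrieved but ranked last with a score of zero.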

Individual systems providing information are components of a wider “information system” that encompasses all activities from the first generation of information to its ultimate use. This may be diagrammed, for example, as in Figure 4. Each information function in the diagram, and its relationship with the others, presents practical problems with which information science may be able to help.

Figure 4: Information functions

3. Early developments

3.1 The development of information practice

The practice of information provision, in the form of someone knowledgeable passing on that knowledge to another, must have arisen as soon as any apprentice learning appeared – i.e. as soon as men and women developed craft skills that needed to be transmitted to others – as early, therefore, as the age of flint knapping. Aid in the provision of information from documents must have arisen as soon as collections of documents were created and preserved for later consultation, in the earliest administrative centres of Mesopotamia, Egypt, China and elsewhere.

Modern information provision rests on two foundations: technologies such as printing and electronic publication, and the intellectual organisation of documentary materials. Both technology and intellectual organisation have a long history. In this section I will sketch this out to the end of the nineteenth century, after which a change took place that has led to a great development of both information practice and information science.

3.1.1 Technological developments

Without technology, information transfer between people is confined to speech, gesture and imitation. We might indeed say that primitive language is the first tool evolved to enable information transfer. It was followed – after many millennia – by writing. This involved (1) choosing a medium to carry marks, (2) an implement to make them, and (3) a durable “ink” or other marking agent; (4) a means of getting the text from writer to reader; (5) both of them remembering the shapes of the marks; and (6) their agreeing on the meaning of each. Since writing was at first a skilled craft, a new social role developed, that of the scribe. Couriers and – much later – letter post delivered documents from one person to another. This required not only organisation and means of transport, but also social agreement on postal “addresses” – ways of naming the locations where particular people might be found.

To disseminate documents to more than one reader, they had to be copied, in institutions that were later called “scriptoria”. Subdivision of labour led to the appearance of specialists in the ornamentation of manuscripts – rubricatores and illuminatores – and binders. To store copies of documents, libraries were developed; and to aid search of their contents, catalogues. In both classical and late medieval times, the copying of texts was commercialised (scribes became scriveners), giving rise to the roles of publisher and bookseller – at first combined in that of the stationarius.

By this stage, then, the literate few could get access to documentary information by travelling to a library likely to possess it, or by buying a copy of a manuscript, or perhaps (as among university students) by borrowing or hiring one, and by receiving letters by post. For the illiterate many, the only technical change in means of communication was in the reproduction of illustrations, as in woodcut block prints.

Printing, of course, was the big technical innovation, requiring the combined use of paper, metal type, suitable inks, and the screw press. From the first, this was a commercial operation. The new social role of printer, and the further development of publishing and bookselling, began to make accurate copies of texts much more widely available. But structurally the situation was the same as before, even though the volume of resources was so much greater – for information, you could now locate, buy, borrow or consult a book instead of a manuscript.

Texts were “separates” – individual works that summed up what an author currently had to say about a subject. Even in the days of manuscripts, some were of considerable size, for example encyclopedias such as those of the Roman Pliny or the medieval Bartholomew. The introduction of printed periodical publications – newsletters, newspapers, journals – brought a new element into the situation. Short items of information that had before only been distributed by letter – sometimes in multiple copies by “intelligencers” – now became widely available. The roles of journal or newspaper publisher, of editor, and of journalist became established.

Drawings and paintings had long been important in communicating information, and they were reproduced in printed texts by woodcut, engraving, etching, lithography and so on. Photography brought the ability to record a visual scene without the aid of a graphic artist, and half-tone (later, colour) processes enabled photographs to be reproduced in printed material.

As literacy spread, so did the need for do-it-yourself mechanisms to produce documents with a “printed” look. This led first to the typewriter, and a vast new social role, that of the typist; next to various forms of duplicator; and lastly to methods of photocopying typed or printed text. As a result, printed publications were supplemented by a stream of “near-print” reports, bulletins and so on. The proliferation of typed letters and memoranda brought about the formation of correspondence files, a new form of document storage (though institutions and individuals had of course long since kept letters for subsequent reference). To make things easier for the author of correspondence, instead of writing he could speak his words, to be recorded by the typist in shorthand (later, dictaphones and tape recorders would be used). This was roughly the situation at the end of the nineteenth century.

3.1.2 Intellectual organisation of documents and their representations

One of the key concepts in information science is that of “subject” – a statement in words (or other symbols) of what a document (some record of knowledge) or collection of documents is ‘about’. Clay tablets have been found in Mesopotamia stored in jars bearing tags that indicate their contents, the “subject” of the stored tablets. Subject arrangement, too, developed early, in the double form of putting documents on the same subject into a group and arranging these subject groups into some coherent sequence. The arrangement of early document collections was thus systematic, as it has usually remained. The systems used have reflected the contemporary view of the structure of knowledge, changing as that view changed. As collections grew larger, the arrangement of subjects became hierarchical – a subject classification.

If we regard the individual articles or topics in an encyclopedia as separate documents, the encyclopedia as a whole can be regarded as a document collection, and Roman and medieval examples were usually in some sort of subject or even classified arrangement.

A catalogue is a list of documents in a particular collection. Each document is represented by a description, which at first was just the document title (perhaps plus author). For documents without an explicit title, the opening words were used as a pseudo-title, as is still the case for untitled poems. Catalogues are certainly as old as the library of Alexandria. As collections grew in size, it would become necessary to indicate the position of a catalogued document in the collection – its “call number”. In medieval times, the shelves were marked – often the first part of the mark represented the broad subject to which the shelf was devoted, and the second part the number of the document within that subject group. The arrangement of the catalogue entries would follow that of the shelved collection. Various medieval union catalogues are known, listing the documents in several collections.

In 1532 Conrad Pellican produced a new plan for the catalogue, bringing together the ideas of systematic arrangement, call numbers and alphabetical indexing:

A bibliography is a list of documents without regard to their location. Authors might list the documents they had consulted, but among the early separate bibliographies were lists of books issuing from an individual printer, or later, from a single author. In the sixteenth century, the biologist Conrad Gesner published a “universal” bibliography, influenced by Pellican’s ideas. It was arranged alphabetically by author. The book descriptions included, where possible: title, author, year and place of publication, printer, number of printed sheets. Contents were variously indicated: abstracts of short items, chapter headings from long ones. He then provided a subject guide to the bibliography, a classified arrangement that extended down to 30,000 topics, within which authors were named (to identify their works, the earlier publication had to be consulted). Gesner did not at first get around to the alphabetical index envisaged by Pellican, and justified this in his preface by saying “anyone who wanted information on subject X would surely know where to find it in the systematic arrangement”. But eventually he provided an alphabetical list of 4000 keywords with page references to the classified list.

Gesner indicated the subject of a book not only by title and chapter headings, but by an abstract. In commentaries by one author on the work of another, summaries must have been quite common. But the general use of summaries, précis or abstracts as a form of document or subject description dates from the eighteenth century, when periodicals containing abstracts were first published, usually accompanied by an alphabetical subject index.

It will be seen that up to the time of Gesner there were three methods of displaying the subject of a document in a catalogue or bibliography: its title, a keyword drawn from the title, and the position assigned to it in a classification of knowledge. Later workers used alphabetical terms not directly drawn from titles to indicate subject, and in about 1600 cross-references between related terms (see and see also) were introduced by some. In the early nineteenth century Martin Schrettinger wrote a book on “library science” (Bibliothekswissenschaft) in which he suggested defining a subject by asking four questions:

More precise rules for the alphabetical subject catalogue were developed by Charles Cutter in 1876. Headings were no longer to be drawn from titles, but prepared in a standardised form.

The systematic arrangement of books on the shelf was considerably advanced by the Decimal Classification introduced by Melvil Dewey, also in 1876. An integral part of his scheme was its alphabetical index to the classes. Further, he used the class number as a call number, which was thus no longer tied to a fixed shelf position – it was a relative location.

Somewhat earlier in the century, detailed rules had been published for document description for cataloguing purposes – by Antonio Panizzi for the British Museum Library and by Karl Dziatzko for the Prussian Ministry of Culture. Agreement had been reached on the main elements that should be used to describe a document: its title, people associated with it (author, editor, illustrator, etc), corporate source, the date and place of publication, publisher, size, subject, illustrations, and so on.

By the end of the nineteenth century, then, basic principles had been established for document description, and for systematic document arrangement and cataloguing, for alphabetical subject indexing, and for the linking of these two approaches. 

3.2 Emergence of information science

Out of the intellectual, technical and social matrix described above, information science began to develop. Towards the end of the nineteenth century, two things started to happen. First, there emerged a growing number of social groups with special and urgent needs for information: in particular, commercial, industrial and governmental organisations. Second, the first steps were taken in mechanising some aspects of information manipulation. When the two were brought together, new forms of information practice began to appear, and with this there went a new reflective approach to the problems of information practice, that led to the first stirrings of what we now call information science.

The social groups with new information needs are particularly exemplified by industrial organisations. Industrial libraries began to be formed from the 1870s on, particularly in chemical firms, where they were associated with research laboratories, and these continued to grow in number during the early twentieth century. In 1887 the first national research institute was set up in Germany (the Physikalisch-Technische Reichsanstalt), followed in 1911 by the Kaiser Wilhelm Gesellschaft with associated research establishments. In 1916 the UK formed the Department of Scientific and Industrial Research, with associated industrial research associations. In the USA a National Research Council was set up in 1916.

Within such organisations there began to develop a more active kind of information service, which in turn led to the formation of the US Special Libraries Association in 1909 and the UK Association of Special Libraries and Information Bureaux in 1924. Paul Otlet produced a voluminous Traité de documentation in 1934.  An active interest in new forms of information provision developed.

Up to this point, the act of information search had always been carried out humanly (inappropriately called “manually”), by scanning catalogues or indexes, or browsing through document collections. The first clumsy steps to mechanise some aspects of the search came in the form of manipulating various kinds of cards with holes punched in them to represent either topics or documents. This required the establishment of codes associating the location of each hole with a meaning. Machine manipulation of punched cards had been invented by Joseph Jacquard in 1805 as a means of instructing a loom in the weaving of patterns in textiles. In 1890 Herman Hollerith used punched cards to analyse the results of the US census. Sorting devices permitted a sought-for combination of codes to be electromechanically selected. In due course, enterprising information practitioners started to use them to search coded catalogues and bibliographies.
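In modern terms, the selection step can be sketched with integers standing in for cards, assuming one card per document and one hole position per coded concept: a card is picked when it carries every hole the query asks for. The code assignments, card labels and the functions `punch` and `select_cards` are hypothetical, for illustration only; the actual sorters worked electromechanically, card by card.

```python
# Hypothetical code positions: each concept is assigned one hole position.
CODES = {"cotton": 0, "dyeing": 1, "bleaching": 2, "India": 3}


def punch(*concepts: str) -> int:
    """Represent a card as an integer whose set bits are its punched holes."""
    card = 0
    for concept in concepts:
        card |= 1 << CODES[concept]
    return card


def select_cards(cards: dict[str, int], *wanted: str) -> list[str]:
    """Return the labels of cards punched with every wanted code, in deck order."""
    query = punch(*wanted)
    # A card drops out unless all of the query's holes are present on it.
    return [label for label, card in cards.items() if (card & query) == query]


deck = {
    "doc-1": punch("cotton", "dyeing"),
    "doc-2": punch("cotton", "bleaching", "India"),
    "doc-3": punch("dyeing", "India"),
}
print(select_cards(deck, "cotton", "India"))  # ['doc-2']
```

Changing the query is the equivalent of resetting the sorter; the cards themselves never change.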

Apart from mechanisation experiments, the period up to mid-century saw the beginnings of areas of study that have become important in information science.

3.2.1 Analysis of the “subject”

Semantic analysis of the nature of subjects became a matter of concern. Earlier, subjects had been expressed in natural language, as single terms (“economics”), compound terms (“solar system”, “death penalty”) or phrases (“fertilisation of flowers”, “ancients and moderns”). The only semantic relations explicitly taken into account were synonyms, homonyms (multi-meaning terms), the generic or class relation, and the non-specific see also relation. Now, explorations began of the semantic structure of a subject statement. For example, the industrial librarian Julius Kaiser in 1911 published an indexing system used in his library, in which he regarded each subject entry as a composite of a “concrete” term, a “process” term, and a “location” term. “Concretes” were material or mental entities (silk, hardware, land, river, theory, equation); “processes” were actions and behaviours; “locations” were geographic or topographic places. Examples of composite subjects were wool-scouring, Brazil-education, nitrate-Chile-trade. The terms used as concretes, processes or locations could themselves be compound, e.g. cotton wool, dry cleaning, United States.
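Kaiser's composite subject can be sketched as a small record with three optional parts. The class name `KaiserSubject` and the joining rule are assumptions made for illustration; the citation order (concrete, then location, then process) is inferred from the examples in the text, not taken from Kaiser's own rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KaiserSubject:
    """A composite subject built from Kaiser's three kinds of term."""
    concrete: Optional[str] = None  # material or mental entity, e.g. "wool"
    location: Optional[str] = None  # geographic or topographic place, e.g. "Chile"
    process: Optional[str] = None   # action or behaviour, e.g. "scouring"

    def heading(self) -> str:
        # Assumed citation order, inferred from the text's examples:
        # concrete, then location, then process; absent parts are skipped.
        parts = [self.concrete, self.location, self.process]
        return "-".join(part for part in parts if part)


print(KaiserSubject("wool", process="scouring").heading())              # wool-scouring
print(KaiserSubject(location="Brazil", process="education").heading())  # Brazil-education
print(KaiserSubject("nitrate", "Chile", "trade").heading())             # nitrate-Chile-trade
```

Compound terms such as “cotton wool” or “United States” simply go in as single strings.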

The Universal Decimal Classification (UDC), published first in 1905 by Paul Otlet and Henri LaFontaine, used the Dewey Decimal Classification for its basic schedules, but introduced the possibility of building up more complex subjects by combining terms from the schedules. For example, “virus diseases of indoor plants” could be represented as 635.965:632.38, where the colon links two class terms (indoor plants:virus diseases).   Many other combinatory devices were introduced.

The Colon Classification of Shiyali Ranganathan, first published in 1933, was designed from its inception to represent subjects as composite. The terms in each main area of knowledge (or “basic class”) were sorted into facets, broad categories that Ranganathan generalised as Personality, Matter, Energy, Space, Time. The terms in each facet were organised hierarchically, and subjects were expressed by combining terms from two or more facets. For example, in the basic class Agriculture, crops would be the “personality” facet, agricultural operations an “energy” facet, geographical places a “space” facet. The subject “manuring of cereal crops in India” could be represented as J38:2.2, where J = agriculture, 38 is the number for cereals in the crops facet, :2 is the number for manuring in the energy facet, and .2 is drawn from a geographical schedule, for India. The concept and techniques of facet analysis were later much developed and adopted, for example being introduced into later editions of the UDC.
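The synthesis of the example number can be sketched as follows, assuming miniature schedules that contain only the entries quoted in the text and using the connecting symbols shown there (':' before the energy facet, '.' before the space facet). The dictionaries and the function `colon_number` are hypothetical; the real schedules and facet formulas are far richer.

```python
from typing import Optional

# Miniature schedules holding only the entries quoted in the text.
BASIC_CLASSES = {"agriculture": "J"}
CROPS = {"cereals": "38"}        # personality facet of Agriculture
OPERATIONS = {"manuring": "2"}   # energy facet of Agriculture
PLACES = {"India": "2"}          # geographical schedule (space facet), as given in the text


def colon_number(basic: str, personality: str,
                 energy: Optional[str] = None,
                 space: Optional[str] = None) -> str:
    """Build a class number: basic class + personality, then ':' energy, '.' space."""
    number = BASIC_CLASSES[basic] + CROPS[personality]
    if energy is not None:
        number += ":" + OPERATIONS[energy]  # ':' introduces the energy facet
    if space is not None:
        number += "." + PLACES[space]       # '.' introduces the space facet
    return number


print(colon_number("agriculture", "cereals", "manuring", "India"))  # J38:2.2
```

Because each facet is optional, the same schedules also yield the broader subjects “cereal crops” (J38) or “cereal crops in India” (J38.2) without any new entries.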

Although it was not fully recognised at the time, the linking of two subject terms (concrete-process, plant-disease, cereal-manuring) placed them in a semantic relation with each other, different from the well-known and well-used class relation. The study of semantic relations became important to information science.

The mechanised techniques mentioned earlier also treated the subject as composite: the meanings assigned to punched holes were usually single concepts, and subjects were expressed in search by combinations of holes. This became known as the “post-coordinate” approach – concepts were not combined at the indexing stage (pre-coordinated), but only at the search stage.
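Post-coordination amounts to set intersection, and can be sketched that way, assuming an index in which each single concept lists the numbers of the documents indexed by it (as a term card with holes at document-number positions does). The terms, document numbers and the function `post_coordinate` are invented for illustration.

```python
# Invented index: each concept is indexed separately, with no pre-coordination.
TERM_INDEX: dict[str, set[int]] = {
    "cereals": {3, 7, 12, 19},
    "manuring": {7, 8, 19, 25},
    "India": {2, 7, 19, 31},
    "rice": {12, 19},
}


def post_coordinate(*terms: str) -> set[int]:
    """Combine concepts at search time: documents indexed under every term."""
    postings = [TERM_INDEX.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


print(sorted(post_coordinate("cereals", "manuring", "India")))  # [7, 19]
```

Because intersection is indifferent to order, any combination of concepts can be asked for at search time, which pre-coordinated headings cannot offer.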

3.2.2 Citations and document use

It had long been the practice in scholarly publications for the author to give references to previous writings that he/she had used to provide facts, methods, ideas, arguments. It occurred to some librarians that a survey of citations might be a guide to the most used, and hence possibly the most useful, literature in a field of study, and that this in turn would be a guide to what literature a library with limited funds ought to acquire. This thought led to a number of citation studies during the second quarter of the twentieth century.

The technique used was to select a source set of documents believed to be representative of the range of material of interest to a particular community of literature users; to scan and extract from this set all its citations to other documents; to organise the cited set (usually of journals and books) by title and possibly by date; and to rank the cited titles according to frequency of citation. It was soon realised that relative frequency of citation of documents is not an infallible guide to frequency of use, still less to relative value as perceived by the scholars in a field. Nevertheless, these studies started an interest in the use of citations for exploring the characteristics of a literature, which has proved very fruitful. The “citation indexes” published by Eugene Garfield, showing who has cited what, have provided very useful material for such studies.
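The counting and ranking steps of such a study can be sketched in a few lines, assuming the citations have already been extracted from the source set as lists of cited titles. The source set and titles below are invented, the function name `rank_cited_titles` is hypothetical, and breaking ties alphabetically is an arbitrary choice for a stable listing.

```python
from collections import Counter

# Invented source set: each source document with the titles it cites.
SOURCE_SET = {
    "paper 1": ["Journal X", "Journal Y", "Journal X"],
    "paper 2": ["Journal Y", "Journal Z"],
    "paper 3": ["Journal X", "Journal Y", "Book W"],
}


def rank_cited_titles(sources: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Pool all citations from the source set and rank cited titles by frequency."""
    counts = Counter(title for cited in sources.values() for title in cited)
    # Highest citation count first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for rank, (title, n) in enumerate(rank_cited_titles(SOURCE_SET), start=1):
    print(rank, title, n)
```

As noted above, the resulting ranking measures citation, not use or value.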

3.2.3 Statistical regularities

The fact that the quantitative characteristics of things vary is well known – shoe sizes, book sizes, frequencies of citation, frequencies of loan of library books, whatever. We are used to the idea of an average – “most people use the library about once a week” – and of variation about the average – “but some people use it several times a day, and some people only about once a year”. We are less familiar with the idea that a more detailed description of the overall distribution of quantities can be a useful tool, particularly if we find that the pattern of the distribution can be represented by a mathematical formula. Statisticians have developed and analysed a number of such distributions, and some of them have been found to fit distributions of concern to information provision. This too was an interest that began to develop during the first half of the twentieth century.

The statistician Alfred Lotka in 1926 examined the decennial author index of Chemical Abstracts for 1907-1916. If he ranked the authors in the order of the number of papers each had published, he obtained a distribution that could be expressed by a simple formula. If Y is the number of authors who published X papers, then $X^2 Y = K$, a constant. Such a distribution appears as a straight line if we plot X against Y on logarithmic graph paper.

What is the value of knowing that a frequency distribution can be represented by an equation? Let us suppose that this equation holds true for all fields of publication. It is fairly easy to look at publication in a particular field, spot the most prolific author, and count the number of his publications over a period. Let this X be 10. Then $X^2 = 100$, $Y = 1$, and $K = 100$. Now put X = 1; if the equation holds, the number of authors with a single paper must be equal to K, i.e. Y = 100. We can similarly calculate the value of Y for any value of X, add up all the figures, and estimate the total number of authors and the total number of papers for that field during the period covered – all this from knowing one fact about the most prolific author. Of course, that “suppose” written earlier is crucial. We could not use the equation with confidence until we had analysed a considerable number of data sets to confirm it. And even then, we must be prepared to find exceptions, or perhaps a change in the equation over time as publication habits changed. For example, if most papers came to be written by several collaborators, might this affect the equation? So confirmed equations can be useful, but must be used cautiously.
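The estimate described above can be sketched as follows, assuming Lotka's equation holds exactly for every X from 1 up to the most prolific author's count, and that no author exceeds that count. The function name `lotka_estimate` is hypothetical, and the figures are those of the worked example, not real data.

```python
def lotka_estimate(max_papers: int, authors_at_max: int = 1) -> tuple[float, float]:
    """Estimate total authors and papers in a field from Lotka's X**2 * Y = K.

    Assumes the law holds exactly for every X from 1 up to max_papers.
    """
    k = max_papers ** 2 * authors_at_max  # K fixed by the most prolific author(s)
    levels = range(1, max_papers + 1)
    authors = sum(k / x ** 2 for x in levels)      # Y = K / X^2 authors with X papers
    papers = sum(x * k / x ** 2 for x in levels)   # those authors wrote X * Y papers
    return authors, papers


authors, papers = lotka_estimate(10)
print(round(authors), round(papers))  # 155 293
```

With X = 10 and Y = 1 for the most prolific author, the sketch gives about 155 authors and 293 papers; both figures are only as good as the assumption that the equation fits the field.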

The philologist G.K. Zipf in 1926 studied the distribution of words in text, and found that if Y is the number of words that appear X times in a corpus of text, then $X^2 Y$ is a constant – the same equation as Lotka had found. He then listed the words in decreasing frequency of occurrence, giving each word a “rank” in the list, from 1 onwards, and found that a word with rank Z would occur approximately 1/Z times as frequently as the top-ranking word – e.g., if the top word occurred 1000 times, the second would occur about 1000/2 = 500 times, the third about 1000/3 = 333 times, and so on. The relevance of this to information science is that this pattern has been found widely in other distributions, for example in the one about to be mentioned.
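Zipf's rank rule can be sketched as below, comparing each word's observed count with the count predicted for its rank. Splitting a text on whitespace and folding case is a crude assumption about what counts as a “word”, and the function names are hypothetical.

```python
from collections import Counter


def zipf_predictions(top_frequency: float, n_ranks: int) -> list[float]:
    """Frequencies predicted for ranks 1..n_ranks, using f(Z) = f(1) / Z."""
    return [top_frequency / rank for rank in range(1, n_ranks + 1)]


def rank_words(text: str) -> list[tuple[str, int]]:
    """Count the words of a text and list them in decreasing order of frequency."""
    return Counter(text.lower().split()).most_common()


def compare_with_zipf(text: str) -> list[tuple[int, str, int, float]]:
    """Pair each word's observed count with the count Zipf's rule predicts for its rank."""
    ranked = rank_words(text)
    if not ranked:
        return []
    predicted = zipf_predictions(ranked[0][1], len(ranked))
    return [(rank, word, count, predicted[rank - 1])
            for rank, (word, count) in enumerate(ranked, start=1)]


print([round(f) for f in zipf_predictions(1000, 3)])  # [1000, 500, 333]
```

On a text of any real size, the observed and predicted columns agree only roughly, and least well at the very top and bottom of the ranking.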

The librarian Samuel Bradford in the 1930s was examining comprehensive bibliographies on specific subjects prepared at the Science Museum Library in London. He looked at the frequency with which various journal titles appeared in the bibliography, and ranked them in the same way as Zipf had done with words in a text. After much consideration, he analysed the distribution in the following way. Start with a nucleus of the top-ranking periodicals (let there be $T_1$ of them), which between them account for R references in the bibliography. Then move down the list till we have accounted for another R references: we will find that it takes $T_2$ titles to produce them. Move down to take in another R references, this time needing $T_3$ titles, and so on down the list. Now we will find that the values $T_1$, $T_2$, $T_3$, etc. stand approximately in the ratio $1 : N : N^2 : \dots$, where the value of N will depend on our initial choice of nucleus and on the bibliography being studied. There was a little confusion in Bradford’s subsequent mathematical manipulation of this relationship, but this was eventually cleared up, and the “Bradford distribution” has become much studied in information science.
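Bradford's zoning procedure can be sketched as follows, assuming the bibliography has been reduced to one reference count per journal title and that three zones are wanted. The function `bradford_zones` is hypothetical, and the input is constructed to follow the law exactly with N = 5; with real bibliographies the zone boundaries fall only approximately at equal reference totals.

```python
def bradford_zones(counts: list[int], n_zones: int = 3) -> list[int]:
    """Split journals, ranked by references contributed, into zones of roughly
    equal reference totals; return the number of journals in each zone."""
    ranked = sorted(counts, reverse=True)
    target = sum(ranked) / n_zones  # R: references each zone should account for
    zones = [0] * n_zones
    cumulative = 0
    zone = 0
    for refs in ranked:
        zones[zone] += 1
        cumulative += refs
        # Close the current zone once it has covered its share of R references.
        if zone < n_zones - 1 and cumulative >= target * (zone + 1):
            zone += 1
    return zones


# Hypothetical bibliography built to follow the law exactly with N = 5.
journals = [50] * 2 + [10] * 10 + [2] * 50
zones = bradford_zones(journals)
print(zones)                                       # [2, 10, 50]
print([b / a for a, b in zip(zones, zones[1:])])   # [5.0, 5.0]
```

The ratios between successive zone sizes are the sketch's estimate of N.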

All these equations for statistical regularities have since been found to be closely related, and their mathematical elaboration arouses much interest among the theoretically inclined. Overall, this field of study has become known as bibliometrics (or latterly, informetrics).

3.2.4 Specialised subject focus

The appearance of special libraries has already been noted as an indicator of the changing face of information provision. This was accompanied by increasing specialisation in publishing – for example, in science and technology, more specialised journals and abstracts journals, new kinds of publication such as industrial standard specifications and technical reports. There was also the construction of special library classifications and indexes (some of them on novel lines, such as Kaiser’s), and the production within research institutions of local abstracts bulletins. New technical devices came into use, such as photocopying and microfilm.

Guides to the literature of special subjects began to be published early in the century, for example in chemistry, biology, mathematics and medicine. All this was not yet information science, but it was a prelude to what has become known as domain analysis: the study of the nature of information need in a special subject, and of the resources available for meeting it.

* * * * * * * * * * * * * * * *

By 1950, much had been achieved in information practice, and a start had been made in information science. Yet one leading US librarian in that year spoke of  a situation in which thousands of cataloguers, indexers, and abstracters in every library, documentation centre, abstracting and indexing service all over the world, contribute to a disorganised, unduly expensive, gapping and inefficient complex of services ... which hasn’t caught up with the nineteenth century let alone reached the twentieth. And in 1945 the much-quoted Vannevar Bush had affirmed: “The summation of human experience is being expanded at a prodigious rate, and the means we use for threading the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships”. It was time for a change, and the change was near at hand.

4. Themes in information science

4.1 Computers and telecommunications

In 1950, the telephone had been in operation for well over half a century. It enabled quick interpersonal interaction over a distance, and had become of major importance. The local nodes of the system were telephone exchanges, and telephone operators manned (or usually “womanned”) switchboards. Once again, a uniform system of “addresses” had to be established – telephone numbers. Use was eventually made even easier by the introduction of subscriber dialling. The answer-phone made it possible to record messages for later listening. Voice communication was supplemented by electrically delivered text messages, at first as telex (with keyboard input) and then as facsimile (by scanning an existing text, whether manuscript or printed).

The digital computer in principle went back to the experiments of Charles Babbage in the mid-nineteenth century, but with the coming of electronics it became feasible in practice during the 1940s. It added to mechanical  selection the facilities of (a) a stored program (a set of instructions listing the detailed steps to be taken to carry out the process) and (b) machine-readable storage of the data to be searched. The data store was at first in the form of punched cards, then of magnetic tape, and later of magnetic disks. Again, at first the data consisted of codes, as with the manually manipulated systems, but later came the use of textual records analogous to index or catalogue cards, and eventually the full text of documents became available for searching.

These developments created the new roles of programmer and information system designer, and involved much work on the construction of computer algorithms (standard ways of carrying out common processes), computer file structures, computer records (bibliographic and other formats) and indexes to them. The term “database” came into use, to mean an integrated collection of computer files, together with the programs (software) needed to manipulate them.

Truly successful use of the computer for information search came with the development of a number of new features: (1) disk storage, each section of which was equally accessible (so that search no longer involved scanning the whole store on tape from end to end); (2) multi-tasking (so that the computer could interweave a number of processes to be carried out in parallel); (3) online interaction (so that a particular program could be activated by a user directly from a keyboard and its outcome displayed for him on a screen); (4) the development of relatively cheap desktop microcomputers, which included keyboard and screen as well as computing capabilities; (5) remote access from a microcomputer to a distant information system via a “modem” linked to the telephone system (so that anyone with a microcomputer, modem and telephone could access such an information source);  (6) “packet switching”, whereby the use of telecommunication system resources was confined to the short times when bursts of electronic transmission were occurring (so increasing capacity and lowering costs); and eventually (7) the use of ever more resourceful “graphical user interfaces” to ease the interaction of the user with the computer.

Online information search involved the construction of “machine-readable” databases that might contain bibliographic descriptions of books, journal papers, patents and so on; or all kinds of tabular alpha-numerical data; or directories to people and institutions; or full texts (at first, particularly of short news items). Some existing publishers took on the role of “database producer”, but newcomers also entered the field. The further role of “online host” developed – agencies that acquired or leased databases from their producers, mounted them on a computer, and made them available for online search, and for downloading search results. Purely “in-house” information systems operated by academic institutions, firms or government agencies for their employees were subsidised, but online hosts were commercial operations.

In many institutional settings, alpha-numerical tabulations were indeed “the data” – the computer system provided the actual information sought. But much online search at this stage only accessed indexes to information – catalogues, bibliographies, directories and so on. For example, many libraries provided online public-access catalogues (OPACs).

Online search of databases involved (1) identifying which database to search and on which host it was located; (2) making telecommunication contact with the host; (3) using the search commands appropriate to the system contacted; and often (4) knowing the subject terminology and retrieval language used by the system. At first, the role of “search intermediary” developed, to carry out these tasks for the user. Then much effort was expended to make the tasks user-friendly – subject guides to databases (printed or online), communication programs to make connections automatically, natural language interfaces to avoid the use of special commands and terminology, and even online aid in the formulation and refinement of search queries. The introduction of graphical user interfaces on microcomputer screens, with windows and mouse control, also eased user tasks. With lap-top and hand-held microcomputers or even enhanced mobile telephones, one could carry a “desktop” anywhere.

Having brought online access to the desktop via the microcomputer, a further step was to bring the database and search software to the desk as well, through the use of high-capacity local storage in the form of CD-ROMs. This was not suitable for very large databases or for those that needed to be updated very frequently. The much higher capacity DVDs followed later. In academic institutions, much use was made of networked CD-ROM versions of the more important reference and bibliographic databases.

Meanwhile, there were developments at the input end of the author-to-reader chain. On the microcomputer, “word-processing” programs offered the possibility for an author to put a text (and, later, scanned illustrations, sounds, animations, video clips) directly into machine-readable form, replacing typing (perhaps repeated typing of various drafts). The digital text could either be printed to produce a publication, or be used as a file in a searchable database.

Telephone links to computers were not used only for information search – they were much used to access the computational powers of distant computers. It was in pursuit of this aim that the next development occurred – the construction of telecommunication networks. This involved establishing a system whereby any online computer could communicate with any other, using standard codes and procedures (protocols) and each with a unique address and domain name in the system.

The final flowering of this development was the Internet, a network of interconnected networks, all using the same protocols, so that messages could flow freely within it from one computer to another. As well as linking to the processing unit of a computer, it was natural to use the network to send messages to those who were operating the computer, or to other people who were using the same computer. Thus “electronic mail” was born, and has flourished mightily, serving the old functions of letter mail, telex and (in large part) facsimile – and, regrettably, with the same abuse, unsolicited junk mail.

Each computer in the system potentially became, in effect, an online host, and the owner of each could make available on his or her website whatever files he or she liked. In practice, the local nodes of the system included “Internet service providers”, whose computers housed the software needed to manage messages, and the files constituting the website of each host they served. Software, data files, drafts of papers, official announcements, news items, contributions to online discussions, eventually illustrations, computer games, anything you could get into machine-readable form, all became available on the Internet – and largely freely available, though some institutional hosts insisted on the use of authorised passwords. The long-established commercial online hosts began to offer access to their sites through the Internet, though maintaining charges for the use of their search facilities.

A new step forward was the introduction of hypertext links, pointing from one document to another (and here let me mention the name of Berners-Lee). By this means, within any machine-readable text or graphic the author could insert links to the addresses of any other such texts, whether housed on the same computer or on any other host. “Browser” software was constructed, whereby a mouse click on a hypertext link within a displayed page initiated an electronic transfer to the address of the link. The whole content of the Internet became a seamless World Wide Web through which one could browse as through the pages of an individual book.

Browsing has always been a valuable and often preferred way of searching for information, but it is not directed search – its successes are often serendipitous, owing to happy chance. Two kinds of aid to search were developed on the Internet, mirroring the two aids provided in a book (contents pages and index). First came the classified directory of WWW sites, each site or page being assigned to a class by a knowledgeable human indexer after inspection. Second came the “search engine”, which trawled through the Web collecting (some part of) the text of each page encountered, and extracted words from it for entry into an enormous index, searchable online, each output from the index linking directly to the web page indexed.
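The search-engine approach can be sketched in miniature, assuming an in-memory stand-in for the Web in which each address maps to its page text and its outgoing hypertext links. The crawler follows links breadth-first, enters every word it meets into an index, and a query returns the pages containing all its words. The addresses, pages and function names are hypothetical; real engines may index only part of each page and work at vastly greater scale.

```python
from collections import deque

# A hypothetical miniature web: each address maps to (page text, outgoing links).
WEB = {
    "a.example/home": ("information retrieval and indexing", ["a.example/ir", "b.example/news"]),
    "a.example/ir": ("retrieval of documents by index terms", ["a.example/home"]),
    "b.example/news": ("news about the web", ["c.example/index"]),
    "c.example/index": ("an index of web pages", []),
}


def crawl_and_index(seed: str) -> dict[str, set[str]]:
    """Follow links breadth-first from a seed page, indexing every word seen."""
    index: dict[str, set[str]] = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, links = WEB.get(url, ("", []))
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:  # avoid revisiting pages linked more than once
                seen.add(link)
                queue.append(link)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages containing every query word; each hit is a page address."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()


idx = crawl_and_index("a.example/home")
print(sorted(search(idx, "index")))  # ['a.example/ir', 'c.example/index']
```

Note that "c.example/index" is found only because the crawler reached it through a link on another page, which is how the Web's hypertext structure feeds the index.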
