An analysis of community information systems


An Analysis of

Community Information

Systems

by

Dawn M. Heitmann

B.E., Electrical Engineering and Computer Science State University of New York at Stony Brook

(1986)

Submitted in partial fulfillment of the requirements for the degree of

Master of Science

in Electrical Engineering and Computer Science at the

Massachusetts Institute of Technology August 1987

Copyright Dawn M. Heitmann 1987

The author hereby grants permission to MIT and AT&T to reproduce and to distribute copies of this thesis document in whole or in part.

Signature of Author

Signature redacted

Department of Electrical Engineering and Computer Science, August 21, 1987

Signature redacted

Certified by ... David K. Gifford, Thesis Supervisor

Signature redacted

Accepted by ... Arthur C. Smith, Chairman, Departmental Committee


An Analysis Of Community Information Systems by

Dawn M. Heitmann Submitted to

the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the

Degree of Master of Science in

Electrical Engineering and Computer Science

Abstract

The purpose of this research is to determine the general promise of community information system technology in the United States. The topics covered include an overview of the technology, policy issues, system architecture organization, and an analysis of different design systems. A major portion of the research centers on the Boston Community Information System, located in the Laboratory for Computer Science at MIT. Many hypotheses have been tested and analyzed to reach the conclusions of this study.

As a result of the knowledge I have gained throughout my period of research on community information systems, I feel that the most beneficial design of such systems is one that uses a hybrid community information system technology with personal computers. I believe that with the Boston Community Information System a potentially successful technology has been developed, but I have serious doubts as to whether a community information system can, at this time, be successful in the United States. I feel that the market rather than the technology is the critical factor in determining the success of such information systems. Until real market needs are established, chances for a successful system are slim. I believe that the growth of community information systems into a powerful communication medium will be hampered until computer literacy is more widespread. I have reached the conclusion that these systems are slightly ahead of their time, but I have no doubt that their time will come.


to my parents for their constant

support, love, and understanding and

to Ken Short, my mentor

for his guidance, encouragement, and influence on my career goals


Acknowledgements

I would like to thank my thesis supervisor, David K. Gifford, for his guidance and support throughout my studies at MIT.

I also would like to extend my appreciation to AT&T for their financial support and for providing me with this exceptional learning opportunity. I would especially like to thank my supervisor, Vince Silverio, my department head, Dave Trimble, my director, Herb Zydney, and the OYOC administrator, Susan Topf.

My sincere thanks to David Burmaster, Robert Cote, David Segal, and Kendra Tanacea for all the help they provided.


Table of Figures

Figure 1-1: Block Diagram of Teletext System
Figure 1-2: Block Diagram of Videotex System
Figure 1-3: Community Information System Technologies
Figure 3-1: Hierarchical Database Structure
Figure 3-2: Relational Database Structure
Figure 3-3: System Diagram of Boston CommInS
Figure 4-1: Comparison of Different Design Systems
Figure 5-1: The Boston CommInS User Population
Figure 5-2: Use of the Broadcast System
Figure 5-3: How Much Users Are Willing To Pay For One-Way Service
Figure 5-4: How Much Users Are Willing To Pay For Combined System
Figure 5-5: Number of Months User Had System Prior To Returning It


Chapter One

Overview: Community Information Systems

1.1 Introduction

The merger of the telecommunication and computer technologies has led to a whole new era in information systems. Such information systems revolve around three predominant technologies: teletext, videotex, and hybrid. Teletext is a one-way medium which transmits alphanumeric material and simple graphics. Videotex is a two-way medium which disseminates pages of information only in response to user demand. Hybrid community information systems integrate the cost-effective service of teletext with the broad-spectrum accessing capability of videotex [14]. The overall goals of these systems are to deliver useful information and provide access to interactive services, in a cost-effective manner, suitable for use by untrained users.

1.2 Teletext

Teletext service is commonly provided by over-the-air broadcast signals and is relatively inexpensive. Broadcasting information has a key advantage over printed information -- nearly all of the costs associated with it are fixed. Pages of information are transmitted in a cyclic pattern, and a local adaptor captures the page selected by the user when it appears in the cycle. A page is the basic unit of information retrieved from the database and displayed on the screen [3].

There is a trade-off between the number of pages available on one channel and the delay in serving the user's request. The length of time a user waits for a response depends on the speed of the page cycle, which in turn depends on the number of characters per page, the number of pages per channel, and the manner in which the pages are arranged. Some channels will broadcast popular pages more often than pages which are requested less frequently. This type of buffering makes it relatively easy to give almost instantaneous access to selected portions of the database. In Great Britain's teletext system, Ceefax, it takes fifteen seconds to cycle through one hundred pages [43].
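The trade-off between page count and waiting time can be illustrated with a small calculation. The sketch below is not taken from any teletext specification: the function name, the slot model (one page per equal-length slot), and the assumption that requests arrive at a uniformly random moment in the cycle are all illustrative choices. Only the figure of one hundred pages in fifteen seconds comes from the text above.

```python
from typing import List, Sequence


def mean_wait_seconds(schedule: Sequence[str], page: str,
                      seconds_per_slot: float) -> float:
    """Average wait until `page` next starts, for a request made at a
    uniformly random moment in a repeating broadcast cycle.

    `schedule` lists the page sent in each slot of one cycle (assumed
    equal-length slots, an illustrative simplification).
    """
    slots: List[int] = [i for i, p in enumerate(schedule) if p == page]
    if not slots:
        raise ValueError(f"page {page!r} is never broadcast")
    n = len(schedule)
    gaps = []
    for k, start in enumerate(slots):
        nxt = slots[(k + 1) % len(slots)]
        gap = (nxt - start) % n
        gaps.append(gap if gap else n)  # a single occurrence spans the whole cycle
    # A request lands in a gap of length g with probability g/n and then
    # waits g/2 slots on average, giving sum(g^2) / (2n) slots overall.
    return sum(g * g for g in gaps) / (2 * n) * seconds_per_slot


# Ceefax figure quoted in the text: 100 pages in 15 seconds.
SECONDS_PER_SLOT = 15 / 100
plain_cycle = [f"p{i}" for i in range(100)]

# Hypothetical "popular page" arrangement: p0 is sent twice per cycle,
# taking over the slot of p50 to keep the cycle length unchanged.
boosted_cycle = list(plain_cycle)
boosted_cycle[50] = "p0"

wait_once = mean_wait_seconds(plain_cycle, "p0", SECONDS_PER_SLOT)
wait_twice = mean_wait_seconds(boosted_cycle, "p0", SECONDS_PER_SLOT)
```

Under these assumptions a page sent once per cycle has an average wait of half the cycle, and sending it twice at even spacing halves that wait again, which is the effect the repeated broadcast of popular pages relies on.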

There are four primary media options for teletext systems: television, radio, cable, and multipoint distribution services (MDS). With television, either broadband (full channel) or narrowband service can be implemented. FM radio stations can use a portion of their frequencies to transmit narrowband teletext without interfering with the regular signal. Multipoint distribution services use microwave frequencies to transmit information [31]. Unfortunately, signal quality problems are often associated with broadcast teletext.

Figure 1-1: Block Diagram of Teletext System
[Diagram not reproduced. Blocks shown: information providers, computer, database, multiplexers, transmitter, one-way channel, receivers, decoders, user's keypad.]

1.3 Videotex

In videotex, users' requests are sent from the videotex terminal to the host processor which then sends information back to the appropriate user. The response time in a videotex system is faster than in a teletext system and is independent of the number of pages available. However, the response time can be slowed down as a result of simultaneous service requests that occur with time-shared systems. Videotex systems are generally more costly than teletext systems due to their virtually unlimited page capacity, their quick access time, and their two-way communication channel. There is a direct linear relationship between the number of users the system maintains and the equipment costs at the central site.
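The cost contrast between the two media can be made concrete with a short sketch. Every number and name below is a hypothetical placeholder chosen only to show the shape of the relationship described in the text: broadcast costs that are nearly all fixed, against central-site videotex equipment costs that grow linearly with the number of users.

```python
def teletext_cost_per_user(fixed_cost: float, users: int) -> float:
    """Broadcast cost is assumed fixed, so the per-user share falls as users grow."""
    if users <= 0:
        raise ValueError("users must be positive")
    return fixed_cost / users


def videotex_central_cost(base_cost: float, cost_per_user: float,
                          users: int) -> float:
    """Central-site equipment cost modeled as a linear function of users."""
    if users < 0:
        raise ValueError("users must be non-negative")
    return base_cost + cost_per_user * users


# Hypothetical figures, for illustration only.
for users in (1_000, 10_000, 100_000):
    share = teletext_cost_per_user(100_000.0, users)
    central = videotex_central_cost(50_000.0, 40.0, users)
    print(f"{users:>7} users: teletext share {share:10.2f}, "
          f"videotex central site {central:12.2f}")
```

The per-user share of a fixed broadcast cost shrinks as the audience grows, while the linear model keeps adding central-site cost with every subscriber.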

Typically, videotex systems connect users to the central system via telephone lines, cable television, or packet subnetworks. The telephone is the leading videotex medium, because it is a two-way network that is switchable. It is the only medium that is in the position to offer universal service [10]. The major weakness of telephony, however, is that serious traffic problems are exhibited when the user level approaches 10,000 [7]. This casts doubt on whether telephone systems will be able to support the user levels necessary for a commercially viable consumer community information system. In addition, telephone systems transmit information at a much slower rate than cable systems.

Cable systems have greater capacities than telephone systems. The drawback of cable systems is that almost all of the existing systems are one-way and are not switchable. In addition, their geographic distribution is limited, so they are not in the position to offer universal service to large segments of the population. As of 1985, cable was installed in only forty-four percent of United States' households [10].

It is the view of some that as more duplex cable systems are developed, cable could emerge as the leading medium for videotex services. However, it must be remembered that a large-scale implementation of two-way cable systems is not an easy task. Perhaps a better solution to the capacity problem will be ISDN, eventually replacing the current telephone networks with these digital networks, which will provide both voice and high-speed data services. These networks will offer the same universal service as today's telephone networks do and should alleviate the capacity problems associated with data transmission.

Figure 1-2: Block Diagram of Videotex System
[Diagram not reproduced. Blocks shown: information providers, computer, database, transmitter, receiver, two-way channel, modem, decoder, user's keypad.]

Videotex is a flexible medium that is attractive to both consumers and small businesses. Applications for the consumer include entertainment, leisure, news, electronic bulletin boards, and transaction services. Although consumers frequently access news services, these subscribers, for the most part, are unwilling to pay for information that easily can be obtained from alternative sources [1]. Consequently, news services generate low revenue, while entertainment and leisure applications tend to produce much more revenue.

Videotex services are particularly well suited for locating specific information and, therefore, can be tailored to the needs of small businesses. The ability to access up-to-the-minute stock market price quotes is one very useful application. On the corporate level, however, videotex is overshadowed by the corporate databases that are already available on the market. The corporate databases generally are not as user-friendly as videotex systems, but they are much more powerful and better fit the needs of large conglomerates.

The four key constituents of a videotex network are the system operators, the equipment manufacturers, the information providers, and the suppliers of general services [1]. The system operators provide the channels on which information can be carried. Typically, they are telephone companies or cable operators. The equipment manufacturers build the hardware for the system. Computer and television manufacturers take a lead role in this division. Information providers are organizations that provide the data for distribution. The suppliers of general services usually are consultants and software companies whose functions are to help implement videotex services and provide the necessary software.

The four levels of this vertical structure are often in conflict, since they differ in their requirements regarding capital, skill, marketing, and competition. Many companies that control transmission channels, i.e., the telephone companies, also are interested in becoming database managers and information providers. The remaining components of this structure fear that these service providers pose a threat to both diversity and competition in the videotex industry. Government regulations, such as those set forth by the Second Computer Inquiry, will set the precedent for future operations [16].

Videotex has had problems establishing itself as a successful business in the United States, since the criterion which determines videotex's success or failure is not the technology itself but, instead, its profitability. In order to be profitable, videotex systems must stop being tailored toward applications that fit already available capabilities and, instead, be reconstructed to meet identifiable market needs. Successful services will be those that gain the competitive advantage by identifying the information needs, comparing formats and capabilities with other systems, and determining what should be the optimum content [17].

There are both positive and negative social effects that could arise from the successful integration of videotex into our society. Clearly, it would lead to more efficient information handling. Also, videotex could serve as a direct feedback mechanism for public opinion. However, several detrimental effects could result from a mass acceptance of videotex. There would be an increase in information pollution as well as an erosion of personal privacy.


1.4 Hybrid Community Information Systems

A hybrid community information system combines the cost-effective service of teletext with the broad-spectrum accessing capability of videotex. A system that utilizes hybrid technology is the Boston Community Information System, known more commonly as the Boston CommInS, developed at the Massachusetts Institute of Technology. The broadcast nature of teletext allows this system to accommodate economically an arbitrary number of users, while the direct access capabilities of videotex ensure almost immediate access to the information in the databases. The simplex (one-way) communication link transmits information to the user's personal computer via a digital broadcast channel. The duplex (two-way) communication link uses a standard modem and a telephone line. It should be noted that while the Boston CommInS does take advantage of the local processing and storage capabilities that a personal computer offers, the use of a personal computer is not an essential feature of a hybrid system. In the future, it is likely that ISDN will be highly utilized for hybrid services.

1.5 Community Information Systems and Personal Computers

The question often arises as to whether community information systems and personal computers will compete with or enhance one another. The information industry and the personal computer industry have similar products that are aimed primarily toward the same market. According to Rich Malloy, "Instead of fighting for the same markets with pretty much the same technology, videotex and personal computers will probably share features and merge into one powerful product." [24]


The Boston Community Information System is an example of a system based on this concept. It is an information system which utilizes the personal computer for local processing and storage. If a certain piece of information is needed, the system first searches through the databases on the personal computer. If the requested information cannot be found in these files, the system automatically uses its duplex communication channel to access the remote databases that most likely would contain the needed information. Throughout this process, the use of the personal or the remote databases is transparent to the user.
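The local-first lookup described above can be sketched in a few lines. This is only an illustration of the general pattern, not the Boston CommInS implementation: the function names, the dictionary standing in for the personal computer's databases, and the `remote_fetch` callable standing in for the duplex channel are all assumptions.

```python
from typing import Callable, Dict, Optional, Tuple


def lookup(query: str,
           local_db: Dict[str, str],
           remote_fetch: Callable[[str], Optional[str]]
           ) -> Tuple[Optional[str], str]:
    """Search the local database first; fall back to the remote one on a miss.

    Returns the result together with the source that supplied it.
    """
    if query in local_db:
        return local_db[query], "local"
    # Local miss: use the (hypothetical) two-way channel.
    return remote_fetch(query), "remote"


def get_information(query: str,
                    local_db: Dict[str, str],
                    remote_fetch: Callable[[str], Optional[str]]
                    ) -> Optional[str]:
    """User-facing call: the source of the answer is hidden from the caller."""
    result, _source = lookup(query, local_db, remote_fetch)
    return result
```

Keeping the source out of the user-facing return value mirrors the transparency the text describes: the caller asks one question and does not need to know which database answered it.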

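Figure 1-3: Community Information System Technologies
[Diagram not reproduced. Community information systems branch into videotex, teletext, and hybrid; the media shown are telephone, packets, and cable for videotex, and television, FM radio, cable, and MDS for teletext.]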

1.6 Historical Background

The development of large scale community information systems started in Great Britain, in the early 1970s. Teletext was invented by the British Broadcasting Corporation when they were searching for ways to provide closed captioning for the hearing impaired. Videotex was invented when the British Post Office was investigating ways to use the telephone at off-peak hours [2].

The British concept of videotex, as a centrally operated consumer information service, greatly influenced the design of early information systems in the United States. These first generation systems tended to be tree-structured with a single database, which was traversed by using menu-driven techniques or fixed keyword commands [3].

Times Mirror Corporation and Knight-Ridder Newspapers pioneered videotex in the United States, with their Gateway and Viewtron systems. However, after almost a decade of research, market trials, and services, both ventures folded. Both Times Mirror and Knight-Ridder discovered too late that information services needed to be targeted toward small markets with specific interests or larger groups that wanted home transaction services. Mass markets, like newspaper readers, would not support such systems. The Times Mirror venture cost an estimated $30 million, while Knight-Ridder's project cost approximately $50 million [5].

The development of systems in the United States has been influenced by its government's regulatory structure. Lack of government involvement in community information systems has given the private sector the power to control the growth rate of information services. In the late 1970s, many companies designed experimental information systems modeled after videotex, but the consumers didn't seem particularly interested in the end product [5].

New ventures are trying to distinguish themselves from early community information systems by striving for a more flexible technology and emphasizing interactive services. Current endeavors, in contrast to their predecessors, are delving more into the market needs that can be addressed by information systems. The older generation's view was that the need would rise to meet the technology, as opposed to the more current stand that the technology must instead be driven by the market.


1.7 Chronology of Community Information Systems

* 1970 British Post Office invents computer retrieval service to work with modified television
* 1972 British Broadcasting Corporation broadcasts experimental teletext pages
* 1973 Dow Jones News/Retrieval Service is established
* 1976 BBC begins public broadcast of teletext system, Ceefax
* 1979 BPO offers videotex system, Prestel, to the public
* 1979 The Source is established
* 1980 Knight Ridder and AT&T begin videotex trial in Florida
* 1982 Times Mirror launches Gateway
* 1983 Knight Ridder launches Viewtron
* 1984 Boston Community Information System is operational
* 1984 IBM, Sears, and CBS establish Trintex
* 1985 two-year experimental test of Boston Community Information System is launched
* 1985 Pacific Bell announces Project Victoria
* 1986 Gateway and Viewtron fold


Chapter Two

Policy Issues of Information Systems

2.1 Introduction

This section explores some of the problems that have arisen as a result of the information explosion and the technologies associated with it, the reasons behind these problems, and some corrective measures that either are currently, or might eventually, be taken. The general promise of community information system technology in the United States is dependent upon several policy issues:

* vertical integration
* content control and liability
* privacy

Many of the questions that arise as a result of these issues, such as those posed below, are as yet unresolved. This is due primarily to the major void that currently exists in the United States concerning regulations and policies on community information systems.

The issue of vertical integration is a complex economic problem that has major effects on community information system technology. At what point should system operators stop relying on information providers and become the owners of information themselves? Do such actions promote monopolistic and anticompetitive behavior? To what extent should the United States government interfere?


The issues of content control and liability are complementary. Should electronic publishing be regulated, as broadcasting systems are, or unregulated, as in the case of standard newspapers? Who is responsible for the content and the quality of the service provided -- the publishers, the carriers, the information providers, or the system operators? Should users be able to transmit information directly without editorial review; and if so, does this release the system operator from liability?

Finally, the issue of privacy in systems with two-way media must be addressed. What restrictions should be placed on the vast amounts of personal information that can be collected through such systems? Are there adequate safeguards over user profiles and use statistics? What happens if user anonymity cannot be preserved [43]?

2.2 Vertical Integration

The degree of vertical integration that can take place within an industry is directly dependent on the regulatory environment in which the industry operates. In a deregulatory environment, the government imposes few or no restrictions, and there can be a high degree of vertical integration [43]. For example, in a community information system operating under deregulation, the service, the equipment, and the information could all be provided by a single entity.

However, in the United States, a liberal vertical integration policy is regarded as a threat to both competition and diversity, because it is believed that it promotes both anticompetitive and monopolistic behavior. Vertical limitations have been established in an effort to prevent such unfair competitive practices. Many of the clear distinctions that once separated different media no longer exist [32]. Companies that once operated in completely different arenas, such as AT&T and IBM, now find themselves as competitors. As a result, there is a fear that "many of the companies that control transmission channels are also becoming database managers and information providers and will tend to see others in those businesses as competitors." [31]

Currently, there seem to be two options available for controlling vertical integration. Either a separation policy or an access policy could be adopted. In the separation policy, the transmission company is limited to strictly carrying information and may not act in the capacity of an information provider. In the United States, however, private industries, such as telephone, cable, and broadcasting companies, provide transmission lines. If these companies, many of which have been the leaders in developing community information systems in the United States, are told that they cannot provide information, the incentive for developing a successful system may be severely hampered [31]. A less stringent policy would be an access policy which would allow a transmission company to act as an information provider while requiring it to also allow other information providers the use of its channels [31].

[...] vertically integrate often has been questioned. Government regulations that specify what entities are allowed and not allowed to participate in certain aspects of an information industry's infrastructure seem to violate the First Amendment, which states "to research and write, to print or orate, to publish and distribute, is everybody's right." [32]

2.3 Content Control and Liability

The printed page is the least regulated mass communication medium in the United States. Newspapers fully enjoy the freedoms of speech and press that are guaranteed by the First Amendment. The broadcast medium, on the other hand, is more tightly regulated than the printed page and does not have the same First Amendment protection. The question that comes forth then, is whether a newspaper delivered by electronic means is an extension of the printed version, and thus is as free as any other newspaper, or whether it is a broadcast, and thus under government control [32].

Currently, rules concerning information content apply primarily to broadcasting, not to newspapers or common carriers such as telephone services. Since newspapers may be distributed in a number of different ways, via a number of different communication mediums, content rules, unless applied uniformly to all mediums, could be unfair and even anticompetitive [43]. For example, an electronic newspaper may not be able to compete fairly with a printed version because of the stricter regulation policies that typically surround broadcast information. A printed newspaper, which has little government regulation, clearly has the competitive advantage in such a case.

In August 1987, the broadcast industry won a long battle with the government when the FCC abolished the thirty-eight-year-old Fairness Doctrine, which required broadcasters to air opposing viewpoints on controversial issues. Broadcasters have long been opposed to the Fairness Doctrine, because they feel that it has unconstitutionally restricted their freedom of expression in a manner that no other communication medium has experienced. A challenge to this ruling is expected based on the argument that it is the public, not the broadcasters, who should control the airwaves.

Once the question of how electronic publishing should be categorized as a communication medium is resolved, an equally difficult question arises. Who is responsible for the content of the information -- the information providers, the system operators, or the common carriers? Common carriers generally are not held responsible for information content, because their job is primarily to transmit information provided by others, without editing it [31].

One of the most important benefits of electronic publishing is its ability to deliver timely updates of fast changing information. With this in mind, it is probably best if the information providers are held responsible for content, because submitting their stories to the system operators for editing purposes takes away from the timeliness of the information [1]. This is the policy that the Prestel system, in Great Britain, has adopted [43].

In the United States, however, as mentioned before, the differences between the entities of information provider, system operator, and common carrier are not that well-defined. Often the system operator is also an information provider, as in the case of the Dow Jones News/Retrieval Service, or is involved in some manner with controlling the content of the information, as was Viewtron [1].

If the system operators are viewed as publishers or editors, they most likely will be liable for the information on their systems. If, on the other hand, the system operators act more in the realm of a common carrier, they probably will have little legal responsibility [1].

2.4 Privacy

Today, with the advent of two-way community information systems that can monitor and create records on subscriber usage, a "Big Brother Syndrome" has developed, much like the one feared in George Orwell's novel, 1984. Most people expect to be granted at least some degree of privacy, especially within certain domains, such as homes. Two-way systems are beginning to infringe even on these rights. In a survey, conducted in 1978, fifty-four percent of the American population felt that the present uses of computers pose a threat to their privacy [26]. A 1984 Gallup Poll revealed that forty-seven percent of the American public feels [...] government has access.

Several potential threats to privacy have evolved as a result of two-way community information systems [1]:

* improper commercial use of information relating to the subscriber by the service provider
* breaches of confidentiality by the involved parties
* governmental access to information for legal or investigative purposes
* pressure on subscribers to authorize release of personal information
* electronic eavesdropping by uninvolved parties

Under current laws, communication providers and system operators can monitor all user information requests [43]. Many companies, however, offer their subscribers some degree of privacy protection by refusing to disclose information about the users to other sources, unless it is in response to a compulsory order [31]. Another way to alleviate this problem would be to destroy the identity of the user as soon as any record is created. In this way, the companies could maintain their marketing information about the use of their systems without violating the privacy of their users.
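The idea of discarding the user's identity at the moment a record is created can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any deployed system: the class name, method names, and the choice to keep only per-page request counts are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


class AnonymousUsageLog:
    """Keeps aggregate page-request counts and never stores who asked."""

    def __init__(self) -> None:
        self._page_counts: Counter = Counter()

    def record(self, user_id: str, page: str) -> None:
        # The identity is available at request time but is deliberately
        # dropped here; only the aggregate count survives.
        del user_id
        self._page_counts[page] += 1

    def most_requested(self, n: int) -> List[Tuple[str, int]]:
        """Marketing-style summary: the n most requested pages."""
        return self._page_counts.most_common(n)


log = AnonymousUsageLog()
log.record("alice", "weather")
log.record("bob", "weather")
log.record("alice", "sports")
```

The operator still learns which pages are popular, but nothing retained in the log links a request to a subscriber.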

In 1977, the Federal Privacy Commission conducted a study focusing on the misuse of insurance and medical records. This report, along with some subsequent work, has led to the suggestion of the following seven principles to help preserve privacy in two-way information systems: [31]

* subscribers should be told that they are using a two-way system and should be notified of any records that the company intends to keep on them
* subscribers should give written consent to the collection of such information
* subscribers should have the right to see and make a copy of any collected information and also should have the right to correct it if the information is wrong or misleading
* the government should be able to obtain records only in response to a compulsory process, and the subscribers should be notified that such action is being taken
* records should be destroyed when no longer needed
* the company should be required to keep records secure
* the company should be liable for damages resulting from misuse or unauthorized release of information gathered on the subscribers

Most of the community information systems that currently are available are not technologically equipped to handle problems concerning privacy. Consequently, privacy is based on policy issues. As a result, in these systems, privacy may evolve into a service that one would pay for, where the cost of privacy rises in proportion to the need for it [43].

However, if new community information systems utilize personal computers, it may be possible to ensure privacy on the technical level at the user site. An example of this concept can be seen in the Boston Community Information System.


Chapter Three

System Architecture Organization

3.1 Introduction

The purpose of this section is to give a general description of database systems found in community information systems. The system architecture in information systems is based typically on either the hierarchical or relational models. Generally, the database is organized in a manner that allows data to be accessed by using attributes or menus. The advantages and disadvantages of both the attribute system and the menu system are described. Centralized and distributed databases are discussed in detail, along with a comparison between the two systems. Finally, the organization of the Boston Community Information System is laid out, including a description of the predicate data model, the filter list, and the remote system.

3.2 Database Systems

With the advent of the information explosion in the 1970s and the 1980s, database design has taken on an added significance. Database systems are designed to manage large files of information. They are used to store, manipulate, and retrieve volumes of data. In a database, information is stored in records which are separated into a collection of fields. A field is the smallest named unit of data stored in the database. Each field contains an identifying attribute or characteristic of the corresponding record. A collection of records forms a file.

By using the Boston Community Information System as an example, the concept of a database and its entities easily can be illustrated. The Boston CommInS contains two major information sources: the New York Times and Associated Press news wire. All incoming New York Times articles for a given week constitute one database. Therefore, since the Boston CommInS stores ninety days of NYT articles, thirteen one-week databases are formed. New databases for the AP wire service are formed on a daily basis. Although the CommInS is comprised of several databases, the file structure remains transparent to its users, thereby allowing the system to be viewed as one composite database. Each article in a database can be considered a separate record. Each record, i.e., news article, consists of several fields including the type, date, category, author, priority, subject, title, and text fields. These fields are explained in detail in section 3.4 [35].

In information systems, either a hierarchical database or a relational database architecture is typically implemented. Hierarchical databases are tree-structured. Each record is linked to all others in a strictly hierarchical arrangement. There is a many-to-one relationship from child to parent. The conceptual structure of such databases matches the actual structure. Once implemented, the database is difficult to change without complications.


Relational databases are not as structured as hierarchical models. Data are stored in two-dimensional tables. The rows of the tables correspond to the data records, while the columns correspond to the fields [43]. Relationships are not fixed and, therefore, can be defined to meet the application's needs. In the future, it is believed that relational databases probably will support almost all information systems because of their flexibility and low maintenance requirements [6].
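The contrast between the two models can be sketched in a few lines of Python, using the student grades from Figures 3-1 and 3-2. This is only an illustration: the column names (`exam1`, `hwk1`, and so on) and the helper functions `flatten` and `select` are assumptions introduced here, not part of any system described in this thesis.

```python
# Hierarchical form: each student owns nested EXAMS and HWKS branches,
# so a grade is reached by following a fixed path down the tree.
hierarchical = {
    "JONES": {"EXAMS": [91, 82], "HWKS": [87, 94]},
    "SMITH": {"EXAMS": [79, 86], "HWKS": [80, 75]},
    "HARRIS": {"EXAMS": [90, 88], "HWKS": [98, 93]},
}

# Relational form: one flat table, one row per record, one column per field.
# The column names are assumed for this sketch.
COLUMNS = ("student", "exam1", "exam2", "hwk1", "hwk2")
relational = [
    ("JONES", 91, 82, 87, 94),
    ("SMITH", 79, 86, 80, 75),
    ("HARRIS", 90, 88, 98, 93),
]


def flatten(tree):
    """Convert the nested (hierarchical) form into rows of the flat table."""
    return [(name, *rec["EXAMS"], *rec["HWKS"]) for name, rec in tree.items()]


def select(rows, column, predicate):
    """Return the rows whose value in `column` satisfies `predicate`."""
    idx = COLUMNS.index(column)
    return [row for row in rows if predicate(row[idx])]
```

In the flat form, a question such as "which students scored at least 90 on the first exam" is a selection on one column, whereas the nested form requires walking each student's branch.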

STUDENTS
    JONES
        EXAMS: 91, 82
        HWKS:  87, 94
    SMITH
        EXAMS: 79, 86
        HWKS:  80, 75
    HARRIS
        EXAMS: 90, 88
        HWKS:  98, 93

Figure 3-1: Hierarchical Database Structure

            EXAM 1   EXAM 2   HWK 1   HWK 2
JONES         91       82       87      94
SMITH         79       86       80      75
HARRIS        90       88       98      93

Figure 3-2: Relational Database Structure

Attribute (keyword) systems and menu (tree-structured) systems are the two kinds of database systems that dominate community information systems. In attribute systems, users play an active role by initiating information requests. In menu systems, users play a passive role by selecting an alternative from various computer-generated options [13].

There are several considerations that must be taken into account when designing a database for a community information system. The system should be designed to be transparent so that the details of the distribution of data can effectively be hidden from its users. Such is the case with the Boston Community Information System, which allows its users to see its internal component databases as one composite database. Other key factors in database design for information retrieval services include the user-friendliness of the system, the ease with which data can be accessed, the search procedures, and the average response time. Trade-offs often exist between hard-to-formulate yet efficient search and retrieval methods, such as those found in attribute systems, and user-friendly formulations that usually are less efficient and more expensive. There are additional trade-offs among transmission speed, page size, and display time.

In an attribute system, information is retrieved by describing the characteristics of the information being sought. In such systems, requests are often formulated as logical combinations of keywords. This method offers more powerful and selective searches to the user than menu systems do. Keywords are considered to be more effective than menus and improve the chances for a successful search [1]. However, the operation of attribute systems generally does require some degree of user training. The user may have to refer to an instruction manual and most likely must understand Boolean operations.

A further distinction can be made within attribute systems by differentiating between keyword and query systems. Query systems are more powerful than strictly keyword systems. In a keyword system, one is often confined to a prespecified list of words that can be searched for. In a query system, no such limitations exist, thereby allowing free-text retrieval.
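The difference between a keyword system and a query system can be illustrated with a minimal sketch. The controlled vocabulary, the function names, and the article format below are all hypothetical; the sketch only assumes that a request is a Boolean combination of terms (all of, any of, none of) and that a keyword system rejects terms outside its prespecified list.

```python
import re


def tokenize(text):
    """Lower-case a text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical controlled vocabulary for the keyword system.
VOCABULARY = {"election", "budget", "weather"}


def query_search(articles, all_of=(), any_of=(), none_of=()):
    """Free-text Boolean search: any word may appear in the request."""
    hits = []
    for title, text in articles:
        words = tokenize(text)
        if not all(term in words for term in all_of):
            continue  # AND: every required term must be present
        if any_of and not any(term in words for term in any_of):
            continue  # OR: at least one alternative must be present
        if any(term in words for term in none_of):
            continue  # NOT: excluded terms must be absent
        hits.append(title)
    return hits


def keyword_search(articles, all_of=(), any_of=(), none_of=()):
    """Same Boolean search, but restricted to the controlled vocabulary."""
    unknown = (set(all_of) | set(any_of) | set(none_of)) - VOCABULARY
    if unknown:
        raise ValueError(f"not in vocabulary: {sorted(unknown)}")
    return query_search(articles, all_of, any_of, none_of)
```

A request for "computer" would be answered by `query_search` but rejected by `keyword_search`, which is the limitation of prespecified word lists described above.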

A menu-based system is easy to learn and use. The computational cost of retrieval is low compared to the processing that is necessary to support Boolean retrieval. A menu system requires little or no training and also is easier to browse through than an attribute system. Unfortunately, several problems can arise. In a menu-based system, users may find it difficult to choose among the offered alternatives, and once a wrong path is taken, the search will be unproductive [34]. It must be remembered that information rarely fits into a single hierarchical pattern, simply because people categorize things differently. Also, if the system is poorly designed, information may be inaccessible, and once the system is built, it cannot be adjusted easily.

In an experiment done by Geller and Lesk [13] to determine whether non-specialist users prefer a menu or attribute system for database retrieval, it was observed that people opted for the attribute system over the menu system both initially and after experiencing both methods. Over eighty percent of all the searches conducted during the experiment period used the attribute system.

Menu systems may be the better choice for systems with few offerings, but for the large systems of today, attribute systems are more effective. Most first generation community information systems, such as Prestel, Viewtron, and Gateway, were menu-driven. More recently, however, there has been a trend towards keyword and query systems. It is believed that new ventures, such as Trintex, will concentrate more on these retrieval methods than on the menu-driven approach. This is because, with the large databases in current systems, the depth of the menu necessary for a complete traversal of the database presents problems that can lead to time-consuming as well as unsuccessful searches. Consequently, information access time, defined as the time that elapses from when the user first initiates the data search to when he/she receives the requested information, is much shorter with attribute systems.
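The growth of menu depth with database size can be made concrete with a short calculation. The sketch below assumes an idealized, perfectly balanced menu tree in which every page is reached by exactly one path and every menu offers the same number of choices; real menu systems are less regular, so the figure is a lower bound. The function name `menu_depth` and the choice of ten options per menu are assumptions made for illustration.

```python
def menu_depth(pages, choices_per_menu):
    """Smallest number of menu levels needed to reach `pages` leaf pages."""
    if pages < 1 or choices_per_menu < 2:
        raise ValueError("need pages >= 1 and at least two choices per menu")
    depth, reachable = 0, 1
    while reachable < pages:
        reachable *= choices_per_menu  # one more level multiplies the leaves
        depth += 1
    return depth
```

For a database of 250,000 pages, the size reported for Prestel in section 4.3, `menu_depth(250000, 10)` is 6, so a user would make at least six correct selections in a row to reach a page, and any wrong selection along the way adds further steps.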


3.3 Centralized and Distributed Databases

The two kinds of databases most often used in community information systems are centralized and distributed databases. Centralized database systems are suitable for systems that have a limited number of frequently changing pages, such as most teletext systems. However, they are inconvenient for transaction-type services or information services that involve external information providers [43]. Distributed database systems are very suitable for general-purpose information retrieval applications. They use databases positioned in strategic locations that are close to the point of user activity to hold frequently needed data, yet they still have the ability to access data from other databases.

In centralized systems, a single computer system processes all the requests generated by the users. The concentration of all resources at one physical location capitalizes on economies of scale. The costs involved in duplicating facilities and overhead are eliminated. However, communication costs are generally greater, and the number of interactive users such systems can support is limited. Also, bottlenecks develop fairly often for database access and line handling [3]. Examples of information systems based on centralized databases include the Source and CompuServe [43].

Distributed systems, such as the Boston CommInS, consist of a cooperating confederacy of computers that work together to process users' requests. There is a collection of computer sites, each of which maintains a local database system [22]. Each site can operate independently of the other subsystems throughout the network yet, at the same time, retains the ability to access databases at other sites. Thus, each site can function on both a local and global level.

Typically, distributed databases in community information systems are either master/replicated or external databases. In the master/replicated database, all databases contain the same information, and the users typically access the closest one. Information providers update only one database, designated the master. All other databases are then automatically updated via communication links. External systems store information in many different databases. Frequently accessed information is often available in more than one database to keep communication costs to a minimum [43].
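The master/replicated arrangement can be sketched as follows. This is a toy model under simplifying assumptions: propagation to the replicas is immediate and never fails, and the class and method names are hypothetical rather than taken from any system discussed here.

```python
class ReplicatedDatabase:
    """Toy master/replicated store: writes go to the master, reads to any copy."""

    def __init__(self, replica_names):
        self.master = {}
        self.replicas = {name: {} for name in replica_names}

    def update(self, page, content):
        """Information providers write to the master only."""
        self.master[page] = content
        self._propagate(page)

    def _propagate(self, page):
        # Stand-in for the communication links that refresh every copy.
        for store in self.replicas.values():
            store[page] = self.master[page]

    def read(self, replica_name, page):
        """Users read from the replica closest to them."""
        return self.replicas[replica_name].get(page)
```

Because every replica holds the same pages, a user never needs a long distance connection to the master, which is the source of the communication savings noted above.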

Distributed systems have several advantages over centralized systems. Small computers that are often found in distributed systems tend to be more dedicated to specific tasks than the larger time-shared computers common to centralized systems. This, along with a distributed system's ability to speed up query processing, can lead to an overall increase in efficiency as well as to reductions in an individual's cost and time expenditures. Reliability and security are increased, because information is available at more than one data site. If one site crashes, the rest of the sites are not paralyzed. Also, distributed systems are more flexible and allow more incremental growth. This is a result of the existence of many smaller systems instead of one huge system, as is seen in a centralized system. For example, if a user's needs change, a distributed system can be reconfigured more easily than a centralized system because of the modularity associated with it. Subsystems can be altered or replaced without having a drastic effect on the remainder of the system. As a result of the ease with which the technology can be accessed, distributed systems do not run as high a risk of becoming obsolete as do centralized systems. Disadvantages of distributed systems include software development cost, greater potential for bugs, and increased processing overhead at each individual site [22].

3.4 Organization of the Boston CommInS

In the Boston Community Information System, information is delivered in continuous streams to the centralized database site via dedicated telephone circuits. Headers are used to distinguish incoming news articles. The stored information is then transmitted as a series of packets from the central site to remote personal computers via a broadcast simplex link. Information is transmitted at a rate of 4.8 kbits/sec. The software at the remote end reassembles these packets and filters the entries to see if they should be stored locally. Each personal computer stores a selected portion of the data as defined by the user's filter list, thus creating a local database. If a user's inquiry cannot be answered at the local level, it is automatically sent over a duplex communication channel to the centralized database where it can be processed.

The central site maintains a complete database of all received information. Currently, the entire database is transmitted approximately every four hours. Newly arrived articles are placed at the head of the transmission queue, and, on average, the article is transmitted to the users within five minutes. Information classified with a high priority is transmitted more frequently than other information [14].
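The figures quoted above imply a rough ceiling on how much data one broadcast cycle can carry. The calculation below is only a back-of-the-envelope sketch: it takes the 4.8 kbits/sec rate and the four hour cycle from the text and ignores packet headers, error control, and the repeated transmission of high priority items, all of which would lower the usable capacity. The function names and the 6,000 byte article size used in the example are assumptions.

```python
LINK_RATE_BITS_PER_SEC = 4800  # 4.8 kbits/sec, from the text
CYCLE_HOURS = 4                # approximate retransmission cycle, from the text


def cycle_capacity_bytes(rate_bps=LINK_RATE_BITS_PER_SEC, hours=CYCLE_HOURS):
    """Upper bound on the bytes sent in one cycle, ignoring all overhead."""
    return rate_bps * hours * 3600 // 8


def transmit_seconds(article_bytes, rate_bps=LINK_RATE_BITS_PER_SEC):
    """Time on the air for one article of the given size, ignoring overhead."""
    return article_bytes * 8 / rate_bps
```

Under these assumptions, `cycle_capacity_bytes()` gives 8,640,000 bytes, or about 8.6 megabytes per four hour cycle, and a hypothetical 6,000 byte article occupies the link for `transmit_seconds(6000)`, which is 10 seconds.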

At the central site there exists a group of shared database servers whose functions are to accept data from the information sources, add data to their own databases, transmit database updates to the personal computers, and implement remote procedure calls. When a database is replicated to more than one server, the incoming telephone circuit is connected to each server so that all copies of a database are up-to-date [14].

The system is organized by the predicate data model which was developed specifically for the Boston CommInS. The predicate data model provides the full text searching capabilities necessary to effectively handle relatively unstructured information, such as the news articles that the Boston CommInS receives. This approach was developed because it was felt that a hierarchical or relational model was too restrictive for handling such information [14]. It allows the users to view the multiple databases in the system as one large database. The contents of the database can be described by a Boolean combination of predicates which provides a great deal of flexibility in the formulation of queries [15].

The predicate data model contains predicates and result sets. When a user submits a query, a predicate function returns true if the query matches the record that it was applied to. A record becomes part of the result set if and only if it matches the query [15].

Each individual database is described by a Boolean combination of predicates. The disjunction of the predicates in each database describes the composite database. Queries are routed to the component databases that contain the needed information. A complete result set is formed by taking the union of the results from the component databases [15].
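The ideas in the preceding paragraphs (predicates, result sets, a composite description formed by disjunction, and a union of component results) can be illustrated with a small sketch. It is not the Boston CommInS implementation: the record format, the `id` field used to remove duplicates, and the caller-supplied `route` test are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


def field_is(name: str, value: str) -> Predicate:
    """Primitive predicate: true when the named field equals the value."""
    return lambda record: record.get(name) == value


def and_(*preds: Predicate) -> Predicate:
    return lambda record: all(p(record) for p in preds)


def or_(*preds: Predicate) -> Predicate:
    return lambda record: any(p(record) for p in preds)


def not_(pred: Predicate) -> Predicate:
    return lambda record: not pred(record)


@dataclass
class ComponentDatabase:
    name: str
    description: Predicate  # describes which records this database holds
    records: List[Record]

    def select(self, query: Predicate) -> List[Record]:
        """A record joins the result set if and only if it matches the query."""
        return [r for r in self.records if query(r)]


def result_set(databases, query, route=lambda db: True):
    """Union of the results from every component the query is routed to."""
    seen, results = set(), []
    for db in databases:
        if not route(db):  # hypothetical routing test supplied by the caller
            continue
        for record in db.select(query):
            if record["id"] not in seen:  # union: keep each record once
                seen.add(record["id"])
                results.append(record)
    return results
```

The description of the composite database is then simply `or_(*(db.description for db in databases))`, the disjunction of the component descriptions.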

The predicate database consists of a set of records which contain a fixed set of fields: type, date, category, author, priority, subject, title, and text [35]. The type field identifies the source of the article. The date field indicates when the article was written. The category field indicates the general subject area. The priority field states the priority of the article as determined by the information source. The rest of the fields are self-explanatory.

The local database is customized for each user through the use of a personal computer at each user site. The user compiles a list of queries that interest him/her and prioritizes them, thereby forming a personalized filter. Incoming articles are stored locally if they match any queries in the filter list. This creates a personal database for each user. The filter list can be altered easily, and the system can process any query in the list. When the user submits a query, a list of entries from the personal database that match the query is displayed. If the query cannot be answered in this manner, the information is accessed remotely [15].
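The filtering and fallback behavior just described can be summarized in a short sketch. It assumes that each filter entry is a labeled predicate, that a query counts as unanswered locally when no local article matches it, and that the duplex link can be modeled as a plain function call; the class name `PersonalDatabase` and the `remote_lookup` parameter are hypothetical.

```python
class PersonalDatabase:
    """Toy model of a user's local database built from a filter list."""

    def __init__(self, filter_list, remote_lookup):
        # filter_list: (label, predicate) pairs in the user's priority order.
        self.filter_list = list(filter_list)
        # remote_lookup: hypothetical stand-in for the duplex link to the
        # central site; it takes a predicate and returns matching articles.
        self.remote_lookup = remote_lookup
        self.articles = []

    def receive(self, article):
        """Store a broadcast article only if some filter query matches it."""
        if any(predicate(article) for _, predicate in self.filter_list):
            self.articles.append(article)
            return True
        return False

    def query(self, predicate):
        """Answer locally when possible, otherwise ask the central site."""
        local = [a for a in self.articles if predicate(a)]
        if local:
            return ("local", local)
        return ("remote", self.remote_lookup(predicate))
```

Only the remote branch places any load on the central site, which is why most requests in such a design cost the shared servers nothing.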

In order to access information, the user submits queries that are Boolean combinations of words or phrases that may occur in the fields specified previously [14].

Queries processed locally can be of five types: words, phrases, stems of words, field-specific queries, and dates. The simplest query format is a word. Compound queries that consist of a combination of primitive queries also can be formed. In this case, an article is not matched unless it contains all of the words in the query. A phrase requires that the words in the query occur consecutively and in the order in which they are listed. A stem query requires that only the designated stem be matched. For example, if the stem query were "comput*", then computer, computers, computing, and computation would all be matches. Field-specific queries restrict a query to one of the fields mentioned previously in this section. Queries also can be submitted by entering a specific date.
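The five local query types can be approximated in a few lines. The sketch below is not the CommInS query language: the article format, the date representation, and the function names are assumptions, and matching is done on lower-cased alphanumeric tokens for simplicity.

```python
import re


def words(text):
    """Split a text into lower-case alphanumeric tokens, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_word(article, word):
    """Word query: the word occurs somewhere in the text."""
    return word.lower() in words(article["text"])


def match_phrase(article, phrase):
    """Phrase query: the words occur consecutively and in the given order."""
    target, tokens = words(phrase), words(article["text"])
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def match_stem(article, stem):
    """Stem query such as "comput*": any word beginning with the stem."""
    prefix = stem.rstrip("*").lower()
    return any(w.startswith(prefix) for w in words(article["text"]))


def match_field(article, field, value):
    """Field-specific query: the word occurs in the named field."""
    return value.lower() in words(article.get(field, ""))


def match_date(article, date):
    """Date query: the article carries exactly this date (format assumed)."""
    return article.get("date") == date


def match_all(article, *tests):
    """Compound query: the article must satisfy every primitive query."""
    return all(test(article) for test in tests)
```

With these definitions, the stem query "comput*" matches an article containing "computing" or "computes", while the word query "comput" matches neither.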

Requests to the remote system cannot include phrases and must contain specific date ranges for effective query routing [35]. The query language used in the Boston CommInS provides a great deal of flexibility in allowing the users to generate requests. This is in contrast to other systems that have controlled vocabularies which limit the users' inquiries to predefined terms only.


attributes of both within the system. Queries enable the user to obtain the desired information, because it is felt that this form of text searching is the more powerful and easier to use searching mechanism in large databases. However, once the information is stored in the personal database, the filter list becomes a menu for accessing the database. This allows the user to browse through the database and retrieve information in the manner that he/she has defined it.

[Figure: organization of the Boston CommInS. Labeled components: NYT and AP feeds, servers, disk storage, local network, modems, radio transmitter, radio receivers, and personal computers.]

Chapter Four

Analysis of Different Design Systems

This section compares the hybrid Boston Community Information System to general teletext and videotex systems using the specifications outlined below. Community information systems such as Prestel, Gateway, Viewtron, the Source, and Project Victoria will be discussed in detail. Minitel, Trintex, and Covidea also will be mentioned.

The criteria that will be used to analyze the different community information systems are [14]:

* the system should serve a metropolitan area cost-effectively
* the system should have a high quality user interface that can be mastered easily by novice users
* the users should be able to process information drawn from the system in any way they desire
* the system should access very large databases
* the privacy of the users should be safeguarded
* the services provided should be available to authorized users only
* the system should be easily expandable

4.1 The Boston Community Information System

The Boston Community Information System combines personal computers, broadcast data communication, and duplex communication to achieve a system that takes advantage of both teletext and videotex properties. The system allows users to gain access to both the New York Times and the Associated Press. The primary objective of the system is to allow users to customize information that is stored locally and to have access to large databases should the need arise. The use of simplex communication allows the system to support an arbitrary number of users cost-effectively, while the use of duplex communication ensures almost immediate access to the database. The simplex link carries information broadcast by a radio transmitter. The duplex link uses standard modem communication via telephone lines.

There are several advantages that result from the processing and storage capabilities of the personal computer. Because most of the processing takes place at the user's location, the central databases can support an unlimited number of users. A high quality user interface can exist since the processing power resides with the user. Information can be kept private, because only the user has access to his/her personal database. The user can control incoming information and can add value to it by filtering and processing it further. Finally, the system can be expanded easily, since the personal computer is programmable.

inquiry, a list of matching database entries with summaries is displayed. The filter for the local database is specified by a list of queries. Once the filter has been defined, menu-based retrieval can be used to browse through the database.

User access can be limited to those services that are subscribed to by using various encryption techniques [15]. To ensure that the services provided by the system are available only to authorized users, all data blocks are encrypted by using a master key and a randomly generated data block key. Each data block contains the number of the master key with which it was encrypted. This allows flexibility in service offerings in a commercial system with paying subscribers.
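One way a master key and a per-block key could work together is sketched below. This is an assumption about how the two keys combine, not a description of the actual CommInS scheme: here the random data block key encrypts the payload, the master key encrypts the data block key, and the block header carries the master key number. The cipher is a toy XOR keystream built from SHA-256 and is not secure; it only makes the sketch runnable with the Python standard library. All names are hypothetical.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256-derived keystream. Not secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_block(master_keys, key_number, payload):
    """Encrypt one data block under a fresh, randomly generated block key."""
    block_key = secrets.token_bytes(16)
    return {
        "master_key_number": key_number,  # tells receivers which key applies
        "wrapped_key": _keystream_xor(master_keys[key_number], block_key),
        "ciphertext": _keystream_xor(block_key, payload),
    }


def decrypt_block(subscriber_keys, block):
    """Return the payload, or None if the subscriber lacks the master key."""
    master = subscriber_keys.get(block["master_key_number"])
    if master is None:
        return None  # not subscribed to the service this block belongs to
    block_key = _keystream_xor(master, block["wrapped_key"])
    return _keystream_xor(block_key, block["ciphertext"])
```

In this arrangement, giving a subscriber a master key for each service purchased is enough to control access, since every personal computer receives the same broadcast blocks but can open only those whose master key it holds.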

4.2 General Teletext and Videotex Systems

Teletext systems can cost-effectively serve a major metropolitan area, because these systems basically incur only fixed costs and, therefore, are not population dependent. Another benefit of such systems is that no user information is recorded, since the system is simplex. This eliminates the necessity of safeguarding the privacy of the users.

Unfortunately, teletext systems also have several failings. The user interface, as well as the ability to access information, is limited. The user is unable to specify the way he/she wants information processed. Finally, the system is not easily expandable.


information offerings and are readily expandable. Videotex, however, is not economical. It is dependent on a centralized system, and videotex users cannot tailor the system to reflect their own interests. Problems are experienced with the user interface due to bandwidth and response time limitations. Also, the privacy of the system cannot be ensured, because the user's requests can be monitored and recorded by the duplex system.

4.3 Prestel

The British Post Office offered Prestel, the first commercial videotex service, to the public in 1979. The system transmits pages of textual and color graphics information. Prestel is a master/replicated database system. Advantages of such a database include better performance, due to fewer bottlenecks, and a decrease in communication costs, because the need for long distance calls is eliminated. Prestel accesses a centralized database via telephone lines, and uses an alphanumeric keypad to request data along with a modified television receiver to display information. Information is collected and continually updated by information providers. As soon as information is requested, it is sent over the telephone lines at a rate of 1200 bits per second.

The user accesses information by using sequences of menu selections or indices to numbered pages. Topics are cross-referenced to help the user find information along a number of different routes. The information search, along with the elimination of extraneous information, however, can lead to an increase in both cost and effort due to false starts, which are inherent in a static indexed structure.

The local Prestel center has a database of more than 250,000 pages of information and acts as a gateway to information located in different databases [45]. Airline reservations, electronic shopping, and electronic mailings are some of the things that can be done on the system. The system currently is being used most often for business applications [20]. A Prestel study, done in 1981, showed that eighty-five percent of the subscribers used the system for professional purposes [43].

4.4 Times Mirror's Gateway

Times Mirror Videotex Services launched Gateway, a two-way home information system, in 1982. The Gateway system allowed subscribers to shop, bank, book airline reservations, receive news updates, communicate with other users, and play educational games, all at home. The information was delivered over telephone lines and displayed on a modified television set. The decision to utilize telephone transmission versus cable was made because telephone lines were readily available and were two-way.

The Gateway database was designed with an inverted tree structure. The subscriber used a menu-driven system or a list of specified keywords to move through the tree which provided access to large databases. The requested information was then decoded and displayed, using text and color graphics, on the television screen. Although the user interface could be mastered easily by novice users, it was not very flexible. Using a prespecified list of keywords or menus lessened the quality of the user interface and limited the user's ability to process information in the manner which he/she desired.

One of the major flaws of Gateway was that it was not a very economical system. The use of color graphics proved to be a greater expense, in both cost and time, than the users perceived to be worthwhile. To receive all Gateway services, Times Mirror charged residential users a $50 terminal deposit, a $20 sign-up fee, a $20 per month user fee, and a $3 per hour fee for usage over the twenty hour allotment. Commercial users were charged $75 per month for five hours of usage and $15 for each additional hour. Professionals also had to buy, for $600, their own AT&T Sceptre terminals, which are dedicated videotex terminals that can handle graphics based on the North American Presentation Level Protocol Syntax (NAPLPS) standards [41].
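The fee schedule above can be turned into a small calculation to show what a subscriber would actually pay. The sketch assumes that the $20 residential fee covers the twenty hour monthly allotment, that the deposit and sign-up fee are paid once in the first month, and that hours are billed in proportion to use; the function names are hypothetical.

```python
def residential_monthly_fee(hours):
    """$20 per month covers twenty hours; $3 per hour beyond that."""
    return 20 + 3 * max(0, hours - 20)


def commercial_monthly_fee(hours):
    """$75 per month covers five hours; $15 per hour beyond that."""
    return 75 + 15 * max(0, hours - 5)


def residential_first_month(hours):
    """Adds the one-time $50 terminal deposit and $20 sign-up fee."""
    return 50 + 20 + residential_monthly_fee(hours)
```

Under these assumptions, a household using the system for 25 hours would pay $35 for the month, or $105 in its first month, while a commercial user with 8 hours of usage would pay $120, not counting the $600 terminal.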

Times Mirror initiated a major trial of Gateway in 1982. Three hundred and fifty high income Southern Californian households received the system for a testing period of nine months. Although most of the results of the trial remain proprietary, some information has been released to the public. Participants stated that they wanted a comprehensive home information system that could be used for more than just news and entertainment. In fact, seventy percent of the respondents in the trial said that they did not want community information systems to take the place of other information sources [33]. Only twenty-three percent of the trial users said that they would subscribe to the system if it were commercially available [17].

Despite this warning, Gateway became commercially available in November 1984. Fifty percent of the trial users discontinued use of the system as soon as they had to pay for the service [9]. Times Mirror estimated that more than half of the system's revenues would be derived from advertising. The actual amount of money generated by advertisements fell far short of this percentage. Due to financial problems and lack of user interest, Gateway folded in March 1986.

4.5 Knight Ridder's Viewtron

Knight Ridder developed the videotex color graphics system, Viewtron, and piloted it in Coral Gables, Florida, in 1980, with the cooperation of AT&T. The total user population was 604 individuals from 204 households. This group was chosen because it was felt that its members would be receptive to the home information and transaction services that Viewtron offered. Most of the user population had annual incomes exceeding $40,000 and over one-half had college degrees [36].

The publicized results of the test indicated that the news, the electronic bulletin board, and the information on local entertainment and events were ranked as the most valuable services offered by Viewtron. Participants indicated a thirty-three percent reduction in newspaper reading and a forty-five percent reduction in television viewing [36].


Viewtron users could access large databases in three ways. They could use the menu decision tree approach, summon a frame by typing its unique number, or use a prespecified set of keywords. The same drawbacks with the user interface and the ability to process information that were evident in Times Mirror's Gateway also were found in Viewtron.

The Viewtron backers established a strong policy on the privacy of its subscribers. Aggregate records that did not disclose an individual's identity could be used by the corporation for its own purposes and could be released to other parties. Individual records, that established the user's identity, could be used by the corporation to provide services in response to the user's request, maintain technical operations, and conduct research to compile bulk information. Individual information could not be disclosed to government agencies, except in response to compulsory processes or with the user's consent [45].

Viewtron was publicly launched in October 1983. It offered home banking, travel information, stock quotes, and large volumes of business financial news and data. Viewtron users had to purchase a $600 AT&T terminal with videotex capabilities and pay $12 per month for the service plus $1 per hour for telephone charges. Viewtron folded in March 1986 for the same reasons Gateway did. It was not a very