Skip to Content

Department of Linguistics

Databases: A Brief Introduction

Robert Mannell

Types of Databases

Databases can be roughly divided into two types, SIMPLE and COMPLEX. COMPLEX databases can be further divided into HIERARCHICAL, NETWORK and RELATIONAL databases.

Before examining these types of databases it is necessary to define three key terms.

  1. FILE: A file on a computer is equivalent to a single file in a filing cabinet. A single computer file has a unique name and its existence is listed on the computer's directory (list of all files on the computer). All information on a computer disk is stored in files. Database files usually consist of many RECORDS.
  2. RECORD: A record in a computer file is similar to a line or a paragraph (or perhaps a whole sheet of a document) in a typical paper file found in a filing cabinet. A record is typically a single case or instance of the type of information that the database file deals with. A record may contain (for example) one person's details whilst the whole file contains the details of many people (each in a separate record). Database records usually consist of more than one FIELD.
  3. FIELD: A field contains a single item of information (such as a person's name). Usually more than one item of information is required in a database and each of these items of information are stored in different fields. A simple database would be a mailing list which might have as few as two fields, a name field and an address field. Each record would consists of these two fields (one record for each person).

A. Simple Databases

Simple databases typically consist of one file with many records consisting of one or more fields. Alternatively, a simple database might consist of a single DATA FILE as well as one or more INDEX FILEs.

DATA files can be thought of as having two types of ORDERING or SEQUENCING. The PHYSICAL ORDER of a database is the actual order in which the data is stored in the file. The LOGICAL ORDER is the order in which we might choose to access that data. The logical order need not be the same as the physical order. If the two types of order are not the same, however, it is necessary to have some mechanism for access the data in a physically non-sequential order. This is typically achieved by the use of an INDEX file. This is very similar to a book index where topics are listed alphabetically (the logical order) and the reader is referred to the page number on which each topic is found (physical order). Some books have more than one index (eg. author index and subject index) which refer back to the single physical sequence (page numbering). Multiple indexes are also possible even on simple computer databases. For example, in a mailing list database you may have an index to names and an index to suburbs or towns.

An INDEX is often also useful as a quick way to find things in a file even when logical and physical order are the same. This is because a complete record may contain many fields whilst an index record may only contain the indexed field and an access number (eg. record number or some other indication of position in file). Index files are, because of this, generally much smaller than the actual data file and so much quicker to scan sequentially.

Simple databases might include booklists, bibliographies, mailing lists, etc...

B. Complex Databases

Many large complex databases exist on more than one data file. That is, the data in a single database may be spread across several (or many) files.

A typical database of this sort might be as follows:-

University Student Database

FILE # Contents
1 Student file
2 Fees file
3 Course file(s)
4 Library loan files

The files in a complex database must be linked together in some way. Such links between the files must be invisible to the user. To the user a complex database must appear to be a single integrated entity. Further, it must be possible to produce reports from many files using simple instructions.

There are three common ways of linking complex databases (nb. some database systems utilise complex hybrids of the following database models.)

a) Hierarchical Model

This type of database utilises a pyramid-like structure with a number of levels. Files within the database can be considered to be NODES located at a particular level in the database and connected to one node in the level above and one or more nodes in the level below. A node on a level closer to the apex is the "parent" of a node connected to it on the next lower level. The lower level node is the "child" of the higher level node that it is connected to. Each child has only one parent, each parent can have one or more children.

In a hierarchical model, data stored at a node can only be accessed at that level or at the level of its parent. Data cannot be exchanged between nodes at the same level without going through the lowest common parent (or grandparent). Such a structure is common in large organisations where data within a branch is accessible to members of that branch and to head office, but not to another branch (without a request for information via head office). Security and central control are strong motivations for this type of database structure.

b) Network model

Like the hierarchical model, the network model consists of multiple levels and parent and child nodes. Unlike the hierarchical model the network model is not pyramidal and not only can parent nodes have more than one child but child nodes can have more than one parent. Information can flow in both directions much more flexibly and the various levels can interact much more freely in the network model. Nodes on various levels are much more equal than in a hierarchical model. Ready interchange of information and the facilitation of consultative processes are the main motivations behind this type of model.

c) Relational model

The relational model differs from the above two models in that its main emphasis is not on the control of relationships between levels of the database. The relational model is more a method of linking various types of information into a single database. In a relational database each record of each file has at least one link field which is used to link the data in different files. The student number is a common link field in a University database. On one file a student's name, address, etc. will be listed along with the student number. In another file student numbers will appear against course numbers. In another file course number will appear along with details about the course. In yet another file, details of all unpaid library fines will indicate student number versus the amount owed. In the database the numbers (student number, course number, etc) will be defined as link fields and database is designed to allow easy access to relevant information in the various files. For example, if you are interested in a list of all students in a course you simply request such a list and the database will get the student numbers from the course list, and then use those numbers to get the student names from another file. These linkages will be transparent to the user who will only see a course list containing student names.

The Relational Database Model is based on the following operations of relational (or set) algebra. These operations are:- SELECTION, PROJECTION, JOIN, UNION, DIFFERENCE, and INTERSECTION. The SELECTION (or SELECT) operation is the most commonly used operation and is the basis of database searches.

Database Searching, Boolean Logic and Query Languages

The simplest type of database search is a simple search for a single text token (which can be a word, a phrase or some othe extent of text). Boolean logic can be used to greatly extend the scope of database text searches. The most basic boolean operators are listed below.

Expression Expression is true if:- Effect on Scope
A AND B A and B are both present decreases scope
A OR B Either A or B, or both A and B, are present increases scope
A XOR B Either A or B, but not both, is present
A NOR B Neither A nor B are present
A NAND B One of A or B is present or neither are present
NOT A A is not present

Sometimes proximity is also taken into account during a search. For example some database search algorithms include a "NEAR" operator. "A NEAR B" in such systems means "both A and B occur and A is near B". Ideally a user should be able to define "NEAR" for the purpose of a particular search.

In the above table "scope" refers to the potential for matches to the search criteria. In the latter three cases the number of matches can either increase or decrease relative to the number of matches obtained for a search on either A or B alone.

Web search engines use a limited sub-set of the above operators (most often just AND and OR. Standard database query languages, such as Structured Query Language (SQL), are used by commercial (and some open source) Database Management Systems (DBMS). These include and greatly extend the functionality of the boolean operators. Such languages permit complex end-user queries, however a great deal of automatic database functionality is still provided by special-purpose programs.

Various scripting languages, such as Perl, Tcl/Tk, or Python also facilitate certain types of database manipulation, especially to text databases and documents. The specialised requirements of speech and language databases have often resulted in the development of special-purpose query languages.