Data Dependence and Structural Dependence

Structural dependence: If the file-system programs are affected by a change in the file structure, they exhibit structural dependence. For example, when we add a date-of-birth field to the CUSTOMER file, the programs that access CUSTOMER will not work with the new CUSTOMER file structure. Therefore, all of the file-system programs must be modified to match the new file structure.

Data dependence: Even changes in file data characteristics, such as changing a field from integer to decimal, require changes in all programs that access the file. The practical significance of data dependence is the difference between the logical data format (how the user views the data) and the physical data format (how the computer stores the data).
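The CUSTOMER example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fixed-length record layouts, the field sizes, and the helper names `write_customers` and `read_customer_ids` are hypothetical and do not come from any particular file system.

```python
import struct

# Hypothetical fixed-length CUSTOMER record layouts (assumptions for illustration).
OLD_LAYOUT = struct.Struct("<i20s")    # customer id, name
NEW_LAYOUT = struct.Struct("<i20s8s")  # customer id, name, date of birth (YYYYMMDD)


def write_customers(rows, layout):
    """Pack each row into one fixed-length record and concatenate the records."""
    return b"".join(layout.pack(*row) for row in rows)


def read_customer_ids(data, layout):
    """Read the id of every complete record, assuming the given layout."""
    ids = []
    for offset in range(0, len(data) - layout.size + 1, layout.size):
        ids.append(layout.unpack_from(data, offset)[0])
    return ids


# The file is rewritten with the new structure (date of birth added).
new_file = write_customers(
    [(1, b"Ada", b"19900101"), (2, b"Grace", b"19851209")], NEW_LAYOUT
)

# A program updated for the new structure reads the ids correctly.
ids_new_program = read_customer_ids(new_file, NEW_LAYOUT)

# An unmodified program still assumes the old record length, so every record
# after the first is read from the wrong offset.
ids_old_program = read_customer_ids(new_file, OLD_LAYOUT)
```

The unmodified program does not fail loudly; it silently reads bytes of the date-of-birth field as the second customer's id, which is why every program that touches the file has to be changed.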

Structural independence: It is possible to make changes in the database structure without affecting the application program's ability to access the data.
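As a minimal sketch of structural independence, the snippet below uses Python's built-in `sqlite3` module as a stand-in DBMS. The `customer` table, its columns, and the `customer_names` function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")


def customer_names(connection):
    """Application code: refers to columns by name, not by storage position."""
    cursor = connection.execute("SELECT name FROM customer ORDER BY id")
    return [row[0] for row in cursor]


before = customer_names(conn)

# Structural change: add a date-of-birth column to the table.
conn.execute("ALTER TABLE customer ADD COLUMN date_of_birth TEXT")
conn.execute(
    "INSERT INTO customer (id, name, date_of_birth) VALUES (2, 'Grace', '1985-12-09')"
)

# The unchanged application function still works against the new structure.
after = customer_names(conn)
```

Because the application names the columns it needs and the DBMS resolves them through its catalogue, adding a column does not require any change to `customer_names`.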

Types of Database

What is a database and a DBMS? Database: a shared collection of logically related records designed to meet the information needs of an organization. The system catalogue (metadata) provides a description of the data to enable program-data independence. Logically related data comprises the entities, attributes, and relationships of an organization's information. Database Management System (DBMS): a software system that enables users to define, create, maintain, and control access to the database.

DBMS Component

Block diagram Database Management

Example DBMS

Types of DB according to number of users: 1. Single-user database: It supports only one user at a time. In other words, if user A is using the database, user B must wait until user A finishes. A single-user database that runs on a personal computer is called a desktop database or personal database. With a personal DBMS, each client workstation must load the entire database into main memory along with the client database application in order to view, insert, update, or print data. Recent personal databases use indexed files that enable the server to send only part of the DB, but in either case, these DBMSs put a heavy demand on client workstations and on the network.

Types of DB according to number of users: 2. Multi-user database: It allows multiple users to access the database simultaneously. There are two subtypes. Workgroup database: When the multi-user database supports a relatively small number of users (usually fewer than 50) or a specific department within an organization, it is called a workgroup database. Enterprise database: When the database is used by the entire organization and supports many users (more than 50, usually hundreds) across many departments, it is called an enterprise database.

Types of database according to location: centralized database; client/server database; distributed database (homogeneous distributed database, heterogeneous distributed database).

Client/Server Databases: In contrast, client/server databases, such as Oracle, split the work between a DBMS process running on the server and the applications running on the clients. The client application sends data requests across the network. When the server receives a request, the server DBMS process retrieves the data from the database, performs the requested functions on the data (sorting, filtering, etc.), and sends only the final query result (not the entire database) back via the network to the client.
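The difference in network load can be illustrated by counting how many rows would cross the network in each style. The sketch below is an assumption-laden toy: `sqlite3` is an embedded library, not a client/server DBMS, so it only stands in for the server-side engine, and the `orders` table and function names are hypothetical.

```python
import sqlite3


def make_server_db():
    """Create a small in-memory database that plays the role of the server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, total) VALUES (?, ?)",
        [("north", 10.0), ("south", 25.0), ("north", 40.0), ("east", 5.0)],
    )
    return conn


def personal_style(conn):
    """Ship every row to the client, which then filters and sorts locally."""
    shipped = conn.execute("SELECT id, region, total FROM orders").fetchall()
    result = sorted(
        (row for row in shipped if row[1] == "north"),
        key=lambda row: row[2],
        reverse=True,
    )
    return len(shipped), result


def client_server_style(conn):
    """The server filters and sorts; only the final result is shipped."""
    shipped = conn.execute(
        "SELECT id, region, total FROM orders WHERE region = ? ORDER BY total DESC",
        ("north",),
    ).fetchall()
    return len(shipped), shipped
```

Both functions produce the same answer, but the second one ships only the matching rows, which is the behavior described for the server DBMS process.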

Traditional Two-Tier Client-Server Client (tier 1) manages user interface and runs applications. Server (tier 2) holds database and DBMS. Advantages include: wider access to existing databases; increased performance; possible reduction in hardware costs; reduction in communication costs; increased consistency.

Three-Tier Client-Server: The client side presented two problems preventing true scalability: a fat client, requiring considerable resources on the client's computer to run effectively, and significant client-side administration overhead. By 1995, three layers had been proposed, each potentially running on a different platform.

Three-tier client-server architecture: Client: user interface. Application server: business and processing logic. Database server: data validation and database access. Advantages: reduces client, software distribution, and maintenance costs, and balances load.

Three-Tier Client-Server Advantages: Thin client, requiring less expensive hardware. Application maintenance centralized. Easier to modify or replace one tier without affecting others. Separating business logic from database functions makes it easier to implement load balancing. Maps quite naturally to Web environment.
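The separation of the three tiers can be sketched as three groups of functions inside a single Python process. In a real deployment each tier would typically run on its own machine; the `account` table, the withdrawal rule, and all function names here are hypothetical.

```python
import sqlite3


# Database tier: data storage and access only.
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)"
    )
    conn.execute("INSERT INTO account (id, balance) VALUES (1, 100.0)")
    return conn


def get_balance(conn, account_id):
    row = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


def set_balance(conn, account_id, balance):
    conn.execute(
        "UPDATE account SET balance = ? WHERE id = ?", (balance, account_id)
    )


# Application tier: business and processing logic, no SQL and no formatting.
def withdraw(conn, account_id, amount):
    balance = get_balance(conn, account_id)
    if amount <= 0 or amount > balance:
        raise ValueError("invalid withdrawal amount")
    new_balance = balance - amount
    set_balance(conn, account_id, new_balance)
    return new_balance


# Client tier: user interface only (a thin client).
def render_balance(balance):
    return f"Balance: {balance:.2f}"
```

Because the client only formats results, the business rule in `withdraw` can be changed in one place without redistributing client software.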

Transaction Processing Monitors: A program that controls data transfer between clients and servers in order to provide a consistent environment, particularly for Online Transaction Processing (OLTP).

TPM as middle tier of 3-tier client-server

Personal vs. Client/Server: Multi-user client/server databases generate less network traffic than personal databases. They also handle client failures better. In a personal database system, when a client workstation fails, the DB is likely to become damaged due to interrupted updates, insertions, or deletions. Records in use at the time of the failure remain locked by the failed client, which means they are unavailable to other users.

Personal vs. Client/Server: On the other hand, a client/server database is not affected when a client workstation fails. The failed client's in-progress transactions are lost, but the failure of a single client does not affect other users. In the case of a server failure, a central synchronized transaction log, which contains a record of all current database changes, enables in-progress transactions from all clients to be either fully completed or rolled back.
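The all-or-nothing behavior of an interrupted transaction can be sketched with `sqlite3`. This is a single-process simplification: the client failure is simulated by raising an exception, and the `account` table and the `transfer` and `balances` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany(
    "INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100.0), (2, 50.0)]
)
conn.commit()


def transfer(connection, source, target, amount, fail_midway=False):
    """Move money between accounts; undo everything if the work is interrupted."""
    try:
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
        )
        if fail_midway:
            raise RuntimeError("simulated client failure")
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
        )
        connection.commit()
    except RuntimeError:
        # The half-finished transaction is rolled back, so no partial update remains.
        connection.rollback()


def balances(connection):
    """Return a mapping of account id to balance."""
    return dict(connection.execute("SELECT id, balance FROM account ORDER BY id"))
```

After an interrupted call the balances are unchanged, and a later uninterrupted call completes in full, so other users never see a partially applied transfer.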

Summary of Client/Server Functions

Client: Manages the user interface. Accepts and checks the syntax of user input. Generates database requests and transmits them to the server. Passes the response back to the user.

Server: Accepts and processes database requests from clients. Checks authorization. Ensures integrity constraints are not violated. Performs query/update processing and transmits the response to the client. Maintains the system catalogue. Provides concurrent database access. Provides recovery control.

Centralized Database

Distributed DBMS Environment

Reasons for data distribution: improved reliability and availability through distributed transactions; improved performance; data sharing while maintaining some measure of local control; easier and more economical system expansion; the distributed nature of some database applications.

Advantages of DDBMSs: Data are located near the site of greatest demand; the data in a distributed environment are dispersed to match business requirements. Faster data access. Management of distributed data with different levels of transparency. Increased reliability and availability: reliability refers to system uptime, that is, the system is running efficiently most of the time; availability is the probability that the system is continuously available (usable or accessible) during a time interval. A DDBMS has multiple nodes (computers), and if one fails, the others are available to do the job.

Advantages of DDBMSs: Easier expansion (scalability): new nodes (computers) can be added at any time without changing the entire configuration. Faster data processing: data can be processed at different sites, thereby spreading out the system's workload. Improved communication: for example, a local accounts receivable operation can use sales department data directly, without having to depend on delayed reports from the central office.

Advantages of DDBMSs: Reduced operating cost (scalability). User-friendly interface. Processor independence: requests do not depend on a specific processor; any processor can handle the user's request.

Disadvantages of DDBMSs: Complexity. Cost. Security. Increased storage requirement: multiple copies of data have to be kept at different sites, so additional disk storage space is required. Lack of standards at the database level. Lack of experience. Increased training cost. More complex database design.

DDBMSs are mainly classified into two types: homogeneous distributed database management systems and heterogeneous distributed database management systems.

Homogeneous DDBMS: In a homogeneous distributed database, all sites have identical software, are aware of each other, and agree to cooperate in processing user requests. A homogeneous system is much easier to design and manage. The operating system used at each location must be the same or compatible. The database application (or DBMS) used at each location must be the same or compatible.

Homogeneous DDBMS: All sites have identical software. Sites are aware of each other and agree to cooperate in processing user requests. Each site surrenders part of its autonomy in terms of its right to change schemas or software. The system appears to the user as a single system.

Heterogeneous DDBMS: In a heterogeneous distributed database, different sites may use different schemas and software. Different nodes may have different hardware and software, and the data structures at the various nodes or locations may also be incompatible. Different computers, operating systems, database applications, or data models may be used at each of the locations.

[Figure: five sites connected by a communications network, running different operating systems (Windows, Linux, Unix) and different data models (object-oriented, relational, hierarchical, network).]

Heterogeneous DDBMS: Different sites may use different schemas and software. Difference in schema is a major problem for query processing. Difference in software is a major problem for transaction processing. Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.

Heterogeneous DDBMS: In a heterogeneous system, translations are required to allow communication between the different sites (or DBMSs). A heterogeneous system is often not technically or economically feasible. In such a system, a user at one location may be able to read but not update the data at another location.

Basic Distributed Processing Environment: In distributed processing, a database's logical processing is shared among two or more physically independent sites that are connected through a network. Example: the data selection and data validation might be performed on one computer, while a report based on that data might be created on another computer.

Basic Distributed Processing Environment: Each site can access and update the database. If the database is located on computer A, that networked computer is called the database server.

Difference between distributed processing and distributed database: A distributed database stores a logically related database over two or more physically independent sites. A distributed processing system uses only a single-site database but shares the processing chores (duties) among several sites.

Database fragments: A distributed database is composed of several parts called "database fragments". The database fragments are located at different sites and can also be replicated among various sites.

What is fully distributed data? The user does not need to know the name or location of each fragment in order to access the database. Users located at different sites can access the database as a single logical unit.
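The idea that users see fragments as one logical unit can be sketched with plain Python data structures. The site names, the sample rows, and the `find_customer` and `all_customers` helpers are hypothetical; a real DDBMS would consult a distributed catalogue and contact remote sites over the network.

```python
# Hypothetical horizontal fragments of one CUSTOMER relation, keyed by site name.
FRAGMENTS = {
    "site_a": [{"id": 1, "name": "Ada", "city": "London"}],
    "site_b": [{"id": 2, "name": "Grace", "city": "New York"}],
    "site_c": [{"id": 3, "name": "Edsger", "city": "Austin"}],
}


def find_customer(customer_id, fragments=FRAGMENTS):
    """Look up a customer without the caller naming a fragment or a site."""
    for rows in fragments.values():
        for row in rows:
            if row["id"] == customer_id:
                return row
    return None


def all_customers(fragments=FRAGMENTS):
    """Present the union of all fragments as a single logical table."""
    merged = (row for rows in fragments.values() for row in rows)
    return sorted(merged, key=lambda row: row["id"])
```

A caller writes `find_customer(2)` and never mentions `site_b`; moving that row to another site would not change the calling code.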

Chapter 2 Database Environment

Objectives of Three-Level Architecture: All users should be able to access the same data. A user's view is immune to changes made in other views. Users should not need to know physical database storage details.

Objectives of Three-Level Architecture: The DBA should be able to change database storage structures without affecting the users' views. The internal structure of the database should be unaffected by changes to physical aspects of storage. The DBA should be able to change the conceptual structure of the database without affecting all users.

ANSI/SPARC 3-Tier Architecture: A proposal for standard terminology and a general architecture for database systems was produced in 1971 by the DBTG (Data Base Task Group), appointed by the Conference on Data Systems Languages (CODASYL). The DBTG recognized the need for a 2-tier architecture with a system view (schema) and user views (subschemas). The ANSI (American National Standards Institute) SPARC (Standards Planning and Requirements Committee) produced similar terminology and architecture in 1975 (ANSI/X3/SPARC) and recognized the need for a 3-tier architecture.

Database architecture

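ANSI/SPARC 3-Tier Architecture

[Figure: Users 1 to n each work with their own view (View 1 to View n) at the external level. The views map to the conceptual schema at the conceptual/logical level, which maps to the physical schema at the internal level, which describes the stored database. Logical DI sits between the external and conceptual levels; physical DI sits between the conceptual and internal levels.]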

ANSI/SPARC 3 Levels of Abstraction

View or External Level: The highest level of abstraction, which describes only a part of the DB. It is the user's view of the DB: this level describes the part of the DB that is relevant to each user.

Logical or Conceptual Level: Describes what data are stored in the DB and what relationships exist among those data. Viewed by the DBA. Covers all entities, their attributes, and their relationships. Describes the entire DB in terms of relatively simple structures.

Physical or Internal Level: The lowest level of abstraction, which describes how data are actually stored. Describes complex low-level data structures in detail. Concerned with storage space allocation for data and with data compression and encryption techniques.

Mappings: The process of transforming requests and results between levels.

The Conceptual Model Global view of the entire database Representation of data as viewed by the entire organization Basis for identification and high-level description of main data objects, avoiding details

The Internal Model: The database as seen by the DBMS. Maps the conceptual model to the DBMS. An internal schema depicts a specific representation of an internal model. Logical independence: the internal model can be changed without affecting the conceptual model.

Example

The Physical Model: The lowest level of abstraction. Describes the way data are saved on storage media such as disks or tapes. Software and hardware dependent. Requires database designers to have detailed knowledge of the hardware and software used to implement the database design. Physical independence: the physical model can be changed without affecting the internal model.

Data Independence: Data and programs are independent of each other, so a change in one has no or minimal effect on the other. Data and its structure are stored in the database, whereas the application programs manipulating this data are stored separately, so a change in one does not unnecessarily affect the other. A major objective of the 3-tier architecture is to provide data independence (DI): upper levels are unaffected by changes at the lower levels. There are two kinds of DI: logical DI and physical DI.

Data Independence, Logical DI: Immunity of the external schemas to changes in the conceptual schema. Addition or removal of entities, attributes, or relationships should be possible without having to change the external schemas.

Data Independence, Logical DI (example): Suppose the relation Faculty(fid: string, fname: string, sal: real) is split into Faculty_public(fid: string, fname: string, office: integer) and Faculty_private(fid: string, sal: real). The view course_info can be redefined in terms of Faculty_public and Faculty_private so that users who query course_info get the same answer as before.
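The Faculty example can be tried out with `sqlite3`. The slide does not give the definition of `course_info`, so the `Teaches` table and the view's columns below are assumptions made purely to have something to query; only the three Faculty relations come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Faculty (fid TEXT PRIMARY KEY, fname TEXT, sal REAL);
CREATE TABLE Teaches (fid TEXT, cid TEXT);  -- hypothetical helper table
INSERT INTO Faculty VALUES ('f1', 'Ada', 9000.0), ('f2', 'Grace', 9500.0);
INSERT INTO Teaches VALUES ('f1', 'DB101'), ('f2', 'OS201');
CREATE VIEW course_info AS
    SELECT t.cid, f.fname FROM Teaches t JOIN Faculty f ON f.fid = t.fid;
""")


def query_course_info(connection):
    """User query: written against the view only, never against Faculty."""
    return connection.execute(
        "SELECT cid, fname FROM course_info ORDER BY cid"
    ).fetchall()


before = query_course_info(conn)

# Conceptual schema change: split Faculty and redefine the view on the new tables.
conn.executescript("""
CREATE TABLE Faculty_public (fid TEXT PRIMARY KEY, fname TEXT, office INTEGER);
CREATE TABLE Faculty_private (fid TEXT PRIMARY KEY, sal REAL);
INSERT INTO Faculty_public (fid, fname) SELECT fid, fname FROM Faculty;
INSERT INTO Faculty_private (fid, sal) SELECT fid, sal FROM Faculty;
DROP VIEW course_info;
DROP TABLE Faculty;
CREATE VIEW course_info AS
    SELECT t.cid, p.fname FROM Teaches t JOIN Faculty_public p ON p.fid = t.fid;
""")

after = query_course_info(conn)
```

The user query in `query_course_info` is untouched by the split; only the view definition, which belongs to the mapping between the external and conceptual levels, had to change.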

Data Independence Physical DI Immunity of the conceptual schema to changes in the internal schema Using different file organizations or storage structures, using different storage devices, modifying indexes, or changing hashing algorithms should be possible without having to change the upper schemas Deterioration in performance is the most common reason for internal schema changes
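A small `sqlite3` sketch of physical data independence: an index is added (an internal schema change) while the table definition and the query stay the same. The sample rows, the index name, and the `run_query` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Faculty (fid TEXT, fname TEXT, sal REAL)")
conn.executemany(
    "INSERT INTO Faculty VALUES (?, ?, ?)",
    [("f1", "Ada", 9000.0), ("f2", "Grace", 9500.0), ("f3", "Edsger", 8700.0)],
)

QUERY = "SELECT fid FROM Faculty WHERE sal > 8800 ORDER BY fid"


def run_query(connection):
    """Application query: expressed only in terms of the conceptual schema."""
    return connection.execute(QUERY).fetchall()


result_before = run_query(conn)

# Internal schema change: add an index on sal. The table definition and the
# query text are untouched; only the access path available to the DBMS changes.
conn.execute("CREATE INDEX idx_faculty_sal ON Faculty (sal)")

result_after = run_query(conn)
```

The query returns the same rows before and after the index is created; whether the DBMS actually uses the index is its own decision and is invisible to the application.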

Objective of the three-level architecture: The objective of the three-level architecture is to separate each user's view of the database from the way the database is physically represented. There are several reasons why this separation is desirable:

Each user should be able to access the same data, but have a different customized view of the data.

Each user should be able to change the way he or she views the data, and this change should not affect other users.

Users should not have to deal directly with physical database storage details, such as indexing or hashing. In other words, a user's interaction with the database should be independent of storage considerations.

The Database Administrator (DBA) should be able to change the database storage structures without affecting the users' views.

The internal structure of the database should be unaffected by changes to the physical aspects of storage, such as the changeover to a new storage device.

The DBA should be able to change the conceptual structure of the database without affecting all users.