Granularity of Data Protection for MLS Applications and DBMSs *
Arnon Rosenthal a and William Herndon b
a The MITRE Corporation, 202 Burlington Road, Bedford MA 01730, USA arnie@mitre.org
b The MITRE Corporation, 7525 Colshire Drive, McLean VA wherndon@mitre.org

A secure Database Management System (DBMS) will be widely adopted only if it provides a convenient base for application development. Given this assumption, we examine two questions: "Should an application's view of the database consist of objects whose attributes are at more than one security level?" and "Should a DBMS directly support such multilevel objects?" We investigate the impact on MLS application development of alternative degrees of DBMS support. Performance estimates and a comparison methodology are also presented. We conclude that applications should be built using object classes that capture natural real-world entities and whose instances may include elements at different security levels. We then show that direct DBMS support for such classes can be quite helpful. As a byproduct, our analysis describes how untrusted code can decompose operations on multilevel objects into operations on single-level objects.

Keyword Codes: K.6.5; H.2.1
Keywords: Information Systems, Security and Protection; Database Management, Logical Design, schema and subschema.

1. INTRODUCTION

Secure Database Management System (DBMS) research must provide a basis for building DBMS environments that are both secure and acceptable to users. Organizations buy DBMS environments (a multi-billion dollar market consisting of DBMSs and related tools for application development, user interface design, and data administration) in order to develop applications more easily.
To be acceptable to major development programs, we believe that multilevel secure DBMSs (S-DBMSs) and their associated methodologies and tools should allow applications to be built with effort only moderately exceeding the effort with popular commercial DBMS environments. This paper examines the tradeoffs inherent in designing a practical multilevel secure (MLS) DBMS with object management capabilities.

* This work was funded under contract DAAB07-93-C-N651 from SPAWAR OOI.

In the near future, builders of applications with complex structures, such as design or multimedia systems, will routinely use
object DBMSs. Therefore, secure object DBMSs will be needed to support such applications operating in an MLS environment. Assuming that security and assurance requirements have been met, we believe that these systems should be judged principally on how good an environment they provide for application development and evolution, with application performance also a significant factor. We therefore analyze the convenience of developing single-level applications over MLS data, and the performance of hypothetical S-DBMSs under various assumptions about the granularity of protection and DBMS architecture. The analysis in this paper leads to two main conclusions:

- Applications should be written over an interface consisting of multilevel conceptual objects [1] that match real-world or mental phenomena. A conceptual object may include attributes at different security levels, and its class definition is independent of security level assignments.
- It is desirable and feasible to build a secure DBMS that directly supports multilevel conceptual objects, supplying their storage and access operations. If scoped appropriately (section 3.1.1), the support need not be excessively complex or costly. Support for conceptual objects as derived data (i.e., as views) is shown to be somewhat less advantageous.

We reason from the application developer's point of view, as well as the DBMS builder's. To support our first conclusion, we argue that application code is simpler and more robust if built over multilevel conceptual objects, as compared with applications that explicitly manipulate a set of single-level objects that represents the conceptual object. In support of the second conclusion, we examine the costs of having the application builder, rather than the DBMS, produce the mapping from conceptual objects to base objects that are directly supported by the DBMS.
For the following two kinds of DBMS support, we compare the code complexity and application performance:

- slo DBMS: Each DBMS object consists of attribute values at a single security level. We refer to this model as single-level objects (denoted slo).
- mlo DBMS: Each DBMS object consists of attribute values that may be at different security levels. We refer to this model as multilevel objects (denoted mlo).

Our hypothetical S-DBMS must permit only legal information flows and is, therefore, restricted according to the Bell-LaPadula access control policy [Bell75]. Although we do not assume any particular level of assurance for our S-DBMS, it appears that mlo support need not cause a significant increase in the size of the trusted computing base, especially if applications are untrusted.

1.1 Application development perspective

1 We use the term object in the sense of object-orientation, as a unit of data and function. A unit of security protection will be called a granule. A multilevel conceptual object may also be called a natural or just conceptual object.
The major motivation for using object databases (as compared to using relational databases or files) is that real-world phenomena are more directly represented as objects (with identity and methods); as a consequence, application development and enhancement becomes easier. For example, a planning system might be written in terms of object classes such as Ship, WeaponSystem, and Mission. The definition of these object classes is driven by application needs, not by computer-oriented representation issues. [2] To be acceptable to users of object databases, an S-DBMS must preserve these advantages.

When database designers produce an object-oriented schema without regard to security, the schema consists of class definitions that describe the natural objects of the application. We call this a natural or conceptual schema; instances of these classes are called conceptual objects. Development of such a model is the key step for most application development methodologies and is supported by object-oriented analysis and database design tools [Booc91]. As the system and supporting DBMS evolve, these class definitions are likely to be fairly robust, more robust than either detailed application requirements or the current classifications of data elements. Therefore, we consider it desirable to have an application operate on conceptual objects, even in situations where the MLS DBMS cannot store the objects directly.

An application is called naturally mlo if its conceptual objects contain attributes that should be classified at different security levels. We contend that many applications are naturally mlo. That is, a natural real-world object is very likely to have information at differing security levels. [3] For example, the ship named Glomar Explorer may have a confidential name and destination, top secret mission, and unclassified weaponry, whereas the Enterprise may have an unclassified name, secret destination and mission, and top secret weaponry.
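To make the idea concrete, here is a minimal sketch of our own (not from the paper; the class, level names, and four-level total order are illustrative stand-ins for a general security lattice) of a conceptual Ship class whose elements carry individual security levels, with retrieval filtered by a dominance check:

```python
# Illustrative sketch only: a four-level total order stands in for a
# general security lattice, and all names below are invented.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def dominates(subject, element):
    return LEVELS[subject] >= LEVELS[element]

class Element:
    def __init__(self, value, level):
        self.value, self.level = value, level

class Ship:
    """One conceptual class; its instances mix elements at several levels."""
    def __init__(self, **elements):
        self.elements = elements          # attribute name -> Element

    def retrieve(self, attr, subject_level):
        e = self.elements.get(attr)
        if e is not None and dominates(subject_level, e.level):
            return e.value
        return None                       # unauthorized reads appear as null

glomar = Ship(name=Element("Glomar Explorer", "C"),
              mission=Element("recovery", "TS"),
              weaponry=Element("none", "U"))
print(glomar.retrieve("weaponry", "U"))   # unclassified element: visible
print(glomar.retrieve("mission", "C"))    # C does not dominate TS -> None
```

Note that the class definition itself never mentions security levels; reclassifying an element changes only instance data, which is the robustness property argued for above.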
Large systems are extremely difficult to understand and build, even when using objects that map to natural real-world phenomena. Application designs that manage multiple single-level DBMS storable objects instead of one conceptual object add substantial complexity. For example, an application written in terms of objects determined by security boundaries may require recoding if an attribute is reclassified. In contrast, an application written in terms of conceptual objects will fail only if some of its operations become illegal, or if it becomes confused by newly visible data.

2 Relational design methodologies have more difficulty providing such natural matches because many real-world objects are a poor match to a single relational tuple.
3 Requirements surveys may understate applications' need for multilevel objects. Software requirement studies often exclude capabilities that appear infeasible with current technology, especially if operational practices currently work around the capability's absence.
It is possible to provide a view-based facility for deriving multilevel conceptual objects from single-level DBMS objects, but an mlo S-DBMS that directly supports conceptual objects has several advantages:

- Application development effort decreases because developers do not need to define the view mapping (i.e., code that decomposes conceptual objects into single-level DBMS objects becomes unnecessary).
- Application performance is usually at least as good over an mlo DBMS as when the same application tasks are accomplished through numerous requests that manipulate multiple single-level objects. With a trusted mlo DBMS, performance can improve significantly.
- Providing mlo capabilities need not significantly decrease the level of assurance, especially if applications are untrusted.

1.2 Terminology and overview

We assume that the reader is basically familiar with object models. In such a model, the application information is organized into objects that are instances of object classes. Each object class has a set of attributes for its instances; the value of an attribute for an object instance is called an element. (Roughly the same terminology and analysis can be applied to relational systems, with object class and object replaced by relation and tuple.)

Section 2 describes strategies for writing naturally mlo applications in terms of their conceptual objects, even if the underlying DBMS supports only single-level objects. Section 3 describes an approach to analyzing strategies that support conceptual objects as views over DBMS objects. Section 4 contains a detailed example and analysis of one way to implement a multilevel conceptual view over an slo DBMS. Section 5 summarizes our conclusions.

2. PROVISION OF MULTILEVEL CONCEPTUAL OBJECTS

This section discusses why and how application code should be shielded from the security-motivated decomposition of multilevel conceptual objects.
Section 2.1 sets the stage by discussing DBMS support for views, and by showing several ways that an slo DBMS may represent a multilevel conceptual object. Section 2.2 discusses the software engineering consequences of writing applications directly against the single-level pieces of the decomposed conceptual object. We conclude that most applications should be coded in terms of conceptual objects, even if the DBMS only supports single-level objects. Section 2.3 summarizes the application builder's decomposition-related tasks, if the DBMS supports multilevel views over stored single-level objects. We conclude that direct support for multilevel objects will be substantially more convenient. Later sections provide some insight into ways that one might implement multilevel views or base objects.
2.1 Preliminaries: Multilevel views and decompositions

Before discussing applications, it is useful to clarify our distinction between slo and mlo capabilities and to discuss the complexities that an application written over an slo DBMS must contend with. Section 2.1.1 distinguishes between full support for multilevel objects (storage, update, triggers, administration, etc.) and support for multilevel views, which are useful but do not completely provide the kind of shielding advocated in section 2.2. Section 2.1.2 describes three ways that a multilevel conceptual object can be partitioned into a set of single-level objects. The detailed analysis in later sections examines how operations on such an object may be similarly partitioned. These decomposition options give a sense of the difficulties to be overcome. Polyinstantiation issues are deferred to section 3.1.3.

2.1.1 Multilevel views and multilevel base objects

The boundary between slo and mlo DBMSs can appear blurred because DBMSs based on single-level objects may still allow users to define views that combine data from multiple levels. This section draws a distinction between a multilevel view capability and an mlo DBMS. DBMSs typically support both base objects and views (derived objects). A base object type is one where the user describes the attribute list and the DBMS provides storage, access, and ancillary operations. With views, the definer must supply (at a minimum) a set of base object classes and an expression that derives the view from those classes. In an object DBMS, a view is simply a new class whose data and operations are defined by methods that reference previously defined classes. Currently, the view definer must provide retrieval and customized update methods for the view's attributes. Ancillary operations (e.g., storage clustering, error messages, triggers, or constraints) can be provided, but must be implemented by the view definer, currently with little help from the DBMS.
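The definer's obligation can be sketched roughly as follows (our own illustration; all class and attribute names are invented, and the single-level base classes are assumed to exist already):

```python
# Hypothetical single-level base classes the view definer already has:
class NameInfo:            # e.g., classified Confidential as a unit
    def __init__(self, ship_id, name):
        self.ship_id, self.name = ship_id, name

class MissionInfo:         # e.g., classified Top Secret as a unit
    def __init__(self, ship_id, mission):
        self.ship_id, self.mission = ship_id, mission

class ShipView:
    """A multilevel 'view' class: the definer writes retrieval methods
    that reach into the base classes; ancillary operations (triggers,
    constraints, clustering) would also be the definer's burden."""
    def __init__(self, name_part, mission_part):
        self._name, self._mission = name_part, mission_part

    def name(self):
        return None if self._name is None else self._name.name

    def mission(self):
        # None models both "not stored" and "not visible at this level"
        return None if self._mission is None else self._mission.mission

v = ShipView(NameInfo(1, "Enterprise"), None)  # mission piece not visible
print(v.name(), v.mission())
```

The point is not the few lines of glue but that every such method, for every view class and every ancillary operation, is hand-written code the DBMS neither generates nor checks.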
Relational DBMSs behave slightly differently. (This is one of the few places where the difference is significant to our argument.) Relational views provide full query capability, but current products offer limited updates and no support for ancillary DBMS operations. Furthermore, the definer of a view relation cannot attach methods that extend the DBMS-provided capabilities.

We can now define an mlo DBMS to be a DBMS that treats multilevel objects as base objects, supported by all of the DBMS's capabilities. In particular, given a list of attributes, the DBMS directly provides storage (i.e., users need not define base objects and mappings to those base objects). This definition imposes no requirements on how objects are physically stored as records; that mapping is written by DBMS implementers and transparent to applications, except for its impact on performance and assurance. In contrast, a DBMS is defined to be slo if users need to map an object's attribute list onto attributes of previously defined objects. To keep the distinction crisp, we require that an slo DBMS provide only the usual DBMS capabilities, without special treatment for sets that decompose a conceptual object. Sets, indexing, and clustering are included, but code or
templates that understand the decomposition of multilevel conceptual objects would be considered a partial implementation of mlo capabilities.

2.1.2 Decomposing a multilevel object into single-level objects

When only an slo DBMS is available, elements of a multilevel conceptual object must be partitioned into single-level chunks that we call partition objects or (to emphasize that their storage is handled directly by the DBMS) DBMS objects. These decompositions can be accomplished in several places: internally to an mlo DBMS, or by using a view that defines a conceptual object, or directly by the application program (as discussed in section 2.2). Several partitioning approaches will now be sketched (omitting minor details):

- Element-wise Partitioning: A conceptual object is split so that each element appears in its own DBMS object, which holds the element value and is classified at the element's level. The original conceptual object is replaced by a set of reference objects (classified at a single level) that point at the element objects. If polyinstantiated, an element is represented as a set of values.
- Attribute-Set Partitioning: A conceptual object is split so that commonly classified attributes are defined in their own class. For example, suppose that the first two attributes of a conceptual object (e.g., MinSpeed, MaxSpeed) always have the same classification, as do the next three attributes (e.g., ShipName, ShipDesignator, HomePort). The conceptual object's class is then split into two DBMS base classes, each of which defines the attributes having a common classification. This approach combines several of the classes created by element-wise partitioning. It generalizes the Columnar Partitioning approach of [Burn92], in which attributes of a base class have a common classification which is also fixed for all instances of the class.
- Levelwise Partitioning: Each conceptual object is represented by multiple instances of its class, each containing all the elements at a particular level. As shown in Figure 1, a conceptual TRIP object that includes elements at k different security levels is decomposed into k DBMS objects, with pointers to tie them to the trip. (For simplicity, the figure omits labels on pointer-list entries.) The conceptual object is derivable by overlaying information from its single-level objects. This decomposition uses the minimum number of slo DBMS objects and labels to implement each conceptual object.
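The levelwise scheme is mechanical enough to sketch. Assuming the simple total order of levels used in our examples (the helper names below are invented for illustration), the following splits an object's elements by level and reconstructs the visible portion by overlaying the pieces a subject dominates:

```python
LEVEL_ORDER = ["U", "C", "S", "TS"]      # illustrative total order

def levelwise_partition(elements):
    """elements: {attr: (value, level)} -> {level: {attr: value}},
    i.e., one single-level DBMS object per level present in the object."""
    parts = {}
    for attr, (value, level) in elements.items():
        parts.setdefault(level, {})[attr] = value
    return parts

def overlay(parts, subject_level):
    """Rebuild the conceptual object from the pieces the subject dominates."""
    cutoff = LEVEL_ORDER.index(subject_level)
    visible = {}
    for level in LEVEL_ORDER[:cutoff + 1]:
        visible.update(parts.get(level, {}))
    return visible

trip = {"Passenger": ("Kissinger", "U"),
        "From": ("DC", "C"),
        "To": ("Beijing", "TS")}
parts = levelwise_partition(trip)        # 3 levels -> 3 DBMS objects
print(overlay(parts, "C"))               # only the U and C elements visible
```

A conceptual object spanning k levels yields exactly k single-level pieces, matching the minimality claim above.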
[Figure 1. A Relation and Its Levelwise Decomposition]

2.2 Building applications directly over an slo DBMS's objects

This section further discusses whether an application should be coded over multilevel conceptual objects or over partition objects. The alternatives are shown in Figure 2. Over single-level objects the access operations span a greater conceptual distance, and several slo operations may sometimes be replaced by a single access to a multilevel object. However, as discussed later, it may not be possible to hide the single-level objects completely.

[Figure 2. Applications' Data Accesses, over Single-level and Multilevel Objects]

Programmers naturally manipulate conceptual objects, not partition objects. If an application is implemented directly in terms of the partition objects, we will see that the data manipulation code has more to do, and may be complex and confusing. On top of the real
difficulties of multilevel operation (e.g., refusal of access), one must cope with representational difficulties due to class definitions that do not resemble classes of the conceptual objects. The number of classes explodes by a factor of (# conceptual attributes) for element-wise partitioning, by a factor of (# groups of commonly labeled attributes) for attribute partitioning, or by a factor of two for levelwise partitioning. In addition, applications written in terms of partition objects seem harder to reuse or evolve. To move an existing application to an MLS environment, one must rewrite it to manipulate the (artificial) partition objects. Also, one cannot change decomposition strategy (to obtain different performance tradeoffs) without rewriting the application. The problems are illustrated below for attribute-set and element-wise partitioning. Levelwise partitioning is treated in the detailed examples of section 4, where similar observations can be made.

Due to partitioning, even simple attribute retrievals require two steps instead of one. For instance, if an attribute mainweapon is classified separately from other attributes of an object o, a direct reference in the C++ style (i.e., o.mainweapon) cannot be used. Instead, mainweapon must be defined in its own class, and then an indirect reference o.ptrtomainweaponelement->mainweapon must be used. There may also be significant performance degradation, especially compared with a trusted subject mlo DBMS that stores each conceptual object in a single contiguous multilevel record. As another example, consider the schema of Figure 1. The simple attribute reference becomes the query Trip_dbms_objects.Passenger where trip=@t1 and Passenger ≠ null.

Attribute-set partitioning (as proposed in [MiLu92, JaKo90]) has an additional problem: it can make evolution very costly.
A change to the classification requirements (e.g., reclassifying an element differently than others within an object instance) requires splitting the instance, and hence splitting the object's class. Many applications will need to be recompiled. Worse yet, it is necessary to recode applications whose expected information now is part of a new class. One can define a view that preserves the old class's attribute list, but the ancillary DBMS capabilities may not be available to applications.

Thus, we see that artificial single-level classes do not provide an acceptable application development environment. The effort needed to write new applications increases, as does the effort needed to modify applications to reflect new functional or security requirements and to move existing applications to MLS operation. Use of single-level classes may also harm performance since applications need to access several objects instead of one. In view of these difficulties, we believe that applications should be designed and implemented in terms of their multilevel conceptual objects. That is, the application builder should define conceptual classes, and write the application over those classes.

2.3 The application builder's tasks over mlo base or view objects

An mlo DBMS can directly support conceptual object classes, reading a class's attribute definitions and providing storage and access operations.
In contrast, over an slo DBMS the application builder must define the single-level DBMS objects for the partition, and then define views that map conceptual objects and operations to these DBMS objects. Next, users must implement any operations that the DBMS does not automatically provide, and supply parameters to disambiguate operations (e.g., update) that the DBMS may support. Users' further work (beyond defining the views) is shown in Figure 3. The left side applies to an object DBMS, and the right side to a relational DBMS. Heavy arrows represent operations whose implementation would be easy to generate automatically.

For an object DBMS, the view definer must provide code for the basic access operations, and also for whatever ancillary operations are needed. Automated assistance would clearly be desirable (e.g., the DBMS could reduce the burden by providing code templates). We consider such aids to be a partial implementation of an mlo DBMS. For views over a relational DBMS, users can code the additional operations, but the DBMS will not package the operations with the view definition, nor will it check the appropriateness of arguments.

[Figure 3. Implementation of operations against views, in MLS object and relational databases]

3. APPROACH TO THE ANALYSIS

This section gives an overview of our analysis of building an mlo DBMS versus having application-provided views. Two criteria are considered: development effort and application performance.
We describe principles for comparison, present a layered model of possible implementation styles, and make some general observations about performance. Ideally, one would perform detailed analyses and prototype all the implementations, but this was infeasible.
Instead, Section 4 analyzes implementations of each data access operation for each layered implementation style. Where decomposition is needed, levelwise implementation is used. A back-of-the-envelope calculation for element-wise decomposition yielded qualitatively similar conclusions about slo versus mlo interfaces. [4] While not definitive, the exercise improved our intuition and raised our confidence that mlo DBMSs and user-defined views can be implemented with reasonable effort, performance, and assurance.

Section 3.1 discusses our means for ensuring that comparisons of mlo versus slo systems pose similar hurdles to both systems. Section 3.2 summarizes the operations that the analysis will consider. Section 3.3 provides a layered model of alternative ways to partition conceptual objects into slo DBMS objects, and to map DBMS objects to stored records. Where decomposition is needed, levelwise partitioning is assumed. Finally, subsection 3.4 examines each layer's components of the run-time cost, deriving some general observations that apply to all access operations and all decompositions. Detailed implementations are presented and analyzed in section 4.

3.1 Analysis of mlo operations versus slo operations

3.1.1 Leveling the playing field

Our first inclination for comparing slo versus mlo DBMSs was to compare functionality, performance, and implementation effort of each operation they support. Unfortunately, such a comparison provides little guidance, revealing only that when a DBMS operation is given extra power its design complexity and run-time costs increase. It is more helpful to analyze application operations because, as compared with slo, fewer mlo DBMS operations can often accomplish the same application task. Therefore, we examine the programming effort and running time of all the DBMS calls needed for each data manipulation operation on a conceptual object.
A conceptual access can usually be handled by a single mlo DBMS operation (though security restrictions or polyinstantiation may occasionally force applications to split the task into several operations, for both slo and mlo DBMSs). For conceptual views defined over an slo DBMS, we qualitatively examine the size and running time of the code that implements each data management operation.

To prevent write-down, a secure DBMS must limit the power of updates. For example, a single unprivileged operation cannot Delete a conceptual object that contains attributes at different levels. And a Secret user cannot delete a single-level Confidential object. Hence a request's interpretation depends on the subject's level, the elements' security levels, and the desired semantics for polyinstantiation. Even with single-level objects, the application must help to specify the operation semantics, or else select among DBMS-provided treatments.

4 [MuJa93] thoroughly compares performance of two decompositions, Novel (which approximates our levelwise) and Seaview (which approximates element-wise). The operations allowed in Novel are more restrictive than in our semantics or Seaview. The relative performance depended heavily on the workload.
Motivated by these examples, we state a minimal requirement for an mlo DBMS's update operations. The criterion is attainable, and proved useful when it identified a difficulty in our proposed operation definitions. Another desirable feature is that the criterion can be stated without committing to a particular polyinstantiation philosophy.

Requirement for mlo Updates: If it takes several slo steps to accomplish a task, an mlo DBMS should accomplish the same task within the same number of steps.

We thus envision an mlo DBMS that provides storage of multilevel objects, creation and deletion of objects whose nonnull attribute values are all at the subject's level, updates to elements that dominate a subject's level or are uninitialized (null), and queries to sets of multilevel objects. While such a DBMS has semantic limitations, it does provide functionality absent in systems that simply allow manual definition of multilevel views.
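The legality checks behind these envisioned updates can be sketched as follows (a simplification of our own, with a total order again standing in for the lattice): Modify is allowed only on elements whose level dominates the subject's, or on null (unlabeled) elements; Delete requires every nonnull element to sit at the subject's level.

```python
RANK = {"U": 0, "C": 1, "S": 2, "TS": 3}   # illustrative total order

def can_modify(element_level, subject_level):
    """No write-down: a subject may set elements at or above its level,
    or initialize a null (unlabeled) element."""
    return element_level is None or RANK[element_level] >= RANK[subject_level]

def can_delete(element_levels, subject_level):
    """Delete only objects whose nonnull elements are all at the
    subject's level (the restriction stated in the text)."""
    return all(lv == subject_level for lv in element_levels if lv is not None)

# A Secret subject updating a TS element is legal (a write-up) ...
print(can_modify("TS", "S"))          # True
# ... but it may not delete an object holding a Confidential element.
print(can_delete(["C", "S"], "S"))    # False
```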
3.1.2 A Simplified Object Model

We describe a simple object model consistent with many object DBMSs and then extend it to include security levels for data elements. Our goal is merely to provide a model sufficient to examine view implementations, so completeness has been sacrificed to simplicity. Objects, classes, elements, and attributes were introduced in section 1.2. o.A denotes the value of attribute A of an object instance o. The value may be either a primitive (e.g., an integer or string) or a reference to another object. Furthermore, we grant [5] that the DBMS allows a value to be a set of object references. The set of all attribute values (i.e., element values) of an object o is denoted o.A1, ..., o.An.

In the secure extension of the object model, every nonnull element has a security level. In an slo DBMS, all elements of an object must have identical levels; in an mlo DBMS they are unrestricted. We make no assumptions about the mechanism by which the mlo DBMS associates level information with an object. Null is interpreted as "no available information"; it is used for uninitialized elements, may be explicitly assigned as an element's value, and is returned when the user is not authorized to see an element's value. A null-valued element has no assigned security level.

Assumptions about some language capabilities will facilitate the analysis. To ease the burden of view implementers, we grant that the DBMS supports set-valued queries that search through any object class, returning members that satisfy a search predicate. Also, the query language (which is not formally defined) can produce either a subset of the set being searched, or sets of newly created objects whose attribute values are specified in the query.
There is simple syntax for searching or iterating through a set (e.g., for emp in dept.employees).

3.1.3 Treatment of conflicting information (polyinstantiation)

When different security levels have conflicting information about the values of certain attributes, there appears to be no single operation semantics acceptable to all applications. We do not wish to take a position on the polyinstantiation issue, but we must choose semantics to be used in our examples. Our sample code therefore adopts the following treatment:

Polyinstantiation Treatment: For each object instance, for each attribute, the DBMS may maintain one value for each security level. Retrievals return all values at levels dominated by the subject. Modify operations insert, delete, or change values at the user's level.

Future work may extend the software engineering idea of conceptual object to a polyinstantiated world. We do not expect the thrust of our arguments to change, because routine decomposition by security level should be handled by separate modules from

5 Our analysis sometimes makes assumptions that are favorable to slo DBMSs, rather than exploit an mlo advantage that is subtle or that depends on specific circumstances. To emphasize that the absence of the assumption would only make our conclusions stronger, we use grant rather than assume.
controversial polyinstantiation semantics. Also, we do not specify semantics or mechanisms for choosing a single value to return to an application. Finally, returned values may have advisory labels indicating their security level, but such labels are not considered in our examples.
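The chosen treatment is simple enough to sketch as a toy store of our own (illustrative names, total order in place of the lattice): each (object, attribute) slot keeps at most one value per level; Retrieve returns every value at a dominated level, and Modify touches only the value at the user's own level.

```python
RANK = {"U": 0, "C": 1, "S": 2, "TS": 3}   # illustrative total order

class PolyStore:
    """Per attribute: {level: value} -- one value for each security level."""
    def __init__(self):
        self.slots = {}                    # attr -> {level: value}

    def modify(self, attr, value, subject_level):
        # Insert or change the value at the user's level only.
        self.slots.setdefault(attr, {})[subject_level] = value

    def retrieve(self, attr, subject_level):
        """All (value, level) pairs at levels dominated by the subject."""
        return sorted((v, lv) for lv, v in self.slots.get(attr, {}).items()
                      if RANK[lv] <= RANK[subject_level])

s = PolyStore()
s.modify("destination", "Managua", "U")    # cover story at Unclassified
s.modify("destination", "Beijing", "TS")   # real value at Top Secret
print(s.retrieve("destination", "U"))      # [('Managua', 'U')]
print(s.retrieve("destination", "TS"))     # both values, with their levels
```

An Unclassified subject sees only the cover story, while a Top Secret subject sees both values, each with its advisory level.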
3.2 List of data management operations

We assume that applications require the customary object data management operations. Typical ones are specified below, and are assumed to be the native operations of both slo and mlo S-DBMSs. No special syntax is needed for accessing multilevel objects. Each operation takes the subject's level (denoted l_s) as an implicit parameter and accesses only information authorized by the Bell-LaPadula policy.

Retrieval Operations: These operations return information about an object o from all levels that are dominated by l_s. Two retrieval operations are considered: retrieving attribute A of object o, and retrieving all attributes of o. In case of polyinstantiation, Retrieve returns a set of values for each attribute in the target list. (A fuller description would show the result as a set of (value, level) pairs.)
- Retrieve o.A
- Retrieve o.[A1, ..., An]

Update Operations: These operations modify values of an object's attributes, or create and delete objects. Four update operations will be considered:
- Modify o.A = value
- Modify o.[A1, ..., An] = [value1, ..., valuen]
- Delete o
- Create o

We emphasize that the semantics represent an attempt at concreteness rather than at an excellent model. Modify sets the element at level l_s to the indicated value, for the indicated attributes. Create produces an object whose attributes are all null and hence unlabeled. Delete operations are restricted to objects whose elements are at level l_s (or, if write-up is specified, that dominate l_s); other deletions are very difficult even in slo DBMSs.

Administrative Operations: We examine one example:
- Reclassify o.A

The listed operations resemble those of an object DBMS that supports persistent objects in a language like C++. User requests via a high-level set-oriented query language will compile down into such operations, but it is very difficult to generalize about the behavior of query optimizers.
3.3 Data interfaces and storage

Our analysis uses a model consisting of three layers: conceptual objects, base DBMS objects that underlie the conceptual objects, and (for examinations of how DBMSs' internal decisions affect relative performance) the stored records. Users see only the conceptual and DBMS layers, but for performance comparisons the storage layer must also be examined. We consider storage structures both for trusted subject DBMSs (with the optimistic assumption that records for an object are clustered close together) and also for systems that segregate data by level and have OS-provided MAC [Keef89, Denn88].

As shown in Figure 4, a layer may provide several options for representing a multilevel conceptual object. Typically a DBMS will support one treatment at each layer, corresponding to a path in the figure. For performance analyses, the DBMS layer refers to the interface from application to DBMS process (while earlier sections were interested in the interface between application and vendor-supplied code). A dark arrow indicates decomposition, using one of the approaches described earlier. Gray arrows indicate mappings that do not alter the object's structure.

[Figure 4. Potential storage strategies for multilevel conceptual objects. The conceptual layer holds the conceptual object seen by application code; the DBMS layer offers either a multilevel DBMS-object or single-level DBMS-objects; the storage layer offers multilevel records, single-level records clustered by object (trusted subject DBMSs), or single-level records segregated by level (e.g., SeaView).]

Sections 3.4 and 4 analyze the performance of several styles of DBMS. The alternatives, described in terms of the layered model of the figure, are:

- Application-defined views over an slo DBMS. These implementations decompose a conceptual object into a set of slo DBMS objects. We did not notice any situation-specific optimizations that seemed to provide large savings. Trusted subject and segregated DBMSs offer alternative ways for the DBMS to cluster its records.

- A layered implementation of an mlo DBMS. An mlo DBMS may be implemented as a layer over an slo DBMS. This differs from the previous case only in that the DBMS rather than users generates the view definitions. This implementation need not introduce new trusted code, so it raises no assurance problems. Assuming that the generated code executes as part of the user process, performance will be the same as in the previous case.

- An mlo DBMS that maps conceptual objects directly to multilevel records. The mapping is straightforward, but requires trusted code and may require modifications to existing storage and access routines.

To analyze the performance of an implementation strategy, one combines the costs incurred at the DBMS-layer and storage-layer nodes in an implementation path, as discussed in the next section.

3.4 Performance comparisons

This section specifies further details of the model used for performance analysis. Drawing on the layered model, it identifies circumstances where two implementations will incur equal cost at some layer, or where one will be superior. These general observations reduce the need for section 4 to analyze every cost component of every operation for every implementation.

The analyses consider two costs for each of the scenarios mentioned in the previous section, assuming a fixed decomposition approach. First, the choice of node at the DBMS layer roughly determines the number of interprocess communications between application programs and the DBMS. Second, the choice of node at the storage layer determines the record layout, and hence the number of disk pages that need to be accessed for a request. It is not necessary to specify the factors' relative weights, because in each comparison of interest, whenever cost components differed the difference favored use of the mlo DBMS.

We assume that the label checking cost in an mlo DBMS is not a problem, since often the cost of checking labels may be negligible compared with access times, and clever implementation might allow one label to apply to multiple values. Over an slo DBMS, a conceptual object corresponds to multiple DBMS objects. If the DBMS is trusted, we grant that one might be able to instruct an slo DBMS to cluster these DBMS objects fairly closely on secondary storage.
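As a purely illustrative reading of this layered cost model, the two cost components can be tallied along each implementation path. The numbers below are hypothetical placeholders chosen only to show the bookkeeping, not measurements from the paper.

```python
# Hypothetical sketch: combine per-layer costs along an implementation path.
# Costs are abstract counts (IPCs at the DBMS layer, page accesses at the
# storage layer); the specific values are invented placeholders.
DBMS_LAYER = {"multilevel DBMS-object": {"ipcs": 1},
              "single-level DBMS-objects": {"ipcs": 3}}   # one per level touched
STORAGE_LAYER = {"multilevel records": {"pages": 1},
                 "1-level records, clustered": {"pages": 2},
                 "1-level records, segregated": {"pages": 3}}

def path_cost(dbms_node, storage_node):
    """Cost of a path = DBMS-layer component + storage-layer component."""
    return {"ipcs": DBMS_LAYER[dbms_node]["ipcs"],
            "pages": STORAGE_LAYER[storage_node]["pages"]}

mlo = path_cost("multilevel DBMS-object", "multilevel records")
slo = path_cost("single-level DBMS-objects", "1-level records, segregated")
# With these placeholder numbers, the mlo path is no worse on either
# component, mirroring the paper's qualitative conclusion.
assert mlo["ipcs"] <= slo["ipcs"] and mlo["pages"] <= slo["pages"]
```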
The models and assumptions above are sufficient to make some useful general observations:

- If mlo DBMS objects are decomposed into single-level records, the layout in storage can be the same as if multilevel conceptual objects are decomposed into single-level DBMS objects that then map directly to records. So with such records, mlo and slo DBMSs seem to incur similar record access costs.

- If the DBMS implementation uses multilevel records (in a trusted subject architecture), many operations require fewer record accesses.

- Mlo DBMSs will have fewer IPCs. Since the number of levels encountered in an object instance will be small, there should not be need for many IPCs to control the transfer of records. Hence the number of required IPCs depends mostly on the number of calls to the DBMS interface, and not on the detailed record structures.

- If most object instances contain elements at just one level, then performance of mlo and slo will be similar, because each conceptual object has one partition object (under levelwise decomposition).
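The levelwise decomposition assumed in these observations can be made concrete with a small sketch. The data layout (dicts standing in for DBMS objects) is our illustrative assumption, not the paper's storage format, and polyinstantiation is ignored here for brevity.

```python
# Illustrative sketch: levelwise decomposition of a multilevel conceptual
# object into one single-level partition object per level that holds data.
def decompose_levelwise(conceptual):
    """conceptual: attribute -> (value, level). Returns a list of
    single-level partition objects, one per distinct level."""
    partitions = {}
    for attr, (value, level) in conceptual.items():
        part = partitions.setdefault(level, {"level": level, "attrs": {}})
        part["attrs"][attr] = value
    return list(partitions.values())

ship = {"name": ("Nimitz", "U"), "mission": ("surveillance", "S")}
assert len(decompose_levelwise(ship)) == 2        # one partition per level

one_level = {"name": ("Nimitz", "U")}
assert len(decompose_levelwise(one_level)) == 1   # single-level instance:
                                                  # exactly one partition object
```

The second assertion illustrates the last observation above: when an instance has elements at only one level, levelwise decomposition yields a single partition object, so slo and mlo handling coincide.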
4. ANALYSIS OF DATA MANAGEMENT OPERATIONS

This section presents the code that users would need to write in order to give conceptual objects the data management functionality described in section 3.2. We show how a multilevel conceptual view would be implemented over an slo DBMS. Each code fragment corresponds to a single mlo DBMS operation. After the code for each operation, performance estimates are presented. We qualitatively compare costs of conceptual object operations over slo and mlo DBMSs, for the storage schemes listed in section 3.4. Record access costs are discussed in terms of the number of accesses and locality. As discussed in section 3.4, mlo and slo DBMSs implemented over single-level records use the same record placements and hence incur the same record access costs. IPC costs depend on how many DBMS requests are needed to obtain the appropriate slo objects, and are independent of record placement.

To obtain a basis for analysis, we needed to specify some semantic and implementation properties, but make no claim that these decisions are optimal. We do believe that similar qualitative conclusions about the benefits of mlo DBMSs would be obtained under many other assumptions. We assume levelwise partitioning, and that a rather simple form of polyinstantiation is used. More seriously, we have not considered how idiosyncrasies of query optimization would affect the performance of set-oriented queries. However, relatively few application programmers are capable of exploiting optimizer properties.

The implementations assume that levelwise decomposition to slo uses the following schema at the DBMS layer. For each application conceptual class (denoted AppC), there must be a corresponding class of DBMS objects, denoted AppC_dbmsObjects. (Many slo operations will thus include a step to traverse from AppC to AppC_dbmsObjects.) We grant that the slo DBMS provides a general mechanism for defining and accessing sets that span security levels.
Application code can then give each member o ∈ AppC a new attribute, denoted o.dbmsObjects, that contains a set of references to the partition objects of o. o.dbmsObjects has no special status to an slo DBMS; its contents must be maintained by application code.

4.1 Retrieval operations

Retrieve o.A

mlo-application code:
    o.A

slo-application code:
    retrieve x.A where x ∈ o.dbmsObjects and x.A is nonnull

For this operation, the view definer over slo must provide code that searches o.dbmsObjects, returning the DBMS object for every level where there is a value for o.A. (Recall that the DBMS automatically hides members of o.dbmsObjects at levels not dominated by l_s.) If o.A is polyinstantiated, a set is returned. It is easy to imagine a macro that would produce this query from a user request for o.A; we consider such a macro as a step toward providing mlo.

For both mlo and slo, one query is issued and one set of values is communicated to the application program, so the IPC costs are the same. For an mlo DBMS implemented over multilevel records, a single stored record is retrieved and returned; fewer accesses are required than for DBMSs implemented over single-level records, which must check the partition objects for all levels dominated by l_s. However, with non-segregated storage, a DBMS that clusters the records of o.dbmsObjects will reduce the number of page accesses.

Retrieve o.[A_1, A_2, ..., A_n]

Multi-attribute retrieval is a query operation without a direct counterpart among C++ direct-manipulation operations. It can be very useful in defining user views.

mlo-application code:
    o.[A_1, A_2, ..., A_n]

Given such a query, an mlo DBMS retrieves the desired attributes from the DBMS object corresponding to o. With slo, the desired attributes are stored among DBMS objects at multiple levels, so the result must be created from values taken from all of these objects. Several approaches are possible, and because of the importance of this operation, we examine each approach in detail.

1. The query processor or the application can simply decompose the request into multiple requests for single attributes.

slo-application code:
    o.A_1; o.A_2; ...; o.A_n

The query processor would then expand the request to, and execute, the following:

slo-application code:
    retrieve x.A_1 where x ∈ o.dbmsObjects and x.A_1 is nonnull;
    retrieve x.A_2 where x ∈ o.dbmsObjects and x.A_2 is nonnull;
    ...;
    retrieve x.A_n where x ∈ o.dbmsObjects and x.A_n is nonnull

Although these requests may be packaged as a stored procedure (and perhaps require only one IPC), the code contains multiple queries.
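Each expanded single-attribute retrieval amounts to a scan of the partition objects visible to the subject. The following executable sketch is our illustration of that scan; the dict layout and the total ordering of levels are assumptions, and the dominance filter stands in for the MAC hiding the slo DBMS would perform itself.

```python
# Illustrative sketch of "retrieve x.A where x in o.dbmsObjects and x.A is
# nonnull": scan the visible partition objects for nonnull values of A.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def dominates(a, b):
    return LEVELS[a] >= LEVELS[b]

def retrieve_attr(dbms_objects, l_s, attr):
    """Return the nonnull values of attr found at levels dominated by l_s
    (a set of values if the attribute is polyinstantiated)."""
    return [p["attrs"][attr]
            for p in dbms_objects
            if dominates(l_s, p["level"])            # MAC stand-in
            and p["attrs"].get(attr) is not None]    # "is nonnull"

parts = [{"level": "U", "attrs": {"name": "Nimitz", "mission": None}},
         {"level": "S", "attrs": {"name": None, "mission": "surveillance"}}]
assert retrieve_attr(parts, "U", "mission") == []
assert retrieve_attr(parts, "S", "mission") == ["surveillance"]
```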
Unless the DBMS is smarter than today's products, it will not recognize the commonality and will issue separate retrievals. While the buffer manager may ensure that the disk is accessed only once, each such retrieval will still involve substantial overhead.

2. The application can iterate through the returned DBMS objects, overlaying their nonnull attributes. Since query languages do not include an overlay operation, the application code below includes statements from both the query language and a general-purpose programming language. (Though it is tempting to add an overlay operation to the DBMS, such generalized code would be nontrivial to implement and should be regarded as another step toward providing DBMS support for mlo.)

slo-application code:
    create result;
    for each x ∈ o.dbmsObjects {
        retrieve x;
        for i = 1, ..., n {
            if x.A_i is nonnull then add x.A_i to the set result.A_i
        }
    }

Substantial IPC overhead is possible, depending on buffering tactics. The record accesses are the same as in the previous case.

Qualitatively, we conclude that producing code for retrieving a multilevel object will be a burden on the view implementer, but not an enormous one. However, if applications directly access slo DBMS objects (violating the advice of section 2), every data access will become more complex; here the burden will be much greater. Finally, an mlo DBMS seems able to perform at least as well as the application code over single-level objects.

4.2 Update operations

Update operations create or modify element values at the subject's level, as described earlier. They also modify the sets o.dbmsObjects when value assignment creates a new partition object. Minor modifications would suffice to add an option for new information to replace old information at higher levels (i.e., write-up).

Modify o.A

mlo-application code:
    o.A = value

slo-application code:
    retrieve o_l in o.dbmsObjects where l = l_s;
    if o_l does not exist and value is nonnull {
        create o_l;
        add o_l to o.dbmsObjects
    }
    o_l.A = value;
    if null(o_l) then delete o_l

The view implementation over slo makes several DBMS requests, so IPC overhead can be substantial. (Element-wise decomposition would perform better here, but possibly worse on multi-attribute operations.) The record access costs appear likely to be similar in the two cases.

Modify o.[A_1, A_2, ..., A_n]

The implementation parallels single-attribute Modify above, except that values are added to multiple elements. n separate Modify operations are not necessary, because all the new values go into the DBMS object at level l_s.
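The Modify o.A sequence above (find or create the partition object at l_s, assign, and delete the partition if it becomes entirely null) can be sketched in executable form. As before, the dict layout is a hypothetical stand-in for slo DBMS objects, not the paper's implementation.

```python
# Illustrative sketch of the slo Modify o.A sequence.
def modify_attr(o, l_s, attr, value):
    part = next((p for p in o["dbmsObjects"]
                 if p["level"] == l_s), None)
    if part is None and value is not None:
        part = {"level": l_s, "attrs": {}}             # create o_l
        o["dbmsObjects"].append(part)                  # add o_l to o.dbmsObjects
    if part is None:                                   # nothing to null out
        return
    part["attrs"][attr] = value                        # o_l.A = value
    if all(v is None for v in part["attrs"].values()): # if null(o_l)
        o["dbmsObjects"].remove(part)                  # delete o_l

o = {"dbmsObjects": []}
modify_attr(o, "S", "mission", "surveillance")
assert o["dbmsObjects"] == [{"level": "S", "attrs": {"mission": "surveillance"}}]
modify_attr(o, "S", "mission", None)   # nulling the only element drops o_l
assert o["dbmsObjects"] == []
```

Note how the sketch mirrors the IPC discussion: one round trip to locate o_l, possibly one to create it, one to assign, and possibly one to delete it.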
However, to implement Modify [with write-up], which must remove the previous values at higher levels, o.dbmsObjects must be searched at levels dominating l_s. The performance comments for the previous operation still apply. For the update with write-up, the comments about multi-attribute retrieval also apply.

Delete o

Deletion semantics are quite troublesome. Even in an slo DBMS it is difficult to handle the Secret deletion of an Unclassified Ship. Therefore our examination of code and performance is limited to cases where all attributes of the deleted object are at or above l_s. A more robust treatment would require a subtle polyinstantiation strategy beyond the scope of this report.

mlo-application code:
    delete o

slo-application code:
    flag the conceptual object o as deleted at level l_s;
    for each x ∈ o.dbmsObjects where level(x) = l_s { delete x }

The treatment can easily be expanded to provide Deletion with write-up. Write-up places data integrity at risk because the requesting subject cannot be informed of integrity violations that involve the higher levels; on the other hand, deleting information at l_s without deleting corresponding high-level information also impairs database integrity.

Two IPCs are needed, one for each line of code. An slo DBMS may succeed in clustering the modified information, so record access costs will probably not differ. However, for deletion with write-up it will be necessary to search dbmsObjects at higher levels.

Create o (uninitialized)

mlo-application code:
    create o

slo-application code:
    create o;
    assign o.dbmsObjects = {}

This code maps directly to the DBMS's create operation, so the view implementation will have costs comparable to the mlo DBMS.

4.3 Administrative operations

To reclassify a conceptual attribute when views use levelwise decomposition, one must move elements among the DBMS objects. For simplicity, the analysis below assumes that the attribute to be reclassified is at the level of the subject.
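The element movement just described, from the partition object at l_s to the one at l_n, can be sketched as follows. This is an illustrative assumption in the same dict layout as before; the privilege check required when l_n does not dominate l_s (a write-down) is the caller's responsibility and is not modeled.

```python
# Illustrative sketch: move attr's element from the partition object at l_s
# to the one at l_n. Assumes, as in the text, that the attribute currently
# resides at the subject's level l_s.
def reclassify_attr(o, attr, l_s, l_n):
    def part_at(level):
        return next((p for p in o["dbmsObjects"] if p["level"] == level), None)

    src = part_at(l_s)
    value = src["attrs"].pop(attr)          # value = o_l.A; o_l.A = null
    if not any(v is not None for v in src["attrs"].values()):
        o["dbmsObjects"].remove(src)        # entirely null: delete o_l
    dst = part_at(l_n)
    if dst is None:
        dst = {"level": l_n, "attrs": {}}   # create the target partition
        o["dbmsObjects"].append(dst)
    dst["attrs"][attr] = value              # o_l.A = value (at l_n)

o = {"dbmsObjects": [{"level": "U", "attrs": {"mission": "training"}}]}
reclassify_attr(o, "mission", "U", "S")
assert o["dbmsObjects"] == [{"level": "S", "attrs": {"mission": "training"}}]
```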
Reclassify o.A from level l_s to level l_n

slo-application code:
    retrieve o_l in o.dbmsObjects where l = l_s;    (at l_s)
    value = o_l.A;
    o_l.A = null;
    if o_l is entirely null, delete it;
    retrieve o_l in o.dbmsObjects where l = l_n;
    o_l.A = value    (at l_n)

There are two cases of this operation. If l_n dominates l_s, Reclassify can be unprivileged. Otherwise it must be a privileged operation, because there is a write-down to l_n. For an slo DBMS, Reclassify makes separate requests to retrieve the DBMS objects, and therefore has higher IPC cost than a single mlo request. Note also that although we have omitted the code that manages o.dbmsObjects, adjustments to its members may be required (much as in the Modify operation).

4.4 Observations

The programs and discussions of sections 4.1 through 4.3 yield several observations about implementing conceptual classes over an slo DBMS. First, implementers of conceptual views must supply code for each operation. Only simple customizations of each operation are needed for each new view. Second, an mlo DBMS implemented over multilevel records may have performance advantages due to reduced record access; mlo DBMSs even over single-level records may have advantages due to reduction in IPC cost, especially in a client-server environment. Performance does not appear to improve if applications are given direct access to the slo representation. It is possible that code which simultaneously dealt with polyinstantiation semantics and the levelwise decomposition would have some performance advantages, but such code would be difficult to write and maintain.

The example presented levelwise decomposition because it generates a minimum number of single-level objects. However, several operations were quite awkward, and element-wise decomposition should be considered as an alternative. Some of the view implementation code would become simpler. Also, retrieval of {all attributes} would be expressible as a single query against the element-wise DBMS objects, so query optimizers and other DBMS components that understand view mappings could do a better job.
However, schema browsers and database design tools would need enhancement to shield users from the explosion of artificial classes.

The detailed analysis has a significant byproduct. It identifies how an mlo DBMS can be implemented over the ordinary interface of an slo DBMS, requiring neither trusted code (beyond that already in the DBMS) nor access to DBMS internals. However, in non-segregated architectures, simpler implementations and better performance might be obtained by adding trusted code that manipulates multilevel records.

The arguments of sections 3 and 4 are subject to several caveats. We did not consider optimization of requests expressed in high-level data languages (since such requests are currently less common in object database applications than in the relational world). Levelwise decomposition may not be the best approach (though a sketch of element-wise decomposition yielded similar conclusions). We assumed that applications are untrusted and each subject
More informationLecture Notes on Arrays
Lecture Notes on Arrays 15-122: Principles of Imperative Computation July 2, 2013 1 Introduction So far we have seen how to process primitive data like integers in imperative programs. That is useful,
More informationApplication generators: a case study
Application generators: a case study by JAMES H. WALDROP Hamilton Brothers Oil Company Denver, Colorado ABSTRACT Hamilton Brothers Oil Company recently implemented a complex accounting and finance system.
More informationModule 16. Software Reuse. Version 2 CSE IIT, Kharagpur
Module 16 Software Reuse Lesson 40 Reuse Approach Specific Instructional Objectives At the end of this lesson the student would be able to: Explain a scheme by which software reusable components can be
More informationProgramming Languages Third Edition. Chapter 10 Control II Procedures and Environments
Programming Languages Third Edition Chapter 10 Control II Procedures and Environments Objectives Understand the nature of procedure definition and activation Understand procedure semantics Learn parameter-passing
More informationComplexity Measures for Map-Reduce, and Comparison to Parallel Computing
Complexity Measures for Map-Reduce, and Comparison to Parallel Computing Ashish Goel Stanford University and Twitter Kamesh Munagala Duke University and Twitter November 11, 2012 The programming paradigm
More informationCoding and Unit Testing! The Coding Phase! Coding vs. Code! Coding! Overall Coding Language Trends!
Requirements Spec. Design Coding and Unit Testing Characteristics of System to be built must match required characteristics (high level) Architecture consistent views Software Engineering Computer Science
More informationCSCI 403: Databases 13 - Functional Dependencies and Normalization
CSCI 403: Databases 13 - Functional Dependencies and Normalization Introduction The point of this lecture material is to discuss some objective measures of the goodness of a database schema. The method
More informationA Summary of Out of the Tar Pit
A Summary of Out of the Tar Pit Introduction This document summarises some points from the paper Out of the Tar Pit (written by Ben Moseley and Peter Marks, dated 6 th February 2006) which are relevant
More informationSoftware Reuse and Component-Based Software Engineering
Software Reuse and Component-Based Software Engineering Minsoo Ryu Hanyang University msryu@hanyang.ac.kr Contents Software Reuse Components CBSE (Component-Based Software Engineering) Domain Engineering
More informationRAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE
RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting
More informationUSTGlobal INNOVATION INFORMATION TECHNOLOGY. Using a Test Design Tool to become a Digital Organization
USTGlobal INNOVATION INFORMATION TECHNOLOGY Using a Test Design Tool to become a Digital Organization Overview: Automating test design reduces efforts and increases quality Automated testing resolves most
More informationLecture Notes on Contracts
Lecture Notes on Contracts 15-122: Principles of Imperative Computation Frank Pfenning Lecture 2 August 30, 2012 1 Introduction For an overview the course goals and the mechanics and schedule of the course,
More informationDatabase Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.
Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 13 Constraints & Triggers Hello and welcome to another session
More informationIBM System Storage SAN Volume Controller IBM Easy Tier in release
IBM System Storage SAN Volume Controller IBM Easy Tier in 7.3.0 release Kushal S. Patel, Shrikant V. Karve IBM Systems and Technology Group ISV Enablement July 2014 Copyright IBM Corporation, 2014 Table
More informationIntroduction to IRQA 4
Introduction to IRQA 4 Main functionality and use Marcel Overeem 1/7/2011 Marcel Overeem is consultant at SpeedSoft BV and has written this document to provide a short overview of the main functionality
More informationMEMORY MANAGEMENT/1 CS 409, FALL 2013
MEMORY MANAGEMENT Requirements: Relocation (to different memory areas) Protection (run time, usually implemented together with relocation) Sharing (and also protection) Logical organization Physical organization
More informationCS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11
CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Spring 2015 1 Handling Overloaded Declarations Two approaches are popular: 1. Create a single symbol table
More informationGeneral Objective:To understand the basic memory management of operating system. Specific Objectives: At the end of the unit you should be able to:
F2007/Unit6/1 UNIT 6 OBJECTIVES General Objective:To understand the basic memory management of operating system Specific Objectives: At the end of the unit you should be able to: define the memory management
More informationStored Relvars 18 th April 2013 (30 th March 2001) David Livingstone. Stored Relvars
Stored Relvars Introduction The purpose of a Stored Relvar (= Stored Relational Variable) is to provide a mechanism by which the value of a real (or base) relvar may be partitioned into fragments and/or
More informationNOTES ON OBJECT-ORIENTED MODELING AND DESIGN
NOTES ON OBJECT-ORIENTED MODELING AND DESIGN Stephen W. Clyde Brigham Young University Provo, UT 86402 Abstract: A review of the Object Modeling Technique (OMT) is presented. OMT is an object-oriented
More informationBefore the FEDERAL COMMUNICATIONS COMMISSION Washington, D.C
Before the FEDERAL COMMUNICATIONS COMMISSION Washington, D.C. 20554 In the Matter of ) ) GN Docket No. 09-191 Preserving the Open Internet ) ) Broadband Industry Practices ) WC Docket No. 07-52 COMMENTS
More informationNetwork protocols and. network systems INTRODUCTION CHAPTER
CHAPTER Network protocols and 2 network systems INTRODUCTION The technical area of telecommunications and networking is a mature area of engineering that has experienced significant contributions for more
More informationLectures 4+5: The (In)Security of Encrypted Search
Lectures 4+5: The (In)Security of Encrypted Search Contents 1 Overview 1 2 Data Structures 2 3 Syntax 3 4 Security 4 4.1 Formalizing Leaky Primitives.......................... 5 1 Overview In the first
More informationPRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS
Objective PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Explain what is meant by compiler. Explain how the compiler works. Describe various analysis of the source program. Describe the
More informationEssay Question: Explain 4 different means by which constrains are represented in the Conceptual Data Model (CDM).
Question 1 Essay Question: Explain 4 different means by which constrains are represented in the Conceptual Data Model (CDM). By specifying participation conditions By specifying the degree of relationship
More informationCOMPUTER SCIENCE 4500 OPERATING SYSTEMS
Last update: 3/28/2017 COMPUTER SCIENCE 4500 OPERATING SYSTEMS 2017 Stanley Wileman Module 9: Memory Management Part 1 In This Module 2! Memory management functions! Types of memory and typical uses! Simple
More informationCHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION 1.1 Introduction Pattern recognition is a set of mathematical, statistical and heuristic techniques used in executing `man-like' tasks on computers. Pattern recognition plays an
More informationCPS352 Lecture - Indexing
Objectives: CPS352 Lecture - Indexing Last revised 2/25/2019 1. To explain motivations and conflicting goals for indexing 2. To explain different types of indexes (ordered versus hashed; clustering versus
More informationIntroduction C H A P T E R1. Exercises
C H A P T E R1 Introduction Chapter 1 provides a general overview of the nature and purpose of database systems. The most important concept in this chapter is that database systems allow data to be treated
More informationArm Assembly Language programming. 2. Inside the ARM
2. Inside the ARM In the previous chapter, we started by considering instructions executed by a mythical processor with mnemonics like ON and OFF. Then we went on to describe some of the features of an
More informationPerformance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018
Performance Estimation and Regularization Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Bias- Variance Tradeoff Fundamental to machine learning approaches Bias- Variance Tradeoff Error due to Bias:
More informationOperating Systems 2230
Operating Systems 2230 Computer Science & Software Engineering Lecture 6: Memory Management Allocating Primary Memory to Processes The important task of allocating memory to processes, and efficiently
More informationDifference Between Dates Case Study 2002 M. J. Clancy and M. C. Linn
Difference Between Dates Case Study 2002 M. J. Clancy and M. C. Linn Problem Write and test a Scheme program to compute how many days are spanned by two given days. The program will include a procedure
More informationNew Approach to Graph Databases
Paper PP05 New Approach to Graph Databases Anna Berg, Capish, Malmö, Sweden Henrik Drews, Capish, Malmö, Sweden Catharina Dahlbo, Capish, Malmö, Sweden ABSTRACT Graph databases have, during the past few
More informationA study of the impact of C++ on software maintenance
A study of the impact of C++ on software maintenance Dennis Mancl AT&T Bell Laboratories Warren, NJ 07059 William Havanas AT&T Bell Laboratories Columbus, OH 43213 Abstract This is a case study of the
More informationUnit 2 : Computer and Operating System Structure
Unit 2 : Computer and Operating System Structure Lesson 1 : Interrupts and I/O Structure 1.1. Learning Objectives On completion of this lesson you will know : what interrupt is the causes of occurring
More informationIncompatibility Dimensions and Integration of Atomic Commit Protocols
Preprint Incompatibility Dimensions and Integration of Atomic Protocols, Yousef J. Al-Houmaily, International Arab Journal of Information Technology, Vol. 5, No. 4, pp. 381-392, October 2008. Incompatibility
More informationLecture 3 Notes Arrays
Lecture 3 Notes Arrays 15-122: Principles of Imperative Computation (Summer 1 2015) Frank Pfenning, André Platzer 1 Introduction So far we have seen how to process primitive data like integers in imperative
More informationAssignment 5. Georgia Koloniari
Assignment 5 Georgia Koloniari 2. "Peer-to-Peer Computing" 1. What is the definition of a p2p system given by the authors in sec 1? Compare it with at least one of the definitions surveyed in the last
More information