Which rule specifies that an entity can be a member of only one subtype at a time?

Hierarchies in Data Modeling

Joe Celko, in Joe Celko's Trees and Hierarchies in SQL for Smarties (Second Edition), 2012

10.1 Types of Hierarchies

A generalization hierarchy can be either overlapping or disjoint. In an overlapping hierarchy, an entity can be a member of several subclasses. For example, people at a university could be broken into three subclasses: faculty, staff, and students. But there is nothing to prevent the same person from belonging to two or more of these subclasses. A student could be on staff as part of a co-op program, a professor can take a class as a student, and so forth.

In a disjoint hierarchy, an entity can be in one and only one subclass. For example, students at a university could be broken into three subclasses: foreign, in state, and out of state.

For the OO-minded reader, disjoint hierarchies are rather like single-inheritance type hierarchies, whereas overlapping hierarchies are like multiple-inheritance type hierarchies.
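The difference between the two kinds of hierarchy can be checked mechanically. The following is a minimal Python sketch, not taken from the book; the function name is_disjoint and the sample person names are assumptions made for illustration.

```python
from itertools import combinations


def is_disjoint(subclasses: dict[str, set[str]]) -> bool:
    """Return True if no entity appears in more than one subclass."""
    return all(a.isdisjoint(b) for a, b in combinations(subclasses.values(), 2))


# Overlapping hierarchy: the same person may be both staff and student.
# (All names below are made up for the example.)
university_people = {
    "faculty": {"pat"},
    "staff": {"alex", "sam"},
    "students": {"sam", "kim"},
}

# Disjoint hierarchy: each student falls into exactly one residency category.
students_by_residency = {
    "foreign": {"kim"},
    "in_state": {"sam"},
    "out_of_state": {"lee"},
}

print(is_disjoint(university_people))      # overlapping, so not disjoint
print(is_disjoint(students_by_residency))  # disjoint
```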

URL: https://www.sciencedirect.com/science/article/pii/B9780123877338000069

Domain modeling

Marco Brambilla, Piero Fraternali, in Interaction Flow Modeling Language, 2015

3.6 Generalization Hierarchies

The domain model allows the designer to organize classes into a hierarchy to highlight their common features.

Generalization

A generalization hierarchy (also called is-a hierarchy) connects a superclass and one or more subclasses, representing a specialization of the superclass. The hierarchy can be multilevel, because a subclass can in turn be a superclass of other subclasses.

Each subclass inherits the features (attributes, operations, and associations) defined in the superclass and may add locally defined features. For example, Figure 3.6 specifies that “Laptop” and “Tablet” are subclasses of class “Computer.” “Laptop” has the additional attribute “HDinterface,” denoting the type of hard disk interface, and “Tablet” has the additional attribute “Connectivity,” denoting the type of connectivity (WiFi, 3G, or 4G). We say that “Computer” is specialized into “Laptop” and “Tablet,” and conversely that “Laptop” and “Tablet” are generalized into “Computer.”

Figure 3.6. Graphic notation for IS-A hierarchies.

When domain modeling has the purpose of specifying the persistent classes that form the data tier of an application, it is customary to assume a few restrictive hypotheses that simplify the form of generalization hierarchies and make them more easily implementable with conventional database technology.

1. Each class is defined as the specialization of at most one superclass. In technical terms, “multiple inheritance” is avoided.

2. Each instance of a superclass is specialized exclusively into one subclass.

3. Each class appears in at most one generalization hierarchy.

These restrictions reduce the expressive power of the domain model. For example, due to the first two constraints, an instance of class “Computer” cannot be a “Tablet” and a “Laptop” at the same time. However, a similar meaning can be conveyed by the diagram of Figure 3.7, which specializes class “Computer” into three subclasses: laptops, tablets, and convertibles. With this solution, the locally defined attributes of class “Laptop” and “Tablet” must be duplicated in class “Convertible.”

Figure 3.7. Generalization hierarchy approximating the use of multiple inheritance and nonexclusive specialization.
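The workaround with a third subclass can be sketched in Python as follows. This is only an illustration under stated assumptions: the attribute model on Computer is hypothetical (the text does not list the superclass attributes), and the attribute names are adapted to Python naming.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    model: str  # hypothetical shared attribute, not named in the text


@dataclass
class Laptop(Computer):
    hd_interface: str  # "HDinterface" in the text


@dataclass
class Tablet(Computer):
    connectivity: str  # "Connectivity" in the text


@dataclass
class Convertible(Computer):
    # Single inheritance only: the local attributes of Laptop and Tablet
    # are duplicated here rather than inherited from both.
    hd_interface: str
    connectivity: str


device = Convertible(model="X1", hd_interface="SATA", connectivity="WiFi")
```

A Convertible instance is a Computer but is neither a Laptop nor a Tablet, which is the price paid for keeping the specialization exclusive.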

URL: https://www.sciencedirect.com/science/article/pii/B9780128001080000035

Transformation to Resolve Data Impedance

James Bean, in SOA and Web Services Interface Design, 2010

6.4 Abstraction

Abstraction removes implementation specificity and resolves a mapping across disparate structures. The realization of an abstraction is sometimes referred to as a generalization hierarchy, where subtypes can inherit and replace the structure of a parent supertype. In the case of abstraction, the parent supertype is never realized in a physical sense, and only the subtypes exist to replace it in the message.

A common example of abstraction is associated with different address types. An address is a form of location for someone or something. It may represent the locator for a physical place such as a residence address, or it may represent a more virtual location such as a telephone number or an e-mail address. Some service consumers might need to acquire the location of a person as a method of initiating contact and without constraint as to which type of address it is. For some customers, the residence or postal address might be available, while for others only a telephone number might be available. A Customer Contact data service would expose whatever contact information is available for the customer regardless of type. The challenge for transformation technologies such as XSLT is resolving the concept of polymorphism. Polymorphism implements a variable outcome where any one of a set of possible outcomes can be instantiated. The complexity lies with identifying and resolving the existence of a particular element over another. While abstraction can be resolved, the transformation functionality required to accomplish a polymorphic type of transformation will usually include multiple transformation steps and possibly a combination of XSLT and other transformation technologies.
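The address example can be illustrated outside of XSLT. The following Python sketch is an assumption-laden illustration, not the book's method: the class names ContactLocation, PostalAddress, and TelephoneNumber and the function contact_info are hypothetical. It shows an abstract supertype that is never instantiated and subtypes that stand in for it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ContactLocation(ABC):
    """Abstract supertype: never realized directly, only through subtypes."""

    @abstractmethod
    def render(self) -> str:
        ...


@dataclass
class PostalAddress(ContactLocation):
    street: str
    city: str

    def render(self) -> str:
        return f"{self.street}, {self.city}"


@dataclass
class TelephoneNumber(ContactLocation):
    number: str

    def render(self) -> str:
        return f"tel:{self.number}"


def contact_info(locations: list[ContactLocation]) -> list[str]:
    """Expose whatever contact information is available, regardless of type."""
    return [loc.render() for loc in locations]


available = [PostalAddress("1 Main St", "Springfield"), TelephoneNumber("555-0100")]
print(contact_info(available))
```

The caller never branches on the concrete type; each subtype resolves its own representation, which is the polymorphic behavior the paragraph above describes.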

Some practitioners avoid polymorphic interfaces. However, it is a powerful method for resolving abstraction requirements. Interestingly, XML Schemas support abstraction and polymorphism in the service interface as a combination of an abstract “head” element declaration and referencing “substitutionGroup” element declarations. The type differences between substitutionGroup elements can sometimes introduce type casting complexity, and a general recommendation is to define the head element as “anyType” (see Figure 6.17).

Figure 6.17. XML Schema substitutionGroup References with XML Spy by Altova

(Copyright 2003-2008 Altova GmbH, reprinted with permission of Altova)

Abstraction is a valuable Web service interface design technique. However, it can introduce some complexity to transformation. Effective service design requires an understanding of advantages and limitations and ensuring that the service, service interface XML Schemas, and any requisite transformation are well-defined.

URL: https://www.sciencedirect.com/science/article/pii/B978012374891100006X

encoway

Thorsten Krebs, in Knowledge-Based Configuration, 2014

23.3 Modeling of the Working Example

The term configuration is often used with different meanings (compare Mittal and Frayman, 1989; Stumptner, 1997). In this chapter product configuration describes the task of assembling a product from a predefined set of components and the selection of component characteristics while satisfying a given set of constraints. A configuration model formalizes the knowledge about the application domain in a machine-readable way, abstracting the domain’s complexity to a subset relevant to the configuration process.

The inference engine engcon uses methods known from structure-based configuration. Component types declaratively define configurable products and the components of which they can be assembled. They have a unique name and define a set of attributes. Each attribute can be of a specific type, string or numeric. The result of a configuration process is a unique value assignment for each of the component attributes that constitute the configuration.

Component types are arranged within two different types of hierarchical relations: taxonomies and partonomies.

Taxonomies define a classification of objects in a generalization hierarchy: more special component types inherit attributes of more general ones. Inherited attributes can be monotonically refined; no new values may be introduced on lower levels. This, in combination with the closed world assumption, is the basis for taxonomic inferences. For example, when any CPU can be fast or medium but a CPUD is always fast (refined inherited attribute value), and there is no other fast CPU (closed world assumption), then the system can infer that any fast CPU has to be a CPUD.
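The CPU example can be sketched as a small lookup. This Python sketch is not engcon code; the attribute name speed, the dictionary layout, and the assumption that the second subtype CPUS is always medium are made up for illustration (the text only states that no CPU other than CPUD is fast).

```python
# Allowed values of the inherited attribute "speed" per component type.
# Subtypes may only narrow the domain of their parent (monotonic refinement).
SPEED_DOMAIN = {
    "CPU": {"fast", "medium"},
    "CPUD": {"fast"},
    "CPUS": {"medium"},  # assumption: not stated explicitly in the text
}

# Closed world: these are the only subtypes of CPU that exist.
SUBTYPES = {"CPU": ["CPUD", "CPUS"]}


def is_monotonic_refinement(parent: str, child: str) -> bool:
    """A child may drop values from the parent's domain but never add any."""
    return SPEED_DOMAIN[child] <= SPEED_DOMAIN[parent]


def infer_subtypes(parent: str, speed: str) -> list[str]:
    """Return the subtypes whose refined domain still admits the value."""
    return [t for t in SUBTYPES[parent] if speed in SPEED_DOMAIN[t]]


print(infer_subtypes("CPU", "fast"))
```

Because exactly one subtype admits the value fast, the instance can be specialized to CPUD without asking the user.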

Partonomies define the decomposition of a configurable product. The parts that comprise a product are related to the composite together with a (minimum and maximum) multiplicity definition. For example, a HDUnit has one to four HDisk and one or two HDController parts. The concept of multiplicity allows for the specification of parts that can be either optional or required. Using the same component types within partonomies for different products is the key to defining a large external variance with only a small internal variance.
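The multiplicity definition of the HDUnit example can be written as a simple check. This is a minimal Python sketch; the names HDUNIT_PARTS and satisfies_multiplicity are hypothetical and do not reflect the engcon representation.

```python
from collections import Counter

# (minimum, maximum) multiplicity of each part type within an HDUnit.
HDUNIT_PARTS = {"HDisk": (1, 4), "HDController": (1, 2)}


def satisfies_multiplicity(parts: list[str], spec=HDUNIT_PARTS) -> bool:
    """Check that every part type occurs within its allowed range."""
    counts = Counter(parts)
    if any(name not in spec for name in counts):
        return False  # a part type that the partonomy does not define
    return all(low <= counts[name] <= high for name, (low, high) in spec.items())


print(satisfies_multiplicity(["HDisk", "HDisk", "HDController"]))
```

A minimum of zero would model an optional part; a minimum of one or more models a required part.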

A component type describes a prototypical product or component, some sort of pattern of which instances are generated within a configuration process according to the multiplicity definition. The generated instances together with the values that have been selected for their attributes form the configuration result. For more information on this topic and structure-based configuration in general see also Günter (1995).

Representing configuration knowledge for the inference engine engcon is based on a hybrid approach consisting of conceptual knowledge and constraint knowledge. Conceptual knowledge describes the products and the components from which products can be assembled. This means that conceptual knowledge describes the search space for possible configuration solutions. Constraint knowledge on the other hand narrows down this search space by defining situations that do not describe allowed configurations and thus must be avoided during the configuration process.

A constraint defines restrictions on one or more attributes of one or more component types. It consists of a condition and an action. The condition describes a situation in which the action definition is relevant with respect to a configuration task. For example, constraint gc1 has the condition that an instance of component type CPUS is present within the configuration. The action is executed when the condition matches the contents of a running configuration process and thus restricts instances of the component types that appear within the condition. The constraint gc1, for example, limits the choice of motherboards to those of compatible type MBDiamond. This means that the component types and their relations describe the search space while the set of constraints restricts this space to the subset of actually buildable combinations (see Mailharro, 1998).
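The condition and action structure of gc1 can be sketched as follows. engcon constraints are Java plug-ins; this Python function only illustrates the idea and is not the engcon API. The motherboard type MBSilver is a hypothetical alternative introduced for the example.

```python
def gc1(configuration: set[str], candidate_motherboards: set[str]) -> set[str]:
    """Condition: a CPUS instance is present in the configuration.
    Action: restrict the motherboard choice to the compatible MBDiamond."""
    if "CPUS" in configuration:
        return {mb for mb in candidate_motherboards if mb == "MBDiamond"}
    return set(candidate_motherboards)


candidates = {"MBDiamond", "MBSilver"}  # MBSilver is made up for illustration
print(gc1({"CPUS"}, candidates))
```

When the condition does not match, the constraint leaves the search space untouched.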

Figure 23.1 shows a screenshot of encoway’s modeling tool K-Build. On the left side there are the two hierarchies: taxonomy on the top and partonomy on the bottom. We can see that an MBDiamond is a special MB (motherboard) that belongs to the component type Hardware and is part of PC. The tool K-Build uses the term “concept” to denote component types. On the right-hand side the attributes of MBDiamond are listed together with a reference to constraints that restrict its extension: gc1 in this example.

Figure 23.1. Screenshot of encoway’s modeling tool K-Build.

engcon distinguishes different types of constraints that restrict the value assignment of attribute values, specialize an instance along the defined taxonomy, or decompose a component type according to its part definitions. A very expressive ready-to-use library comes along with the modeling environment K-Build (see also Section 23.4). Constraints are written as plug-ins using Java and thus the constraint library can be easily extended according to the project’s specific needs. Typically, encoway prefers to use multidirectional constraints: no matter which of the concerned attributes are changed, the constraint evaluates possible values for all affected attributes. This reduces the potential of running into a conflict situation, one in which the currently selected set of values does not conform to the configuration model.

Avoiding conflicts when possible enables a better user experience. Because of this, encoway has developed a mechanism that allows the user to select values that are not possible in the current situation and computes the consequences of this decision: obviously, at least one of the prior decisions cannot persist and the user can decide to take back either that prior decision or the new decision. Computing the consequences of a decision that is not possible is achieved by prioritizing the decisions and executing them again in a specific order: the new decision first. When the prior decisions are executed and at some point a value cannot be set, then the user needs to decide between the existing and the new decision.
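The replay idea can be sketched roughly as follows. This is a simplified Python illustration under several assumptions, not encoway's mechanism: decisions are modeled as (attribute, value) pairs, the consistency rule and all values are hypothetical, and the sketch collects every prior decision that can no longer be set instead of stopping at the first one to ask the user.

```python
def replay_with_new_decision(prior, new, is_consistent):
    """Apply the new decision first, then replay the prior decisions in order.
    Returns (accepted decisions, prior decisions that can no longer be set)."""
    accepted = [new]
    conflicting = []
    for decision in prior:
        if is_consistent(accepted + [decision]):
            accepted.append(decision)
        else:
            conflicting.append(decision)
    return accepted, conflicting


def cpus_needs_mbdiamond(decisions):
    """Hypothetical rule: a CPUS processor requires an MBDiamond motherboard."""
    chosen = dict(decisions)
    return not (chosen.get("cpu") == "CPUS" and chosen.get("mb", "MBDiamond") != "MBDiamond")


prior = [("mb", "MBSilver"), ("ram", "8GB")]  # made-up earlier user decisions
accepted, conflicting = replay_with_new_decision(prior, ("cpu", "CPUS"), cpus_needs_mbdiamond)
print(conflicting)
```

Each decision reported as conflicting is one the user would have to weigh against the new decision.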

Using engcon, the process of steering through the search space is incremental and typically exactly one solution is sought: the one best matching with a given set of requirements. Within this process user decisions and system decisions alternate within each step: a user decision is performed and the system computes consequences such as taxonomic inferences and constraint evaluation. After all consequences are computed the result is displayed to the user and another iteration cycle begins (Ranze et al., 2002).

URL: https://www.sciencedirect.com/science/article/pii/B9780124158177000232

Requirements Analysis and Conceptual Data Modeling

Toby Teorey, ... H.V. Jagadish, in Database Modeling and Design (Fifth Edition), 2011

Conceptual Data Modeling

Let us now look more closely at the basic data elements and relationships that should be defined during requirements analysis and conceptual design. These two life cycle steps are often done simultaneously.

Consider the substeps in Step II.a, conceptual data modeling, using the ER model:

Classify entities and attributes (classify classes and attributes in UML).

Identify the generalization hierarchies (for both the ER model and UML).

Define relationships (define associations and association classes in UML).

The remainder of this section discusses the tasks involved in each substep.

Classify Entities and Attributes

Though it is easy to define entity, attribute, and relationship constructs, it is not as easy to distinguish their roles in modeling the database. What makes a data element an entity, an attribute, or even a relationship? For example, project headquarters are located in cities. Should “city” be an entity or an attribute? A vita is kept for each employee. Is “vita” an entity or a relationship?

The following guidelines for classifying entities and attributes will help the designer's thoughts converge to a normalized relational database design:

Entities should contain descriptive information.

Multivalued attributes should be classified as entities.

Attributes should be attached to the entities they most directly describe.

Now we examine each guideline in turn.

Entity Contents

Entities should contain descriptive information. If there is descriptive information about a data element, the data element should be classified as an entity. If a data element requires only an identifier and does not have relationships, it should be classified as an attribute. With city, for example, if there is some descriptive information such as country and population for cities, then city should be classified as an entity. If only the city name is needed to identify a city, then city should be classified as an attribute associated with some entity, such as Project. The exception to this rule is that if the identity of the value needs to be constrained by set membership, you should create it as an entity. For example, “state” is much the same as city, but you probably want to have a State entity that contains all the valid State instances. Examples of other data elements in the real world that are typically classified as entities include Employee, Task, Project, Department, Company, Customer, and so on.

Multivalued Attributes

A multivalued attribute of an entity is an attribute that can have more than one value associated with the key of the entity. For example, a large company could have many divisions, some of them possibly in different cities. In this case, division or division-name would be classified as a multivalued attribute of the Company entity (and its key, company-name). The headquarters-address attribute of the company, on the other hand, would normally be a single-valued attribute.

Classify multivalued attributes as entities. In this example, the multivalued attribute division-name should be reclassified as an entity Division with division-name as its identifier (key) and division-address as a descriptor attribute. If attributes are restricted to be single valued only, the later design and implementation decisions will be simplified.
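The reclassification can be pictured with two record types. This Python sketch is illustrative only; the company_name field on Division is an assumption about how the two entities would be linked, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    company_name: str          # identifier (key)
    headquarters_address: str  # single-valued, so it stays an attribute


@dataclass(frozen=True)
class Division:
    division_name: str     # identifier (key)
    division_address: str  # descriptor
    company_name: str      # assumption: reference to the owning Company


acme = Company("Acme", "1 Main St, Springfield")
divisions = [
    Division("Tools", "Shelbyville", acme.company_name),
    Division("Paints", "Capital City", acme.company_name),
]
```

Every attribute is now single valued: a company with many divisions is represented by many Division records rather than by one multivalued field.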

Attribute Attachment

Attach attributes to the entities they most directly describe. For example, the attribute office-building-name should normally be an attribute of the entity Department, rather than the entity Employee. The procedure of identifying entities and attaching attributes to entities is iterative. Classify some data elements as entities and attach identifiers and descriptors to them. If you find some violation of the preceding guidelines, change some data elements from entity to attribute (or from attribute to entity), attach attributes to the new entities, and so forth.

Identify the Generalization Hierarchies

If there is a generalization hierarchy among entities, then put the identifier and generic descriptors in the supertype entity and put the same identifier and specific descriptors in the subtype entities.

For example, suppose five entities were identified in the ER model shown in Figure 2.5(a):

Employee, with identifier empno and descriptors empname, address, and date-of-birth.

Manager, with identifier empno and descriptors empname and jobtitle.

Engineer, with identifier empno and descriptors empname, highest-degree, and jobtitle.

Technician, with identifier empno, and descriptors empname and specialty.

Secretary, with identifier empno, and descriptors empname and best-skill.

Let's say we determine, through our analysis, that the entity Employee could be created as a generalization of Manager, Engineer, Technician, and Secretary. Then we put identifier empno and generic descriptors empname, address, and date-of-birth in the supertype entity Employee; identifier empno and specific descriptor jobtitle in the subtype entity Manager; identifier empno and specific descriptor highest-degree and jobtitle in the subtype entity Engineer; etc. Later, if we decide to eliminate Employee as an entity, the original identifiers and generic attributes can be redistributed to all the subtype entities.
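One way to picture this placement of identifiers and descriptors, and to enforce that an employee belongs to only one subtype at a time, is sketched below using Python's built-in sqlite3 module. This is an illustration under stated assumptions rather than the book's transformation rules: the discriminator column emp_type is not part of the text, hyphenated names are written with underscores, and only two of the four subtypes are shown for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE employee (              -- supertype: identifier + generic descriptors
    empno         INTEGER PRIMARY KEY,
    empname       TEXT NOT NULL,
    address       TEXT,
    date_of_birth TEXT,
    emp_type      TEXT NOT NULL      -- assumed discriminator column
        CHECK (emp_type IN ('manager', 'engineer', 'technician', 'secretary')),
    UNIQUE (empno, emp_type)
);
CREATE TABLE engineer (              -- subtype: same identifier + specific descriptors
    empno          INTEGER PRIMARY KEY,
    emp_type       TEXT NOT NULL DEFAULT 'engineer' CHECK (emp_type = 'engineer'),
    highest_degree TEXT,
    jobtitle       TEXT,
    FOREIGN KEY (empno, emp_type) REFERENCES employee (empno, emp_type)
);
CREATE TABLE technician (
    empno     INTEGER PRIMARY KEY,
    emp_type  TEXT NOT NULL DEFAULT 'technician' CHECK (emp_type = 'technician'),
    specialty TEXT,
    FOREIGN KEY (empno, emp_type) REFERENCES employee (empno, emp_type)
);
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ada', NULL, NULL, 'engineer')")
conn.execute(
    "INSERT INTO engineer (empno, highest_degree, jobtitle) VALUES (1, 'PhD', 'Lead')"
)


def try_insert_technician(empno: int) -> bool:
    """Attempt to also register the employee as a technician."""
    try:
        conn.execute(
            "INSERT INTO technician (empno, specialty) VALUES (?, 'optics')", (empno,)
        )
        return True
    except sqlite3.IntegrityError:
        return False  # the composite foreign key rejects a second subtype


print(try_insert_technician(1))
```

Because each subtype row must match the (empno, emp_type) pair stored in the supertype, an employee recorded as an engineer cannot simultaneously appear as a technician.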

Define Relationships

We now deal with data elements that represent associations among entities, which we call relationships. Examples of typical relationships are works-in, works-for, purchases, drives, or any verb that connects entities. For every relationship the following should be specified: degree (binary, ternary, etc.), connectivity (one-to-many, etc.), optional or mandatory existence, and any attributes that are associated with the relationship and not the entities. The following are some guidelines for defining the more difficult types of relationships.

Redundant Relationships

Analyze redundant relationships carefully. Two or more relationships that are used to represent the same concept are considered to be redundant. Redundant relationships are more likely to result in unnormalized tables when transforming the ER model into relational schemas. Note that two or more relationships are allowed between the same two entities as long as those relationships have different meanings. In this case they are not considered redundant. One important case of nonredundancy is shown in Figure 4.1(a) for the ER model and Figure 4.1(c) for UML. If “belongs-to” is a one-to-many relationship between Employee and Professional-association, if “located-in” is a one-to-many relationship between Professional-association and City, and if “lives-in” is a one-to-many relationship between Employee and City, then “lives-in” is not redundant because the relationships are unrelated. However, consider the situation shown in Figure 4.1(b) for the ER model and Figure 4.1(d) for UML. The employee works on a project located in a city, so the “works-in” relationship between Employee and City is redundant and can be eliminated.

Figure 4.1. Examples of redundant and nonredundant relationships: (a) nonredundant relationships, (b) redundant relationships using transitivity, (c) nonredundant associations, and (d) redundant associations using transitivity.
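The transitivity argument for the redundant case can be sketched in Python. The sketch assumes, for simplicity, that each employee works on exactly one project and each project is located in exactly one city; the sample data and function names are made up.

```python
# Employee -> Project and Project -> City (hypothetical sample data).
works_on = {"ann": "apollo", "bob": "gemini"}
located_in = {"apollo": "houston", "gemini": "cape_town"}


def derive_works_in(works_on: dict, located_in: dict) -> dict:
    """Compose the two relationships to obtain Employee -> City."""
    return {emp: located_in[proj] for emp, proj in works_on.items()}


def is_redundant(stored_works_in: dict, works_on: dict, located_in: dict) -> bool:
    """A stored relationship is redundant if it can be fully derived."""
    return stored_works_in == derive_works_in(works_on, located_in)


stored = {"ann": "houston", "bob": "cape_town"}
print(is_redundant(stored, works_on, located_in))
```

In the nonredundant case, where an employee's city of residence is unrelated to the association's city, no such composition reproduces the stored relationship.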

Ternary Relationships

Define ternary relationships carefully. We define a ternary relationship among three entities only when the concept cannot be represented by several binary relationships among those entities. For example, let us assume there is some association among entities Technician, Project, and Notebook. If each technician can be working on any of several projects and using the same notebooks on each project, then three many-to-many binary relationships can be defined (see Figure 4.2(a) for the ER model and Figure 4.2(c) for UML). If, however, each technician is constrained to use exactly one notebook for each project and that notebook belongs to only one technician, then a one-to-one-to-one ternary relationship should be defined (see Figure 4.2(b) for the ER model and Figure 4.2(d) for UML). The approach to take in ER modeling is to first attempt to express the associations in terms of binary relationships; if this is impossible because of the constraints of the associations, try to express them in terms of a ternary relationship.

Figure 4.2. Comparison of binary and ternary relationships: (a) binary relationships, (b) different meaning using a ternary relationship, (c) binary associations, and (d) different meaning using a ternary association.

The meaning of connectivity for ternary relationships is important. Figure 4.2(b) shows that for a given pair of instances of Technician and Project, there is only one corresponding instance of Notebook; for a given pair of instances of Technician and Notebook, there is only one corresponding instance of Project; and for a given pair of instances of Project and Notebook, there is only one instance of Technician. In general, we know by our definition of ternary relationships that if a relationship among three entities can only be expressed by a functional dependency involving the keys of all three entities, then it cannot be expressed using only binary relationships, which only apply to associations between two entities. Object-oriented design provides arguably a better way to model this situation (Muller, 1999).
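The three dependencies of a one-to-one-to-one ternary relationship can be tested directly on sample data. The following Python sketch is an illustration; the function name and the sample triples are assumptions.

```python
def holds_one_to_one_to_one(triples: list[tuple[str, str, str]]) -> bool:
    """Check that every pair of positions determines the remaining one.
    Positions: 0 = Technician, 1 = Project, 2 = Notebook."""
    for first, second in ((0, 1), (0, 2), (1, 2)):
        dependent = ({0, 1, 2} - {first, second}).pop()
        seen = {}
        for row in triples:
            key = (row[first], row[second])
            # setdefault stores the first value seen for this pair
            if seen.setdefault(key, row[dependent]) != row[dependent]:
                return False
    return True


assignments = [
    ("t1", "p1", "n1"),
    ("t1", "p2", "n2"),
    ("t2", "p1", "n3"),
]
print(holds_one_to_one_to_one(assignments))
```

Adding a row such as ("t1", "p1", "n4") would violate the first dependency, because the pair (t1, p1) would then correspond to two notebooks.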

Example of Data Modeling: Company Personnel and Project Database

ER Modeling of Individual Views Based on Requirements

Let us suppose it is desirable to build a company-wide database for a large engineering firm that keeps track of all full-time personnel, their skills and projects assigned, department (and divisions) worked in, engineer professional associations belonged to, and engineer desktop computers allocated. During the requirements collection process—that is, interviewing the end users—we obtain three views of the database.

The first view, a management view, defines each employee as working in a single department, and defines a division as the basic unit in the company, consisting of many departments. Each division and department has a manager, and we want to keep track of each manager. The ER model for this view is shown in Figure 4.3(a).

Figure 4.3. Example of data modeling: (a) management view, (b) employee view, (c) employee assignment view, and (d) global ER schema.

The second view defines each employee as having a job title: engineer, technician, secretary, manager, and so on. Engineers typically belong to professional associations and might be allocated an engineering workstation (or computer). Secretaries and managers are each allocated a desktop computer. A pool of desktops and workstations is maintained for potential allocation to new employees and for loans while an employee's computer is being repaired. Any employee may be married to another employee, and we want to keep track of this relationship to avoid assigning an employee to be managed by his or her spouse. This view is illustrated in Figure 4.3(b).

The third view, shown in Figure 4.3(c), involves the assignment of employees, mainly engineers and technicians, to projects. Employees may work on several projects at one time, and each project could be headquartered at different locations (cities). However, each employee at a given location works on only one project at that location. Employee skills can be individually selected for a given project, but no individual has a monopoly on skills, projects, or locations.

Global ER Schema

A simple integration of the three views just defined over the entity Employee results in the global ER schema (diagram) in Figure 4.3(d), which becomes the basis for developing the normalized tables. Each relationship in the global schema is based on a verifiable assertion about the actual data in the enterprise, and analysis of those assertions leads to the transformation of these ER constructs into candidate SQL tables, as Chapter 5 shows.

Note that equivalent views and integration could be done for a UML conceptual model over the class Employee. We will use the ER model for the examples in the rest of this chapter, however.

The diagram shows examples of binary, ternary, and binary recursive relationships; optional and mandatory existence in relationships; and generalization with the disjointness constraint. Ternary relationships “skill-used” and “assigned-to” are necessary because binary relationships cannot be used for the equivalent notions. For example, one employee and one location determine exactly one project (a functional dependency). In the case of “skill-used,” selective use of skills to projects cannot be represented with binary relationships.

The use of optional existence, for instance, between Employee and Division or between Employee and Department, is derived from our general knowledge that most employees will not be managers of any division or department. In another example of optional existence, we show that the allocation of a workstation to an engineer may not always occur, nor will all desktops or workstations necessarily be allocated to someone at all times. In general, all relationships, optional existence constraints, and generalization constructs need to be verified with the end user before the ER model is transformed to SQL tables.

In summary, the application of the ER model to relational database design offers the following benefits:

Use of an ER approach focuses end users' discussions on important relationships between entities. Some applications are characterized by counterexamples affecting a small number of instances, and lengthy consideration of these instances can divert attention from basic relationships.

A diagrammatic syntax conveys a great deal of information in a compact, readily understandable form.

Extensions to the original ER model, such as optional and mandatory membership classes, are important in many relationships. Generalization allows entities to be grouped for one functional role or to be seen as separate subtypes when other constraints are imposed.

A complete set of rules transforms ER constructs into mostly normalized SQL tables, which follow easily from real-world requirements.

URL: https://www.sciencedirect.com/science/article/pii/B9780123820204000045

Smarthome Configuration Model

Lothar Hotz, Katharina Wolter, in Knowledge-Based Configuration, 2014

10.3.2 A System Model of Smarthomes

Figures 10.3 and 10.4 present the system model of Smarthomes in two parts, separated for ease of illustration. Figure 10.3 illustrates the definition of the compositional structure of the SmartHomeSystem and Figure 10.4 illustrates major parts of the generalization hierarchy. The following component types are defined:

Figure 10.4. The Smarthome system model (Part 2) with component hierarchy including attributes. SmartHomeObject is the taxonomical root of all classes.

A SmartHomeSystem consists of a voltage source (VoltageSource), an optional central control unit (CentralControlUnit), up to five smart rooms (SmartRoom), and between zero and five outside sensors (SensorDeviceOutside).

The CentralControlUnit allows a manual central regulation of all devices. The VoltageSource has to supply electrical power such that all network nodes can be included in the network.

Each SmartRoom consists of light groups (LightGroup), control units (ControlUnit) such as switches (Switch) and dimmers (Dimmer), a sun-blinds control (SunBlindsControl), and several sensors (SensorInside).

LightGroups include a number of lights that can be regulated together through a control unit (ControlUnit). A control unit can regulate several light groups (directed association regulates). A sun-blinds control (SunBlindsControl) regulates the sun blinds, which themselves are not part of the smart room model. Instances of SensorInside (i.e., sensors of the room) provide signals on the bus that are interpreted by an appropriate control.

The outside sensors (SensorDeviceOutside) provide the sensor data for the control units. In our model, the SunBlindsControl interprets sensor data, while a ControlUnit reacts to sensor data as well as to manual actions (e.g., via a Switch). Specific sensors inside a room monitor the presence of a person in the room.

The component types Switch and SunBlindsControl have a SwitchType. Thus, SwitchType is a shared class; however, each control has its individual switch type.

Figure 10.4 illustrates the generalization hierarchy of all classes used in the system model. While the compositional hierarchy is based on general classes, the generalization hierarchy includes specific components of certain companies. For example, the general class SensorInside is specialized into inside sensors that can detect both presence and brightness (here called PBS-A and PBS-B for Presence-Brightness Sensor, both of type PresenceAndBrightness) or additionally measure temperature (PBTS-A of type PresenceAndBrightnessAndTemperature).

As mentioned in Subsection 10.2, a local operating network connects the components. The class NetworkNode summarizes all components that are connected through the bus (i.e., sensors and control units). By doing so, it is possible to define attributes as well as constraints that apply to all network nodes. As an example, see the System Attribute Constraints in Subsection 10.4.2, which restrict the PowerConsumption of NetworkNodes.

The outside sensors (SensorDeviceOutside) can be specialized to the specific sensor device SensorAggregate that may contain several types of sensors (such as wind or temperature sensors) or can be specialized to Multisensor. The parts of SensorAggregate (i.e., WindSensor, OutsideTemperatureSensor, BrightnessSensor, and RainfallSensor) are modeled as further sensors. A Multisensor covers brightness and presence sensors in one component. This choice between a combination of distinct sensors into a sensor aggregate and the use of one multisensor, which includes various sensors, is one of the major system-related decisions for a Smarthome system. Depending on the selected features, a combination of sensors or choice of the multisensor is possible. The Constraint fsc-3 represents this dependency (see Subsection 10.4.3 and Table 10.1). These sensors furthermore illustrate the use of attribute Price, which all basic component types have by inheriting it from ProductComponent. Thus, the attribute Price does not have to be modeled for each component type and only specific prices have to be defined for the more specific components.

Table 10.1. Certain components realize combinations of the SunBlindsFeature. An extensional table represents the constraint fsc-3 by enumerating possible sensor combinations. The following abbreviations hold: WD: WindDep., BD: BrightnessDep., TD: TemperatureDep., SA: SensorAggregate, WS: WindSensor, BS: BrightnessSensor, TS: TemperatureSensor, MS: MultiSensor.

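Feature Selection (SunBlindsFeature): WD, BD, TD | Required Components (SensorDeviceOutside): SA, WS, BS, TS, MS
WD  | BD  | TD  | SA  | WS  | BS  | TS  | MS
yes | yes | yes | yes | yes | yes | yes | no
yes | yes | no  | yes | yes | yes | no  | no
yes | no  | yes | yes | yes | no  | yes | no
yes | no  | no  | yes | yes | no  | no  | no
no  | yes | yes | no  | no  | no  | no  | yes
no  | yes | no  | no  | no  | no  | no  | yes
no  | no  | yes | no  | no  | no  | no  | yes
no  | no  | no  | no  | no  | no  | no  | no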
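
An extensional constraint such as fsc-3 is easy to express as a lookup table. The following is a minimal sketch in Python, transcribed from the rows of Table 10.1; the names `FSC3`, `required_components`, and `satisfies_fsc3` are illustrative assumptions and do not come from the source.

```python
# Extensional encoding of constraint fsc-3, transcribed from Table 10.1.
# Key: (WindDep., BrightnessDep., TemperatureDep.) feature selection.
# Value: the set of outside-sensor components required for that selection.
FSC3 = {
    (True, True, True): {"SA", "WS", "BS", "TS"},
    (True, True, False): {"SA", "WS", "BS"},
    (True, False, True): {"SA", "WS", "TS"},
    (True, False, False): {"SA", "WS"},
    (False, True, True): {"MS"},
    (False, True, False): {"MS"},
    (False, False, True): {"MS"},
    (False, False, False): set(),
}


def required_components(wd, bd, td):
    """Return a copy of the component set the table prescribes."""
    return set(FSC3[(wd, bd, td)])


def satisfies_fsc3(wd, bd, td, components):
    """True if the chosen components match the table row exactly."""
    return set(components) == FSC3[(wd, bd, td)]


print(sorted(required_components(True, False, True)))
```

Because the table enumerates every combination of the three features, a configuration is valid exactly when its component set equals the row selected by its features.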

URL: https://www.sciencedirect.com/science/article/pii/B9780124158177000104

Configuration Knowledge Representation and Reasoning

Lothar Hotz, ... Katharina Wolter, in Knowledge-Based Configuration, 2014

Computer Configuration Model in UML: Structure

In a configuration model (see Figure 6.7), the structure of a configurable product is defined on the basis of the modeling facilities component types (concepts or classes), associations with multiplicities, and generalizations. Note that existing commercial configuration environments do not directly support UML-based representations but typically include similar modeling facilities that allow the representation of partonomies, generalization hierarchies, and constraints.

Component types: A component type has a unique name and is characterized by a set of attributes. Attributes are defined on the basis of datatypes (the datatype of each attribute is defined in [datatype], which can denote a constant, an enumeration, or a range). For example, maxprice[0..2500] specifies an integer range attribute of the component type PC. In the examples in this book, attributes are single-valued; that is, no attribute has more than one value.

Associations and Multiplicities: The part-of modeling facility is used to describe part-of associations between component types. In its simplest form, these associations are assumed to be of type composite (not shared); this means that no instance (component) of a component type can be part of more than one instance (whole component). For example, each CPU is part of exactly one MB (motherboard) and each MB consists of one or two CPUs. Note that we apply multiplicities to further describe associations between component types. Other examples of multiplicities are the following: each PC (personal computer) consists of one or more Applications (no upper limit defined here) and each Application is part of exactly one PC. Each harddisk (HDisk) has exactly one DiskPort and each DiskPort is associated with one HDisk (within the same HDUnit). Furthermore, each DiskPort is connected with a ControllerPort. Note that additional types of associations are included in the individual book chapters where needed.

Generalizations: This modeling facility relates two or more component types through a subset relation. The generalization relationship between subtypes and supertype (or the inverse specialization relationship between supertype and subtypes) can be characterized as disjoint and complete. Disjointness means that each instance of a component type X can be assigned to only one of the subtypes of X. For example, each CPU is either of type CPUS or CPUD but not both. Completeness means that each instance is assigned to one of the leaf nodes of the generalization hierarchy. Furthermore, generalization hierarchies in the configuration context typically do not allow multiple inheritance. Again, further modeling facilities with different semantics are introduced in the other chapters of this book where needed. Note that for reasons of simplicity no definition of specific application types is included in our example; it is assumed that each instance of type Application has the same required hdcapacity (200) and the same price, which is 50. In a complete model of a personal computer additional subtypes would be included or defined as part of a corresponding component catalog.
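
The three modeling facilities can be illustrated with a small validity check. This is only a sketch under stated assumptions: the class and function names (`CPU`, `MB`, `check_configuration`) and the use of a `subtype` field are hypothetical choices made for illustration, not the representation used by any configuration environment. Storing exactly one subtype per instance makes the generalization disjoint by construction, and requiring that subtype to be a leaf type enforces completeness.

```python
from dataclasses import dataclass, field

# Leaf subtypes of CPU named in the text (disjoint and complete).
CPU_SUBTYPES = {"CPUS", "CPUD"}


@dataclass(eq=False)
class CPU:
    subtype: str  # exactly one subtype per instance


@dataclass(eq=False)
class MB:
    cpus: list = field(default_factory=list)


def check_configuration(mbs):
    """Return a list of violated constraints (empty if valid)."""
    errors = []
    owner = {}  # id of a CPU instance -> index of the MB that contains it
    for i, mb in enumerate(mbs):
        # Multiplicity: each MB consists of one or two CPUs.
        if not 1 <= len(mb.cpus) <= 2:
            errors.append(f"MB {i}: must consist of one or two CPUs")
        for cpu in mb.cpus:
            # Completeness: every instance belongs to a leaf type.
            if cpu.subtype not in CPU_SUBTYPES:
                errors.append(f"MB {i}: {cpu.subtype!r} is not a leaf type")
            # Composite association: a CPU is part of exactly one MB.
            if id(cpu) in owner and owner[id(cpu)] != i:
                errors.append(f"MB {i}: CPU already part of MB {owner[id(cpu)]}")
            owner.setdefault(id(cpu), i)
    return errors


print(check_configuration([MB([CPU("CPUS"), CPU("CPUD")])]))
```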

URL: https://www.sciencedirect.com/science/article/pii/B9780124158177000062

The Entity–Relationship Model

Toby Teorey, ... H.V. Jagadish, in Database Modeling and Design (Fifth Edition), 2011

Generalization: Supertypes and Subtypes

The original ER model has been effectively used for communicating fundamental data and relationship definitions with the end user for a long time. However, using it to develop and integrate conceptual models with different end user views was severely limited until it could be extended to include database abstraction concepts such as generalization. The generalization relationship specifies that several types of entities with certain common attributes can be generalized into a higher-level entity type—a generic or superclass entity, which is more commonly known as a supertype entity. The lower levels of entities—subtypes in a generalization hierarchy—can be either disjoint or overlapping subsets of the supertype entity. As an example, in Figure 2.5 the entity Employee is a higher-level abstraction of Manager, Engineer, Technician, and Secretary, all of which are disjoint types of Employee. The ER model construct for the generalization abstraction is the connection of a supertype entity with its subtypes, using a circle and the subset symbol on the connecting lines from the circle to the subtype entities. The circle contains a letter specifying a disjointness constraint (see the following discussion). Specialization, the reverse of generalization, is an inversion of the same concept; it indicates that subtypes specialize the supertype.

Figure 2.5. Supertypes and subtypes: (a) generalization with disjoint subtypes, and (b) generalization with overlapping subtypes and completeness constraint.

A supertype entity in one relationship may be a subtype entity in another relationship. When a structure comprises a combination of supertype/subtype relationships, that structure is called a supertype/subtype hierarchy, or generalization hierarchy. Generalization can also be described in terms of inheritance, which specifies that all the attributes of a supertype are propagated down the hierarchy to entities of a lower type. Generalization may occur when a generic entity, which we call the supertype entity, is partitioned by different values of a common attribute. For example, in Figure 2.5, the entity Employee is a generalization of Manager, Engineer, Technician, and Secretary over the attribute job-title in Employee.

Generalization can be further classified by two important constraints on the subtype entities: disjointness and completeness. The disjointness constraint requires the subtype entities to be mutually exclusive. We denote this type of constraint by the letter “d” written inside the generalization circle (Figure 2.5a). Subtypes that are not disjoint (i.e., that overlap) are designated by using the letter “o” inside the circle. As an example, the supertype entity Individual has two subtype entities, Employee and Customer; these subtypes could be described as overlapping or not mutually exclusive (Figure 2.5b). Regardless of whether the subtypes are disjoint or overlapping, they may have additional special attributes in addition to the generic (inherited) attributes from the supertype.

The completeness constraint requires the subtypes to be all-inclusive of the supertype. Thus, subtypes can be defined as either total or partial coverage of the supertype. For example, in a generalization hierarchy with supertype Individual and subtypes Employee and Customer, the subtypes may be described as all-inclusive or total. We denote this type of constraint by a double line between the supertype entity and the circle. This is indicated in Figure 2.5(b), which implies that the only types of individuals to be considered in the database are employees and customers.
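
Both constraints reduce to simple set conditions, which the following sketch makes explicit. The function names and the instance identifiers (`e1`, `p1`, and so on) are made up for illustration; only the entity names come from the text.

```python
def is_disjoint(subtypes):
    """Disjointness: no instance belongs to more than one subtype."""
    seen = set()
    for members in subtypes.values():
        if seen & members:
            return False
        seen |= members
    return True


def is_total(supertype, subtypes):
    """Completeness: every supertype instance is in some subtype."""
    covered = set().union(*subtypes.values())
    return supertype <= covered


# Employee with disjoint subtypes (hypothetical instances).
employee = {"e1", "e2", "e3", "e4"}
employee_subtypes = {"Manager": {"e1"}, "Engineer": {"e2", "e3"}}

# Individual with overlapping subtypes and total coverage.
individual = {"p1", "p2", "p3"}
individual_subtypes = {"Employee": {"p1", "p2"}, "Customer": {"p2", "p3"}}

print(is_disjoint(employee_subtypes), is_total(employee, employee_subtypes))
print(is_disjoint(individual_subtypes), is_total(individual, individual_subtypes))
```

In this toy data the Employee hierarchy is disjoint but partial (instance `e4` has no subtype), while the Individual hierarchy is overlapping but total.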

URL: https://www.sciencedirect.com/science/article/pii/B9780123820204000021

Discovery of Abstract Knowledge from Non-Atomic Attribute Values in Fuzzy Relational Databases

Rafal A. Angryk, Frederick E. Petry, in Modern Information Processing, 2006

3 Attribute-Oriented Induction from fuzzy tuples

3.1 Building concept hierarchy from α-proximity table

The creation of an α-proximity relation for a particular domain Dj can lead to the extraction of a crisp concept hierarchy, allowing attribute-oriented induction on such a domain. From the propagation of shadings in Table 2, we can easily observe that the equivalence classes marked in the table have a nested character.

As in the case of a fuzzy similarity relation [21], each α-cut (where α∈[0,1]) of a fuzzy binary relation in Table 2 creates disjoint equivalence classes in the domain Dj. If we let Πα denote a single equivalence class partition induced on domain Dj by a single α-level set, then by an increase of the value of α to α' we are able to extract the subclass of Πα, denoted Πα' (a refinement of the previous equivalence class partition). A nested sequence of partitions Πα1, Πα2, …, Παk, where α1 < α2 < … < αk, may be represented in the form of a partition tree, as in Figure 1.

Fig. 1. Partition tree of domain HAIR COLOR, built on the basis of Table 2.

This nested sequence of partitions in the form of a tree has a structure identical to the crisp concept hierarchy used for AOI. The increase of conceptual abstraction in the partition tree is denoted by decreasing values of α; lack of abstraction during generalization (0-abstraction level at the bottom of the generalization hierarchy) corresponds to the 1-cut of the similarity relation (α = 1.0), and is denoted as S1.0.

An advantage of the utilization of the proximity-based fuzzy model is that such a hierarchy, by definition implemented in every such fuzzy database, can be extracted automatically for a user who has no background knowledge about the specific domain.

The only thing differentiating the hierarchy in Figure 1 from the crisp concept hierarchies applicable for AOI is the lack of abstract concepts, which are used as the labels characterizing the sets of generalized (grouped) concepts. To create a complete set of the abstract labels it is sufficient to choose only one value of the attribute per equivalence class at each level of the hierarchy (α), and assign a unique abstract descriptor to it. Sets of such definitions (value of attribute and value of α linked with an abstract name) can be stored as a relational database table (Table 3), where the first two columns create a natural key for this relation.

Table 3. Table of abstract descriptors (for Figure 1).

ATTRIBUTE VALUE | ABSTRACTION LEVEL (α) | ABSTRACT DESCRIPTOR
black           | 0.8                   | DARKISH
red             | 0.8                   | REDDISH
blond           | 0.8                   | BLONDISH
black           | 0.7                   | DARK
blond           | 0.7                   | BLONDISH
black           | 0.5                   | ANY

The combination of the partition tree in Figure 1 and the table of abstract descriptors allows us to build the generalization hierarchy in the form shown in Figure 2.

Fig. 2. Crisp generalization hierarchy formed using Tables 2 and 3.

The disjoint character of equivalence classes generated from the α-proximity table does not allow any concept in the hierarchy to have more than one direct abstract at every level of generalization hierarchy. Therefore this approach can be utilized only to form a crisp generalization hierarchy. Such a hierarchy, however, can be then successfully applied as a foundation to the development of a fuzzy concept hierarchy – by extending it with additional edges to represent partial membership of the lower level concepts in their direct abstract descriptors. Depending on the assigned memberships, reflecting preferences of the user, this can create consistent or inconsistent fuzzy concept hierarchies.

3.2 Character of imprecision reflected in fuzzy records

Before introducing our approach to AOI from imprecise data, let us analyze briefly the nature of the uncertainty representation allowed in the fuzzy database model. There are two actual representations of imprecision in the fuzzy database schema. First, as already mentioned, is the occurrence of multiple attribute values. Obviously, the more descriptors we use to characterize a particular record in the database, the more imprecise is its depiction. Uncertainty about the description is also implicitly reflected in the similarity of values characterizing a particular entity, e.g. when we describe someone's hair as {black, dark brown, red, auburn} we have more doubt about the person's hair colour than in the case when we characterize it as {blond, dark blond, light brown, brown}, since this description would be rather immediately interpreted as “blondish”. There are the same number of attribute values in each case, however the higher similarity of values utilized in the second set results in the higher informativeness carried by the second example.

The imprecision of the original information is actually reflected both in the number of inserted descriptors for a particular attribute and in the similarity of these values. In Table 4 we summarize observations concerning their relationship. The domain called Quantity of attribute values is a discrete set of integer numbers (> 0, since the fuzzy model does not allow empty attributes); the Similarity of attribute values is characterized in fuzzy databases with a continuous set of real numbers in a range [0,1] – the values of α.

Table 4. Character of information stored in the Fuzzy Databases.

Quantity of attr. values / Similarity of attr. values | LOW                         | HIGH
SMALL                                                 | Imprecise                   | Precise
LARGE                                                 | Imprecise (Error suspected) | Precise (Confirmed)

The simplified characterization of data imprecision presented in Table 4 can be enhanced with a brief analysis of the boundary values. The measure of imprecision can be thought of as ranging between 0 (i.e., the lack of uncertainty about results) and infinity (maximum imprecision). The common opinion that even flawed information is better than a lack of information leads us to say that imprecision reaches its maximum when no data is inserted at all. Since the fuzzy database model does not allow empty attributes we will not consider this further. The minimum imprecision (0-level) is achieved by a single attribute value. If there are no other descriptors or auxiliary information, we must assume the inserted value is a perfect characterization of the particular entity's feature. The same minimum can also be accomplished with multiple values if they all have identical meaning (synonyms). Despite the fact that multiple, identical descriptors additionally confirm an initially inserted value, they cannot lead to further reduction of imprecision, since it already has the minimal value. Therefore the descriptors that are so similar that they are considered to be identical can be reduced to a single descriptor. Obviously, some attribute values, initially considered as different, may be treated as identical at a higher abstraction level. Therefore we can conclude that the practically achievable minimum of imprecision depends on the abstraction level of the employed descriptors, and can reach its original 0-level only at the lowest level of abstraction (for α = 1.0 in our fuzzy database model).

3.3 Partial Vote Propagation to generalize imprecise data

Since the fuzzy database model permits the reflection of uncertainty about the value characterizing each feature via insertion of multiple attribute descriptors, it is necessary to provide a mechanism allowing AOI from such data. In this section we propose a method enabling generalization of multiple attribute values, based on the dependencies presented in the previous section.

In the case of attribute generalization with utilization of a concept hierarchy (Figure 2), we have single attribute values at the bottom level of the hierarchy. Therefore the generalization of tuples with single descriptors is straightforward. A problem arises with the case of multiple attribute values describing a single entity. Where should we assign a person whose hair was described as {d.brown, auburn, red}? Our solution is based on partial vote propagation, where a single vote, corresponding to one database record, is partitioned to represent each of the originally inserted attribute values. During AOI all fractions of this vote propagate gradually through multiple levels of the generalization hierarchy, the same way as the regular (precise) records do. The only difference is that the record with uncertainty has multiple generalization paths (different paths for different fractions of the vote), whereas each of the precise records has only one generalization path.

The most trivial solution would be to split the vote equally among all inserted descriptors: {d.brown|0.(3), auburn|0.(3), red|0.(3)}. This approach, however, does not take into consideration real-life dependencies, which are reflected not only in the number of inserted descriptors, but also in their similarity. We propose replacement of the even distribution of the vote with a nonlinear spread, dependent on the similarity and the number of inserted values. Using the partition tree (Figure 1), we can extract from the set of the originally inserted values the concepts which are more similar to each other than to the remaining descriptors; we call these subsets of resemblances (e.g., {red, auburn} from the above-mentioned example). Then we use the subset as a base for calculating the new fractions of the vote. An important aspect of this approach is extraction of the subsets of similarities at the lowest possible level of their occurrence, since the nested character of the α-proximity relation guarantees that above this α-level they will always occur together. Repetitive extraction of such subsets could unbalance the original dependencies among inserted values.

The proposed approach is rather straightforward given (1) a set of attribute values inserted as a description of a particular entity (i.e., the Set of Descriptors), and (2) a hierarchical structure (tree) reflecting Zadeh's partition tree for the particular attribute (Figure 1). We want to extract the list of all subsets of similarities from the given Set of Descriptors, with the highest level of α-proximity of their common occurrence. This is achieved by a preorder recursive traversal of the partition tree. Searching from the root of the tree, if any subset of the given Set of Descriptors occurs at a particular node of the concept hierarchy, we store the values that were recognized as similar, and the value of α. In Figure 3 we present an example of such a search for subsets of similarities for a record with values {black, d.brown, blond, red}. Numbers on the links in the tree represent the order in which the particular subsets of similarities were extracted.

Fig. 3. Subsets of similar values extracted from the original set of descriptors.
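
The preorder search can be sketched in a few lines of Python. Note the assumptions: Table 2 and Figure 1 are not reproduced here, so the tree below is a hypothetical five-value HAIR COLOR domain, chosen only to be consistent with Table 3 and the examples in the text. The node layout `(alpha, members, children)` and the function name `extract_subsets` are likewise illustrative. Each distinct subset is recorded once, at the highest α-level at which it occurs.

```python
def leaf(value):
    """A bottom-level node (alpha = 1.0) holding a single attribute value."""
    return (1.0, frozenset([value]), [])


# Hypothetical partition tree: node = (alpha, members, children).
TREE = (0.5, frozenset({"black", "d.brown", "auburn", "red", "blond"}), [
    (0.7, frozenset({"black", "d.brown", "auburn", "red"}), [
        (0.8, frozenset({"black", "d.brown"}),
         [leaf("black"), leaf("d.brown")]),
        (0.8, frozenset({"auburn", "red"}),
         [leaf("auburn"), leaf("red")]),
    ]),
    (0.7, frozenset({"blond"}), [
        (0.8, frozenset({"blond"}), [leaf("blond")]),
    ]),
])


def extract_subsets(node, descriptors, found=None):
    """Preorder traversal: map each subset of similarities to its highest alpha."""
    if found is None:
        found = {}
    alpha, members, children = node
    common = members & frozenset(descriptors)
    if common:
        # Keep the highest alpha at which this exact subset occurs.
        found[common] = max(alpha, found.get(common, 0.0))
        for child in children:
            extract_subsets(child, descriptors, found)
    return found


subsets = extract_subsets(TREE, {"black", "d.brown", "blond", "red"})
for values, alpha in sorted(subsets.items(), key=lambda kv: kv[1]):
    print(alpha, sorted(values))
```

Subtrees that share no value with the Set of Descriptors are pruned, since none of their descendants can contribute a subset.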

After extracting the subsets of similarities (Figure 3), we sum the α values as a measure reflecting both the frequency of occurrence of the particular attribute values in the subsets of similarities and the abstraction level of these occurrences. Since the value blond appeared only at the top and the bottom level, we assign it a grade of 1.5 (1.0 + 0.5). The remaining attribute values were graded as follows:

black|(1.0 + 0.8 + 0.7 + 0.5) = black|3.0

d.brown|(1.0 + 0.8 + 0.7 + 0.5) = d.brown|3.0

red|(1.0 + 0.7 + 0.5) = red|2.2

In the next step, we use the sum of all generated grades (1.5 + 3.0 + 3.0 + 2.2 = 9.7) in order to normalize the grades finally assigned to each of the participating attribute values:

black|(3.0/9.7) = black|0.31

d.brown|(3.0/9.7) = d.brown|0.31

red|(2.2/9.7) = red|0.23

blond|(1.5/9.7) = blond|0.15

This new distribution of the vote's fractions more accurately reflects real life dependencies than a linear approach. The final results are shown in Figure 4.

Fig. 4. Partial Vote Propagation for records with uncertainty.

Normalization of the initial grades has a crucial meaning for preservation of the generalization model's completeness. It guarantees that each of the records is represented as a unity, despite being variously distributed at each of the generalization levels.
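
The grading and normalization arithmetic can be reproduced with a short script. This is a sketch, not the authors' implementation: the list `SUBSETS` is inferred from the grades quoted in the worked example (Figure 3 itself is not reproduced here), and the function name `vote_fractions` is an assumption.

```python
# Subsets of similarities for {black, d.brown, blond, red}, each paired
# with the alpha-level at which it was extracted (inferred from the text).
SUBSETS = [
    ({"black"}, 1.0),
    ({"d.brown"}, 1.0),
    ({"red"}, 1.0),
    ({"blond"}, 1.0),
    ({"black", "d.brown"}, 0.8),
    ({"black", "d.brown", "red"}, 0.7),
    ({"black", "d.brown", "red", "blond"}, 0.5),
]


def vote_fractions(subsets):
    """Sum alpha per value, then normalize so the fractions add up to one."""
    grades = {}
    for values, alpha in subsets:
        for value in values:
            grades[value] = grades.get(value, 0.0) + alpha
    total = sum(grades.values())
    return {value: grade / total for value, grade in grades.items()}


fractions = vote_fractions(SUBSETS)
for value in ("black", "d.brown", "red", "blond"):
    print(value, round(fractions[value], 2))
```

Rounded to two decimals, the fractions agree with the values given above, and they sum to one, so each record still counts as a single vote.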

During the AOI process all fractions of the vote may gradually merge to finally become unity at the level of abstraction high enough to erase the originally occurring imprecision. In such a case, we observe that there is a removal of imprecision from data due to its generalization. Such a connection between the precision and certainty seems to be natural and was already noted by other researchers [3, 15]. In general, very abstract statements have a greater probability to be “correct” than more detailed ones.

URL: https://www.sciencedirect.com/science/article/pii/B9780444520753500157

NET Privacy

Marco Cremonini, ... Claudio Agostino Ardagna, in Computer and Information Security Handbook, 2009

Data Privacy Protection

The concept of anonymity was first introduced in the context of relational databases to avoid linking between published data and users’ identity. Usually, to protect user anonymity, data holders encrypt or remove explicit identifiers such as name and Social Security number (SSN). However, data deidentification does not provide full anonymity. Released data can in fact be linked to other publicly available information to reidentify users and to infer data that should not be available to the recipients. For instance, a set of anonymized data could contain attributes that almost uniquely identify a user, such as race, date of birth, and ZIP code. Table 28.2A and Table 28.2B show an example in which the anonymous medical data contained in a table are linked with the census data to reidentify users. It is easy to see that in Table 28.2B there is a unique tuple with a male born on 03/30/1938 and living in the area with ZIP code 10249. As a consequence, if this combination of attributes is also unique in the census data in Table 28.2A, John Doe is identified, revealing that he suffers from obesity.

Table 28.2A. Census Data

SSN | Name     | Address | City     | Date of Birth | ZIP
    | John Doe |         | New York | 03/30/1938    | 10249

Table 28.2B. User reidentification

Anonymous Medical Data
SSN | Name | Date of Birth | Sex | ZIP   | Marital Status | Disease
    |      | 09/11/1984    | M   | 10249 | Married        | HIV
    |      | 09/01/1978    | M   | 10242 | Single         | HIV
    |      | 01/06/1959    | F   | 10242 | Married        | Obesity
    |      | 01/23/1954    | M   | 10249 | Single         | Hypertension
    |      | 03/15/1953    | F   | 10212 | Divorced       | Hypertension
    |      | 03/30/1938    | M   | 10249 | Single         | Obesity
    |      | 09/18/1935    | F   | 10212 | Divorced       | Obesity
    |      | 03/15/1933    | F   | 10252 | Divorced       | HIV

If in the past limited interconnectivity and limited computational power represented a form of protection against inference processes over large amounts of data, today, with the advent of the Internet, such an assumption no longer holds. Information technology in fact gives organizations the power to gather and manage vast amounts of personal information.

To address the problem of protecting anonymity while releasing microdata, the concept of k-anonymity has been defined. K-anonymity means that the observed data cannot be related to fewer than k respondents [56]. Key to achieving k-anonymity is the identification of a quasi-identifier, which is the set of attributes in a dataset that can be linked with external information to reidentify the data owner. It follows that for each release of data, every combination of values of the quasi-identifier must be indistinctly matched to at least k tuples.

Two approaches to achieve k-anonymity have been adopted: generalization and suppression. These approaches share the important feature that the truthfulness of the information is preserved, that is, no false information is released.

In more detail, the generalization process generalizes some of the values stored in the table. For instance, considering the ZIP code attribute in Table 28.2B and supposing for simplicity that it represents a quasi-identifier, the ZIP code can be generalized by dropping, at each step of generalization, the least significant digit. As another example, the date of birth can be generalized by first removing the day, then the month, and eventually by generalizing the year.

On the contrary, the suppression process removes some tuples from the table. Again, considering Table 28.2B, the ZIP codes, and a k-anonymity requirement for k=2, it is clear that all tuples already satisfy the k=2 requirement except for the last one. In this case, to preserve the k=2, the last tuple could be suppressed.
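
Both operations are easy to try on the ZIP column of Table 28.2B. The following is a minimal sketch that treats ZIP as the only quasi-identifier, as the text does for simplicity; the function names are illustrative, and masking dropped digits with `*` is a presentation choice rather than something the source prescribes.

```python
from collections import Counter

# ZIP column of Table 28.2B, in row order.
ZIPS = ["10249", "10242", "10242", "10249",
        "10212", "10249", "10212", "10252"]


def violating_values(values, k):
    """Quasi-identifier values that occur in fewer than k tuples."""
    counts = Counter(values)
    return {value for value, n in counts.items() if n < k}


def suppress(values, k):
    """Suppression: drop the tuples whose value violates k-anonymity."""
    bad = violating_values(values, k)
    return [value for value in values if value not in bad]


def generalize_zip(zip_code, steps):
    """Generalization: mask the `steps` least significant digits."""
    keep = len(zip_code) - steps
    return zip_code[:keep] + "*" * steps


print(violating_values(ZIPS, 2))                       # only the last tuple
print(len(suppress(ZIPS, 2)))                          # tuples that remain
one_step = [generalize_zip(z, 1) for z in ZIPS]
two_steps = [generalize_zip(z, 2) for z in ZIPS]
print(violating_values(one_step, 2), violating_values(two_steps, 2))
```

On this data, one generalization step still leaves the last tuple alone in its group, whereas two steps (or suppressing that tuple) satisfy k = 2. This is the kind of trade-off between generalization and suppression that minimal-generalization algorithms try to optimize.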

Research on k-anonymity has been particularly rich in recent years. Samarati [56] presented an algorithm based on generalization hierarchies and suppression that calculates the minimal generalization. The algorithm relies on a binary search on the domain generalization hierarchy to avoid an exhaustive visit of the whole generalization space. Bayardo and Agrawal [57] developed an optimal bottom-up algorithm that starts from a fully generalized table (with all tuples equal) and then specializes the dataset into a minimal k-anonymous table. LeFevre et al. [58] are the authors of Incognito, a framework for providing k-minimal generalization. Their algorithm is based on a bottom-up aggregation along dimensional hierarchies and a priori aggregate computation. The same authors [59] also introduced Mondrian k-anonymity, which models the tuples as points in d-dimensional spaces and applies a generalization process that consists of finding the minimal multidimensional partitioning that satisfies the k preference.

Although k-anonymity has advantages for protecting respondents’ privacy, some weaknesses have been demonstrated. Machanavajjhala et al. [60] identified two successful attacks on k-anonymous tables: the homogeneity attack and the background knowledge attack. To explain the homogeneity attack, suppose that a k-anonymous table contains a single sensitive attribute. Suppose also that all tuples with a given quasi-identifier value have the same value for that sensitive attribute, too. As a consequence, if the attacker knows the quasi-identifier value of a respondent, the attacker is able to learn the value of the sensitive attribute associated with the respondent. For instance, consider the 2-anonymous table shown in Table 28.3 and assume that an attacker knows that Alice was born in 1966 and lives in the 10212 ZIP code. Since all tuples with quasi-identifier <1966,F,10212> suffer from anorexia, the attacker can infer that Alice suffers from anorexia. In the background knowledge attack, the attacker exploits some a priori knowledge to infer some personal information. For instance, suppose that an attacker knows that Bob has quasi-identifier <1984,M,10249> and that Bob is overweight. In this case, from Table 28.3, the attacker can infer that Bob suffers from HIV.

Table 28.3. An example of a 2-Anonymous table

Year of Birth | Sex | ZIP   | Disease
1984          | M   | 10249 | HIV
1984          | M   | 10249 | Anorexia
1984          | M   | 10249 | HIV
1966          | F   | 10212 | Anorexia
1966          | F   | 10212 | Anorexia

To neutralize these attacks, the concept of l-diversity has been introduced [60]. In particular, a cluster of tuples with the same quasi-identifier is said to be l-diverse if it contains at least l different values for the sensitive attribute (disease, in the example in Table 28.3). If a k-anonymous table is l-diverse, the homogeneity attack is ineffective, since each block of tuples has at least l ≥ 2 distinct values for the sensitive attribute. Also, the background knowledge attack becomes more complex as l increases.
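
The difference between the two properties shows up directly on Table 28.3. The sketch below groups the rows by quasi-identifier and checks both conditions; the row encoding and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Rows of Table 28.3: (quasi-identifier, sensitive attribute).
TABLE_28_3 = [
    ((1984, "M", "10249"), "HIV"),
    ((1984, "M", "10249"), "Anorexia"),
    ((1984, "M", "10249"), "HIV"),
    ((1966, "F", "10212"), "Anorexia"),
    ((1966, "F", "10212"), "Anorexia"),
]


def clusters(rows):
    """Group sensitive values by quasi-identifier."""
    groups = defaultdict(list)
    for quasi_identifier, sensitive in rows:
        groups[quasi_identifier].append(sensitive)
    return groups


def is_k_anonymous(rows, k):
    """Every quasi-identifier value matches at least k tuples."""
    return all(len(values) >= k for values in clusters(rows).values())


def is_l_diverse(rows, l):
    """Every cluster holds at least l distinct sensitive values."""
    return all(len(set(values)) >= l for values in clusters(rows).values())


print(is_k_anonymous(TABLE_28_3, 2), is_l_diverse(TABLE_28_3, 2))
```

The table is 2-anonymous but not 2-diverse, because the <1966,F,10212> cluster contains only one disease. That is exactly the cluster the homogeneity attack exploits.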

Although l-diversity protects data against attribute disclosure, it leaves space for more sophisticated attacks based on the distribution of values inside clusters of tuples with the same quasi-identifier [61]. To prevent this kind of attack, the t-closeness requirement has been defined. In particular, a cluster of tuples with the same quasi-identifier is said to satisfy t-closeness if the distance between the probability distribution of the sensitive attribute in the cluster and the one in the original table is lower than t. A table satisfies t-closeness if all its clusters satisfy t-closeness.
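
A t-closeness check follows the same grouping pattern. The text does not fix a distance measure, so the sketch below assumes total variation distance purely for illustration; other distances (the t-closeness literature typically uses the Earth Mover's Distance) would give different numbers. The data are the rows of Table 28.3, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Rows of Table 28.3: (quasi-identifier, sensitive attribute).
ROWS = [
    ((1984, "M", "10249"), "HIV"),
    ((1984, "M", "10249"), "Anorexia"),
    ((1984, "M", "10249"), "HIV"),
    ((1966, "F", "10212"), "Anorexia"),
    ((1966, "F", "10212"), "Anorexia"),
]


def distribution(values):
    """Relative frequency of each sensitive value."""
    counts = Counter(values)
    return {value: n / len(values) for value, n in counts.items()}


def total_variation(p, q):
    """Assumed distance: half the L1 distance between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_cluster_distance(rows):
    """Largest distance between any cluster and the whole table."""
    overall = distribution([sensitive for _, sensitive in rows])
    groups = defaultdict(list)
    for quasi_identifier, sensitive in rows:
        groups[quasi_identifier].append(sensitive)
    return max(total_variation(distribution(values), overall)
               for values in groups.values())


def satisfies_t_closeness(rows, t):
    """Every cluster's distance to the table distribution is lower than t."""
    return max_cluster_distance(rows) < t


print(round(max_cluster_distance(ROWS), 2))
```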

In the next section, where the problem of location privacy protection is analyzed, we also discuss how the location privacy protection problem has adapted the k-anonymity principle to a pervasive and distributed scenario, where users move on the field carrying a mobile device.

URL: https://www.sciencedirect.com/science/article/pii/B9780123743541000285

What rule specifies that an entity can be a member of only one subtype at a time?

The disjoint rule specifies that if an entity instance of the supertype is a member of one subtype, it cannot simultaneously be a member of any other subtype.

What are the subtypes called when a supertype entity instance can be only one of its subtypes at a time?

Disjoint subtypes, also known as nonoverlapping subtypes, are subtypes that contain a unique subset of the supertype entity set; in other words, each entity instance of the supertype can appear in only one of the subtypes.

What is a generic entity type that has a relationship with one or more subtypes?

A supertype is a generic entity type that has a relationship with one or more subtypes. A subtype is a sub-grouping of the entities in an entity type that is meaningful to the organization and that shares common attributes or relationships distinct from other subgroups.

Does a member of a subtype have to be the member of a supertype?

Yes. A member of a subtype is always a member of the supertype, although the converse does not necessarily hold. An entity cluster can have a relationship with another entity cluster in much the same way that an entity can have a relationship with another entity.