# Handling Imprecise Data in Information Systems

Asma Zoghlami¹, Cyril de Runz²

¹University of Paris 8, ²University of Reims

¹LIASD laboratory, ²CReSTIC

¹Saint Denis, ²Reims, France

zoghlami@ai.univ-paris8.fr, cyril.de-runz@univ-reims.fr

Herman Akdag

University of Paris 8

LIASD laboratory

Saint Denis, France

Herman.Akdag@ai.univ-paris8.fr

Abstract— In the literature, several studies have introduced fuzzy extensions to relational and object database models in order to store imprecise data. Among these models are the fuzzy EER model and the fuzzy UML model, both used for modeling fuzzy object-oriented databases, and the fuzzy ER model, suited to fuzzy relational databases. None of these fuzzy conceptual models is adapted to fuzzy spatiotemporal data. In this paper, we propose an approach for modeling imprecise data in object and relational databases, based on representing data as connected and normalized fuzzy sets stored via α-cuts. The approach is applied to Geographic Information Systems in order to handle imprecise spatiotemporal data.

Keywords—Imprecise data, fuzzy set, Geographical Information System, Spatiotemporal data, UML

## Introduction

The representation of imperfect data and its exploitation in Information Systems are a major theme of the artificial intelligence domain. Accordingly, several studies have proposed new data models to store imperfect data, as well as fuzzy queries that take this imperfection into account in databases.

Among existing tools for modeling Information Systems, UML is considered a standard. However, the data represented in UML models can be far from reflecting real-world situations because of uncertainty, imprecision, etc. To address this requirement, an extension called fuzzy UML, based on fuzzy sets, was introduced in [1] to enable the conceptual modeling of imprecise data. It introduces different levels of imprecision, mainly in the UML class model. However, fuzzy UML does not consider imprecise spatial and temporal data. Therefore, it cannot be applied to Geographic Information Systems, which rely heavily on spatial and temporal data.

In this article, we propose an approach based on connected and normalized fuzzy sets stored via α-cuts, aimed at handling imprecise data in relational and object-oriented databases. We then apply the approach in a Geographic Information System in order to implement a fuzzy spatiotemporal database storing imprecise data. This application builds on the F-Perceptory approach presented in [2]. F-Perceptory extends the Perceptory data model to handle fuzziness; Perceptory itself enriches the UML model to support the modeling of space and time through the PictograF language, formerly called PVL (Plug-in for Visual Language).

This article is structured as follows. Section 2 defines the basic kinds of data imperfection and presents fuzzy set theory. Section 3 presents the approach we propose for modeling imprecise information, first in an object database view and then in a relational database view. Our approach distinguishes the fuzzy data case from the possibilistic data case, and this section identifies the main constraints implemented in both cases. Section 4 applies our approach to the geographic information field in order to handle imprecise spatiotemporal data with the fuzzy Perceptory model (F-Perceptory). Section 5 presents a case study on building a fuzzy Geographical Information System for the representation and analysis of archaeological data. Section 6 is a discussion in which we first compare the main fuzzy conceptual models with our model, and then compare the basic spatiotemporal conceptual data models with F-Perceptory. The last section (7) concludes this work.

## Data imprecision and imperfection

## Nature of imperfection

Human reasoning, like the information sources it relies on, is often imperfect. The terms commonly used to describe imperfect information include incomplete, uncertain, imprecise and fuzzy. Fig 1 illustrates the three main types of imperfection as presented in [3].

Example of the main imperfection types

Imprecision is a difficulty in a statement that arises either because the exact value is unknown or because the natural-language terms used to describe a system characterize it vaguely. The following statement illustrates imprecision: "The residential building is about 30 m high". Here, the height of the building may be 31, 32, 29, 28, etc.; a priori, the possible values lie in the interval [25, 35].

Uncertainty concerns doubt about the validity of knowledge. It stems from the reliability of the observer who, being unsure or cautious, cannot determine the truth value of the knowledge. The following statement illustrates uncertainty: "What we see seems to be a residential building". In this case, the object may or may not be a residential building.

Incompleteness is a lack of knowledge, or only partial knowledge, of some system specifications.

There are various, more detailed taxonomies of imperfection types. The one most used in the geographic information community was introduced in [4, 5]. Fisher uses the term uncertainty as a global concept that includes all the other concepts. He considers that the principal source of uncertainty is the process of abstracting the real world, mainly through the definition of classes and the assignment of objects to classes. As shown in Fig 2, imperfect data may be modeled with several theories (probabilities, possibilities, fuzzy sets, etc.). All of these theories rely on assigning a weight in [0, 1] to each element of the studied domain.

The uncertainty model according to (Fisher, Comber, & Wadsworth, 2005)

## Dealing with imprecision: the fuzzy set theory

Imprecision should be taken into account when modeling information. Since the sorites paradox shows that probabilities are not suited to imprecision, Zadeh introduced fuzzy set theory in [6]. Fuzzy set theory defines the notion of a partial, graded membership of a value in a class. A fuzzy set A is characterized by a membership function µA taking values in [0, 1], which assigns to each domain value x a membership degree µA(x) in [0, 1]. Concepts such as young or old can therefore be modeled easily by fuzzy sets.

For α > 0, the α-cut Aα of A is the set of domain values x whose membership degree is greater than or equal to α, i.e. Aα = {x | µA(x) ≥ α}. By convention, A0 is the set of x such that µA(x) > 0 (the support of A).
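As a minimal sketch of this definition, the Python function below computes α-cuts of a fuzzy set over a finite domain, represented as a dictionary mapping each value to its membership degree. The degrees used for "about 30 m high" are hypothetical and only illustrate the computation; the 0-cut follows the convention above and uses a strict inequality.

```python
def alpha_cut(fuzzy_set: dict[int, float], alpha: float) -> set[int]:
    """Return the alpha-cut of a fuzzy set over a finite domain.

    For alpha > 0 this is {x : mu(x) >= alpha}; by convention the 0-cut
    (the support) is {x : mu(x) > 0}.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if alpha == 0.0:
        return {x for x, mu in fuzzy_set.items() if mu > 0.0}
    return {x for x, mu in fuzzy_set.items() if mu >= alpha}


# Hypothetical degrees for "the building is about 30 m high".
about_30m = {24: 0.0, 25: 0.2, 27: 0.5, 29: 0.8, 30: 1.0,
             31: 0.8, 33: 0.5, 35: 0.2, 36: 0.0}

print(sorted(alpha_cut(about_30m, 0.0)))  # [25, 27, 29, 30, 31, 33, 35]
print(sorted(alpha_cut(about_30m, 0.5)))  # [27, 29, 30, 31, 33]
print(sorted(alpha_cut(about_30m, 1.0)))  # [30]
```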

A fuzzy set A is connected if and only if Aα is connected for every α in [0, 1]. A set Aα is connected if, for every pair of nonempty sets B and C whose union is Aα, at least one point of B adheres to C or one point of C adheres to B (that is, lies in the closure of the other set). On ℝ, Aα is connected if and only if it is an interval. A fuzzy set A is normalized if at least one value x satisfies µA(x) = 1.
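A common example of a connected and normalized fuzzy set on ℝ is the trapezoidal fuzzy interval (a, b, c, d), whose membership rises linearly on [a, b], equals 1 on [b, c] and falls linearly on [c, d]. The sketch below assumes this trapezoidal shape only for illustration; it shows that every α-cut is an interval obtained in closed form, and that the intervals shrink as α grows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal fuzzy interval with support [a, d] and core [b, c]."""
    a: float
    b: float
    c: float
    d: float

    def __post_init__(self) -> None:
        if not (self.a <= self.b <= self.c <= self.d):
            raise ValueError("expected a <= b <= c <= d")

    def membership(self, x: float) -> float:
        if self.b <= x <= self.c:
            return 1.0
        if self.a < x < self.b:  # rising edge (unreachable when a == b)
            return (x - self.a) / (self.b - self.a)
        if self.c < x < self.d:  # falling edge (unreachable when c == d)
            return (self.d - x) / (self.d - self.c)
        return 0.0

    def alpha_cut(self, alpha: float) -> tuple[float, float]:
        """Closed interval {x : mu(x) >= alpha}, for 0 < alpha <= 1."""
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        return (self.a + alpha * (self.b - self.a),
                self.d - alpha * (self.d - self.c))


about_30m = Trapezoid(25.0, 29.0, 31.0, 35.0)
print(about_30m.alpha_cut(0.25))  # (26.0, 34.0)
print(about_30m.alpha_cut(0.5))   # (27.0, 33.0)
print(about_30m.alpha_cut(1.0))   # (29.0, 31.0)
```

Because each cut is an interval, storing a cut only requires its two bounds, which is what the min-level and max-level attributes of the object view rely on.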

## Modeling imprecise information

## Modeling fuzzy data

In this article, we propose an approach based on a set of connected α-cuts in order to model and store imprecision in both object databases and relational databases. The proposed modeling approach reduces the cost and complexity of storage while preserving the ways the data can be exploited and keeping a global view of the fuzzy set. The notion of connection is particularly useful in applications dedicated to classification, optimization, etc.

In our case, connected α-cuts allow us to store the different values of an imprecise datum as a multivalued set. They make it possible to draw boundaries between a very low confidence membership (the 0-cut), a rather low one, a moderately low one, a low one, etc., which may also be interpreted as a range of values from almost impossible to very possible (Fig 3).

Interpretation examples on connected α-cuts
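The following sketch illustrates the storage idea under simple assumptions: only n = 4 α-cuts of a fuzzy interval are kept, each as an (α-value, min-level, max-level) triple echoing the attribute names of the object view, and an approximate membership degree is recovered as the largest stored α whose cut contains x. The chosen levels and bounds are hypothetical; the resulting staircase approximation becomes finer as n grows.

```python
from typing import NamedTuple


class AlphaCut(NamedTuple):
    alpha_value: float   # membership level of the cut
    min_level: float     # lower bound of the interval
    max_level: float     # upper bound of the interval


def approximate_membership(cuts: list[AlphaCut], x: float) -> float:
    """Largest stored alpha whose cut contains x (0.0 if none does).

    With connected (interval) cuts, this yields a staircase approximation
    of the original membership function from only n stored rows.
    """
    return max((c.alpha_value for c in cuts
                if c.min_level <= x <= c.max_level), default=0.0)


# Four hypothetical alpha-cuts of "about 30 m high".
stored = [
    AlphaCut(0.25, 26.0, 34.0),
    AlphaCut(0.50, 27.0, 33.0),
    AlphaCut(0.75, 28.0, 32.0),
    AlphaCut(1.00, 29.0, 31.0),
]
print(approximate_membership(stored, 30.0))  # 1.0
print(approximate_membership(stored, 27.5))  # 0.5
print(approximate_membership(stored, 40.0))  # 0.0
```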

## Object view

We can model a fuzzy object according to the representation proposed in Fig 4. In this figure, the fuzzy object is composed of n objects of the class "Fuzzy object imperfection", where n is the chosen number of α-cuts; this class stores the fuzzy information as a multivalent set of values. Each fuzzy set referenced in the "Fuzzy object imperfection" class is characterized by its identifier (fuzzy set-id), the value of each of its n α-cuts (α-value), and the minimum and maximum values of each α-cut (min-level and max-level).

A fuzzy object representation in an object database view

In this case, we check three constraints. The first ensures that the value of min-level is always lower than or equal to the value of max-level. The second verifies that the set of α-cuts forms a connected and normalized fuzzy set. The third ensures that a fuzzy object is composed of exactly n objects of the class "Fuzzy object imperfection".
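The three constraints can be sketched in Python as a single validation function. The class and attribute names follow Fig 4, but the code itself is an assumption, not the authors' implementation. Since each α-cut is stored as an interval [min-level, max-level], it is connected on ℝ by construction, so the second constraint reduces to checking that the cuts are nested, as any family of α-cuts must be, and that the highest level is 1.

```python
from dataclasses import dataclass, field


@dataclass
class FuzzyObjectImperfection:
    fuzzy_set_id: int
    alpha_value: float   # level of the alpha-cut
    min_level: float     # lower bound of the cut
    max_level: float     # upper bound of the cut


@dataclass
class FuzzyObject:
    object_id: int
    cuts: list[FuzzyObjectImperfection] = field(default_factory=list)


def check_fuzzy_object(obj: FuzzyObject, n: int) -> list[str]:
    """Return the violated constraints of a fuzzy object (empty if valid)."""
    errors = []
    # Constraint 1: min-level <= max-level for every alpha-cut.
    errors += [f"alpha={c.alpha_value}: min-level > max-level"
               for c in obj.cuts if c.min_level > c.max_level]
    # Constraint 2: connected and normalized fuzzy set. Each cut is an
    # interval, hence connected; cuts must shrink as alpha grows and the
    # highest level must be 1.
    ordered = sorted(obj.cuts, key=lambda c: c.alpha_value)
    for low, high in zip(ordered, ordered[1:]):
        if high.min_level < low.min_level or high.max_level > low.max_level:
            errors.append(f"cut {high.alpha_value} not inside cut {low.alpha_value}")
    if not ordered or ordered[-1].alpha_value != 1.0:
        errors.append("not normalized: no alpha-cut at level 1")
    # Constraint 3: exactly n "Fuzzy object imperfection" objects.
    if len(obj.cuts) != n:
        errors.append(f"expected {n} alpha-cuts, found {len(obj.cuts)}")
    return errors


valid = FuzzyObject(1, [FuzzyObjectImperfection(10, 0.5, 27.0, 33.0),
                        FuzzyObjectImperfection(10, 1.0, 29.0, 31.0)])
invalid = FuzzyObject(2, [FuzzyObjectImperfection(11, 0.5, 29.0, 31.0),
                          FuzzyObjectImperfection(11, 0.8, 27.0, 33.0)])
print(check_fuzzy_object(valid, n=2))  # []
print(check_fuzzy_object(invalid, n=2))
```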

## Relational database view

In a relational database, the fuzzy object class is transformed into the table "fuzzy_object". This table is linked through the foreign key fuzzy set-id to the table "fuzzy_object_imperfection", which stores the fuzzy information as a multivalent set of values.

A fuzzy object representation in a relational database view

The constraint that verifies the connection of the fuzzy sets, and hence their structure, in the table "fuzzy_object_imperfection" is expressed in PL/pgSQL in Fig 6.

Trigger for the verification of the fuzzy set structure of a fuzzy object
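Fig 6 gives the authors' PL/pgSQL trigger. As a self-contained analogue, and not a transcription of that trigger, the sketch below uses Python's built-in sqlite3 module: a CHECK clause enforces min_level <= max_level, and a BEFORE INSERT trigger rejects any α-cut that is not nested with the cuts already stored for the same fuzzy set. The column names are assumptions adapted from the attributes above. Normalization is checked afterwards with a query, because a fuzzy set is only complete once all of its cuts have been inserted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE fuzzy_object_imperfection (
    fuzzy_set_id INTEGER NOT NULL,
    alpha_value  REAL NOT NULL CHECK (alpha_value > 0 AND alpha_value <= 1),
    min_level    REAL NOT NULL,
    max_level    REAL NOT NULL,
    CHECK (min_level <= max_level),
    PRIMARY KEY (fuzzy_set_id, alpha_value)
);

-- Reject a cut that breaks nesting: a higher cut must lie inside a lower one.
CREATE TRIGGER check_nested_cuts
BEFORE INSERT ON fuzzy_object_imperfection
WHEN EXISTS (
    SELECT 1 FROM fuzzy_object_imperfection AS f
    WHERE f.fuzzy_set_id = NEW.fuzzy_set_id
      AND ((f.alpha_value > NEW.alpha_value
            AND (f.min_level < NEW.min_level OR f.max_level > NEW.max_level))
        OR (f.alpha_value < NEW.alpha_value
            AND (NEW.min_level < f.min_level OR NEW.max_level > f.max_level)))
)
BEGIN
    SELECT RAISE(ABORT, 'alpha-cuts of a fuzzy set must be nested');
END;
"""

# Fuzzy sets whose highest stored level is below 1 are not normalized.
NOT_NORMALIZED = """
SELECT fuzzy_set_id FROM fuzzy_object_imperfection
GROUP BY fuzzy_set_id HAVING MAX(alpha_value) < 1
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
insert = "INSERT INTO fuzzy_object_imperfection VALUES (?, ?, ?, ?)"
conn.executemany(insert, [(1, 0.5, 27.0, 33.0), (1, 1.0, 29.0, 31.0),
                          (2, 0.5, 10.0, 20.0)])
try:
    conn.execute(insert, (1, 0.75, 26.0, 32.0))  # wider than the 0.5-cut
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)
print(conn.execute(NOT_NORMALIZED).fetchall())  # [(2,)]
```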

## Modeling possibilistic data

## Object database view

The possibilistic object model is illustrated in Fig 7. In this model, a possibilistic object is composed of at least one object of the class "Fuzzy object imperfection". Thus, a possibilistic object has one or several possible hypotheses with different possibility degrees.

In this case, we check three constraints. The first ensures that the value of min-level is always less than or equal to the value of max-level in the class "Fuzzy object imperfection". The second verifies that the sum of the possibility degrees assigned to each possibilistic object is lower than or equal to 1. The third ensures that a possibilistic object is composed of at least one object of the "Fuzzy object imperfection" class.

A possibilistic object representation in an object database view
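A minimal object-view sketch of these three constraints follows. The Hypothesis class standing for a "Fuzzy object imperfection" instance and the 1e-9 tolerance on the floating-point sum are assumptions made for the example.

```python
from dataclasses import dataclass

EPS = 1e-9  # tolerance for floating-point sums (assumption)


@dataclass
class Hypothesis:
    """One hypothesis of a possibilistic object ("Fuzzy object imperfection")."""
    degree: float      # possibility degree of this hypothesis
    min_level: float
    max_level: float


def check_possibilistic_object(hypotheses: list[Hypothesis]) -> list[str]:
    """Return the constraint violations of a possibilistic object."""
    errors = []
    # Constraint 1: min-level <= max-level for every hypothesis.
    errors += [f"hypothesis {i}: min-level > max-level"
               for i, h in enumerate(hypotheses) if h.min_level > h.max_level]
    # Constraint 2: possibility degrees sum to at most 1.
    total = sum(h.degree for h in hypotheses)
    if total > 1 + EPS:
        errors.append(f"sum of possibility degrees is {total:.2f} > 1")
    # Constraint 3: at least one hypothesis.
    if not hypotheses:
        errors.append("a possibilistic object needs at least one hypothesis")
    return errors


# Hypothetical height: 29-31 m (degree 0.6) or 33-35 m (degree 0.3).
height = [Hypothesis(0.6, 29.0, 31.0), Hypothesis(0.3, 33.0, 35.0)]
print(check_possibilistic_object(height))  # []
print(check_possibilistic_object(height + [Hypothesis(0.4, 20.0, 22.0)]))
```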

## Relational database view

The possibilistic object representation is a particular case of the fuzzy object representation: the possibility degrees of a possibilistic object correspond to one or several tuples of the table "fuzzy_object_imperfection". Thus, in the logical data model, the possibilistic object class is transformed into the table "possibilistic_object", which is linked to the table "fuzzy_object_imperfection" through the foreign keys fuzzy set-id and α-value.

A possibilistic object representation in a relational database view

The constraint that checks the sum of the possibility degrees of each possibilistic object is expressed in PL/pgSQL in Fig 9.

Trigger for the verification of the sum of the possibility degrees
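The PL/pgSQL trigger of Fig 9 is not reproduced here; instead, the following self-contained sketch expresses the same rule with Python's built-in sqlite3 module. The possibilistic_object table references fuzzy_object_imperfection through the composite foreign key (fuzzy_set_id, alpha_value), as described above, and its alpha_value column carries the possibility degree. The table layout, the 1e-9 tolerance and the sample rows are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE fuzzy_object_imperfection (
    fuzzy_set_id INTEGER NOT NULL,
    alpha_value  REAL NOT NULL,
    min_level    REAL NOT NULL,
    max_level    REAL NOT NULL,
    CHECK (min_level <= max_level),
    PRIMARY KEY (fuzzy_set_id, alpha_value)
);

CREATE TABLE possibilistic_object (
    object_id    INTEGER NOT NULL,
    fuzzy_set_id INTEGER NOT NULL,
    alpha_value  REAL NOT NULL,   -- possibility degree of this hypothesis
    FOREIGN KEY (fuzzy_set_id, alpha_value)
        REFERENCES fuzzy_object_imperfection (fuzzy_set_id, alpha_value)
);

-- Reject a hypothesis that would push the object's total degree above 1
-- (a small tolerance absorbs floating-point rounding).
CREATE TRIGGER check_possibility_sum
BEFORE INSERT ON possibilistic_object
WHEN (SELECT COALESCE(SUM(alpha_value), 0) FROM possibilistic_object
      WHERE object_id = NEW.object_id) + NEW.alpha_value > 1.000000001
BEGIN
    SELECT RAISE(ABORT, 'sum of possibility degrees exceeds 1');
END;
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO fuzzy_object_imperfection VALUES (?, ?, ?, ?)",
                 [(7, 0.6, 29.0, 31.0), (8, 0.3, 33.0, 35.0), (9, 0.4, 20.0, 22.0)])
add = "INSERT INTO possibilistic_object VALUES (?, ?, ?)"
conn.execute(add, (1, 7, 0.6))
conn.execute(add, (1, 8, 0.3))
try:
    conn.execute(add, (1, 9, 0.4))  # 0.6 + 0.3 + 0.4 > 1
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)
```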

## Handling imprecise spatiotemporal information with F-Perceptory

## Main spatiotemporal kind of information

The geographic information field has proposed several methods for the design of spatiotemporal information systems. Some of these methods adapt general-purpose methods by spatializing and temporalizing their conceptual models, like the Perceptory model, which is a spatiotemporal extension of the UML data model [7]. Others are specific methods with their own tools for the design of Geographic Information Systems; the best known among them are MADS [8] and POLLEN [9].

In this article, we are particularly interested in the Perceptory method, which extends the UML meta-model with spatial and temporal stereotypes. These stereotypes enrich the UML data model with new spatial and temporal modeling elements, each assigned a particular graphical representation called a pictogram.

## Spatial data

The main pictograms used to model the spatial dimension of geographic entities are presented in Table 1. The table contains three simple geometries. The first is the point geometry, which represents zero-dimensional objects; for example, a building is represented by a point on a map if its area is smaller than 500 m². The second is the line geometry, which represents one-dimensional objects; this is the case of a road, which may have a linear geometry at a small scale. The third is the polygon geometry, which represents two-dimensional objects; for example, a building is represented by a polygon on a map if its area is larger than 500 m².

Main spatial pictograms in Perceptory

## Temporal data

Temporal modeling relies on two fundamental concepts: existence and evolution. The existence of an object corresponds to its lifetime, which begins at its appearance date and ends at its disappearance date. Evolution characterizes the various state changes of an object during its life. The pictograms used to model the temporal dimension of geographic entities are presented in Table 2.

Temporal pictograms

## Imprecise Spatial data constraints

We distinguish two types of spatial imprecision. The first type is fuzzy geometry, which includes the fuzzy polygon, fuzzy line and fuzzy point forms. It corresponds to spatial objects whose boundaries cannot be determined accurately, and it is represented by enclosing Perceptory's spatial pictograms in a dashed rectangular outline. The second type is valued geometry, i.e. geometries associated with a degree of possibility d. We thus have the polygon associated with a degree of possibility d ("polygon with d"), the line associated with a degree of possibility d ("line with d") and the point associated with a degree of possibility d ("point with d"). The hierarchy of spatial imprecision is illustrated in Fig 10.

Spatial imprecision in F-Perceptory

A set of spatial integrity constraints has to be checked in order to ensure the data consistency. We distinguish two types of spatial constraints: the constraints on fuzzy spatial data and the constraints on possibilistic spatial data.

We associate each spatial imprecision type with its equivalent representation in UML. In this representation, each spatial class is connected to the class "Shape imperfection", which has an attribute of geometric type (Geom) and an associated membership degree. Navigation from the spatial class to the class "Shape imperfection" is provided by the role geometries; conversely, navigation from the class "Shape imperfection" to the spatial class is provided by the role spatial object.
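The association can be sketched in Python as two classes holding references to each other. The attributes geometries and spatial_object mirror the two navigation roles described above; encoding Geom as a WKT string and the add_geometry helper, which keeps both roles consistent, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShapeImperfection:
    geom: str        # geometry encoded as WKT (assumption)
    degree: float    # membership or possibility degree of geom
    # Role "spatial object": back-reference to the owning spatial class.
    spatial_object: Optional["SpatialObject"] = field(
        default=None, repr=False, compare=False)


@dataclass
class SpatialObject:
    name: str
    # Role "geometries": all imperfect geometries of this object.
    geometries: List[ShapeImperfection] = field(default_factory=list)

    def add_geometry(self, geom: str, degree: float) -> ShapeImperfection:
        """Create a ShapeImperfection and keep both navigation roles in sync."""
        shape = ShapeImperfection(geom, degree, spatial_object=self)
        self.geometries.append(shape)
        return shape


site = SpatialObject("excavation site")
core = site.add_geometry("POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))", 1.0)
site.add_geometry("POLYGON ((-1 -1, 5 -1, 5 5, -1 5, -1 -1))", 0.5)
print(len(site.geometries), core.spatial_object is site)  # 2 True
```

Excluding the back-reference from repr and equality avoids infinite recursion between the two classes.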

## Fuzzy spatial data

Fuzzy geometry concerns geometric shapes whose boundaries cannot be determined accurately. Fig 11 presents an example of a fuzzy spatial object, together with the different interpretations that can be made of it through a set of connected α-cuts.

Interpretation examples on connected α-cuts for a fuzzy spatial object

## Fuzzy polygon constraints

Three main constraints have to be respected in the fuzzy polygon model. The first constraint verifies that the α-cuts form a connected and normalized fuzzy set, which means that:

- for any geometry G1 of degree d1, every geometry of the same fuzzy set with a degree higher than d1 is included in G1;
- the geometric shapes are connected;
- the maximum degree is equal to 1.

The second constraint ensures that each spatial object of the class "Sc-polygon" is composed of n geometries. The last constraint is to check that the attribute "geom" is of type polygon.

Example of constraints applied to fuzzy spatial data

Fig 13 illustrates an example of the connection constraint verification on the fuzzy polygon geometry.

Trigger for the verification of the topological relation "contains" on fuzzy polygon geometries
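Checking inclusion and connection on real polygons requires a spatial library or DBMS. To keep the sketch self-contained, it replaces each polygon by a set of raster cells (a hypothetical simplification): inclusion becomes set inclusion and connection becomes 4-connectivity, tested by breadth-first search. The geometry-type constraint is not modeled, since cells carry no type.

```python
from collections import deque

Cell = tuple[int, int]  # one raster cell (column, row)


def is_connected(cells: frozenset[Cell]) -> bool:
    """True if the cells form one 4-connected region (breadth-first search)."""
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(cells)


def check_fuzzy_polygon(levels: dict[float, frozenset[Cell]], n: int) -> list[str]:
    """Check the fuzzy polygon constraints on {degree: rasterized geometry}."""
    errors = []
    degrees = sorted(levels)
    # Inclusion: a geometry of higher degree lies inside every lower one
    # (consecutive levels suffice, since inclusion is transitive).
    for low, high in zip(degrees, degrees[1:]):
        if not levels[high] <= levels[low]:
            errors.append(f"geometry at degree {high} not included in degree {low}")
    # Connection: every geometry is a single connected region.
    errors += [f"geometry at degree {d} is not connected"
               for d in degrees if not is_connected(levels[d])]
    # Normalization: the maximum degree equals 1.
    if not degrees or degrees[-1] != 1.0:
        errors.append("not normalized: maximum degree is not 1")
    # Composition: exactly n geometries.
    if len(levels) != n:
        errors.append(f"expected {n} geometries, found {len(levels)}")
    return errors


def square(x0: int, y0: int, size: int) -> frozenset[Cell]:
    return frozenset((x, y) for x in range(x0, x0 + size)
                     for y in range(y0, y0 + size))


site = {0.5: square(0, 0, 6), 1.0: square(1, 1, 4)}
print(check_fuzzy_polygon(site, n=2))  # []
broken = {0.5: square(0, 0, 3) | square(5, 5, 3), 1.0: square(6, 6, 1)}
print(check_fuzzy_polygon(broken, n=2))  # ['geometry at degree 0.5 is not connected']
```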

## Fuzzy line constraints

Like in the case of fuzzy polygon, three main constraints have to be checked. The first constraint is to verify that the α-cuts form a connected and normalized fuzzy set. The second constraint ensures that each spatial object of the fuzzy line class is composed of n geometries. The last constraint consists in checking that the geometry type of the attribute "geom" is a line when the degree is equal to 1.

## Fuzzy point constraints

In this case, we keep the first constraint applied to the fuzzy polygon and the fuzzy line, namely verifying that the α-cuts form a connected and normalized fuzzy set. The second constraint checks that each spatial object of the fuzzy point class is composed of n geometries. The third constraint verifies that the geometry type is a point when the degree is equal to 1.
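For fuzzy lines and fuzzy points, the type constraint applies only to the geometry of degree 1. The sketch below checks it on geometries given as simple 2-D WKT strings keyed by degree; leaving lower-degree geometries unconstrained (for instance, a polygonal zone around a road) is our reading of the constraint and an assumption of the example. The other two constraints are the same as for the fuzzy polygon.

```python
def wkt_type(wkt: str) -> str:
    """Geometry type keyword of a simple 2-D WKT string, e.g. 'LINESTRING'."""
    return wkt.split("(", 1)[0].strip().upper()


def check_core_type(levels: dict[float, str], expected: str) -> list[str]:
    """The geometry at degree 1 must have the expected type.

    Lower levels are deliberately left unconstrained in this sketch.
    """
    core = levels.get(1.0)
    if core is None:
        return ["no geometry at degree 1"]
    if wkt_type(core) != expected:
        return [f"core geometry is {wkt_type(core)}, expected {expected}"]
    return []


# Hypothetical fuzzy road: a polygonal zone at 0.5 around a line core.
road = {
    0.5: "POLYGON ((0 -1, 10 -1, 10 1, 0 1, 0 -1))",
    1.0: "LINESTRING (0 0, 10 0)",
}
print(check_core_type(road, "LINESTRING"))  # []
print(check_core_type({1.0: "POINT (3 4)"}, "LINESTRING"))
```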

## Possibilistic spatial data

## Valued polygon constraints

As illustrated in Fig 14, in the case of the valued polygon, a spatial object is composed of one or many geometries with varying degrees of possibility. Thus, we need first to check that the sum of the possibility degrees of these different geometries is lower than or equal to 1. Second, a spatial object has to be composed of at least one geometry and of at most n geometries. Finally, the geometry type is always a polygon.

Example of constraints applied to possibilistic spatial data

Fig 15 illustrates an example of a trigger, expressed in PL/pgSQL, that checks, for each spatial object of the table Sc-polygon, that the possibility degrees $d_i$ assigned to its valued polygons $P_i$ satisfy $\sum_{i} d_i \leq 1$.

Trigger for the verification of the sum of the possibility degrees assigned to a valued polygon

## Valued line constraints

We first need to check that a spatial object of the valued line class is composed of at least one geometry. Second, we need to verify that the sum of the possibility degrees of its different geometries is lower than or equal to 1. Finally, the geometry type must always be a line.

## Valued point constraints

The valued point constraints follow the same principle as those of the valued polygon and the valued line. The first verifies that the geometry type is a point. The second verifies that the sum of the possibility degrees is lower than or equal to 1. The last ensures that a spatial object is composed of at least one geometry.
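The constraints on valued polygons, lines and points share one pattern, sketched below on (WKT geometry, possibility degree) pairs. The upper bound n is optional because the text states it only for the valued polygon; the WKT encoding and the 1e-9 tolerance on the sum are assumptions.

```python
from typing import Optional

EPS = 1e-9  # tolerance for floating-point sums (assumption)


def wkt_type(wkt: str) -> str:
    """Geometry type keyword of a simple 2-D WKT string, e.g. 'POINT'."""
    return wkt.split("(", 1)[0].strip().upper()


def check_valued_geometry(hypotheses: list[tuple[str, float]],
                          expected: str,
                          n: Optional[int] = None) -> list[str]:
    """Constraints of a valued polygon, line or point.

    hypotheses: (WKT geometry, possibility degree) pairs of one spatial object.
    expected:   'POLYGON', 'LINESTRING' or 'POINT'.
    n:          optional maximum number of geometries per object.
    """
    errors = []
    # At least one geometry (and at most n when a bound is given).
    count = len(hypotheses)
    if count < 1 or (n is not None and count > n):
        bound = f"between 1 and {n}" if n is not None else "at least 1"
        errors.append(f"number of geometries must be {bound}, got {count}")
    # Sum of possibility degrees at most 1.
    total = sum(degree for _, degree in hypotheses)
    if total > 1 + EPS:
        errors.append(f"sum of possibility degrees is {total:.2f} > 1")
    # Every geometry has the expected type.
    errors += [f"{wkt_type(geom)} found where {expected} was expected"
               for geom, _ in hypotheses if wkt_type(geom) != expected]
    return errors


wells = [("POINT (2 3)", 0.7), ("POINT (2.5 3.5)", 0.3)]
print(check_valued_geometry(wells, "POINT", n=4))  # []
print(check_valued_geometry([], "POINT"))
```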

## Imprecise Temporal data constraints

As for spatial imprecision, we distinguish two types of temporal imprecision. The first type is the fuzzy timestamp, which takes the form of a fuzzy period or a fuzzy date. It is represented by enclosing Perceptory's classic time pictograms in a dashed rectangular outline. The second type is the valued timestamp, which associates with the timestamp a value d indicating a degree of possibility. We thus have the time period associated with a degree of possibility ("period with d") and the date associated with a degree of possibility ("date with d"). These temporal imprecision types are illustrated in Fig 16.

Temporal imprecision in F-Perceptory

A set of temporal integrity constraints has to be checked in order to ensure the data consistency. We distinguish two types of temporal constraints: the constraints on fuzzy temporal data and the constraints on possibilistic temporal data.

We associate each temporal imprecision type with its equivalent representation in UML. In this representation, each temporal class is connected to the class "Temporal imperfection". Navigation from the temporal class to the class "Temporal imperfection" is provided by the role date or the role period, depending on whether the temporal class is related to a fuzzy date or to a fuzzy period. Conversely, navigation from the class "Temporal imperfection" to the temporal class is provided by the role temporal object.

## Fuzzy temporal data

## Fuzzy date constraints

Four main constraints have to be checked. The first concerns the consistency of dates: the minimum date must be less than or equal to the maximum date in the class "Temporal imperfection". The second ensures that the fuzzy sets represented in the class "Temporal imperfection" and referring to the dates are connected and normalized. The third checks that a temporal object is composed of n dates representing one specific date with a multivalent representation, where n is the number of α-cuts on the fuzzy set. The fourth requires the values min-date and max-date to be equal when α is equal to 1.

Example of constraints applied to fuzzy temporal data

## Fuzzy period constraints

Three constraints have to be checked. The first constraint is the verification of the consistency of dates. The second constraint consists in ensuring the connection and the normalization of fuzzy sets that represent the different periods in the fuzzy period class. The third constraint is to verify that a temporal object is composed of n periods representing the same period with a multivalent representation (n = number of α-cuts).
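The fuzzy date and fuzzy period constraints differ only in the fourth fuzzy date constraint, so one sketch covers both. It assumes that each α-cut is stored as an (α, min-date, max-date) triple using Python's datetime.date; the example dates are hypothetical.

```python
from datetime import date
from typing import NamedTuple


class TemporalCut(NamedTuple):
    alpha: float      # level of the alpha-cut
    min_date: date    # earliest date of the cut
    max_date: date    # latest date of the cut


def check_fuzzy_timestamp(cuts: list[TemporalCut], n: int,
                          is_date: bool) -> list[str]:
    """Constraints of a fuzzy date (is_date=True) or fuzzy period (False)."""
    errors = []
    # 1. Consistency of dates: min-date <= max-date.
    errors += [f"alpha={c.alpha}: min-date after max-date"
               for c in cuts if c.min_date > c.max_date]
    # 2. Connected and normalized: nested cuts, top level equal to 1.
    ordered = sorted(cuts)  # named tuples sort by alpha first
    for low, high in zip(ordered, ordered[1:]):
        if high.min_date < low.min_date or high.max_date > low.max_date:
            errors.append(f"cut {high.alpha} not inside cut {low.alpha}")
    if not ordered or ordered[-1].alpha != 1.0:
        errors.append("not normalized: no cut at alpha = 1")
    # 3. Exactly n cuts describe the same date or period.
    if len(cuts) != n:
        errors.append(f"expected {n} cuts, found {len(cuts)}")
    # 4. Fuzzy date only: the core (alpha = 1) is a single date.
    if is_date:
        errors += ["core of a fuzzy date must be a single date"
                   for c in cuts if c.alpha == 1.0 and c.min_date != c.max_date]
    return errors


# Hypothetical fuzzy date: "around the middle of the 2nd century".
mid_2nd = [TemporalCut(0.5, date(130, 1, 1), date(170, 12, 31)),
           TemporalCut(1.0, date(150, 6, 1), date(150, 6, 1))]
# Hypothetical fuzzy activity period within the 2nd century.
activity = [TemporalCut(0.4, date(101, 1, 1), date(200, 12, 31)),
            TemporalCut(1.0, date(140, 1, 1), date(160, 12, 31))]

print(check_fuzzy_timestamp(mid_2nd, n=2, is_date=True))    # []
print(check_fuzzy_timestamp(activity, n=2, is_date=False))  # []
print(check_fuzzy_timestamp(activity, n=2, is_date=True))
```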

## Possibilistic temporal data

## Valued date constraints

In the case of the valued date, a temporal object is composed of at least one date, and of at most n dates. Thus, it has one or several dates with different possibility degrees. In this case we must check that the sum of the possibility degrees of the different dates is lower than or equal to 1.

Example of constraints applied to possibilistic temporal data

## Valued period constraints

Following the same principle as the valued date, we first check that the sum of the possibility degrees of the different periods is lower than or equal to 1. Second, a temporal object is composed of at least one period and at most n periods. Finally, an object of the class "Temporal imperfection" must always satisfy min-date <= max-date.

## Application on the construction of a spatiotemporal information system devoted to archaeological data

Because it concerns the past, archaeological information is imperfect by essence, and its quality should be taken into account from the modeling of the information system through to the analysis phase. Data imperfection should therefore be identified, characterized, stored and made queryable in an archaeological GIS. In the following section, we present an application that handles imprecise data in an archaeological information system.

## F-GISSAR: a fuzzy geographical information system for the representation and the analysis of archaeological data

Handling urban archaeological data is a major issue for understanding the past and conveying this knowledge to citizens. In Europe, and particularly in Reims (France), there were many invasions and wars, and thus many destruction and construction processes. The storage and visualization of archaeological data are therefore essential.

GISSAR is a spatiotemporal database dedicated to storing archaeological data related to the city of Reims. Urban excavation data in this database are considered according to the time-space-function triplet detailed in [10]. In this triplet, there are generally seven spatial scales, from stratigraphic units to urban areas. Time is an integral component of excavation objects and is generally represented by a period. The function is, as in a classic GIS, part of the semantic information. The descriptive component also includes information about materials, types of structure, etc. Fig 19 presents a global view of the GISSAR class diagram, which describes the structure of the excavation geographic information system.

A global view of the GISSAR class diagram

In the GISSAR data model, we distinguish different levels of imprecision. The first level concerns the descriptive characteristics (dimension, composition, etc.): dimensions are described by fuzzy predicates such as thick, high or long. For documentation, we face a reliability problem: the confidence we have in a document should be assessed in terms of its originality, its content or the relevance of its author.

The second level of imprecision is related to time. The temporal features of archaeological entities correspond to the time periods during which the considered objects were active. This dating lacks precision, since we cannot precisely identify the two bounds of the time interval.

The last level is related to space, namely the geometric shape of spatial objects, which may have fuzzy boundaries, and the imprecision of their georeferencing.

To handle the first level of imprecision, the keyword FUZZY is introduced in the Dimension class in front of the imperfect attributes, such as length, width, height and thickness. Imprecise spatial and temporal information is modeled using F-Perceptory. The fuzzy boundaries of archaeological sites and archaeological entities are modeled as fuzzy polygons. Artifacts and documentation, as well as geolocation imprecision, are modeled as fuzzy points. Fig 20 presents an extract from the F-GISSAR data model highlighting the three levels of imperfection in the GISSAR model.

Extract from the F-GISSAR model

## Operating example of F-GISSAR: querying imprecise spatiotemporal data

We consider a query that retrieves the entities satisfying the following conditions:

- their activity period is the 2nd Century (with a degree of at least 0.4);
- their shape belongs to the site "PC 87";
- the final degree is at least 0.8.

This query corresponds to an α-cut with α equal to 0.8 and can be expressed as follows:

(ActivityPeriod(x) ~ 2nd Century AND Shape(x) ~ PC 87) >= 0.8.

Using Zadeh's t-norm (the minimum), this implies that:

Min (ActivityPeriod(x) ~ 2nd Century, Shape(x) ~ PC 87) >=0.8.

Then:

ActivityPeriod(x) ~ 2nd Century >= 0.8 AND Shape(x) ~ PC 87 >=0.8.
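The derivation shows that, under the minimum t-norm, the conjunction reaches 0.8 exactly when both conditions do. The Python sketch below applies this rule; the entity identifiers and membership degrees are invented for illustration and are not the data behind Fig 21 and Fig 22.

```python
# Hypothetical degrees per entity: (activity period ~ 2nd Century, shape ~ PC 87).
degrees = {
    "e1": (0.9, 0.3),
    "e2": (0.85, 1.0),
    "e3": (0.6, 0.95),
}


def query(degrees: dict[str, tuple[float, float]], alpha: float) -> dict[str, float]:
    """Entities whose conjunctive degree (Zadeh t-norm, i.e. min) is >= alpha."""
    result = {}
    for entity, (temporal, spatial) in degrees.items():
        combined = min(temporal, spatial)  # AND interpreted as the minimum
        # min(t, s) >= alpha holds exactly when t >= alpha and s >= alpha.
        if combined >= alpha:
            result[entity] = combined
    return result


print(query(degrees, alpha=0.8))  # {'e2': 0.85}
```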

Fig 21 illustrates the query returning the entities whose activity period is in the 2nd Century, and Fig 22 shows the result of the query returning the entities that belong to the site PC 87. In this example, the entity with identifier 356 is the only one that satisfies both conditions.

Database extraction of entities with temporal imperfection

Query with spatial imprecision

The visualization of the query combining the spatial and the temporal imperfection is illustrated in Fig 23.

Visualization of entities that have an activity "Middle of the 2nd Century" and that belong to site PC 87

## Discussion

In the literature, several studies have introduced fuzzy extensions to relational and object-oriented database models in order to store imprecision. These studies can be classified into two groups.

The first group includes works on fuzzy queries that take into account the imprecision in the database; here we can mainly mention [11] and [12].

The second group includes works that are interested in proposing new data models to store imprecise information in relational and object databases.

In Table 3, we compare our model with the main fuzzy conceptual database models. The first is the fuzzy ER model, for which the authors of [13], [14], [15] and [16] proposed extensions of the ER model to represent fuzzy entities, fuzzy relations, etc. The second is the fuzzy EER model, which extends the EER model to represent fuzzy attributes, fuzzy classes, etc., as presented in [17], [18] and [19]. The third is the fuzzy UML data model introduced in [20]. According to this table, the fuzzy ER model is suited to applications in fuzzy relational database models, while the fuzzy EER model and the fuzzy UML model are applied to model fuzzy object-oriented databases. However, these fuzzy data models are not adequate to represent and store fuzzy spatiotemporal data. By contrast, our model is adapted to handle imprecise spatiotemporal data in both relational and object-oriented databases.

Design of fuzzy databases through fuzzy conceptual models: A comparative study

Table 4 compares the spatiotemporal modeling methods MADS and Perceptory with our approach, F-Perceptory.

The fuzzy extension of the MADS method introduced in [21] clearly defines the concepts of fuzzy spatiality and fuzzy temporality. However, the work on these two concepts remains within a theoretical framework: there is currently no physical implementation of them. The concepts of fuzzy spatiality and fuzzy temporality were also introduced in [22], where they enrich the pictographic expressiveness of Geographic Information Systems, notably of the Perceptory data model. However, in that approach, the author did not consider the implications for the database or for its exploitation through queries that take data imperfection into account.

A comparison between the main spatiotemporal modeling methods and our approach F-Perceptory

## Conclusion

In this article, we first introduced an approach for handling fuzzy and possibilistic data in relational and object-oriented databases. The approach handles these two forms of imprecision by giving them a multivalent representation based on α-cuts. We then presented an application of the approach aimed at representing and storing imprecise spatiotemporal data in a Geographic Information System.

In the discussion, we presented a comparative study highlighting our contribution with respect to other fuzzy conceptual models and to the main spatiotemporal modeling methods. Our approach is distinguished by taking into account the particularities of spatiotemporal data in fuzzy conceptual data models, and by introducing and implementing a set of spatial and temporal constraints required to ensure data consistency in fuzzy relational and object-oriented databases.