A survey on spatial, temporal, and spatio-temporal database research and an original example of relevant applications using SQL ecosystem and deep learning

ABSTRACT Spatio-temporal data serves as the foundation for most of today's location-based applications. Handling spatio-temporal data requires an appropriate methodology in which the space and time dimensions of the data are considered together. This is unlike spatial (or temporal) data management tools, which treat space (or time) separately and assume no dependency between the two. In this paper, we survey spatial, temporal, and spatio-temporal database research. To illustrate with original examples how today's technologies can be used to handle spatio-temporal data and applications, we also divide current technologies into two groups: (1) traditional, mainstay tools (e.g. the SQL ecosystem) and (2) emerging, data-intensive tools (e.g. deep learning). For the first group, we present our SQL-based spatio-temporal application, ‘hydrological rainstorm analysis’, as an original example of how analysis and mining tasks can be performed on conceptual storms stored in a spatio-temporal RDB. For the second group, we present our deep-learning-based spatio-temporal application, ‘users’ future locations prediction based on historical trajectory GPS data using hyper optimized ANNs and LSTMs’, as an original example of how deep learning models can be applied to spatio-temporal data.

Kulsawasd Jitkajornwanich(a), Neelabh Pant(b), Mohammadhani Fouladgar(b) and Ramez Elmasri(b)
(a) Data Science and Computational Intelligence (DSCI) Laboratory, Department of Computer Science, Faculty of Science, King Mongkut’s Institute of Technology Ladkrabang (KMITL), Ladkrabang, Bangkok, Thailand; (b) Department of Computer Science and Engineering, College of Engineering, The University of Texas at Arlington, Arlington, TX, USA

To cite this article: Kulsawasd Jitkajornwanich, Neelabh Pant, Mohammadhani Fouladgar & Ramez Elmasri (2020) A survey on spatial, temporal, and spatio-temporal database research and an original example of relevant applications using SQL ecosystem and deep learning, Journal of Information and Telecommunication, 4:4, 524-559, DOI: 10.1080/24751839.2020.1774153
To link to this article: https://doi.org/10.1080/24751839.2020.1774153
© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. Published online: 17 Sep 2020.
ARTICLE HISTORY Received 30 June 2018; Accepted 22 May 2020
KEYWORDS Survey; spatio-temporal database; deep learning; SQL; rainfall analysis

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

CONTACT Kulsawasd Jitkajornwanich kulsawasd.ji@kmitl.ac.th Data Science and Computational Intelligence (DSCI) Laboratory, Department of Computer Science, Faculty of Science, King Mongkut’s Institute of Technology Ladkrabang (KMITL), Chalongkrung Rd, Ladkrabang, Bangkok 10520, Thailand

1. Introduction
The global positioning system (GPS) is one of the most widely used technologies today. With GPS, moving objects can be tracked relative to fixed objects such as buildings, rivers, highways, or other polygons on the map. GPS can also be used to identify the best route to a destination or to check traffic conditions. Specialized spatial(-temporal) databases store data about (geo)spatial objects in the real world, e.g. moving cars (moving objects) and roads or cities (stationary objects), and answer user queries about them. All real-world objects have both spatial and temporal aspects, which makes them complex and multi-dimensional, and traditional databases cannot manage and support them efficiently. These specialized databases are therefore preferred over conventional RDBMSs for storing and processing complex real-world objects that have spatial and temporal dimensions. Many location-based applications use spatio-temporal databases in some of their components, for example geographic information systems (GIS), environmental monitoring and assessment, decision support systems, route navigation systems and transportation scheduling. Geospatio-temporal data combines spatial concepts and temporal concepts, so in the first part of our contributions we survey the concepts of spatial databases and temporal databases in the following topics: data models, query languages, indexing structures and ontology formalizations.
A survey of spatio-temporal database research follows, covering spatio-temporal modelling, query languages, and operators; spatio-temporal data warehousing and concept hierarchies; and spatio-temporal indexing strategies. A large number of applications that use this type of data are constantly being developed, yet few survey papers show how currently available tools can be used for, or are relevant to, such applications (Arora, 2015; Faisal et al., 2017; Gómez et al., 2009; 2016; Radhakrishna et al., 2015). The second part of our contributions therefore presents original sample applications based on spatio-temporal data using mainstay and emerging technologies.

The paper is organized as follows. Section 2 builds on our conference paper to survey spatial, temporal, and spatio-temporal database research. Section 3 uses original examples to illustrate how today's technologies can be adopted to work with spatio-temporal data; it is divided into two subsections: (i) a spatio-temporal application using an SQL system and (ii) a spatio-temporal application using deep learning. Finally, Section 4 concludes the survey.

2. Survey of spatial, temporal, spatio-temporal database research
In this section, we first give an overview of spatial and temporal database concepts and review the respective research studies, including spatial/temporal data types, query languages, indexing strategies and ontological concepts. We then survey spatio-temporal database research in the following domains: (1) spatio-temporal modelling, query languages, and operators; (2) multi-variable, multi-dimensional data, and concept hierarchies; and (3) spatio-temporal indexing.

2.1. Spatial database research
Real-world spatial objects are stored and managed by a spatial database management system (SDBMS), which provides spatial capabilities including spatial data models and operations. Spatial(-temporal) objects are not limited to 2D or geographic/geometry data.
They can also be 3D (or higher-dimensional) data from other domains, such as a 3D scan of a patient's brain or the composition of molecular proteins in humans (Shashi & Sanjay, 2003).

2.1.1. Spatial data models
Spatial data describes the referenced locations of actual entities in the real world. Spatial data has both spatial and non-spatial aspects; for a river, for example, the non-spatial aspects include the river's name and its origin. There are two main types of spatial data model: the object (or vector) model and the field (or raster) model. Spatial data that is distinct and has tangible characteristics with a concrete location is usually modelled with the object model. Coordinate systems are used to frame the shapes of geometry/geographic objects. The most commonly used geometry/geographic objects are points, lines, polygons, and other shapes. A point is usually defined by a longitude (x) and a latitude (y), whereas a line is defined by at least two points. A polygon is defined by a sequence of at least three points, which in turn defines a sequence of non-intersecting lines (edges); the area enclosed by these edges is considered part of the polygon. Spatial data that has a location but no identifiable shape is represented with the field model: each grid point/cell of a raster is functionally mapped to a specific value, and the globe is perceived as a continuous surface on which functions represent spatial features (Shekhar et al., 1999). Figure 1 (Pant et al., 2018) shows how spatial data can be represented differently using these two models, together with the respective operators supported w.r.t. the OGC (Open Geospatial Consortium).

2.1.2. Spatial query languages
To query spatial data, an extension is developed on top of traditional relational databases. This extension makes spatial operations, relationships and other related functionality available, and the result is referred to as a spatial query language. As mentioned in Borrmann and Rank (2009), incorporating spatial capabilities into existing SQL systems is preferred over developing a brand-new system. Spatial queries can be classified into (at least) six groups: (1) point query (e.g. identifying all minimum bounding rectangles (MBRs) covering a given point), (2) range query (e.g. identifying all POIs within a given query polygon), (3) nearest neighbour query (e.g. finding the closest point(s) to the query polygon), (4) distance scan (e.g. enumerating all points that fall within a specific radius of a given point, along with their distance values), (5) intersection query (e.g. listing all objects overlapping the query polygon) and (6) containment query (e.g. listing all objects contained within a query polygon) (Egenhofer, 1994; Gandhi et al., 2007).

2.1.3. Spatial indexing
Indexing (an access path) is one of the main mechanisms for speeding up access to and retrieval of objects from data storage. Traditional one-dimensional indexing strategies, however, are not suitable for accessing and retrieving spatial data, so several spatial indexing structures have been proposed. R-trees (a prevalent spatial indexing structure based on B-trees) and their variations are among the main ones, and we discuss them here.

(1) R-Tree
The R-tree is a data structure extended from the B-tree, a well-known indexing structure for one-dimensional data that is widely used in relational databases. R-trees were developed to handle multidimensional data objects effectively by using a hierarchical structure of (possibly overlapping) MBRs, such that the tree is always height-balanced and the I/O cost of tree operations is minimized, which is the ultimate goal of physical database design in any spatial database (Guttman, 1984).
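To make the point and range queries listed in Section 2.1.2 concrete, the following is a minimal Python sketch of R-tree *search* only. It assumes a hand-built two-level tree; Guttman's insertion and node-splitting algorithms are not shown, the range query uses a rectangular window (an MBR intersection test, i.e. the filter step without refinement on exact geometry), and the names `MBR`, `Node`, `leaf`, `internal`, `point_query` and `range_query` are hypothetical illustrations rather than an API from the cited work.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle [xmin, xmax] x [ymin, ymax]."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def intersects(self, other: "MBR") -> bool:
        # Closed rectangles intersect unless one lies entirely to one side of the other.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    @staticmethod
    def union(boxes: List["MBR"]) -> "MBR":
        # Smallest rectangle enclosing all given rectangles.
        return MBR(min(b.xmin for b in boxes), min(b.ymin for b in boxes),
                   max(b.xmax for b in boxes), max(b.ymax for b in boxes))


@dataclass
class Node:
    mbr: MBR                        # covers everything stored below this node
    entries: List[Tuple[MBR, str]]  # (object MBR, object id); used in leaves only
    children: List["Node"]          # used in internal nodes only


def leaf(entries: List[Tuple[MBR, str]]) -> Node:
    return Node(MBR.union([m for m, _ in entries]), entries, [])


def internal(children: List[Node]) -> Node:
    return Node(MBR.union([c.mbr for c in children]), [], children)


def point_query(node: Node, x: float, y: float) -> List[str]:
    """Ids of all object MBRs covering (x, y)."""
    hits = [oid for m, oid in node.entries if m.contains_point(x, y)]
    for child in node.children:
        if child.mbr.contains_point(x, y):  # prune subtrees that cannot contain the point
            hits.extend(point_query(child, x, y))
    return hits


def range_query(node: Node, window: MBR) -> List[str]:
    """Ids of all object MBRs intersecting the query window."""
    hits = [oid for m, oid in node.entries if m.intersects(window)]
    for child in node.children:
        if child.mbr.intersects(window):  # prune subtrees outside the window
            hits.extend(range_query(child, window))
    return hits


# Hand-built two-level tree with hypothetical object ids (no insertion or split logic).
a = leaf([(MBR(0, 0, 2, 2), "a1"), (MBR(1, 1, 3, 3), "a2")])
b = leaf([(MBR(5, 5, 6, 6), "b1"), (MBR(7, 5, 9, 8), "b2")])
root = internal([a, b])

print(point_query(root, 1.5, 1.5))             # ['a1', 'a2']
print(range_query(root, MBR(5.5, 5.5, 8, 6)))  # ['b1', 'b2']
```

The pruning test on each child's MBR is what lets an R-tree skip whole subtrees. When sibling MBRs overlap, a query may have to descend into several children, which is the cost that the R-tree variations discussed next try to reduce.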
(2) R-Tree variations
Spatial databases mainly use R-trees to store, manage and retrieve spatial objects, although other multidimensional indexes such as KDB-trees and quadtrees are also available. In some cases, however, R-trees suffer from overlapping MBRs, so the spatial database engine ends up traversing too many paths to retrieve the desired results. Variations of R-trees have therefore been proposed. In R+-trees (Sellis et al., 1987), the MBRs of internal nodes do not overlap, which reduces access time. In R-link trees, nodes are linked together so that queries can be performed faster and concurrently, which is more efficient than R-trees and R+-trees (Ng & Kameda, 1994). The Hilbert R-tree (Kamel & Faloutsos, 1994) uses the 2D-c method, which orders the MBRs within nodes by the Hilbert space-filling-curve value of each bounding rectangle's centre point. Each R-tree variation has pros and cons; which one to use depends on the characteristics (such as size, shape, and distribution) of the spatial objects as well as the types of spatial queries that will be invoked (Neelabh et al., 2016).

Figure 1. Example of how spatial data can be represented and operated on using spatial data models (Pant et al., 2018).

2.1.4. Spatial ontology
In any domain, an ontology is used to formally define the relevant concepts and the associated relationships/properties of entities of interest in the real world. Spatial ontologies are created to model and reason about spatial data, to facilitate querying it, and to exchange it (especially from the Web) among machines. There are two types of spatial ontology research: (1) consolidating spatial data storage via ontologies, and (2) designing and developing spatial ontologies. In (1), spatial ontologies allow different spatial databases to be integrated and their data to be exchanged and shared (Bennacer et al., 2004).
In (2), a spatial ontology is designed and developed either in a domain-specific fashion, where only a limited collection of related data stores in the spatial domain is taken into account (Baglioni et al., 2007), or in a universal fashion, where broad concepts and the relevant characteristics of geospatial objects are all considered when defining the spatial ontology (Hogenboom et al., 2010; Parent et al., 2006; Spaccapietra et al., 2004).

2.2. Temporal database research
The temporal dimension of data and other time-related issues play a very important role in applications such as accounting and banking, and temporal data is critical in many other applications as well. In temporal databases, time-related components are carefully and systematically recorded and validated, which is essential in time-sensitive applications. Many research studies on temporal databases have been carried out over the last two decades (Arora, 2015; Jensen, 2000; 2016; Radhakrishna et al., 2015); we discuss them next.

2.2.1. Temporal data models
Time is defined as an ordered sequence of time points within a particular interval. Every time sequence includes an origin: points before the origin are negative, and points after the origin are positive. In any application that involves time, the chronon, i.e. the smallest time granularity, is defined w.r.t. the application requirements and can vary from application to application. Note also that any activity between two chronons is considered meaningless and is not kept in the database (Dyreson et al., 1994; Frank, 2003). For example, in banking applications, the chronon can be one second. The temporal dimension of the data in a database has two different aspects: valid time (VT), the time during which a fact is true in the modelled reality, and transaction time (TT), the time during which the fact is current in the database. These two timestamp concepts are equally important, and both are needed to capture the complete picture of the data in the past, present and future.
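The difference between valid time and transaction time can be illustrated with a small bitemporal table. The following Python sketch is an illustration under stated assumptions, not a construct from the cited works: each row carries a half-open valid-time interval (when the balance holds in reality) and a half-open transaction-time interval (when the database held that row as current), integers stand in for chronons (e.g. days), and `Row`, `as_of`, `FOREVER` and the account data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

FOREVER = 10**9  # hypothetical stand-in for "until changed"


@dataclass(frozen=True)
class Row:
    account: str
    balance: int
    vt_start: int  # valid time: when the balance holds in reality
    vt_end: int    # (exclusive)
    tt_start: int  # transaction time: when the row became current in the DB
    tt_end: int    # (exclusive); set when the row is logically superseded


def as_of(rows: List[Row], account: str, valid_at: int, known_at: int) -> Optional[int]:
    """Balance valid at `valid_at`, as the database recorded it at `known_at`."""
    for r in rows:
        if (r.account == account
                and r.vt_start <= valid_at < r.vt_end
                and r.tt_start <= known_at < r.tt_end):
            return r.balance
    return None


# Day 10: balance 100 is recorded as valid from day 10 onward.
# Day 20: a correction says the balance was really 80 from day 15 onward.
# The day-10 row is closed in transaction time, not overwritten.
rows = [
    Row("acc1", 100, 10, FOREVER, 10, 20),
    Row("acc1", 100, 10, 15, 20, FOREVER),
    Row("acc1", 80, 15, FOREVER, 20, FOREVER),
]

print(as_of(rows, "acc1", valid_at=16, known_at=12))  # 100 (belief before the correction)
print(as_of(rows, "acc1", valid_at=16, known_at=25))  # 80  (belief after the correction)
print(as_of(rows, "acc1", valid_at=12, known_at=25))  # 100
```

Because superseded rows are closed in transaction time rather than overwritten, the query at `known_at=12` still returns what the database believed before the correction, while the query at `known_at=25` returns the corrected value. A table with valid time only would keep the corrected value but lose the record of the earlier belief.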
Temporal features are typically added either by (1) extending existing RDBMSs or (2) creating a middle layer that provides the time-related functionality without making any changes to the underlying operational databases (Arora, 2015). Although many approaches have been proposed in the literature, only some have actually been implemented as prototypes or in commercial tools (Arora, 2015; Faisal et al., 2017; Radhakrishna et al., 2015), and these supported only VT timestamps, not TT timestamps.

2.2.2. Temporal query languages
Several temporal query languages have been proposed; only the prominent ones are discussed in this subsection. TSQL2 (Temporal Structured Query Language), proposed in 1994, manipulates and queries time-related data stored in a relational database; with TSQL2, temporal data types and time-validity concepts can be formally defined (Snodgrass et al., 1994). TQuel (Temporal Query Language) (Snodgrass, 1987) is an extension of Quel, the query language of Ingres, that supports the temporal dimension of data. TQuel was proposed in 1987 and is considered the predecessor of TSQL2. For RDF databases, a temporally enhanced SPARQL called T-SPARQL has been proposed to allow time-related logic and functionality to be applied to the database; it is best suited to temporal RDF database models that employ triple timestamping (Grandi, 2010).

2.2.3. Temporal indexing
Several access path concepts and strategies for indexing temporal databases have been introduced since 1990, and some of them are reviewed in this subsection. In general, temporal indexes are created over time intervals and versions using different clustering techniques based on times and key values. Some data replication is allowed for better performance.
Another approach is to use different storage systems and architectures for historical data and current data (Lomet & Salzberg, 1989), so that better parallelization of operations can be achieved. The time index (Elmasri et al., 1990) is a B+-tree-based indexing method in which each leaf page contains all versions that are active at each change point. Becker et al. (1996) propose the multi-version B-tree, in which an index entry (containing both key and time) is created whenever a change is made. Ramaswamy (1997), in contrast, uses windows on time intervals together with B+-trees to create temporal indexes. Aggregation indexes can effectively speed up aggregation/grouping queries on temporal databases (Kline & Snodgrass, 1995); a data structure called an aggregation tree is built for each aggregation function. In Böhlen et al. (2006), an AVL tree is created for each combination of start and end points: first, the start index is traversed and the activated tuples are inserted into the endpoint tree; then, the expired tuples are removed and the aggregate is returned as the result. To index temporal joins, a B+-tree index over the join attribute and a B+-tree index over the time dimension can be combined into a two-level index (Wang et al., 2008).

2.2.4. Temporal ontology
The time dimension of data can be modelled realistically by using a temporal ontology. Temporal and spatial ontologies are closely related, in that temporal relationships can be thought of as spatial relationships with one dimension less. Figure 2 shows a sample operation, Meet(X, Y) = TRUE, between two objects X and Y in both the temporal and the spatial setting, where X is an object with transparent endpoints and Y is an object with solid endpoints. Meet is a Boolean binary relationship under which two objects are considered to meet when only their boundaries overlap, as shown in Figure 2.
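The Meet relation described above can be checked with the same test in one and two dimensions, which illustrates the sense in which temporal relationships are spatial relationships with one dimension less. The sketch below assumes closed intervals and closed axis-aligned rectangles of positive extent, and it uses the symmetric "boundaries touch, interiors disjoint" reading of Meet given in the text. Allen's own `meets` relation is directional (X ends exactly where Y starts), and the transparent/solid endpoint distinction of Figure 2 is not modelled; the names `Box` and `meet` are hypothetical.

```python
from typing import Sequence, Tuple

# One (low, high) pair per dimension, with low < high:
# a time interval has one pair, a spatial rectangle has two.
Box = Sequence[Tuple[float, float]]


def meet(x: Box, y: Box) -> bool:
    """True when closed boxes x and y share boundary points but no interior points."""
    # In each dimension, [lo1, hi1] and [lo2, hi2] overlap on [max(lo), min(hi)].
    overlaps = [(max(a[0], b[0]), min(a[1], b[1])) for a, b in zip(x, y)]
    closures_intersect = all(lo <= hi for lo, hi in overlaps)
    interiors_intersect = all(lo < hi for lo, hi in overlaps)
    return closures_intersect and not interiors_intersect


# Temporal (1-D): the intervals share only the instant 3.
print(meet([(1, 3)], [(3, 5)]))                  # True
print(meet([(1, 4)], [(3, 5)]))                  # False (interiors overlap)
# Spatial (2-D): the rectangles share only the edge x = 2.
print(meet([(0, 2), (0, 2)], [(2, 4), (0, 2)]))  # True
print(meet([(0, 2), (0, 2)], [(1, 3), (1, 3)]))  # False (interiors overlap)
```

The same function works unchanged for any number of dimensions, so a time interval and a spatial rectangle are handled by a single definition.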
In Hobbs and Pan (2004), a formal ontology of time is comprehensively defined in which, following Allen, temporal concepts and their associated operations are systematically and logically captured (Allen & Kautz, 1987). With such a temporal ontology, the time dimension of data can easily be modelled, retrieved, and reasoned about. A lightweight technique, together with a reasoning module such as Jess, is used to materialize temporal ontology concepts in ontology management software (e.g. Protégé) (O’Connor & Das, 2010).

2.3. Spatio-temporal database research
In this subsection, spatial and temporal aspects of data are combined to better model the spatio-temporal properties of databases (see Figure 3