Journal of Information and Telecommunication
ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/tjit20
A survey on spatial, temporal, and spatio-temporal
database research and an original example of
relevant applications using SQL ecosystem and
deep learning
Kulsawasd Jitkajornwanich , Neelabh Pant , Mohammadhani Fouladgar &
Ramez Elmasri
To cite this article: Kulsawasd Jitkajornwanich , Neelabh Pant , Mohammadhani Fouladgar &
Ramez Elmasri (2020) A survey on spatial, temporal, and spatio-temporal database research and
an original example of relevant applications using SQL ecosystem and deep learning, Journal of
Information and Telecommunication, 4:4, 524-559, DOI: 10.1080/24751839.2020.1774153
To link to this article: https://doi.org/10.1080/24751839.2020.1774153
© 2020 The Author(s). Published by Informa
UK Limited, trading as Taylor & Francis
Group
Published online: 17 Sep 2020.
Kulsawasd Jitkajornwanich (a), Neelabh Pant (b), Mohammadhani Fouladgar (b) and
Ramez Elmasri (b)
(a) Data Science and Computational Intelligence (DSCI) Laboratory, Department of Computer Science, Faculty of
Science, King Mongkut’s Institute of Technology Ladkrabang (KMITL), Ladkrabang, Bangkok, Thailand;
(b) Department of Computer Science and Engineering, College of Engineering, The University of Texas at
Arlington, Arlington, TX, USA
ABSTRACT
Spatio-temporal data serves as a foundation for most location-based
applications nowadays. Handling spatio-temporal data requires an
appropriate methodology in which the space and time dimensions of
the data are taken into account ‘altogether’, unlike spatial (or
temporal) data management tools, which consider space (or time)
separately and assume no dependency on one another. In this paper,
we conduct a survey on spatial, temporal, and spatio-temporal
database research. Additionally, to illustrate with original examples
how today’s technologies can be used to handle spatio-temporal data
and applications, we categorize the current technologies into two
groups: (1) traditional, mainstay tools (e.g. SQL ecosystem) and (2)
emerging, data-intensive tools (e.g. deep learning). Specifically, in
the first group, we use our spatio-temporal application based on an
SQL system, ‘hydrological rainstorm analysis’, as an original
example showing how analysis and mining tasks can be
performed on the conceptual storm stored in a spatio-temporal
RDB. In the second group, we use our spatio-temporal application
based on deep learning, ‘users’ future locations prediction based
on historical trajectory GPS data using hyper-optimized ANNs and
LSTMs’, as an original example showing how deep learning
models can be applied to spatio-temporal data.
ARTICLE HISTORY
Received 30 June 2018
Accepted 22 May 2020
KEYWORDS
Survey; spatio-temporal
database; deep learning; SQL;
rainfall analysis
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License
(http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and
reproduction in any medium, provided the original work is properly cited.
CONTACT Kulsawasd Jitkajornwanich kulsawasd.ji@kmitl.ac.th Data Science and Computational Intelligence (DSCI)
Laboratory, Department of Computer Science, Faculty of Science, King Mongkut’s Institute of Technology Ladkrabang
(KMITL), Chalongkrung Rd, Ladkrabang, Bangkok 10520, Thailand
1. Introduction
The global positioning system (GPS) is one of the most widely used technologies today. With
GPS, moving objects can be tracked in relation to fixed objects such as buildings, rivers,
highways, or other polygons on the map. Identifying the best route to a destination or
checking traffic conditions can also be done with GPS. A specialized spatial(-temporal)
database is used to store (geo)spatial objects from the real world and to answer user queries
about them, e.g. moving cars (as moving objects) and roads or cities (as stationary objects). All
real-world objects have both spatial and temporal aspects, which makes them complex and
multi-dimensional, so they cannot be efficiently managed and supported by traditional databases.
These specialized databases are preferred over conventional RDBMSs for storing and processing
complex real-world objects that have spatial and temporal dimensions. Many location-based
applications use spatio-temporal databases as one of their components, such as geographic
information systems (GIS), environmental monitoring and assessment, decision support systems,
route navigation systems and transportation scheduling. Geospatio-temporal data combines
spatial concepts and temporal concepts, so the first part of our contribution is a concept survey
of spatial databases and temporal databases covering the following topics: data models, query
languages, indexing structures and ontology formalizations. A survey of spatio-temporal database
research is then discussed, consisting of spatio-temporal modelling, query languages, and
operators; spatio-temporal data warehousing and concept hierarchies; and spatio-temporal
indexing strategies. A large number of applications that use this type of data are constantly
being developed, yet few survey papers show how current, available tools can be used for them
(Arora, 2015; Faisal et al., 2017; Gómez et al., 2009; 2016; Radhakrishna et al., 2015). The
second part of our contribution therefore presents our original sample applications based on
spatio-temporal data using mainstay and emerging technologies.
The paper is organized as follows. Section 2 builds on our conference paper to give a
survey of spatial, temporal, and spatio-temporal database research. Section 3 uses original
examples to illustrate how today’s technologies can be adopted to utilize spatio-temporal
data, and is divided into two subsections: (i) a spatio-temporal application using an
SQL system and (ii) a spatio-temporal application using deep learning. Finally, we conclude
our survey paper in Section 4.
2. Survey of spatial, temporal, spatio-temporal database research
In this section, we first give an overview of spatial and temporal database concepts and review
their respective research studies, including spatial/temporal data types, query languages,
indexing strategies and ontological concepts. We then survey spatio-temporal database
research studies in the following domains: (1) spatio-temporal modelling, query
languages, and operators; (2) multi-variable, multi-dimensional data, and concept
hierarchies; and (3) spatio-temporal indexing.
2.1. Spatial database research
Real-world spatial objects are stored and managed by a spatial database management
system (SDBMS), which provides spatial capabilities including spatial data models and
operations. Spatial(-temporal) objects are not limited to 2D or geographic/geometry
data. They can also be 3D (or higher-dimensional) data from other domains, such as a 3D
scan of a patient’s brain or the composition of molecular proteins in humans
(Shashi & Sanjay, 2003).
2.1.1. Spatial data models
Spatial data refers to data describing the referenced location of actual entities in the
real world. Spatial data has both spatial and non-spatial aspects, such as the name and the
origin of a river (in the case of the spatial object: river). There are two main
types of spatial data model: the object (or vector) model and the field (or raster) model.
Spatial data that is distinct and has tangible characteristics with a concrete location is
usually modelled by the object model. Coordinate systems are used to frame the shapes of
geometry/geographic objects. The most commonly used geometry/geographic objects are points,
lines, polygons, and other shapes. A point is usually defined by longitude (x) and latitude (y),
whereas a line is defined by at least two points. A polygon is defined as a sequence of
at least three points, which in turn defines a sequence of lines that do not intersect
one another; the area enclosed is considered part of the polygon.
Spatial data that has a location but no identifiable shape is represented by the field
model. Each grid point/cell of a raster is functionally mapped to a specific value. The
globe is perceived as a continuous surface on which functions are used to represent
spatial features (Shekhar et al., 1999). Figure 1 (Pant et al., 2018) shows an example of how
spatial data can be represented differently using these two models, along with the respective
operators supported w.r.t. the OGC (Open Geospatial Consortium).
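To make the contrast between the two models concrete, the following minimal Python sketch represents a polygon under the object model and a gridded value surface under the field model. The class names `Polygon` and `RasterField`, their attributes, and the sample values are illustrative assumptions and are not taken from the paper or from any OGC-compliant library; coordinates are treated as planar for simplicity.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x = longitude, y = latitude)


@dataclass
class Polygon:
    """Object (vector) model: a closed ring of at least three points."""
    ring: List[Point]

    def __post_init__(self) -> None:
        if len(self.ring) < 3:
            raise ValueError("a polygon needs at least three points")

    def area(self) -> float:
        """Planar area via the shoelace formula (assumes a non-self-intersecting ring)."""
        total = 0.0
        n = len(self.ring)
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


@dataclass
class RasterField:
    """Field (raster) model: every grid cell is mapped to one value."""
    origin: Point               # lower-left corner of the grid
    cell_size: float            # edge length of a square cell
    values: List[List[float]]   # values[row][col], row 0 at the origin

    def value_at(self, x: float, y: float) -> float:
        """Return the value of the cell that contains (x, y)."""
        col = int((x - self.origin[0]) // self.cell_size)
        row = int((y - self.origin[1]) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("location lies outside the raster")
        return self.values[row][col]


# Hypothetical sample data: a square parcel and a 2 x 2 grid of rainfall values.
parcel = Polygon([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
rainfall = RasterField(origin=(0.0, 0.0), cell_size=1.0,
                       values=[[1.0, 2.0], [3.0, 4.0]])
```

The object model answers questions about a discrete shape (here, its area), whereas the field model answers "what is the value at this location?" for any location covered by the grid.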
2.1.2. Spatial query languages
To query spatial data, an extension is developed on top of a traditional relational
database. The spatial operations and relationships, together with other related
functionality provided by this extension, are referred to as a spatial query language. As
mentioned in Borrmann and Rank (2009), incorporating spatial capabilities into existing SQL
systems is preferred over developing a brand new system. We can classify spatial queries into
(at least) six groups as follows: (1) point query (e.g. identifying all MBRs covering a
given point), (2) range query (e.g. identifying all POIs within a given query polygon), (3)
nearest neighbour (e.g. finding the closest point(s) to the query polygon), (4) distance
scan (e.g. enumerating all points that fall within a specific radius of a given point,
along with their distance values), (5) intersection query (e.g. listing all objects overlapping
with the query polygon) and (6) containment query (e.g. listing all objects that are
contained within a query polygon) (Egenhofer, 1994; Gandhi et al., 2007).
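The six query groups can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the query "polygon" is restricted to an axis-aligned rectangle, the nearest-neighbour query takes a query point rather than a polygon, and every query is a linear scan with no index. All function names are hypothetical and do not correspond to any SQL extension.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def covers(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def within(inner: Rect, outer: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def distance(p: Point, q: Point) -> float:
    return hypot(p[0] - q[0], p[1] - q[1])


def point_query(mbrs: List[Rect], p: Point) -> List[Rect]:
    """(1) All MBRs covering a given point."""
    return [r for r in mbrs if covers(r, p)]


def range_query(points: List[Point], window: Rect) -> List[Point]:
    """(2) All points inside the query window."""
    return [p for p in points if covers(window, p)]


def nearest_neighbour(points: List[Point], q: Point) -> Point:
    """(3) The point closest to the query point q."""
    return min(points, key=lambda p: distance(p, q))


def distance_scan(points: List[Point], centre: Point,
                  radius: float) -> List[Tuple[Point, float]]:
    """(4) Points within `radius` of `centre`, with distances, nearest first."""
    hits = [(p, distance(p, centre)) for p in points]
    return sorted((h for h in hits if h[1] <= radius), key=lambda h: h[1])


def intersection_query(mbrs: List[Rect], window: Rect) -> List[Rect]:
    """(5) All objects (as MBRs) overlapping the query window."""
    return [r for r in mbrs if intersects(r, window)]


def containment_query(mbrs: List[Rect], window: Rect) -> List[Rect]:
    """(6) All objects (as MBRs) fully contained in the query window."""
    return [r for r in mbrs if within(r, window)]


# Hypothetical sample data.
pois = [(1.0, 1.0), (4.0, 4.0), (6.0, 2.0)]
boxes = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 5.0, 5.0), (6.0, 6.0, 8.0, 8.0)]
```

In a real spatial database these predicates would be evaluated through a spatial index rather than by scanning every object, which is the motivation for the indexing structures discussed next.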
2.1.3. Spatial indexing
Indexing (or access path) is one of the main mechanisms that help speed up accessing and
retrieving objects from the data storage. To access and retrieve spatial data from a database,
however, traditional one-dimensional indexing strategies are not suitable. Therefore, several
spatial indexing structures have been proposed. R-trees (a prevalent spatial indexing structure
based on B-trees) and their variations are among the main ones, and we discuss them here.
(1) R-Tree
An R-tree is a data structure extended from the B-tree, a well-known indexing structure for
one-dimensional data that is widely used in relational databases. R-trees were developed to
handle multidimensional data objects effectively by using a hierarchical structure of
(possibly overlapping) MBRs such that the tree is always height-balanced and the I/O cost is
minimized whenever tree operations are called, which is the ultimate goal of physical database
design in any spatial database (Guttman, 1984).
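The pruning effect of a hierarchy of MBRs can be sketched as follows. This is not Guttman's insertion and node-splitting algorithm: the sketch assumes a static data set and builds a height-balanced tree bottom-up by sorting entries on their minimum x coordinate and grouping them, which is enough to show how a window search skips whole subtrees. The names `Node`, `build` and `search` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    entry: Optional[str] = None  # object identifier, set only on leaf entries


def build(entries: List[Tuple[str, Rect]], fanout: int = 2) -> Node:
    """Pack entries level by level so that all leaves end up at the same depth."""
    if not entries or fanout < 2:
        raise ValueError("need at least one entry and fanout >= 2")
    level = [Node(mbr=r, entry=name)
             for name, r in sorted(entries, key=lambda e: e[1][0])]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Node(mbr=union([c.mbr for c in g]), children=g) for g in groups]
    return level[0]


def search(node: Node, window: Rect) -> List[str]:
    """Return identifiers of entries whose MBR intersects the window."""
    if not intersects(node.mbr, window):
        return []  # prune: nothing below this node can match
    if node.entry is not None:
        return [node.entry]
    hits: List[str] = []
    for child in node.children:
        hits.extend(search(child, window))
    return hits


# Hypothetical sample data.
root = build([("a", (0, 0, 1, 1)), ("b", (2, 2, 3, 3)),
              ("c", (8, 8, 9, 9)), ("d", (10, 0, 11, 1))])
```

Because sibling MBRs may overlap, a search can still descend into more than one subtree, which is the weakness that the R-tree variations try to reduce.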
(2) R-Tree variations
Spatial databases mainly use R-trees to store, manage and retrieve spatial objects.
Other multidimensional indexes, such as KDB trees and quadtrees, are also available.
In some cases, however, R-trees suffer from overlapping MBRs, so the spatial
database engine ends up traversing too many paths in order to retrieve the
desired results. Variations of R-trees have therefore been proposed. In R+-trees (Sellis et al.,
1987), the MBRs of internal nodes do not overlap, which reduces the access time. In R-link
trees, nodes are linked together, which allows queries to be performed faster and
concurrently, and more efficiently than in R-trees and R+-trees (Ng & Kameda,
1994). The Hilbert R-tree (Kamel & Faloutsos, 1994) uses the 2D-c method to order MBRs
within the nodes along the Hilbert space-filling curve, based on the centre points
of the bounding rectangles. Each variation of R-trees has pros and cons. The
choice of R-tree variation depends on the characteristics (such as
size, shape, and distribution) of the spatial objects as well as the types of spatial queries
that will be invoked (Neelabh et al., 2016).
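The ordering idea behind the Hilbert R-tree can be illustrated with the standard iterative conversion from grid coordinates to a position along the Hilbert curve. The sketch below assumes integer coordinates on an n x n grid with n a power of two, and uses the integer centre of each rectangle; the function names are hypothetical, and the code only shows the ordering step, not the full index.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) on an integer grid


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve of an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_sort(rects: List[Rect], n: int) -> List[Rect]:
    """Order rectangles by the Hilbert value of their (integer) centre point."""
    def key(r: Rect) -> int:
        cx = (r[0] + r[2]) // 2
        cy = (r[1] + r[3]) // 2
        return hilbert_index(n, cx, cy)
    return sorted(rects, key=key)


# Hypothetical sample rectangles on a 4 x 4 grid.
sample = [(3, 0, 3, 0), (2, 2, 3, 3), (0, 2, 0, 2), (0, 0, 1, 1)]
ordered = hilbert_sort(sample, 4)
```

Rectangles that are close along the curve tend to be close in space, so packing consecutive rectangles into the same node tends to produce compact node MBRs.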
Figure 1. Example of how spatial data can be represented and operated using spatial data models
(Pant et al., 2018).
2.1.4. Spatial ontology
In any domain, an ontology is used to formally define the relevant concepts and the associated
relationships/properties of real-world entities of interest. A spatial ontology is
created to model, reason about, query and exchange spatial data (esp. from the
Web) among machines. There are two types of spatial ontology research: (1) spatial data
storage consolidation via ontology, and (2) design and development of spatial ontology.
In (1), spatial ontologies allow different spatial databases to be integrated and
their data to be exchanged and used collaboratively (Bennacer et al., 2004). In (2), a spatial
ontology is designed and developed either in a domain-specific fashion, where a limited
collection of related storage in the spatial domain is taken into account (Baglioni et al.,
2007), or in a universal fashion, where a broad range of concepts and relevant characteristics
of geospatial objects are all considered when defining the spatial ontology (Hogenboom et al.,
2010; Parent et al., 2006; Spaccapietra et al., 2004).
2.2. Temporal database research
The temporal dimension of data and other time-related issues play a very important role in
applications such as accounting and banking, and temporal data are critical in many other
applications as well. In temporal databases, time-related components are carefully and
systematically recorded and validated, which is very important in time-sensitive
applications. In the last two decades, many research studies on temporal databases have
been carried out (Arora, 2015; Jensen, 2000; 2016; Radhakrishna et al., 2015). We discuss
them next.
2.2.1. Temporal data models
Time is defined as an ordered list of time points in a particular interval. An origin is
part of every time sequence: points before the origin are negative, whereas points after the
origin are positive. In any application where time is concerned, the chronon,
which refers to the smallest time granularity, is defined w.r.t. the application requirements
and can vary from application to application. Note also that any activity between two
consecutive chronons is considered meaningless and is not kept in the database (Dyreson et al.,
1994; Frank, 2003). For example, in banking applications the chronon can be one second.
The temporal dimension of the data in the database is divided into two different
aspects: valid time (VT) and transaction time (TT). These two timestamp concepts are
equally important, and both are needed to capture the complete picture of the data across
past, present and future. Temporal features are typically added either by (1) extending
an existing RDBMS or (2) creating a middle layer with the time-related functionality
without making any changes to the underlying operational database (Arora, 2015).
Although many approaches have been proposed in the literature, only some were actually
implemented as prototypes or in commercial tools (Arora, 2015;
Faisal et al., 2017; Radhakrishna et al., 2015), and those supported only VT timestamps,
not TT timestamps.
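The roles of the chronon, valid time and transaction time can be made concrete with a small in-memory sketch. The class and function names, the `FOREVER` sentinel, and the half-open interval convention are illustrative assumptions rather than the design of any particular temporal DBMS. As a further simplification, recording a new version supersedes every current version of the same key, regardless of its valid-time range.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

FOREVER = 10**12  # hypothetical sentinel meaning "until changed"


def to_chronon(seconds: float, chronon_seconds: int = 1) -> int:
    """Map a raw timestamp to its chronon; anything finer is discarded."""
    return int(seconds // chronon_seconds)


@dataclass(frozen=True)
class Version:
    key: str
    value: float
    vt_start: int  # valid time: when the fact holds in the real world
    vt_end: int    # half-open interval [vt_start, vt_end)
    tt_start: int  # transaction time: when the database believed the fact
    tt_end: int    # half-open interval [tt_start, tt_end)


class BitemporalTable:
    def __init__(self) -> None:
        self.rows: List[Version] = []

    def record(self, key: str, value: float,
               vt_start: int, vt_end: int, now: int) -> None:
        """Store a new version; older versions are closed, never erased."""
        self.rows = [
            replace(r, tt_end=now) if r.key == key and r.tt_end == FOREVER else r
            for r in self.rows
        ]
        self.rows.append(Version(key, value, vt_start, vt_end, now, FOREVER))

    def as_of(self, key: str, valid_at: int, known_at: int) -> Optional[float]:
        """Value valid at `valid_at`, as the database knew it at `known_at`."""
        for r in self.rows:
            if (r.key == key
                    and r.vt_start <= valid_at < r.vt_end
                    and r.tt_start <= known_at < r.tt_end):
                return r.value
        return None


# Hypothetical example: a balance recorded at time 5 and corrected at time 10.
table = BitemporalTable()
table.record("acct-1", 100.0, vt_start=0, vt_end=FOREVER, now=5)
table.record("acct-1", 120.0, vt_start=0, vt_end=FOREVER, now=10)
```

Querying with `known_at=7` returns the value as originally recorded, while `known_at=12` returns the corrected value for the same valid time. A system that keeps only VT timestamps cannot answer the first question.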
2.2.2. Temporal query languages
Several temporal query languages have been proposed; only the prominent ones are
discussed in this subsection. TSQL2 (temporal structured query language), proposed in 1994,
is a query language for manipulating and querying time-related data stored in a
relational database. With TSQL2, temporal data types and time validity concepts can be
formally defined (Snodgrass et al., 1994). TQuel (temporal query language) (Snodgrass,
1987) is an extension to Quel, the Ingres query language, that supports the temporal
dimension of the data. TQuel was proposed in 1987 and is considered to be the predecessor of
TSQL2. For RDF databases, a temporally enhanced SPARQL called T-SPARQL has been proposed to
allow time-related logic and functionality to be applied to the database. It is more suitable
for temporal RDF database models employing triple time stamping (Grandi, 2010).
2.2.3. Temporal indexing
Several access path concepts and strategies for indexing temporal databases have been
introduced since 1990; some of them are reviewed in this subsection. In general, temporal
indexes are created over time intervals and versions using different clustering techniques
based on times and key values. Some data replication is allowed for better performance.
Furthermore, using different storage systems and architectures for historical
data and current data is another approach (Lomet & Salzberg, 1989), through which better
parallelization of operations can be achieved. The time index (Elmasri et al., 1990) is a
B+-tree-based indexing method in which each leaf page contains all versions that are active
at the corresponding change points. In Becker et al. (1996), a multi-version B-tree is
proposed in which an index entry (both key and time) is created whenever a change is made.
In Ramaswamy (1997), however, windows on time intervals and a B+-tree are used to create
temporal indexes. Aggregation indexes make aggregating/grouping queries on temporal databases
more efficient (Kline & Snodgrass, 1995): a data structure called an aggregation tree is built
for each of the aggregation functions. In Böhlen et al. (2006), an AVL tree is created for each
combination of start and end points. First, the start index is traversed and the activated
tuples are inserted into the endpoint tree. Then, the expired tuples are removed and the
aggregation is returned as a result. To create an index for temporal joins, a B+-tree index
over the join attribute and a B+-tree index over the time dimension can be combined as a
two-level index (Wang et al., 2008).
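To show what a temporal aggregation query computes, the sketch below evaluates a COUNT over time with a plain sweep across interval endpoints. It is deliberately simpler than the aggregation tree or the AVL-based method cited above and is not an implementation of either; the function name and the half-open interval convention are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open validity interval [start, end)


def temporal_count(intervals: List[Interval]) -> List[Tuple[int, int, int]]:
    """Return (start, end, count) for each span between consecutive endpoints
    during which at least one tuple is active."""
    # Net change in the number of active tuples at each endpoint.
    delta: Dict[int, int] = {}
    for start, end in intervals:
        delta[start] = delta.get(start, 0) + 1
        delta[end] = delta.get(end, 0) - 1

    result: List[Tuple[int, int, int]] = []
    active = 0
    times = sorted(delta)
    for t, nxt in zip(times, times[1:]):
        active += delta[t]  # tuples starting at t are activated, expiring ones removed
        if active > 0:
            result.append((t, nxt, active))
    return result


# Hypothetical sample: three tuples with overlapping validity intervals.
spans = temporal_count([(1, 5), (3, 8), (6, 8)])
```

Each output row states how many tuples were valid during that span, which is the kind of result an aggregation index is meant to produce without rescanning all tuples for every query.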
2.2.4. Temporal ontology
The time dimension of data can be realistically modelled by using a temporal ontology.
Temporal and spatial ontologies are closely related to each other, in that temporal
relationships can be thought of as spatial relationships with one dimension fewer. Figure 2
shows a sample operation, Meet(X, Y) = TRUE, between two objects (X
and Y) in the temporal and spatial concepts, where X is an object with transparent
endpoints and Y is an object with solid endpoints. Meet is a boolean binary relationship
between two objects, which are considered to meet when only their boundaries
overlap, as in the cases shown in Figure 2. In Hobbs and Pan (2004), a formal ontology for
time is comprehensively defined, in which the temporal concepts and associated operations
of Allen are systematically and logically captured (Allen & Kautz, 1987). With the defined
temporal ontology, the time dimension of data can easily be modelled, retrieved, and
reasoned about. A lightweight temporal model, together with a reasoning module such as
Jess, is used to materialize temporal ontology concepts in ontology management
software (e.g. Protégé) (O’Connor & Das, 2010).
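The Meet relationship can be written down for both the temporal (1D) and the spatial (2D) case. The sketch below is an illustration under stated assumptions: intervals and rectangles are closed and non-degenerate, rectangles are axis-aligned, and Meet is treated as symmetric. The function names are hypothetical and do not come from any ontology tool.

```python
from typing import Tuple

Interval = Tuple[float, float]            # (start, end), with start < end
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def temporal_meet(x: Interval, y: Interval) -> bool:
    """True when the two intervals share exactly one endpoint and nothing else."""
    return x[1] == y[0] or y[1] == x[0]


def spatial_meet(a: Rect, b: Rect) -> bool:
    """True when the rectangles touch on their boundaries but their interiors
    do not overlap."""
    closed_overlap = (a[0] <= b[2] and b[0] <= a[2]
                      and a[1] <= b[3] and b[1] <= a[3])
    interior_overlap = (a[0] < b[2] and b[0] < a[2]
                        and a[1] < b[3] and b[1] < a[3])
    return closed_overlap and not interior_overlap
```

For example, `temporal_meet((1, 3), (3, 6))` holds because the first interval ends exactly where the second begins, and `spatial_meet((0, 0, 2, 2), (2, 0, 4, 2))` holds because the two rectangles share only an edge. The spatial test is the temporal one applied with an extra dimension.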
2.3. Spatio-temporal database research
In this subsection, spatial and temporal aspects of data are combined to better model
spatio-temporal properties of databases (see Figure 3