Abstraction of Geodatabases

Monika Sester
Institute of Cartography and Geoinformatics, Leibniz University of Hannover, Hannover, Germany

© Springer International Publishing Switzerland 2016
S. Shekhar et al. (eds.), Encyclopedia of GIS, DOI 10.1007/978-3-319-23519-6_13-2

Synonyms

Cartographic generalization; Conceptual generalization of databases; Geographic data reduction; Model generalization; Multiple resolution database

Definition

Model generalization is used to derive a simpler and easier-to-handle digital representation of geometric features (Grünreich 1995). It is applied mainly by National Mapping Agencies to derive different levels of representation with less detail from their topographic data sets, usually called Digital Landscape Models (DLMs). Model generalization is also called geodatabase abstraction, as it relates to generating a simpler digital representation of geometric objects in a database, leading to a considerable data reduction. The simplification refers to both the thematic diversity and the geometric complexity of the objects. Among the well-known map generalization operations, the following subset is used for model generalization: selection, (re-)classification, aggregation, and area collapse. Sometimes the reduction in the number of points used to represent a geometric feature is also applied in the model generalization process, although this is mostly considered a problem of cartographic generalization. This is achieved by line generalization operations.

Historical Background

Generalization is a process that has been applied by human cartographers to generate small scale maps from detailed ones. The process is composed of a number of elementary operations that have to be applied in accordance with each other in order to achieve optimal results. The difficulty is the correct interplay and sequencing of the operations, which depends on the target scale, the type of objects involved, and the constraints these objects are embedded in (e.g., topological constraints, geometric and semantic context). Generalization is always subjective and requires the expertise of a human cartographer (Spiess 1995). In the digital era, attempts to automate generalization have led to the differentiation between model generalization and cartographic generalization, where the operations of model generalization are considered to be easier to automate than those of cartographic generalization.

After model generalization has been applied, the thematic and geometric granularity of the data set corresponds appropriately to the target
scale. However, there might be some geometric conflicts remaining that are caused by applying signatures to the features as well as by imposing minimum distances between adjacent objects. These conflicts have to be solved by cartographic generalization procedures, among which typification and displacement are the most important (for a comprehensive overview, see Mackaness et al. 2007). As opposed to cartographic generalization, model generalization processes have already achieved a high degree of automation. Fully automatic processes are available that are able to generalize large data sets, e.g., the whole of Germany (Urbanke and Dieckhoff 2006).

Scientific Fundamentals

Operations of model generalization are selection, re-classification, aggregation, area collapse, and line simplification.

Selection
According to a given thematic and/or geometric property, objects are selected for preservation in the target scale. Typical selection criteria are object type, size, or length. Objects fulfilling these criteria are preserved, whereas the others are discarded. In some cases, when an area partitioning of the whole data set has to be preserved, the deleted objects have to be replaced appropriately by neighboring objects.

Re-Classification
Often, the thematic granularity of the target scale is also reduced when reducing the geometric scale. This is realized by reclassification or new classification of object types. For example, in the German ATKIS system, when going from scale 1:25.000 to 1:50.000, the variation of settlement structures is reduced by merging two different settlement types into one class in the target scale.

Area Collapse
When going to smaller scales, higher-dimensional objects may be reduced to lower-dimensional ones. For instance, a city represented as an area is reduced to a point; an areal river is reduced to a linear river object. These reductions can be achieved using skeleton operations. For the area-to-line reduction, the use of the Medial Axis is popular, which is defined as the locus of points that have more than one closest neighbor on the polygon boundary. There are several approximations and special forms of axes (e.g., Straight Skeleton (David and Erickson 1998)). Depending on the object and the task at hand, there are forms that may be more favorable than others (e.g., Chin et al. 1995 and Haunert and Sester 2007).

Aggregation
This is a very important operation that merges two or more objects into a single one, thus leading to a considerable amount of data reduction. Aggregation often follows a selection or area collapse process: when an object is too small (or unimportant) to be presented in the target scale, it has to be merged with a neighboring object. For the selection of the most appropriate neighbor, there are different strategies (see Fig. 1), e.g., selecting the neighbor according to thematic priority rules, the neighbor with the longest common boundary, or the largest neighbor, or distributing the area equally among the neighbors (Haunert and Sester 2007; van Oosterom 1995; Podrenek 2002; van Smaalen 2003). Another criterion is to select a neighbor which leads to a compact aggregated region and solve the whole problem as a global optimization process (Haunert and Wolff 2006).
Aggregation can also be performed when the objects are not topologically adjacent. Then, appropriate criteria for the determination of the neighborhood are needed as well as measures to fill the gaps between the neighboring polygons (Bundy et al. 1995). Aggregation can also be applied to other geometric features such as points and lines. This leads to point aggregations that can be approximated by convex hulls, or to aggregations of line features.

Line Simplification
Line simplification is a very prominent generalization operation. Many operations have been proposed, mainly taking the relative distance between adjacent points and their relative context into account. The most well-known operator is the Douglas-Peucker algorithm (Douglas and Peucker 1973).
Abstraction of Geodatabases, Fig. 1 Different aggregation methods [figure not reproduced; panels: Definition (dark object -> light gray object), Neighbours (object -> max_neighbors), Size (object -> biggest neighbor), All neighbors (object -> equal distribution to all neighbors)]
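The neighbor-selection strategies listed under Aggregation can be sketched as follows. This is only an illustration under stated assumptions: the `Neighbor` record, its field names, the strategy labels, and the priority list are hypothetical, and the equal distribution is reduced to splitting an area value rather than a geometry.

```python
from dataclasses import dataclass

@dataclass
class Neighbor:
    name: str
    land_use: str            # thematic class of the neighboring object
    area: float              # area of the neighbor
    shared_boundary: float   # length of the boundary shared with the small object

def pick_neighbor(neighbors, strategy, priority=()):
    """Return the neighbor that absorbs the small object under the given strategy."""
    if strategy == "priority":
        # A lower index in the (assumed) priority list marks a preferred class.
        return min(neighbors, key=lambda n: priority.index(n.land_use))
    if strategy == "longest_boundary":
        return max(neighbors, key=lambda n: n.shared_boundary)
    if strategy == "largest":
        return max(neighbors, key=lambda n: n.area)
    raise ValueError(f"unknown strategy: {strategy}")

def distribute_equally(small_area, neighbors):
    """Split the small object's area into equal shares, one per neighbor."""
    share = small_area / len(neighbors)
    return {n.name: share for n in neighbors}

neighbors = [
    Neighbor("a", "forest", 120.0, 9.0),
    Neighbor("b", "settlement", 80.0, 4.0),
    Neighbor("c", "water", 300.0, 2.0),
]
priority = ["settlement", "forest", "water"]  # hypothetical rule set
chosen = {s: pick_neighbor(neighbors, s, priority).name
          for s in ("priority", "longest_boundary", "largest")}
shares = distribute_equally(30.0, neighbors)
```

The three single-neighbor strategies can disagree on the same input, which is why the choice of rule has to follow the target product's specification.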
Key Applications

The key application of database abstraction or model generalization is the derivation of less detailed data sets for different applications.

Cartographic Mapping
The production of small scale maps requires a detailed data set to be reduced in number and granularity of features. This reduction is achieved using database abstraction. It has to be followed by cartographic generalization procedures that are applied in order to generate the final symbolized map without graphical conflicts.

Visualization on Small Displays
The size of mobile display devices requires the presentation of a reduced number of features. To this end, the data can be reduced using data abstraction processes.

Internet Mapping: Streaming Generalization
Visualization of maps on the internet requires the transmission of an appropriate level of detail to the display of the remote user. To achieve an adequate data reduction that still ensures that the necessary information is communicated to the user, database abstraction methods are used. Also, it allows for the progressive transmission of more and more detailed information (Brenner and Sester 2005; Yang 2005).

Spatial Data Analysis
Spatial analysis functions usually relate to a certain level of detail where the phenomena are best observed, e.g., for planning purposes, a scale of approximately 1:50.000 is very appropriate. Database abstraction can be used to generate this scale from base data sets. The advantage is that the level of detail is reduced while still preserving the geometric accuracy.

Future Directions

MRDB: Multiple Resolution Database
For topographic mapping, often data sets of different scales are provided by Mapping Agencies. In the past, these data sets were typically produced manually by generalization processes. With the availability of automatic generalization tools, such manual effort can be replaced. In order to make additional use of this lattice of data sets,
the different scales are stored in a database where the individual objects in the different data sets are connected with explicit links. These links then allow for an efficient access of the corresponding objects in the neighboring scales, and thus an ease of movement up and down the different scales. There are several proposals for appropriate MRDB data structures, see e.g., Balley et al. (2004). The links can be created either in the generalization process or by matching existing data sets (Hampe et al. 2004). Although different approaches already exist, there is still research needed to fully exploit this data structure (Sheeren et al. 2004).

Data Update
An MRDB in principle offers the possibility of efficiently keeping the information in linked data sets up-to-date. The idea is to exploit the link structure and propagate the updated information to the adjacent and linked scales. There are several concepts for this; however, the challenge is to restrict the influence range to a manageable size (Haunert and Sester 2005).

Cross-References

Generalization, On-the-Fly
Hierarchies and Level of Detail
Map Generalization
Mobile Usage and Adaptive Visualization
Voronoi Diagram
Web Mapping and Web Cartography

References

Balley S, Parent C, Spaccapietra S (2004) Modelling geographic data with multiple representation. Int J Geogr Inf Sci 18(4):327–352
Brenner C, Sester M (2005) Continuous generalization for small mobile displays. In: Agouris P, Croitoru A (eds) Next generation geospatial information. Taylor & Francis, Hoboken, pp 33–41
Bundy G, Jones C, Furse E (1995) Holistic generalization of large-scale cartographic data. In: Müller JC, Lagrange JP, Weibel R (eds) GIS and generalization – methodology and practice. Taylor & Francis, London, pp 106–119
Chin FY, Snoeyink J, Wang CA (1995) Finding the medial axis of a simple polygon in linear time. In: Springer (ed) ISAAC 95: proceedings of the 6th international symposium on algorithms and computation, London, pp 382–391
David E, Erickson J (1998) Raising roofs, crashing cycles, and playing pool: applications of a data structure for finding pairwise interactions. In: SCG 98: proceedings of the 14th annual symposium on computational geometry, Minneapolis, pp 58–67
Douglas D, Peucker T (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Can Cartogr 10(2):112–122
Grünreich D (1995) Development of computer-assisted generalization on the basis of cartographic model theory. In: Müller JC, Lagrange JP, Weibel R (eds) GIS and generalization – methodology and practice. Taylor & Francis, London, pp 47–55
Hampe M, Sester M, Harrie L (2004) Multiple representation databases to support visualisation on mobile devices. In: International archives of photogrammetry, remote sensing and spatial information sciences, ISPRS, Istanbul, vol 35
Haunert JH, Sester M (2005) Propagating updates between linked datasets of different scales. In: Proceedings of 22nd international cartographic conference, La Coruna, pp 9–16
Haunert JH, Sester M (2007, in press) Area collapse and road centerlines based on straight skeletons. Geoinformatica
Haunert JH, Wolff A (2006) Generalization of land cover maps by mixed integer programming. In: Proceedings of 14th international symposium on advances in geographic information systems, Arlington
Mackaness WA, Sarajakoski LT, Ruas A (2007) Generalisation of geographic information: cartographic modelling and applications. Published on behalf of the international cartographic association by Elsevier, Amsterdam
Müller JC, Lagrange JP, Weibel R (eds) (1995) GIS and generalization – methodology and practice. Taylor & Francis, London
Podrenek M (2002) Aufbau des DLM50 aus dem BasisDLM und Ableitung der DTK50 – Lösungsansatz in Niedersachsen. Kartographische Schriften Band 6. Kirschbaum Verlag, Bonn, pp 126–130
Sheeren D, Mustière S, Zucker JD (2004) Consistency assessment between multiple representations of geographical databases: a specification-based approach. In: Proceedings of the 11th international symposium on spatial data handling, Leicester
Spiess E (1995) The need for generalization in a gis environment. In: Müller JC, Lagrange JP, Weibel R (eds) GIS and generalization – methodology and practice. Taylor & Francis, London, pp 31–46
Urbanke S, Dieckhoff K (2006) The adv-project atkis generalization, part model generalization (in German). Kartographische Nachrichten 56(4):191–196
van Oosterom P (1995) The gap-tree, an approach to 'on-the-fly' map generalization of an area partitioning. In: Müller JC, Lagrange JP, Weibel R (eds) GIS and generalization – methodology and practice. Taylor & Francis, London, pp 120–132
van Smaalen J (2003) Automated aggregation of geographic objects. A new approach to the conceptual generalisation of geographic databases. PhD thesis, Wageningen University, The Netherlands
Yang B (2005) A multi-resolution model of vector map data for rapid transmission over the internet. Comput Geosci 31(5):569–578
Aggregate Queries, Progressive Approximate

Iosif Lazaridis and Sharad Mehrotra
Department of Computer Science, University of California, Irvine, CA, USA

© Springer International Publishing Switzerland 2016
S. Shekhar et al. (eds.), Encyclopedia of GIS, DOI 10.1007/978-3-319-23519-6_41-2

Synonyms

Approximate aggregate query; On-line aggregation

Definition

Aggregate queries generally take a set of objects as input and produce a single scalar value as output, summarizing one aspect of the set. Commonly used aggregate types include MIN, MAX, AVG, SUM, and COUNT.
If the input set is very large, it might not be feasible to compute the aggregate precisely and in reasonable time. Alternatively, the precise value of the aggregate may not even be needed by the application submitting the query, e.g., if the aggregate value is to be mapped to an 8-bit color code for visualization. Hence, this motivates the use of approximate aggregate queries, which return a value close to the exact one, but at a fraction of the time.
Progressive approximate aggregate queries go one step further. They do not produce a single approximate answer, but continuously refine the answer as time goes on, progressively improving its quality. Thus, if the user has a fixed deadline, he can obtain the best answer within the allotted time; conversely, if he has a fixed answer accuracy requirement, the system will use the least amount of time to produce an answer of sufficient accuracy. Thus, progressive approximate aggregate queries are a flexible way of implementing aggregate query answering.
Multi-Resolution Aggregate trees (MRA-trees) are spatial (or, in general, multi-dimensional) indexing data structures, whose nodes are augmented with aggregate values for all the indexed subsets of data. They can be used very efficiently to provide an implementation of progressive approximate query answering.

Historical Background

Aggregate queries are extremely useful because they can summarize a huge amount of data by a single number. For example, many users expect to know the average and highest temperature in their city and are not really interested in the temperature recorded by all environmental monitoring stations used to produce this number. The simplest aggregate query specifies a selection condition specifying the subset of interest, e.g., "all monitoring stations in Irvine" and an aggregate type to be computed, e.g., "MAX temperature".
The normal way to evaluate an aggregate query is to collect all data in the subset
of interest and evaluate the aggregate query over them. This approach has two problems: first, the user may not need to know that the temperature is 34.12 °C, but 34 ± 0.5 °C will suffice; second, the dataset may be so large that exhaustive computation may be infeasible. These observations motivated researchers to devise approximate aggregate query answering mechanisms.
Off-line synopsis based strategies, such as histograms (Ioannidis and Poosala 1999), samples (Acharya et al. 1999), and wavelets (Chakrabarti et al. 2000) have been proposed for approximate query processing. These use small data summaries that can be processed very easily to answer a query at a small cost. Unfortunately, summaries are inherently unable to adapt to the query requirements. The user usually has no way of knowing how good an approximate answer is and, even if he does, it may not suffice for his goals. Early synopsis based techniques did not provide any guarantees about the quality of the answer, although this has been incorporated more recently (Garofalakis and Kumar 2005).
Online aggregation (Hellerstein et al. 1997) was proposed to deal with this problem. In online aggregation, the input set is sampled continuously, a process which can, in principle, continue until this set is exhausted, thus providing an answer of arbitrarily good quality; the goal is, however, to use a sample of small size, thus saving on performance while giving a "good enough" answer. In online aggregation, a running aggregate is updated progressively, finally converging to the exact answer if the input is exhausted. The sampling usually occurs by sampling either the entire data table or a subset of interest one tuple at a time; this may be expensive, depending on the size of the table, and also its organization: if tuples are physically ordered in some way, then sampling may need to be performed with random disk accesses, which are costlier than sequential accesses.
Multi-resolution trees (Lazaridis and Mehrotra 2001) were designed to deal with the limitations of established synopsis-based techniques and sampling-based online aggregation. Unlike off-line synopses, MRA-trees are flexible and can adapt to the characteristics of the user's quality/time requirements. Their advantage over sampling is that they help queries quickly zero in on the subset of interest without having to process a great number of tuples individually. Moreover, MRA-trees provide deterministic answer quality guarantees to the user that are easy for him to prescribe (when he poses his query) and to interpret (when he receives the results).

Scientific Fundamentals

Multi-dimensional index trees such as R-trees, quad-trees, etc., are used to index data existing in a multi-dimensional domain. Consider a $d$-dimensional space $\mathbb{R}^d$ and a finite set of points (input relation) $S \subset \mathbb{R}^d$. Typically, for spatial applications, $d \in \{2, 3\}$. The aggregate query is defined as a pair $(agg, R_Q)$ where $agg$ is an aggregate function (e.g., MIN, MAX, SUM, AVG, COUNT) and $R_Q \subset \mathbb{R}^d$ is the query region. The query asks for the evaluation of $agg$ over all tuples in $S$ that are in region $R_Q$. Multi-dimensional index trees organize this data via a hierarchical decomposition of the space $\mathbb{R}^d$ or grouping of the data in $S$. In either case, each node $N$ indexes a set of data tuples contained in its subtree which are guaranteed to have values within the node's region $R_N$.
MRA-trees (Lazaridis and Mehrotra 2001) are generic data techniques that can be applied over any standard multi-dimensional index method; they are not yet another indexing technique. They modify the underlying index by adding the value of the $agg$ over all data tuples indexed by (i.e., in the sub-tree of) $N$ to each tree node $N$. Only a single such value, e.g., MIN, may be stored, but in general, all aggregate types can be used without much loss of performance. An example of an MRA-quad-tree is seen in Fig. 1.
The key observation behind the use of MRA-trees is that the aggregate value of all the tuples indexed by a node $N$ is known by just visiting $N$. Thus, in addition to the performance benefit of a standard spatial index (visiting only a fraction of selected tuples, rather than the entire set), the
MRA-tree also avoids traversing the entire sub-tree of nodes contained within the query region. Nodes that partially overlap the region may or may not contribute to the aggregate, depending on the spatial distribution of points within them. Such nodes can be further explored to improve performance. This situation is seen in Fig. 2: nodes at the perimeter of the query (set $N_p$) can be further explored, whereas nodes at the interior ($N_c$) need not be.

Aggregate Queries, Progressive Approximate, Fig. 1 Example of an MRA-quad-tree
Aggregate Queries, Progressive Approximate, Fig. 2 A snapshot of MRA-tree traversal

The progressive approximation algorithm (Fig. 3) has three major components:

• Computation of a deterministic interval of confidence guaranteed to contain the aggregate value, e.g., [30, 40].
• Estimation of the aggregate value, e.g., 36.2.
• A traversal policy which determines which node to explore next by visiting its children nodes.

Aggregate Queries, Progressive Approximate, Fig. 3 Progressive approximation algorithm

The interval of confidence can be calculated by taking the set of nodes partially overlapping/contained in the query into account (Fig. 2). The details of this for all the aggregate types can be found in Lazaridis and Mehrotra (2001). For example, if the SUM of all contained nodes is 50 and the SUM of all partially overlapping nodes is 15, then the interval is [50, 65] since all the tuples in the overlapping nodes could either be outside or inside the query region.
There is no single best way for aggregate value estimation. For example, taking the middle of the interval has the advantage of minimizing the worst-case error. On the other hand, intuitively, if
a node barely overlaps with the query, then it is expected that its overall contribution to the query will be slight. Thus, if in the previous example there are two partially overlapping nodes, A and B, with SUM(A) = 5 and SUM(B) = 15, and 30% of A and 50% of B overlap with the query respectively, then a good estimate of the SUM aggregate will be $50 + 5 \times 0.3 + 15 \times 0.5 = 59$.
Finally, the traversal policy should aim to shrink the interval of confidence by the greatest amount, thus improving the accuracy of the answer as fast as possible. This is achieved by organizing the partially overlapping nodes using a priority queue. The queue is initialized with the root node and subsequently the front node of the queue is repeatedly picked, its children are examined, the confidence interval and aggregate estimate are updated, and the partially overlapping children are placed in the queue. Our example may show the preference to explore node B before A since it contributed more (15) to the uncertainty inherent in the interval of confidence than A (5). Detailed descriptions of the priority used for the different aggregate types can be found in Lazaridis and Mehrotra (2001).
Performance of MRA-trees depends on both the underlying data structure used as well as the aggregate type and query selectivity. MIN and MAX queries are typically evaluated very efficiently since the query processing system uses the node aggregates to quickly zero in on a few candidate nodes that contain the minimum value; very rarely is the entire perimeter needed to compute even the exact answer. Query selectivity affects processing speed; like all multi-dimensional indexes, performance degrades as a higher fraction of the input table $S$ is selected. However, unlike traditional indexes, the degradation is more gradual since the "interior" area of the query region is not explored. A typical profile of answer error as a function of the number of nodes visited can be seen in Fig. 4.
MRA-trees use extra space (to store the aggregates) in exchange for time. If the underlying data structure is an R-tree, then storage of aggregates in tree nodes results in decreased fanout since fewer bounding rectangles and their accompanying aggregate values can be stored within a disk page. Decreased fanout may imply increased height of the tree. Fortunately, the overhead of aggregate storage does not negatively affect performance since it is counter-balanced by the benefits of partial tree exploration. Thus, even for computing the exact answer, MRA-trees are usually faster than regular R-trees and the difference grows even if a small error, e.g., in the order of 10%, is allowed (Fig. 5).

Key Applications

Progressive approximate aggregate queries using a multi-resolution tree structure can be used in many application domains when data is either large, difficult to process, or the exact answer is not needed.
Aggregate Queries, Progressive Approximate, Fig. 4 Answer error improves as more MRA-tree nodes are visited [plot not reproduced; title: Relative Error (COUNT, 25%); x-axis: # MRA-tree Nodes Visited, 0 to 600; y-axis: Average Relative Error, 0 to 1.4]
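The three components described under Scientific Fundamentals (interval of confidence, estimate, and priority-queue traversal) can be sketched for a SUM query as follows. This is a minimal illustration written from the textual description, not the algorithm of Fig. 3: the `Node` class, the rectangular boxes with positive area, the area-based overlap fraction, the largest-SUM-first priority, and the restriction to non-negative values are all assumptions made for the example.

```python
import heapq
from itertools import count

class Node:
    """Index node with a bounding box and a stored SUM aggregate."""
    def __init__(self, box, children=(), points=()):
        self.box = box                    # (xmin, ymin, xmax, ymax)
        self.children = list(children)
        self.points = list(points)        # leaf payload: (x, y, value)
        self.total = (sum(c.total for c in self.children)
                      + sum(v for _, _, v in self.points))

def overlap_fraction(box, query):
    """Fraction of the node's box area that lies inside the query box."""
    w = min(box[2], query[2]) - max(box[0], query[0])
    h = min(box[3], query[3]) - max(box[1], query[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((box[2] - box[0]) * (box[3] - box[1]))

def progressive_sum(root, query, max_steps):
    """Yield (low, high, estimate) initially and after each refinement step."""
    low = 0.0                  # SUM already known to lie inside the query
    heap, tie = [], count()    # partially overlapping nodes, largest SUM first

    def visit(node):
        nonlocal low
        f = overlap_fraction(node.box, query)
        if f >= 1.0:
            low += node.total  # fully contained: use the stored aggregate
        elif f > 0.0:
            heapq.heappush(heap, (-node.total, next(tie), node))

    def snapshot():
        high = low + sum(-t for t, _, _ in heap)
        est = low + sum(-t * overlap_fraction(n.box, query) for t, _, n in heap)
        return low, high, est

    visit(root)
    yield snapshot()
    for _ in range(max_steps):
        if not heap:
            break              # interval has collapsed to the exact answer
        _, _, node = heapq.heappop(heap)
        for child in node.children:
            visit(child)
        for x, y, v in node.points:
            if query[0] <= x <= query[2] and query[1] <= y <= query[3]:
                low += v
        yield snapshot()

# Hypothetical quad-tree with four leaves and a query covering part of it.
root = Node((0, 0, 4, 4), children=[
    Node((0, 0, 2, 2), points=[(1, 1, 10)]),
    Node((2, 0, 4, 2), points=[(3.5, 1, 20)]),
    Node((0, 2, 2, 4), points=[(1, 2.5, 30)]),
    Node((2, 2, 4, 4), points=[(2.5, 2.5, 5), (3.5, 3.5, 7)]),
])
steps = list(progressive_sum(root, (0, 0, 3, 3), max_steps=10))
```

Each element of `steps` is one refinement; a caller with a deadline can stop consuming the generator early, while a caller with an accuracy target can stop once `high - low` is small enough.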