GPS trajectory data segmentation based on probabilistic logic ✩
Sini Guo
a, Xiang Li
b,c,∗, Wai-Ki Ching
a,b,d, Ralescu Dan
e, Wai-Keung Li
f, Zhiwen Zhang
aaAdvancedModelingandAppliedComputingLaboratory,DepartmentofMathematics,TheUniversityofHongKong,PokfulamRoad, Hong Kong
bSchoolofEconomicsandManagement,BeijingUniversityofChemicalTechnology,Beijing,China
cBeijingAdvancedInnovationCenterforSoftMatterScienceandEngineering,BeijingUniversityofChemicalTechnology,Beijing100029,China dHughesHall,WollastonRoad,Cambridge,UK
eDepartmentofMathematicalSciences,UniversityofCincinnati,Cincinnati,USA
fDepartmentofStatisticsandActuarialScience,TheUniversityofHongKong,PokfulamRoad,HongKong
a r t i c l e i n f o a b s t r a c t
Articlehistory:
Received4May2018
Receivedinrevisedform20September 2018
Accepted25September2018 Availableonline9October2018
Keywords:
GPStrajectorydatasegmentation Probabilisticlogic
Newton’smethod
With the rapid development ofinternet economy, transparentlogistics is stepping into a prosperity period with massive transportation data generated and collected every day. In this paper, we focus on the segmentation of GPS trajectory data generated in logistics transportation to analyze the vehicle behaviors and extract business affair informationaccordingtothevehiclebehaviorcharacteristics,whichischallengingdueto the complexityoftrajectory data andunavailability of road information.Weextract the stoppingpointsfromthetrajectorydatasequencebasedonthedurationofnonmovement, andconstructbusinesstimewindowandelectronicfencebyanalyzingthedrivinghabits ofvehicles.Furthermore,weproposeaprobabilisticlogicbaseddatasegmentationmethod (PLDSM)whichnotonlyhelpsfindingallthebusinesspointsbutalsoassistsininferring thebusinessaffaircategories.Anefficientnumericalalgorithmintegratingdualitytheory and Newton’s method is proposed to obtain the optimal solution. Finally, a practical example is presented to validatethe effectiveness ofPLDSM. Theresults greatly enrich thedatasegmentationtechniqueandpromotethepracticabilityofprobabilisticlogic.
©2018ElsevierInc.Allrightsreserved.
1. Introduction
Massivedataaregeneratedeverysecondwiththerapiddevelopmentofinformationtechnology.Althoughtheprocessing capabilityofcomputer hasachievedgreat improvement,datastorageandretrievalare stillchallengedby theincreasingly accumulatedcomplicateddata.Therefore,howtofiltertheredundantdataandextractvaluableinformationfromrawdata areamajorconcerninrealapplications.Datasegmentationattractspeople’sattentionforitsefficiencyindatasummariza- tion andinformationmining. As adata-based informationtechnique, data segmentationis intended forsegmenting data sequenceintoaseriesofdisjointsegments basedonsomepredeterminedcriteria.Ithasbeenappliedsuccessfullyinvar- ious fields,such as image processing [42] [17], DNA sequence segmentation [5] and vehicle trajectory analysis [2]. Data
✩ ThispaperispartoftheVirtualspecialissueonNewTrend inAggregation andFusion ofInformation,edited byIreneDíazandYusukeNojima.
*
Correspondingauthor.E-mailaddress:[email protected](X. Li).
https://doi.org/10.1016/j.ijar.2018.09.008
0888-613X/©2018ElsevierInc.Allrightsreserved.
228 S. Guo et al. / International Journal of Approximate Reasoning 103 (2018) 227–247
segmentation isable topartition adigitalimage intoregions ofpixels bysimilarity incolorandintensity.Invehicletra- jectory analysis, datasegmentation canbe appliedto simplifythevehiclemovement tocompact thecontinuallyevolving spatio-temporal database, achievinga fastandefficientsupport ofapplications suchasrecognition oftrajectory patterns, predictionofcongestedareas[2].Inthispaper,wemainlyfocusonGPStrajectorydatasegmentation.
Varioustrajectorydatasegmentationmethodshavebeenproposedinthepastfewyears,whichcanbedividedintotwo classes. The first is attribute-driven segmentation method. It is capable ofpartitioning the spatial trajectory into several number ofsegments such that themovement inside each segment ishomogeneous withrespectto some movement at- tributes,suchasspeed,residencetimeandheading.YoonandShahabi[41] partitionedatrajectoryintoasmallnumberof spatially andtemporallyhomogeneous segments. Ineach segment,thevehicle approximatelymovedata constantspeed.
Buchin et al. [6] considered location, heading, speed,curvature, sinuosity andcurviness tosegment the trajectory into a minimum numberofsegments. KrummandHorvitz [25] segmented thetrajectory ofacar byfinding thedata sequence withspeedlessthantwomilesperhour.Panagiotakiset al.[34] describedtherepresentativenessofeachsegmentbasedon localdensityandtrajectorysimilarity,andthenidentifiedthesegmentbordersandthenumberofsegmentsbythenovel segmentationalgorithm.Aronovet al.[3] designedtheoutlier-tolerantcriterionandthestandarddeviationcriteriontoseg- ment thetrajectory byproposingaunivariate attributefunction.Moreattribute-drivendatasegmentationcan befoundin [7,14,15,40,43].
Thesecondkindoftrajectorydatasegmentationreferstothepattern-drivensegmentationmethod,whichmostlyfocuses on patterndetectionbasedonindividualorgroupofmovingentities.Pattersonet al. [36] usedtrajectorydatatoinferan individual’stransportationmodeintobus,walkandcartopredictthemostlikelyroute.Inviewofthatpeoplehadtowalk throughthetransitionbetweentwodifferenttransportationmodes,Zhenget al.[44] proposedawalk-basedtrajectorydata segmentation methodtoidentifythetransportation modes.LaubeandImfeld[26] developeda relativemotionframework to studythesimilar behaviors ofdifferententities,wherea collectionofspatio-temporal patternswere definedbased on themovingdirection.Asanextensionoftherelativemotionframework,Laubeet al.[27] extractedfourmovementpatterns includingflock, leadership, convergenceandencounterfromgeospatialdata.Benker et al.[4] redefined theflock asa set ofobjects whichmovedalong pathsclosetoeach other fora predefinedtime.Jeunget al.[22] proposed aconvoyquery andoutlined preliminarytechnique todetect theconvoyof vehiclesfrommassivespatio-temporal data.Zhenget al.[45]
developedachangepoint-basedsegmentationmethodtopartitionthetrajectoryintosegmentswithdifferenttransportation modes.Otherpattern-basedsegmentationstudiescanbefoundin[1,16,23,30,35,46].
Most ofthecurrentGPS trajectory datasegmentation are performedby meansofcluster algorithm. Clusteralgorithm is a well-known generic unsupervised learning technique, which can be applied to determine relevantsimilar groups of trajectory dataaccordingtosome commoncharacteristics.Forexample,Giannottiet al.[18] employedacluster algorithm toextracttheregionsthatvehiclesusuallyvisitedbasedonthedensityoflocations,andcalculatedthetraveltimeofeach segment betweentwo regions to infer thevehicle behaviors. Pelekis et al.[37] proposed acluster algorithm to segment trajectorydatausingvariousdistancefunctionsbasedonlocation,speed,accelerationanddirection.Therearetwopopular clusteralgorithmscommonlyappliedinthetrajectorydatasegmentation:K-means(KM)anddensity-basedspatialcluster- ing of applicationswith noise (DBSCAN).The KM-basedsegmentation pursues to optimizethe definedclustering quality function withacertain numberofclusters[28,29].Theprerequisite ofthismethodisknowing thenumberofclustersin advance. In contrast,DBSCAN clustersthetrajectories based ondensity criteriawithout knowingthe numberof clusters [33].Itmerelydependsontwoinputs:
ε
andMinpts,whereε
referstotheradiusofaneighborhoodandMinpts denotes the minimumnumberofdatapoints ineachneighborhood.Forexample,Gonget al.[20] proposedan improvedDBSCAN algorithmtoidentifythestoppingpointsofaGPStrajectory.Chenet al.[10] designedaT-DBSCANalgorithmbyconsidering the spatio-temporalcharacteristics oftheGPS trajectory.In additiontothe abovecluster algorithms,some othermachine learning techniquesare applied in data segmentation, such asK-nearest neighbor (KNN) [24], neural networks(NN) [9], convolutionalneuralnetwork(CNN)[13] anddeepneuralnetwork(DNN)[39].Althoughtheabovemachinelearningalgorithmsareabletosolvethecomplicatedtrajectorydatasegmentationproblem effectively,therearestillsomerestrictions.Thefirstisthatmostofthecurrentmachinelearningbaseddatasegmentation methodsrelyontheroadinformationsuch asmapmatchingorGIS components.Inrealapplications,itmaybeexpensive orevenimpossibletoderivetheGISdata.Thesecondistheharshdemandontheaccuracyoftraining data.Themachine learning algorithms performwellincasethatthe trainingdataisaccurate.However, thedataerrorsandinformationloss are inevitable in the data retrieval and collectionprocess, which may causethe machine learning methods to obtain a wrong conclusionviolatingthecommonsenseandlogic.Totacklewiththeabovedeficiencies,weincorporateprobabilistic logic intodatasegmentation toreduce thenegative influenceoferroneousdata,andproposethePLDSM,whichnot only segments thedata sequenceaccurately,butalso helpson recuperatinginformationloss.As anapplication, weapply it to logisticstransportationfield.
Inrecentyears,therapiddevelopmentofinterneteconomyinChinagreatlypromotestheprosperityoflogistics,which permeateseveryaspectofourdailylife.Forexample,on11thNovember2017,thetotaltradingvolumeinAlibabaturnedto 168.2billionRMB,generating812million logisticsorders.Asanimportantpartoflogistics,thefourthpartylogisticsisplay- ing anincreasinglyimportantroleforitsefficiencyincollaboratinginformationresourcesandinformationsharing.Instead ofprovidingtransportation serviceforrealproducts,thefourthparty logisticsisintendedforprovidinglogisticsprogram- ming, consultationandlogisticsinformationsystemforthefirst,secondandthird partylogistics.Toachieveatransparent transportation andhave abetter control of the logisticsinformation, thetransportation vehicles inthe first, second and
Fig. 1.Before segmentation.
Fig. 2.After segmentation.
thirdpartylogisticsareusuallyrequiredtoinstallelectronicsensorsincludingGPSreceivers,electroniclocks,andelectronic fuelingtankcaps,whichcollectmassivetransportationdataforthefourthpartylogistics.However,thefourthpartylogistics hasnoaccesstoobtainingthebusinessaffairinformationincludingbusinesstimeandbusinesslocations.Asforthisappli- cationscene,inthispaper,weinvestigatetheapplicationofdatasegmentationtominethebusinessaffairinformationfrom GPStrajectorydata.ThekeyissueistosegmenttheseriesofGPSdataintosegmentsofwhicheachsegmentcorrespondsto aparticularbusinessaffair.Forexample,basedontrajectorydatageneratedbyanindividualvehicle,Fig.1onlyprovidesa traveltrajectory.Afterdatasegmentation,thetrajectoryissegmentedintofivesegments,ofwhicheachcontainstheorigin anddestinationofabusinessaffair.Fourbusinesspoints A,B,C andD wherebusinessaffairstakeplacearedetected(see Fig.2).Inthemeanwhile,we needtoextractmorebusiness affairinformationbyanalyzingthesegmentedtrajectorydata characteristics.Forexample,accordingtothelocationandtimeofeachbusinesspointandthedailydrivinghabitsofeach vehicle,wecandeducethecategoryofproductsthisvehicleistransporting.Thispaperfocusesonidentifyingthetimeand destinationsofbusinessaffairs,andbusinessaffaircategorieswithouttravelreports.Comparedwithpreviousdatasegmen- tationmethods, our proposed PLDSMis ableto extract the hiddenlogisticstransportation informationwhen GIS datais unavailableandperformsbetterinrecuperatinginformationloss.Thecontributionofthispapercanbestatedasfollows:
• ProbabilisticlogicisfirstlyappliedinsegmentingGPStrajectorydatawithoutreferringtoanyroadinformation,which isabletoovercomethedeficiencyofhighdependenceonGISdatainpreviousstudies.
• ThePLDSMisproposedandanefficientnumericalalgorithmintegratingdualtheoryandNewton’smethodisdesigned, achievingabetterperformancethantraditionalmachinelearningtechniquewhenthedataisinadequateandnotaccu- rateenough.
Theremainderofthepaperisstructuredasfollows.InSection2,weintroducethetransportationdataforsegmentation.
Section 3 presentsthe PLDSMand formulatesa maximumentropy model.In Section 4, Newton’smethod together with duality theory are employed to solve the optimalsolution of themaximum entropy model.A practical exampleis then presentedinSection5tovalidatetheeffectivenessofPLDSM.Finally,Section6concludesthepaper.
2. Datapreparation
Inthissection,weintroducedifferenttypesofdatacollectedinthetransportationprocess.Variouselectronicdevicesare equippedinthevehiclestotrackthetransportationstatustimely.Theseincludelogisticselectroniclocks,electronicfueling tankcapsandGPSreceivers.
230 S. Guo et al. / International Journal of Approximate Reasoning 103 (2018) 227–247
Fig. 3.GPS log and trajectory.
Electronic locks: The electronic locks, by means of mobile GPRS wireless network system, are installed to monitor the whole transportation process of products in the realtime, assisting transportation enterprises andthe clients to get real-timedynamic informationofgoodstoensuretheeffectivesupervisionduringtransportation.Thiscanreduce thecost of enterprise andenhance competitiveness. Every time the cargodoor ofthe vehicle isopened, there isa record in the electroniclock.
Electronicfuelingtankcaps: The electronic fueling tank caps are designedto monitor the fueling of vehicles in the transportation process to prevent drivers fromstealing the oil.Again every time whenthe vehicle is fueled, thereis an electronicrecordinthetankcap.
GPSreceivers:GPS(GlobalPositioningSystem)receiversarewidelyappliedintransportationfields,whichareintended for recording and uploading real-time data including position coordinates, speed, heading and some other geographical information.
The GPS receivers renew the positioning data every several seconds, generating a sequence of GPS points, L= {L1,L2,. . . ,Ln}. Each GPS point Li includes multiple vehicle attributes (see Fig. 3), Li= {ti,xi,yi,vi,hi}, i=1,2,. . . ,n, wheretireferstothetimestamptheGPSdataisuploaded,xi and yistandforthelongitudeandlatitudeofthevehicleat ti,respectively,videnotestheinstantspeed,andhi representstheheadingrangingfrom0◦ to360◦.Inparticular,hi=0◦ meanstheheadingdirectionisNorthandhi=90◦referstotheEast.
Inadditiontothetransportationdata L,somelimitedhistoricalbusinessdata B= {B1,B2,. . . ,BN}isalsoavailablefor thepurposeofanalyzingtheregularpatternsofbusinessaffairdistributions(see Fig.3),where Bj= {t∗j,x∗j,y∗j,δj}denotes the j-thbusinessdata,t∗j representsthetime whenthevehiclearrives atthe j-thbusinesspoint, x∗j and y∗j refer tothe longitudeandlatitude,andδj istheresidencetimeofvehicleatthe j-thbusinesspoint, j=1,2,. . . ,N.
3. Datasegmentation
In thissection, wepresenttheprocedureofsegmenting theGPS trajectorydatato findthe businesspoints.As shown inFig.4,firstly,theoutliersaredetectedandremoved inthedataprocessing.Secondly,thestoppingpointsofthevehicle are identifiedby analyzingthemovementstatusofthevehicleateach GPSpoint.Finally,probabilistic logicisintroduced to filterthebusinesspointsfromthestoppingpointsbycomprehensivelyconsideringtheelectroniclockstatus,electronic fuelingtankcapstatus,residencetime,residencelocationandtrajectory.
3.1. Dataprocessing
GPS receiver recordsanduploads datafrequently, andthiswillinevitablygeneratesome outliers.Outliersrefer tothe datathat aresignificantlydistantfromothersintermsofacertain similaritymetric.Therearetwo kindsofoutlierscom- monlyemergedintheGPSdatastream.Thefirstisspeedoutlier.IfthespeedataGPSpointiszero,whilethevehiclekeeps movingforaconsecutivetimeperiodbeforeandafterthispoint,thenthespeedatthisinstantisanoutlier.Similarly,ifthe speed isnonzerowhile thevehiclestays stillfora consecutivetimeperiodbefore andafterthispoint,thenit isanother kind ofspeed outlier(see Fig.5). Todeal withtheseoutliers,a popularmethod isto replacethe outlierspeed withthe averageofthespeedbeforeandafterthisGPSpoint.Forexample,ifvi=0,vi−1>0 and vi+1>0,thenvi=0 isreplaced by vi=(vi−1+vi+1)/2.The secondislocation outlier.Itisgenerallyassumedthatthe movingdistanceofavehicle ina shorttimeintervalislessthanacertainthresholdduetothelimitedspeed.IfthedistanceoftwoconsecutiveGPSpointis farbeyondthedistancethevehicletravelsatthemaximumspeed,then itisdeemedasalocationoutlier,whichexhibits anabruptjumpinthetrajectory(see Fig.6).Usuallywedeletethisoutliertoachieveasmoothtrajectory.
Fig. 4.Framework for GPS trajectory data segmentation.
Fig. 5.Speed outlier. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)
Fig. 6.Location outlier.
3.2. Stoppingpointsdetection
Thestoppingpointdetectionisdesignedforidentifyingpotentialtripendswithinthedatastreambysearchingfortime periodsofnonmovement.Onatwo-dimensionalplane,wefirstsequentiallyconnectthediscretepointstoformatrajectory, andthendivide thetrajectoryinto tripsifthetime interval ofnonmovementexceedsa certainthreshold.A GPSpoint is deemed as a trip endpoint if the vehicle keepsmoving fora consecutive time period before thispoint and then stays nonmovementforaconsecutivetimeperiodafterthispoint.Forexample,Table1providesaGPStrajectorydatasequence.
Itisseenthatthevehiclestopsat2014/5/182:26:19,staysnonmovementfor438 sandthenstartsanewtrip.
232 S. Guo et al. / International Journal of Approximate Reasoning 103 (2018) 227–247
Table 1
GPStrajectorydata.
t x y h v
2014/5/18 2:25:14 121.368100 31.250116 121 31 2014/5/18 2:25:26 121.368100 31.249650 127 20 2014/5/18 2:25:43 121.368966 31.249166 120 25 2014/5/18 2:25:57 121.369833 31.248700 120 22
2014/5/18 2:26:19 121.370750 31.248383 0 0
2014/5/18 2:29:12 121.371233 31.248383 0 0
2014/5/18 2:32:12 121.371266 31.248366 0 0
2014/5/18 2:32:19 121.371283 31.248366 0 0
2014/5/18 2:33:37 121.371283 31.248366 0 0
2014/5/18 2:34:04 121.371600 31.248050 171 12 2014/5/18 2:34:21 121.371400 31.247233 193 25 2014/5/18 2:34:22 121.371383 31.247166 193 25 2014/5/18 2:34:35 121.371116 31.246283 196 27 2014/5/18 2:34:47 121.370883 31.245383 193 29
This step produces a trip-end file that contains all the stopping points coordinate information,including time, longi- tude andlatitude.Inaddition, theresidencetime ofeach stoppingpoint can alsobe obtainedby computingthetime of nonmovement.
3.3. Businesspointidentificationbasedonprobabilisticlogic
Allthestoppingpointsaredetectedaftertheaboveprocedures.Tofurtherfilterthebusinesspointsfromthesestopping points, we sequentially analyzethe electronic lockstatus, fueling tank capstatus, residencetime, residencelocation and trajectoryofeachstoppingpoint.
Electroniclockandfuelingtankcap
Foreach stopping point,we firstcheck thestatus oftheelectronic lockandfueling tankcap. Ifthereis anunlocking recordintheelectroniclock,thenitisdefinitelyabusinesspointasthelockcanonlybeunlockedwhencargodischarging takesplace.Incasethatnounlockingrecordisavailable,weturntoidentifythefuelingtankcapstatus.Ifthefuelingtank capisunlocked,itcanbeinferredthatthevehicleliesinagasstationwherethevehicleisrefueling.
Businesstimewindow
Incasethatnorecordinbothoftheelectroniclockandfuelingtankcapisavailable,weconstructbusinesstimewindow based onhistoricalbusiness datatoanalyzethe relationshipbetweenvehicleresidencetime andbusinessaffair.Suppose that the residence time is subject to a certain kind of probability distribution. We studythe statistics of the residence timeofhistoricalbusinesspointandformulatethefrequencydistributionfunction.Denotetheresidencetimeofvehiclein the i-thbusinesspointasδi.Thestepsareasfollows:Firstly,reorderδi inan increasingorderasδ(1),δ(2),. . . ,δ(N),where δ(1)=min{δ1,δ2,. . . ,δN}andδ(N)=max{δ1,δ2,. . . ,δN}.Secondly,divide[δ(1),δ(N)]intomidenticalintervals:[δ(1),δ(1)+
σ
), [δ(1)+σ
,δ(1)+2σ
),. . .,[δ(1)+(m−1)σ
,δ(N)],whereσ
=(δ(N)−δ(1))/m.Finally,weconstructthestatisticsofthefrequency ofstoppingpointswhoseresidencetimeliesin[δ(1)+iσ
,δ(1)+(i+1)σ
),i=0,1,. . . ,m−1,anddenoteitas fi.Thenstudy thestatisticsofthefrequencyofbusinesspointswhoseresidencetimeisin[δ(1)+iσ
,δ(1)+(i+1)σ
)anddenoteitas fi∗. Figs. 7and8give thehistogramoffrequencyanddistributionfunction,wherethe redhistogramrepresentsthebusiness pointsfrequencyandtheblueonedenotesallthestoppingpointsfrequency.ThedistributionfunctionisF
(
t) =
⎧ ⎨
⎩
fi∗fi
,
ifδ
(1)+
iσ ≤
t< δ
(1)+ (
i+
1) σ ,
i=
0,
1, . . . ,
m−
1,
0,
otherwise.
(1)
For each point,its probability ofresidencetime satisfying thebusiness demand (the criterion we adoptforclassifying a businesspoint)is F(t)iftheresidencetimeist.
Electronicfence
Electronicfenceisconstructedtoanalyzetherelationshipbetweentheresidencelocationandbusinessaffair.Byplotting the historical business dataon thetwo-dimensional plane, it isfound that thebusiness points are centered around sev- eralfixedpositions,ofwhichthedensityismuchhighercomparedwithotherpositionsontheplane.Therefore,weemploy DBSCANtoclusterthesehistoricalbusinesspoints.Foreachcluster,anelectronicfenceisbuilttolocatetheareawherebusi- nessaffairusuallytakesplace(see Fig.9).Wecanidentifywhetherastoppingpointisabusinesspointbyelectronicfence.
Ifthestoppingpointliesintheelectronicfence,thenitisprobablyabusinesspoint.However,duetotheexistenceofdata perturbation,thelongitudeandlatitudeofastoppingpointmayvaryslightlyduringthenonmovementperiod.Forexample, itfollowsfromTable1thatthevehiclekeepsnonmovementfrom2:26:19to2:33:37,butitslocationsareslightlydifferent.
To overcomethisdeficiency,we dividethe varyinglocations ofeachstopping pointinto twogroups: (1) locationsinthe
Fig. 7.Histogram of frequency.
Fig. 8.Frequency distribution function.
Fig. 9.Electronic fences.
Fig. 10.Stopping point data perturbation.
electronicfenceand(2) locationsoutsidetheelectronicfence(see Fig.10).Assumethatthefrequenciesofthosetwogroups are finand fout,respectively.Thentheprobabilityforthelocationsatisfyingthebusinessdemandcanbeestimatedby
F
=
fin(
fin+
fout) .
(2)234 S. Guo et al. / International Journal of Approximate Reasoning 103 (2018) 227–247
Fig. 11.Special trajectory 1.
Fig. 12.Special trajectory 2.
Trajectoryidentification
Somespecialtrajectoriescanbeutilizedtodeterminewhetherastoppingpointisabusinesspoint.Therearetwospecial trajectoriesconsidered.ThefirstisillustratedinFig.11whichprovidesatrajectoryfromA toB.Thevehicletakesthepath A→C→D→Bratherthan A→B thoughthereisadirectpath.Therefore,thestoppingpoint E inpathC→D islikely abusinesspoint.AnothertypicaltrajectoryisshowninFig.12,whereavehicleisdrivingstraightlyalongtheroadfromF to H.Atthecrossroad, itturnsto theright, stopsatG forawhile,andreturnsby thesamewayitcame.Then stopping point G islikelytobeabusinesspoint.Therefore,wecananalyzetheheadingchangesofthevehiclebeforeandafterthe stopping point to identifywhether its trajectorysatisfies the business demand.If thetrajectory matches one ofthe two specialtrajectories,thenitisprobablyabusinesspoint.
To find the business points accurately and estimate the corresponding probabilities, we introduce probabilistic logic intodatasegmentationprocess.ProbabilisticlogicwasfirstproposedbyNilssonwhopresentedaprocedureforcomputing probabilistic entailment [32],which inspiresmanysubsequentprobabilistic logic studies[8,12,19,21,31,38].Givenaset of propositionsandtheirassociatedprobabilities,thisprocedurecomputesarangeofprobabilitieswithinwhichtheprobability ofsome giventargetpropositionlies.The procedureoperatesbyfirstfinding allconsistentassignments oftruthvaluesto thegivenpropositionsandthenusingtheseassignmentstosetupasystemoflinearequationsthatcanbesolved.
Inviewoftheabovefiveattributes:electroniclockstatus,fuelingtankcapstatus,residencetime,locationandtrajectory, sixpropositionsaresetandsevenlogicformulaeareproposedtodeterminewhetherthestoppingpointsarebusinesspoints.
Propositions
• P1:Thereisanunlockingrecordintheelectroniclock
• P2:Thereisanunlockingrecordintheelectronicfuelingtankcap
• P3:Theresidencetimesatisfiesthebusinessdemand
• P4:Thelocationsatisfiesthebusinessdemand
• P5:Thetrajectorysatisfiesthebusinessdemand
• P6:Thestoppingpointisabusinesspoint
Fig. 13.Semantic tree.
Logicformulae
• P1→P6:Thestoppingpointiscertainlyabusinesspointifthereisanunlockingrecordintheelectroniclock
• P2→ ¬P6:Thestoppingpointisnotabusinesspointifthereisarecordintheelectronicfuelingtankcap
• ¬P1∧ ¬P2∧ ¬P3→ ¬P6: The stopping point is not a business point ifthe corresponding residencetime doesnot satisfythebusinessdemand
• ¬P1∧¬P2∧¬P4→ ¬P6:Thestoppingpointisnotabusinesspointifitslocationdoesnotsatisfythebusinessdemand
• ¬P1∧ ¬P2∧P3∧P4→P6:Thestoppingpoint isabusinesspointifitsresidencetime andlocation bothsatisfythe businessdemand
• ¬P1∧ ¬P2∧P3∧P5→P6:Thestoppingpointisabusinesspointiftheresidencetimeandtrajectorybothsatisfythe businessdemand
• ¬P1∧ ¬P2∧P4∧P5→P6:Thestoppingpointisabusinesspointifitslocationandtrajectorybothsatisfythebusiness demand
Forany proposition Pi,i=1,2,. . . ,6, there are two possiblesituations calledatoms: Wi1 where Pi is trueand Wi2 where Pi isfalse.Therealityworldmustbeinoneofthem,butwemightnotknowwhichone. Supposethat T r(.)isan assignmentfunctionfrompropositionssetS to{0,1},whereS= {P1,P2,P3,P4,P5,P6}.ForanyPi∈S,
T r
(
Pi) =
1,
ifPiis true,
0
,
otherwise.Basedontheabovesixpropositions,a binarysemantictreecanbeconstructedtodescribeallthe26 setsofatoms(see Fig.13).Ateachnodeofthetree,webranchleftifthepropositionistrueandbranchrightifthepropositionisfalse.The truthvaluesofthesixpropositionsareshowninTable2,ofwhicheachcolumnrepresentsapossibleatomwithaunique setoftruthvaluesforallthesixpropositions.
Although Table 2provides 64sets of atoms, thereare actually fewer possible atomsdue to some of the truevalues for our 6 propositions are logically inconsistent. Several different kinds of inconsistencies are detected in Table 2. The first is the inconsistency between P2→ ¬P6 and P1→P6, which is marked with a “×”. Take [1,1,1,1,1,0]T as an example,it means that theelectronic lockandthe fueling tank capare unlocked simultaneously, which isimpossiblein reality.ThesecondistheinconsistencywithP1→P6 whichismarkedwitha“×”.Forexample,thetruthvaluesofthesix sentences[1,0,1,1,1,0]T areinconsistentsincethestoppingpointisdefinitelyabusinesspointwhenthereisanunlocking recordintheelectroniclock.Thethirdinconsistencyismarkedwitha“×”,whichcontradictswithP2→ ¬P6.Someother inconsistenciesduetothecontradictionwiththeproposedlogicformulaearemarkedwithdifferentcolors.Finally,24sets ofconsistentatomsaremarkedwitha“√
”,ofwhichthetruthvaluematrixisexpressedasfollows:
V
=
⎛
⎜ ⎜
⎜ ⎜
⎜ ⎝
1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 0 1 1 1 0 1 1 1 1 0 0 0 0 1 0 0 0
⎞
⎟ ⎟
⎟ ⎟
⎟ ⎠ ,
whereVi j representstheconsistenttruthvalueofPi in j-thpossibleatom,i=1,2,. . . ,6,j=1,2,. . . ,24.
236S.Guoetal./InternationalJournalofApproximateReasoning103(2018)227–247
Table 2
Truthvalueassignmentindifferentpossibleatoms.
P1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
P2 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
P3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
P4 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
P5 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
P6 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
× × × × × × × × √ × √ × √ × √ × × √ × √ × √ × √ √ × √ × √ × × √
P1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
P2 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
P3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
P4 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
P5 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
P6 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
× × × × × × × × √ × √ × √ × √ × × √ × √ × √ × √ √ × × √ × √ × √
pi
=
j=1
Vi jqj
,
i=
1,
2, . . . ,
6,
(4)which impliesthat the probability of a propositionis the sumof theprobabilities for thepossible atoms inwhich that propositionistrue.
Foreachstoppingpoint,theprobabilitiesofthefirstfivepropositionsp1,p2,p3,p4andp5canbeestimatedbyanalyzing itsGPStrajectorydataandhistoricalbusinessdata.Here p1,p2and p5 arebinaryvariablestakingvaluesin{0,1}.Denote p1=1 ifthere isan unlocking recordin theelectronic lock, otherwise, p1=0.Similarly, p2=1 if thereisan unlocking recordinthefuelingtankcap,otherwise,p2=0.p5=1 ifthetrajectory satisfiesthebusinessdemand,otherwise, p5=0.
Furthermore, p3 andp4 takevaluesininterval[0,1],whichcanbe calculatedaccordingtoEqs. (1) and(2).Ourobjective istocompute p6todeterminewhetherthisstoppingpointisabusinesspoint.FollowingfromEq. (4),itisknownthat p6 canbeuniquelydeterminedonceavectorofpossibleatomsq= [q1,q2,. . . ,q24]T isselected.However,thevectorqisnot unique.Therefore,a maximumentropymodelisformulatedtoselecttheoptimalvectorq.ˆ
⎧ ⎪
⎪ ⎪
⎪ ⎪
⎪ ⎨
⎪ ⎪
⎪ ⎪
⎪ ⎪
⎩
max
−
24j=1
qjlnqj
24j=1Vi jqj
=
pi,
i=
1,
2, . . . ,
5,
q1+
q2+ . . . +
q24=
1,
0
≤
qj≤
1,
j=
1,
2, . . . ,
24,
(5)
wherethefirstconstraintensuresthatvectorqsatisfiesVq=p,thesecondconstraintimpliesthatthesumofprobabilities forallthepossibleatomsequalsto1,andthethirdconstraintgivesthelowerandupperboundsofqj,j=1,2,. . . ,24.
4. ModelsolvingbasedonNewton’smethod
Inthissection,wefirsttransformthemaximumentropyproblemintoitsdualproblem,andthenapplyNewton’smethod to solving the problem. There are two major reasonsfor conducting the transformation.First, the original problem is a maximizationproblemwithconstraintswhilethedualproblemisaminimizationproblemwithoutconstraints.Second,the numberof variablesin thedual problemissignificantly lessthanthe original problem, thereforethecomputational cost canbesubstantiallysaved.Theabovemaximumentropymodelisexpressedas
⎧ ⎪
⎪ ⎪
⎨
⎪ ⎪
⎪ ⎩
max
−
24j=1
qjlnqj
Vq
¯ = ¯
p,
0
≤
qj,
j=
1,
2, . . . ,
24,
(6)
where p¯ = [p1,p2,p3,p4,p5,1]T and V¯ composes ofthe first five rowsin V anda vector ofall ones. Remark that the constraintqj≤1 canbediscardedaswerequirethatq1+q2+ · · · +q24=1,and0≤qj, j=1,2,. . . ,24.
V
¯ =
⎛
⎜ ⎜
⎜ ⎜
⎜ ⎝
1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
⎞
⎟ ⎟
⎟ ⎟
⎟ ⎠ .
Asshownin[11],thedualproblemof(6) is
minλ max
q
⎧ ⎨
⎩ −
24j=1
qjlnqj
+ λ
T(
p¯ − ¯
Vq)
⎫ ⎬
⎭ ,
(7)238 S. Guo et al. / International Journal of Approximate Reasoning 103 (2018) 227–247
Algorithm4.1Newton’sMethod.
Step1.Initializeaninitialsolutionλ0,andasmallenoughpositivethresholdε.Setk=0.
Step2.Computethegradient∇f(λ(k))andHessian∇2f(λ(k)). Step3.Deriveλ(k+1)accordingtoiterationformula:
λ(k+1)=λ(k)−
∇2f(λ(k))−1
∇f(λ(k)). Step4.If ∇f(λ(k+1))<,thenλ∗=λ(k+1);
otherwise,setk=k+1 andreturntoStep2.
Step5.Outputtheoptimalsolutionλ∗.
whereλ= [λ1,λ2,λ3,λ4,λ5,λ6]T isthemultiplier.Thentheoptimalsolutionq∗ canbeobtainedbysolvingtheequations:
∇
qjL(
q, λ) = −
lnqj−
1− λ
TV¯
j=
0,
j=
1,
2, . . . ,
24,
(8)whereV¯jrepresentsthe j-thcolumnofV.¯ Thus,theoptimalsolutionis
q∗j
=
e−1−λTV¯j>
0,
j=
1,
2, . . . ,
24.
(9)SubstituteEq. (9) intoEq. (7),wehave
minλ
⎧ ⎨
⎩
24j=1
e−1−λTV¯j
+ λ
Tp¯
⎫ ⎬
⎭ .
(10)Then Model(6) istransformedintoModel(10),whichcanbesolvedbytheNewton’smethodbyiteratingthemultipli- ersλ.NotethatProblem(10) isaminimization problemwithoutconstraintsandthenumberofvariablesisreducedfrom 24 (originalproblem)to6.Denote
f
(λ) =
24j=1
e−1−λTV¯j
+ λ
Tp¯ .
Thenthegradientof f(λ)canbeexpressedas
∇
f(λ) = − ¯
Vq∗+ ¯
p,
andtheHessianof f(λ)isexpressedas
∇
2f(λ) = ¯
Vdiag(
q∗)
V¯
T,
where diag(q∗)isa diagonalmatrixwithdiagonalentriesq∗.Itisclearthat ∇2f(λ)isapositive definitematrixbecause all the entries of q∗ are positive and V¯ is a full rank matrix. Suppose that λ(k) is the k-th iteration solution of (10), k=0,1,2,. . .,whereλ(0)istheinitialsolution.TheNewton’siterationformulaisgivenasfollows:
λ
(k+1)= λ
(k)−
∇
2f(λ
(k))
−1∇
f(λ
(k)).
(11)Based ontheabove iterationformula, a sequenceofpoints λk are generateduntilthenorm ofgradient∇f(λk)=0 or lessthanagivensmallenoughthreshold .Inournumericalexperiment,weset =0.0001.Thentheoptimalsolutionλ∗ isobtained.ThestepsareshowninAlgorithm4.1.
5. Numericalexample
Thissection presentsarealGPS trajectorydatasegmentationproblemtovalidate theeffectivenessofPLDSM.Bycom- prehensivelyanalyzingtheGPStrajectorydataandhistoricalbusinessaffairrecords,allthebusinesspoints arefilteredout and thebusiness affaircategoriesare inferred by PLDSM.As acomparison, a widely appliedmachine learning technique KNN isusedtotacklewiththesametrajectorydatasegmentationproblem.TheresultsrevealthatPLDSMperformsbetter ineffectivenessandpracticability.
Example5.1.Suppose thatthe fourthparty logisticsis greatlyconcerned aboutthebusiness affairinformationoftheco- operativethirdparty logisticsvehiclewithID 238245,whichtransportsvarious cargoestodifferentretailingdepotsevery dayinShanghai.Everytime thereisacargodischarging,a businessaffairiscompleted,andthenthevehicledriverwould recordthebusinessaffairinformationincludingthearrivaltime,departuretimeandlocation.Thefourthpartylogisticshas no accessto all thebusiness affair informationexceptsome limited historicalbusiness datafrom 2014/5/1000:00:00to 2014/5/1724:00:00derivedbysurveyingthedriverofthisvehicle(see Appendix).Inthemeanwhile,alltheGPStrajectory dataisavailablebytheGPSreceiverinstalledinthisvehicle.ToverifytheeffectivenessofPLDSM,wetaketheGPStrajec- torydataandhistoricalbusinessdatafrom2014/5/1000:00:00to2014/5/1724:00:00asthetrainingdatasettoconstructa
2700≤t<3000 0 0 0.0000
3000≤t<3300 1 1 1.0000
3300≤t<3600 1 1 1.0000
t≥3600 0 7 0.0000
Fig. 14.Clusters of historical business data.
trajectorydatasegmentationmechanism,thenapplyittosegmentanewGPSdatasequenceofthisvehiclefrom2014/5/18 00:00:00to2014/5/1924:00:00andcomparetheresultswiththesurveyedbusinessaffairinformation.
In the training stage, we firstly detect all the stopping points from 2014/5/10 00:00:00 to 2014/5/17 24:00:00, and statisticthe frequencyofstoppingpoints andbusiness points.According toEq. (1),the distributionfunction ofresidence timeisobtained(see Table3above).
Secondly, the electronic fence is built by using DBSCAN to cluster the historical business data. Remark that there are 36 available business points (see Appendix). However, due to the data perturbation caused by the inexact posi- tioning, each business point has multiple slightly different locations. To fully use the business affair information and achieve a better efficiency of clustering, we take all the varying locations into consideration to cluster the business points. Set the parameters in DBSCAN as follows:
ε
=0.003 and Minpts=3. Then the business points are divided intoseven clusters:C1= {1,7,13,16,19,25,30,34}, C2= {2,8,20,26,31,35},C3= {3},C4= {4,10,14,17,22,27,32,36}, C5= {6,11,18,23,28,33}, C6= {12,24,28} and C7= {5,9,15,21} (see Fig. 14). Foreach cluster, the electronic fence is built(see Fig.15).Aftertheabovetrainingstagewheretheresidencetimefrequencydistributionandelectronicfencearebuilt,westartto segmentthedatasequencefrom2014/5/1800:00:00to2014/5/1924:00:00.Accordingtothedatasegmentationprocedure inFig.4,wefirstlydetectallthestoppingpointsinthisdatastream.AsshowninAppendix,theminimumresidencetime ofall the businesspoints is 605 s. Therefore,the thresholdinthe stoppingpoint detectionprocess is setas605 ssince thestoppingpointcannot beabusiness pointifits residencetime islessthan 605 s.Bythestoppingpoint identification process,14stoppingpointsareobtained.Thenweestimatetheprobabilityofresidencetimesatisfyingthebusinessdemand p3 (see Table 4).Secondly,checktheelectroniclockstatusandfueling tankcapstatusofeachstoppingpoint.Forvehicle 238245,noelectroniclockdataandelectronicfuelingdataisavailable,thenthevaluesofp1 andp2arezeros.
Inthenextstep,foreachstoppingpoint,wecalculateitsprobabilityofresidencelocationsatisfyingthebusinessdemand (see p4inTable4).Itcanbeseenclearlythatthelocationsofsevenstoppingpointslieintheelectronicfence(see Fig.16).
Takestoppingpoint3asanexample,duetodataperturbation,ithas18varyinglocations,16ofwhichlieintheelectronic fence,and2lieoutsidetheelectronicfence.Therefore, p4=16/18=0.8889.
240 S. Guo et al. / International Journal of Approximate Reasoning 103 (2018) 227–247
Fig. 15.Electronic fences of different clusters.
9 2423 0.0000 0.0000 1.0000 0.0000 1.0000 1.0000
10 1178 0.0000 0.0000 0.7273 0.0000 0.0000 0.0000
11 1777 0.0000 0.0000 0.5333 0.9167 1.0000 0.9616
12 723 0.0000 0.0000 0.6667 1.0000 1.0000 1.0000
13 1472 0.0000 0.0000 0.6000 1.0000 0.0000 0.6000
14 1079 0.0000 0.0000 0.7273 0.0000 0.0000 0.0000
Table 5
Theconfusionmatrix.
True class Recognized class Positive Negative
Positive T P F N
Negative F P T N
Finally, check whetherthe trajectory ateach stopping point satisfies the business demand. According to the heading change ofthe vehicle, it isfound that the trajectories ofstopping points 1, 2,3, 4,8, 9,11 and12 satisfy the business demand(see Fig.17).Takestoppingpoint1asanexample,atfirst,theheadingofvehicle238245isaround100◦.Atpoint A,thevehicleturnsleft withtheheadingchangingto 350◦.AfterarrivingatB,itstopswithheadingchangingto0◦ and thenstaysunmovingforacertainperiod.From B toC,theheadingchangesfrom0◦ to170◦,representingthatthevehicle completes au-turn.Finally,thevehicleturns leftagainatpoint D andthenleavesinthe directionof 90◦.Therefore,the trajectoryofstoppingpoint1satisfiesthebusinessdemand.Thetrajectoriesbeforeandafterthosestoppingpointscanbe seeninFig.18,wheretheredandgreencurvesdenotethetrajectorybeforeandafterthestoppingpointrespectively.
Afterderiving thecorresponding probabilities: p1,p2,p3,p4 and p5, thevalue of p6 isobtainedby solvingmodel(6) with Newton’s method. The last column of Table 4 lists the final probabilities of these stopping points being business points. Assign the stopping points with p6≥0.5 asbusiness points, then stoppingpoints 1,3, 4, 5,6, 8, 9,11, 12 and 13are recognized asbusinesspoints, wheretheonesmarked withredare truebusiness points inreality.There aretwo commonlyacceptedcriteriameasuringtheefficiencyofdatasegmentation:precisionandrecall,whicharedefinedbasedon thefollowingconfusionmatrix(see Table 5),whereT P denotestruepositivebusinesspoints,representingthenumberof realbusinesspointswhicharerecognizedcorrectly,F N referstothenumberofrealbusinesspointswronglyrecognizedas non-businesspoints, F P standsforthenumberofrealnon-businesspoints wronglyrecognizedasbusinesspointsandT N representsthenumberofrealnon-businesspointsrecognizedcorrectly.Theprecision(P re) andrecall(Rec)areexpressed asfollows:
P re
=
T PT P
+
F P,
Rec=
T PT P
+
F N.
(12)Accordingto Table4 andEq. (12),the precision forPLDSMis70%, andtherecall is100%. As comparison,we employ the mostwidely applied machine learning technique KNN to filterout the business points fromthe 14 stopping points inTable4.Usingthesametraining datasetincludinglongitudex∗,latitude y∗,residencetime δ andtrajectory p5,a KNN classifierisconstructedbyemployingtheEuclideandistanceandsettingparameterK=3.Thesegmentationresultisshown inthelastcolumnofTable6,where1impliesthatthestoppingpointisrecognizedasbusinesspointand0representsthat thestoppingpointisrecognizedasnon-businesspoint.
Theresultsin Table6reveal thatstoppingpoints 3,5,6,11,12,13 and14are recognizedasbusiness pointsby KNN while the real business points are 3, 4, 5, 6, 11,12 and 13. Therefore, the precision and recall for KNN are both 86%.
Although KNN performs better in precision compared with PLDSM, it cannot filter out all the real business points, i.e., businesspoint 4.Incontrast,therecallofPLDSMis100%,representingPLDSMisabletofindallthebusinesspointsfrom the data sequence. In real GPS trajectory data segmentation application, the fourthparty logistics concerns most about the recall ofsegmentation rather thanprecision sinceomitting anypotential business point could significantly influence the inference over the whole business affair andsometimes even lead to a wrong judgment in business affair analysis.
Therefore,achievingalargerrecallisthemostimportantobjectivetocoveralltherealbusinesspoints.Incasethatallthe realbusinesspointsarecovered,weturntothecomparisoninprecision.
242 S. Guo et al. / International Journal of Approximate Reasoning 103 (2018) 227–247
Fig. 16.Stopping points in the electronic fence.
Fig. 17.Heading changes at different stopping points.
244 S. Guo et al. / International Journal of Approximate Reasoning 103 (2018) 227–247
Fig. 18.Trajectories before and after stopping points.
9 121.357350 31.253650 2423 1.0000 0.0000
10 121.370883 31.248650 1178 0.0000 0.0000
11 121.389716 31.224650 1777 1.0000 1.0000
12 121.390516 31.241400 723 1.0000 1.0000
13 121.427616 31.253100 1472 0.0000 1.0000
14 121.394250 31.245616 1079 0.0000 1.0000
Fig. 19.Location for stopping point 1 and 8.
Fig. 20.Location for stopping point 9.
In addition, it follows from Table 4 that stopping points 1, 8 and9 are “wrongly” recognized as business points by PLDSM,whichare not theknown businesspoints according tothe surveyedbusinessdata.Then we check theresidence timeandresidencelocationsofthesethreepoints,andstudytheirGPStrajectory.Interestingly,itisfoundthatthisvehicle stopsaround these locationsfora certain period every day.Stopping points 1 and8are in the samelocation shownin Fig.19,whichisnearbyseveralmarkets, squaresandmalls.Everyday,thevehicledrivestothislocation,stays foraround 20 minutes,thentakesau-turnwithachangeof180indirectionandleaves.Asregard tostoppingpoint9,wesearchits location(see Fig.20)byGooglemap,andfindthatitisclosetoasupermarketRTMart.TheGPStrajectorydatarevealsthat thisvehicleusuallystopsatthisareaataround2:00:00,staysfor30minutesandleaves.Itisknownthatthereplenishment ofsupermarketsisconductedduringthemidnighttoavoidthenegativeimpactontheshoppingactivitiesofcustomersin thedaytime.Andfreshfoodsuchasmeat,vegetablesandfruitsareusually pursuedandreplenishedintheearlymorning toensurethefreshnessoffood.Therefore,accordingtothetimeandlocation,wecaninferthatstoppingpoints1,8and9 areprobablybusinesspoints,andthisvehicleisintendedforreplenishmentofmallsandsupermarkets.Thecorecauseof thesethreeexceptionsistheincompleteness ofbusiness dataduetosome particularbusinessaffairsmaybe deliberately concealed by the driver. Then the real precision for PLDSM is actually 100% since all the business points are detected,