Using Deep Data Augmentation Training to Address Software and Hardware Heterogeneities in Wearable and Smartphone Sensing Devices

Akhil Mathur‡†, Tianlin Zhang∗, Sourav Bhattacharya‡, Petar Veličković∗, Leonid Joffe†, Nicholas D. Lane‡⋄, Fahim Kawsar‡, Pietro Liò∗
‡Nokia Bell Labs, †University College London, ∗University of Cambridge, ⋄University of Oxford

ABSTRACT
A small variation in mobile hardware and software can potentially cause a significant heterogeneity or variation in the sensor data each device collects. For example, the microphone and accelerometer sensors on different devices can respond very differently to the same audio or motion phenomena. Other factors, like the instantaneous computational load on a smartphone, can cause key behavior like sensor sampling rates to fluctuate, further polluting the data. When sensing devices are deployed in unconstrained and real-world conditions, examples of sharply lower classification accuracy are observed due to what is collectively known as sensing system heterogeneity. In this work, we take an unconventional approach and argue against solving individual forms of heterogeneity, e.g., improving OS behavior, or the quality/uniformity of components. Instead, we propose and build classifiers that are themselves more tolerant of these variations by leveraging deep learning and a data-augmented training process. Neither augmentation nor deep learning has previously been attempted to cope with sensor heterogeneity. We systematically investigate how these two machine learning methodologies can be adapted to solve such problems, and identify when and where they are able to be successful. We find that our proposed approach is able to reduce classifier errors on average by 9% and 17% for a range of inertial- and audio-based mobile classification tasks.

1 INTRODUCTION
Mobile sensing systems are entering an important new phase; they are beginning to scale to hundreds of thousands, and even millions, of active users. However, such success highlights an important new problem for the accuracy of classifiers that process sensor data and detect context and activities such as sleep [30], exercise [37] and transportation mode [16]. Variations in the software and hardware of end-devices are being found to cause significant unexpected variations in the sensor data they collect [14, 17, 25]. Variations include sampling rates (differing from the request) and even the sensitivity and response differences of the sensor itself; such differences are observed to alter even the coefficients of features extracted from the sensor data by appreciable amounts [14, 25]. As a result, mobile sensing classifiers that are unprepared to deal with such noise and heterogeneity¹ are suffering from significant drops in accuracy and reliability in-the-wild.

Conventional solutions to heterogeneity require addressing individual sources; for example, providing the OS with more real-time system behavior, manufacturing devices in a more precise and uniform manner, or switching to reliable (but expensive) sensors. Although possible, this direction is not always appropriate. Consumer wearables and phones need to be multi-purpose and low-cost, which is at odds with solutions that require them to be high-precision instruments.

In this paper, we investigate how deep learning algorithms [15] can be used to build models of context and activity that are significantly more robust to forms of sensor heterogeneity than existing models. To the best of our knowledge, our work is the first in which two key elements of deep learning – specifically representation learning and data augmentation – have been studied in terms of how they can address the heterogeneity challenges present in mobile sensing systems. Data augmentation is the process of systematically enriching data during the training process to enable a classifier to become more robust to expected noise. For example, deep audio models use data augmentation to mix in background noise, allowing speech recognition models to operate in the wild. We examine if these same principles will extend to the types of noise caused by heterogeneity. Representation learning is a characteristic of deep neural networks that allows them to work hand-in-hand with data augmentation. It is the process by which multiple layers of the neural network can learn custom features (intermediate representations) that are both discriminative and robust to the noisy examples observed at training time. By studying how these two approaches can be used in relation to the brand new problem domain of sensor heterogeneity, we explore whether this would lead to the development of more robust mobile sensing classifiers.

The result of our study is a deep neural network framework that integrates new training phases designed to diminish the challenges of heterogeneity through learned feature representation layers. Because deep learning can introduce additional amounts of computational complexity [36], we also evaluate the system overhead of our models; these results indicate that by keeping model architectures within reasonable limits, deep models can remain within acceptable levels of resource consumption for even wearables, while still maintaining appreciable tolerance to heterogeneity.

¹ Throughout this paper, we use the term "heterogeneity" to refer to the various hardware and software issues that cause API-returned sensor data to vary from device to device.
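The background-noise style of data augmentation mentioned above can be illustrated in a few lines. This is our own minimal sketch, not code from the paper (the function and variable names are assumptions): it mixes a noise clip into a clean waveform at a chosen signal-to-noise ratio, producing the kind of corrupted copy a robust speech model would be trained on.

```python
import numpy as np

def mix_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a background-noise clip into a clean waveform at a target SNR (in dB)."""
    # Tile/trim the noise clip so it matches the length of the clean clip.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Training on many such mixtures, at varying SNRs and with varying noise clips, is what lets deep audio models tolerate noisy deployment conditions.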
The contributions of this work include:
• The proposal to frame the challenge of software and hardware heterogeneity in devices as a form of representation learning combined with data augmentation at training time, along with the execution of a systematic study of how these two approaches can be adapted for use in this new problem domain.
• The development of a framework for deep classifiers designed to address heterogeneity concerns. This framework includes the novel phases of a heterogeneity generator and heterogeneity pipeline that enable conventional labeled data, along with actual examples of heterogeneous behavior, to act as training inputs.
• A comprehensive evaluation of our framework to combat forms of software and hardware heterogeneity in two classification scenarios: audio sensing and inertial sensing.
• A hardware analysis of the resulting deep models which shows that their energy, memory, and runtime requirements do not overwhelm wearable/mobile devices.

Figure 1: Mel-scale audio spectrograms from 3 devices that all capture the same speech audio. Subtle variations across the devices can be clearly seen in the highlighted boxes.

2 CHALLENGES OF HETEROGENEITY
Despite the extensive work on building sensor inference models for a variety of activity and context recognition tasks, researchers have found that model accuracy significantly worsens when deployed 'in the wild' [6, 16]. This has largely been attributed to variations in usage behavior and device heterogeneities. While variations in usage behavior have been studied and approaches to mitigate them have been proposed [19], there has been considerably less focus on addressing software and device heterogeneities.

Both software and hardware components of a mobile device contribute to the heterogeneities in its sensing behavior. Mobile devices run on different software platforms which differ in terms of sensor availability, APIs, sampling frequency and resolution. For example, [16] discussed how variations in GPS duty cycling in Android and iOS APIs adversely affect data quality and the performance of inference models. Similarly, [25] found that even run-time factors within the same device, such as instantaneous I/O load or delays in OS-level timestamp attachment to sensor measurements, can lead to unstable sampling rates.

Looking at hardware-specific heterogeneities, it has been found that imperfections during the manufacturing process can cause subtle differences in the output of each sensor chip. On these lines, [14, 17] presented techniques to exploit the imperfections in mobile device microphones and accelerometers as a means to fingerprint and identify users.

Empirical Examples. We now provide two empirical examples that concretely demonstrate heterogeneity in sensor measurements. First, we compare the frequency responses of microphones from different devices when they are provided the same audio input under the same environmental conditions. Figure 1 shows the audio spectrograms for a 3-second audio clip recorded by multiple smartphones and smartwatches. As observed, the mel-scale [18] spectrograms show subtle variations across devices for the same audio input. Consequently, we found that these variations in the frequency spectrum also propagate to state-of-the-art audio features, such as filter banks and MFCCs [18], which in turn reduces the accuracy of audio classifiers built on top of these features.

Next, we show how varying CPU loads can cause sampling rate instability in on-device accelerometers, using the data provided by the authors of [25]. Sampling rate stability is defined as the regularity of the timespan between two successive sensor measurements. We use two metrics to evaluate sampling rate stability: (a) creation time: the timestamp attached by the OS or device driver to the sensor reading, and (b) arrival time: the time when the reading is received by the application code. Ideally, a device should have a constant difference in creation and arrival times between two successive sensor readings. However, due to support for multitasking on modern smartphones, the OS often prioritizes among various running applications based on their CPU load, which in turn may affect their sampling rate stability.

We tested a benchmarking application on four phones (two LG Nexus 4, two Samsung Galaxy S Plus) under two different conditions: (a) under normal CPU load, and (b) under experimentally induced high loads. Both conditions were tested for 4 minutes, and the benchmarking application was configured to sample the accelerometer at the device's maximum rate. Figure 2 shows boxplots of the creation and arrival times under both experimental conditions. We observe that for the LG Nexus 4, the median creation and arrival time
Figure 2: Sensor sampling rates change dramatically when the device is exposed to an average (left) and a high CPU load (right). This is because the OS becomes so busy servicing other requests that it does not keep up with its responsibility to service the sensor API.

differences increase by nearly 1.75x in the high CPU load condition. For the Samsung Galaxy S+, not only do we observe an increase in the creation time difference, we also see a significantly (16x) higher standard deviation in arrival time differences as compared to the normal CPU load condition. This experiment highlights that variations in CPU load can impact the sampling rate significantly, and moreover that the extent of sampling rate instability varies across devices.

Both these types of heterogeneities have significant adverse impacts on the accuracy of sensor classifiers trained on this data. In §5.2, we empirically quantify this classifier accuracy loss for a variety of models.

3 A DEEP LEARNING APPROACH
In this section, we present conventional methods used to mitigate the challenges of device heterogeneity and contrast these to our deep learning based solution.

Conventional Solutions. In the context of activity recognition, [25] proposed an approach of clustering devices based on their respective heterogeneities and training a targeted classifier for each identified cluster. We argue that a clustering solution is not scalable because it needs to be repeated for each new device, thus creating an excessive deployment overhead and a cloud dependency for updates. Sensor calibration is a commonly used approach to counter the variations across diverse sensors – e.g., user-driven and automatic methods are described in [37] to calibrate accelerometer responses from various target devices. However, as we showed in the previous section, heterogeneities in sensor data can even arise due to variation in the CPU load on the same device, and these types of heterogeneities are not addressed by calibration. Similarly, for audio sensing, the frequency spectrum of a microphone's response varies non-linearly with the source audio – as such, perfectly calibrating the microphone responses against a reference device may not be feasible. To quantify this aspect, we present an experiment in §5.3 which demonstrates the limitations of microphone calibration techniques for heterogeneity use-cases. Real-time operating systems (RTOS), popular in embedded devices [2], provide another alternative: processing data as it comes in, without buffering delays. Ideally, they could be leveraged to provide sensor measurements with very high sampling rate stability. However, RTOS have poor support for multitasking and background processing, which makes them unsuitable for phones and wearables.

From a hardware viewpoint, it is possible to design high-precision medical-grade devices which provide accurate and consistent multi-modal sensor measurements over a period of time. However, such devices are typically more expensive than modern smartphones or wearables. There are other affordable sensing devices available in the market (e.g., Actigraph sleep and activity monitors [1]); however, these devices focus on basic inference tasks and require less multitasking. As such, they do not face the challenges of varying CPU load or the platform heterogeneity of consumer wearables.

A Deep Data Augmentation Alternative. Readily used methods for modeling activities and context on mobile devices are shallow (e.g., SVMs, CRFs, GMMs, single-layer NNs). This term contrasts them with deep learning algorithms [7, 15]; as we will describe, the ability of deep methods to learn – purely from data – multiple layers of feature representation, when combined with data augmentation, helps them meet the challenges that heterogeneity presents.

Deep learning models have been shown to outperform alternative shallow models in the recent literature [8, 32, 33]. More importantly, deep learning models have been successful in combating noise in the underlying data – especially when data augmentation methods are employed. In the context of audio sensing, [22] showed that DNN models achieve higher accuracies in noisy environments over conventional methods. Strategies for audio data augmentation typically include mixing labeled training audio with examples of background noise, or perturbing spoken words in ways that correspond to natural variations in how people speak. Similarly, in visual recognition, deep models have proven to be robust against visual variations. Examples include facial alignment [44], scene rotation, and image backgrounds [31] – introducing such diversity at training time through image manipulations, ranging from simple rotations to those based on complex 3D scene and facial modeling.

In this paper, we aim to use this characteristic of deep models – working robustly with noisy data, in particular through data augmentation – to tackle sensor data heterogeneities. That is, we devise a deep learning model that can learn representations of the sensor measurements which are robust to the systematic noise introduced by software and hardware,
Figure 3: Deep Learning Model Architecture.

and hence it can outperform shallow models, which are not known to adapt to such heterogeneities.

Bridging Deep Learning and Heterogeneity. As shown in Figure 3, the architecture of a deep model is comprised of a series of layers; each layer, in turn, contains a number of nodes that assume a state based on the states of all nodes in the prior layer. The first layer (the input layer) is set by raw, or lightly processed, data; for example, a window of audio or accelerometer data to be classified may be represented by the coefficients of a generic FFT bank. The last layer (the output layer) contains nodes that correspond to inference classes (for example, a category of activity or context). The layers in between these two are hidden layers; these play the critical role of collectively transforming the state of the input layer (raw data) into an inference by activating a node in the output layer.

In this work, our aim is to embed the awareness of key types of heterogeneities into the deep learning model during the training phase. Once the model learns the representation of different types of sensor heterogeneities, it can potentially outperform state-of-the-art inference models which do not take such device variations into account. Moreover, we argue that typical device heterogeneities, such as sampling rate variations or sensor amplitude differences, are much simpler, lower-dimensional perturbations of the signal than, for instance, variations in acoustic environments or face alignments. If deep learning models have been successfully trained to adapt to the latter variations, they hold promise in the device heterogeneity space as well.

4 HETERO-TOLERANT DEEP MODELS
Building upon the approach outlined in the prior section, we now detail our deep-learning-grounded framework for performing activity and context sensing.

4.1 Overview
The central idea of our framework calls for a significant change in how context and activity models are trained for wearables and mobile devices. Specifically, as illustrated in Figure 4, we introduce a heterogeneity generator and a heterogeneity pipeline into the training process; these provide various examples of realistic forms of software- and hardware-caused sensor noise to the learning architecture of a deep model.

Figure 4: Framework for training deep models that are more resistant to software and hardware heterogeneity. A key innovation is the embedding of conventional training data with a learning algorithm that regulates the amount of heterogeneity examples that are introduced. This allows the learned multi-layer feature representation to act not only as discriminative features but also to be robust to forms of heterogeneity.

The training algorithm of the deep model is then able to exploit this additional information, which supplements conventional training data, to produce a model that is still accurately able to recognize the desired classes of context and activity; but critically, it is able to do so in the presence of additional system-generated heterogeneity within the sampled data.

The outcome of the training process (i.e., the workflow depicted in Figure 4) is a deep learning model able to be inserted into any sensing application or wearable system just like a conventional sensor classifier. It operates like these conventional designs: it takes as input the sensor data (that is still subject to heterogeneity issues) and classifies these data frames into target classes, before passing them to the host application or system. The key difference is that this model is transparently more tolerant to situations of heterogeneity.

Below we discuss the core architectural components and training workflow depicted in Figure 4. The individual components are described in more detail in §4.3 and §4.4.

Training and Heterogeneity Datasets. In addition to the typical training data that is required to construct an activity or context model (training data), our framework requires what is termed heterogeneity data. These are examples of the different types of software and hardware variations that impact classification of the sensor data, as detailed in §2. For example, it could include a dataset of 'spectral differences' (i.e., differences in audio frequency response) in microphones from various devices when capturing the same audio input (an example of which is shown in Figure 1). Other examples can also be included, such as variations due to the precision of the sensor ADC, which changes how many bits are used to capture the analog signal recorded by the microphone. Similar to the current practice of sharing image or audio datasets [3], such heterogeneity datasets can be built up by developers and shared within the community. In §4.2, we present more details on the heterogeneity datasets used in this paper for evaluating our proposed framework.
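To make the 'spectral differences' idea concrete, here is a rough sketch of how one such heterogeneity example could be derived from two devices' simultaneous recordings of the same audio: the per-window ratio of FFT magnitudes approximates the relative microphone response. This is our own illustration, not the paper's code; the function name, windowing, and epsilon guard are assumptions.

```python
import numpy as np

def spectral_difference(rec_i: np.ndarray, rec_j: np.ndarray,
                        win: int = 512, eps: float = 1e-8) -> np.ndarray:
    """Per-window ratio of FFT magnitudes between two recordings of the same audio.

    Each row approximates the relative microphone response of device i versus
    device j for one analysis window.
    """
    # Keep only whole windows common to both recordings.
    n = min(len(rec_i), len(rec_j)) // win * win
    frames_i = rec_i[:n].reshape(-1, win)
    frames_j = rec_j[:n].reshape(-1, win)
    mag_i = np.abs(np.fft.rfft(frames_i, axis=1))
    mag_j = np.abs(np.fft.rfft(frames_j, axis=1))
    # eps avoids division by zero in silent frequency bins.
    return mag_i / (mag_j + eps)
```

A collection of such ratio frames, gathered across device pairs, is one plausible shape for the shared heterogeneity datasets described above.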
Heterogeneity Generator and Pipeline. In prior research, deep models have been shown to learn robust feature representations even from noisy data [22, 44]. However, it is unlikely that labeled 'training datasets' will contain many examples of heterogeneity from which a deep model can learn. For example, in a speaker ID dataset, it is expected that the data collectors would be more interested in getting large amounts of speech data from multiple users, rather than capturing a range of microphone heterogeneities. As such, a deep model may not cope with the unknown microphone heterogeneities that are expected in-the-wild.

To solve this issue, our framework includes a heterogeneity generator, which learns a generative model from examples of the different types of heterogeneity available in the heterogeneity dataset. Once trained, this generative model can be used to augment the training data with various heterogeneities, effectively transferring the properties of the heterogeneities onto the training data. For a deep model to become heterogeneity-tolerant, it is important that, while training, the model is exposed to a wide range of heterogeneities, as well as a rich range of combinations (e.g., two or more heterogeneity effects co-occurring), and at variable levels. Our proposed generative approach provides a scalable way to combine different forms of heterogeneity with the training data, and to build an augmented training set in a principled way. Finally, the heterogeneity pipeline is responsible for merging and manipulating the aforementioned heterogeneities together with the training data.

Deep Learning Architecture and Training. The final components of the framework are: (a) the deep model itself, which starts as an uninitialized learning architecture; and (b) the training algorithms used to perform feature representation learning. The training algorithms have two joint objectives: (1) to learn discriminative representations based on the classification task at hand (e.g., recognizing whether a user is stressed or not from their speech), and (2) to learn representations that are either tolerant to, or which minimize the effect of, software and hardware heterogeneity. The learning process and model architecture are conventional within the area of deep learning. Our key innovation is combining these with the heterogeneity generator and pipeline that – as the iterative process progresses – can achieve the joint objectives just described. Importantly, the end output of this process is a heterogeneity-tolerant deep model that can be used at runtime by any application or system.

4.2 Heterogeneity Datasets
For our evaluation, we have developed two heterogeneity datasets representing real-world hardware and software heterogeneities in mobile devices. First, the Audio Spectral Difference dataset captures the difference in frequency response of various microphones when they are provided the same audio input under the same environmental conditions. To build this dataset, we played 2 hours of speech audio which was simultaneously recorded by the microphones of 20 different mobile and wearable devices (shown in Table 1) kept equidistant from the audio source. Prior research [12] has shown that a microphone transfer function has a convolutional effect on speech signals in the time domain and a multiplicative effect in the frequency domain, as shown in the following equation:

    Y_i(e^{jω}) = X(e^{jω}) H_i(e^{jω})    (1)

where X is the original speech signal, H_i is the distortion caused by the microphone of the i-th device, and Y_i is the distorted signal output by the microphone of the i-th device. Therefore, we compute an FFT (bins = 512) on the recorded audio (using a sliding window of 25 ms) and then, as shown in Equation 2, use the ratio of the magnitudes of the FFT frames (Y) as a measure of microphone heterogeneity (Φ_{i,j}) between devices i and j. In this way, microphone variations can be computed across individual devices or across device manufacturers.

    Φ_{i,j} = H_i / H_j = Y_i / Y_j    (2)

Our second dataset, called Sensor Sampling Jitter, captures the variations in sensor sampling timestamps (as discussed in §2), which are likely due to system-related factors such as high CPU loads. For this, we used the dataset provided by the authors of [25] to measure the differences in OS timestamps attached to consecutive sensor samples. In the absence of any system-induced noise, we expect a constant time difference (1/f seconds) between consecutive sensor samples, where f is the sampling frequency. However, due to various system-related factors such as high CPU loads, the time difference between consecutive samples varies significantly, and these variations are captured in our heterogeneity dataset.

4.3 Heterogeneity Generator
The heterogeneity generator is a generative model which, upon observing samples of discrepancies (Φ_i) from the heterogeneity datasets, produces noise values corresponding to the patterns in the observed values. In our current implementation, we use Kernel Density Estimation to estimate the probability distribution of the spectral and timestamp heterogeneities. As shown in (3), Kernel Density Estimation is a non-parametric method to estimate the probability density function (f̂) of a random variable; here we use a Gaussian kernel (K) and the smoothing parameter (h) is chosen using a grid search.

    f̂_h(Φ) = (1/n) Σ_{i=1}^{n} K_h(Φ − Φ_i)    (3)
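A minimal version of Equation 3's generator can be written directly in NumPy. This is our own sketch, assuming a fixed bandwidth h rather than the grid search used in the paper, and the class and method names are ours. Sampling from a Gaussian KDE reduces to picking a stored observation uniformly at random and adding N(0, h²) noise to it.

```python
import numpy as np

class HeterogeneityGenerator:
    """Gaussian kernel density estimate over observed heterogeneity values."""

    def __init__(self, observations: np.ndarray, h: float, seed: int = 0):
        self.obs = np.asarray(observations, dtype=float)
        self.h = h
        self.rng = np.random.default_rng(seed)

    def pdf(self, x: float) -> float:
        # Average of Gaussian kernels centered on each observation (Equation 3).
        z = (x - self.obs) / self.h
        k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
        return float(np.mean(k) / self.h)

    def sample(self, n: int) -> np.ndarray:
        # Exact sampling from the KDE: random observation + Gaussian perturbation.
        centers = self.rng.choice(self.obs, size=n, replace=True)
        return centers + self.rng.normal(scale=self.h, size=n)
```

During training, `sample(n)` supplies the synthetic jitter or spectral-ratio values that the heterogeneity pipeline then applies to clean training windows.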
As an example, on the Sensor Sampling Jitter dataset the generator estimates the probability distribution of the timestamp jitter observed in the data. Thereafter, during the training of a deep classifier, the generator can sample heterogeneity values from this estimated probability distribution and add them to the training data to generate corrupted datasets. Future implementations of the generator could also use state-of-the-art generative models such as Generative Adversarial Networks (GANs) [27].

4.4 Heterogeneity Pipeline
The heterogeneity pipeline performs the task of embedding a variety of heterogeneities in the 'training data' in order to produce an augmented dataset on which the deep learning model is subsequently trained. This is done using two hyper-parameters which are tuned during the deep model training described in §4.5. The parameter α controls the ratio of corrupted and clean samples to be added to the augmented dataset (e.g., 70% clean, 30% corrupted). Similarly, the hyper-parameter β controls the distribution of noise intensity in the corrupted samples – for example, it can add many examples of low heterogeneity and a few examples of high heterogeneity to the corrupted data. We use an exponential decay function to control the noise intensity, and its exponential decay constant λ is controlled by the hyper-parameter β.

For our experiments with microphone data, the pipeline samples 'audio spectral difference' noise (Φ) from the generative model and multiplies it with the FFT frames from the original training data to obtain corrupted FFT frames (as discussed earlier, microphone noise is multiplicative in the frequency domain). For the 'sampling jitter' noise, the pipeline samples examples of timestamp jitter from the generator and uses these to corrupt the sensor timestamps in the original clean dataset, i.e., to make them non-uniform, as would be expected under realistic CPU loads. The corrupted datasets are then concatenated with the clean training dataset to obtain an augmented set for the deep model to train on. The objective here is to expose the model to measurements of the same event which could have feasibly been recorded by devices/sensors completely unrepresented within the training set, thereby aiding its generalization properties.

4.5 Deep Architecture and Training
Our framework integrates a deep model comprised solely of fully connected feed-forward layers; however, the framework is capable of incorporating newer deep learning architectures (e.g., CNN, RNN). The precise architecture (i.e., number of layers and units per layer) is customized for each classification task (we discuss the use of our method with state-of-the-art task-specific architectures in §5).

The overall aim of this architecture, when performing classification, is to learn a mapping from the (potentially multimodal) sensory inputs to a probability distribution across classes, e.g., P(C_k | Θ, I_accel, I_gyro). Features extracted from the raw sensor readings are used to initialize the input layer units, whereas units in the output layer correspond to the target inference classes. The network is trained in a typical supervised learning fashion, iteratively updating the weights and biases within it in order to minimize a loss function on the training data using the Adam [35] optimizer. The particular loss function used for the tasks considered within our evaluation is the categorical cross-entropy loss, in line with standard practices for probabilistic classification tasks. It is defined as follows, for an output probability distribution y⃗ and its respective ground-truth probability distribution ŷ⃗:

    L(y⃗, ŷ⃗) = − Σ_{i=1}^{k} ŷ_i ln y_i    (4)

where k is the number of classes considered in the task. Finally, the training process also aims to tune the hyper-parameters (α and β) which control the amount and intensity of noise added to the training data by the heterogeneity pipeline. In §5, we provide more details on the input features and training parameters for the different classification scenarios.

5 EVALUATION
We now present a series of experiments to compare our framework's accuracy to conventional methods, as well as to test its feasibility on wearable hardware. Our experiments are guided by the following research questions:
• Data augmentation has been used previously to train deep neural networks robust to acoustic environment noise or lighting conditions. Does this approach also help in training models that are robust to sensing heterogeneities?
• Does merely increasing the amount of training data solve the underlying problem of sensor heterogeneity?
• What is the runtime and energy overhead of using a deep model on modern smartphones and wearables?

The key highlights from our experimental results include:
• Our framework is able to cope better with hardware and software heterogeneities than state-of-the-art shallow and deep baseline classifiers for two example sensing scenarios – inertial sensing (9% gain) and audio sensing (up to 17% gain).
• Merely increasing the amount of training data, or using conventional device calibration techniques, does not match the performance of our hetero-tolerant deep model.
• Even in extreme cases where models trained on one class of devices (e.g., smartphones) are to be deployed on another type of device (e.g., smartwatches), our framework is able to recover more than 50% of the accuracy lost due to heterogeneities.
• Theruntimeandenergyoverheadofusingthedeepmodel Release Maximum
iswithinacceptablelimitsformodernsmartphonesand Device Year Count samplingrate
wearables.
Nexus4 2012 2 200hz
SamsungS3 2012 2 150hz
5.1 Methodology SamsungS3-mini 2012 2 100hz
Wecompareourframeworkagainstthreebaselineclassifiers LGGSmartwatch 2014 2 200hz
–RandomForests(RF),SupportVectorMachines(SVM),and SamsungGalaxy
2013 2 100hz
aDeepNeuralNetwork(DNN).Thisisdonefortwoscenar- GearSmartwatch
iosofcommonsensingtasks,namelyspeakeridentification Table 2: Devices used in the activity recognition dataset. The re-
(audio sensing) and activity recognition (inertial sensing). questedsamplingratetotheOSis100Hzonly.
Belowwediscussthedatasetsandtheexperimentalsetup forthespeakeridentificationtask,and‘samplingratesjit-
usedinourevaluation. ter’(causedbyeffectssuchasthecurrentCPUloadofthe
sensingsystem)astheheterogeneitymodelfortheactivity
Classification Scenarios. We focus on two classification
recognitiontask.Inbothcases,theheterogeneitypipeline
taskshere,namelyspeakeridentificationandactivityrecog-
generatessamplesofexpectedheterogeneitiesfromthegen-
nition. Note that as our training framework is not task-
eratorandaugmentsthemwiththetrainingdatatoobtaina
specific, it can be applied to any other sensing task. (e.g.,
corrupteddataset.
recognizingspeechkeywords)inthefuture.
DeepModelArchitecture.Weuseafully-connectedDNN
Speaker Identification. The goal here is to perform text-
architecturetoimplementthedeeplearningmodelsinthis
independentspeakerclassificationfromspeechaudios.To
paper.DNNshaverecentlybeenusedforbothactivityrecog-
train and test the classifiers, we use speech recordings of
nition[20]andspeakerverificationstasks[9,11,45],and
30speakers(15males,15females)fromapubliclyavailable
areparticularlyaptforourtargetplatforms–wearableand
dataset[46].Toevaluatetheeffectofdeviceheterogeneity,
embeddeddevices–becauseoftheirlowcomputationaland
weplayedallspeechrecordingsonalaptop,andrecorded
energyoverhead.Thatsaid,ourproposedframeworkisnot
themwith20differentdevices(showninTable1)underthe
limitedtousingDNNsandisexpectedtogeneralizetoother
sameenvironmentalconditions.
deepnetworkarchitectures.
Device             Type         Count   OS Version
iPhone 6           Smartphone   2       iOS 10
Motorola Moto G    Smartphone   4       Android 6.0.1
Motorola Nexus 6   Smartphone   2       Android 7.0.1
OnePlus 3          Smartphone   2       Android 7.0.1
LG Urbane 2        Smartwatch   10      Android 7.0.1

Table 1: Smartphones and smartwatches used for the speaker identification experiment.

Activity Recognition. We use the dataset detailed in [25] to perform classification of six activities: biking, sitting, standing, walking, stairs-up, and stairs-down. The input data comes from the gyroscope and accelerometer at a requested rate of 100 Hz, and is captured from 10 different smartphones and smartwatches (shown in Table 2) while the users perform the same scripted activities. More importantly, this dataset was captured under realistic and varying CPU loads, as expected in-the-wild, and hence it contains examples of the sampling rate heterogeneities observed in the real world.

Data Augmentation Schemes. We use two different generative models (as discussed in §4.2) for augmenting the clean training data with heterogeneity data. We use variations in 'audio spectral differences' as the heterogeneity model.

Our speaker identification model is a six-layer DNN whose input is audio filter bank [18] features extracted using a sliding window of 25 milliseconds. The numbers of neurons in the hidden layers are, in order, [1024, 512, 512, 512, 256]. Our activity recognition model is a four-layer DNN whose input is state-of-the-art time- and frequency-domain features extracted from accelerometer and gyroscope data; it consists of three hidden layers with 256 neurons each. We initialize all network weights using Xavier uniform initialization [26]. For regularization, we use L2 regularization with λ = 0.01, dropout [43], and batch normalization [34]. Both models were implemented in Keras with a TensorFlow backend.

Classifier Baselines. To train the shallow baseline classifiers, we use domain-specific features for each task, described in each task's experiment. These features are combined with two popular shallow classifiers, namely SVMs and Random Forests, to act as task-specific classifier baselines. In addition to these shallow baselines, we employ a deep model as a further baseline: a multilayer perceptron (baseline-DNN) with the same configuration as the deep models in our framework described above, but trained only on training data with no heterogeneity augmentation.
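The deep model configuration above maps directly onto a few lines of Keras. The sketch below is an illustrative reconstruction, not the authors' code: the filter-bank input dimension, the number of speakers, the dropout rate, the optimizer, and the ordering of batch normalization and dropout are all assumptions the paper does not specify.

```python
# Illustrative sketch of the six-layer speaker-identification DNN.
# input_dim, n_speakers, dropout_rate, and the optimizer are assumed
# values for demonstration only.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_speaker_dnn(input_dim=240, n_speakers=10, dropout_rate=0.5):
    model = keras.Sequential([keras.Input(shape=(input_dim,))])
    for width in [1024, 512, 512, 512, 256]:          # hidden layers, in order
        model.add(layers.Dense(
            width,
            activation="relu",
            kernel_initializer="glorot_uniform",      # Xavier uniform init [26]
            kernel_regularizer=regularizers.l2(0.01)  # L2 with lambda = 0.01
        ))
        model.add(layers.BatchNormalization())        # batch normalization [34]
        model.add(layers.Dropout(dropout_rate))       # dropout [43]
    model.add(layers.Dense(n_speakers, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_speaker_dnn()
```

With these defaults the network contains six Dense layers in total (five hidden plus the softmax output), matching the six-layer description above.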
5.2 Hetero-Tolerant Deep Model Gains

Here we compare the performance of our hetero-tolerant deep model (hetero-DNN) against the baseline classifiers.

Experiment Setup. For training the activity classifiers, we extract state-of-the-art time-domain and frequency-domain features [25] from the inertial data. As shown in Table 2, the dataset has 10 devices capturing the same physical activity; however, due to software and hardware heterogeneities, the data collected by each device is likely to differ. Next, for the speaker ID experiment, we played the training audios on a laptop and recorded them with 20 devices (Table 1) placed equidistant from the audio source. We evaluate how the inherent microphone variations in each device impact different audio classifiers. For building the speaker models, we use audio filter bank [18] features extracted from the speech data.

Evaluation Conditions. We perform three types of evaluation, motivated by practical scenarios in which inference models are expected to encounter data heterogeneities:

Leave-one-device-out: A common yet challenging scenario for model builders is when they have access to data from only a certain number of devices during training, and yet the models are expected to run on any unseen device in the real world. To evaluate our framework in this scenario, we perform a leave-one-device-out experiment, wherein inference models are trained on data from a subset of devices and tested on an unseen group of devices. We assume that the heterogeneity dataset is available from the entire set of devices beforehand.

K-fold Cross-Validation: Our second experimental setting evaluates a scenario where model builders have training data from the entire class of target devices, but the data does not capture all forms of heterogeneity expected in-the-wild (e.g., due to variations in CPU loads). To evaluate this scenario, we perform a k-fold cross-validation, where we train the models on a subset of data from all target devices and test their performance on an unseen held-out test set.

Inter-class Validation: Our final evaluation caters to a scenario where already-developed sensing models need to be redeployed on new classes of devices; for example, a software company tries to deploy a smartphone-developed audio model, in which it has invested considerable time and money, on a wearable pendant with a microphone. For this, we present two experiments (for the audio sensing task) where we train the classifiers on data from smartphones and test their performance on smartwatches, and vice versa.

Speaker Identification Results. We first present the findings of the k-fold cross-validation test (k = 10) on smartwatches, where the classifiers are trained on a subset of data coming from all 10 smartwatches used in the experiment, and tested on a held-out test set from the same smartwatches. Because the classifiers are exposed to all the test devices, we expect them to learn the device-specific microphone variations; as such, this result serves as an upper bound on classifier performance. Our results in Figure 5 show that both the baseline DNN and the hetero-DNN significantly outperform the shallow classifiers (a 45% accuracy gain), implying that the deep models captured the speaker variations much better than the shallow models. However, as expected, we observe that the baseline DNN performs as well as the hetero-DNN; this is because the baseline-DNN was exposed to data from all 10 devices in each experiment, and was able to learn the microphone variations of each device without the need for any data augmentation.

Next, we turn to the more interesting scenario, leave-one-device-out (LODO): here the test data comes from a specific device, and none of the models have an opportunity to train on the data recorded by the device held out for testing. The differences in audio recorded by this device are not explicitly captured, and therefore the models must attempt to overcome this challenging problem using the training data from the other devices alone. In Figure 6, we observe that all the classifiers suffer a loss in accuracy in the LODO setting, suggesting that hardware and software heterogeneities across devices have a significant adverse impact on these classifiers. The presence of these heterogeneities is particularly surprising because all the smartwatches used in the experiment were the same model (LG Urbane) and were running the same OS (Android Wear 2.0). Regardless, the hetero-DNN was able to cope with these heterogeneities much better than the baseline DNN, which showed an accuracy drop of 15%; the hetero-DNN was able to recover 33% of this accuracy loss.

Figure 5: Average accuracies of various classifiers in k-fold cross-validation for the microphone sensing task on smartwatches. In this benchmark experiment, the baseline-DNN performs as well as the hetero-DNN when both classifiers are exposed to all the devices in the test set. However, in subsequent experiments, when unseen devices appear in the test set, the baseline-DNN is not able to maintain high accuracy.

Finally, we present a scenario where sensing models are tested on a class of devices different from the class of devices in the training set.
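The leave-one-device-out protocol described under Evaluation Conditions amounts to treating the recording device as a group label and holding out one group per fold. A minimal sketch using scikit-learn's LeaveOneGroupOut; the features, labels, and device IDs below are synthetic placeholders, and the Random Forest stands in for any of the classifiers under test:

```python
# Sketch of the leave-one-device-out (LODO) protocol: every sample is
# tagged with the ID of the device that recorded it, and each fold
# holds out all data from exactly one device. Data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 24))           # e.g. filter-bank features
y = rng.integers(0, 5, size=300)         # class labels
devices = np.repeat(np.arange(10), 30)   # recording-device ID per sample

logo = LeaveOneGroupOut()
accuracies = []
for train_idx, test_idx in logo.split(X, y, groups=devices):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))
```

With 10 device groups this yields 10 folds, each testing on one entirely unseen device, which is exactly the condition the LODO experiments evaluate.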
For this, we train models using speech data recorded only on smartphones, and test them on speech data from smartwatches. The same experiment is repeated in the opposite condition, i.e., we train on the smartwatch audios and test on the smartphone audios. From Figure 7 (left), we observe a drastic reduction in accuracies when the models trained only on smartwatches are evaluated on smartphones: the baseline DNN shows a drop of 22% in classification accuracy. While the hetero-DNN also shows an accuracy loss, it is less than half (10%) of that shown by the baseline DNN. In other words, our approach of training a model with augmented data is able to recover more than 50% of the accuracy lost due to device heterogeneities. Note that these results are averaged across all test smartphones; we also observed that for specific smartphone models (e.g., the Moto G), the accuracy loss due to heterogeneities was much higher (34%), and the hetero-DNN was able to recover 60% of this loss. Finally, in Figure 7 (right) we observe similar trends when the model is trained on smartphones and tested on smartwatches: here the baseline DNN suffers a 30% accuracy loss, as compared to a 13% loss by the hetero-DNN.

Figure 6: Average accuracies of various classifiers in the LODO setting for the microphone sensing task. Numbers on the bars show the accuracy drop from the k-fold validation scenario. The hetero-DNN model is better able to cope with the heterogeneities across the smartwatches than all the other models.

Figure 7: (Left) Test accuracies when the speaker ID model is trained on smartwatches and tested on smartphones. (Right) Test accuracies when the speaker ID model is trained on smartphones and tested on smartwatches. In both scenarios, the baseline DNN suffers a drastic loss in accuracy (22% and 30%), whereas the accuracy loss for the hetero-DNN is much smaller.

Activity Recognition Results. In Figure 8, we present the findings for the inertial sensing task in the leave-one-device-out (LODO) and 10-fold cross-validation experiments across multiple smartphones. We observe that in both settings, the hetero-DNN has the highest classification accuracy (87% and 88%, respectively) relative to the alternative modeling approaches, representing a gain of 9% over the baseline-DNNs. This suggests that the hetero-DNN is better able to cope with the runtime variations (e.g., due to CPU loads) within the same device (in the case of 10-fold cross-validation), as well as with the inter-device variations in accelerometer samples (in the case of the LODO setting).

Figure 8: Accuracies of various classifiers in the k-fold (left) and LODO (right) settings on the activity recognition dataset. The hetero-DNN model outperforms the other classifiers by more than 9%.

5.3 Comparison with Device Calibration

In this section, we compare our proposed framework against approaches that address sensor heterogeneities through device calibration.

Experimental Setup. We take an example scenario of microphone calibration, wherein the frequency responses of the microphones of the test devices (smartphones) are calibrated against the microphones of the training devices (smartwatches). For calibration, we played a 5-minute speech audio, which was recorded simultaneously by the training and test devices. Next, we computed the FFT of the speech signals using a sliding window of 25 ms, and measured the ratio² of the FFTs in each window between the training and test devices. Finally, for each test device, we computed the average of the observed FFT ratios across all windows and used it as the 'frequency calibration' measure for that device.

To evaluate the performance of this calibration approach, we repeated the speaker identification experiment from Figure 7 (left), where we had trained the classifier on data from smartwatches and tested it on smartphones. Here we evaluate whether calibrating the microphones of the test devices (smartphones) against the training devices (smartwatches) can improve the accuracy of the baseline classifiers.

Results. Our results show that with microphone calibration, the accuracy of the baseline-DNN on smartphones drops drastically to just 29%, much worse than the uncalibrated baseline-DNN (67%) and hetero-DNN (77%). This is because the frequency response of a microphone varies non-linearly with the recorded speech.

² As noted earlier, the impact of microphone variations on speech is multiplicative in the frequency domain.
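The frequency-calibration measure from the experimental setup above can be sketched in a few lines of NumPy. This is our illustrative reconstruction, not the authors' code: the sampling rate, the use of magnitude spectra, non-overlapping windows, and the small epsilon guarding the division are all assumptions.

```python
# Sketch of the FFT-ratio calibration measure: the same audio is
# recorded by a reference (training) device and a test device, the
# magnitude FFT is taken over sliding 25 ms windows, and the
# per-frequency ratio is averaged across all windows.
import numpy as np

def frequency_calibration(ref, test, sr=16000, win_ms=25):
    win = int(sr * win_ms / 1000)                  # 25 ms window length
    n = min(len(ref), len(test)) // win
    ratios = []
    for i in range(n):
        seg_ref = ref[i * win:(i + 1) * win]
        seg_test = test[i * win:(i + 1) * win]
        mag_ref = np.abs(np.fft.rfft(seg_ref))
        mag_test = np.abs(np.fft.rfft(seg_test))
        ratios.append(mag_ref / (mag_test + 1e-8))  # per-window FFT ratio
    return np.mean(ratios, axis=0)                 # average across windows

# Toy check: if the test mic simply attenuated every frequency by half,
# the calibration measure would recover a ratio of ~2 everywhere.
rng = np.random.default_rng(1)
ref = rng.normal(size=16000)
cal = frequency_calibration(ref, 0.5 * ref)
```

Multiplying a test device's spectra by this measure would map them toward the training devices, consistent with the multiplicative frequency-domain effect noted in the footnote; the results above show why such a single linear correction falls short for real microphones.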
As such, calibrating the microphones using a sample 5-minute audio does not help in generalizing to newer audios. On the other hand, the hetero-DNN incorporates microphone variations directly into the training process and is able to learn robust feature representations through non-linear transformations of the input data, which results in higher classification accuracies.

5.4 Can larger training data alone solve the problem of heterogeneity?

We now seek to answer a fundamental question around heterogeneous mobile sensing: does increasing the amount of training data eventually solve the problem of sensor heterogeneities? We conduct a smartwatch-based experiment for speaker identification: in addition to the 10 smartwatches used in our previous experiments, we record speech data from 10 more smartwatches under the same experimental conditions. Thereafter, we gradually increase the amount of training data fed to the baseline DNN, and evaluate its impact on classifier accuracy when this model is tested on smartphones. More specifically, we are interested in whether the increase in training data helps to recover the accuracy lost due to heterogeneity, and whether it can match the accuracy of our augmented classifier.

Results. Figure 9 (left) shows that as we increase the amount of training data by 1.5 and 2 times, the accuracy of the baseline-DNN (which is trained on smartwatches) does increase when it is tested on smartphones. However, even with 2x more training data, the baseline DNN is not able to reach the accuracy of the hetero-DNN (79%), which was trained on the original dataset augmented with microphone spectral heterogeneities. The key takeaway here is that our proposed approach of incorporating device heterogeneities directly into the training process can compensate for the need to collect very large training datasets. While it can be argued that with very large-scale training datasets the baseline DNN models may eventually achieve accuracies similar to the hetero-DNN, collecting such large-scale datasets for everyday sensing tasks is challenging and beyond the capabilities of most system or app developers.

Figure 9: (Left) Effect of increasing the amount of training data. Even with 2x training data, the baseline classifier does not match the accuracy of the augmented classifier (hetero-DNN). (Right) Development board for the Snapdragon 400 processor. This board allows ease of evaluation while still providing processing and energy measurements equivalent to the Snapdragon processor installed on smartwatches like the Motorola Moto 360.

5.5 Hardware Feasibility

The introduction of deep learning algorithms into the modeling of context and activity will increase model complexity relative to the conventional shallow methods found in today's mobiles and wearables. As such, we now turn our attention to evaluating the performance of our approach on wearable hardware.

Metric                 Activity Recognition   Speaker Identification
Estimation Time (ms)   6.1                    89
Memory                 600 KB                 16.8 MB
Energy (mJ)            1.1                    19

Table 3: Model resource usage on the Snapdragon SoC.

Experiment Setup. We conducted our experiment on the Qualcomm Snapdragon 400 (shown in Figure 9). A Monsoon power monitor [4] was used to measure the energy performance of our models on the Snapdragon. We test the same model sizes/designs described in §5.1.

Representative Hardware. Deep learning often brings additional levels of model complexity. We ensure that this overhead is still acceptable by profiling our deep models on a Qualcomm Snapdragon 400 processor, commonly found in many popular smartwatches such as the Moto 360 (rev. 2) [5]. The Snapdragon includes a quad-core 1.4 GHz CPU and 1 GB of RAM, although in the versions that ship on watches, RAM is limited to 512 MB. While a GPU and a DSP are also available on the SoC, due to a lack of driver support we are forced to use only the CPU in our experiments.

Results. Table 3 shows the resource usage of our deep inference models (hetero-DNNs) in both scenarios. The deep model takes about 6 ms for an activity inference and 89 ms for speaker identification, both of which are reasonable for a real-world sensing application. Further, we observe that the energy overheads of the deep models are also within reasonable limits: each activity inference takes 1.1 mJ of energy; assuming a standard 300 mAh wearable battery, this amounts to 0.005% battery usage for one hour of continuous activity inference. Similarly, for one hour of continuous speaker inferences, the deep model will consume only 0.6% of the battery. We note that recent techniques for optimizing deep models (e.g., [11, 28, 29, 39]), if applied, would likely improve the performance of the hetero-DNNs significantly.

6 DISCUSSION AND LIMITATIONS

In this section, we discuss the limitations of our work and outline avenues for future work on this topic.

Incorporating other types of sensing heterogeneities. This work focused on two sources of heterogeneity in sensor data: microphone variations and sampling jitters. We
Real-time operating systems (RTOS), popular in embedded devices [2], provide another alternative to process data as it comes in, without buffering delays. Ideally, they could be leveraged to provide sensor measurements with very high sampling rate stability. However, RTOS have poor support for mul