This repository has been archived by the owner on Oct 28, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathpaper-euspn15.tex
738 lines (633 loc) · 37.5 KB
/
paper-euspn15.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
% This paper will be submitted to EUSPN15.
\documentclass[3p,times,procedia]{elsarticle}
\flushbottom
\usepackage{ecrc}
\volume{00}
\firstpage{1}
\journalname{Procedia Computer Science}
\runauth{Christophe Van Ginneken, Jef Maerien,
Christophe Huygens, Wouter Joosen, Danny Hughes}
\jid{procs}
\usepackage{amssymb}
\usepackage[figuresright]{rotating}
\input{packages}
\input{squeeze}
\begin{document}
\begin{frontmatter}
\dochead{6th International Conference on Emerging Ubiquitous Systems and
Pervasive Networks, EUSPN-2015 and the 5th International Conference on Current
and Future Trends of Information and Communication Technologies in Healthcare,
ICTH 2015,}
\title{
\FOO: A framework for efficient source code generation\\
of detection algorithms for the Internet of Things
}
\author{Christophe Van Ginneken}
\author{Jef Maerien}
\author{Christophe Huygens}
\author{Wouter Joosen}
\author{Danny Hughes}
\address{
iMinds-DistriNet, KU Leuven, 3001 Leuven, Belgium\\
\{firstname.lastname\}@cs.kuleuven.be
}
\begin{abstract}
The ability to detect patterns in its application context is what makes
\emph{things} \emph{smart}. The more patterns it can detect, the more accurate
its operational results will be. However, supporting an adequate set of pattern
detection algorithms imposes significant overhead on resource-constrained
devices such as those in the Internet of Things. Manual fusion of the
algorithms reduces resource consumption by optimizing resource sharing, yet,
proves to be time-consuming, repetitive and error-prone. To address this
problem we propose \FOO, a framework for the generation of efficient
combinations of detection algorithms. It consists of a domain specific language
and a code generator. The language allows formally describing the intent of
detection algorithms and supports the generator in organizing the source code.
As a case study, we apply \FOO to the generation of an intrusion detection
system for wireless sensor nodes. A side-by-side comparison shows that
\FOO-generated source code reduces message passing overhead, execution time and
memory footprint in comparison to sequential calls into standalone
implementations of the algorithms.
\end{abstract}
\begin{keyword}
Internet of Things, Domain Specific Languages, Source Code Generation, Source
Code Optimization, Intrusion Detection, Context Awareness
\end{keyword}
\end{frontmatter}
\section{Introduction}
\label{introduction}
The Internet of Things (IoT), the mother of all ubiquitous and pervasive
networks, populates our daily lives with millions of internet-connected
\emph{smart things}: from smart watches that assist visually impaired
persons\cite{porzi2013smart} to monitoring systems for waste management in
\emph{smart cities}\cite{zanella2014internet}. Their applications cannot be
more diverse, however they share one common concern: each of these
\emph{things} is resource-constrained, relying on a battery and operating with
limited processing power and memory.
Although their applications at first seem unrelated, under the hood they share
a common paradigm: pattern recognition. With the introduction of \emph{smart
things}, we want to augment our lives with information that is hard, or at
least time and resource consuming, to obtain otherwise. The ability to detect
patterns in its usage or its applied context, is what makes these \emph{things}
\emph{smart}. A watch that recognizes gestures can perform certain actions on
behalf of its wearer. That same watch can also detect a wet floor and notify
this hazard\cite{porzi2013smart}.
Monitoring human activities is an active research field for more than a
decade\cite{lara2013survey}, but with the advent of the IoT it now moves into
the real world at an increased pace. Equipped with wearables, hosting numerous
sensors, the possibilities seem endless, but so are the number of detection and
recognition algorithms that have to run simultaneously on these small
\emph{things}.
Another prime example of a class of algorithms that inhibit the same properties
is intrusion detection (ID). The large number of security related opportunities
for attackers requires an equally sized amount of ID algorithms, all with a
similar structure, still independent in their operation.
% scientific contribution
The contributions of this work are threefold: the first contribution consists
in the formulation of a design pattern for detection/recognition algorithms for
IoT devices. This defines the target algorithms that fit our proposed
framework. The second contribution is a framework called \FOO.\@ It enables the
formal description of algorithms on a functional level and provides a code
generator producing source code for a given platform and configuration. This
way, \FOO also addresses the heterogeneous nature of both the IoT and the
algorithms. The third contribution introduces functional code fusion (FCF) as a
source code generation paradigm to address the combination of algorithms. FCF
identifies common data and functions, and organizes code to eliminate redundant
iterations, tests and computations. Results show reduced usage of the wireless
radio, execution time and memory usage.
% structure
The remainder of this paper proceeds as follows: section \ref{problems}
identifies problems that arise when combining multiple algorithms in the
context of resource-constrained devices. Section \ref{pattern} formulates a
pattern common to detection/recognition algorithms in the IoT context. Section
\ref{design} introduces our framework, its design and components. In section
\ref{evaluation} we apply it to a case study related to ID and evaluate the
results. Section \ref{related} explores related work in the field of DSLs and
code generation. Finally, section \ref{conclusion} summarizes our findings,
draws conclusions and identifies topics for future work.
\section{Problems when combining multiple algorithms}
\label{problems}
Simply taking existing implementations of algorithms and calling them in
sequence, will in most cases result in suboptimal code. Three prime examples on
the source code level illustrate this:
\begin{description}[style=unboxed,leftmargin=0cm]
\item[Loops] by themselves are not a problem, but different loops, that
perform the same operations in sequence, can cause overhead. An example is
the handling of messages: each algorithm parses each observed message on the
network, processing the same byte stream, performing comparable operations to
find similar patterns. Polling sensors for their value is another example.
This redundancy results in an execution overhead, putting a strain on the
power consumption.
\item[Common data] cross-cuts the different algorithms, that have to store
the same information over and over again within their own scope. Typical
examples here include information about neighboring nodes, sensor readings,
etc.\ This results in a lot of duplicated information in memory, raising the
hardware requirements.
\item[The network] allows algorithms to operate on a distributed scale.
Scattered calls to a message sending function cause multiple accesses to the
wireless radio, keeping it active and energy consuming more than strictly
needed.
\end{description}
Two operational aspects are orthogonal to these problems and further amplify
them:
\begin{description}[style=unboxed,leftmargin=0cm]
\item[Reimplementing] algorithms manually to eliminate these issues is not a
one-time cost. A release of a new version of an algorithm, or simply a
reselection of algorithms, due to changing requirements, can trigger it.
Also, changing, or simply implementing the algorithms from scratch, holds the
risk of making mistakes.
\item[The heterogeneity of IoT devices] results in different software stacks,
each with for example their own network API\@. This partially explains why
there are hardly any implementations of algorithms readily available to
developers: there is no single platform to target for researchers, which
lowers the opportunity to reuse prior work.
\end{description}
\section{A design pattern for detection/recognition algorithms}
\label{pattern}
The source code level problems that arise when combining multiple algorithms,
as presented in section \ref{problems}, are in fact inherent to the design
pattern that forms the foundation for the type of algorithms we identify in
typical detection/recognition applications in an IoT context. The formulation
of such a design pattern will allow us to define a strategy for the
optimization of the combination of these algorithms. We start by extracting two
common activities: the collection of sensor values and the evaluation of
thresholds. Periodically collecting sensor values requires preparing the
hardware for value extraction, waiting for the value to become available,
reading the bytes from the sensor, interpreting these bytes and recording this
interpretation for further analysis. This can be as simple as reading a single
byte, but can also be as complex as parsing a network message.
A single sensor value hardly ever results in a decision. The combination, often
over time, of these values and the evaluation of thresholds will be key to
decide if a change in the environment requires an action. Updating such
statistics and evaluating the corresponding thresholds should be distinct
operations. This decoupling of detection and decision is the core of the design
pattern for detection algorithms. Listing \ref{alg:pattern} presents the design
pattern in pseudo-code. Applying the pattern in the context of multiple
algorithms, extends its scope and an application pattern such as described in
listing \ref{alg:application} appears.
We see the three aforementioned problems manifest themselves: first, nesting of
loops causes redundant executions of potentially the same sensor readings
inside individual algorithms' $collect\_data$ functions. Second, each
individual algorithm stores the same sensor values in memory. Third, scattered
calls to network functions (in case of distributed algorithms) lead to
suboptimal use of the wireless radio.
When implementing the solution from scratch, it would be possible to identify
these problems and reorganize the code in such a way that it reads and stores
sensor values in memory once or centrally parses network messages. In section
\ref{problems} we identified two operational problems that show that this is
not feasible, or at least suboptimal. We therefore present \FOO, a framework
that enables formally describing such algorithms and fusing common data and
functions in an automated way, to overcome both these problems.
\begin{algorithm}[t]
\small
\caption{Detection/recognition algorithm pattern}
\label{alg:pattern}
\begin{algorithmic}
\Require{readings: global variable to hold collected sensor data for the given algorithm}
\Require{devices: global variable to store information about neighbor devices} \Comment{optional in case of distributed algorithms}
\Function{collect\_data}{}
\ForEach{$sensor \in required\_sensors$}
\Let{values[]}{$sensor$} \Comment{collect sensor data}
\Let{$reading_{sensor}$}{\Call{interpret}{values[]}} \Comment{update interpreted sensor info}
\EndFor
\State \Call{send}{$devices_x$, ``info''} \Comment{optionally exchange info} \label{alg:id-algo-pattern-send1}
\EndFunction
\Function{do\_housekeeping}{}
\ForEach{$reading \in readings$} \label{alg:id-algo-pattern-loop2} \label{alg:id-algo-pattern-common-data}
\If{$reading > \dots$} \Comment{validate recorded value}
\State \dots \Comment{take actions}
\State \Call{send}{$device_x$, ``info''} \Comment{optionally communicate} \label{alg:id-algo-pattern-send2}
\EndIf
\EndFor
\EndFunction
\end{algorithmic}
\end{algorithm}
\begin{algorithm}[t]
\small
\caption{Application pattern of multiple algorithms}
\label{alg:application}
\begin{algorithmic}
\ForEach{$algorithm \in algorithms$}
\State \Call{algorithm.collect\_data}{}
\EndFor
\ForEach{$algorithm \in algorithms$} \Comment{at a given interval}
\State \Call{algorithm.do\_housekeeping}{}
\EndFor
\end{algorithmic}
\end{algorithm}
\section{Design}
\label{design}
The previous sections identified loops, common data and the network as three
important code level problems. These problems are inherent to detection
algorithms, but conflict with the target environment, the IoT\@. \FOO provides
a framework to efficiently implement these algorithms and offers a solution for
all identified problems. It consists of a 1) DSL, 2) source code generator and
3) software library.
The DSL enables researchers to formally describe algorithms, without focusing
on a specific language or platform. The code generator can avoid the identified
problems concerning loops, common data and network access, because the formal
description provides the underlying intent, not the technical implementation.
It applies Functional Code Fusion (FCF) to identify hotspots for optimization.
A key component of FCF are \emph{functional domains} that harbor common
functionality.
We will first position \FOO as a development strategy and motivate its design.
Second, we will look at the principal design components of the DSL.\@ Third, we
discuss the structure of the code generator and introduce its software library.
\subsection{\FOO as a development strategy}
\label{positioning}
The introduction of a code generator in the development process changes the way
of working: the source code is no longer the driving force, controlled by
developers. It is literally an intermediate representation between code
generator and compiler. A code generator constructs this representation using
three components, as shown in figure \ref{fig:design}: 1) the description of
the algorithm in the DSL, 2) information about the functional domains, and 3)
information about the platform and target language. At the level of the
compiler, a software library offloads specialized implementations and allows
for custom extensions. The applied grayscale indicates the components' platform
dependency. This hierarchy is important from a development effort point of
view. The level of dependency on the target platform dictates the level of
effort to implement changes. This is important in relation to the scalability
of the framework itself.
\begin{figure}[t]
\centering
\scalebox{0.70}{
\begin{tikzpicture}[
node distance=9mm,
,>=stealth,very thick,black!50,text=black,
every new ->/.style={shorten >=1pt}
]
\node (dsl) [user] {DSL};
\node (cg) [representation,right=of dsl,align=center] {Code\\Generator};
\node (code) [generated,right=of cg] {code};
\node (comp) [representation,right=of code] {Compiler};
\node (image) [generated,right=of comp] {image};
\node (domains) [dom,above=of cg, align=center] {Functional\\Domains};
\node (plat) [platform,below=of cg,align=center] {Platform \&\\Language};
\node (lib) [native,below=of comp,align=center] {Software\\Library};
\draw [->] (dsl) -> (cg);
\draw [->] (domains) -> (cg);
\draw [->] (plat) -> (cg);
\draw [->] (cg) -> (code);
\draw [->] (code) -> (comp);
\draw [->] (lib) -> (comp);
\draw [->] (comp) -> (image);
\end{tikzpicture}
}
\caption{The high level design of \FOO, showing the resources, processes and
artifacts. A grayscale background represents the level of platform dependency,
with darker shades indicating a higher dependency. Both the source code and the
installable image are artifacts.}
\label{fig:design}
\end{figure}
\subsection{\FOO as a domain specific language}
\label{dsl-design}
It's important to remember that a DSL is merely a \emph{thin veneer over a
(semantic) model} \cite{fowler2010domain}. Its sole purpose should be
populating the model. \FOO's DSL and model focus on avoiding constructions that
result in problems concerning loops, common data and the use of the network. To
do this, \FOO uses an event-based approach to define functionality. Describing
algorithms based on events removes the technical interpretations and allows FCF
to identify the intent of algorithms and avoid the identified source code
problems.
\subsubsection{Loops}
Loops are a major contributor to the problem of algorithm fusibility. Loops in
program code are technical constructs and in fact often hide the intended
functionality. The example of parsing network messages illustrates this. Each
algorithm deals with parsing in the same way: loop over the bytes in the
payload and look for a pattern that identifies what the algorithm is looking
for. This is in fact a technical implementation choice. The underlying
functional intent is to react when such a pattern is present in a message. The
loop has \emph{activated} this functionality, but the real goal is to
\emph{passively} react on an event, not \emph{actively} look for it. Event
handling therefore can transform constructions with loops to their functional
meaning, allowing identification and fusion of common functionality.
\subsubsection{Common data}
Different detection algorithms typically deal with the same classes of
information, be it sensor data or network messages. Functional domains
consolidate these and store and manage this data in a unified way. \FOO
provides the concept of such domains as plug-ins to the DSL, extending it with
domain specific storage and logic. For example in the case of distributed
algorithms a \emph{Nodes} domain will manage information about neighbor nodes,
exposing them as objects to the algorithms, relieving them from managing these
themselves. Functional domains are also accessible as collections, allowing
execution of functions in their scope, again eliminating the need for explicit
loops.
\subsubsection{Unified messaging}
\label{dsl-unified-msg}
Access to the network, scattered around an algorithm, causes the wireless radio
to be more active than needed. Delaying the sending of messages and grouping
them can lower this overhead. Sending and receiving of network messages is
tightly integrated with the previously mentioned concepts of event handling and
for example the \emph{Nodes} domain. An underlying software library can handle
all interaction with the network, triggering functions by means of events when
parsing messages and exposing an API to send messages. This approach gives the
framework complete control over communication and allows for optimized
marshaling and grouping of messages.
\subsection{\FOO as a code generator}
To convert the DSL-based algorithms into production-ready code, \FOO applies
code generation techniques. Although different approaches are applicable, like
template-based text expansion, XSLT transformations or CodeDom
\cite{dollard2004code}, we believe that the required transformations demand a
richer model. A semantic model \cite{fowler2010domain}, tied closely to the DSL
itself, can fulfill this role. Semantic transformations evolve the model into a
more syntax-oriented model, comparable to the idea of CodeDom. Syntactic
transformations can further evolve the model into source code. Figure
\ref{fig:code-generation} shows this process.
\begin{figure}[t]
\centering
\scalebox{.70}{
\begin{tikzpicture}[
node distance=7mm,
,>=stealth,thick,black!50,text=black,
every new ->/.style={shorten >=1pt}
]
\node (dsl) [resource] {DSL};
\node (sm) [representation,right=of dsl,align=center] {Semantic\\Model};
\node (cm) [representation,right=of sm,align=center] {Syntactic\\Model};
\node (code) [resource,right=of cm] {Source Code};
\draw [->] (dsl) -> (sm) node [midway,above=15pt] {parse};
\draw [->] (sm) -> (cm) node [midway,below=10pt,align=center] {semantic\\transform}
node [midway,above=15pt,align=center] {FCF};
\draw [->] (cm) -> (code) node [midway,above=15pt] {emit};
\draw [rounded corners,->]
($ (sm.east) - (5mm,5mm) $)
-- ++(0,-.5)
-| ($ (sm.west) + (5mm,-5mm) $)
node [near start, below=5pt] {type inference};
\draw [rounded corners,->]
($ (cm.east) - (5mm,5mm) $)
-- ++(0,-.5)
-| ($ (cm.west) + (5mm,-5mm) $)
node [near start, below=1pt,align=center] {syntactic\\transform};
\end{tikzpicture}
}
\caption{\FOO's code generation process}
\label{fig:code-generation}
\end{figure}
\subsubsection{Semantic model}
The semantic model is the core of the generator. It offers a way to describe
algorithms so that semantic reinterpretation through FCF is possible. This
reorganization of the functionality of different algorithms allows for more
optimal execution and memory usage. Hence the name \FOO of the framework:
functionality organization optimization. Figure \ref{fig:meta-model} shows the
central part of the semantic model as implemented for \FOO.\@ The concept driving
this model is the structure starting at the \emph{Model} up to the
\emph{Execution Strategy}. This structure encompasses the \emph{Function}s and
underlying \emph{Statement}s and allows for interpretation and fusion of the
algorithms.
\begin{figure}[b]
\centering
\scalebox{.70}{
\begin{tikzpicture}[
edge from parent fork down,
sibling distance=5mm, level distance=20mm, node distance=6mm
]
\node (model) [class,fill=black!10!white] {Model}
child {node (module) [class,fill=black!10!white] {Module} edge from parent [composite]
child {node (strategy) [class,align=center,xshift=-10.5mm,fill=black!10!white] {Execution\\Strategy} edge from parent [composite]
child {node(every) [class,xshift=-9mm,fill=black!10!white] {Interval} edge from parent [subclass]}
child {node (when) [class,xshift=9mm,fill=black!10!white] {Event} edge from parent [subclass]}
}
child {node (function) [class,xshift=10.5mm] {Function} edge from parent [composite]}
};
\node (constant) [class,right=of module] {Constant};
\node (scope) [class,left=of strategy] {Scope};
\node (statement) [class,right=of function] {Statement};
\node (domain) [class,left=of module,xshift=-13mm] {Functional\\Domain};
\draw [aggregate] (module) -- (domain);
\draw [composite] (module) -- (constant);
\draw [dependency] (strategy) -- (scope);
\draw [composite] (function) -- (statement);
\draw [composite] (domain) -- (scope);
\draw [dependency] (strategy) -- (function);
\end{tikzpicture}
}
\caption{Partial Semantic Model for \FOO.\@ Shaded classes indicate the
\emph{event/scheduling} construction around functions.}
\label{fig:meta-model}
\end{figure}
\subsubsection{Functional code fusion}
Thanks to the functional description of events and the effects triggered by
them, the code generator is able to identify the same actions across different
algorithms. This way it is possible to extract the parsing of incoming messages
and group actions on the same sets of data or scheduled function executions.
FCF typically looks for patterns in the semantic model and performs model
transformations to change these patterns to a more optimized organization. It
first performs semantic transformations on the semantic model itself. A second
phase transforms the semantic model into a syntactic model, encoding the
semantics into technical syntax constructs. One example of a semantic
transformation is type inference, which supports the platform independence of
the framework.
\subsubsection{Syntactic model}
The syntactic model is like an abstract syntax tree for a virtual, expressive
language. It contains constructs and ideas found in different languages and
programming paradigms. No single existing programming language supports all
features as is. Syntactic transformations therefore translate the unsupported
constructs into comparable constructs supported by the targeted platform and
language. Depending on the target platform, some constructions aren't
transformed into actually corresponding code. In selected cases, the generator
uses a software library. For example when it needs to parse incoming messages
or schedule tasks. During the generation process, it looks for patterns that
indicate any of these actions and transforms them into calls to the \FOO
software library.
\subsection{\FOO as a software library}
\label{software-lib-design}
A software library allows offloading of common implementation aspects, thus
avoiding small discrepancies in code and suboptimal use of shared resources.
Previously we illustrated this approach with a case for unified messaging:
parsing of incoming messages and grouping of outgoing messages. More common
functionality isn't generated verbatim either. A standard software library
provides functionality for the execution of functions based on a schedule or
the implementation of higher level conceptual functions. Optimization of this
type of algorithms is not within the scope of \FOO.\@ This further gives
freedom of choice to the users of the framework to select appropriate
implementations for this standard functionality.
\section{
Implementation and evaluation:
a case study on intrusion detection in wireless sensor networks
}
\label{evaluation}
In wired networks, firewalls focus on the outer perimeter of the network,
filtering unwanted packets. Still, attacks on services can pass unnoticed. This
is where an Intrusion Detection System (IDS) comes into play: monitoring all
traffic that passes through the firewall, looking for patterns of malicious
activity\cite{denning1987intrusion}. In meshed networks it is not possible for
a single point in the network to oversee all traffic
\cite{mishra2004intrusion}. So every device has to implement its own IDS.\@
Attackers have access to a large set of possible attack vectors
\cite{aschenbruck2012security}, ranging from the physical layer up to the
application layer. Each layer presents different opportunities for launching an
attack. For each of these attacks an algorithm needs to look for patterns or
anomalies. An IDS therefore requires a lot of algorithms to achieve adequate
coverage.
\subsection{Implementation}
To evaluate the prototype we used a system based on the Atmel ATMEGA1284p
micro-controller and the Digi XBee S2 ZigBee module, running no specialized
operating system. Relying on basic hardware and software allows us to validate
that the generator is capable of generating code even for the most basic
environment. For the evaluation we constructed a small meshed network. In this
network, not all devices have direct access to one another, requiring routing
through shared neighboring devices.
We started from a basic application that measures light intensity. The
application was an includable piece of code, allowing integration in both a
manual implementation and generated code base. On top of this baseline we added
two intrusion detection related algorithms: a watchdog
\cite{mishra2004intrusion}, that allows nodes to validate each other's
presence, and a reputation-based algorithm \cite{ganeriwal2008reputation}, that
checks if a parent node is cooperative and actually forwards messages.
\subsection{Analytical results}
Looking at the generated source code, a simple metric is comparing the amount
of code versus the size of the formal description of ID algorithms.
Implementing both algorithms manually results in 637 lines of C code. The
corresponding \FOO implementation requires only 156 lines of DSL code. Looking
at data structures, we see that sharing data eliminates redundant information,
such as a network address. This reduces the memory footprint for one node entry
by 2 bytes, from 27 bytes to 25 bytes. In our implementation we used the ZigBee
network address, which consists of 16 bits. This results in a reduction of
7.4\% of dynamic memory per known device. Using ZigBee's 64 bits unique
addresses, this reduction can easily become 20.5\%.
\subsection{Experimental results}
We also instrumented the code and evaluated four criteria: network usage
(number of packets and bytes), the time required to perform one cycle of the
event loop and the resulting image size. We collected these metrics for four
situations: without ID algorithms, with a watchdog, with reputation-tracking
and with both algorithms. The manual implementation realized both algorithms as
standalone modules and sequentially called them from the base-application's
event loop. Given the \FOO descriptions of the algorithms, the code generator
generated all four cases. Table \ref{tbl:results} shows the collected data.
\begin{table}[ht]
\caption{Experimental results and side-by-side comparison.}
\label{tbl:results}
\begin{tabular*}{\hsize}{@{\extracolsep{\fill}}llrrrrrrr@{}}
\toprule
& & base & \multicolumn{2}{c}{watchdog} & \multicolumn{2}{c}{reputation} & \multicolumn{2}{c}{both} \\
\midrule
\multirow{2}{*}{\pbox{\textwidth}{total packets sent\\in a window of 90s}}
& manual & 20 & 51 & +155\% & 32 & +60\% & 63 & +215\%\\
& generated & 20 & 49 & +145\% & 32 & +60\% & 55 & +175\%\\
\cmidrule{2-9}
& difference & 0 & -2 & -4\% & 0 & 0\% & -8 & -13\%\\
\midrule
\multirow{2}{*}{\pbox{\textwidth}{total bytes sent\\in a window of 90s}}
& manual & 476 & 1933 & +306\% & 860 & +81\% & 2317 & +387\%\\
& generated & 476 & 1897 & +299\% & 884 & +86\% & 2161 & +354\%\\
\cmidrule{2-9}
& difference & 0 & -36 & -2\% & 24 & +3\% & -156 & -7\%\\
\midrule
\multirow{2}{*}{\pbox{\textwidth}{event loop\\average duration ($\mu$s)}}
& manual & 48 & 94 & +96\% & 88 & +83\% & 149 & +210\%\\
& generated & 48 & 121 & +152\% & 121 & +152\% & 138 & +188\%\\
\cmidrule{2-9}
& difference & 0 & 27 & +29\% & 33 & +38\% & -11 & -7\%\\
\midrule
image
size (bytes) & manual & 10466 & 15530 & +48\% & 13306 & +27\% & 18334 & +75\%\\
& generated & 10466 & 18352 & +75\% & 16376 & +56\% & 20998 & +101\%\\
\cmidrule{2-9}
& difference & 0 & 2822 & +18\% & 3070 & +23\% & 2664 & +15\%\\
\bottomrule
\end{tabular*}
\end{table}
It is clear that adding algorithms manually to the base-application has a
cumulative effect. In case of the generated implementation, this effect is no
longer observed: individual algorithms do have a larger impact by themselves,
but the combination of both algorithms is less than the sum of both. Most
remarkable is the effect on the time of one event loop cycle: both algorithms
by itself add 152\%. Adding a second algorithm only augments the impact to
188\%. We see the effect of the generic handling of common functionality. It
comes with an overhead for a single algorithm, but pays for itself when adding
more algorithms.
Table \ref{tbl:results} also compares the manual versus de generated versions,
showing the impact of the code generation: both the number of packets as well
as the amount of bytes sent have dropped by 13\% and 7\%. In case of the
execution time, both algorithms by themselves add an equal amount of time to a
single event loop cycle. When combined, the event loop is about 7\% faster
compared to the manual case. Finally, we see that the software library adds to
the size of the resulting image. However, the cost is close to constant and is
lower for both algorithms together. With roughly 3KB of additional overhead, or
15\% in this simple case, the impact is small. Taking into account the
available memory on the ATMega1284p, this is only 2.3\% of the available
resources.
\section{Related work}
\label{related}
This section focuses on related work in the field of domain specific languages
and code generation, in relation to wireless networks of resource-constrained
devices. These two components are the core of the \FOO framework.
Within the scope of wireless networks of resource-constrained devices, we see
the introduction of DSLs \enquote{to shorten the development cycles
dramatically during deployments, to reduce the dependency on hardware and
software engineers, and to lead to a wider adoption outside the computer
science field} \cite{sadilek2008domain}. TinyScript \cite{levis2004tinyscript}
is an example of a language for wireless sensor networks. It's a complete
programing language that aims to ease the description of algorithms in general.
The Basic-like syntax does enable more productive development. Still, this
approach doesn't address the underlying resource problems inherently connected
to the context. SEAL \cite{elsts2013seal} is a novel complete programming
language that focuses on novice programmers. It is a clear fit for
\emph{sense-and-send} type of applications. It does however share the idea of
using events to overcome the technical nature of loops. In the context of
intrusion detection, we learn that DSLs are predominantly used to describe
patterns of events found in network packets
\cite{sekar1999high,roesch1999snort} in a concise way. The STAT framework with
its STATL language \cite{eckmann2002statl,vigna2003designing} allows modeling
scenarios using state machines. Our event-driven approach shares this idea.
Although not a real DSL, nesC \cite{gay2003nesc} is a general purpose language
that shares some typical concepts with \FOO, such as the high degree of
analysis during code generation to optimize execution. The biggest difference
is that \FOO starts from the intended functionality and can optimize certain
constructions that can't be interpreted from more technical source code.
Code generation, in combination with a DSL, can optimize a development process
by reducing the size of the code base or systematically improving the code
quality. Embedded systems are a prime example of an environment where code
needs to be of a high quality. Focus on size and performance are a logical
choice when dealing with limited storage and processing power. Compilers
implement different techniques to address this at the lowest level
\cite{marwedel2002code}. Optimizing for power efficiency however proves to be
concurrent to size and performance optimization, and requires different
strategies. Data and memory optimizations \cite{panda2001data} can result in
improvements on a lower level. To really preserve energy one requires
higher-level strategies, targeting architecture, radio handling, efficient
software,\dots \cite{naik2001software}. Most of these are beyond the control of
the compiler and require a higher level of abstraction to address them. DSLs
offer a solution here, but other paradigms are possible. A mixed framework
consisting of a virtual machine (VM) and native code extension
\cite{sadilek2007energy}, combined with energy-aware compilation, seems a valid
approach. By offloading the optimization to a VM and only compiling parts and
not the whole, the solution allows updating at a granular scale, thus reducing
the update cost, while maintaining flexibility and lowering the development
effort. In this terminology, \FOO focuses on the reduction of the development
effort and update cost, and maximizes flexibility.
\section{Conclusion and future work}
\label{conclusion}
IoT applications are typically constructed of similar algorithms that adhere to
a pattern of collecting sensor data and periodically checking recorded values
for patterns that require action. Fusing these algorithms is hard but needed to
deal with the resource-constrained devices they are targeting. Frequent
reselection, e.g. due to changing requirements or changing security threats,
requires a flexible way to incorporate these algorithms in subsequent
iterations of the software.
\FOO is a framework consisting of a DSL, a source code generator and software
library. The DSL allows formally describing algorithms, enabling the code
generator to apply functional code fusion to interpret the intent of algorithms
and merge common functionality \& shared data. The software library hosts
common functionality, such as sensor access, message parsing and event
scheduling. The platform independent DSL enables generation of source code
targeting different platforms and languages, which allows high degrees of reuse
with heterogenous devices.
Initial experiments confirm the expected positive impact on resource usage:
grouping of messages lowers the number of packets by 13\%, while the total
number of bytes sent over the network is reduced by 7\%. The execution time is
improved by 7\% and memory usage can be reduced by 20\%.
Future research will focus on the integration of applications with the
generated functionality. We see opportunities for applications to adopt the
event-driven approach and hope to expand the capabilities of \FOO to this
level. A continued effort is targeting the reduction of platform-dependent
parts in the implementation. This way it is possible to add more platforms and
languages with less effort. In the longer run, we want to investigate the
possibility to extend the functional code fusion paradigm to more domains.
\section{Acknowledgements}
This research is partially funded by the Research Fund KU Leuven and partially
by the EU FP7 project NESSoS\@.
\bibliographystyle{elsarticle-num}
\bibliography{literature/referenties}
\end{document}