Embedded Systems Design Laboratory

Computer Science | Faculty of Engineering, LTH


Master Thesis Proposals

Efficient Implementation of Neural Networks

(Announced November 9, 2017)

Neural networks as used in computing applications such as deep learning are very large graph-like computational structures: a vast number of asynchronously operating kernels, typically performing very simple operations, are connected by channels used to send and receive packets of data, usually small scalars. The structure of these networks often exhibits a high degree of regularity, and may include feedback. Our goal is to efficiently implement such networks on modern computing platforms comprising many processing elements, ranging from simple specially constructed circuits in hardware or programmable logic, to ALUs, GPUs, and full-fledged processor cores. These elements communicate in a variety of ways: point-to-point connections, network-on-chip fabrics, bus structures with shared memory (often non-uniform and/or hierarchical), uniform memory accessed through a hierarchy of coherent caches, and any combination of these.

The fundamental challenge is to map the kernels of the neural network, its "neurons", onto the processing elements of the target platform, which tend to be much smaller in number (typically by orders of magnitude) and connected in a very different topology from the neurons, with the goal of maximizing system performance, usually throughput. We propose two projects that explore techniques for performing this mapping in different scenarios and at different times.

1. Compile-time mapping of neural networks to parallel execution platforms

In this project, the goal is to develop an offline mechanism for mapping some standard neural network representation onto a range of parallel computing platforms. It involves

  1. selection and description of the platform family
  2. development of a technique for describing an instance of the family
  3. characterization of relevant performance aspects of the platform (computational speed, communication bandwidth and latency, arbitration policies where applicable)
  4. a technique for formulating the mapping onto a platform instance as an optimization problem, e.g. using constraint programming or heuristic graph algorithms
  5. a case study of the resulting mappings and their performance and utilization metrics.
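As a toy illustration of step 4, the mapping can be phrased as minimizing traffic-weighted distance between processing elements, subject to a load limit per element. Everything below — the traffic matrix, the distance function and the capacity bound — is an invented stand-in for a real platform characterization, and the exhaustive search is a placeholder for constraint programming or heuristics:

```python
# Hypothetical sketch: mapping a tiny kernel graph onto processing
# elements (PEs) by brute force, minimizing total communication cost.
from itertools import product

# Traffic between kernels: (src, dst) -> packets per iteration (assumed).
traffic = {(0, 1): 10, (1, 2): 10, (0, 2): 2}

def pe_distance(p, q):
    # Hop distance on an assumed linear array of PEs.
    return abs(p - q)

def mapping_cost(assign):
    """Total traffic-weighted distance for a kernel -> PE assignment."""
    return sum(v * pe_distance(assign[s], assign[d])
               for (s, d), v in traffic.items())

def best_mapping(n_kernels, n_pes, max_per_pe=2):
    """Exhaustively search all capacity-respecting assignments; fine for
    tiny instances, hopeless at scale -- hence the optimization problem."""
    feasible = (a for a in product(range(n_pes), repeat=n_kernels)
                if all(a.count(p) <= max_per_pe for p in range(n_pes)))
    return min(feasible, key=mapping_cost)

print(best_mapping(3, 2))
```

Without the capacity bound the optimum is trivially to place everything on one PE; the load limit is what makes the trade-off between communication cost and utilization non-trivial.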

2. Dynamic load balancing for neural networks

This project aims to develop a run-time infrastructure for dynamically load-balancing a neural network running on a parallel and/or distributed platform, such as multiple GPUs or a cluster of network-connected computers, e.g. Raspberry Pis. Such a system might be employed when a sufficiently accurate performance characterization of the target platform is unavailable, or when the performance of the platform components varies over time in unpredictable ways. The project consists of developing a run-time infrastructure that moves parts of the neural network between processing elements during execution, in response to measurements obtained at runtime, such as utilization and the throughput and latency of communication between processing elements. The objective is to gradually improve overall execution performance and to react to fluctuations in the performance of platform components. Depending on the platform chosen for this work, the individual nodes operating in it might only have partial information about their local neighborhood in the system and will have to make load-balancing decisions based on that, raising the question of whether and how this locality conflicts with the objective of global improvement. This project could be expanded into a PhD project, and it dovetails with our ongoing research on runtime infrastructure for dataflow programs on processor arrays.
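A purely local balancing rule of this kind can be sketched in a few lines. The ring topology, the load numbers and the threshold are invented for illustration; real measurements would be utilization or communication throughput rather than abstract work units:

```python
# Hypothetical sketch: each node sees only its own load and its
# neighbours' loads, and migrates one unit of work per round when the
# imbalance exceeds a threshold.

def rebalance_step(loads, neighbours, threshold=2):
    """One synchronous round: every node may push one work unit to its
    least-loaded neighbour. Returns the new load vector."""
    new = list(loads)
    for node, nbrs in neighbours.items():
        target = min(nbrs, key=lambda n: loads[n])
        if loads[node] - loads[target] > threshold:
            new[node] -= 1
            new[target] += 1
    return new

# A ring of four nodes with one hotspot.
neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
loads = [8, 2, 2, 2]
for _ in range(5):
    loads = rebalance_step(loads, neighbours)
print(loads)
```

Note that the run settles into a state that is balanced with respect to each node's neighbourhood yet still globally uneven — a small instance of the locality-versus-global-improvement tension raised above.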

Contact person: Jörn Janneck

Compiler-Assisted Code Compression

(Announced April 3, 2017)

Embedded systems are often constrained in terms of power and energy consumption. Although the processor consumes most of the active power/energy, the memory is becoming an important contributor to overall system power, especially for systems with long idle times. Offline code compression and runtime decompression is one method to reduce the physical memory size as well as the communication overhead between the processor and the memory system.

The goal of this thesis is to investigate the potential for power/energy consumption reduction by employing compile-time code compression and runtime decompression in a RISC-V-based architecture, using an LLVM-based compilation flow. The study will center on compile-time techniques for improving code locality, such that the runtime decompression overhead is reduced. The project is closely related to a master's thesis carried out at EPFL, Switzerland, dedicated to designing the hardware for runtime decompression.
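The principle — compress code blocks offline, decompress on demand, and rely on locality to keep decompressions rare — can be sketched as follows. This is an illustration only: it uses zlib on byte strings, whereas the thesis targets RISC-V instruction encodings and a hardware decompressor, and the one-level cache stands in for a decompression buffer:

```python
# Illustrative sketch: offline block compression, on-demand runtime
# decompression with a cache. Better code locality means fewer cache
# misses and hence less decompression overhead.
import zlib

# Offline: split the "binary" into blocks and compress each one.
code = [b"addi x1, x0, 1 " * 8, b"lw x2, 0(x1) " * 8]   # toy blocks
rom = [zlib.compress(b) for b in code]                   # stored image

cache = {}
def fetch_block(i):
    """Runtime: decompress block i unless it is already cached."""
    if i not in cache:
        cache[i] = zlib.decompress(rom[i])
    return cache[i]

compressed = sum(len(b) for b in rom)
original = sum(len(b) for b in code)
print(f"{compressed}/{original} bytes stored")
```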

Prerequisites: 2 students. Some knowledge of compilers, preferably experience with LLVM. Some understanding of processor architectures.



Contact person: Flavius Gruian

Routing and Scheduling for Time-Sensitive Networks

(Announced April 3, 2017)

A large class of embedded applications depends on predictable communication for accurate control (automotive, avionics) while also requiring enough bandwidth for less critical software (multimedia streaming). Time-Sensitive Networking (TSN) is a set of standards under development by IEEE 802.1 that is becoming increasingly attractive for applications requiring connections with low latency, high bandwidth and high availability. TSN specifies three traffic classes, namely time-triggered (TT), constrained bandwidth (CB), and best effort (BE), supporting varying grades of criticality. The most critical class, TT, requires careful routing and scheduling in order to meet the application demands.
The goal of this project is to propose routing and scheduling tools for TT and CB traffic, for a predefined network topology and application set. The applications will be taken from the streaming domain, where throughput is more important than latency, but latency is also constrained. One direction is to use constraint programming to define and solve the problem, but heuristics or integer linear programming may also be of use.
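For the TT class, the core of the scheduling problem is placing periodic frames so their transmission windows never collide. Below is a minimal single-link sketch with invented frame lengths, using brute-force backtracking as a stand-in for a constraint-programming model; a real tool would additionally handle multi-hop routes, gate control lists, and the CB/BE classes:

```python
# Hypothetical sketch: offsets for periodic time-triggered frames on one
# shared link so that windows [start, start+len) are pairwise disjoint
# within the cycle.

def schedule_tt(frames, cycle):
    """frames: list of transmission durations. Returns start offsets,
    or None if no collision-free schedule fits in the cycle."""
    offsets = []
    def ok(start, length):
        return start + length <= cycle and all(
            start + length <= s or s + l <= start
            for s, l in zip(offsets, frames))
    def place(i):
        if i == len(frames):
            return True
        for start in range(cycle):
            if ok(start, frames[i]):
                offsets.append(start)
                if place(i + 1):
                    return True
                offsets.pop()
        return False
    return offsets if place(0) else None

print(schedule_tt([3, 2, 4], cycle=10))
```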

Prerequisites: 2 students. Knowledge of constraint programming or other optimization techniques is highly recommended. Some knowledge of packet switched networks (Ethernet), routing and scheduling techniques is a big plus.



Contact person: Flavius Gruian

Software-defined Networking for Streaming Applications

(Announced April 3, 2017)

Software-Defined Networks (SDN) have been proposed as a way of virtualizing networks, in the same way hypervisors can be used to virtualize hardware platforms. Streaming applications, defined as communicating networks of actors, could benefit from abstracting their underlying communication infrastructure in several ways. For instance, actor mobility across processing nodes could be made transparent to the actors by reconfiguring the network. OpenFlow (Open Networking Foundation) is a protocol for SDN that is gaining momentum, and several switches implementing it are already available. The goal of this project is to examine how OpenFlow can be used in the context of streaming applications, in particular for virtualizing the communication structure of the application, and further, assuming some mobility of functionality, to examine how the reconfiguration of the network can be carried out without significantly affecting the functionality or performance of the application.

In practice this would imply:

  • proposing a method to program/configure the network for a given distribution of functionality (actors on nodes)
  • evaluating the performance of the SDN solution compared to a standard dynamic routing network
  • proposing a method for reconfiguring/updating the network when an actor moves to a different node
  • evaluating the impact of the reconfiguration on performance
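The first and third items can be illustrated without any real OpenFlow API: a controller derives per-channel forwarding rules from an actor placement, so migrating an actor rewrites rules rather than actors. All names and the trivial topology are assumptions for illustration, not a model of any particular switch:

```python
# Minimal sketch (no real OpenFlow calls): logical channels between
# actors are mapped to forwarding rules derived from the placement.

placement = {"src": "hostA", "filter": "hostB", "sink": "hostA"}
channels = [("src", "filter"), ("filter", "sink")]

def flow_rules(placement):
    """For each channel, forward to the host of the consuming actor."""
    return {ch: placement[ch[1]] for ch in channels}

rules = flow_rules(placement)
assert rules[("src", "filter")] == "hostB"

# "Migrate" the filter actor; only the derived rules change,
# the actors themselves are untouched.
placement["filter"] = "hostC"
rules = flow_rules(placement)
print(rules)
```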

Prerequisites: 2 students. Knowledge of optimization techniques and C-programming is highly recommended. Knowledge of packet switched networks, routing and scheduling is a plus.



Contact person: Flavius Gruian

Modeling and Solving Course Timetabling

(Announced October 19, 2015)


Course timetabling at universities is the activity of planning a time and a room (lab or lecture hall) for each lecture, lab or lesson. It is very tedious work, as the person doing it needs to take care of many heterogeneous constraints. The constraints can be divided into hard constraints and soft constraints. Hard constraints define room availability, room sizes, and requirements on the schedule such as the number of lectures or labs during a week, the number of weeks, etc. Soft constraints represent different kinds of preferences, for example a lecturer's preferred lecture time or room.

Solving the timetabling problem with these constraints requires special skills from the planners, who try to achieve a timetable that meets the set of constraints. Since timetables are mostly done manually, any automatic support will make this work easier.

It should also be noted that the problem, in general, is computationally complex (NP-hard).
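The hard/soft split above can be made concrete in a toy model: hard constraints (room capacity, no double booking) filter assignments, while soft constraints (slot preferences) score them. All data is invented for illustration, and the exhaustive enumeration is a placeholder for a CP solver such as JaCoP:

```python
# Toy timetabling sketch: enumerate (slot, room) assignments, keep only
# those satisfying the hard constraints, pick one minimizing the soft
# penalty.
from itertools import product

lectures = {"algo": 120, "lab1": 30}          # name -> audience size
rooms = {"E:A": 200, "E:1145": 40}            # name -> capacity
slots = ["Mon10", "Tue13"]
prefers = {"algo": "Mon10"}                   # soft: lecturer preference

def penalty(assign):
    """One penalty point per violated preference."""
    return sum(1 for lec, (slot, _) in assign.items()
               if prefers.get(lec) not in (None, slot))

def timetables():
    names = list(lectures)
    for choice in product(product(slots, rooms), repeat=len(names)):
        assign = dict(zip(names, choice))
        # Hard: room big enough, and no two lectures share a room+slot.
        if all(rooms[r] >= lectures[l] for l, (_, r) in assign.items()) \
           and len(set(choice)) == len(choice):
            yield assign

best = min(timetables(), key=penalty)
print(best, penalty(best))
```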

Project goal

The project goal is to study the problem and investigate current needs at LTH. It includes an investigation of the problem, the types of constraints, and the problem size, as well as possible modeling and solving techniques, primarily using constraint programming. The project will address both fully automatic solutions and decision support for planners.

The implementation work will include the development of a prototype implementation and an evaluation of its performance as well as its applicability to real-world problems.

To carry out the project, a good understanding of the constraint programming paradigm is required. Preferably the student should have taken the EDAN01 Constraint Programming course.

Using the CAL data flow language as an input specification for a modem programming flow.

In order to deliver high quality output, modems have tight real-time requirements, typically defined in terms of minimum guaranteed throughput and/or maximum latency.

Embedded platforms for modems are expected to handle several streams at the same time, each with its own rate. Typically, functionality can be divided into jobs, i.e. minimal groups of communicating tasks that are started and stopped independently. The number of use-cases (i.e. combinations of simultaneously executing job instances) can be high.

In the approach being studied at ST-Ericsson in Eindhoven, the Netherlands, modem transceivers are modeled as data flow graphs [1]. 

The target hardware platform is a heterogeneous Multiprocessor System-On-Chip (MPSoC). 

On an MPSoC, transceivers share computation, storage, and communication resources. This poses a particularly difficult problem for the scheduling of real-time applications: resource sharing leads to uncertainty about resource provision and, consequently, uncertainty in the temporal behavior.

The overall scheduling strategy proposed at ST-Ericsson [2] for this system mixes static (compile-time) and dynamic (run-time) techniques.

Intra-job scheduling (i.e., scheduling of tasks that belong to the same job) is handled by means of static order: per job and per processor, a static ordering of actors is found that respects the real-time requirements while trying to minimize processor usage. Inter-job scheduling is handled by means of local Time-Division Multiplexing (TDM) schedulers: per job and per processor, a time slice of duration S is allocated.
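The combined scheme can be sketched as a small single-processor simulation: each job owns a TDM slice, and within its slice fires its actors in their static order. The jobs, actors and durations below are invented for illustration:

```python
# Illustrative sketch of static-order scheduling inside TDM slices on
# one processor. An actor that does not fit in the remaining slice
# waits for the job's next slice.

def tdm_trace(jobs, slice_len, horizon):
    """jobs: {name: [(actor, duration), ...]} static orders.
    Returns the (time, job, actor) firing trace."""
    trace, t = [], 0
    cursors = {j: 0 for j in jobs}
    names = list(jobs)
    while t < horizon:
        for j in names:                      # one TDM round
            end = t + slice_len
            order = jobs[j]
            while t < end:
                actor, dur = order[cursors[j] % len(order)]
                if t + dur > end:            # does not fit in the slice
                    t = end                  # idle until the slice ends
                    break
                trace.append((t, j, actor))
                cursors[j] += 1
                t += dur
    return trace

trace = tdm_trace({"rx": [("fft", 2), ("demap", 1)],
                   "tx": [("enc", 3)]}, slice_len=3, horizon=12)
print(trace)
```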

A tool is available that performs both mapping and temporal analysis of the mapped application.

A weakness of the current approach is that the data flow model is generated manually, by studying the actual application code. This means that the model may not conform to the actual implementation.

To bridge this gap between modeling and implementation, the model should be automatically extracted from the implementation code. For this to be possible, the implementation has to be specified in data flow. 

CAL  is a domain-specific language that provides useful abstractions for dataflow programming with actors. CAL has been used in a wide variety of applications and has been compiled to hardware and software implementations, and work on mixed HW/SW implementations is under way. 

The goals of this project are:


  1. To show that CAL can be used to specify the inter-task communication behavior of a radio application running on an ST-Ericsson platform: this can be done either by adapting an existing radio implementation in CAL to the ST-Ericsson multiprocessor, or by re-implementing the task communication of an existing implementation of a radio application on the ST-Ericsson multiprocessor;
  2. To show that a data flow analysis model can be generated from the CAL specification, and that this model can be analyzed by ST-Ericsson's data flow analysis tools;
  3. To show that the CAL specification of the inter-task communication behavior can reuse algorithms already coded in C, using core intrinsics;
  4. To show that code using the ST-Ericsson runtime can be generated from the CAL specification.


The applicant preferably has a background in computer science, basic knowledge about signal processing, as well as good insights into compiler technology and embedded systems. 

The work is to be carried out in two phases: a first phase in Lund, for familiarization with the CAL language and associated tooling, and a second phase at the ST-Ericsson unit in Eindhoven, the Netherlands, for a minimum period of 6 months, under the local supervision of Orlando Moreira.

During the stay in Eindhoven, ST-Ericsson will grant a monthly allowance to the applicant, to help with the cost of living in the Netherlands.

[1] E. Lee and D. Messerschmitt, "Synchronous Data Flow", Proceedings of the IEEE, 1987.

[2] O. Moreira, F. Pereira, and M. Bekooij, "Scheduling Multiple Independent Hard-Real-Time Jobs on a Heterogeneous Multiprocessor", Proceedings of the ACM Conference on Embedded Software (EMSOFT), Salzburg, Austria, 2007.

[3] O. Moreira et al., "Online Resource Management for a Multiprocessor with a Network-on-Chip", Proceedings of the ACM Symposium on Applied Computing, 2007.

Contact person: Jörn W. Janneck

Global constraints in JaCoP (several project proposals)

This is a set of projects with the common aim of developing different global constraints for the open-source Java Constraint Programming solver (JaCoP). The solver is written entirely in Java and provides a broad selection of constraints and search methods.

Constraint programming over finite domains offers an elegant way of modeling and solving combinatorial problems, but for efficiency reasons it needs global constraints that encapsulate specific reasoning algorithms. These algorithms make modeling and solving easier. The goal of a project in this area is to develop and test one global constraint. Possible candidates for the project include (but are not limited to):

  • Smart table constraint [1],
  • Tree constraint (specific graph constraints) [2].
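To give a flavor of what such a filtering algorithm does, here is a toy propagator (in Python rather than Java, and far weaker than the matching-based filtering real solvers use for alldifferent): it removes a decided variable's value from all other domains until a fixpoint is reached:

```python
# Toy sketch of constraint propagation: a naive alldifferent filter
# over explicit domains. A real global constraint would prune much more.

def propagate_alldifferent(domains):
    """domains: {var: set of ints}. Prunes in place until fixpoint;
    returns False if some domain is wiped out (inconsistency)."""
    changed = True
    while changed:
        changed = False
        for v, dom in domains.items():
            if len(dom) == 1:
                val = next(iter(dom))
                for w, other in domains.items():
                    if w != v and val in other:
                        other.discard(val)
                        if not other:
                            return False     # empty domain
                        changed = True
    return True

doms = {"x": {1}, "y": {1, 2}, "z": {2, 3}}
ok = propagate_alldifferent(doms)
print(ok, doms)
```

Propagation alone solves this instance; in general, a solver interleaves propagation with search over the remaining domains.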

To carry out the project, a good understanding of the constraint programming paradigm is required. Preferably the student should have taken the EDAN01 Constraint Programming course.

[1] Jean-Baptiste Mairy, Yves Deville, and Christophe Lecoutre, "The Smart Table Constraint", International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization, 2015.

[2] Nicolas Beldiceanu, Pierre Flener, and Xavier Lorca, "The tree Constraint", International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, 2005.

Contact person: Kris Kuchcinski