Cyber-Physical Systems and Mixed Simulations

Master Research Internship
Master Thesis

Cyber-Physical Systems and Mixed Simulations

Author: Tran Van Hoang
Supervisor: Professor Bernard Pottier

Abstract

Climate change has received much attention in recent years, and the need to predict and validate the behaviour of real systems and natural phenomena has become critical. Simulation is a good candidate for this mission. However, the major problem is that modeling and simulating large, complicated physical systems is time-consuming. Although many commercial software packages now exist for such systems (water and forest modeling, for example), they require considerable knowledge of specific physical processes and of the study areas. Thus, as a first step, we propose a practical way of modeling physical systems simply, especially natural systems, by using Cellular Automata (CA). The PickCell tool developed at the Lab-STICC laboratory facilitates that process. GPU computation and parallelism are proposed as an important part of this methodology, with the purpose of accelerating large physical simulations. In addition, we propose the use of distributed simulations to deal with the lack of interoperability between simulations. To do that, we use the IEEE standard High Level Architecture (HLA) to design a system that supports mixed simulations based on synchronous systems. This also provides an opportunity to conduct simulations of Cyber-Physical Systems.

Acknowledgments

I would like to give special thanks to Professor Bernard Pottier and to all of my colleagues at Lab-STICC, UBO. I appreciate their support, not only in research activities but also in my daily life.

Tran Van Hoang, Brest, France, 12/06/2015

Contents

1 Introduction
  1.1 Motivations and Objectives
  1.2 Cyber-Physical Systems (CPS)
  1.3 Cellular Automata (CA)
2 Physical simulations based on cell networks
  2.1 PickCell tool and cell networks
  2.2 Physical simulations based on cell networks
  2.3 Case study and applications
  2.4 Routing algorithm
  2.5 Remarks
3 Simulations with Cuda programming model
  3.1 GPU and Cuda programming model
  3.2 Accelerating simulations by using Cuda
  3.3 Details of GPU implementation of simulations
  3.4 Performance measurement principles
4 Distributed simulation with HLA
  4.1 Overview of The High Level Architecture (HLA)
  4.2 Time management in HLA
  4.3 Distributed physical simulation
5 Conclusion
  5.1 Contributions
  5.2 Future works
Bibliography

1 Introduction

1.1 Motivations and Objectives

Developing countries have suffered from natural disasters such as typhoons, tsunamis, fires, and floods. For example, in the Mekong Delta of Vietnam, the sea level is rising under the impact of climate change, which could flood the delta every year. Environmental surveillance and prediction of such phenomena have therefore become necessary. Simulation is a good approach for that purpose: it helps people make better decisions to prevent or relieve the impacts.

In recent years, wireless sensor networks (WSN) have emerged as a good candidate for monitoring the environment. Several inspiring projects have been launched with the common aim of sensing the environment [10]. Sensors collect the status of physical systems and send status data to computer systems for processing and analysis. Some reactions are then sent back to the physical systems. Such an integration between physical systems and computer systems pertains to Cyber-Physical Systems (CPS), as presented in Section 1.2. Therefore, it is necessary to consider sensing processes. The objective is to support and to validate the operation of the WSN, especially where it is responsible for detecting dangerous accidents, for example when monitoring a chemical store located in a residential area. A model composed of parallel simulations of the two sides of the CPS will thus be built.

However, modeling and simulating physical systems confronts many issues. These systems are often huge and exhibit complex behaviour, which leads to a lot of effort in designing the models. Moreover, the lack of interoperability between simulations is a major challenge, because in the real world the systems always affect each other. For instance, the spread of a fire is influenced by several other factors, namely weather conditions, wind direction and speed, response capabilities, and the sensing performance of the wireless sensor network (WSN).

In such systems, the model consists of many components (fire spreading, weather conditions, firefighters, and the WSN). These components and their relations result in large-scale models, which are very difficult to maintain and adapt. These circumstances bring about:

- long run times for simulation runs;
- long development and testing times for such models;
- a huge effort to maintain the models and to adapt them to other perspectives;
- low flexibility and reusability.

Traditionally, there are two common approaches to handle these problems. One is the use of powerful hardware. The other is breaking the model up into a set of submodels, which are distributed over different computer systems. However, these approaches are usually applied separately. Thus, in this project, we use a hybrid approach that combines distributed models and parallel computation. It aims to support physical systems of huge size and complex behaviour. The approach can be viewed under two main aspects. For the problem of computing performance, the use of parallel simulations based on the GPU is suggested. The GPU has been considered in several studies over the last years as a way to speed up large simulations. To deal with the lack of interoperability between simulations, we use the IEEE standard High Level Architecture (HLA), which gives independent simulations the ability to communicate with each other in the context of a synchronous system.

The thesis is divided into five chapters:

- Chapter 1 presents the motivations and objectives of the study and gives an overview of related concepts, namely Cyber-Physical Systems (CPS) and Cellular Automata (CA). A brief pointer to the PickCell tool and its applications ends the chapter.
- Chapter 2 proposes a new approach that simplifies the process of modeling physical systems. The approach is facilitated by the PickCell tool and relies on CA.
- Chapter 3 describes the use of the Cuda programming model to simulate physical models. Some experiments are conducted to evaluate the feasibility of the solution.
- Chapter 4 uses the HLA standard to deal with the lack of interoperability between simulations. It enables parallel simulations to communicate with each other in the context of distributed systems.
- Chapter 5 summarises the contributions and presents future work.

1.2 Cyber-Physical Systems (CPS)

Cyber-Physical Systems (CPS) are integrations of computation and physical processes [9], [1]. Embedded computers and networks monitor and control the physical processes, with feedback loops in which physical processes affect computations and vice versa.

Figure 1.1: An example of Cyber-Physical System.

An example of a CPS is illustrated in Figure 1.1: the monitoring of accidents (pollution, floods, landslides, or chemical spills, for example) in a river. A WSN can be used to observe the status of the river via sensors. The sensors forward status data to computer systems, which carry out computations. An analysis of the computed results can lead to emergency operations, such as raising signals or closing the basin, in the case of an accident.

For the implementation of this type of system, one of the critical challenges is system integration. Therefore, to obtain interoperability between simulations, an integration solution is required. In [22], the authors presented a co-simulation framework based on the HLA standard. That work focuses on integrating heterogeneous systems, designed with different tools and languages, as CPSs. However, the given prototype addresses neither natural phenomena nor computation performance. Thus, the considerations in this project are expected to provide another perspective on phenomena simulations.

1.3 Cellular Automata (CA)

A Cellular Automaton (CA) is one of the techniques used to simulate complex physical systems, such as self-reproduction in biology or diffusion models in chemistry. The famous Game of Life illustrates that cellular automata are capable of producing dynamic patterns and structures [2], [3]. In [4], a major effort is made to show the advantages of using CA for modeling systems, especially natural phenomena: CA models of phenomena are argued to be clearer, more accurate, and more complete than conventional mathematical systems. Moreover, the transition rules of CA models are often simpler than mathematical equations, yet the results they produce are more comprehensive, and a CA can mimic the actions of any possible physical system.

A CA typically consists of two main components. The first component is a cellular space, that is, a lattice of cells, each with an identical pattern of local connections to other cells for input and output. Each cell takes its state from a finite set of states. In the simplest case, each cell has a binary state, 1 or 0. A set of cells called the neighbourhood is defined relative to a given cell (the centre). The states of the neighbours are used to calculate the next state of the centre according to the defined rule. The number of neighbours depends on the pattern chosen in the modeling process.

The second component is a transition rule (CA rule), which gives the state of each cell at time t+1 as a function of its current state and the states of its neighbourhood at time t. Typically, the rule for updating the states is the same for all cells and does not change over time.

CA exist in various forms. The simplest CA is a one-dimensional lattice, meaning that all the cells are arranged in a line. The neighbourhood of a cell then consists of the cells to its left and to its right. For a two-dimensional lattice, the most common types of neighbourhood are the Moore neighbourhood and the Von Neumann neighbourhood (see Figure 1.2).

Figure 1.2: Von Neumann and Moore neighbourhood (distance = 1).

In the Von Neumann neighbourhood, each cell has four neighbours: north (N), south (S), east (E), and west (W). Together with the centre this gives five binary cells, hence 2^5 = 32 possible patterns. In the Moore neighbourhood, the centre and its neighbours form nine cells in total, so 2^9 = 512 possible patterns can be produced. In both cases the neighbourhood distance is one, and the transition function must assign a next state to each of these patterns.

Therefore, in order to model systems with this approach, two components need to be defined: the cellular space and the transition rules (or behaviour).

In the next chapter, an approach proposed at the Lab-STICC laboratory to automatically generate cellular spaces (cell networks) from geographic data is briefly presented. The input data and the behaviour of each cell are determined later, according to what is of interest in a given physical system.

2 Physical simulations based on cell networks

This chapter gives a brief description of PickCell, a tool that generates cell networks of physical systems. The structure of these networks is described as well. We then propose a methodology for developing physical simulations in terms of cell networks. Lastly, some cases are examined to demonstrate the use of the proposed methodology.

2.1 PickCell tool and cell networks

2.1.1 PickCell tool

PickCell is a modeling tool that has been developed at Lab-STICC in recent years (see [8] for more details). It can access geographic data from various public resources as input, namely GoogleMap, OpenStreetMap, or even picture files. The tool uses these data to analyze, process, and generate cell network structures of physical processes. The main feature of the tool is the extraction of visible properties (potential physical systems) from geographic data, such as rivers, forests, or road systems. The process starts from the input data. The final result is a set of separate physical systems, each represented by a cell network, as presented in Section 2.1.2. Generally, this process is performed in three main steps:

- Preprocessing data: geographic data are usually not well presented, especially in the case of satellite and aerial images. At this step, the tool increases the contrast of the data to serve the following steps.
- Segmenting data into cells: in order to isolate regions of interest in the data, such as rivers or roads, the data are divided into small cells. Their size (x, y) depends on the objective of the desired model, where x and y represent the width and the height of the cells, respectively. For the same size of input data, small x and y values give a large number of cells, and vice versa.
- Recognizing similar cells and grouping them into layers: typically, the tool uses the three standard colour components (Red-Green-Blue) to classify the cells into defined layers. Each layer contains a set of cells with similar colours. Next, the relations between the cells of the same layer are defined according to a certain CA pattern.

As a result, for each layer, we have a set of cells organized as a network through their relations (or links). These sets are considered as cell networks. The details of cell networks are presented in the next section.

2.1.2 Cell network

As mentioned previously, a cell network is a group of cells and the relations between them. Each cell typically has data consisting of four elements: an identity, a local state (such as pollution density, insect population, or geographic position), links to other cells (its neighbours), and the relative positions of its neighbours. The last element means that a cell is capable of determining the direction of each neighbour, which can be located to the east, west, north, or south. This property can be useful in various situations, such as simulating the weather or the flow of a fluid. For the sake of simplicity, directions can be encoded as pairs of numbers, as shown in Table 2.1.

Direction   Value
East        (1,0)
West        (-1,0)
North       (0,-1)
South       (0,1)

Table 2.1: A proposed organization of directions in a cell network.

Table 2.2 shows an example of a cell network. It is generated by the PickCell tool, except for the data in the column named Pollution Density. These data can be loaded at the beginning of a simulation or at runtime.

Cell Id   Pollution Density   Neighbour Id    Directions
                              , 25, 1, 600    (-1,0), (1,0), (0,-1), (0,1)
                              , 0             (-1,0), (0,1)
                              , 26            (-1,0), (0,1)
                              , 0, 589        (-1,0), (1,0), (0,-1)
                                              (1,0)

Table 2.2: The table presents a cell network structure of 601 cells generated by PickCell tool (Von Neumann 1 CA).

The use of cell networks brings some advantages in developing physical simulations. Firstly, each cell network is a clear and consistent structure. All of its cells come from one physical system. They hold the same type of local data and have the same behaviour. This structure resembles a class in OOP (Object Oriented Programming), whose cells are objects instantiated from that class. From a software engineering point of view, it is thus especially useful for maintaining the systems: it is simple to add the necessary properties to the states or transitions of the models.

Secondly, cell networks generated by the PickCell tool help to tackle a limitation of the input data. Many phenomena simulations use raster data as the input for their models. With this type of data, it is often difficult to distinguish the regions of interest. This limitation causes useless computations outside those regions. For example, in [23], data cells that do not belong to the real area of interest (rivers) are marked NoData in the preprocessing step. Models built from cell networks avoid this useless processing by default.

In addition to the cell network structure, the PickCell tool also extracts visible data. This is useful for displaying and analyzing simulated results. Figure 2.1 demonstrates how a river system is displayed from extracted visual data. In its current version, the tool generates two-dimensional data in the format of two concurrent programming languages, Cuda and Occam [6], [7]. Third-dimension data for elevation is planned for the next version.
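To make the four elements of a cell concrete (identity, local state, links, and relative positions), the following C sketch shows one possible in-memory layout. The type and field names (`Cell`, `Direction`, `neighbour_towards`) are hypothetical and are not the format actually emitted by PickCell; only the direction encoding follows Table 2.1.

```c
#define MAX_NEIGHBOURS 4   /* Von Neumann pattern, distance 1 */

/* Direction encoded as a pair of numbers, following Table 2.1. */
typedef struct { int dx, dy; } Direction;

static const Direction EAST  = { 1,  0};
static const Direction WEST  = {-1,  0};
static const Direction NORTH = { 0, -1};
static const Direction SOUTH = { 0,  1};

/* Hypothetical cell record: identity, local state, links, directions. */
typedef struct {
    int       id;
    float     pollution_density;            /* local state */
    int       n_neighbours;
    int       neighbour_id[MAX_NEIGHBOURS]; /* links to other cells */
    Direction direction[MAX_NEIGHBOURS];    /* relative position of each link */
} Cell;

/* Return the id of the neighbour lying in direction d, or -1 if the
   cell has no neighbour in that direction. */
int neighbour_towards(const Cell *cell, Direction d)
{
    for (int i = 0; i < cell->n_neighbours; i++)
        if (cell->direction[i].dx == d.dx && cell->direction[i].dy == d.dy)
            return cell->neighbour_id[i];
    return -1;
}
```

Storing the direction next to each link lets a rule that depends on orientation (wind or water flow, for instance) pick the right neighbour without consulting the original geographic data.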

Figure 2.1: A cell network of a river system generated from PickCell tool with Von Neumann 1.

In short, cell networks generated by the PickCell tool serve as skeletons for simulation models. In order to obtain a complete model with this approach, two other components need to be considered: input data and transition rules. These are presented in the next section.

2.2 Physical simulations based on cell networks

The cell network structure presented earlier is one of the main components of this methodology. Each model has at least three components: a cell network, input data, and a transition rule. The first is generated from geographic data with the help of the PickCell tool, whereas the other two are defined according to the characteristics of the physical system. A summary of the methodology is depicted in Figure 2.2. The process has three main steps. Initially, it begins with geographic data, which are processed by the PickCell tool to generate a cell network. The cell network is then associated with input data and a transition rule to make up a complete model. Lastly, this model is executed by a simulator. Currently, cell networks are generated in two versions, as Cuda and as Occam code. Cuda was chosen in this work because its model is well suited to cell networks.
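The composition described above (cell network, input data, transition rule, executed by a simulator) can be sketched in C as follows. This is only an illustration under assumptions: the `Model`, `Links`, and `simulate` names are hypothetical, and the rule plugged in is the forest-fire rule of the case study in Section 2.3 (tree, fire, ash, empty), written sequentially rather than for the GPU.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_NEIGHBOURS 4

typedef enum { TREE, FIRE, ASH, EMPTY } State;

typedef struct {
    int n_neighbours;
    int neighbour[MAX_NEIGHBOURS];
} Links;

/* A transition rule maps the states at time t to one cell's state at t+1. */
typedef State (*Rule)(const State *now, const Links *links, int cell);

typedef struct {
    int          n_cells;
    const Links *links;   /* cell network */
    State       *state;   /* input data, updated in place by simulate() */
    Rule         rule;    /* transition rule */
} Model;

/* Forest-fire rule: tree -> fire if a neighbour burns, fire -> ash,
   ash -> empty, empty stays empty. */
State fire_rule(const State *now, const Links *links, int cell)
{
    switch (now[cell]) {
    case TREE:
        for (int i = 0; i < links[cell].n_neighbours; i++)
            if (now[links[cell].neighbour[i]] == FIRE)
                return FIRE;
        return TREE;
    case FIRE:
        return ASH;
    default:            /* ASH and EMPTY both lead to EMPTY */
        return EMPTY;
    }
}

/* Simulator: run `steps` synchronous updates. Returns 0 on success,
   -1 if the temporary buffer cannot be allocated. */
int simulate(Model *m, int steps)
{
    State *next = malloc((size_t)m->n_cells * sizeof *next);
    if (next == NULL)
        return -1;
    for (int t = 0; t < steps; t++) {
        for (int c = 0; c < m->n_cells; c++)
            next[c] = m->rule(m->state, m->links, c);
        memcpy(m->state, next, (size_t)m->n_cells * sizeof *next);
    }
    free(next);
    return 0;
}
```

Keeping the rule behind a function pointer means the same network and simulator loop can be reused with a different rule, which mirrors the separation between the generated skeleton and the model-specific parts.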

Figure 2.2: A summary of the proposed process which is used to conduct physical simulations.

2.3 Case study and applications

This section describes a case study applied to a study region. The region is a small area located in the Mekong Delta of Vietnam, as shown in Figure 2.3. It contains three physical systems in total: river, forest, and road. The first two, the river system and the forest system, were considered in this project.

As applications of the proposed approach, two models are built from the study region. One is a model of forest fire spread. The other is a model of pollution diffusion in the river. In addition, we assume that a wireless sensor network (WSN) is used to monitor the status of the forest. Thus, a model of the WSN is also developed. Details of the three models are described later in this section.

Another assumption is that these three systems communicate. One interaction happens when the fire spreads close to the river: ashes from the fire then pollute the river. Meanwhile, when the sensors of the WSN detect fire near them, they raise emergency signals. This scenario is clarified and used as an application of the solution presented in Chapter 4.

Figure 2.3: The study region: A small area in Mekong Delta, the South of Vietnam. (data source: OpenStreetMap [16])

In reality, models use many elements of input data and their transition rules are often very complicated, since the goal is to create simulations that are as realistic as possible. In our case, however, only some basic characteristics are picked to demonstrate the feasibility of the proposed methodology. The input data and the transition rule of each model are presented as follows.

2.3.1 The diffusion of pollution in the river

This model is used to simulate the diffusion of pollution in a river. Regarding the context of pollution, it is possible to think of various potential situations, such as chemicals, oil, or other contaminants. In all of them, the diffusion depends strongly on the density. Thus, the pollution density was kept as the input data for this model. Each cell contains an amount of pollution density, which represents the cell state. The states change according to the transition rule.

Input data: Pollution density.

Transition rule: At every time step, to reach its new state at time t+1, each cell performs the following tasks in sequence:

- If the local density value is larger than zero, a randomly chosen amount is subtracted from it. That amount is transported in equal shares to the neighbours of the cell.
- Next, the cell receives the shares sent by its neighbours.
- Finally, the subtraction and the additions are applied to obtain the state used in the next step (time t+1).
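The transition rule above can be sketched in C as a single synchronous step over the whole cell network. This is a simplified illustration, not the thesis implementation: the `Links` type and `diffuse_step` are hypothetical names, links are assumed to be symmetric, and the randomly chosen amount is replaced by a fixed fraction `share` so that the result is reproducible.

```c
#define MAX_NEIGHBOURS 4

typedef struct {
    int n_neighbours;
    int neighbour[MAX_NEIGHBOURS];
} Links;

/* One synchronous diffusion step over n_cells cells.
   `now` holds the densities at time t, `next` receives those at t+1.
   `share` (between 0 and 1) is the fraction of its density that each
   cell gives away; the text draws it at random, here it is fixed
   (an assumption made for reproducibility). */
void diffuse_step(const float *now, float *next,
                  const Links *links, int n_cells, float share)
{
    for (int c = 0; c < n_cells; c++) {
        /* Amount leaving the cell, split equally among its neighbours. */
        float given = 0.0f;
        if (now[c] > 0.0f && links[c].n_neighbours > 0)
            given = now[c] * share;

        /* Amount arriving from each neighbour: that neighbour's outgoing
           amount divided by its own number of neighbours. */
        float received = 0.0f;
        for (int i = 0; i < links[c].n_neighbours; i++) {
            int nb = links[c].neighbour[i];
            if (now[nb] > 0.0f)
                received += now[nb] * share / (float)links[nb].n_neighbours;
        }

        next[c] = now[c] - given + received;
    }
}
```

Because every cell reads only `now` and writes only `next`, the total density is conserved from one step to the next when the links are symmetric. For a chain of three cells holding 0, 8, and 0 with `share` set to 0.5, one step yields 2, 4, and 2.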

2.3.2 The fire spread in the forest

This model simulates the spread of fire in the forest. It is reproduced from a sample in CORMAS [15]. Each cell has four possible states: tree, fire, ash, and empty. At the beginning, some cells are initialized with the state fire, while the others are tree.

Input data: Tree, fire, ash, and empty.

Transition rule:

- If a cell is tree at time t, it becomes fire at time t+1 if at least one of its neighbours is fire.
- If a cell is fire at time t, it becomes ash at time t+1.
- If a cell is ash at time t, it becomes empty at time t+1.

2.3.3 Wireless sensor network (WSN)

In this study, the WSN plays the role of the sensing component. It regularly collects raw data from the environment, processes that data, and raises an emergency alert when fire is detected.

The WSN monitors the status of the forest. To do that, a set of sensors is deployed along the forest border, because our concern is the spread of the fire to other systems. For this case, we give a simple way of deploying the sensors with a distributed algorithm, which is described in Section 2.4. A simple WSN obtained in this way is shown in Figure 2.4.

Figure 2.4: Deploying sensors along the forest border extracted from the study region with the 4 neighbour pattern. The communication range and the sensing range are 25 and 5 cells units, respectively.

Typically, sensors have two types of range. One indicates the sensing capacity of the sensor; this sensing range can be small. The other, the communication range, can be longer thanks to radio link technology. Thus, when deploying sensors, it is necessary to make sure that they remain connected to each other given the value of the communication range.

Input data: Sensing data.

Transition rule: At every step, the nodes check the data received from the forest fire simulation. If fire is detected at some points, signals are raised.

2.4 Routing algorithm

This section presents a routing algorithm implemented in parallel. To take advantage of GPU computation, a new version of this algorithm was implemented in Cuda, starting from an Occam program. The resulting routing table can be used for deploying sensors, as described in the previous section.

We assume that the network has the same shape and structure as the cell network introduced in Section 2.1.2. Generally, it consists of n nodes, numbered 0 to n-1; these numbers serve as their identities, as shown in Figure 2.5. Two elements are associated with each node: a route table and a temporary table. The route table stores the identity of the node itself and of the other nodes that it has reached after t steps. The structure of this table is presented in Table 2.3. The temporary table contains only the identities of the nodes newly reached at each step. This means that after each step, the values held by the temporary table are completely replaced by new ones, while the route table either receives new records or remains unchanged.

At each step, each node performs two main tasks: sending its local temporary table to its neighbours, and receiving temporary tables from them. These tasks are performed n-1 times, which ensures that nodes at the maximum possible distance are reached. The algorithm is as follows.

Algorithm (executed in parallel by every node):

  Initialization:
    Add the node's own id to the local temporary table and to the route table, with distance 0 and link index -1.
  For i = 1 to n-1:
    For each neighbour:
      Send the local temporary table to the neighbour.
      Receive a temporary table from the neighbour.
    Empty the local temporary table.
    For each id in the received temporary tables:
      If id does not exist in the route table:
        Add id, with i as distance and a link index, to the route table.
        Add id to the local temporary table.

Figure 2.5: A simple network.

Table 2.3 (one sub-table per node, Node 0 to Node 3, each with columns Known Id, Distance, Links): An example of route table at node 0 after 3 steps.

These tables show the information held by the nodes of the network. Each node knows which nodes it can reach and the distance to each of these destinations.

2.5 Remarks

This chapter presented a variety of subjects. The most notable is the concept of cell network, which plays an important role in developing physical models. In the next chapter, parallel computation is employed to simulate these models.
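The routing algorithm of Section 2.4 can be sketched sequentially in C by emulating the synchronous rounds with two sets of temporary tables. This is an illustrative sketch under assumptions, not the Cuda or Occam program referred to in the text: the network size `N`, the `Node` and `RouteTable` types, and `build_routes` are hypothetical, and message passing is replaced by direct reads of the neighbours' tables from the previous round.

```c
#include <string.h>

#define N 4            /* number of nodes (hypothetical example size) */
#define MAX_LINKS 4

typedef struct {
    int n_links;
    int link[MAX_LINKS];   /* ids of the neighbours, indexed by link */
} Node;

typedef struct {
    int known[N];          /* 1 if the destination id is in the table */
    int distance[N];       /* number of steps needed to reach it */
    int link_index[N];     /* link the id first arrived on, -1 for self */
} RouteTable;

/* Build the route table of every node in N-1 synchronous rounds. */
void build_routes(const Node *net, RouteTable *rt)
{
    /* temp[v][id] == 1 means id was newly learnt by node v last round. */
    int temp[N][N], next_temp[N][N];

    memset(temp, 0, sizeof temp);
    for (int v = 0; v < N; v++) {
        memset(&rt[v], 0, sizeof rt[v]);
        rt[v].known[v] = 1;
        rt[v].distance[v] = 0;
        rt[v].link_index[v] = -1;
        temp[v][v] = 1;
    }

    for (int step = 1; step <= N - 1; step++) {
        memset(next_temp, 0, sizeof next_temp);
        for (int v = 0; v < N; v++) {
            for (int l = 0; l < net[v].n_links; l++) {
                int nb = net[v].link[l];      /* "receive" nb's table */
                for (int id = 0; id < N; id++) {
                    if (temp[nb][id] && !rt[v].known[id]) {
                        rt[v].known[id] = 1;
                        rt[v].distance[id] = step;
                        rt[v].link_index[id] = l;
                        next_temp[v][id] = 1;
                    }
                }
            }
        }
        /* The old temporary tables are replaced by the new ones. */
        memcpy(temp, next_temp, sizeof temp);
    }
}
```

On a line of four nodes 0-1-2-3 (an assumed topology, since the network of Figure 2.5 is not reproduced here), node 0 learns node 3 at distance 3 through its only link, which is the behaviour the n-1 rounds are meant to guarantee.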

3 Simulations with Cuda programming model

This chapter describes the Cuda programming model and its applications. One goal is to show how well the cell network structure maps onto the GPU architecture. This mapping also makes it possible to handle both large cell networks and complicated behaviour. Performance tests are then conducted in different scenarios in order to assess the effectiveness of the approach.

3.1 GPU and Cuda programming model

3.1.1 Introduction to GPU

The Graphic Processing Unit (GPU) [5] is a massively multithreaded many-core chip, composed of hundreds of cores running thousands of threads. This gives it the capacity to process large amounts of data in parallel, so it is widely used for parallel computation. A simplified motherboard architecture is depicted in Figure 3.1. There are two parts: the left part for the CPU (host) and the right part for the GPU (device). They are connected by a PCI bus. On the CPU side, only the host memory is considered in this model. The GPU chip comes with a set of streaming multiprocessors (SM). Each SM consists of several scalar processors (SP), a set of registers, and a shared memory. This on-chip shared memory is visible to all threads executing on the SM. A global memory is shared by all SMs.

Figure 3.1: A simplified motherboard architecture.

3.1.2 Cuda programming model

Cuda (Compute Unified Device Architecture) was created by NVIDIA. It provides a parallel computing platform and programming model, and increases computing performance by harnessing the power of the GPU. Cuda provides a set of extensions to the C/C++ language for expressing parallel programs. The GPU runs thousands of threads handling multiple tasks, while a CPU runs a few threads optimized for sequential processing. Thus, a Cuda program typically consists of CPU code (host code) and one or more kernels (device code) running concurrently on the GPU. As shown in Figure 3.2, the compute-intensive portions of the application are sent to the GPU, while the remainder of the code still runs on the CPU. Kernels are executed by many threads, each with private local variables and access to shared memory. Thread blocks execute independently of one another, while the threads within a block can cooperate through shared memory and synchronize their execution. In addition, the CPU and the GPU each have their own separate memory and cannot directly access the memory of the other. Thus, data must be explicitly transferred between the two memories via the PCI bus.

Figure 3.2: Anatomy of a CUDA program.

3.2 Accelerating simulations by using Cuda

Programming with Cuda means programming a large number of threads that share memory and concurrently execute the same task. Therefore, this model is convenient whenever a large number of identical, repeated computations must be performed. In our case, each model owns a cell network, input data for each cell, and a transition rule common to all cells. In other words, each cell has local data and a global behaviour. Every cell must perform the same computation on its own data at each step in order to produce the new state of the system. It is thus simple to map each cell to one thread, which is responsible for processing that cell, as illustrated in Figure 3.3.

21 CHAPTER 3. SIMULATIONS WITH CUDA PROGRAMMING MODEL Figure 3.3: The mapping between the cell network structure and the GPU architecture. According to this model, data need to be moved on the global memory to share between threads. Figure 3.4 shows the data flow of physical simulations in term of CUDA programming. This can be summarized into some main steps: Initializing initial states (input data) for all network cells. Transferring data (cells states and network structure) to the GPU for computations. For each cycle, the new states of all nodes will be concurrently computed on the GPU. These states will updated with new values to prepare for the next cycle. Sending data back to to the CPU memory possibly to display and analyze the results. It is optional, if the result of each step is not considered for displaying and analyzing at run-time, these operations can be omitted. Figure 3.4: Data flow in the system. Obviously, if the phase of displaying and analyzing is ignored, the execution of simulation 17

22 CHAPTER 3. SIMULATIONS WITH CUDA PROGRAMMING MODEL mostly is run on device. Hence, it is believed that the benefit of performance in this case will be proportional to the size of cell networks. It becomes more worthwhile in the case of simulating phenomena, which often appear with large sizes and very complicated transition functions. Moreover, this proposition provides an opportunity to achieve computations and statistics in real time. This increasingly becomes important when the needs of predictions of many emergent cases increase, namely clouds of insects, flooding, tra c congestion, tsunami, fire. For those situations, the systems can directly access available data from the natural environment via observing systems. The simulations use input data to conduct useful information (directions of clouds of insects or the level of flood at a certain time in the future, for example). 3.3 Details of GPU implementation of simulations In this section, the details of GPU implementations of three main simulations will be presented: pollution di usion, forest fire and wireless sensor network. All of them are developed by C programming language in accordance with Cuda model. These implementations are resulted from the analysis in the previous section. The formal presentations of implementations are described as the following. Host program implemented on the CPU (1) Initializing the initial values for all cells. (2) Copying the cell network structure and data from the CPU host memory to the GPU device memory and launching the kernel. Kernel program implemented on the GPU (3) Looping each cycle. (4) Computing the new states for each cell. (5) Updating new states to each cell. (6) Reading back results to the CPU and output the results (once for each time step or more). Apparently, the execution runs mostly on GPU (from (3) to (5)). Others do not much a ect to global performance if line (6) is not considered. Then, line (1) is executed once and line (2) is run twice. 
For the purpose of comparison, the execution time on the CPU can therefore be omitted. In the next section, some initial measurements are performed to evaluate the effectiveness

of using the massively parallel GPU architecture to accelerate the computation of phenomena simulations.

3.4 Performance measurement principles

In order to validate the performance of the proposed methodology, a few measurement tests were performed. The simulation of pollution diffusion in a river was chosen as the test case. The pollution diffusion model is the one presented in Chapter 2. The implementation of the transition function is presented in Listing 3.1. Two data structures are used: the NodeState structure contains the states of cells, and the Canaux structure holds the links to neighbours.

Listing 3.1: Transition function

    __device__ NodeState computestate(NodeState *nowstate, int nodeindex, Canaux *channels)
    {
        NodeState mystate;
        int nbin, nodein;
        float receive;

        /// Getting pollution density of the cell
        mystate = nowstate[nodeindex];
        /// Getting number of neighbours of the cell
        nbin = channels[nodeindex].nbin;
        receive = 0;
        for (int i = 0; i < nbin; i++) {
            /// Getting id of the neighbours
            nodein = channels[nodeindex].read[i].node;
            receive = receive + ((nowstate[nodein].density / 2.0) / (float) channels[nodein].nbin);
        }
        /// Computing the new state
        mystate.density = (mystate.density / 2.0) + receive;
        return mystate;
    }

We have tested and evaluated the computational efficiency in various studies. The focus of these tests is to show how much the GPU speeds up the simulations compared to the CPU. Therefore, the time for transferring data between the CPU and the GPU is omitted in most cases. The execution time of the simulation on the host is also ignored, since most of the computation is moved to the device. As mentioned earlier, the simulation execution costs depend on two main components: the cell network (its size and the type of CA pattern chosen) and the complexity of the transition rules. Thus, several aspects related to these components are examined.
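As a sanity check on the rule in Listing 3.1, the same arithmetic can be written as a CPU reference. The sketch below assumes (this is not stated in the listing) that links are symmetric, that is, each cell appears in the neighbour list of every cell it reads from, and that every cell has at least one neighbour. Under those assumptions each cell keeps half of its density and hands the other half out in equal shares, so the total density is conserved. The names `Channel`, `diffuse_cell` and `total_density` are illustrative.

```c
#define MAX_NEIGHBOURS 8

typedef struct {
    int nb_in;                 /* number of neighbours */
    int read[MAX_NEIGHBOURS];  /* indices of the neighbours */
} Channel;

/* New density of cell i: half of its own density plus an equal share of
 * the half that each neighbour gives away. */
float diffuse_cell(const float *density, const Channel *ch, int i)
{
    float receive = 0.0f;
    for (int k = 0; k < ch[i].nb_in; k++) {
        int j = ch[i].read[k];
        receive += (density[j] / 2.0f) / (float)ch[j].nb_in;
    }
    return density[i] / 2.0f + receive;
}

/* Sum of all densities, used to check conservation. */
float total_density(const float *density, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += density[i];
    return sum;
}
```

On a line of three cells with all the density in the first one, (8, 0, 0), one step gives (4, 4, 0): the first cell keeps 4 and its only neighbour receives the other 4, so the total stays at 8.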
All tests were run on a PC with the hardware configuration shown in Table 3.1. Information about the graphics device is presented in Table 3.2 (for more details, see [11]). We used the profiling tool nvprof [17] to estimate the GPU computation time and the standard library time.h for the CPU time.

    Intel(R) Xeon(R) CPU E GHz
    Num. CPUs         8
    Num. Cores/CPU    4
    Architecture      i
    RAM               GB

Table 3.1: Technical data of PC used.

    GeForce GTX 680
    Num. cores                             1536
    Maximum number of threads per block    1024
    Global memory                          4 GB

Table 3.2: Technical data of NVidia graphics card used.

The first scenario: The computation time of the CPU and the GPU was compared. All tests follow the model of river pollution diffusion (Section 2.3.1) with the pattern of 8 neighbours and 1,000 cycles for each test. The data transport time was considered in this case study.

The computation on both the CPU and the GPU is influenced by the size of the cell network (number of cells), but not by the size of the cells. Since the cell is the basic element of a cell network, the computations are independent of the number of pixels in a cell. For the same studied region, a smaller cell size yields a larger cell network, and a bigger cell size yields a smaller one. Thus, cell sizes were disregarded in the performance tests.

Table 3.3 shows the execution times of the pollution diffusion model on the CPU and the GPU with 1,000 cycles. The network sizes range from 1,220 to 83,661 cells. The number of cells influences the performance of both the CPU and the GPU. On the CPU, the upward trend is very noticeable. The steep increase runs from 10,703 to 83,661 cells at a rate of 0.26 s per 1,000 cells. This trend is expected to continue for larger sizes. On the GPU, by contrast, the increase is not dramatic: the time rises gradually between 1,220 and 83,661 cells at a rate of 0.01 s per 1,000 cells. Table 3.3 shows that the GPU is overwhelmingly faster than the CPU. The gap becomes increasingly significant as the number of cells rises.
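The CPU-side measurement and the figures derived from it can be sketched as follows. This is an assumption about the measurement procedure, which the text does not detail: `clock()` from `time.h` measures the processor time of a workload, `speedup` is the ratio of CPU to GPU time, and `rate_per_1000_cells` is the slope between two measurements. All function names are hypothetical.

```c
#include <time.h>

/* Processor time, in seconds, consumed by one call of work(arg). */
double cpu_seconds(void (*work)(void *), void *arg)
{
    clock_t start = clock();
    work(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* How many times faster the GPU run is than the CPU run. */
double speedup(double cpu_time, double gpu_time)
{
    return cpu_time / gpu_time;
}

/* Average growth, in seconds per 1,000 additional cells, between two runs. */
double rate_per_1000_cells(double t_small, long cells_small,
                           double t_large, long cells_large)
{
    return (t_large - t_small) / ((double)(cells_large - cells_small) / 1000.0);
}
```

Note that `clock()` reports processor time rather than wall-clock time, which suits a single-threaded CPU run but would not be appropriate for timing asynchronous GPU work from the host.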
Figure 3.5 plots both curves. For a cell network of 83,661 cells, the GPU is approximately 22 times faster

than the CPU. The use of the GPU is therefore vital in the case of very large systems.

                                      Time (seconds) / 1,000 cycles
    Num. cells    Cell size (Pixel)   CPU    GPU
    1,220         10x
    10,703        5x
    ,425          2x
    83,661        2x

Table 3.3: The computation comparison between the CPU and the GPU in the case of pollution diffusion model.

Figure 3.5: Demonstrating the accelerating time of using the GPU for physical simulation.

Figure 3.6 shows an example of a physical simulation on the GPU. The cell network of a river is generated by the PickCell tool using the four-neighbour pattern. The pollution diffusion model is the one from Section 2.2. Initially, two polluted points are randomly created in the river. These points hold an amount of pollution density as their state data. At every step, the system states change according to the transition function.

Figure 3.6: Illustrating a simulation of diffusing pollution in a river following the model described in Section 2.2. It is initialized with two polluted points (black points).

The second scenario: Different sizes of cell networks are still taken into account. The two popular CA patterns (Von Neumann 1 and Moore 1) and different numbers of cycles are considered as well. The same model as in the previous case is used. The results are presented in Table 3.4. One of these runs is shown in Figure 3.6. The values in Table 3.4 indicate that increasing the number of cycles does not much affect the execution time. A likely explanation is that the transition function is too simple to generate major differences.
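For reference, the two patterns differ only in which offsets around a cell count as neighbours: Von Neumann 1 uses the four cells sharing an edge, and Moore 1 adds the four diagonal ones. The sketch below builds the neighbour list of a cell on a regular grid. The grid layout and the function name `grid_neighbours` are assumptions for illustration; the thesis generates its networks with PickCell.

```c
/* Offsets for radius-1 neighbourhoods: the first four entries form the
 * Von Neumann pattern, all eight form the Moore pattern. */
static const int DX[8] = { 0, 1, 0, -1, 1, 1, -1, -1 };
static const int DY[8] = { -1, 0, 1, 0, -1, 1, 1, -1 };

/* Writes the row-major indices of the neighbours of cell (x, y) on a
 * width x height grid into out (room for 8 entries) and returns how many
 * were found. A non-zero `moore` selects Moore 1, zero selects Von Neumann 1.
 * Cells outside the grid are skipped, so border cells have fewer neighbours. */
int grid_neighbours(int x, int y, int width, int height, int moore, int *out)
{
    int limit = moore ? 8 : 4;
    int count = 0;
    for (int k = 0; k < limit; k++) {
        int nx = x + DX[k];
        int ny = y + DY[k];
        if (nx >= 0 && nx < width && ny >= 0 && ny < height)
            out[count++] = ny * width + nx;
    }
    return count;
}
```

An interior cell has 4 neighbours under Von Neumann 1 and 8 under Moore 1, so the Moore rule reads twice as many neighbour states per update. This is one plausible reason for a timing gap between the two patterns on large networks.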

                                                     Time (seconds) / Num. cycles
    Num. cells    Cell size (Pixel)    CA Pattern    100    1,000    10,000    100,000    1,000,000
    1,220         10x10                VN
    1,220         10x10                Moore
    10,703        5x5                  VN
    10,703        5x5                  Moore
    ,425          2x2                  VN
    ,425          2x2                  Moore
    83,661        2x2                  VN
    83,661        2x2                  Moore

Table 3.4: Measurements results.

Regarding CA patterns, for small networks the differences between Von Neumann 1 and Moore 1 are not very remarkable. For larger networks, however, Von Neumann 1 is significantly faster. For example, with 10,000 cycles and a network of 83,661 cells, Von Neumann 1 takes just 8.948 s, and Moore 1 is about 1.6 times slower, as shown in Figure 3.7.

Figure 3.7: The graph displays the increase of the gap between two CA patterns with 10,000 cycles.

The third scenario: This scenario aims to show that the execution time also depends on the transition function. To do that, the previous version was slightly modified: at every step, each cell loses a random amount of its pollution density. The implementation is shown below.

Listing 3.2: Transition function (version 2)

    __device__ NodeState computestate(NodeState *nowstate, int nodeindex, Canaux *channels, curandState *devstates)
    {
        NodeState mystate;
        float losspercentage, receive, loss;
        int nbin, nodein;

        mystate = nowstate[nodeindex];
        /// Generating a random value in [ ] by generatenumber function.
        losspercentage = generatenumber(devstates, nodeindex);
        /// Calculating an amount of loss.
        loss = losspercentage * mystate.density;
        /// Getting number of neighbour
        nbin = channels[nodeindex].nbin;
        receive = 0;
        for (int i = 0; i < nbin; i++) {
            /// Getting id of the neighbour
            nodein = channels[nodeindex].read[i].node;
            receive = receive + ((nowstate[nodein].density / 2.0) / (float) channels[nodein].nbin);
        }
        /// Computing the new state
        mystate.density = (mystate.density / 2.0) + receive - loss;
        if (mystate.density < 0.0) {
            mystate.density = 0.0;
        }
        return mystate;
    }

Figure 3.8 demonstrates the influence of the transition rules on execution time in this approach. Version 2 is slower than version 1 because of its more complex behaviour. The additional time grows steadily with the size of the network.

Figure 3.8: Comparing the execution time between previous transition function (version 1) and the new one (version 2).
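The extra work in version 2 is the per-cell loss. A CPU sketch of just that step is given below, assuming the random fraction has already been drawn. `apply_loss` is a hypothetical helper, and the random source (provided by `generatenumber` in Listing 3.2) is deliberately left out.

```c
/* Applies the version-2 loss to one cell.
 * `density` is the current density of the cell, `received` is the amount
 * coming from its neighbours, and `loss_fraction` is the random fraction of
 * the current density that is lost in this step. */
float apply_loss(float density, float received, float loss_fraction)
{
    float loss = loss_fraction * density;
    float next = density / 2.0f + received - loss;
    return next < 0.0f ? 0.0f : next;   /* a density cannot be negative */
}
```

Because the cell keeps only half of its density before the loss is subtracted, a loss fraction above one half with nothing received would give a negative value, which is why the clamp to zero is needed.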

4 Distributed simulation with HLA

Simulations of large systems often face performance issues. The CUDA programming model can deal with those. However, the lack of interoperability between simulations poses a major challenge. Thus, the High Level Architecture (HLA) standard [12], [13], [20] is proposed as a solution to that new demand. With this standard, many sub-simulations can be distributed instead of developing one vast simulation. Combining the CUDA model with the HLA leads to a hybrid solution in which several parallel simulations can be distributed over different computer systems. This chapter gives a brief description of the application of HLA to parallel simulations.

4.1 Overview of The High Level Architecture (HLA)

The High Level Architecture (HLA) [12], [13], [20] is a standard for distributed simulations whose main goal is to support interoperability and reusability of simulations. The HLA was developed by the United States Department of Defense (DoD) to facilitate the integration of distributed simulation models within an HLA environment. It allows a large-scale model to be divided into a number of manageable components while maintaining interaction between them. Over recent years, the HLA has been deployed in a wide range of simulation application areas, including transportation and the manufacturing industry. However, it hardly appears in simulations of natural phenomena, especially in the climate change area. The HLA is thus suggested in this project as a potential approach for composing parallel simulations.

Figure 4.1: HLA Federation.

In HLA terminology, the entire system is represented by a federation. Each simulator belonging to the federation is called a federate. A set of federates is connected via the Run Time Infrastructure (RTI). These federates can run on different platforms and be connected by a network. In that case, the RTI can be viewed as a distributed operating system that interconnects cooperating federates. Figure 4.1 describes the global architecture of an HLA simulation. Generally, the HLA specification defines:

- A set of rules: This describes the responsibilities of federates and their relationship with the RTI. There are ten rules. One of them is that all exchange of data among federates should occur via the RTI during a federation execution.
- An interface specification: The interface specification prescribes the interface between each federate and the Run Time Infrastructure (RTI), which provides communication services to the federates. The interface specification is divided into several main management areas:
  - Federation management: This includes main tasks such as creating federations, joining federates to federations, resigning federates from federations, and destroying federations.
  - Declaration management: This allows federates to publish and subscribe to class attributes and interactions through the RTI. Other federates can only subscribe to an attribute or an interaction once it has been published by the federate owning it.
  - Object management: This includes the tasks of creating objects and sending their updates to other federates.
  - Ownership management: The RTI allows federates to distribute the responsibility for updating and deleting object instances, with a few restrictions.
  - Time management: This covers the implementation of time management policies and the negotiation of time advances. This mechanism makes it possible to run several simulations concurrently.

- An Object Model Template (based on the OMT standard [14]): This component defines how information is communicated between federates, and how the federates and the federation have to be documented (using the Federation Object Model, FOM). The FOM defines the shared objects, attributes, and interactions for the whole federation. Two kinds of elements can be exchanged between federates:
  - An object is an entity that represents an actor playing a part in the simulation. It contains shared data that are created by a federate during the federation execution and persist until the object is destroyed. The FOM defines all object classes; an example is presented in Table 4.2. When a federate wants to publish or subscribe to an object, it must define that object compatibly in its FOM. Objects store their data in attributes.
  - An interaction is a broadcast message that any federate can send or receive. A publishing federate sends out an interaction to the federates that have subscribed to it. If no subscribing federate receives the interaction, the data it carries are lost. The FOM also defines all interaction classes. When a federate wants to publish or subscribe to an interaction, it must define that interaction compatibly in its FOM. Interactions carry data in parameters.

Figure 4.2: Illustrating a high level of the interplay between a federate and a federation.

Figure 4.2 depicts the interplay between a federate and a federation. Initially, a federate tries to create a federation, or to connect to an existing one on the RTI. It then specifies what data will be shared with other federates by using the publishing services. These published objects or interactions are available to all federates that are also connected to the same

federation. When a federate wants to send data to other federates, it has to register objects and call an update service. The data are then automatically reflected to subscribers by the RTI. Releasing allocated resources is always necessary at the end.

4.2 Time management in HLA

The RTI provides a variety of optional time management services. Understanding time management is important for managing the exchange of events between federates. Each federate manages its own logical time and communicates this time to the RTI. The RTI ensures correct coordination of federates by advancing time coherently. In the discrete event simulation literature, logical time is equivalent to simulation time. It is used to make sure that federates observe events in the same order [19]. It helps to avoid problems such as causality violations, or different results from repeated executions of the simulation with the same input data. Logical time is not mapped to real time.

4.2.1 Time policies

According to the HLA time policies, each federate is involved in the progress of time. In some cases, it is necessary to map the progress of one federate to the progress of another. A federate needs to request a regulation policy to participate in the decision on the progress of time. A constrained federate follows the time progress imposed by other federates. In our approach, the logical times of the different federates must be synchronized. Thus, the regulating and constrained policies are both enabled, as shown in Table 4.3. This enables the participating federates to exchange data with each other.

4.2.2 Time progress

The second part of the time management component provides a mechanism to advance simulation time within each federate. There are two particular services that federates can invoke to request time advancement from the RTI. The timeAdvanceRequest service is used to implement time-stepped federates; the nextEventRequest service is used to implement event-based federates. The granted time is given by the timeAdvanceGrant service. Generally, a time management cycle consists of three steps. First, a federate sends a request for time advancement. Next, the federate can receive ReflectAttributeValues callbacks. Finally, the RTI completes the cycle by invoking a federate-defined procedure called timeAdvanceGrant to indicate that the federate's logical time has been advanced.
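The request/grant cycle can be illustrated with a toy coordinator. This is not the RTI API: `Federate`, `can_grant` and `grant_pending` are hypothetical names, all federates are assumed to be time-stepped and both regulating and constrained, and lookahead is ignored. In this simplified model, a request is granted only when every other federate has at least requested the same time, so that no earlier event can still arrive.

```c
typedef struct {
    double logical_time;    /* last granted time */
    double requested_time;  /* pending request, or logical_time if none */
} Federate;

/* The request of federate `self` is safe when every other federate has
 * already asked to advance to at least the same time. */
int can_grant(const Federate *feds, int n, int self)
{
    double t = feds[self].requested_time;
    for (int i = 0; i < n; i++)
        if (i != self && feds[i].requested_time < t)
            return 0;
    return 1;
}

/* Grants every pending request that is safe; returns the number of grants. */
int grant_pending(Federate *feds, int n)
{
    int granted = 0;
    for (int i = 0; i < n; i++) {
        if (feds[i].requested_time > feds[i].logical_time &&
            can_grant(feds, n, i)) {
            feds[i].logical_time = feds[i].requested_time;
            granted++;
        }
    }
    return granted;
}
```

With two federates at time 0, a request by the first to advance to time 1 stays pending until the second also requests time 1; both are then granted in the same pass, which is the lockstep behaviour expected from time-stepped federates.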

Figure 4.3: The model of time advancement request used in this project.

4.2.3 Time synchronization

As presented in the previous sections, all simulations have to synchronize their local logical times to ensure causality. The regulating parameter and the constrained parameter are enabled for all simulations. The former enables federates to send updates and interactions in causal order. The latter, on the other hand, enables federates to receive those updates and interactions from the RTI. Since a passive visualization federate does not send any updates or interactions, it has no impact on the time advance of the federation. Therefore, only the constrained parameter needs to be enabled, and regulating can be switched off in the case of visualization. Table 4.1 shows the time policies proposed for the case study (Section 2.3).

    Federate         Time constrained    Time regulating    Time advance
    Forest           Yes                 Yes                Time stepped
    River            Yes                 Yes                Time stepped
    WSN              Yes                 Yes                Time stepped
    Visualization    Yes                 Yes/No             Time stepped

Table 4.1: Time management of the federation.

To synchronize the activities of several federates participating in a federation, the RTI provides a mechanism for exchanging data between them. In this case, times are associated with the exchanged data in order to coordinate federate activities. The RTI also allows federates to communicate explicit synchronization points. Figure 4.4 illustrates the synchronization process between two federates, the river federate and the forest federate.

Figure 4.4: Federate synchronization

First of all, one of the available federates, in this case the river federate, sends a synchronization request to the RTI. The RTI then sends the response to the river federate and afterwards announces the synchronization point to the other federates. The federates use a dedicated service to confirm that they have reached the synchronization point. In the next part, some issues related to exchanging data are considered in the context of distributed simulations.

4.2.4 Exchanging data

Exchanging data between simulation federates is an important part of distributed systems. Two questions arise in this case: what kind of data must be shared, and where the communication takes place. The type of data exchanged is determined by the characteristics of the real systems as well as by the interoperability between them. In our case, four federates communicate: forest, river, WSN, and visualization. The forest federate transports its status to the river federate and the WSN federate. Meanwhile, the three federates need to provide their data to the visualization federate for analyzing the results. To achieve this, the forest federate publishes its data (forest status and position) as an object class (ForestNode). The river federate and the WSN federate need to subscribe to it. In the same way, the river federate and the WSN federate publish the object classes RiverNode and WSNNode, respectively. These published classes have to be declared in the FOM of the federation, whose structure is indicated in the table below.

    Object Class    Attributes                     Published by    Subscribed by
    ForestNode      State, Position                ForestNode      River, WSN, Visualization
    RiverNode       Pollution density, Position    RiverNode       Visualization
    WSNNode         State, Position                WSNNode         Visualization

Table 4.2: Objects and their attributes, publishers and subscribers.

In some cases, it is also important to specify where data are exchanged between two federates, especially for physical systems of very large size. Indeed, it is often inefficient to send the entire data set via the RTI because of network performance and local computation costs. Thus, we propose a solution for the general case of exchanging data between two adjacent systems. Two physical systems are adjacent when they have a common frontier or some places in common. Sending unrelated information to other federates is useless. For example, as shown in Figure 2.3, new polluted points in the river caused by the ashes of a forest fire only appear at the frontier of the two systems. The forest fire federate regularly sends its states to the river federate. The latter only needs the states of the points close to it, not the entire forest state. Sending the entire state would not only take time to transport between federates, but would also make the computation at the receiver side less efficient. A solution based on morphology theory [21] can be used to address this issue. It smooths the boundary of physical systems by applying basic operations such as erosion and dilation. To summarize, only the status data at the boundary of the forest are sent to the RTI.

4.3 Distributed physical simulation

This section presents an application of the HLA standard to unify several parallel simulations into what is called a mixed simulation. The study region is the one shown in Figure 2.3. The whole model was split into three simulation federates: forest fire spread, river pollution diffusion, and WSN.
The simulation federates were all implemented in accordance with both the CUDA programming model and the HLA standard. These parallel simulations are executed concurrently as three different simulators. Their models were presented in Section 2.2. In addition to these three federates, a fourth one, visualization, is designed as a supporting federate. An overview of the federation is given in the following figure.

Figure 4.5: A structure for a proposed federation.

To recall the communication proposed in Section 2.3, the forest fire spread produces ashes, which result in new polluted points and dust for the river pollution diffusion at time t. The latter includes these new data in its model at time t+1. The communication depends on a specific condition. There is also communication between the WSN and the forest fire spread: the sensors regularly collect the forest status, as this is the goal of sensing. New information is sent to observers. When a fire is detected, the observers raise emergency signals. To do that, the synchronization needs to be achieved as indicated in Table 4.1, and the shared data have to be declared as shown in Table 4.2. The FOM of the federation is given in the cyber.fed file shown in Listing 4.1.

Listing 4.1: cyber.fed file

    ;; Cyber physical simulation
    (Fed
      (Federation Cyber)
      (FedVersion v1.0)
      (Federate river Public)
      (Federate forest Public)
      (Federate wsn Public)
      (Federate visualization Public)
      (Objects
        (Class ObjectRoot
          (Attribute privilegeToDelete reliable timestamp)
          (Class RTIprivate)
          (Class ForestNode
            (Attribute PositionX RELIABLE TIMESTAMP)
            (Attribute PositionY RELIABLE TIMESTAMP)
            (Attribute State RELIABLE TIMESTAMP)
          )
          (Class RiverNode
            (Attribute PositionX RELIABLE TIMESTAMP)
            (Attribute PositionY RELIABLE TIMESTAMP)
            (Attribute Density RELIABLE TIMESTAMP)
          )
          (Class SensorNode
            (Attribute PositionX RELIABLE TIMESTAMP)
            (Attribute PositionY RELIABLE TIMESTAMP)
            (Attribute State RELIABLE TIMESTAMP)
          )
        )
      )
    )

4.3.1 Forest fire spread federate

The model of this simulation federate was presented in Chapter 2. In it, some fires (red points) are randomly initialized in the forest. These fires spread according to the transition function and the CA pattern of the model. An example of the spreading is shown in Figure 4.6. The green, red, grey, and white points represent the tree, fire, ash, and empty states, respectively.

Figure 4.6: An example of simulating of fire spread in the forest. The pattern of 4 neighbours is used. The red color represents burning trees and the gray color indicates ashes formed by the fire.

The ashes are formed after some steps. These ashes are able to pollute the river, as the results from the visualization federate show.

4.3.2 River pollution diffusion federate

The model of pollution diffusion in the river was also presented in Chapter 2. Initially, some polluted points are randomly generated in the river. During the diffusion, the river federate continually receives status data about the fire from the forest federate via the RTI. It checks the data to determine whether the ashes will pollute some river cells or not. This depends on a specific condition. For each river cell, if the distance to an ash cell is less than or equal to a specified

threshold, the pollution density of the river cell will decrease in inverse proportion to the distance. The RTI sends the data to the river federate only when it receives an update call from the forest federate, and the update call is issued only when ashes appear within the scope of the forest boundary.

Figure 4.7: A result obtained from the visualization federate. It demonstrates the data exchange between the two simulations via the RTI. The two regions marked with red circles represent the new pollution created by the ashes, which are formed by the forest fire after 4 steps.

4.3.3 WSN federate

The model of the WSN was also introduced in Chapter 2. At every time step, the nodes receive the data from the forest federate via the RTI and only consider the cells within their sensing range. If a node detects a fire, it forwards that information to an observer for decision making. Signals are raised as soon as the fire is recognized. As depicted in Figure 4.8, the red rings indicate that fires have been detected at those sensors.

4.3.4 Visualization federate

The viewer federate is based on 2D visualization with the X Window System. As mentioned above, it first subscribes to all necessary data published by the other federates. The aim is to provide an overview of the results, as shown in Figure 4.8. Initially, the background of the viewer is drawn from visible data extracted from the PickCell tool. During the federation execution, this federate receives data from the others and updates the view at every step.
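The sensing-range test described for the WSN federate can be sketched as a distance check between a sensor and the burning cells it receives. The structure `ForestCell`, the state codes in `ForestState`, and the function `sensor_detects_fire` are hypothetical; the thesis does not list the actual data layout.

```c
/* The four forest states named in the text; the numeric codes are arbitrary. */
typedef enum { EMPTY, TREE, FIRE, ASH } ForestState;

typedef struct {
    float x, y;         /* position of the forest cell */
    ForestState state;
} ForestCell;

/* Returns 1 if any burning cell lies within `range` of the sensor located
 * at (sx, sy), and 0 otherwise. Squared distances are compared so that no
 * square root is needed. */
int sensor_detects_fire(float sx, float sy, float range,
                        const ForestCell *cells, int n)
{
    float r2 = range * range;
    for (int i = 0; i < n; i++) {
        float dx = cells[i].x - sx;
        float dy = cells[i].y - sy;
        if (cells[i].state == FIRE && dx * dx + dy * dy <= r2)
            return 1;
    }
    return 0;
}
```

A sensor for which this check succeeds would then forward the detection to its observer, which corresponds to the red rings drawn by the visualization federate.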

4.3.5 A case study

This section describes one run of the federation. Initially, one federate creates a federation on the RTI and waits for the other federates to participate. Each of the other federates connects to that federation and also waits until the last one has joined. The first federate then sends a request to the others to achieve a synchronization point. Once the other federates have responded, the synchronization point is achieved and all federates run with the same time progress. At each time step, the federates exchange data with each other via the RTI. Figure 4.8 presents the results captured from the visualization federate.

Figure 4.8: Illustrating an interoperability between the four federates via the RTI.

4.3.6 Simulation tools

Along with the PickCell tool, which is developed at the Lab-STICC laboratory, the open-source software CERTI [18] was used in this project. The CERTI RTI supports the HLA 1.3 specification (C++ and Java). The X Window System was used to display the results of the simulation federates.
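The start-up sequence of the case study amounts to a barrier: the federation is synchronized once every joined federate has confirmed that it reached the announced point. The toy model below illustrates that bookkeeping only. It does not use the CERTI API, and `SyncPoint`, `sync_announce`, `sync_achieved` and `sync_is_complete` are made-up names.

```c
#define MAX_FEDERATES 16  /* arbitrary capacity for this sketch */

typedef struct {
    int n_federates;               /* federates joined when the point is announced */
    int achieved[MAX_FEDERATES];   /* 1 once a federate has confirmed */
} SyncPoint;

/* Announces a synchronization point to n joined federates. */
void sync_announce(SyncPoint *sp, int n)
{
    if (n > MAX_FEDERATES)
        n = MAX_FEDERATES;
    sp->n_federates = n;
    for (int i = 0; i < n; i++)
        sp->achieved[i] = 0;
}

/* Records that federate `id` has reached the point; unknown ids are ignored. */
void sync_achieved(SyncPoint *sp, int id)
{
    if (id >= 0 && id < sp->n_federates)
        sp->achieved[id] = 1;
}

/* The federation is synchronized when all federates have confirmed. */
int sync_is_complete(const SyncPoint *sp)
{
    for (int i = 0; i < sp->n_federates; i++)
        if (!sp->achieved[i])
            return 0;
    return 1;
}
```

For the four federates of the case study, the point is complete only after the forest, river, WSN and visualization federates have all confirmed; until then, none of them starts advancing time.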
