Embedded SCI Solutions
SCI Reflective Memory (Experimental)

Atle Vesterkjær
Dolphin Interconnect Solutions AS
Olaf Helsets vei 6, N-0621 Oslo, Norway
Phone: (47) 23 16 71 42
Fax: (47) 23 16 71 80
Mail: atleve@dolphinics.no

Introduction

This presentation aims to give you an idea of how SCI can be used for embedded / realtime solutions.

SCI Reflective Memory is a software Reflective Memory solution. It is a library that you can use to build Reflective Memory applications without having to consider the low-level implementation of SCI.

SCI Reflective Memory

[Figure: software stack for Reflective Memory]
- Application specific code built in Reflective Memory shell
- SISCI library
- SISCI Driver
- IRM Driver

Contents

- Introduction to Reflective Memory
- Dolphin's HW and SW used in building SCI Reflective Memory
- SCI Reflective Memory technical description, features and benefits

SCI Reflective Memory Lab 1600-1730

Test and evaluation of SCI Reflective Memory demo programs. The exercises are found in your lab manual (one sheet).

Reflective Memory

Reflective Memory systems are a solution to problems raised by message passing in multicomputer environments.

Reflective Memory systems belong to the class of distributed shared memory (DSM) systems.

Reflective Memory

[Figure: three nodes, each with a private memory module and a Reflective Memory part]

Each system processor includes a dual-ported local physical memory. A part of that memory is configured as logically shared. The Reflective Memory is composed of all these physically distributed, logically shared memory parts mapped into a global (shared) address space: the Reflective Memory Space.

Reflective Memory

The main idea of Reflective Memory is that if a shared data item might be reused, an accurate copy of it should be kept in each processor's local memory.

Reflective Memory

- Read operations are performed on local memory.
- Write operations generate automatic updates of all system copies by a broadcast transaction.

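The read and write behaviour described here can be pictured with a small single-process sketch. It is only an illustration: the node count, region size and the names `rm_copy`, `rm_write` and `rm_read` are hypothetical, and a plain loop over arrays stands in for the broadcast transaction on the interconnect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RM_NODES 3      /* hypothetical node count */
#define RM_SIZE  256    /* hypothetical size of the shared region in bytes */

/* One copy of the Reflective Memory Space per node. */
static unsigned char rm_copy[RM_NODES][RM_SIZE];

/* Write: update every node's copy (stands in for the broadcast transaction). */
void rm_write(size_t offset, const void *src, size_t size)
{
    assert(offset <= RM_SIZE && size <= RM_SIZE - offset);
    for (int node = 0; node < RM_NODES; node++)
        memcpy(&rm_copy[node][offset], src, size);
}

/* Read: served entirely from the reading node's own copy. */
void rm_read(int node, size_t offset, void *dst, size_t size)
{
    assert(node >= 0 && node < RM_NODES);
    assert(offset <= RM_SIZE && size <= RM_SIZE - offset);
    memcpy(dst, &rm_copy[node][offset], size);
}
```

After one `rm_write`, an `rm_read` on any node returns the same data without touching another node's memory.
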
Advantages and disadvantages of Reflective Memory systems compared to other DSM systems

Advantages:
- Computation typically overlaps with communication.
- Memory access time is usually constant and thus deterministic.
- Because of their inherent replication they are good for fault tolerance.
- They are simpler, and have been commercially implemented for decades.
- Read operations are fast.

Disadvantages:
- For applications characterized by longer sequences of writes to the same segments, RM systems may produce unnecessary traffic.
- The interconnection medium usually represents a bottleneck due to the many data transfers.
- Processes that write to the same shared memory location must be explicitly synchronized.

Reflective Memory applications

- Aircraft, Ship and Submarine Simulators
- Automated Testing Systems
- Industrial Automation
- High-Speed Data Acquisition

Reflective Memory features

- Reflective Memory updates can occur on any type of interconnect.
- Reflective Memory systems can use any type of topology.
- Reflective Memory systems are not limited by any particular memory consistency model.
- The shared memory regions can be mapped either dynamically or statically.

Typical Reflective Memory features

- Automatic updates of remote shared memory copies
- Data filtering: maybe not every temporarily stored variable has to be reflected?
- Reflective Memory consistency: the shared region can only be accessed by one party at a time.
- Only shared writes are propagated through the system

Typical Reflective Memory features

- One-to-all broadcast communication (hardware based)
- Computation overlaps with communication
- Hardware support for heterogeneous computing could significantly improve system usability.
- Explicit synchronization (hardware based): hardware support for synchronization increases performance.

Why SCI Reflective Memory?

- Reflective Memory is a DSM architecture, like SCI, only organized in another way.
- Reflective Memory could easily be implemented in Dolphin's HW and SW.
- SCI systems have good fault tolerance and redundancy characteristics.
- Competitive performance ratio for Dolphin's SCI products (we will get back to this later).

SCI Reflective Memory

SCI Reflective Memory is a software reflective memory solution based on Dolphin's adapter cards and software.

SCI Reflective Memory is a SISCI programming shell that programmers can use to write application specific code for their Reflective Memory application.

PMC/PCI SCI-64 Adapter Card

SCI Reflective Memory is a SISCI based SCI solution and can be used with all Dolphin products that support SISCI.

Adapter Cards
- D307 - SBus
- D310 - PCI32
- D314 - PMC32
- D320 - PCI64
- D323 - PMC64
- D330 - PCI 66

Switches
- D505 - 4 way (SBus)
- D512 - 4 way (PCI)
- D515 - 4 to 16 way (PCI)
- D525 - 8 way switch

Programming Interface

[Figure: software stack with the application shown as a performance tool on top of SISCI library, SISCI Driver and IRM Driver]

Layers, from application to hardware:
- Application (i.e. C-style)
- SISCI API
- SISCI driver
- IRM driver
- Hardware abstraction layer (PAL)
- PCI-SCI adapter card

SISCI features

- Access to High Performance HW
- Highly Portable
- Cross Platform / Cross Operating System interoperable
- Simplified SCI Programming
- Flexible
- Reliable Data transfers
- Hostbridge / Adapter Optimization in libraries

SCI Reflective Memory: General

[Figure: protocol stack showing SISCI API, SISCI, IRM, Socket API, TCP/UDP, IP, DLPI/NDIS and GENIF across User Space and Kernel Space; reflective memory is marked as a hardware implementation and as a software implementation]

The first demo of SCI Reflective Memory is implemented for a two node reflective memory configuration. The implementation is done in User Space.

SCI Reflective Memory: Overview

- SCI Reflective Memory Library
- SCI Reflective Memory Features
- Reflective Memory Example programs
- Performance

SCI Reflective Memory Library: Overview

- Idea
- Structure
  - Memory Management
  - Synchronization
- Applications

SCI Reflective Memory Library: Idea

A library to build applications from, in order to provide a flexible interface to our cards.

- SISCI functions
- Relation to other SISCI C programs
- Synchronization

SCI Reflective Memory Library: Structure

Memory management
- Application specific code should be used for processing, and the SISCI functions for memory access.

Synchronization
- In order to guarantee that the local shared reflective memory copies are kept up to date, only one node is granted write access at a time.
- Read operations can occur at any time.

SCI Reflective Memory Library: Memory Management

- Segments, duplex mapping
- Memory read and write operations

SCI Reflective Memory Library: Segments, duplex mapping

[Figure: Local segment (Node1) connected to Remote segment map (Node2); Remote segment map (Node1) connected to Local segment (Node2)]

The node preparing to transfer data has to connect to a segment on the node receiving data. In order to get the two nodes to write to each other, they both have to create (at least) one local segment, and they both have to open up a connection to the remote segment (which is created as local on the other node).

SCI Reflective Memory Library: Segments, duplex mapping

[Figure: Local segment (Node1) and Remote segment map (Node1); Local segment (Node2) and Remote segment map (Node2)]

For all RM copies to be uniform, there is a need for an additional mapping, as shown in the figure. This mapping is carried out by writing to both the local segment mapping and the remote segment mapping during each write operation. The operations are the same on both nodes.

SCI Reflective Memory Library: Segments, duplex mapping

Node1:
- local-map: Create, prepare, map (local), set available
- remote-map: Connect, map (remote)

Node2:
- local-map: Create, prepare, map (local), set available
- remote-map: Connect, map (remote)

On each node, for each write operation to local memory, a write operation to the remote memory is automatically carried out by software.

SCI Reflective Memory Library: Segments, duplex mapping

If data is written to the Reflective Memory, it is first written into local memory, then transferred to remote memory by any of the SISCI data transfer functions.

The programmer is responsible for obeying the strict ordering rule: all write operations to the local memory shall be reflected to the remote memory immediately.

SCI Reflective Memory Library: Memory read and write operations

Remote access by:
- SISCI functions: SCIMemCopy, SCITransferBlock, SISCI DMA Engine
- *remoteptr = value;

Local access by:
- *localptr = value;
- memcpy(localbuffer, dummybuffer, size);

SCI Reflective Memory Library: Data transfer

[Figure: a private SRC buffer of a given size is copied at an offset into the Local Segment (Local RM) and into the Remote Segment (Remote RM)]

A private memory buffer is copied into the Reflective Memory Space. All three steps are mandatory.

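A minimal sketch of the duplex write described on the preceding slides, under stated assumptions: the type `rm_node`, the function `rm_write_block` and `SEGMENT_SIZE` are hypothetical names, not part of the SISCI API or of the library, and plain `memcpy` stands in for the SISCI transfer functions that would reach the remote segment in a real setup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEGMENT_SIZE 4096  /* hypothetical segment size in bytes */

/* Stand-ins for the two mappings a node holds: its own local segment and
 * the mapped remote segment (the other node's local segment). */
typedef struct {
    unsigned char *local_map;
    unsigned char *remote_map;
} rm_node;

/* Copy a private buffer into the Reflective Memory Space at `offset`:
 * local copy first, then the reflected copy, in that order.
 * Returns 0 on success, -1 if the range does not fit in the segment. */
int rm_write_block(rm_node *node, size_t offset, const void *src, size_t size)
{
    assert(node != NULL && src != NULL);
    if (offset > SEGMENT_SIZE || size > SEGMENT_SIZE - offset)
        return -1;
    memcpy(node->local_map + offset, src, size);   /* local RM */
    memcpy(node->remote_map + offset, src, size);  /* remote RM (a SISCI transfer in practice) */
    return 0;
}
```

With two buffers `seg1` and `seg2`, Node1 would be set up as `{ seg1, seg2 }` and Node2 as `{ seg2, seg1 }`, which mirrors the statement that the operations are the same on both nodes.
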
SCI Reflective Memory Library: Synchronization

A central point in an RM system is RM consistency. RM read operations can be performed on local memory, but it should not be possible to have modified data in another place in the system.

A method that ensures consistent RM copies when nodes are competing for the shared resources is needed. In practice this means that a local access should not be possible while a remote access is in progress, and only one node should have write access to the shared data at a time.

SCI Reflective Memory Library: Synchronization

Reflective Memory consistency
- Polling - asynchronous
- Interrupts - timesliced

Polling is used for better flexibility.

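One way to picture polling for write access is sketched below. This is not the library's synchronization scheme, which the slides do not detail: the names `rm_owner`, `rm_try_acquire`, `rm_acquire` and `rm_release` are hypothetical, and a C11 atomic in a single process stands in for an owner word that both nodes would have to reach in a real system.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RM_FREE 0   /* hypothetical value meaning "no node holds write access" */

/* Owner word; in a real system it would live in memory both nodes can reach. */
static atomic_int rm_owner = RM_FREE;

/* One polling attempt: claim write access if nobody holds it. */
bool rm_try_acquire(int node_id)
{
    int expected = RM_FREE;
    assert(node_id != RM_FREE);
    return atomic_compare_exchange_strong(&rm_owner, &expected, node_id);
}

/* Poll until write access is granted or max_polls attempts have failed. */
bool rm_acquire(int node_id, int max_polls)
{
    for (int i = 0; i < max_polls; i++)
        if (rm_try_acquire(node_id))
            return true;
    return false;
}

/* Give up write access; only the current owner may release. */
void rm_release(int node_id)
{
    int expected = node_id;
    bool released = atomic_compare_exchange_strong(&rm_owner, &expected, RM_FREE);
    assert(released);
    (void)released;
}
```

A writer would call `rm_acquire`, perform its local and remote writes, and then call `rm_release`; readers never take the lock, in line with reads being allowed at any time.
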
SCI Reflective Memory: How to build Reflective Memory applications

- Memory access is taken care of by the reflective memory transfer functions.
- Synchronization is used to protect the shared data from corruption.

SCI Reflective Memory: Features

The SCI Reflective Memory is for a two node reflective memory configuration. If more nodes are to be supported, a modified synchronization scheme has to be implemented. Apart from that, there are no other limits in making a multinode SCI Reflective Memory.

SCI Reflective Memory: General features

- All nodes share the RM space.
- All nodes have a local copy of the entire RM space.
- The local copies on the subsequent nodes are automatically updated.
- The synchronization logic ensures that only one node has write access to the RM at a time, keeping all RM copies consistent.
- RM write operations are multicast to all nodes in the system.

SCI Reflective Memory: General features

- Computation overlaps with communication: using DMA transfers for update of remote RM copies enables computation to overlap with communication, when specific flags are set.
- One-to-all multicast communication is used for remote RM updates.
- Shared data regions are organized as segments.

SCI Reflective Memory: General features

- Push-only: only shared write operations are propagated through the system. A write to the local RM is distributed (reflected) to the RM on all nodes.
- RM read operations are performed on the local RM copy.
- DMA, block, memcopy and shared memory transfers are supported by the SISCI API and the SCI Reflective Memory. When building an application, the desired transfer mechanism can be selected.

SCI Reflective Memory: Supported OS

In general this is the same as for the rest of the SISCI package, but since SCI Reflective Memory is under development we have not been able to port it to all operating systems (OS) yet.

Currently supported OS:
- Windows (NT & 2000, x86)
- Linux (2.2)
- Solaris (2.6 / 7, SPARC)

Next in line to be ported to:
- Lynx
- VxWorks (PowerPC)

SCI Reflective Memory example programs

- General Reflective Memory
- Special Reflective Memory
- Multimap Reflective Memory

General Reflective Memory

[Figure: Local Segment Node 1 linked to Local Segment Node 2]

- Only one SISCI segment is created on each node.
- The segments are linked together in RM style.

Special Reflective Memory

[Figure: Local Segment Node 1 and Local Segment Node 2, each divided into a part with write access for node 1 and a part with write access for node 2]

Both nodes have read access to the whole Reflective Memory Space segment, but write access to different halves of the Reflective Memory Space.

This is not really a Reflective Memory solution, but an example of how it can be manipulated for specific applications.

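The split of write access can be expressed as a simple range check. The sketch below is an assumption-laden illustration: the function name `rm_special_may_write` is hypothetical, and which node owns which half (node 1 the lower, node 2 the upper) is chosen here for the example, not taken from the demo program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `node_id` may write `size` bytes at `offset`.
 * Assumed layout: node 1 owns [0, segment_size/2), node 2 owns the rest. */
bool rm_special_may_write(int node_id, size_t offset, size_t size,
                          size_t segment_size)
{
    assert(node_id == 1 || node_id == 2);
    size_t half  = segment_size / 2;
    size_t begin = (node_id == 1) ? 0 : half;
    size_t end   = (node_id == 1) ? half : segment_size;
    /* The whole range [offset, offset + size) must lie inside the node's half. */
    return offset >= begin && offset <= end && size <= end - offset;
}
```

Reads need no such check, since both nodes may read the whole segment.
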
Multimap Reflective Memory

[Figure: Local Segments 1, 2, 3, ..., N on Node 1 and Local Segments 1, 2, 3, ..., N on Node 2]

Instead of putting the whole RM space in one segment, the user of rm_multimap controls several segments. Thus the only time the nodes compete for a resource is when the same segment is requested by more than one node (here: both nodes) at the same time.

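The effect of splitting the RM space into several segments can be sketched with one owner word per segment. This is a single-process illustration only; `NUM_SEGMENTS`, `seg_owner`, `seg_try_acquire` and `seg_release` are hypothetical names and do not describe how rm_multimap is implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_SEGMENTS 4  /* hypothetical; the demo asks for the number at start-up */
#define SEG_FREE 0      /* hypothetical value meaning "segment not held" */

/* One owner word per segment; static storage starts out as SEG_FREE. */
static atomic_int seg_owner[NUM_SEGMENTS];

/* Try to claim write access to one segment; other segments are unaffected. */
bool seg_try_acquire(int segment, int node_id)
{
    assert(segment >= 0 && segment < NUM_SEGMENTS && node_id != SEG_FREE);
    int expected = SEG_FREE;
    return atomic_compare_exchange_strong(&seg_owner[segment], &expected, node_id);
}

/* Release a segment; has no effect unless node_id currently holds it. */
void seg_release(int segment, int node_id)
{
    assert(segment >= 0 && segment < NUM_SEGMENTS);
    int expected = node_id;
    atomic_compare_exchange_strong(&seg_owner[segment], &expected, SEG_FREE);
}
```

Two nodes working on different segments both succeed immediately; a claim fails only when both ask for the same segment.
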
How to run the example programs

In the start-up phase of each program you will be asked to enter:
- Adapter number
- Remote Nodeid
- SegmentSize
- (Number of segments)
- help

How to run the example programs

These are the available commands:
  rm-read:     Read from Reflected Memory.
  rm-write:    Write data to the Reflected Memory.

Special RM write functions:
  rm-dma:      DMA transfers between two nodes.
  rm-block:    Block transfers between two nodes.
  rm-shmem:    Shared memory transfers between two nodes.
  rm-memcopy:  Transfer data to a previously mapped remote area.

How to run the example programs

Special RM test functions:
  bench-dma:      DMA transfers between two nodes, RM style.
  bench-block:    Block transfers between two nodes, RM style.
  bench-shmem:    Shared memory transfers between two nodes, RM style.
  bench-memcopy:  Transfer data to a previously mapped remote area.
  bench-full:     Test of all RM write-transfers between two nodes.

Special RM test functions where only the remote copy is written to:
  single-dma:      DMA transfers between two nodes.
  single-block:    Block transfers between two nodes.
  single-shmem:    Shared memory transfers between two nodes.
  single-memcopy:  Transfer data to a previously mapped remote area.
  single-full:     Test of all RM write-transfers between two nodes.

How to run the example programs

  test-dma:      DMA transfers between two nodes, no sync.
  test-block:    Block transfers between two nodes, no sync.
  test-shmem:    Shared memory transfers between two nodes, no sync.
  test-memcopy:  Transfer data to a previously mapped remote area, no sync.
  test-full:     Test of all NON-RM write-transfers between two nodes.

  file:          Print performance parameters to file
  performance:   Print performance parameters for this node
  parameters:    Print key parameters for this node
  loops:         Number of write-commands in the test routines
  constart:      Test with traffic from both nodes starting concurrently
  constop:       Disable concurrent start signal
  help:          This help screen
  q:             quit

Performance

The measurements have been made under the operating system (OS) Windows 2000, but performance is not OS dependent.

SISCI Performance

Highly dependent on the PC chipsets.

Latency: 2.2 microseconds

Throughput, application to application using SISCI:
- 85 MB/s (33 MHz / 32 bit PCI)
- 120 MB/s (33 MHz / 64 bit PCI)
- 240 MB/s* (66 MHz / 64 bit PCI)

Performance

The characteristics of the test machines were:
- DELL PowerEdge 6300
- Pentium II Xeon
- CPU clock 400 MHz
- 256 MB RAM
- 512 KB Level 2 cache memory
- 440 NX PCI chipset
- Four system processors

Performance (one-way)

- The throughput of remote write operations
- The throughput of a loop containing RM synchronization and remote write operations
- The throughput of a loop containing RM synchronization, local write operations and remote write operations (RM-style)

Performance (one-way)

RM SCIMemCopy transfers without writing to the local segment:

----------------------------------------------------
Segment size (bytes)    Latency        Throughput
----------------------------------------------------
524288                  5331.96 us     93.77 MB/s
262144                  2645.78 us     94.49 MB/s
131072                  1329.71 us     94.01 MB/s
65536                   672.17 us      92.98 MB/s
32768                   343.92 us      90.86 MB/s
16384                   179.37 us      87.11 MB/s
8192                    97.49 us       80.13 MB/s
4096                    56.49 us       69.15 MB/s
2048                    36.04 us       54.20 MB/s
1024                    25.76 us       37.92 MB/s
512                     20.57 us       23.73 MB/s
256                     17.95 us       13.60 MB/s
128                     16.30 us       7.49 MB/s
64                      13.92 us       4.38 MB/s

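The Throughput column appears to follow directly from the other two columns. The sketch below shows that relation under two assumptions inferred from the numbers rather than stated in the slides: the segment size is given in bytes, and 1 MB means 2^20 bytes. The function name `throughput_mb_per_s` is hypothetical.

```c
#include <assert.h>

/* Throughput in MB/s (1 MB = 2^20 bytes, an assumption) for one transfer
 * of size_bytes that took latency_us microseconds. */
double throughput_mb_per_s(double size_bytes, double latency_us)
{
    assert(latency_us > 0.0);
    double megabytes = size_bytes / (1024.0 * 1024.0);
    double seconds   = latency_us / 1e6;
    return megabytes / seconds;
}
```

For the first row, `throughput_mb_per_s(524288, 5331.96)` gives about 93.77, and for the last row `throughput_mb_per_s(64, 13.92)` gives about 4.38, matching the table.
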
Performance (one-way)

RM SCIMemCopy transfers:

----------------------------------------------------
Segment size (bytes)    Latency        Throughput
----------------------------------------------------
524288                  9953.69 us     50.23 MB/s
262144                  3436.81 us     72.74 MB/s
131072                  1704.31 us     73.34 MB/s
65536                   853.20 us      73.25 MB/s
32768                   428.55 us      72.92 MB/s
16384                   221.69 us      70.48 MB/s
8192                    105.26 us      74.22 MB/s
4096                    58.48 us       66.80 MB/s
2048                    37.67 us       51.85 MB/s
1024                    26.29 us       37.15 MB/s
512                     21.23 us       23.00 MB/s
256                     18.53 us       13.18 MB/s
128                     16.49 us       7.40 MB/s
64                      14.02 us       4.35 MB/s

Performance (one-way)

NON-RM SCIMemCopy transfers:

----------------------------------------------------
Segment size (bytes)    Latency        Throughput
----------------------------------------------------
524288                  5337.58 us     93.68 MB/s
262144                  2639.14 us     94.73 MB/s
131072                  1320.58 us     94.66 MB/s
65536                   663.29 us      94.23 MB/s
32768                   334.80 us      93.34 MB/s
16384                   170.47 us      91.66 MB/s
8192                    88.66 us       88.12 MB/s
4096                    47.61 us       82.05 MB/s
2048                    27.18 us       71.87 MB/s
1024                    16.87 us       57.89 MB/s
512                     11.77 us       41.49 MB/s
256                     9.16 us        26.66 MB/s
128                     7.84 us        15.57 MB/s
64                      4.97 us        12.29 MB/s

Performance (transfer in both directions simultaneously)

- The throughput of remote write operations
- The throughput of a loop containing RM synchronization and remote write operations
- The throughput of a loop containing RM synchronization, local write operations and remote write operations (RM-style)

Performance (both directions)

RM SCIMemCopy transfers without writing to the local segment:

----------------------------------------------------
Segment size (bytes)    Latency        Throughput
----------------------------------------------------
524288                  8374.53 us     119.40 MB/s
262144                  4190.23 us     119.33 MB/s
131072                  2095.92 us     119.32 MB/s
65536                   1053.26 us     118.74 MB/s
32768                   528.37 us      118.31 MB/s
16384                   269.90 us      115.79 MB/s
8192                    139.10 us      112.35 MB/s
4096                    74.64 us       104.75 MB/s
2048                    42.96 us       91.12 MB/s
1024                    27.95 us       70.01 MB/s
512                     21.41 us       45.64 MB/s
256                     18.41 us       26.54 MB/s
128                     16.77 us       14.59 MB/s
64                      14.08 us       8.69 MB/s

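In the tables for simultaneous transfers, the throughput figures only match the listed latencies if the data moved in both directions is counted, that is, twice the segment size per measured interval. This factor of two is an inference from the numbers, not something the slides state. Worked for the 524288-byte row above, again assuming sizes in bytes and 1 MB = 2^20 bytes:

```latex
\[
\text{throughput}
  = \frac{2 \cdot 524288\ \text{bytes}}
         {2^{20}\ \text{bytes/MB} \cdot 8374.53 \times 10^{-6}\ \text{s}}
  \approx 119.4\ \text{MB/s}
\]
```

Without the factor of two the same row would give about 59.7 MB/s, which is half the listed value.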
Performance (both directions)

RM SCIMemCopy transfers:

----------------------------------------------------
Segment size (bytes)    Latency        Throughput
----------------------------------------------------
524288                  10945.86 us    91.35 MB/s
262144                  4692.67 us     106.48 MB/s
131072                  2412.39 us     103.72 MB/s
65536                   1154.77 us     108.26 MB/s
32768                   606.89 us      103.02 MB/s
16384                   312.37 us      100.05 MB/s
8192                    146.39 us      106.77 MB/s
4096                    77.05 us       101.41 MB/s
2048                    44.27 us       88.27 MB/s
1024                    28.84 us       67.88 MB/s
512                     22.07 us       44.27 MB/s
256                     19.02 us       25.71 MB/s
128                     17.01 us       14.36 MB/s
64                      14.17 us       8.63 MB/s

Performance (both directions)

NON-RM SCIMemCopy transfers:

----------------------------------------------------
Segment size (bytes)    Latency        Throughput
----------------------------------------------------
524288                  8369.08 us     119.48 MB/s
262144                  4183.97 us     119.51 MB/s
131072                  2089.99 us     119.62 MB/s
65536                   1043.58 us     119.80 MB/s
32768                   519.83 us      120.25 MB/s
16384                   260.53 us      120.01 MB/s
8192                    130.05 us      120.16 MB/s
4096                    65.20 us       119.86 MB/s
2048                    33.23 us       117.79 MB/s
1024                    18.69 us       104.66 MB/s
512                     12.13 us       80.75 MB/s
256                     9.40 us        52.07 MB/s
128                     7.91 us        30.96 MB/s
64                      5.05 us        24.36 MB/s

Future Plans

- We are working on finding partners who are interested in joining us in developing an application based on SCI Reflective Memory for them.
- PSB66 release
- Dig deeper into kernel space and/or hardware to optimize performance and ease of use

Key statement

The industry-leading throughput and latency of Dolphin's interconnect solutions will soon be available for the Reflective Memory market.

Important terms

We hope that you now understand the meaning of the terms:
- Reflective Memory
- PMC/PCI Adapter Cards
- SISCI
- SCI Reflective Memory transfer functions
- SCI Reflective Memory synchronization
- SCI Reflective Memory duplex mapping of segments

Questions?

Thank you for listening to this presentation!
See you in the Lab in half an hour!