Scalable ASIP Implementation and Parallelization of a MIMO Sphere Detector

Size: px
Start display at page:

Download "Scalable ASIP Implementation and Parallelization of a MIMO Sphere Detector"

Transcription

1 Scalable ASIP Implementation and Parallelization of a MIMO Sphere Detector Esther P. Adeva and Björn Mennenga and Gerhard Fettweis Vodafone Chair Mobile Communication Systems Technische Universität Dresden Dresden, Germany esther.perez, mennenga, fettweis@ifn.et.tu-dresden.de Abstract High detection complexity is known to be one of the major challenges in MIMO communications based on spatial multiplexing. Tuple Search Detector (TSD) was recently introduced, significantly reducing detection complexity in comparison to conventional algorithms while achieving close to full maxlog-app BER performance. Besides high computational complexity, irregular control flow and sequential nature of the tree search represent major limitations of depth-first-based detectors, frustrating efficient application of parallelization techniques and hence leading to inefficient realizations with regard to most practical applications. This work 1 presents a novel TSD implementation, scalable in constellation size and number of antennas and mapped to a highly parallel and pipelined ASIP architecture. Major challenges and key strategies enabling a high-throughput and low-complexity realization are presented and performance of the resulting flexible and efficient detector implementation is evaluated. Proposed realization is shown to achieve > 300 Mbps throughput at a reference clock frequency of 400 MHz (regarding 4 4 MIMO transmission with 16-QAM), by far outperforming comparable state-of-the-art realizations. Keywords MIMO detection, sphere decoder, tuple search, parallelism, SIMD, pipeline-interleaving, STA, ASIP. I. INTRODUCTION Future mobile communication systems will make use of multiple-input multiple-output (MIMO) techniques in combination with high constellation orders to enhance spectral efficiency. Transmission of spatially multiplexed data streams allows increasing data rates as well as diversity. However, the inherent high detection complexity and lack of flexibility of common detector realizations still represent limiting factors towards efficient implementations. Conventional low-complex detector realizations provide poor BER performance (e.g. Successive Interference Cancelation -SIC detector), whereas high-ber-performance implementations present unusable high complexity (e.g. Single Tree Search -STS detector). In this regard, TSD [1] has demonstrated to achieve significant complexity reduction while providing close to full max-log-app BER performance, outperforming state-of-theart detection strategies [2] like STS [3] or List Sphere Detection (LSD) [4]. Variable search complexity of so-called depthfirst detection algorithms (like STS and LSD) represents one of the major challenges towards efficient realizations. Resulting 1 This work has been supported by the German Federal Ministry of Education and Research (BMBF) under grant 01BU1001. irregular and data-dependent control flow typically frustrates efficient algorithm parallelization, thus considerably limiting throughput enhancement. In this work, a novel TSD-ASIP model is presented. Lowcomplexity, flexibility, high throughput (>300 Mbps at a clock frequency of 400 MHz, using 4 4 transmission and 16-QAM) and high efficiency characterize the proposed implementation, by far outperforming state-of-the-art realizations. Throughout this work, major challenges with regard to practical realization as well as key strategies enabling such an efficient implementation are described and discussed. The considered communications system model is introduced in section II, followed by description of the complexityreduced TSD (section III) representing the basis of our implementation. Regularized control flow enabling processor vectorization is also described in this section. In order to minimize the required resources and speed up the computations, fixed-point arithmetic is utilized. Resolution, flexibility and influence on detection performance of the proposed fixed-point representation are analyzed in section IV. Considered parallelism approaches and implementation challenges are detailed in section VI. The proposed processor architecture is described in section VII, especially focusing memory organization. Challenges regarding the integration of conditional memory access into pipeline-interleaved and vectorized schemes are also discussed in this section. Throughout this work, computational complexity, critical path (V) and throughput (VI-C) of the resulting implementation are evaluated. Proposed implementation results are additionally discussed and compared to other works in section VIII. Finally, main highlights of this work are summarized in section IX. II. SYSTEM MODEL Throughout this paper, we consider a N T N R MIMO system based on a BICM transmission strategy with N T transmit and N R receive antennas, as depicted in Fig. 1. A vector u of i.i.d. information bits is encoded by the outer channel code with rate R. The resulting stream of vectors c is bit-interleaved and portioned into blocks c of N T L bits, where L denotes the number of bits per transmit symbol. For the transmission, the corresponding bits c C, covered in the set of permitted bit vectors, are mapped (e.g. gray mapping) onto complex constellation symbolsx(c) = [x 0,...x NT 1] T =

2 Transmitter Binary Data Source Receiver Binary Data Sink û u Outer c Interleaver c Encoder Hard Decision Le(u) R Outer Decoder L a,dec + L e,dec π Interleaver π -1 Interleaver π AWGN L e,det + - L a,det Constellation Mapper + + Detector / Demapper Fig. 1: Communications system model with BICM transmitter and iterative receiver. map(c) X, being X the set of valid transmit symbols with cardinality #X = #C = 2 L = Q. The transmit energy is normalized so that E{xx H } = E S /N T I, where E s represents the average transmit energy of the transmitter. Regarding the transmission, we consider an uncorrelated, flat fading channel and an additive noise vector n C NR 1 at the receiver with complex components of zero mean i.i.d. gaussian random variables of variance N 0 /2 per real dimension (E{nn H } = N 0 I). The considered passive channel is represented by H C NR NT with entries of a zero mean i.i.d. gaussian random process of variance 1 and is assumed to be perfectly known at the receiver. The received signal y is therefore given by: y = Hx+n and the signal-to-noise-ratio (SNR = E s /N 0 ) at the receiver applied to the energy of one information bit can be stated as E b /N 0 = E s N R /N 0 N T LR. In order to ensure comparability of results, a setup equivalent to the one used in e.g. [1], [5], [6] has been used for our simulations. These have been carried out for a rate 1/2 PCCC with (7 R,5) convolutional codes, an information block size of 9216 bits (including tail bits), gray mapping, and spatial and temporal fading. The detection of the transmitted bits is carried out by complex-valued tuple search sphere detector in conjunction with a BCJR based decoder with 8 internal iterations and, for the sake of simplicity, without detector decoder iterations. III. COMPLEXITY-REDUCED TUPLE SEARCH MIMO DETECTION A. Fundamentals Task of the focused detector is the determination of bits c most likely sent as well as of reliability information for these bits. This can be accomplished by calculating log-likelihood ratios (L-values): ( ) P (cm,l = +1 y) L(c m,l y)=ln P (c m,l = 1 y) 1 N 0 min {λ 0}+ 1 min c c m,l =+1 N {λ 0},(1) 0 c c m,l = 1 where (1) results from application of the max-log approximation [7]. The l-th bit of a symbol sent by the m-th antenna is represented by c m,l and λ 0 represents the distance metric. For the considered case without detector decoder iterations (i.e. without a priori information), this metric depends only on the Euclidean distance between the set of received symbols y and the representative of the symbol ˆx(c) likely transmitted through the channel H: λ 0 (y,c) = y Hˆx(c) 2 Consequently, besides the most probably sent symbol arg min{λ 0 } (i.e. the detection ˆx(c) c C hypothesis) and its corresponding metric λ 0 (c ML ), the detector has to determine also the counter-hypotheses arg min {λ 0 } and their metrics for each bit. ˆx(c) c C,c m,l c ML m,l B. Tree Search Basics Since brute force (full max-log-app) detection of (1) is known to be of exponentially growing computational complexity with the number of transmit antennas and order of the constellation, several close to optimal detection approaches have been lately proposed, some of the most promising based on tree search strategies. As described in detail in [5], transforming the detection problem is permitted by QRdecomposition (QRD) of H = QR, where Q is unitary and R an upper triangular matrix. With modified received symbols y = Q H y, determination of the Euclidean distance y Rˆx(c) 2 (2) can be interpreted as a tree search. Resulting from this, λ 0 can be recursively calculated through the layered partial metric λ i = λ i+1 }{{} + y i }{{} r 2 N T 1 iiˆx i, y i = y i r ijˆx j, (3) metric from already estimated symbols interference reduced symbol j=i+1 by adding the metric of the corresponding parent node to the squared distance between a representative of the estimated symbol and an interference reduced symbol. The search is carried out in depth-first fashion, successively extending the selected nodes by analyzing their child nodes. As presented in [8], a regularized control flow and the parallel calculation of sibling parent nodes as well as of leaf nodes permit a so called one-node-per-cycle implementation [9]. Based on this, the average number of node extensions n performed in the detection is taken as complexity measure throughout this paper. C. Tuple Search and Bit-Specific Candidate Determination Computing the L-values in (1) requires determination of a detection hypothesis and all counter-hypotheses as described in section III-A. Explicit search for all needed minimums leads to impractically high n [10]. Therefore, instead of searching all possible minima, TSD introduced in [1] searches a subset of T most likely leaves, similarly to the approach used by LSD. The metrics λ 0 of these leaves are stored in a search tuple T := {λ 0 (c 1 ),λ 0 (c 2 ),...,λ 0 (c T )}, defining the sphere radius as the maximum metric in the tuple: R = max c t c t T {λ 0,t}. (4) Tuple search can be additionally combined with separated bitspecific storage of information for the L-value calculation [1].

3 The resulting TSD achieves much better BER performance than LSD at a significantly reduced n compared to STS detection. Additionally, adjustment of SNR n trade-off is enabled by varying the size of the tuple T. D. Complexity Reduction Strategies like sorted QR decomposition [1], MMSE preprocessing [1] as well as sphere and L-values clipping [1] are widely applied approaches towards n reduction. Novel approaches like search sequence determination (SSD) [11] and metric estimation (ME) [12] contribute to further reduce n as well as the computations involved in the detection, as described in the following. 1) Search Sequence Determination (SSD): In [11], the commonly used Schnorr-Euchner (SE) enumeration as well as all the costly metric calculations (3) and sorting operations it requires are replaced by few inexpensive basic operations through a heuristic technique. This approach is based on geometrical position analysis relative to reference nodes, reducing the node selection task to simple geometrical case differentiation. Reference nodes ˆx i are determined by ˆx i = y i = y i, (5) r ii (with ( ) representing the rounding operator). Resulting enumeration is defined by fixed sequences which are mapped to constellation symbols during the detection, based on the relative position between y i and ˆx i. Computational complexity of this strategy can be further decreased by reducing the amount and length of considered sequences. As proposed in [11], in this work only two decision regions (i.e. two sequences) have been considered, sequences of 14 partially combined elements are used and adapted leaf sequences are applied. 2) Metric Estimation: The computationally costly operations required for the metric calculation (3) can be considerably simplified by an estimation based on SSD s geometrical approach [12]. Euclidean distances in (3) are replaced by predefined geometrical distances d defined for each decision region: r 2 ii y i ˆx i 2 r 2 ii d2. (6) Moreover it is possible to precalculate r 2 ii as well as r2 ii d2, hence reducing the complex products in (3) to a single real multiplication or even completely dissolving it. This considerable reduction in computational complexity makes this technique especially interesting for realizable hardware implementations. E. Algorithm Partitioning and Regularization TS-based detection process can be decomposed, as proposed in [8], in an arbitrary number of regularized loops. Operations performed within each loop have been partitioned into the task blocks a... n illustrated in Fig. 2. Selection of first node to be extended (5) is performed in a. SSD s geometric position analysis is subsequently carried out in b, determining the corresponding search sequence and the second node to Fig. 2: TSD regularized control flow diagram. be extended. Metric (6) and interference (3) associated to these nodes are determined by c, f, g and h. Subsequently, radius tuple is updated (4) by d. Final task within the loop is selection of the tree layer to be processed during next loop ( e). As mentioned in section III-B, a so called one-node-percycle(-loop) realization is desired. Consequently, sibling parent nodes and leaf nodes are assumed to be processed in parallel [11]. For this purpose, it is necessary to define separated blocks which process first ( a, c, f), second ( b, g, h) and subsequent nodes ( i, j, k) individually. Likewise, definition of task blocks processing candidate leaf nodes ( l, m) for later L-values (1) calculation ( n) is required. IV. FIXED-POINT REPRESENTATION AND RESULTS Conventional hardware implementation typically makes use of fixed-point arithmetic in order to limit the required latency and hardware resources. In the context of MIMO detection, overflow and quantization errors incurred by fixed-point representation typically lead to SNR performance degradation. It is therefore necessary to define appropriate resolution, attending to the trade-off between SNR performance and hardware complexity. Based on the analyses in [7], a maximum word width of 8 bits 2 is proposed in this work, whereas 0.3dB SNR loss has to be accepted, as shown in IV-B. This practical representation is enabled by the normalization strategy proposed in IV-A. The resulting detector presents lower hardware complexity and higher efficiency than comparable sphere detector (SD) realizations [13], [14], typically requiring greater precision (10-16 bits) to achieve similar SNR performance. A. Normalization Parameters involved in (1) and hence (3) (y i, R = {r ii,r ij } and λ i, with i,j = {1,...,(N T 1)},i j) possess dynamic range fluctuating with the received energy E r. In order to define fixed ranges, independent of the transmission conditions, removal of E r dependency from r ii and λ i is required. For this purpose, a scaling factor s f = f E r [7] is utilized to normalize these parameters ( r ii = r ii /s f, λi = λ i /s 2 f ), with f representing a constant factor, leading to (1) modified as 2 Integer and fractional word lengths have been individually determined for each of the parameters involved in the detection. Integer word lengths have been selected to cover the parameters range, while fractional lengths have been determined based on BER performance obtained through simulations. Details on the analysis can be found in [7].

4 Fig. 3: n required for 10 5 BER, corresponding to different transmission and detection schemes L(c m,l y) = L(c m,l y)/s 2 f. Concerning y i and r ij, s f normalization is not necessary since application of SSD strategy (5) eliminates the energy dependency. A certain scaling factor f is nevertheless required in order to accommodate their dynamic range to the proposed 8-bit quantization [7]. Based on this, ỹ = y i /f and r ij = r ij /f are additionally defined. To ensure flexibility of the proposed detector implementation, fixed-point representation has to be additionally independent of system configuration. For this purpose, factors f and f can be adjusted according to the transmission scheme, as shown in table I 3. As a result, definition of different fixed point representations depending on transmission characteristics as in [14] is avoided. By applying the proposed normalization, 8-bit fixed-point representation is enabled, independent of transmission conditions and adaptable to different N T and Q by simple adjustment of f and f. N T N T Q f f TABLE I: Scaling factors f and f, to enabling flexibility of the proposed fixed-point representation. B. Simulation Results Fig. 3 depicts SNR n achieved at 10 5 BER, using different transmission and detection schemes. FLP denotes floating point, while FXP stands for fixed point. Results corresponding to unclipped (L max = ) STS(FLP) detection are depicted representing full max-log-app SNR performance bound. SIC(FXP) detection has been implemented using the proposed TSD(FXP) realization with limited n = N T, hence representing the lowest n bound. Fig. 3 also includes results corresponding to the proposed TSD for the selected SNR n trade-off (adjusted through T ). Fixed-point arithmetic leads to 0.3dB SNR degradation with respect to the analogous floating point approach. Metric underestimations resulting from limited precision lead, in addition, to n reduction of 20% (slightly greater reduction in 8 8 with 64-QAM system). Higher detection accuracy (i.e. greater T ) can ease the shown 3 Values determined through simulation for the given system model. SNR performance loss ( n reduction). It should be noticed that SNR n difference between TSD(FLP) and TSD(FXP) is approximately constant for different transmission schemes, hence validating the normalization approach proposed in IV-A. For the selected SNR n trade-off, TSD(FXP) presents a total SNR loss of dB with respect to full max-log- APP SNR performance bound, while processing 1-6 orders of magnitude less nodes than unclipped STS(FLP). This loss is mainly caused by ME approach, and can be partially compensated by adjusting T. TSD provides considerable SNR performance gain with respect to SIC ( 2.5dB for 4 4 with 4-QAM transmission, 4dB in the remaining cases) at only slightly greater n (<30%). Resulting from this, proposed soft-output TSD(FXP) approaches hard-output SIC complexity (in terms of n), especially concerning low-order systems, while achieving close to full max-log-app SNR performance. Additionally, SNR n relationship is adjustable attending to transmission requirements, ranging from close to full maxlog-app SNR performance at significantly reduced n, to SIC SNR performance and complexity. V. COMPUTATIONAL COMPLEXITY AND CRITICAL PATH ESTIMATION In order to estimate the costs of a hardware implementation, computational complexity and latency will be analyzed in the subsequent. Resulting from application of complexity reduction techniques in III-D only basic operations (addition, comparison and shift operations) are required to carry out the detection [7]. The amount of required computations c is taken as computational complexity measure in this work. Cost of comparison and addition operations is assumed to be approximately equivalent, whereas shift operation cost is considered negligible. Table II summarizes c (add-equivalent operations, i.e. addition and comparison) required by each task block. As shown in this table, node selection ( a, b, i) and especially metric determination ( c, g, j) tasks present the lowest computational complexity, moreover not varying with N T,Q or T. Complexity of radius update task ( d) grows linearly with T. Remaining tasks complexity depends on system order (Q,N T ), presenting l and m (i.e. handling bit-specific metrics) the greatest complexity. Attending to the resulting overall computational complexity c total, doubling N T or Q implies increasing c total by a factor of 1.5 ( c total 3/4 N T 3/4 Q). Cost in terms of latency l of shift operations is neglected, while cost of addition and comparison operations is assumed to be equivalent to l = 1 clock cycle. This consideration is a quite conservative design constraint, which ensures the targeted throughput (depicted in VI-C and V) in a worstcase hardware implementation. Attending to data dependency 4, most of the computations enclosed within each task block may be performed in parallel. Considering sufficient parallelization level (varying with N T,Q and T ), fixed l can be assumed, 4 Detailed analysis of data dependency is considered to be out of the scope of this work and thus is not presented.

5 QAM 16-QAM 64-QAM 64-QAM 64-QAM l (T = 4) (T = 8) (T = 16) (T = 8) (T = 16) a 4 1 b 11 1 c 2 1 d e f g 1 1 h i 15 1 j 1 1 k l m n c total l cp TABLE II: Computational complexity c (in number of add-equivalent operations) and latency of each task block, for different transmission schemes. as shown in table II. Resulting from this, latency of the regularized loop becomes independent of the transmission system, in contrast to computational complexity. Loop latency can be further reduced by partially overlapping execution of the comprehended task blocks, resulting in the task scheduling illustrated in Fig. 4(c). Since output from blocks h, l, m and n is never required in the immediately subsequent loop [7], the critical path is comprised by blocks a e. Considering latencies given in table II, the critical path presents a latency of l cp = 5 clock cycles. Similar computational complexity analysis can be applied to conventional SD and SIC detector implementations [7], leading to task latencies shown in Fig. 4(a) and 4(b), respectively. As illustrated in Fig. 4, l cp of the proposed 8-bit TSD implementation (Fig. 4(c)) is < 1/4 of l cp corresponding to conventional (16-bit precision) SD realizations (Fig. 4(a)) and < 1/2 of common (16-bit precision) SIC detectors l cp (Fig. 4(b)). SSD and ME approaches in combination with the proposed task scheduling represent the main strategies contributing to this considerable critical path latency reduction. VI. PARALLELIZATION STRATEGIES FOR THROUGHPUT ENHANCEMENT Implementation of bit- and instruction-level parallelism mentioned in V does not require major considerations besides dependency analysis 4. Resulting detection throughput is dominated by 1) the number of loops required to complete the detection (or equivalently, n) and2) the loop latencyl cp (for a given frequency f ref ). In order to enhance throughput, further parallelization approaches are investigated in this work. A. Pipeline-Interleaving Task blocks defined in III-E can be grouped according to their execution times t ex, depicted in Fig. 4(c). Thereby 5 sets of tasks Sx, x = {0,...,4} are defined, each assigned to a different processing element (PE). Notice that tasks executed after l cp cycles ( m and n) are scheduled together with tasks corresponding to the following loop. Based on this, throughput can be enhanced by interleaving execution of Sx Fig. 4: Loop latency (in clock cycles), corresponding to different detection strategies. corresponding to different detections d in pipeline fashion. Fig. 5 illustrates pipeline-interleaving of Sx d, where e.g. S3 2 represents execution of task set 3 (see Fig. 4) corresponding to detection 2 (i.e. third interleaved detection). Throughput and resource utilization are maximized when the number of interleaved detections equals the maximum number of possible pipeline stages P, i.e. P = 5 for the identified critical path (l cp = 5). Assuming in addition that a new detection starts as soon as another has finished, no PE is idle after filling in the pipe. By applying these considerations, resulting average detection throughput τ is given by: τ TSD = (NT L) P n l cp f ref = NT L n f ref. P=lcp B. SIMD Vectorization TSD control flow regularization presented in section III-E enables direct implementation of SIMD vectorization. Assuming accordingly vectorized memory access (as described in VII-C), q individual detection paths can be performed in parallel as illustrated in Fig. 5. Vectorization of n-variable SD results in increased average n ( n) per vector element [8], in contrast to n-fixed detectors (M-algorithm, K-best... ). Assuming that a vector is completely processed as soon as execution of all its q parallel slices have finished, the overall vector latency will be thus determined by the slowest path (i.e. greater n). As a result, n increases with the degree of vectorization q, thereby dramatically affecting the achievable speedup. A simple approach to ease this problem consists in bounding n n max [8]. Consequent early termination of tree searches with n > n max typically leads to BER performance loss. However, it is possible to find suitable n max, e.g. n max = 1.25 n, causing negligible BER performance loss while allowing to achieve almost the maximum feasible speedup q [8]. Simplicity and effectiveness of this approach make it especially interesting for vectorization of n-variable SD. By applying q-fold SIMD parallelism with bounded n, τ is consequently enhanced by a factor of almost q, τ q-simd q τ TSD TSD.

6 Fig. 6: ASIP-based detector architecture Fig. 5: Pipeline-interleaving scheme of 5 detection paths (5 pipeline stages). C. Throughput Analysis Table III shows τ corresponding to different detection approaches and transmission schemes, considering same BER performance (10 5 BER). For simplicity, SIMD vectorization has not been considered here (q = 1). Under the conservative latency assumption in V, hardware realization of the proposed pipelined architecture can be expected to achieve high frequency. Based on this consideration, clock frequency f ref = 400 MHz will be taken as reference in the following. TSD results correspond to the detection accuracy used in IV-B. As shown in table III, τ TSD is 8 times greater than τ of conventional SD implementations [7]. For low system orders (2 2 MIMO with 64-QAM, 4 4 MIMO with 4-,16- QAM) proposed TSD achieves 1-4 times lower τ than a conventional SIC detector. For high order systems (8 8 MIMO with 64-QAM) τ TSD is 15 times lower than conventional SIC τ. The gap between TSD and SIC in terms of τ has been thus considerably reduced compared to conventional SD. By applying pipeline-interleaving techniques (VI-A), τ TSD is enhanced 40 times with regard to common (unpipelined) SD implementations, surpassing τ of conventional (unpipelined) SIC detector by a factor of times greater (except for 8 8 MIMO with 64-QAM). Additionally, implementation of SIC detection based on our TSD realization achieves >2.5 times greater τ than conventional SIC realizations. Throughput enhancement via SIMD vectorization is not directly possible in the case of conventional depth-first SD due to their unregularized control flow. Conventional SIC detectors as well as the proposed n-bounded TSD achieve approximately the same speedup factor ( q). Resulting from this, proposed lowcomplex and flexible TSD realization achieves much higher τ than conventional SD implementations, approaching τ of common SIC detectors (for low order systems) and even surpassing it (TSD-based SIC implementation). N T N T Q-QAM Conventional SD (unpip.) TSD (unpip.) TSD (pip.) Conventional SIC (unpip.) TSD-based SIC (unpip.) TABLE III: Average throughput τ (in Mbps) achieved at f ref = 400 MHz and10 5 BER, corresponding to different transmission and detection schemes and considering a scalar architecture. VII. ASIP IMPLEMENTATION MODEL A. Synchronous Transfer Architecture (STA) The ASIP design proposed in this work has been modeled based on Synchronous Transfer Architecture (STA) [15]. STA architectural template is comprised of basic modules known as functional units (FUs). FUs present an arbitrary number of input and output ports and are connected through a multiplexing interconnection network. Output ports are buffered with registers so that data produced by a FU can be directly consumed by connected FUs without need for additional intermediate storage. The design is controlled by Very Long Instruction Words (VLIWs). At each cycle, the multiplexing network and the functionality of each FU is configured by different segments of the corresponding VLIW. The resulting system forms a synchronous network, which at each clock cycle consumes and produces some data. VLIW-based control and definition of vector data paths enable both instruction and data level parallelism. B. Architecture Block diagram in Fig. 6 depicts the architecture of the proposed STA-based ASIP model. A control unit decodes VLIWs stored in the instruction memory (IMEM) and manages the execution flow. Data path contains register files, data memories (SMEM, VMEMI, VMEMO) and the MIMO detector module (proposed TSD). The detector is comprised of several FUs, each implementing one of the detection tasks illustrated in Fig. 2. The proposed processor model has been dimensioned to support up to N T = 8 and Q = 64. Pipelineinterleaving scheme with P = 5 is assumed. For simplicity, q-fold vectorization with q = 1 is assumed in the following, unless otherwise stated. C. Memory Architecture and Scaling The presented ASIP-based detector model relies on instruction memory and data memories specified in table IV and described in the following. 1) Scalar Memory (SMEM): SMEM contains system parameters (N T,L,T,L max and n max ) as well as channel-dependent parameters, namely σ, N 0, r ij (3) and r 2 ii d2 (6). While system data is assumed to be constant for a given transmission system, channel data varies with frequency depending on the channel coherence time (e.g. each m detections). Assuming q-fold processor vectorization and channel coherence time equivalent to k detection vectors (m = kq), SMEM content can be shared by kq individual

7 detection paths. Resulting from this, vectorization of SMEM is unnecessary. This leads to enormous memory savings, directly proportional to the degree of vectorization q. 2) Input and Output Vector Memories (VMEMI, VMEMO): VMEMI represents a buffer (of length bf) storing the symbols y to be detected. The MIMO detector module gets a complete received symbol y on each read access, as long as there is data available in the input buffer. L-values resulting from the detection of y are stored into VMEMO. Both VMEMI and VMEMO are q-fold vector memories with regard to SIMD implementation. In order to avoid memory growing with q, usage of joint scalar buffers could be investigated as future work. Both pipeline-interleaving and SIMD vectorization are supported by the designed memory structure. Resulting from these design considerations, size of SMEM is fixed, while VMEMI and VMEMO size scales with the degree of vectorization q and the desired buffer length bf, as depicted in table IV. For a sample non-vectorized design (q = 1) with bf = 40, a total of 3KB memory is required, corresponding 2.8KB to data memory. Memory Width Length Size (bits) ( words) (bytes) Content IMEM Program code SMEM System and channel info. VMEMI 128 bf 16 bf q Received symbols VMEMO 384 bf 48 bf q Soft output (L-values) TABLE IV: Memory interface specifications, supporting up to N T,N R = 8 and Q = 64. D. Memory Access Implementation concepts and analyses presented in sections IV to VI are based on the assumption of constant data availability. Latency of memory tasks like read/write access or address generation have been therefore disregarded. In the following, challenges concerning the integration of these tasks into the proposed regularized scheme (III-E) are presented, as well as the strategies considered to overcome them. 1) Memory Address Generation: Due to different nature and frequency of data memories access, address generation functionality has been separately defined and implemented for each memory. VMEMI and SMEM access is sequentially performed, thus relying on simple pointers and a register file for address storage. In contrast, VMEMO access is not sequential, hence requiring more elaborate approaches. It is desired that y and corresponding L-values occupy in VMEMI and VMEMO (respectively) homologous memory areas. In order to implement this sorting while dealing with variable detection latency, a tagging approach is proposed. A tag is assigned to each y during its processing, indicating the VMEMI-address where this symbol was stored. When detection is complete, the tag is used to determine the corresponding VMEMO address (requiring only simple shift and addition operations). Simplicity of the proposed strategy enables calculating addresses in parallel to detection tasks execution. Resulting from this, address generation requires no additional clock cycles and detection latency depicted in V remains unaffected. 2) SMEM Data Temporal Locality: SMEM content is very frequently accessed by most detector FUs. Access frequency is further increased by the proposed pipeline-interleaving approach (VI-A). As a consequence, data demand rate is much higher than serving capacity of the defined memory interface 5. Simplest solutions like including stall cycles or enhancing interface bandwidth lead to speedup detriment or limit scalability of the design. To solve the problem avoiding these negative effects, partial copy of SMEM content into a register file is proposed (block o in Fig. 6). Since register access is significantly faster than memory access, data availability problem is solved and throughput speedup remains unaffected. In addition, costly memory access is reduced, since SMEM is only accessed when its content varies (depending on channel coherence time). It should be noticed that additional storage required does not scale with q in case of processor vectorization. As part of future work, SMEM could be completely eliminated, hence reducing memory requirements. E. Conditional Control Flow Fig. 7(a) illustrates control flow including memory access, with Detection Loop representing the regularized loop shown in Fig. 2. As depicted in this diagram, data memories access is conditional, triggered by detection termination and channel update. Assuming that channel information is shared by kq > P individual detection paths (VII-C), decision on SMEM access can be jointly performed for the P interleaved detection paths. In contrast, decision on VMEMI and VMEMO access is individual, since it depends on variable detection latency. Joint control flow of interleaved detection paths is consequently frustrated. In order to satisfy individual memory access demand without disrupting the pipelined execution of the remaining detection paths, the regularized and extended pipeline-interleaving scheme shown in Fig. 7(b) is proposed, in combination with the following approaches: Unconditional VMEMI access: periodic and unconditional read operations are performed during the complete detection process. Internal signaling detects when data has been effectively consumed (i.e. a new detection begins) in order to update VMEMI address pointer accordingly. Assisted control flow: VMEMO access is conditional and triggered by up to P = 5 detection paths. SMEM access is additionally conditional, depending on channel coherence time. Resulting from this, complex conditional branch operations have to be performed by the sequencer, as illustrated by arrows in Fig. 7(b). Based on control signals received from the detector module, a Flow Control Unit (FCU, p in Fig. 6) identifies triggering events and determines the corresponding IMEM address, subsequently transferring it to the sequencer. As a result, 5 Read access is assumed to take 2 clock cycles.

8 (a) Control flow diagram (b) Pipeline-interleaving scheme Fig. 7: Extended control flow and pipeline-interleaving schemes, including data memory access. execution flow is altered by performing uniquely simple unconditional branch operations. By applying these techniques, no overhead (i.e. additional clock cycles) is caused due to execution flow control or memory access. Resulting detection latency depends uniquely on the detection process, as described in VI-C. Achievable speedup provided by parallelization approaches considered in VI remains hence unaffected. VIII. RESULTS DISCUSSION In this work a novel ASIP-based detector implementation has been presented, combining low complexity and high BER performance of TSD with high throughput of highly parallel and pipelined architectures. Performance comparison to different bread-first and depth-first detector realizations ratifies the suitability of the proposed TSD implementation. Table V depicts considered detectors features. f ref corresponds to the maximum clock frequency f max achieved by the corresponding hardware implementations on the specified CMOS technologies. Concerning this work (not including hardware realization) fref TSD = 400 MHz has been selected as reference, according to the explanations in VI-C f ref has been extracted from [14], [16] [19] corresponding to 10 4 BER or alternatively 1% FER, depending on the metric utilized MHz for SO system, 250 MHz for SISO system. by each author. 71 MHz represents τ down-scaled to min{f ref } = 71 MHz. The benefit in n reduction provided by TSD grows exponentially with N T and Q [2]. However, a 4 4 MIMO system using 16-QAM constellation has been considered to ensure comparability of results. As illustrated in table V, the proposed pipelined TSD-ASIP outperforms in terms of throughput all considered works (except [16], providing hard output) at same frequency and comparable error rate. τ TSD is 40% greater than τ corresponding to the considered n-fixed strategies [18], [19]. Compared to depthfirst approaches,τ TSD 2.5 τ [17] andτ TSD 7 τ [14] (notice that [14] is SISO capable, but τ [14] corresponds to f max achieved by SO-only system, provided by the authors). The considerable throughput increase achieved by TSD-ASIP can be further enhanced by a factor of q through SIMD vectorization (notice that except [16], considered realizations also implement some sort of parallelism and / or pipelining approaches). It should be also noticed that to achieve comparable BER performance, SNR required by TSD is much lower than SNR required by remaining approaches. IX. CONCLUSION In this work key strategies enabling efficient implementation of a complexity-reduced tuple search sphere detector have been presented. A novel flexible 8-bit fixed-point representation has been introduced, reducing the complexity compared to conventional 16-bit implementations. Proposed regularization and architectural concepts permit efficient application of pipelining and parallelization approaches. As a result, a high-throughput realization is enabled, achieving > 300Mbps at low SNR for a reference frequency of 400 MHz (considering 4 4 MIMO transmission with 16-QAM) 7. Under comparable conditions, proposed implementation achieves up to 7 times greater throughput than relevant state-of-the-art realizations, whereas greater improvement can be still achieved by exploiting SIMD vectorization and possibly using higher frequency. Low-complexity, flexibility, high throughput and high efficiency of the resulting detector make it a very favorable candidate for MIMO transmission in a wide range of applications. 7 HDL instantiation of a 4 4 system using a 64-QAM constellation and T = 8 reached f = 454 MHz on 65-nm technology, achieving a throughput of Mbps. Further results on hardware implementation analysis will be submitted to SiPS Work This work [16] [17] [14] [18] [19] Detector DF-SD (TS) DF-SD DF-SD (STS) DF-SD (STS) BF-SD (K-best) BF-FSD Input/Output SO HO SO SISO SO SO close to close to ML close to close to close to BER performance maxlogapp significant SNR loss maxlogapp maxlogapp maxlogapp suboptimal Flexibility with respect to SO 4,16,64-QAM ,64-QAM, QPSK ,4 4, QAM 16-QAM 2 2,4 4, QAM 16-QAM Technology µm 0.25 µm 90 nm 0.18 µm 0.13 µm f ref (MHz) τ f ref q 9.5dB 25dB 17.25dB 15dB τ 71 MHz q SISO - Soft Input, Soft Output HO - Hard Output DF - Depth-First tree traversal SO - Soft Output only BF - Breadth-First tree traversal TABLE V: Comparative between sphere search based MIMO detectors.

9 ACKNOWLEDGMENT The authors would like to thank the project partner Blue Wonder Communications (Dresden, Germany) for an excellent cooperation. REFERENCES [1] B. Mennenga, A. von Borany, and G. Fettweis, Complexity reduced Soft-In Soft-Out Sphere Detection based on Search Tuples, in Proceedings of the IEEE International Conference on Communications (ICC 09), Dresden, Germany, [2] E. P. Adeva and B. Mennenga, Survey on an Efficient, Low-complex Tuple Search Based Sphere Detector, Submitted to IEEE 34th Sarnoff Symposium, [3] C. Studer, A. Burg, and H. Bölcskei, Soft-output sphere decoding: Algorithms and VLSI implementation, IEEE Journal on Selected Areas in Communications, Feb [4] M. S. Yee, Max-log-MAP sphere decoder, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 05), vol. 3, Mar [5] B. Hochwald and S. ten Brink, Achieving near-capacity on a multipleantenna channel, IEEE Transactions on Communications, vol. 51, pp , Mar [6] E. Zimmermann and G. Fettweis, Unbiased MMSE Tree Search Detection for Multiple Antenna Systems, in Proceedings of the International Symposium on Wireless Personal Multimedia Communications (WPMC 06), San Diego, USA, Sep [7] B. Mennenga, Aufwandsgünstige Detektion in Mehrantennensystemen mittels komplexitätsreduzierter Baumsuchverfahren. Dissertation, Technische Universität Dresden, [8] B. Mennenga, E. Matus, and G. Fettweis, Vectorization of the Sphere Detection Algorithm, in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS 09), Taipei, Taiwan, [9] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bolcskei, VLSI implementation of MIMO detection using the sphere decoding algorithm, IEEE Journal of Solid-State Circuits, vol. 40, pp , Jul [10] J. Jaldén and B. Ottersten, Parallel Implementation of a Soft Output Sphere Decoder, in Proceedings of Asilomar Conference on Signals, Systems, and Computers, [11] Mennenga and Fettweis, Search Sequence Determination for Tree Search based Detection Algorithms, in Proceedings of the IEEE Sarnoff Symposium 2009, Princeton, USA, [12] B. Mennenga and G. Fettweis, Simplified Search Sequence and Metric Determination for Tree Search based Detection Algorithms, Submitted to the IEEE Transaction on Wireless Communications, [13] M. Myllyla, J. R. Cavallaro, and M. Juntti, Architecture Design and Implementation of the Metric First List Sphere Detector Algorithm, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, [14] E. M. Witte, F. Borlenghi, G. Ascheid, R. Leupers, and H. Meyr, A Scalable VLSI Architecture for Soft-input Soft-output Depth-first Sphere Decoding, IEEE Transactions on Circuits and Systems II: Express Briefs, [15] G. Cichon, A Novel Compiler-Friendly Micro-Architecture for Rapid Development of High-Performance and Low-Power DSPs. Dissertation, Technische Universität Dresden, [16] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bleskei, VLSI Implementation of MIMO detection using the sphere decoding algorithm, IEEE Journal on Solid-State Circuits, [17] C. Studer, A. Burg, and H. Bolcskei, Soft-output Sphere Decoding: Algorithms and VLSI Implementation, IEEE Journal on Selected Areas in Communications, [18] Z. Guo and P. Nilsson, Algorithm and Implementation of the K-best Sphere Decoding for MIMO Detection, IEEE Journal on Selected Areas in Communications, [19] B. Wu and G. Masera, A Novel VLSI Architecture of Fixed-complexity Sphere Decoder, in 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), 2010.

A New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm

A New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm A New MIMO etector Architecture Based on A Forward-Backward Trellis Algorithm Yang Sun and Joseph R Cavallaro epartment of Electrical and Computer Engineering Rice University, Houston, TX 775 Email: {ysun,

More information

A Parallel Smart Candidate Adding Algorithm for Soft-Output MIMO Detection

A Parallel Smart Candidate Adding Algorithm for Soft-Output MIMO Detection A Parallel Smart Candidate Adding Algorithm for Soft-Output MIMO Detection Ernesto Zimmermann and Gerhard Fettweis David L. Milliner and John R. Barry Vodafone Chair Mobile Communications Systems School

More information

Advanced Receiver Algorithms for MIMO Wireless Communications

Advanced Receiver Algorithms for MIMO Wireless Communications Advanced Receiver Algorithms for MIMO Wireless Communications A. Burg *, M. Borgmann, M. Wenk *, C. Studer *, and H. Bölcskei * Integrated Systems Laboratory ETH Zurich 8092 Zurich, Switzerland Email:

More information

Design of Mimo Detector using K-Best Algorithm

Design of Mimo Detector using K-Best Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 8, August 2014 1 Design of Mimo Detector using K-Best Algorithm Akila. V, Jayaraj. P Assistant Professors, Department of ECE,

More information

Performance Analysis of K-Best Sphere Decoder Algorithm for Spatial Multiplexing MIMO Systems

Performance Analysis of K-Best Sphere Decoder Algorithm for Spatial Multiplexing MIMO Systems Volume 114 No. 10 2017, 97-107 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Performance Analysis of K-Best Sphere Decoder Algorithm for Spatial

More information

Staggered Sphere Decoding for MIMO detection

Staggered Sphere Decoding for MIMO detection Staggered Sphere Decoding for MIMO detection Pankaj Bhagawat, Gwan Choi (@ece.tamu.edu) Abstract MIMO system is a key technology for future high speed wireless communication applications.

More information

A FIXED-COMPLEXITY MIMO DETECTOR BASED ON THE COMPLEX SPHERE DECODER. Luis G. Barbero and John S. Thompson

A FIXED-COMPLEXITY MIMO DETECTOR BASED ON THE COMPLEX SPHERE DECODER. Luis G. Barbero and John S. Thompson A FIXED-COMPLEXITY MIMO DETECTOR BASED ON THE COMPLEX SPHERE DECODER Luis G. Barbero and John S. Thompson Institute for Digital Communications University of Edinburgh Email: {l.barbero, john.thompson}@ed.ac.uk

More information

PARALLEL HIGH THROUGHPUT SOFT-OUTPUT SPHERE DECODER. Q. Qi, C. Chakrabarti

PARALLEL HIGH THROUGHPUT SOFT-OUTPUT SPHERE DECODER. Q. Qi, C. Chakrabarti PARALLEL HIGH THROUGHPUT SOFT-OUTPUT SPHERE DECODER Q. Qi, C. Chakrabarti School of Electrical, Computer and Energy Engineering Arizona State University, Tempe, AZ 85287-5706 {qi,chaitali}@asu.edu ABSTRACT

More information

VLSI Implementation of Hard- and Soft-Output Sphere Decoding for Wide-Band MIMO Systems

VLSI Implementation of Hard- and Soft-Output Sphere Decoding for Wide-Band MIMO Systems VLSI Implementation of Hard- and Soft-Output Sphere Decoding for Wide-Band MIMO Systems Christoph Studer 1, Markus Wenk 2, and Andreas Burg 3 1 Dept. of Electrical and Computer Engineering, Rice University,

More information

NOWADAYS, multiple-input multiple-output (MIMO)

NOWADAYS, multiple-input multiple-output (MIMO) IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 56, NO. 3, MARCH 2009 685 Probabilistic Spherical Detection and VLSI Implementation for Multiple-Antenna Systems Chester Sungchung Park,

More information

K-BEST SPHERE DECODER ALGORITHM FOR SPATIAL MULTIPLEXING MIMO SYSTEMS

K-BEST SPHERE DECODER ALGORITHM FOR SPATIAL MULTIPLEXING MIMO SYSTEMS ISSN: 50-0138 (Online) K-BEST SPHERE DECODER ALGORITHM FOR SPATIAL MULTIPLEXING MIMO SYSTEMS UMAMAHESHWAR SOMA a1, ANIL KUMAR TIPPARTI b AND SRINIVASA RAO KUNUPALLI c a Department of Electronics and Communication

More information

Soft-Input Soft-Output Sphere Decoding

Soft-Input Soft-Output Sphere Decoding Soft-Input Soft-Output Sphere Decoding Christoph Studer Integrated Systems Laboratory ETH Zurich, 09 Zurich, Switzerland Email: studer@iis.ee.ethz.ch Helmut Bölcskei Communication Technology Laboratory

More information

The Effect of Preprocessing to the Complexity of List Sphere Detector Algorithms

The Effect of Preprocessing to the Complexity of List Sphere Detector Algorithms The Effect of Preprocessing to the Complexity of List Sphere Detector Algorithms Markus Myllylä and Markku Juntti Centre for Wireless Communications P.O. Box 4, FIN-914 University of Oulu, Finland {markus.myllyla,

More information

Complexity Assessment of Sphere Decoding Methods for MIMO Detection

Complexity Assessment of Sphere Decoding Methods for MIMO Detection Complexity Assessment of Sphere Decoding Methods for MIMO Detection Johannes Fink, Sandra Roger, Alberto Gonzalez, Vicenc Almenar, Victor M. Garcia finjo@teleco.upv.es, sanrova@iteam.upv.es, agonzal@dcom.upv.es,

More information

ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7

ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7 ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7 8.7 A Programmable Turbo Decoder for Multiple 3G Wireless Standards Myoung-Cheol Shin, In-Cheol Park KAIST, Daejeon, Republic of Korea

More information

Reconfigurable Real-time MIMO Detector on GPU

Reconfigurable Real-time MIMO Detector on GPU Reconfigurable Real-time MIMO Detector on GPU Michael Wu, Yang Sun, and Joseph R. Cavallaro Electrical and Computer Engineering Rice University, Houston, Texas 775 {mbw, ysun, cavallar}@rice.edu Abstract

More information

DUE to the high spectral efficiency, multiple-input multiple-output

DUE to the high spectral efficiency, multiple-input multiple-output IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 1, JANUARY 2012 135 A 675 Mbps, 4 4 64-QAM K-Best MIMO Detector in 0.13 m CMOS Mahdi Shabany, Associate Member, IEEE, and

More information

Analysis of a fixed-complexity sphere decoding method for Spatial Multiplexing MIMO

Analysis of a fixed-complexity sphere decoding method for Spatial Multiplexing MIMO Analysis of a fixed-complexity sphere decoding method for Spatial Multiplexing MIMO M. HARIDIM and H. MATZNER Department of Communication Engineering HIT-Holon Institute of Technology 5 Golomb St., Holon

More information

RAPID PROTOTYPING OF THE SPHERE DECODER FOR MIMO SYSTEMS

RAPID PROTOTYPING OF THE SPHERE DECODER FOR MIMO SYSTEMS RAPID PROTOTYPING OF THE SPHERE DECODER FOR MIMO SYSTEMS Luis G. Barbero and John S. Thompson Institute for Digital Communications University of Edinburgh Email: {l.barbero, john.thompson}@ed.ac.uk Keywords:

More information

Polar Codes for Noncoherent MIMO Signalling

Polar Codes for Noncoherent MIMO Signalling ICC Polar Codes for Noncoherent MIMO Signalling Philip R. Balogun, Ian Marsland, Ramy Gohary, and Halim Yanikomeroglu Department of Systems and Computer Engineering, Carleton University, Canada WCS IS6

More information

Transaction Level Model Simulator for NoC-based MPSoC Platform

Transaction Level Model Simulator for NoC-based MPSoC Platform Proceedings of the 6th WSEAS International Conference on Instrumentation, Measurement, Circuits & Systems, Hangzhou, China, April 15-17, 27 17 Transaction Level Model Simulator for NoC-based MPSoC Platform

More information

Parallel High Throughput Soft-Output Sphere Decoding Algorithm

Parallel High Throughput Soft-Output Sphere Decoding Algorithm J Sign Process Syst (2012) 68:217 231 DOI 10.1007/s11265-011-0602-1 Parallel High Throughput Soft-Output Sphere Decoding Algorithm Qi Qi Chaitali Chakrabarti Received: 6 October 2010 / Revised: 9 June

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,

More information

RECENTLY, researches on gigabit wireless personal area

RECENTLY, researches on gigabit wireless personal area 146 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 55, NO. 2, FEBRUARY 2008 An Indexed-Scaling Pipelined FFT Processor for OFDM-Based WPAN Applications Yuan Chen, Student Member, IEEE,

More information

Joint PHY/MAC Based Link Adaptation for Wireless LANs with Multipath Fading

Joint PHY/MAC Based Link Adaptation for Wireless LANs with Multipath Fading Joint PHY/MAC Based Link Adaptation for Wireless LANs with Multipath Fading Sayantan Choudhury and Jerry D. Gibson Department of Electrical and Computer Engineering University of Califonia, Santa Barbara

More information

Flexible N-Way MIMO Detector on GPU

Flexible N-Way MIMO Detector on GPU 01 IEEE Workshop on Signal Processing Systems Flexible N-Way MIMO Detector on GPU Michael Wu, Bei Yin, Joseph R. Cavallaro Electrical and Computer Engineering Rice University, Houston, Texas 77005 {mbw,

More information

Flexible N-Way MIMO Detector on GPU

Flexible N-Way MIMO Detector on GPU Flexible N-Way MIMO Detector on GPU Michael Wu, Bei Yin, Joseph R. Cavallaro Electrical and Computer Engineering Rice University, Houston, Texas 77005 {mbw2, by2, cavallar}@rice.edu Abstract This paper

More information

Semidefinite Relaxation-Based Soft MIMO. Demodulation via Efficient Dual Scaling

Semidefinite Relaxation-Based Soft MIMO. Demodulation via Efficient Dual Scaling Semidefinite Relaxation-Based Soft MIMO Demodulation via Efficient Dual Scaling SEMIDEFINITE RELAXATION-BASED SOFT MIMO DEMODULATION VIA EFFICIENT DUAL SCALING BY MAHSA SALMANI, B.Sc. a thesis submitted

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

Payload Length and Rate Adaptation for Throughput Optimization in Wireless LANs

Payload Length and Rate Adaptation for Throughput Optimization in Wireless LANs Payload Length and Rate Adaptation for Throughput Optimization in Wireless LANs Sayantan Choudhury and Jerry D. Gibson Department of Electrical and Computer Engineering University of Califonia, Santa Barbara

More information

Performance of Sphere Decoder for MIMO System using Radius Choice Algorithm

Performance of Sphere Decoder for MIMO System using Radius Choice Algorithm International Journal of Current Engineering and Technology ISSN 2277 4106 2013 INPRESSCO. All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Performance of Sphere Decoder

More information

Baseline V IRAM Trimedia. Cycles ( x 1000 ) N

Baseline V IRAM Trimedia. Cycles ( x 1000 ) N CS 252 COMPUTER ARCHITECTURE MAY 2000 An Investigation of the QR Decomposition Algorithm on Parallel Architectures Vito Dai and Brian Limketkai Abstract This paper presents an implementation of a QR decomposition

More information

DESIGN AND IMPLEMENTATION OF VARIABLE RADIUS SPHERE DECODING ALGORITHM

DESIGN AND IMPLEMENTATION OF VARIABLE RADIUS SPHERE DECODING ALGORITHM DESIGN AND IMPLEMENTATION OF VARIABLE RADIUS SPHERE DECODING ALGORITHM Wu Di, Li Dezhi and Wang Zhenyong School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, China

More information

A Connection between Network Coding and. Convolutional Codes

A Connection between Network Coding and. Convolutional Codes A Connection between Network Coding and 1 Convolutional Codes Christina Fragouli, Emina Soljanin christina.fragouli@epfl.ch, emina@lucent.com Abstract The min-cut, max-flow theorem states that a source

More information

On the Optimizing of LTE System Performance for SISO and MIMO Modes

On the Optimizing of LTE System Performance for SISO and MIMO Modes 2015 Third International Conference on Artificial Intelligence, Modelling and Simulation On the Optimizing of LTE System Performance for SISO and MIMO Modes Ali Abdulqader Bin Salem, Yung-Wey Chong, Sabri

More information

Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding

Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding LETTER IEICE Electronics Express, Vol.14, No.21, 1 11 Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding Rongshan Wei a) and Xingang Zhang College of Physics

More information

OVER the past decade, multiple-input multiple-output

OVER the past decade, multiple-input multiple-output IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 62, NO. 21, NOVEMBER 1, 2014 5505 Reduced Complexity Soft-Output MIMO Sphere Detectors Part I: Algorithmic Optimizations Mohammad M. Mansour, Senior Member,

More information

Performance of multi-relay cooperative communication using decode and forward protocol

Performance of multi-relay cooperative communication using decode and forward protocol Performance of multi-relay cooperative communication using decode and forward protocol 1, Nasaruddin, 1 Mayliana, 1 Roslidar 1 Department of Electrical Engineering, Syiah Kuala University, Indonesia. Master

More information

LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision

LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLIC HERE TO EDIT < LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision Bo Yuan and eshab. Parhi, Fellow,

More information

Distributed MIMO. Patrick Maechler. April 2, 2008

Distributed MIMO. Patrick Maechler. April 2, 2008 Distributed MIMO Patrick Maechler April 2, 2008 Outline 1. Motivation: Collaboration scheme achieving optimal capacity scaling 2. Distributed MIMO 3. Synchronization errors 4. Implementation 5. Conclusion/Outlook

More information

A New Approach to Determining the Time-Stamping Counter's Overhead on the Pentium Pro Processors *

A New Approach to Determining the Time-Stamping Counter's Overhead on the Pentium Pro Processors * A New Approach to Determining the Time-Stamping Counter's Overhead on the Pentium Pro Processors * Hsin-Ta Chiao and Shyan-Ming Yuan Department of Computer and Information Science National Chiao Tung University

More information

Extended Dataflow Model For Automated Parallel Execution Of Algorithms

Extended Dataflow Model For Automated Parallel Execution Of Algorithms Extended Dataflow Model For Automated Parallel Execution Of Algorithms Maik Schumann, Jörg Bargenda, Edgar Reetz and Gerhard Linß Department of Quality Assurance and Industrial Image Processing Ilmenau

More information

EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL INTERLEAVER FOR LONG TERM EVOLUTION SYSTEMS

EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL INTERLEAVER FOR LONG TERM EVOLUTION SYSTEMS Rev. Roum. Sci. Techn. Électrotechn. et Énerg. Vol. 61, 1, pp. 53 57, Bucarest, 016 Électronique et transmission de l information EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL

More information

A tree-search algorithm for ML decoding in underdetermined MIMO systems

A tree-search algorithm for ML decoding in underdetermined MIMO systems A tree-search algorithm for ML decoding in underdetermined MIMO systems Gianmarco Romano #1, Francesco Palmieri #2, Pierluigi Salvo Rossi #3, Davide Mattera 4 # Dipartimento di Ingegneria dell Informazione,

More information

AN ABSTRACT OF THE DISSERTATION OF

AN ABSTRACT OF THE DISSERTATION OF AN ABSTRACT OF THE DISSERTATION OF Qingwei Li for the degree of Doctor of Philosophy in Electrical and Computer Engineering presented on January 4, 008. Title: Efficient VLSI Architectures for MIMO and

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

Chapter 9. Pipelining Design Techniques

Chapter 9. Pipelining Design Techniques Chapter 9 Pipelining Design Techniques 9.1 General Concepts Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to be performed in sequence. Each subtask

More information

ARELAY network consists of a pair of source and destination

ARELAY network consists of a pair of source and destination 158 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 55, NO 1, JANUARY 2009 Parity Forwarding for Multiple-Relay Networks Peyman Razaghi, Student Member, IEEE, Wei Yu, Senior Member, IEEE Abstract This paper

More information

THE turbo code is one of the most attractive forward error

THE turbo code is one of the most attractive forward error IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 63, NO. 2, FEBRUARY 2016 211 Memory-Reduced Turbo Decoding Architecture Using NII Metric Compression Youngjoo Lee, Member, IEEE, Meng

More information

Multi-Hop Virtual MIMO Communication using STBC and Relay Selection

Multi-Hop Virtual MIMO Communication using STBC and Relay Selection Multi-Hop Virtual MIMO Communication using STBC and Relay Selection Athira D. Nair, Aswathy Devi T. Department of Electronics and Communication L.B.S. Institute of Technology for Women Thiruvananthapuram,

More information

GPU Acceleration of a Configurable N-Way MIMO Detector for Wireless Systems

GPU Acceleration of a Configurable N-Way MIMO Detector for Wireless Systems Noname manuscript No. (will be inserted by the editor) GPU Acceleration of a Configurable N-Way MIMO Detector for Wireless Systems Michael Wu Bei Yin Guohui Wang Christoph Studer Joseph R. Cavallaro Received:

More information

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 05, 2015 ISSN (online): 2321-0613 VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila

More information

«Computer Science» Requirements for applicants by Innopolis University

«Computer Science» Requirements for applicants by Innopolis University «Computer Science» Requirements for applicants by Innopolis University Contents Architecture and Organization... 2 Digital Logic and Digital Systems... 2 Machine Level Representation of Data... 2 Assembly

More information

Flexible wireless communication architectures

Flexible wireless communication architectures Flexible wireless communication architectures Sridhar Rajagopal Department of Electrical and Computer Engineering Rice University, Houston TX Faculty Candidate Seminar Southern Methodist University April

More information

Chain Coding Streamed Images through Crack Run-Length Encoding

Chain Coding Streamed Images through Crack Run-Length Encoding Chain Coding Streamed Images through Crack Run-Length Encoding D.G. Bailey School of Engineering and Advanced Technology, Massey University, Palmerston North, New Zealand. Email: D.G.Bailey@massey.ac.nz

More information

Optimized ARM-Based Implementation of Low Density Parity Check Code (LDPC) Decoder in China Digital Radio (CDR)

Optimized ARM-Based Implementation of Low Density Parity Check Code (LDPC) Decoder in China Digital Radio (CDR) Optimized ARM-Based Implementation of Low Density Parity Check Code (LDPC) Decoder in China Digital Radio (CDR) P. Vincy Priscilla 1, R. Padmavathi 2, S. Tamilselvan 3, Dr.S. Kanthamani 4 1,4 Department

More information

Video-Aware Wireless Networks (VAWN) Final Meeting January 23, 2014

Video-Aware Wireless Networks (VAWN) Final Meeting January 23, 2014 Video-Aware Wireless Networks (VAWN) Final Meeting January 23, 2014 1/26 ! Real-time Video Transmission! Challenges and Opportunities! Lessons Learned for Real-time Video! Mitigating Losses in Scalable

More information

CHAPTER 5 PROPAGATION DELAY

CHAPTER 5 PROPAGATION DELAY 98 CHAPTER 5 PROPAGATION DELAY Underwater wireless sensor networks deployed of sensor nodes with sensing, forwarding and processing abilities that operate in underwater. In this environment brought challenges,

More information

MULTIPLE-INPUT MULTIPLE-OUTPUT (MIMO) antenna

MULTIPLE-INPUT MULTIPLE-OUTPUT (MIMO) antenna 408 IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 4, APRIL 2015 A Near-ML MIMO Subspace Detection Algorithm Mohammad M. Mansour, Senior Member, IEEE Abstract A low-complexity MIMO detection scheme is presented

More information

IMPLEMENTATION OF A BIT ERROR RATE TESTER OF A WIRELESS COMMUNICATION SYSTEM ON AN FPGA

IMPLEMENTATION OF A BIT ERROR RATE TESTER OF A WIRELESS COMMUNICATION SYSTEM ON AN FPGA IMPLEMENTATION OF A BIT ERROR RATE TESTER OF A WIRELESS COMMUNICATION SYSTEM ON AN FPGA Lakshmy Sukumaran 1, Dharani K G 2 1 Student, Electronics and communication, MVJ College of Engineering, Bangalore-560067

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Research Article Cooperative Signaling with Soft Information Combining

Research Article Cooperative Signaling with Soft Information Combining Electrical and Computer Engineering Volume 2010, Article ID 530190, 5 pages doi:10.1155/2010/530190 Research Article Cooperative Signaling with Soft Information Combining Rui Lin, Philippa A. Martin, and

More information

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

Reduction of Periodic Broadcast Resource Requirements with Proxy Caching

Reduction of Periodic Broadcast Resource Requirements with Proxy Caching Reduction of Periodic Broadcast Resource Requirements with Proxy Caching Ewa Kusmierek and David H.C. Du Digital Technology Center and Department of Computer Science and Engineering University of Minnesota

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

An Implementation of a Soft-Input Stack Decoder For Tailbiting Convolutional Codes

An Implementation of a Soft-Input Stack Decoder For Tailbiting Convolutional Codes An Implementation of a Soft-Input Stack Decoder For Tailbiting Convolutional Codes Mukundan Madhavan School Of ECE SRM Institute Of Science And Technology Kancheepuram 603203 mukundanm@gmail.com Andrew

More information

Degrees of Freedom in Cached Interference Networks with Limited Backhaul

Degrees of Freedom in Cached Interference Networks with Limited Backhaul Degrees of Freedom in Cached Interference Networks with Limited Backhaul Vincent LAU, Department of ECE, Hong Kong University of Science and Technology (A) Motivation Interference Channels 3 No side information

More information

LOW-DENSITY PARITY-CHECK (LDPC) codes [1] can

LOW-DENSITY PARITY-CHECK (LDPC) codes [1] can 208 IEEE TRANSACTIONS ON MAGNETICS, VOL 42, NO 2, FEBRUARY 2006 Structured LDPC Codes for High-Density Recording: Large Girth and Low Error Floor J Lu and J M F Moura Department of Electrical and Computer

More information

On the Achievable Diversity-Multiplexing Tradeoff in Half Duplex Cooperative Channels

On the Achievable Diversity-Multiplexing Tradeoff in Half Duplex Cooperative Channels On the Achievable Diversity-Multiplexing Tradeoff in Half Duplex Cooperative Channels Kambiz Azarian, Hesham El Gamal, and Philip Schniter Dept. of Electrical Engineering, The Ohio State University Columbus,

More information

EECS150 - Digital Design Lecture 09 - Parallelism

EECS150 - Digital Design Lecture 09 - Parallelism EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization

More information

A Phase-Coupled Compiler Backend for a New VLIW Processor Architecture Using Two-step Register Allocation

A Phase-Coupled Compiler Backend for a New VLIW Processor Architecture Using Two-step Register Allocation A Phase-Coupled Compiler Backend for a New VLIW Processor Architecture Using Two-step Register Allocation Jie Guo, Jun Liu, Björn Mennenga and Gerhard P. Fettweis Vodafone Chair Mobile Communications Systems

More information

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? RISC remainder (our assumptions) What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Performance of UMTS Radio Link Control

Performance of UMTS Radio Link Control Performance of UMTS Radio Link Control Qinqing Zhang, Hsuan-Jung Su Bell Laboratories, Lucent Technologies Holmdel, NJ 77 Abstract- The Radio Link Control (RLC) protocol in Universal Mobile Telecommunication

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more

More information

High Speed ACSU Architecture for Viterbi Decoder Using T-Algorithm

High Speed ACSU Architecture for Viterbi Decoder Using T-Algorithm High Speed ACSU Architecture for Viterbi Decoder Using T-Algorithm Atish A. Peshattiwar & Tejaswini G. Panse Department of Electronics Engineering, Yeshwantrao Chavan College of Engineering, E-mail : atishp32@gmail.com,

More information

On combining chase-2 and sum-product algorithms for LDPC codes

On combining chase-2 and sum-product algorithms for LDPC codes University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2012 On combining chase-2 and sum-product algorithms

More information

PIPELINE AND VECTOR PROCESSING

PIPELINE AND VECTOR PROCESSING PIPELINE AND VECTOR PROCESSING PIPELINING: Pipelining is a technique of decomposing a sequential process into sub operations, with each sub process being executed in a special dedicated segment that operates

More information

A Modified Medium Access Control Algorithm for Systems with Iterative Decoding

A Modified Medium Access Control Algorithm for Systems with Iterative Decoding A Modified Medium Access Control Algorithm for Systems with Iterative Decoding Inkyu Lee Carl-Erik W. Sundberg Sunghyun Choi Dept. of Communications Eng. Korea University Seoul, Korea inkyu@korea.ac.kr

More information

Pattern based Residual Coding for H.264 Encoder *

Pattern based Residual Coding for H.264 Encoder * Pattern based Residual Coding for H.264 Encoder * Manoranjan Paul and Manzur Murshed Gippsland School of Information Technology, Monash University, Churchill, Vic-3842, Australia E-mail: {Manoranjan.paul,

More information

High Speed Downlink Packet Access efficient turbo decoder architecture: 3GPP Advanced Turbo Decoder

High Speed Downlink Packet Access efficient turbo decoder architecture: 3GPP Advanced Turbo Decoder I J C T A, 9(24), 2016, pp. 291-298 International Science Press High Speed Downlink Packet Access efficient turbo decoder architecture: 3GPP Advanced Turbo Decoder Parvathy M.*, Ganesan R.*,** and Tefera

More information

Reliable Communication using Packet Coding for Underwater Acoustic Channels

Reliable Communication using Packet Coding for Underwater Acoustic Channels Reliable Communication using Packet Coding for Underwater Acoustic Channels Rameez Ahmed and Milica Stojanovic Northeastern University, Boston, MA 02115, USA Email: rarameez@ece.neu.edu, millitsa@ece.neu.edu

More information

Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization

Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization Rethinking On-chip DRAM Cache for Simultaneous Performance and Energy Optimization Fazal Hameed and Jeronimo Castrillon Center for Advancing Electronics Dresden (cfaed), Technische Universität Dresden,

More information

King Fahd University of Petroleum & Minerals. Electrical Engineering Department. EE 575 Information Theory

King Fahd University of Petroleum & Minerals. Electrical Engineering Department. EE 575 Information Theory King Fahd University of Petroleum & Minerals Electrical Engineering Department EE 575 Information Theory BER Performance of VBLAST Detection Schemes over MIMO Channels Ali Abdullah Al-Saihati ID# 200350130

More information

A Review on Analysis on Codes using Different Algorithms

A Review on Analysis on Codes using Different Algorithms A Review on Analysis on Codes using Different Algorithms Devansh Vats Gaurav Kochar Rakesh Joon (ECE/GITAM/MDU) (ECE/GITAM/MDU) (HOD-ECE/GITAM/MDU) Abstract-Turbo codes are a new class of forward error

More information

A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE. (Extended Abstract) Gyula A. Mag6. University of North Carolina at Chapel Hill

A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE. (Extended Abstract) Gyula A. Mag6. University of North Carolina at Chapel Hill 447 A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE (Extended Abstract) Gyula A. Mag6 University of North Carolina at Chapel Hill Abstract If a VLSI computer architecture is to influence the field

More information

The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor

The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 6, June 1994, pp. 573-584.. The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor David J.

More information

DESIGN AND ANALYSIS OF ALGORITHMS. Unit 1 Chapter 4 ITERATIVE ALGORITHM DESIGN ISSUES

DESIGN AND ANALYSIS OF ALGORITHMS. Unit 1 Chapter 4 ITERATIVE ALGORITHM DESIGN ISSUES DESIGN AND ANALYSIS OF ALGORITHMS Unit 1 Chapter 4 ITERATIVE ALGORITHM DESIGN ISSUES http://milanvachhani.blogspot.in USE OF LOOPS As we break down algorithm into sub-algorithms, sooner or later we shall

More information

Parallel graph traversal for FPGA

Parallel graph traversal for FPGA LETTER IEICE Electronics Express, Vol.11, No.7, 1 6 Parallel graph traversal for FPGA Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,

More information

Experiment 3. Getting Start with Simulink

Experiment 3. Getting Start with Simulink Experiment 3 Getting Start with Simulink Objectives : By the end of this experiment, the student should be able to: 1. Build and simulate simple system model using Simulink 2. Use Simulink test and measurement

More information

Abstract. Keywords. 1. Introduction. Suneeta V. Budihal 1, Rajeshwari M. Banakar 2

Abstract. Keywords. 1. Introduction. Suneeta V. Budihal 1, Rajeshwari M. Banakar 2 Complexity analysis of MIMO sphere decoder using radius choice and increasing radius algorithms Suneeta V. Budihal 1, Rajeshwari M. Banakar 2 Abstract Maximum Likelihood (ML) detection is the optimum method

More information

IN mobile wireless networks, communications typically take

IN mobile wireless networks, communications typically take IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL 2, NO 2, APRIL 2008 243 Optimal Dynamic Resource Allocation for Multi-Antenna Broadcasting With Heterogeneous Delay-Constrained Traffic Rui Zhang,

More information

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Module 5 Introduction to Parallel Processing Systems

Module 5 Introduction to Parallel Processing Systems Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this

More information

Algorithm-Architecture Co- Design for Efficient SDR Signal Processing

Algorithm-Architecture Co- Design for Efficient SDR Signal Processing Algorithm-Architecture Co- Design for Efficient SDR Signal Processing Min Li, limin@imec.be Wireless Research, IMEC Introduction SDR Baseband Platforms Today are Usually Based on ILP + DLP + MP Massive

More information

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm 1 A.Malashri, 2 C.Paramasivam 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,

More information

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts C-2 Appendix C Pipelining: Basic and Intermediate Concepts C.1 Introduction Many readers of this text will have covered the basics of pipelining in another text (such as our more basic text Computer Organization

More information

Coordinated Multi-Point in Mobile Communications

Coordinated Multi-Point in Mobile Communications Coordinated Multi-Point in Mobile Communications From Theory to Practice Edited by PATRICK MARSCH Nokia Siemens Networks, Wroctaw, Poland GERHARD P. FETTWEIS Technische Universität Dresden, Germany Pf

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information