By Andrew Siska, Applications Engineer Sr Staff, and Meng He, Product Marketing Engineer Sr, Cypress Semiconductor Corp.

The term bit slicing is found today mostly in history books, as a technique for constructing a processor from modules of smaller bit width, where each module processes one field or "slice" of an operand. Bit-slice processors usually consist of an arithmetic logic unit (ALU) of 2, 4 or 8 bits and control lines. Using multiple, simpler ALUs was seen as a way to increase computing power in a cost-effective manner. The latest system-on-chip (SOC) technology revives bit-slicing in a programmable fashion, offloading the main CPU by intelligently assigning processing tasks to other on-chip programmable hardware.

How did bit-slice evolve over the years?

In the early 70s, a number of very complex microprocessor designs passed the 8-bit barrier using very simple arithmetic logic units (ALUs). These sophisticated programmable digital systems weren't designed using 8-, 16- or 32-bit microprocessors but rather cascaded 4-bit processors, known as bit-slice processors. These processors had very simple instruction sets (much simpler than today's RISC processors) but performed some very sophisticated processing. Devices such as AMD's Am2900 family and National Semiconductor's IMP-16 and IMP-8 were typically found in aviation systems, guidance and tracking systems, and early signal processing applications.

Many of these bit-slice processors have gone the way of through-hole components and have been replaced by the more popular 8- through 32-bit processors found in the market today. However, bit-slice processors are still found in some military, aerospace, industrial, and academic designs, and they are far from dead. The marriage of programmable logic, such as PLDs and FPGAs, with multiple reduced-instruction-set ALUs has opened up a new palette for digital designers.
The programmable face-off of bit-slice technology

Given the numerous microprocessors and microcontrollers on the market today, why would one build a design using bit-slicing techniques? Given the many embedded designs the reader has probably completed during their design career, the answer is simple: there are numerous tasks better performed by hardware than software. Yet to keep production costs down, it is often more cost-effective to select a high-performance processor and implement the hardware functions in software. What if, instead of opting for a high-performance processor, a designer were able to use a low-cost microprocessor that included programmable logic and a number of simple instruction set ALUs? The microprocessor would then be able to perform simple tasks while the programmable logic and ALUs would handle the more complex, higher bit width processes.

Let us explore a device with 24 such ALUs, which we will call data-paths, combined with a mixture of PLDs. The data-path shown in Figure 1 below contains:

1. An 8-bit single-cycle ALU that can perform general-purpose functions including add, subtract, AND, OR, XOR, and PASS
2. Associated compare and condition generation circuits
3. Built-in Cyclic Redundancy Check (CRC) and Pseudo Random Sequence (PRS) generation
4. A variable most significant bit (MSB) that can be programmably specified for arbitrary-width digital functions
5. Two 4-byte deep FIFOs, two 8-bit wide data registers and two 8-bit accumulators
6. Data inputs that support several types of input: configuration, control, and serial and parallel data
7. Data outputs that can carry various signals, such as conditional and status data
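To make the ALU functions in the list above concrete, here is a minimal software model of one 8-bit data-path ALU with a few condition outputs. It is only a sketch: the names `alu8` and `conditions`, the operation strings, and the choice of condition outputs are assumptions made for illustration and do not correspond to any Cypress API or register layout.

```python
# Illustrative software model only; all names here are hypothetical
# and are not part of any Cypress API.
MASK8 = 0xFF


def alu8(op, a, b=0, carry_in=0):
    """Run one 8-bit ALU operation and return (result, carry_out)."""
    a &= MASK8
    b &= MASK8
    if op == "ADD":
        total = a + b + carry_in
    elif op == "SUB":
        total = a - b - carry_in  # carry_in is treated as a borrow here
    elif op == "AND":
        total = a & b
    elif op == "OR":
        total = a | b
    elif op == "XOR":
        total = a ^ b
    elif op == "PASS":
        total = a
    else:
        raise ValueError(f"unsupported operation: {op}")
    # Bit 8 of the raw total is the carry out (ADD) or borrow out (SUB).
    return total & MASK8, (total >> 8) & 1


def conditions(result, compare_value):
    """Model simple compare/condition outputs for an ALU result."""
    compare_value &= MASK8
    return {
        "zero": result == 0,
        "equal": result == compare_value,
        "less_than": result < compare_value,
    }
```

For example, `alu8("ADD", 0xF0, 0x20)` yields a result of `0x10` with the carry set, which is the signal a neighboring data-path would consume when slices are coupled.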
Figure 1: Datapath Architecture

Each one of these 8-bit data-paths can be coupled to its 8-bit data-path neighbor, which in turn can be coupled to its neighbor, and so on. An architecture of this nature effectively yields an 8- to n-bit processor in multiples of 8 bits. Note that the FIFOs, data registers, accumulators, and ALUs in the data-path can all be configured as n-bit in this manner. In addition, multi-byte data-path modules automatically chain together the 8-bit data-paths, along with the control signals and status outputs of each data-path in the module. For instance, if 8 bits is not enough for a particular application, the data-path can be coupled to a neighboring data-path to form a 16-bit or wider processor.

An additional benefit of this architecture is that each instruction requires only 1 clock cycle. As a consequence, designs will run at hardware speed instead of processor state speed.

In applications that are oversampled, or do not need the highest clock rates, the single ALU block in the data-path can be efficiently shared between two sets of registers and condition generators. ALU and shift outputs are registered and can be used as inputs in subsequent cycles. Usage examples include support for 16-bit functions in one (8-bit) data-path or interleaving a CRC generation operation with a data shift operation.

An enhancement made to the standard bit-slice architecture is the inclusion of programmable logic. This allows developers to include a standard state machine using Verilog. In addition, arithmetic functions that normally consume a large number of logic gates are no longer a concern because these functions can be implemented in the standard ALU and controlled by the state machine.

Also note that the main processor and ALUs can run on separate clocks. For instance, the core processor can be clocked at 24 MHz while the ALUs can be clocked at 48 MHz or higher. Figure 2 below shows three 8-bit ALUs or data-paths chained together to form a 24-bit processor.
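The coupling of neighboring 8-bit data-paths can be sketched in software as a ripple of carries from one slice to the next. The model below is an illustration under stated assumptions: `add_slice` and `chained_add` are hypothetical names, and the loop runs the slices one after another, whereas the hardware described here completes the operation in a single clock.

```python
# Illustrative model of chaining 8-bit slices; names are hypothetical.

def add_slice(a, b, carry_in):
    """Add two 8-bit values plus carry; return (sum, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def chained_add(x, y, slices):
    """Add two (8 * slices)-bit values using chained 8-bit slices."""
    carry = 0
    result = 0
    for i in range(slices):
        # Slice i handles bits 8*i .. 8*i+7 of each operand.
        a = (x >> (8 * i)) & 0xFF
        b = (y >> (8 * i)) & 0xFF
        partial, carry = add_slice(a, b, carry)
        result |= partial << (8 * i)
    # The carry out of the most significant slice is the overall carry.
    return result, carry
```

With three slices, as in the 24-bit arrangement, `chained_add(0x00FFFF, 0x000001, 3)` carries through the two low slices into the third. Adding a fourth slice extends the same model to 32 bits without changing `add_slice`, which is the essence of the bit-slice approach.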
Figure 2: Datapath Chaining Architecture

A 16-bit example

In this example, we are going to create a 16-bit pattern generator, with the 16-bit pattern shifted out continuously, using the PSoC 3 Programmable System-on-Chip and the PSoC Creator development environment from Cypress Semiconductor. In this project, we use only the digital portion of the chip, without involving the main CPU. One data-path is set up for the least significant 8 bits and another data-path for the most significant 8 bits. Figure 3 shows the data-path configuration for the least significant 8 bits of the 16-bit pattern generator, and Figure 4 shows the data-path configuration for the most significant 8 bits.
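The behavior of such a pattern generator can be approximated with a small software model of two 8-bit slices clocked together, where the bit leaving the high-order slice enters the low-order slice on each right shift. This is a sketch, not the project's actual configuration: the function names are hypothetical, and routing the low-order slice's outgoing bit back into the high-order slice (so the pattern repeats) is an assumption made here to obtain a continuous output.

```python
# Illustrative model of a 16-bit pattern generator built from two
# 8-bit slices; names are hypothetical.

def shift_right_slice(value, shift_in):
    """Shift an 8-bit slice right by one; shift_in enters at bit 7."""
    return ((shift_in & 1) << 7) | ((value & 0xFF) >> 1)


def pattern_generator(pattern, clocks):
    """Return the serial output bits produced over the given clocks."""
    high = (pattern >> 8) & 0xFF
    low = pattern & 0xFF
    output = []
    for _ in range(clocks):
        # Both slices share one clock, so sample the outgoing bits
        # before either slice is updated.
        low_out = low & 1
        high_out = high & 1
        # Assumption: the low slice's outgoing bit wraps around to the
        # high slice so that the pattern recirculates.
        high = shift_right_slice(high, low_out)
        # The bit shifted out of the high slice enters the low slice.
        low = shift_right_slice(low, high_out)
        output.append(low_out)
    return output
```

Calling `pattern_generator(0xA5C3, 16)` returns the sixteen bits of the pattern, least significant bit first, and further clocks repeat the same sequence.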
Figure 3: Datapath Configuration LSB
Figure 4: Datapath Configuration MSB

In both Figure 3 and Figure 4, the ALU instructions are identical. A reset or clear of A0 (accumulator 0) is performed when Dynamic Configuration register 0 is pointed to by the state machine. The value in A0 is shifted right one bit when the state machine points to Dynamic Configuration register 1, and the value in A1 (accumulator 1) is incremented when Dynamic Configuration register 3 is pointed to. The bit shifted out of the high-order ALU is shifted into the low-order ALU. Shifting into and out of an ALU is accomplished by setting CHAIN in the SIA field of Static Configuration Register 6 in the low-order ALU (Figure 3) and setting CHAIN in the CIA field of Static Configuration Register 6 in the high-order ALU (Figure 4).

Since both the high- and low-order 8-bit data-paths are clocked by a common clock, they act as a single 16-bit processor and are completely independent of the central processor: no firmware, processor intervention, or stolen processor cycles are needed to run the pattern generator.

This simple project demonstrates how to connect multiple data-path ALUs. Rather than requiring a high-performance microcontroller to run tasks in what appears to be real time, developers can use a simple microcontroller to manage the application and leave the real-time background tasks to multiple ALUs combined with programmable logic.

System-on-chip (SOC) technology revives bit-slicing in a programmable fashion, offloading the main CPU by intelligently assigning processing tasks to other on-chip programmable hardware. With a bit-slicing architecture, developers can implement not only a standard state machine but also the arithmetic functions that normally consume a large number of logic gates.
Neither is a cause for concern because these will be implemented in the standard ALU contained in the data-path logic and/or controlled by the PLD-based state machine, allowing the modern embedded system engineer to focus on overall system power consumption and efficiency.
Cypress Semiconductor
198 Champion Court
San Jose, CA 95134-1709
Phone: 408-943-2600
Fax: 408-943-4730
http://www.cypress.com

Cypress Semiconductor Corporation, 2007. The information contained herein is subject to change without notice. Cypress Semiconductor Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in a Cypress product. Nor does it convey or imply any license under patent or other rights. Cypress products are not warranted nor intended to be used for medical, life support, life saving, critical control or safety applications, unless pursuant to an express written agreement with Cypress. Furthermore, Cypress does not authorize its products for use as critical components in life-support systems where a malfunction or failure may reasonably be expected to result in significant injury to the user. The inclusion of Cypress products in life-support systems application implies that the manufacturer assumes all risk of such use and in doing so indemnifies Cypress against all charges.

PSoC Designer, Programmable System-on-Chip, and PSoC Express are trademarks and PSoC is a registered trademark of Cypress Semiconductor Corp. All other trademarks or registered trademarks referenced herein are property of the respective corporations.

This Source Code (software and/or firmware) is owned by Cypress Semiconductor Corporation (Cypress) and is protected by and subject to worldwide patent protection (United States and foreign), United States copyright laws and international treaty provisions. Cypress hereby grants to licensee a personal, non-exclusive, non-transferable license to copy, use, modify, create derivative works of, and compile the Cypress Source Code and derivative works for the sole purpose of creating custom software and/or firmware in support of licensee product to be used only in conjunction with a Cypress integrated circuit as specified in the applicable agreement.
Any reproduction, modification, translation, compilation, or representation of this Source Code except as specified above is prohibited without the express written permission of Cypress.

Disclaimer: CYPRESS MAKES NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Cypress reserves the right to make changes without further notice to the materials described herein. Cypress does not assume any liability arising out of the application or use of any product or circuit described herein. Use may be limited by and subject to the applicable Cypress software license agreement.