In the previous section we discussed the kinds of parallelism that can exist in an application. Modern processors employ intelligent architectures to exploit the parallelism present in computer programs. In this section we will discuss different processor architectures.

(1) Von Neumann (or stored-program computer) architecture

The Von Neumann architecture was one of the earliest computer architectures. At the time of its invention, computer programs were small and simple, and memory was very expensive. In a Von Neumann architecture, the program and data are stored in the same memory and accessed over the same bus. Each instruction is fetched (read from memory), decoded, and executed. During the decode stage, any operands (if needed) are fetched from the same memory. Von Neumann computers are also called stored-program computers, because the program is held in memory just like data, rather than being hard-wired into the machine; the same hardware can therefore run a different program simply by loading it into memory.
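The fetch-decode-execute cycle over a single shared memory can be sketched with a toy machine. The instruction set below is invented purely for illustration, not any real ISA:

```python
# Toy stored-program machine: program and data share one memory (a list),
# and the CPU repeatedly fetches, decodes, and executes instructions.

def run(memory):
    """Fetch-decode-execute loop over a shared program/data memory.
    Instructions: ("LOAD", addr), ("ADD", addr), ("STORE", addr), ("HALT",).
    """
    acc = 0          # single accumulator register
    pc = 0           # program counter
    while True:
        op = memory[pc]          # fetch (from the same memory as the data)
        pc += 1
        kind = op[0]             # decode
        if kind == "LOAD":       # execute
            acc = memory[op[1]]
        elif kind == "ADD":
            acc += memory[op[1]]
        elif kind == "STORE":
            memory[op[1]] = acc
        elif kind == "HALT":
            return acc

# The program occupies cells 0-3; the data lives in cells 4-6 of the SAME memory.
mem = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT",), 2, 3, 0]
run(mem)   # computes mem[4] + mem[5] and stores the result in mem[6]
```

Because code and data occupy one memory, every instruction fetch and every operand fetch must share the same path, which is exactly the constraint the Harvard architecture (next section) removes.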

(2) Harvard architecture

The Harvard architecture is a modification of the Von Neumann architecture. In a Harvard architecture, separate data paths (address and data buses) exist for accessing code (the program) and data (the operands). This makes it possible to fetch an instruction and data at the same time (on different buses). Since instructions have a separate data path, the next instruction can be fetched while the current instruction is being decoded and executed.

(3) Harvard Architecture Derivatives

There are derivatives of the Harvard architecture (e.g. Modified Harvard and Super Harvard) which have multiple data paths for data access. Such architectures are well suited to data-intensive applications (such as digital signal processing) that require multiple data operands for each instruction. Since these operands can be fetched in parallel, a significant performance improvement is achieved.

(4) CISC (Complex Instruction Set Computer)

In the early days of computing, people coded their applications in machine code or assembly language; there was no concept of high- (or middle-) level languages. Writing code in machine language was tedious. To make programming easier and faster, computers supported a large number of instructions. These instructions could perform complex operations: a single instruction could fetch one or more operands and perform one or more operations on them. This made programming much easier, as the programmer had to write less code (fewer instructions) to accomplish a given task. Another factor favoring complex instruction sets was memory cost: since memory was very expensive, designers wanted a dense instruction set (to reduce memory requirements).

(5) RISC (Reduced Instruction Set Computer)

Most complex instructions in a CISC processor take many processor cycles to execute. In a pipelined processor, the overall speed depends on the slowest operation being performed, which means the relatively complex instructions slow down the execution of even the simple ones. Complex instructions were thus a major performance bottleneck.

With the advent of compiler technology, programmers started using high- (and middle-) level languages. It was the compiler's task to translate the high-level code into assembly (or machine) language. Compilers generally used combinations of simple instructions to achieve complex operations. It was observed that breaking a complex operation into a combination of simple operations was much more efficient (took fewer processor cycles) than performing the same operation with a single complex instruction. Hence most of the complex instructions were not being used by compiler-generated programs. Most of the addressing modes offered by CISC were also going unused by compilers. This led to a shift in processor design philosophy. Designers started focusing on reducing the size and complexity of instruction sets (since most complex instructions were not being used) and on building small, simple instruction sets that compilers could actually use. This helped in two ways. First, simpler instructions could speed up the pipeline and thus provide a performance improvement. Second, a simple instruction set implies less hardware and thus reduced cost. The design goal was therefore to provide basic, simple instructions that execute fast; compilers could use these instructions to construct complex operations.
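The lowering described above can be sketched as follows. The instruction names and three-operand form are hypothetical, chosen only to illustrate how a compiler might replace one complex memory-to-memory instruction with a sequence of simple register-oriented ones:

```python
# Illustrative only: a CISC-style memory-to-memory add expressed as the
# sequence of simple RISC-style instructions a compiler would emit instead.

def lower_mem_add(dst, src1, src2):
    """Return simple instructions computing mem[dst] = mem[src1] + mem[src2]."""
    return [
        ("LOAD",  "r1", src1),        # fetch first operand into a register
        ("LOAD",  "r2", src2),        # fetch second operand into a register
        ("ADD",   "r1", "r1", "r2"),  # register-to-register arithmetic only
        ("STORE", "r1", dst),         # write the result back to memory
    ]

def execute(mem, program):
    """Interpret the simple instruction sequence against a memory dict."""
    regs = {}
    for ins in program:
        if ins[0] == "LOAD":
            regs[ins[1]] = mem[ins[2]]
        elif ins[0] == "ADD":
            regs[ins[1]] = regs[ins[2]] + regs[ins[3]]
        elif ins[0] == "STORE":
            mem[ins[2]] = regs[ins[1]]

mem = {0: 7, 1: 5, 2: 0}
execute(mem, lower_mem_add(2, 0, 1))   # mem[2] becomes 7 + 5
```

Each of the four simple instructions does one short, uniform unit of work, which is what keeps the pipeline stages fast and regular.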

Another interesting trend in this era (the early eighties) was a sharp increase in processor speeds, while memory speeds remained comparatively low. Memory accesses were therefore becoming a bottleneck. This led to the design of processors with a large number of internal registers (which could be used for temporary storage instead of depending on external, slower memory) and with cache memories.

(6) DSPs

Digital Signal Processors (DSPs) are special-purpose processors whose processing units and instruction sets are tailored to signal processing applications. MAC (multiply-and-accumulate) and shifter (arithmetic and logical shift) units are added to DSP cores, since signal processing algorithms depend heavily on such operations. Circular buffers, bit-reversed addressing, hardware loops, and DAGs (Data Address Generators) are other common features of a DSP architecture. Since signal processing applications are data-intensive, the data I/O bandwidth of these processors is designed to be high. Today, many embedded systems run signal processing applications (cell phones, portable media players, etc.).
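As a rough software sketch, the MAC-and-circular-buffer pattern at the heart of an FIR filter might look like this. On a real DSP the multiply-accumulate and the modulo addressing would each be handled by dedicated hardware in a single cycle per tap; the coefficient values below are arbitrary:

```python
# Sketch of the multiply-accumulate (MAC) pattern a DSP performs in hardware,
# here as an FIR filter step using a circular buffer for the delay line.

def fir_step(coeffs, delay, pos, sample):
    """Insert `sample` into the circular delay line and return one output."""
    delay[pos] = sample
    acc = 0.0
    n = len(coeffs)
    for i in range(n):
        # Circular (modulo) addressing: done by a dedicated DAG on a real DSP.
        acc += coeffs[i] * delay[(pos - i) % n]   # the MAC operation
    return acc

coeffs = [0.5, 0.25, 0.25]        # example filter taps (arbitrary values)
delay = [0.0] * len(coeffs)       # delay line, initially silent
out = fir_step(coeffs, delay, 0, 4.0)   # first input sample contributes 0.5 * 4.0
```

The inner loop is one MAC per coefficient, which is why a hardware MAC unit and zero-overhead hardware loops pay off so directly in this domain.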

(7) VLIW architecture

"Very Long Instruction Word" (VLIW) architectures consist of multiple ALUs operating in parallel. They are designed to exploit the instruction-level parallelism (ILP) in an application. Programmers partition their code so that each ALU can be loaded in parallel. The operations to be performed on the ALUs in a given cycle together form the instruction word for that cycle. It is entirely up to the programmer to partition the application across the different ALUs. It is also the programmer's burden (or the compiler's, if the code is written in a high-level language) to ensure that there are no interdependencies between the instructions that make up an instruction word. The processor has no hardware to check (and reschedule) the order of instructions; this is called static scheduling.
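A conceptual sketch of a VLIW bundle follows, assuming an invented two-slot format in which each slot names an operation for one ALU and the programmer has already guaranteed the slots are independent:

```python
# Conceptual sketch: each "long instruction word" bundles one operation per ALU.
# The programmer/compiler guarantees the bundled operations are independent;
# the hardware performs no dependency checking (static scheduling).

def issue(bundle, regs):
    """Execute every slot of one VLIW bundle as if on parallel ALUs."""
    # Read all inputs first, so the slots behave as if executed simultaneously.
    results = [(dst, regs[a] + regs[b]) if op == "ADD" else (dst, regs[a] * regs[b])
               for (op, dst, a, b) in bundle]
    for dst, val in results:
        regs[dst] = val

regs = {"r0": 2, "r1": 3, "r2": 4, "r3": 5, "r4": 0, "r5": 0}
# One cycle: ALU 0 performs an ADD, ALU 1 a MUL; independent registers throughout.
issue([("ADD", "r4", "r0", "r1"), ("MUL", "r5", "r2", "r3")], regs)
```

If the two slots had touched the same destination register, this model would silently compute the wrong answer, which is precisely the hazard the programmer or compiler must rule out at build time.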

(8) VLIW vs. superscalar

Superscalar architectures are similar to VLIW architectures in that they have multiple ALUs. However, superscalar processors employ dynamic scheduling of instructions: special hardware determines the interdependencies between instructions and schedules their execution on the different ALUs. The multiple ALUs are hidden from the programmer. Programmers write (and compilers generate) code as if only one ALU were available, and the processor's internal hardware reorders instructions to exploit the ILP. Programming for superscalar architectures is therefore much simpler than for VLIW architectures. However, this comes at the cost of hardware complexity: the additional hardware required for dynamic scheduling adds to both cost and power consumption.
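The dependency check that dynamic-scheduling hardware performs can be sketched as follows. This is a deliberately simplified model: two instructions may issue together only if neither writes a register that the other reads or writes:

```python
# Toy model of the hazard check a superscalar scheduler performs before
# dual-issuing two instructions. Each instruction is (dest_reg, {source_regs}).

def independent(i1, i2):
    """True if the two instructions have no RAW, WAR, or WAW dependency."""
    d1, s1 = i1
    d2, s2 = i2
    return (d1 != d2            # WAW: both write the same register
            and d1 not in s2    # RAW: second reads what the first writes
            and d2 not in s1)   # WAR: first reads what the second writes

independent(("r1", {"r2", "r3"}), ("r4", {"r5"}))   # independent: can dual-issue
independent(("r1", {"r2"}), ("r4", {"r1"}))         # RAW hazard: must serialize
```

A real scheduler does this comparison across a wide window of in-flight instructions every cycle, which is where the extra silicon area and power mentioned above are spent.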

(9) SIMD

SIMD stands for "Single Instruction, Multiple Data". SIMD architectures have multiple ALUs, but in a single processor cycle the same instruction (operation) is executed on all of them; only the data inputs to the ALUs can differ. A SIMD processor thus executes the same (single) instruction on different (multiple) data inputs in a given cycle. SIMD processors exploit the data-level parallelism (DLP) of an application.
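The SIMD idea can be sketched as one operation applied across every lane of a vector. Plain Python lists stand in for vector registers here; no real vector intrinsics are involved:

```python
# Conceptual sketch of SIMD: a single operation (ADD) is applied to all lanes
# of two "vector registers" in one step; only the data in each lane differs.

def simd_add(va, vb):
    """Apply the same ADD operation across all lanes at once."""
    return [a + b for a, b in zip(va, vb)]

vsum = simd_add([1, 2, 3, 4], [10, 20, 30, 40])   # four adds, one instruction
```

On real hardware the four lane-wise additions complete in a single instruction, which is why SIMD pays off for loops that repeat one operation over an array.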

(10) Multi-Core architectures

A recent trend in processor design is multi-core architectures. These architectures contain multiple CPU cores (not just multiple ALUs) on a single chip and can exploit the thread-level parallelism (TLP) of an application.
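Thread-level parallelism amounts to splitting an application into threads that separate cores can run. A minimal sketch using Python's threading module follows; note that CPython's global interpreter lock serializes pure-Python bytecode, so this illustrates the decomposition rather than a true speedup:

```python
# Sketch of thread-level parallelism: the work (summing an array) is split
# into two threads that a multi-core processor could run on separate cores.
import threading

def partial_sum(data, lo, hi, out, idx):
    """Sum one half of the data and record the result in a shared slot."""
    out[idx] = sum(data[lo:hi])

data = list(range(8))
results = [0, 0]
t1 = threading.Thread(target=partial_sum, args=(data, 0, 4, results, 0))
t2 = threading.Thread(target=partial_sum, args=(data, 4, 8, results, 1))
t1.start(); t2.start()      # both halves proceed concurrently
t1.join();  t2.join()       # wait for both threads to finish
total = results[0] + results[1]
```

Each thread works on a disjoint slice and writes to its own output slot, so no locking is needed; that independence is what lets the cores proceed without coordinating.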

(11) Stream Processor

Stream processing is one of the recent trends in processor design. It is typically suited to data-intensive multimedia applications. Stream architectures exploit ILP, DLP, and TLP at the same time.
