Parallel Multiprocessor Architectures: Master Slave Designs, Symmetric and Asymmetric Multiprocessors

05 Mar

Master/slave designs

The Master/Slave design pattern is another fundamental architecture LabVIEW developers use. It is used when you have two or more processes that need to run simultaneously and continuously but at different rates. If these processes run in a single loop, severe timing issues can occur. These timing issues occur when one part of the loop takes longer to execute than expected. If this happens, the remaining sections in the loop get delayed. The Master/Slave pattern consists of multiple parallel loops. Each of the loops may execute tasks at different rates. Of these parallel loops, one loop acts as the master and the others act as slaves. The master loop controls all of the slave loops, and communicates with them using messaging architectures.

The Master/Slave pattern is most commonly used when responding to user interface controls while collecting data simultaneously. Suppose you want to write an application that measures and logs a slowly changing voltage once every five seconds. It acquires a waveform from a transmission line and displays it on a graph every 100ms, and also provides a user interface that allows the user to change parameters for each acquisition.

The Master/Slave design pattern is well suited for this application. In this application, the master loop will contain the user interface. The voltage acquisition and logging will happen in one slave loop, while the transmission line acquisition and graphing will happen in another.

Applications that involve control also benefit from the use of Master/Slave design patterns. An example is a user’s interaction with a free motion robotic arm. This type of application is extremely control oriented because of the physical damage to the arm or surroundings that might occur if control is mishandled. For instance, if the user instructs the arm to stop its downward motion, but the program is occupied with the arm swivel control, the robotic arm might collide with the support platform. This situation can be avoided by applying the Master/Slave design pattern to the application. In this case, the user interface will be handled by the master loop, and every controllable section of the robotic arm will have its own slave loop. Using this method, each controllable section of the arm will have its own loop, and therefore, its own piece of processing time.

This will allow for more responsive control over the robotic arm via the user interface.

Why Use Master/Slave?

The Master/Slave design pattern is very advantageous when creating multi-task applications. It gives you a more modular approach to application development because of its multi-loop functionality, but most importantly, it gives you more control of your application’s time management. In LabVIEW, each parallel loop is treated as a separate task or thread. A thread is defined as a part of a program that can execute independently of other parts. If you have an application that doesn’t use separate threads, that application is interpreted by the system as one thread. When you split your application up into multiple threads, they each share processing time equally with each other.

This gives you more control on how your application is timed, and gives the user more control over your application. The parallel nature of LabVIEW lends itself towards the implementation of the Master/Slave design pattern.

For data acquisition example above, we could have conceivably put both the voltage measurement and the waveform acquisition together in one loop, and only perform the voltage measurement on every 50th iteration of the loop. However, the voltage measurement and logging the data to disk may take longer to complete than the single acquisition and display of the waveform. If this is the case, then the next iteration of the waveform acquisition will be delayed, since it cannot begin before all of the code in the previous iteration completes. Additionally, this architecture would make it difficult to change the rate at which waveforms were acquired without changing the rate of logging the voltage to disk.

The standard Master/Slave design pattern approach for this application would be to put the acquisition processes into two separate loops (slave loops), both driven by a master loop that polls the user interface (UI) controls to see if the parameters have been changed. To communicate with the slave loops, the master loop writes to local variables. This will ensure that each acquisition process will not affect the other, and that any delays caused by the user interface (for example, bringing up a dialog) will not delay any iteration of the acquisition processes.

Build a Master/Slave Application

The Master/Slave design pattern consists of multiple parallel loops. The loop that controls all the others is the master and the remaining loops are slaves. One master loop always drives one or more slave loops. Since data communication directly between these loops breaks data flow, it must be done by writing and reading to messaging architectures i.e. local or global variables, occurrences, notifiers, or queues in LabVIEW.

Example – Synchronizing Loops

This application has the following requirements:

* Create a Queued Message Handler to handle the user interface. The user interface should contain two toggle process buttons with LEDs, and an exit button.

* Create two separate processes that both switch an individual LED on and off at different rates (100 and 200 ms intervals). These two processes will be controlled using the user interface.

Our first step will be to decide which process will be the master and which processes will be the slaves. In this example, the user interface will be placed inside the master loop, and the two blinking LED processes will be the two slave loops. The user interface will control the operation of each slave loop with local variables.

We are now ready to begin our LabVIEW Master/Slave application. To view the final Master/Slave application, please open the attached VI (

Important Notes

There are some caveats to be aware of when dealing with the Master/Slave design pattern such as messaging architectures use and synchronization.

* Messaging architectures (shared data)

Problem: If multiple loops try to write data to the shared variable at the same time, there is no way to tell which value may ultimately get written. This is known as a race condition.

Solution: Place an “acquire/release semaphore” pair around any piece of code that writes to the global. This ensures that multiple loops don’t attempt to write to the global at the same time. Among the examples included with LabVIEW are a couple that demonstrate the use of semaphores. Semaphores will lock the global data while being written to in order to avoid a race condition.

* Synchronization

Problem: Since the Master/Slave design pattern is not based on synchronization, it is possible that the slave loops begin execution before the master loop. Therefore initializing the master loop before the slave loops begin execution can be a problem.

Solution: Occurrences can be used to solve these kinds of synchronization problems.

Symmetric/Asymmetric Multiprocessors

  1. Symmetric multiprocessing

In computing, symmetric multiprocessing or SMP involves a multiprocessor computer hardware architecture where two or more identical processors are connected to a single shared main memory and are controlled by a single OS instance. Most common multiprocessor systems today use an SMP architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors. Processors may be interconnected using buses, crossbar switches or on-chip mesh networks. The bottleneck in the scalability of SMP using buses or crossbar switches is the bandwidth and power consumption of the interconnect among the various processors, the memory, and the disk arrays. Mesh architectures avoid these bottlenecks, and provide nearly linear scalability to much higher processor counts at the sacrifice of programmability:

Serious programming challenges remain with this kind of architecture because it requires two distinct modes of programming, one for the CPUs themselves and one for the interconnect between the CPUs. A single programming language would have to be able to not only partition the workload, but also comprehend the memory locality, which is severe in a mesh-based architecture.

SMP systems allow any processor to work on any task no matter where the data for that task are located in memory, provided that each task in the system is not in execution on two or more processors at the same time; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently.

Advantages and disadvantages

SMP has many uses in science, industry, and business which often use custom-programmed software for multithreaded (multitasked) processing. However, most consumer products such as word processors and computer games are written in such a manner that they cannot gain large benefits from concurrent systems. For games this is usually because writing a program to increase performance on SMP systems can produce a performance loss on uniprocessor systems. Recently[update], however, multi-core chips are becoming more common in new computers, and the balance between installed uni- and multi-core computers may change in the coming years.

Uniprocessor and SMP systems require different programming methods to achieve maximum performance. Therefore two separate versions of the same program may have to be maintained, one for each. Programs running on SMP systems may experience a performance increase even when they have been written for uniprocessor systems. This is because hardware interrupts that usually suspend program execution while the kernel handles them can execute on an idle processor instead. The effect in most applications (e.g. games) is not so much a performance increase as the appearance that the program is running much more smoothly. In some applications, particularly compilers and some distributed computing projects, one will see an improvement by a factor of (nearly) the number of additional processors.

In situations where more than one program executes at the same time, an SMP system will have considerably better performance than a uni-processor because different programs can run on different CPUs simultaneously.

Systems programmers must build support for SMP into the operating system: otherwise, the additional processors remain idle and the system functions as a uniprocessor system.

In cases where an SMP environment processes many jobs, administrators often experience a loss of hardware efficiency. Software programs have been developed to schedule jobs so that the processor utilization reaches its maximum potential. Good software packages can achieve this maximum potential by scheduling each CPU separately, as well as being able to integrate multiple SMP machines and clusters.

Access to RAM is serialized; this and cache coherency issues causes performance to lag slightly behind the number of additional processors in the system (aga).

Entry-level systems

Before about 2006, entry-level servers and workstations with two processors dominated the SMP market. With the introduction of dual-core devices, SMP is found in most new desktop machines and in many laptop machines. The most popular entry-level SMP systems use the x86 instruction set architecture and are based on Intel’s Xeon, Pentium D, Core Duo, and Core 2 Duo based processors or AMD’s Athlon64 X2, Quad FX or Opteron 200 and 2000 series processors. Servers use those processors and other readily available non-x86 processor choices, including the Sun Microsystems UltraSPARC, Fujitsu SPARC64 III and later, SGI MIPS, Intel Itanium, Hewlett Packard PA-RISC, Hewlett-Packard (merged with Compaq which acquired first Digital Equipment Corporation) DEC Alpha, IBM POWER and Apple Computer PowerPC (specifically G4 and G5 series, as well as earlier PowerPC 604 and 604e series) processors. In all cases, these systems are available in uniprocessor versions as well.

Earlier SMP systems used motherboards that have two or more CPU sockets. More recently[update], microprocessor manufacturers introduced CPU devices with two or more processors in one device, for example, the POWER, UltraSPARC, Opteron, Athlon, Core 2, and Xeon all have multi-core variants. Athlon and Core 2 Duo multiprocessors are socket-compatible with uniprocessor variants, so an expensive dual socket motherboard is no longer needed to implement an entry-level SMP machine. It should also be noted that dual socket Opteron designs are technically ccNUMA designs, though they can be programmed as SMP for a slight loss in performance

Mid-level systems

The Burroughs D825 first implemented SMP in 1962.[1][2] It was implemented later on other mainframes. Mid-level servers, using between four and eight processors, can be found using the Intel Xeon MP, AMD Opteron 800 and 8000 series and the above-mentioned UltraSPARC, SPARC64, MIPS, Itanium, PA-RISC, Alpha and POWER processors. High-end systems, with sixteen or more processors, are also available with all of the above processors.

Sequent Computer Systems built large SMP machines using Intel 80386 (and later 80486) processors. Some smaller 80486 systems existed, but the major x86 SMP market began with the Intel Pentium technology supporting up to two processors. The Intel Pentium Pro expanded SMP support with up to four processors natively. Later, the Intel Pentium II, and Intel Pentium III processors allowed dual CPU systems, except for the respective Celerons. This was followed by the Intel Pentium II Xeon and Intel Pentium III Xeon processors which could be used with up to four processors in a system natively. In 2001 AMD released their Athlon MP, or MultiProcessor CPU, together with the 760MP motherboard chipset as their first offering in the dual processor marketplace. Although several much larger systems were built, they were all limited by the physical memory addressing limitation of 64 GiB. With the introduction of 64-bit memory addressing on the AMD64 Opteron in 2003 and Intel 64 (EM64T) Xeon in 2005, systems are able to address much larger amounts of memory; their addressable limitation of 16 EiB is not expected to be reached in the foreseeable future.

Asymmetric multiprocessing

Asymmetric Multiprocessing, or AMP, was a software stopgap for handling multiple CPUs before Symmetric Multiprocessing, or SMP was available.

Multiprocessing means more than one CPU in a computer system. The CPU is the arithmetic and logic engine that executes user applications; an I/O interface such as a GPU, even if it is implemented using an embedded processor, does not constitute a CPU because it does not run the user’s application program. With multiple CPUs, more than one user application can run at the same time. All of the CPUs have the same user-mode instruction set, so a running job can be rescheduled from one CPU to another.

Differences between symmetric and asymmetric multiprocessing

Engineers contemplating a migration from a single-core to a multicore processor must identify where parallelism exists in their application. The next decision is how to partition the code over the cores of the device. The two main options are symmetric-multiprocessor (SMP) mode and asymmetric-multiprocessor (AMP) mode.

In some cases, combinations of these make sense as well. There’s just one kernel in SMP mode, and it’s run by all cores. In AMP mode, each core has its own copy of a kernel, which could be different (heterogeneous operating systems) from, or identical (homogenous operating systems) to, the one the other core is executing.

In SMP mode, a single operating system (OS) runs on all processors, which access a single image of the OS in the memory. The OS is responsible for extracting parallelism in the application. It dynamically partitions tasks across the processors, manages the ordering of task completion, and controls the sharing of all resources between the cores.

The latter point is important, especially if there isn’t a one-to-one relationship of chip resources (such as memory controllers) to cores. In this case, the OS will dedicate one core to “own” the resource and alert others if they need to know, but the user needn’t know which core owns the resource.

To the user, the program appears to run on a single processor, and much of the chip complexity is “under the hood” of the OS. This is the simplest programming model. It’s frequently used to scale the performance of an application that has parallelism (i.e., processes), but isn’t inherently parallel (as is the case when managing data streams originating from millions of separate users).

Performance will scale less than linearly with the number of cores put in place. In fact, the incremental value of adding another core approaches zero as the high degree of intercore communication eventually overwhelms the gain adding the core.

In AMP mode, the processor cores in the device are largely unaware of each other. Separate OS images exist in main memory, though there may be a shared location for interprocessor communications. AMP may take the form of multiple instances of the same non-SMP-aware OS.

Or, different OSs may reside on the two cores. For example, if two previously separate processors are being collapsed onto one dual-core device, different OSs may reside on the two cores.

In another usage model for AMP, a statically partitioned, task-offload arrangement-one core manages the computationally intensive calculations while the other runs the main OS. This arrangement sometimes is used with a real-time OS (RTOS), which forwards packets, and the Linux OS, which implements higher-level applications.

In any of these three cases, the user must pay attention to the resource sharing between the two OSs. Neither OS owns the entire chip, so resource sharing isn’t “under the hood” as in SMP mode.

If there’s a single instance of a resource on the multicore device, one core typically “owns” it. That means it receives interrupts and error messages and would be responsible, at the application-level, for passing the required information to the other core

AMP has some advantages, though. It’s the only approach that works when two separate OSs are in place. Also, resources can be dedicated to critical tasks, resulting in more deterministic performance. And it often has higher performance than SMP, because the cores spend less time handshaking with each other.

The decision to use SMP or AMP largely will be about what’s easiest to implement. Applications that already run on an SMP-aware OS, such as Linux, can easily scale by adding more cores under SMP. AMP is a good choice if the application has obvious parallelism that’s easily partitioned to the number of cores at the user level.

Networked multiprocessing is an asymmetric implementation mostly found in embedded architectures where the system typically has a network connection, shared bus, or shared mailbox for communication between processing elements. Otherwise, the main memory isn’t shared.

Asymmetric multiprocessing – In asymmetric multiprocessing (ASMP), the operating system typically sets aside one or more processors for its exclusive use. The remainder of the processors runs user applications. As a result, the single processor running the operating system can fall behind the processors running user applications. These forces the applications to wait while the operating system catches up, which reduces the overall throughput of the system. In the ASMP model, if the processor that fails is an operating system processor, the whole computer can go down.

Symmetric multiprocessing – Symmetric multiprocessing (SMP) technology is used to get higher levels of performance. In symmetric multiprocessing, any processor can run any type of thread. The processors communicate with each other through shared memory.

SMP systems provide better load-balancing and fault tolerance. Because the operating system threads can run on any processor, the chance of hitting a CPU bottleneck is greatly reduced. All processors are allowed to run a mixture of application and operating system code. A processor failure in the SMP model only reduces the computing capacity of the system.

SMP systems are inherently more complex than ASMP systems. A tremendous amount of coordination must take place within the operating system to keep everything synchronized. For this reason, SMP systems are usually designed and written from the ground up.

Go to Exercise # 1

Leave a comment

Posted by on March 5, 2011 in Topic 2


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: