Multiprocessor-Compatible T-Engine

Yuichi Toyoyama

YRP Ubiquitous Networking Laboratory


Multiprocessor systems, in which multiple processors are built into a single system, are drawing attention in the world of embedded systems as well. In the T-Engine project too, development of a multiprocessor-compatible T-Engine and T-Kernel is under way. In this article, I would like to explain our approach to multiprocessor compatibility in the T-Engine project.

Embedded Systems and Multiprocessors

First, I will explain multiprocessor systems centering on embedded systems.

In embedded systems, controlling one device with multiple processors is actually not rare. Cell phones frequently use a two-processor configuration: a baseband chip that controls communications and calls, plus an application processor. Automobiles carry many processors in their driving controls and safety systems. Looking back, even video recorders from more than 10 years ago used separate processors for motor control and for timer and remote control handling.

We were not very conscious of these as multiprocessor systems because the individual processors ran independently. An operating system and applications ran separately on each processor, and cooperation between processors took place mainly through data communication at the application level. From the viewpoint of the programs, multiple processors simply happened to be housed in the same device, and nothing differed from a single processor system.

However, as systems have grown more complex overall, demand has grown for cooperation among processors at the operating system level rather than at the application level. Also, from the standpoint of software reusability, it is preferable for the operating system to standardize this cooperation rather than for each application to implement it independently. For these reasons, the need for a multiprocessor-compatible operating system is rising.

On the other hand, when we turn to information-type computers outside embedded systems, such as servers and personal computers, a different form of multiprocessor system is in use. Information-type systems place strong demands on the processing power of the processors themselves. Multiprocessor systems have therefore been developed that raise processing power and speed up execution by distributing processing across multiple processors.

We could say that embedded systems have aimed at function distribution by assigning a processor to each function, whereas information-type systems have distributed the processing load across multiple processors.

However, as products become more highly functional, embedded systems too have come to need processing power in the same way as information-type systems. Accordingly, attention is also turning to multiprocessor systems aimed at load distribution, which until now have been used in information-type systems.

Thus, although we speak simply of multiprocessors, there are multiprocessor systems aimed at function distribution, which have been common in embedded systems, and multiprocessor systems aimed at load distribution, which have been common in information-type systems. To distinguish the two, we call the former an Asymmetric Multiple Processor (hereafter, AMP) and the latter a Symmetric Multiple Processor (hereafter, SMP). The term asymmetric expresses the fact that the roles of the respective processors differ; the term symmetric conversely expresses the fact that the roles are the same.

Multiprocessors and the Operating System

In an AMP system, software such as application programs and operating systems runs on processors with different roles. The developer decides at the design stage which processor carries out which processing and which programs it executes. To that extent, the burden on the operating system in AMP is comparatively light. Basically, AMP can be realized if the operating system provides functions for synchronization and communication among processors. Also, as I have already mentioned, if synchronization and communication among processors are realized at the application level, the operating system itself needs no special multiprocessor support. It is even possible to use no operating system at all (Fig. 1).

Figure 1. Asymmetric Multiple Processor

In contrast, in an SMP system, the operating system normally assigns processing to processors dynamically at program execution time. Accordingly, neither application programs nor users are particularly aware that multiple processors are present. This is easy to understand if we think of a personal computer equipped with multiple processors. Such personal computers have already been put into practical use and are on the market. When we use one, we are not normally aware of the fact. The same application program should run without distinction on a single processor or a multiprocessor. In this manner, in SMP the operating system conceals the number of processors. Put another way, the operating system is required to provide many multiprocessor-compatible functions. Information-type operating systems commonly used in personal computers, such as Windows and Linux, provide these SMP-compatible functions (Fig. 2).

Figure 2. Symmetric Multiple Processor

Multicore Processors

In the world of embedded systems, one reason multiprocessors have begun to attract attention is the appearance of the multicore processor. A multicore processor has multiple processor cores built into a single chip package. Because it can be regarded as a multiprocessor made into one chip, there is almost no difference between multicore and multiprocessor from the viewpoint of software. From the viewpoint of hardware, however, they differ in various respects, from implementation area to manufacturing cost. We could say that the appearance of multicore processors has made the adoption of multiprocessor systems realistic even in embedded systems.

Multicore processors are also roughly divided into AMP and SMP. In particular, the SMP multicore processor is expected to serve as a "faster processor." The functions demanded of embedded systems increase daily and systems are becoming more highly functional, but there is a limit to how far the processor clock can be raised to obtain higher processing power. Moreover, in battery-driven devices and the like, raising the clock speed increases power consumption, so it cannot be done easily. One solution, therefore, is to increase processing power by increasing the number of processors instead of raising the clock speed. For this case, the use of SMP, in which application programs need not be aware of the number of processors, is being considered.

AMP and SMP

As you can see from what I have explained so far, AMP and SMP are not things to be compared simply in terms of which is superior; rather, which one is suitable depends on the use and the purpose. If the functions are clearly fixed and easy to separate, adopting AMP is simple. Also, because AMP basically runs in units of individual processors, much of the technique and many of the assets of conventional embedded software can be inherited as they are.

If the purpose is high-speed execution of applications that need processing power, then SMP, in which one need not be aware of the multiple processors, is probably suitable. However, because SMP is still a new technology in the field of embedded systems, a great many points require study and verification. Embedded systems demand so-called real-time operating system functions, such as high responsiveness and predictable processing time. In SMP, where processing is automatically assigned to multiple processors, how far real-time performance can be maintained is an issue for the future.

At present, operating systems that are compatible with SMP are basically information-type operating systems. An information-type operating system employs round robin scheduling for multitasking. Because individual tasks (processes) are highly independent under round robin scheduling, we could say that it suits SMP. Even when individual tasks run on different processors, the basic behavior does not change. However, obtaining sufficient real-time performance with round robin scheduling is difficult (Fig. 3).

Figure 3. Multiprocessing in the case of round robin scheduling

In contrast, a real-time operating system schedules on the basis of priority levels. In this method, while a task with a high priority level is running, a task with a lower priority level cannot run. If we simply applied this method to SMP, then in spite of there being multiple processors, only the processor executing the task with the highest priority level would actually do any processing, and we would be unable to obtain the advantage of raising execution speed by means of multiple processors. Of course, simply applying absolute priority level scheduling to SMP is unrealistic; in practice it would probably be implemented so that tasks execute simultaneously on the individual processors. This means that the application program model differs more or less from that of a single processor. In the previous section, I mentioned that there are expectations for SMP as a "faster processor," but it appears that with real-time operating system programs, simply adding processors will not be enough to raise the processing speed (Fig. 4).

Figure 4. Multiprocessing in the case of absolute priority level scheduling
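
The change in the application program model can be illustrated with a small simulation. The following Python sketch is not T-Kernel code; the function name `pick_running` and the convention that a smaller number means a higher priority are assumptions made for illustration. It models a scheduler that runs the highest-priority ready tasks, one per processor.

```python
def pick_running(ready, num_processors):
    """Return the names of the tasks that run, highest priority first.

    ready: list of (name, priority) pairs. A smaller number means a
    higher priority (an assumption of this sketch).
    """
    ordered = sorted(ready, key=lambda task: task[1])
    return [name for name, _ in ordered[:num_processors]]


ready_tasks = [("A", 1), ("B", 2), ("C", 3)]

# Single processor: only the highest-priority task runs, so nothing can
# disturb task A while it is running.
single = pick_running(ready_tasks, 1)

# Two processors: task B runs at the same time as task A, so code that
# assumed "nothing else runs while A runs" no longer holds.
dual = pick_running(ready_tasks, 2)

print(single, dual)
```

With one processor only task A is selected; with two, tasks A and B are selected together, which is exactly the point at which a program written for absolute priority scheduling on a single processor may stop behaving as designed.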

Based on the above, a real-time operating system compatible with SMP will require an overall study, from the scheduling scheme to the application program model.

T-Engine's Multiprocessor Compatibility

I will now turn to multiprocessor compatibility in T-Engine. Based on the considerations I have described so far, our policy for T-Engine is to move forward with support for both methods, AMP and SMP.

The Roadmap to Multiprocessor Compatibility

The first thing that is important is multiprocessor compatibility of T-Kernel, which is the operating system of T-Engine. In Fig. 5, I show the roadmap for multiprocessor compatibility with T-Kernel.

Figure 5. The T-Kernel roadmap

In regard to AMP, because applying it to existing embedded systems technology is easy, and also because study has been in progress to a certain degree since the days of ITRON, we are planning to release AMP support ahead of SMP.

We will also proceed with development for SMP in parallel with AMP. For SMP, we are aiming at an SMP-compatible real-time operating system suitable for embedded systems, after carrying out sufficient prototyping and verification.

Finally, rather than ending up with two types of T-Kernel, an AMP-compatible one and an SMP-compatible one, we will attempt to unify them into a single specification as a multiprocessor-compatible T-Kernel. From the very start, T-Engine was not tied to hardware such as a particular CPU type; it came into existence as a platform on which software could be distributed just by recompiling. From here on too, we will aim at a platform on which software can be distributed while being affected as little as possible by the differences among single processors and the various AMP and SMP processors. For that reason, the multiprocessor version of T-Kernel is being developed with emphasis on compatibility with the present T-Kernel.

Prototyping a Multiprocessor Version of T-Engine

To execute a multiprocessor-compatible T-Kernel and carry out evaluation and verification, T-Engine hardware is necessary.

At present, at the YRP Ubiquitous Networking Laboratory, we have prototyped the µT-Engine/MP211, which carries an AMP-type multiprocessor from NEC Electronics Corporation, for use in evaluating an AMP-compatible T-Kernel (refer to "Column: AMP-Type, Multiprocessor-Loaded µT-Engine/MP211" on page 57).

In the future, we are planning to go on to develop T-Engines for other processors and an SMP-compatible T-Kernel.

Regarding the AMP-Compatible T-Kernel

I would like to explain the AMP-compatible T-Kernel in a little more detail. Please note, however, that the AMP-compatible T-Kernel is still in the middle of development, and the items explained here may differ from the final specification.

Synchronization and Communication Function of AMP-Compatible T-Kernel

In the case of AMP, a T-Kernel runs individually on each processor. With SMP, there is one T-Kernel for the entire multiprocessor system, so with simple SMP compatibility an application program sees no distinction between the SMP-compatible T-Kernel and the single processor T-Kernel. In the AMP-compatible T-Kernel, however, a function is needed for synchronization and communication among the application programs that run on the individual T-Kernels.

Broadly speaking, the AMP-compatible T-Kernel has two synchronization and communication functions.

The first is synchronization at system startup. In T-Kernel, tasks and synchronization and communication objects are created dynamically. On a single processor, the order of task creation can be decided. Across different processors, however, this is not possible. The biggest problem is that when a task tries to access an object on another processor after system startup, it cannot know whether that object has already been created. If we try to resolve this at the application level, the only options are to retry until communication succeeds or to wait long enough. This is not very efficient. For that reason, we have created an initialization synchronization function in the AMP-compatible T-Kernel.

In the AMP-compatible T-Kernel, special initialization handlers are always executed at system startup, before the first task is executed. Object creation system calls can be executed inside these handlers. The completion of the initialization handlers is then synchronized across all the processors. In other words, task execution begins only after all the initialization handlers have finished. Accordingly, by the time tasks execute, the existence of objects created by initialization handlers is guaranteed, even for objects on processors other than the local one (Fig. 6).

Figure 6. Synchronous processing at startup time
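
The startup sequence described above behaves like a barrier placed between the initialization handlers and the first tasks. The following Python sketch only models that idea with threads standing in for processors; it is not the AMP-compatible T-Kernel interface, and all names (`processor`, `init_done`, `objects`) are hypothetical.

```python
import threading

NUM_PROCESSORS = 3
objects = {}                      # stands in for objects visible system wide
objects_lock = threading.Lock()
init_done = threading.Barrier(NUM_PROCESSORS)
results = []


def processor(proc_id):
    # Initialization handler: create this processor's object.
    with objects_lock:
        objects[f"sem_on_proc{proc_id}"] = proc_id
    # Wait until the initialization handlers on all processors have finished.
    init_done.wait()
    # First task: every object created by any handler now exists.
    with objects_lock:
        results.append(len(objects))


threads = [threading.Thread(target=processor, args=(i,))
           for i in range(NUM_PROCESSORS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Because no thread passes the barrier until all three have created their objects, every "first task" sees all three objects, without retrying or waiting for an arbitrary time.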

The second is a synchronization and communication function used while application programs are running. As a rule, such a function is realized as interprocessor communication through memory shared among the processors. In the AMP-compatible T-Kernel, however, we have placed importance on compatibility with the conventional T-Kernel, and we do not create a special synchronization and communication function as seen from the application program. Instead, we have extended the synchronization and communication functions of the conventional T-Kernel so that they can also be used among processors.

For example, the semaphore is a representative synchronization and communication object. Tasks use T-Kernel system calls to acquire resources from and return resources to semaphores. In the AMP-compatible T-Kernel, these operations can be performed with the same system calls even when the semaphore is on a different processor from the task.

In T-Kernel, all objects, including tasks, are identified by ID numbers that are assigned automatically. In the AMP-compatible T-Kernel, these ID numbers are unique system wide. When a system call that operates on a certain object is issued and the target object is on the same processor as the task that issued the call, the same processing as on a single processor is carried out. When the target object is on another processor, however, communication between the processors is carried out automatically, and the processing is performed on the target processor (Fig. 7).

Figure 7. Synchronization and communication processing between processors
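
The dispatch on object location can be sketched as follows. This is a toy Python model, not the actual implementation: the article does not say how system-wide IDs are laid out, so encoding the processor number in the upper bits (`PROC_SHIFT`) is purely an assumption, as are the `Kernel` class and its method names. A direct method call stands in for the communication between processors.

```python
PROC_SHIFT = 16                    # hypothetical ID layout, not from the specification
LOCAL_MASK = (1 << PROC_SHIFT) - 1


class Kernel:
    """Toy stand-in for the kernel instance running on one processor."""

    def __init__(self, proc, kernels):
        self.proc = proc
        self.kernels = kernels     # shared table: processor number -> Kernel
        self.semaphores = {}       # local number -> current resource count
        self.next_local = 1
        kernels[proc] = self

    def create_semaphore(self, count):
        local = self.next_local
        self.next_local += 1
        self.semaphores[local] = count
        return (self.proc << PROC_SHIFT) | local   # system-wide unique ID

    def signal_semaphore(self, sem_id):
        """Return one resource; report which processor did the work."""
        target = sem_id >> PROC_SHIFT
        if target != self.proc:
            # Object is on another processor: forward the request there.
            return self.kernels[target].signal_semaphore(sem_id)
        self.semaphores[sem_id & LOCAL_MASK] += 1
        return self.proc


kernels = {}
k0 = Kernel(0, kernels)
k1 = Kernel(1, kernels)
sem = k1.create_semaphore(0)         # semaphore lives on processor 1
handled_by = k0.signal_semaphore(sem)  # same call, issued from processor 0
print(handled_by, k1.semaphores[1])
```

The caller on processor 0 uses the same call it would use for a local semaphore; only the kernel knows that the operation was carried out on processor 1.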

The AMP-compatible T-Kernel is not limited to semaphores; almost all synchronization and communication objects, such as event flags and message buffers, can be used among the processors.

Resolution of Object ID Numbers

In the AMP-compatible T-Kernel too, all objects are created dynamically in the same manner as in the conventional T-Kernel, and their ID numbers are assigned automatically at creation. The problem in AMP is how to find out the ID numbers of objects on other processors. To use synchronization and communication, one first needs the ID number of the synchronization and communication object. Accordingly, the AMP-compatible T-Kernel provides a function that obtains an ID number from the object's location (domain) and name (object name).

For the sake of compatibility, these domain and object names are optional attributes that can be added to the objects that need them.
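
The lookup from domain and object name to ID number can be pictured as a simple registry. The Python sketch below is only a model of the idea; the function names `register` and `find_id`, the example domain and object names, and the ID value are all hypothetical and do not come from the specification.

```python
registry = {}   # (domain, object_name) -> object ID


def register(domain, name, object_id):
    """Attach the optional domain and name attributes to an object ID."""
    key = (domain, name)
    if key in registry:
        raise ValueError(f"{domain}/{name} is already registered")
    registry[key] = object_id


def find_id(domain, name):
    """Return the ID registered for (domain, name), or None if there is none."""
    return registry.get((domain, name))


# A task on one processor names its semaphore when creating it ...
register("audio", "buffer_sem", 0x10001)

# ... and a task on another processor obtains the ID by name before using it.
found = find_id("audio", "buffer_sem")
missing = find_id("audio", "no_such_object")
print(hex(found), missing)
```

Objects that are never given a name simply do not appear in the registry, which matches the idea that the names are optional and existing programs need not change.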

An object access protection function based on assigning objects to domains has also been studied. In systems that use multiprocessors, programs are also expected to become large in scale, and the need for a protection function, which did not exist in the conventional T-Kernel, is coming into focus. Moreover, making the access range explicit is also expected to improve implementation efficiency. For example, efficient implementations can be devised for objects that are known not to be accessed from other processors.

Portability of the AMP Version of T-Kernel

One of the characteristics of T-Kernel is a high level of portability that does not depend on particular hardware. The single processor T-Kernel runs on various processors, such as ARM, MIPS, and SuperH. The AMP-compatible T-Kernel likewise does not depend on a particular processor or architecture; it is being designed with emphasis on portability so that it can easily run on various types of multiprocessor systems.

In the multiprocessor-related processing of the AMP-compatible T-Kernel, the hardware-dependent parts are gathered into a driver that communicates among the processors. This is a special driver embedded at a lower level of the AMP-compatible T-Kernel, where it handles low-level communications among the processors. When a system call is issued, the AMP-compatible T-Kernel judges whether the target object exists on the local processor. If the target object exists on another processor, it communicates with the AMP-compatible T-Kernel of the target processor using low-level communications through this driver. Because the actual operation on an object is carried out by the T-Kernel of the processor that holds it, this portion of the processing is basically the same as on a single processor (Fig. 8).

Figure 8. Communication scheme between processors

When porting the AMP-compatible T-Kernel to new hardware, the multiprocessor-related work is completed just by implementing the driver for communicating among the processors to match the hardware.

The explanation of the AMP-compatible T-Kernel given so far is based on a specification still under study. In the future, as development progresses, we would like to release more detailed information to the public.

Porting Software to AMP-Compatible T-Kernel

Because the AMP-compatible T-Kernel has been developed with importance placed on compatibility with the existing T-Kernel, I think that people who have been developing with T-Kernel can migrate to it easily. Porting programs developed on T-Kernel to the AMP-compatible T-Kernel will also be easy. However, because there are a few points to be careful about, I would like to briefly explain programs for the AMP-compatible T-Kernel, centering on porting from the single processor T-Kernel.

Porting from a Single Processor

We will now consider porting a program that has been developed on a single processor system to an AMP system that uses the AMP-compatible T-Kernel.

We must divide the program according to the functions assigned to the individual processors, but if the original program is structured so that each function is sufficiently independent, this can be done comparatively easily. Where memory and the like is shared across the functions to be divided, that sharing must be resolved first. Recompiling is easy because the AMP-compatible T-Kernel is compatible with the single processor T-Kernel at the system call level (Fig. 9).

Figure 9. Porting from a single processor (simple example)

Normally, however, a program will not run satisfactorily with that alone. The first problem is that task scheduling is carried out separately by each processor.

Mutual exclusion based on priority level becomes invalid when it straddles processors. For example, on a single processor, while task A with the highest priority level was executing, no other task ran. Accordingly, even while task A was using resources shared with other tasks, it was never interfered with by those tasks. On a multiprocessor, however, other tasks are running on other processors even while task A is executing. If task B on another processor shares the same resources as task A, a resource collision can occur because task A does not carry out mutual exclusion properly.

Disabling dispatch and disabling interrupts are likewise effective only on the processor concerned. On a single processor, once dispatch or interrupts were disabled, other tasks never ran of their own accord, but this does not hold on a multiprocessor.

The solution is to always use a synchronization and communication object for mutual exclusion on shared resources.
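
The following Python sketch illustrates this rule with two threads standing in for task A and task B on different processors. It is a model only: `threading.Semaphore` plays the role of a T-Kernel semaphore (whose acquire and return operations correspond to system calls such as `tk_wai_sem` and `tk_sig_sem`), and the counter and iteration count are made up for the example.

```python
import threading

ITERATIONS = 10_000
shared_counter = 0
sem = threading.Semaphore(1)   # one resource: acts as a mutual exclusion lock


def task():
    global shared_counter
    for _ in range(ITERATIONS):
        sem.acquire()          # acquire the resource before touching shared data
        try:
            shared_counter += 1
        finally:
            sem.release()      # always return the resource


task_a = threading.Thread(target=task)
task_b = threading.Thread(target=task)
task_a.start()
task_b.start()
task_a.join()
task_b.join()

print(shared_counter)
```

Neither task relies on its priority or on disabling dispatch to protect the counter; the semaphore alone guarantees that the two updates never overlap, wherever the tasks happen to run.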

Also, although it depends on the processing order, caution is necessary for places that relied on the order of execution without explicitly synchronizing. On a single processor, the processing of task A may reliably finish before that of task B, but on a multiprocessor system there is no knowing whether that is guaranteed. Synchronization using synchronization and communication objects is necessary.
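
As an illustration of making such an ordering explicit, the Python sketch below uses `threading.Event` in the role that an event flag or semaphore would play in T-Kernel. The task functions and log messages are hypothetical; task B is deliberately started first to show that the result does not depend on start order.

```python
import threading

log = []
a_finished = threading.Event()


def task_a():
    log.append("A: data prepared")
    a_finished.set()           # announce completion explicitly


def task_b():
    a_finished.wait()          # do not rely on A "usually" finishing first
    log.append("B: data consumed")


# Start B first on purpose: the outcome must not depend on start order.
thread_b = threading.Thread(target=task_b)
thread_a = threading.Thread(target=task_a)
thread_b.start()
thread_a.start()
thread_a.join()
thread_b.join()

print(log)
```

Task B always records its entry after task A, because it waits on the event rather than assuming an execution order that only a single processor would have guaranteed.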

Even if the above are handled well, execution efficiency in AMP is another problem.

The AMP-compatible T-Kernel can operate on an object on another processor in the same manner as an object on the local processor, but the processing clearly becomes slower by the overhead of the communication between the processors. Also, for a multiprocessor system to demonstrate its power, the individual processors must be kept running simultaneously. Multiprocessing loses its meaning if a task frequently waits for synchronization and communication with a task on another processor.

Raising the independence of each processor and decreasing the level of dependence among processors are important.

Porting in a Multiprocessor Environment

Migrating a system that already runs in an AMP multiprocessor environment to the AMP-compatible T-Kernel is much easier than migrating from a single processor, to the extent that it already runs on a multiprocessor.

If the operating system you are already using is the conventional T-Kernel, the synchronization and communication function among the processors will have been implemented independently, and thus porting that part will be the main work.

If you are communicating directly among processors using shared memory, the same thing is possible in the AMP-compatible T-Kernel by allocating memory with an attribute that shares it across the processors. However, because the AMP-compatible T-Kernel does not carry out mutual exclusion on shared memory, you must carry out mutual exclusion yourself using a semaphore or the like.

If you are delivering data packets or streams by some method, mailboxes and message buffers can be applied. These functions are the same as those of T-Kernel.

If you are communicating among processors by means of procedure calls, such as remote procedure calls (RPC), a rendezvous port can be used. A rendezvous port is a function that realizes client-server model communications more flexibly. T-Kernel secures the synchronization between the procedure (rendezvous) call and the procedure (rendezvous) acceptance. The rendezvous port exists in both T-Kernel and µITRON, but we could say that it shows its real power in a distributed environment such as a multiprocessor.
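
The call-and-accept pattern of a rendezvous can be modeled roughly as follows. This Python sketch uses queues and a thread to imitate the behavior; it does not use the T-Kernel rendezvous system calls, and the names `port`, `server`, and `call`, as well as the doubling "service," are hypothetical.

```python
import queue
import threading

port = queue.Queue()           # stands in for the rendezvous port


def server():
    # Accept one rendezvous, process the request, and reply to the caller.
    request, reply_box = port.get()
    reply_box.put(request * 2)


def call(request):
    # Call the rendezvous and block until the server replies.
    reply_box = queue.Queue(maxsize=1)
    port.put((request, reply_box))
    return reply_box.get()


server_thread = threading.Thread(target=server)
server_thread.start()
answer = call(21)
server_thread.join()

print(answer)
```

The caller blocks until the server has accepted the request and replied, so the client and server are synchronized by the port itself, much as in a remote procedure call.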

Porting from Another Operating System

Porting from another operating system is basically the same as porting from the single processor T-Kernel, which I have described so far. However, a phase of porting from that operating system to T-Kernel is added.

Porting from µITRON to T-Kernel can be carried out comparatively easily, because T-Kernel is an operating system that continues ITRON technology. For µITRON, compatibility with multiprocessors and distributed environments has been studied, but no standard specification exists. For those with an eye on future multiprocessor compatibility, this might also be a good opportunity to migrate to T-Kernel.

In Closing

Multiprocessor systems are a new field in embedded systems. In particular, the SMP-type systems that have been drawing a lot of attention recently have a proven track record in information-type computers, but in embedded systems they still belong to the future. At present, almost all the embedded operating systems that claim SMP compatibility are ones in which the mechanisms of an information-type operating system have been implemented as is. How far the real-time performance necessary in embedded systems can be drawn out on multiprocessors is still unknown territory. However, it is a fact that the power of the operating system is necessary to draw out and master the capabilities of multiprocessors.

We are developing T-Kernel with the aim of "the optimum multiprocessor-compatible real-time operating system for embedded systems." As I have introduced in this article, a prototype multiprocessor version of T-Engine has been completed, and we are progressing steadily with testing and verification in actual environments. We plan to make the results public later, and I hope you will take note of this new-age T-Kernel at that time.

____________________

Column

AMP-Type, Multiprocessor-Loaded µT-Engine/MP211

At the YRP Ubiquitous Networking Laboratory, we have prototyped the multiprocessor-loaded µT-Engine/MP211 for research and development on T-Engine multiprocessor compatibility.

The µT-Engine/MP211 uses NEC Electronics Corp.'s MP211 as its processor. The MP211 is an asymmetric multicore processor that has three CPU cores and a DSP on a single chip. The CPU cores are ARM926 cores, and 128 megabytes of main memory are built into the package.

For the board, we chose not the Standard T-Engine specification but the small-scale µT-Engine, so that it can be easily embedded in devices when we use it in various tests in the future. In terms of functions, however, we also included an LCD interface and audio input/output, giving it almost the same capabilities as Standard T-Engine. Our thinking is that graphic and audio functions are also necessary in verifying the performance of the multiprocessor-compatible T-Kernel.

Also, we made the expansion bus compatible with the SH2/V850 expansion bus, so that it can be used together with an expansion board. The µT-Engine/MP211 has no network interface, but one can be added by using such an expansion board.

At present, at the YRP Ubiquitous Networking Laboratory, we are using the µT-Engine/MP211 to carry out research and development of an AMP-compatible T-Kernel. We would like to announce the details soon.


The above article on T-Engine appeared on pages 48-57 in Vol. 99 of TRONWARE. It was translated and loaded onto this Web page with the permission of Personal Media Corporation.

Copyright © 2006 Personal Media Corporation

Copyright © 2006 Sakamura Laboratory, University Museum, University of Tokyo