Cyber Physical Computing

Department of Computer Science

The University of Illinois at Urbana Champaign

Partner

Sites

 

 

Home

Introduction

Research

People

Papers

Why Join Us?

 

For inquiries, please see contact info.


Feasible Region Calculus

 

Sponsor: NSF


 

1. Introduction

Timing properties of software (relating to its ability to meet timing constraints) have traditionally been analyzed using real-time scheduling theory. This theory is perhaps one of the most studied and well understood aspects of real-time computing. However, the current state of the art is far from complete.

Most prior research addressed small systems such as single processors or multiprocessor machines. In contrast, future computing systems will be marked by massive distribution. They will see increased convergence of computation, communication, and sensing, and will augment a widely-distributed physical environment, interacting with it under constraints of real time and space. Despite the very large volume of real-time computing literature that presents different analysis techniques for computing the ability of systems to meet time constraints (called schedulability analysis in real-time computing terminology), there is a lack of a fundamental theory that allows systematic analysis of distributed real-time systems by appropriately composing properties of their components. Consider, by contrast, the fundamental laws of circuit theory used to analyze linear electric circuits. In that theory, a small number of fundamental rules (e.g., Kirchhoff laws) allow a designer to analyze complex circuits of arbitrary interconnection topology, reducing them to their effective transfer functions and deducing their end-to-end characteristics, such as total impedance, current draw, and voltage drop. Feasible region calculus aims to develop a similar theory for distributed real-time computing based on very encouraging recent results from our group, which suggest rules for composition of temporal behavior of real-time system components.

Observe that the capacity of a distributed system is a complicated concept because the rate at which a distributed system can execute an application is a function of not only the speed and topology of the system but also the application flow graph. For example, an application with more internal parallelism will run faster on a given system than one that is primarily serialized. Furthermore, in a time-constrained system, tasks that finish late do not contribute useful work. Hence, it is important to quantify only the fraction of work that can be done on time. We call it the real-time capacity of the system. A mathematical foundation is being developed for composing real-time capacity expressions given application flow graphs, time constraints and resource topology. This foundation will utilize a small set of composition rules to arrive at the corresponding application-aware capacity expressions. We shall also consider the limits when system components become infinitesimal and the number of components needed for an application grows to infinity. In such cases, it will be more appropriate to consider resource density and liquid task flows rather than individual instances of resources and tasks. Capacity expressions will be re-written in terms of density and flow distributions. Such models will be suitable for future computing artifacts such as smart paint and amorphous computing systems where individual computing ``particles'' become insignificant and computations may involve very large numbers of components.


2. Utilization-Based Feasible Regions

One of the big gaps in the theoretical underpinnings of real-time computing have traditionally lied in the inability to relate aggregate metrics (such as utilization) to per-task temporal performance metrics (such as response time of individual tasks) except in some restricted special cases (such as that of scheduling periodic tasks). The first major result of feasible region calculus has been the derivation of regions, defined in the state space of resource utilization, that imply satisfaction of all individual temporal performance constraints. Utilization bounds for schedulability existed previously only for periodic tasks. However, the periodicity requirement was a limit on applicability. Moreover, no framework existed for composing the bounds of individual components to compute those of aggregate systems. Composition is complicated by the fact that in a multistage system a task can still meet its end-to-end deadline despite overload on some resources as long as others are underloaded. Hence, looking at resources in isolation may be misleading. Instead, a surface exists in a space whose dimensions are the utilizations of individual resources such that all utilization vectors below that surface result in schedulable systems. The shape of that surface is a key research problem addressed by feasible region calculus. 

Observe that most scheduling problems are NP-complete, which typically calls for heuristic solutions. We demonstrate that computationally tractable sufficient solutions to the general problem of temporal analysis of distributed resource constrained communicating tasks are also possible if the feasible region is appropriately defined. The following references constitute a brief tutorial on feasible region calculus.


The first basic result: a utilization bound for aperiodic tasks under fixed-priority scheduling. This result drops the periodicity requirement that prevented prior work from being applicable to a large category of systems with no regularity in task arrival patterns.


Extensions of the basic results to multiprocessors (these are derived mainly for the sake of completeness and intellectual curiosity).


Extensions that account for mutual exclusion (blocking). These allow modeling systems of non-independent tasks, in which semaphores or other synchronization constraints are used to regulate access to passive resources.


Extensions to resource pipelines. These basic results allow composing feasible regions of individual components to compute those of distributed systems.

  • Tarek Abdelzaher, Gautam Thaker, Patrick Lardieri, "A Feasible Region for Meeting Aperiodic End-to-end Deadlines in Resource Pipelines," IEEE International Conference on Distributed Computing Systems, Tokyo, Japan, March 2004.
  • William Hawkins and Tarek Abdelzaher, "Towards Feasible Region Calculus: An End-to-end Schedulability Analysis of Real-Time Multistage Execution," IEEE Real-time Systems Symposium, Miami, Florida, December 2005.

An application of feasible region calculus; computing the real-time capacity of a sensor network.

  • Tarek F. Abdelzaher, Shashi Prabh, Raghu Kiran, "On Real-time Capacity Limits of Multihop Wireless Sensor Networks," IEEE Real-time Systems Symposium, Lisbon, Portugal, December 2004.

2. Non-Utilization-Based Feasible Regions

The scheduling policy has a dramatic effect on the ability of a system to meet deadlines. Hence, feasible region calculus must allow composition under different scheduling policies. An interesting and important consideration is that often the scheduling policy is imposed on the application developer by legacy factors such as operating system support. Hence, the scheduling policy is often a constraint and not a design parameter. This is very different from the typical direction in real-time computing that focuses on analyzing a handful of ``good'' policies with no general framework for incorporating arbitrary (often suboptimal) scheduling policies into the analysis as design constraints. A significant contribution of out work therefore lies in its ability to analyze arbitrary scheduling policies within the same general framework for quantifying real-time capacity. To the best of our knowledge, ours is the first such general framework in real-time computing literature.

The incorporation of arbitrary scheduling policies into the capacity analysis has very important ramifications on the analytic engine used. Previous efforts have attempted to relate latency (or ability to meet deadlines) to system state (typically, resource utilization) in distributed systems. The implicit assumption has been that utilization is a good predictor of latency or the ability of a system to meet deadlines. Unfortunately, we have shown that, depending on the resource scheduling policy, utilization may or may not be a good predictor of ability to meet latency constraints. Instead, we developed an automated method for choosing, for any scheduling policy, a family of load metrics (not necessarily utilization) to represent system state such that it becomes possible to predict (based on any metric in that family) whether or not deadlines are met. Specifically, if the load, as measured by the new metrics, is within a certain range, it is assured that all distributed tasks finish on time. As before, composition rules are defined to determine feasible regions defined on non-utilization-based load metrics.

·         Xue Liu and Tarek Abdelzaher, "Non-utilization Bounds: On Bounding New Load Metrics for Schedulability Analysis," invited to ACM Transactions on Embedded Systems, 2006.


 

3. Sub-additive Delay Composition

 

The above line of research focused on extensions to the derivation of utilization bounds or utilization-like bounds for schedulability. These extensions addressed aperiodic tasks, new load metrics, and pipelines. The idea was to quantify feasible regions defined in the space of resource utilization such that end-to-end deadlines are met as long as the system operated within the feasible regions derived.

 

Feasible regions, like utilization bounds, however, are high-level derived constructs. A more basic and fundamental question is to ask: how does delay compose in distributed systems? Uniprocessor schedulability results are “easy” because delays on a single processor compose by addition. If task A is preempted by task B and task C, then it is preempted by the sum of B’s and C’s (worst case) computation times. This additive property is the foundation for schedulability analysis expressions for uniprocessors. The problem with distributed systems is that delays do not necessarily compose additively. For example, if task A is preempted by B and C on processors X and Y, where X and Y form a pipeline, then task A is not necessarily preempted by the sum of B’s and C’s (worst case) computation times on X and Y. This is due to the inherence concurrency in distributed systems. If A waits for B and C on processor X, then while A finally executes on X, B and C might execute on Y. Hence, A does not wait for all of B and C on Y. The delay composition is therefore sub-additive. The interesting question is, can we describe how delays compose in distributed systems? The following papers derive the basic delay composition theorems for resource pipelines and directed acyclic graphs:

 

  • Praveen Jayachandran and Tarek Abdelzaher, “A Delay Composition Theorem for Real-Time Pipelines,” Euromicro Conference on Real-Time Systems, Pisa, Italy, July 2007 (Best student paper award)
  • Praveen Jayachandran and Tarek Abdelzaher “Transforming Distributed Acyclic Systems into Equivalent Uniprocessors Under Preemptive and Non-Preemptive Scheduling,'' ECRTS Prague, Czech Republic, July 2008.
  • Praveen Jayachandran and Tarek Abdelzaher “Delay Composition in Preemptive and Non-preemptive Real-Time Pipelines,'' Invited to Journal of Real-Time Systems, 2008.

 

The above papers lead to the development of a delay composition algebra. It enables a new approach to schedulability analysis in distributed systems: namely, a reduction-based approach.  The algebra reduces distributed system workload to an equivalent uniprocessor workload that can be analyzed using uniprocessor schedulability analysis techniques to infer end-to-end delay and schedulability properties of each of the original distributed jobs.

 

The algebraic operators in the delay composition algebra perform reductions on a resource graph (representing a distributed system) into a smaller graph by merging nodes. Each node represents a resource in the system and is characterized by a load matrix that summarizes its workload. When nodes are merged in the original system, their workloads are merged as well into an equivalent workload for the resulting node. The main function of the algebraic operators of delay composition algebra is to perform workload reductions as nodes merge. The reductions stop when only one node remains, meaning that the distributed system workload is fully reduced to that of an equivalent uniprocessor. Analyzing the workload on the resulting uniprocessor using any of the standard uniprocessor schedulability analysis techniques determines the schedulability of the original distributed task set.

 

A key consideration in developing operators of the new algebra is that the operand set must be closed under composition. In other words, both the operands and the result of each operator must have the same structure. As mentioned above, the operands refer to the load matrices of the nodes being merged. The result is the load matrix of the resulting merged node. Since merged nodes that result from composition represent more than one physical node, the workload representation for a node should be the same whether this node represents a single resource or the result of merging an entire subsystem. A key contribution of this project has been the attainment of such a general representation (the load matrix of a node) and the derivation of simple algebraic operators that manipulate this representation to reduce distributed systems workload into a single-node workload that is equivalent from a schedulability perspective.

 

Observe that if the operand matrices include variables (e.g., task computation times that are yet unspecified), applying our algebraic operators to reduce the distributed system to a single node yields equivalent uniprocessor load expressions that are a function of those variables. Furthermore, applying schedulability tests (e.g., uniprocessor utilization bounds) to the equivalent load yields schedulability expressions that are a direct function of the original load variables of the distributed system.  This makes it possible to quantify the distributed system parameter space for which schedulability is attained.

 

In summary, existing techniques for analyzing delay and schedulability of jobs in distributed systems can be broadly classified into two categories: (i) decomposition-based, and (ii) extension-based. The decomposition-based techniques break the system into multiple subsystems, analyze each subsystem independently using current uniprocessor analysis techniques, then combine the results. The extension-based techniques explore ways to extend current uniprocessor analyses to accommodate distributed tasks and resources. In contrast, we developed a third category of techniques for analyzing distributed systems based on reduction (as opposed to decomposition or extension). Rather than breaking up the problem into subproblems, or extending uniprocessor analyses to more complex systems, we systematically reduce the distributed system schedulability problem to a single simple problem on a uniprocessor. The uniprocessor problem is then solved thereby determining the schedulability of the original distributed system. The following paper describes the delay composition algebra:

 

  • Praveen Jayachandran and Tarek Abdelzaher, “Delay Composition Algebra: A Reduction-based Schedulability Algebra for Distributed Real-Time Systems,” IEEE Real-time Systems Symposium, Barcelona, Spain, December 2008. (Currently nominated for best paper)