How Task Scheduling Works

The scheduler evaluates a task graph. The graph is a directed graph where each node is a task. Each task points to its successor, which is another task that is waiting on it to complete, or NULL. Method task::parent() gives you read-only access to the successor pointer. Each task has a refcount that counts the number of tasks that have it as a successor. The graph evolves over time.

Snapshot of Task Graph for the Fibonacci Example

The figure above shows a snapshot of a task graph for the Fibonacci example where:

The scheduler runs tasks in a way that tends to minimize both memory demands and cross-thread communication. The intuition is that a balance must be reached between depth-first and breadth-first execution. Assuming that the tree is finite, depth-first is best for sequential execution for the following reasons:

Though breadth-first execution has a severe problem with memory consumption, it does maximize parallelism if you have an unlimited number of physical threads. Since physical threads are limited, it is better to use only enough breadth-first execution to keep the available processors busy.

The scheduler implements a hybrid of depth-first and breadth-first execution. Each thread has its own deque[8] of tasks that are ready to run. When a thread spawns a task, it pushes it onto the bottom of its deque. The following figure shows a snapshot of a thread's deque that corresponds to the task graph in figure above.

A Thread's Deque

When a thread participates in task graph evaluation, it continually executes a task obtained by the first rule below that applies:

  1. Get the task returned by method execute for the previous task. This rule does not apply if execute returned NULL.

  2. Pop a task from the bottom of its own deque. This rule does not apply if the deque is empty.

  3. Steal a task from the top of another randomly chosen deque. If the chosen deque is empty, the thread tries this rule again until it succeeds.

Rule 1 is discussed in Scheduler Bypass. The overall effect of rule 2 is to execute the youngest task spawned by the thread, which causes depth-first execution until the thread runs out of work. Then rule 3 applies. It steals the oldest task spawned by another thread, which causes temporary breadth-first execution that converts potential parallelism into actual parallelism.

Getting a task is always automatic; it happens as part of task graph evaluation. Putting a task into a deque can be explicit or implicit. A thread always pushes a task onto the bottom of its own deque, never another thread's deque. Only theft can transfer a task spawned by one thread to another thread.

There are three conditions that cause a thread to push a task onto its deque:

The example in Fibonacci Numbers does not have implicit pushing, because it explicitly waits for children to complete. It uses set_ref_count(3) for a task having only two children. The extra reference protects the successor from being implicitly pushed. Continuation Passing has a similar example that employs implicit pushing. It uses set_ref_count(2) for a task having two children, so that that task executes automatically when the children complete.

To summarize, the task scheduler's fundamental strategy is 'breadth-first theft and depth-first work". The breadth-first theft rule raises parallelism sufficiently to keep threads busy. The depth-first work rule keeps each thread operating efficiently once it has sufficient work to do.

See Also

[8] Double-ended queue.