For what purposes are multithreaded systems used? Multithreaded Application Architectures

Earlier posts talked about multithreading in Windows using CreateThread and other WinAPI calls, as well as multithreading in Linux and other *nix systems using pthreads. If you write in C++11 or later, you have access to std::thread and the other threading primitives introduced in that language standard. Below we will see how to work with them. Unlike code written with WinAPI or pthreads, code written with std::thread is cross-platform.

Note: the code below was tested on GCC 7.1 and Clang 4.0 under Arch Linux, GCC 5.4 and Clang 3.8 under Ubuntu 16.04 LTS, GCC 5.4 and Clang 3.8 under FreeBSD 11, as well as Visual Studio Community 2017 under Windows 10. CMake versions before 3.8 cannot tell the compiler to use the C++17 standard specified in the project properties (see the separate note on how to install CMake 3.8 on Ubuntu 16.04). For the code to compile with Clang on *nix systems, the libc++ package must be installed. For Arch Linux the package is available on AUR. Ubuntu has the libc++-dev package, but you may run into a problem that prevents the code from building easily; a workaround is described on StackOverflow. On FreeBSD, to compile the project you need to install the cmake-modules package.

Mutexes

Below is the simplest example of using threads and mutexes:

#include <mutex>
#include <thread>
#include <vector>
#include <iostream>
#include <chrono>

std::mutex mtx;
static int counter = 0;
static const int MAX_COUNTER_VAL = 100;

void thread_proc(int tnum) {
    for (;;) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            if (counter == MAX_COUNTER_VAL)
                break;
            int ctr_val = ++counter;
            std::cout << "Thread " << tnum << ": counter = " <<
                ctr_val << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; i++) {
        std::thread thr(thread_proc, i);
        threads.emplace_back(std::move(thr));
    }

    // can't use const auto& here since .join() is not marked const
    for (auto& thr : threads) {
        thr.join();
    }

    std::cout << "Done!" << std::endl;
    return 0;
}

Note the wrapping of std::mutex in std::lock_guard in accordance with the RAII idiom. This approach ensures that the mutex is released when the scope is exited, in any circumstances, including when exceptions occur. To acquire several mutexes at once without risking deadlock, there is the std::scoped_lock class. However, it appeared only in C++17 and therefore may not be available everywhere. For earlier versions of C++, there is the std::lock function template with similar functionality, although it requires writing additional code to release the locks correctly in RAII style.
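
To make the last point concrete, here is a minimal sketch (the "account" variables are invented for the example) of acquiring two mutexes without risking a deadlock: the C++17 variant uses std::scoped_lock, while the pre-C++17 variant combines std::lock with std::lock_guard and std::adopt_lock.

#include <mutex>

std::mutex acc_a_mtx, acc_b_mtx;  // hypothetical: each protects one account
int acc_a = 100, acc_b = 0;

void transfer_cpp17(int amount) {
    // C++17: both mutexes are acquired in a deadlock-free manner
    std::scoped_lock lock(acc_a_mtx, acc_b_mtx);
    acc_a -= amount;
    acc_b += amount;
}

void transfer_cpp11(int amount) {
    // pre-C++17: std::lock avoids deadlock, adopt_lock hands ownership to RAII guards
    std::lock(acc_a_mtx, acc_b_mtx);
    std::lock_guard<std::mutex> la(acc_a_mtx, std::adopt_lock);
    std::lock_guard<std::mutex> lb(acc_b_mtx, std::adopt_lock);
    acc_a -= amount;
    acc_b += amount;
}

Both variants release the mutexes automatically when the scope is exited, in keeping with the RAII idiom described above.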

RWLock

A situation often arises in which an object is accessed more often by reading than by writing. In this case, instead of a regular mutex, it is more efficient to use a read-write lock, also known as RWLock. RWLock can be held by several reading threads at once, or by just one writing thread. RWLock in C++ corresponds to the classes std::shared_mutex and std::shared_timed_mutex:

#include <shared_mutex>
#include <thread>
#include <vector>
#include <iostream>
#include <chrono>

// std::shared_mutex mtx; // will not work with GCC 5.4
std::shared_timed_mutex mtx;

static int counter = 0;
static const int MAX_COUNTER_VAL = 100;

void thread_proc(int tnum) {
    for (;;) {
        {
            // see also std::shared_lock
            std::unique_lock<std::shared_timed_mutex> lock(mtx);
            if (counter == MAX_COUNTER_VAL)
                break;
            int ctr_val = ++counter;
            std::cout << "Thread " << tnum << ": counter = " <<
                ctr_val << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; i++) {
        std::thread thr(thread_proc, i);
        threads.emplace_back(std::move(thr));
    }

    for (auto& thr : threads) {
        thr.join();
    }

    std::cout << "Done!" << std::endl;
    return 0;
}

By analogy with std::lock_guard, the classes std::unique_lock and std::shared_lock are used to acquire the RWLock, depending on how we want to acquire it. The std::shared_timed_mutex class appeared in C++14 and works on all modern platforms (not counting mobile devices, game consoles, and so on). Unlike std::shared_mutex, it has the methods try_lock_for, try_lock_until and others that attempt to lock the mutex within a given time. I strongly suspect that std::shared_mutex should be cheaper than std::shared_timed_mutex. However, std::shared_mutex appeared only in C++17, which means it is not supported everywhere. In particular, the still widely used GCC 5.4 does not know about it.
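
As a hedged illustration of the reader/writer split, here is a minimal sketch (the config map is invented for the example): any number of threads may run get_value() concurrently under std::shared_lock, while set_value() takes std::unique_lock and therefore waits until all readers have left.

#include <map>
#include <string>
#include <shared_mutex>

std::shared_timed_mutex cfg_mtx;              // std::shared_mutex in C++17
std::map<std::string, std::string> config;    // hypothetical shared data

std::string get_value(const std::string& key) {
    std::shared_lock<std::shared_timed_mutex> lock(cfg_mtx);  // many readers at once
    auto it = config.find(key);
    return it != config.end() ? it->second : "";
}

void set_value(const std::string& key, const std::string& val) {
    std::unique_lock<std::shared_timed_mutex> lock(cfg_mtx);  // exclusive writer
    config[key] = val;
}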

Thread Local Storage

Sometimes you need a variable that behaves like a global one but is visible only to one thread. Other threads also see the variable, but for them it has its own local value. For this, Thread Local Storage, or TLS, was invented (it has nothing to do with Transport Layer Security!). Among other things, TLS can be used to significantly speed up pseudorandom number generation. An example of using TLS in C++:

#include <mutex>
#include <thread>
#include <vector>
#include <iostream>
#include <chrono>

std::mutex io_mtx;
thread_local int counter = 0;
static const int MAX_COUNTER_VAL = 10;

void thread_proc(int tnum) {
    for (;;) {
        counter++;
        if (counter == MAX_COUNTER_VAL)
            break;
        {
            std::lock_guard<std::mutex> lock(io_mtx);
            std::cout << "Thread " << tnum << ": counter = " <<
                counter << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; i++) {
        std::thread thr(thread_proc, i);
        threads.emplace_back(std::move(thr));
    }

    for (auto& thr : threads) {
        thr.join();
    }

    std::cout << "Done!" << std::endl;
    return 0;
}

The mutex here is used solely to synchronize output to the console. No synchronization is required to access thread_local variables.
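
The earlier remark about speeding up pseudorandom number generation boils down to giving each thread its own generator, so that no mutex is needed around it; a minimal sketch (the seeding scheme is an arbitrary choice for the example):

#include <random>

int random_int(int min, int max) {
    // one generator per thread: no locking, no contention
    thread_local std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> dist(min, max);
    return dist(gen);
}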

Atomic variables

Atomic variables are often used to perform simple operations without mutexes. For example, you need to increment a counter from multiple threads. Instead of guarding an int with a std::mutex, it is more efficient to use std::atomic_int. C++ also offers the types std::atomic_char, std::atomic_bool and many others. Lock-free algorithms and data structures are also implemented on top of atomic variables. It is worth noting that they are very difficult to develop and debug, and they are not necessarily faster than similar lock-based algorithms and data structures on every system.

Sample code:

#include <atomic>
#include <mutex>
#include <thread>
#include <vector>
#include <iostream>
#include <chrono>

static std::atomic_int atomic_counter(0);
static const int MAX_COUNTER_VAL = 100;

std::mutex io_mtx;

void thread_proc(int tnum) {
    for (;;) {
        {
            int ctr_val = ++atomic_counter;
            if (ctr_val >= MAX_COUNTER_VAL)
                break;

            {
                std::lock_guard<std::mutex> lock(io_mtx);
                std::cout << "Thread " << tnum << ": counter = " <<
                    ctr_val << std::endl;
            }
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::vector<std::thread> threads;

    int nthreads = std::thread::hardware_concurrency();
    if (nthreads == 0) nthreads = 2;

    for (int i = 0; i < nthreads; i++) {
        std::thread thr(thread_proc, i);
        threads.emplace_back(std::move(thr));
    }

    for (auto& thr : threads) {
        thr.join();
    }

    std::cout << "Done!" << std::endl;
    return 0;
}

Note the use of the hardware_concurrency function. It returns an estimate of the number of threads that can execute in parallel on the current system. For example, on a machine with a quad-core processor that supports Hyper-Threading, it returns 8. It can also return zero if the estimate cannot be made or the function is simply not implemented.

Some information about the operation of atomic variables at the assembler level can be found in the article Cheat Sheet for Basic x86/x64 Assembly Instructions.
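
To give a flavor of what "lock-free" means in practice, here is a hedged sketch of the usual compare-and-swap loop built on the atomics described above: the counter is incremented only while it is below the limit, and the loop simply retries when another thread wins the race (the function name and limit are invented for the example).

#include <atomic>

static std::atomic_int bounded_counter(0);

// returns false once the limit has been reached; no mutex involved
bool try_increment(int limit) {
    int cur = bounded_counter.load();
    while (cur < limit) {
        // if bounded_counter still equals cur, replace it with cur + 1;
        // otherwise cur is refreshed with the current value and the loop retries
        if (bounded_counter.compare_exchange_weak(cur, cur + 1))
            return true;
    }
    return false;
}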

Conclusion

As far as I can see, this all works really well. That is, when writing cross-platform applications in C++, you can safely forget about WinAPI and pthreads. In pure C, since C11, there are also cross-platform threads. But they are still not supported by Visual Studio (I checked), and are unlikely to ever be supported. It's no secret that Microsoft doesn't see any interest in developing support for the C language in its compiler, preferring to focus on C++.

There are still many primitives left behind the scenes: std::condition_variable(_any), std::(shared_)future, std::promise, std::async and others. To explore them, I recommend cppreference.com. It might also be worth reading the book C++ Concurrency in Action. But I must warn you that it is no longer new, is rather padded, and in essence retells a dozen articles from cppreference.com.
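
As a small taste of those primitives, here is a hedged std::async/std::future sketch (the summation task is invented for the example): the lambda runs concurrently, and get() blocks until its result is ready.

#include <future>
#include <numeric>
#include <vector>
#include <iostream>

int main() {
    std::vector<int> data(1'000'000, 1);

    // run the summation asynchronously; std::launch::async requests a separate thread
    std::future<long long> fut = std::async(std::launch::async, [&data] {
        return std::accumulate(data.begin(), data.end(), 0LL);
    });

    // ... the main thread can do other work here ...

    std::cout << "sum = " << fut.get() << std::endl;  // blocks until the result is ready
    return 0;
}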

The full version of the source code for this note, as usual, is on GitHub. How do you currently write multi-threaded applications in C++?

Which topic raises the most questions and difficulties for beginners? When I asked teacher and Java programmer Alexander Pryakhin about this, he immediately answered: “Multithreading.” Thanks to him for the idea and help in preparing this article!

We will look into the inner world of the application and its processes, we will understand what the essence of multithreading is, when it is useful and how to implement it - using Java as an example. If you're learning another OOP language, don't worry: the basic principles are the same.

About threads and where they come from

To understand multithreading, let's first understand what a process is. A process is a piece of virtual memory and resources that the OS allocates to run a program. If you open several instances of one application, the system will allocate a process for each. In modern browsers, a separate process can be responsible for each tab.

You've probably come across the Windows "Task Manager" (in Linux it's the "System Monitor") and you know that unnecessary running processes load the system, and the heaviest of them often freeze, so they have to be terminated forcibly.

But users love multitasking: give them half a chance and they will open a dozen windows and jump back and forth between them. Hence the dilemma: applications must run simultaneously, yet the load on the system must stay low enough that nothing slows down. The hardware alone cannot keep up with its owners' appetites, so the issue has to be resolved at the software level.

We want the processor to be able to execute more commands and process more data per unit time. That is, we need to fit more executed code into each time slice. Think of a unit of code execution as an object - this is a thread.

It is easier to approach a complex task if you break it down into several simple ones. The same is true when working with memory: a "heavy" process is divided into threads that take up fewer resources and deliver their code to the processor more quickly (see below for how exactly).

Every application has at least one process, and every process has at least one thread, which is called the main thread and from which new ones are launched if necessary.

Difference between threads and processes

    Threads use the memory allocated to their process, while processes require their own separate memory space. Therefore, threads are created and terminated faster: the system does not need to allocate a new address space to them each time and then release it.

    Processes each work with their own data and can exchange something only through the mechanism of interprocess communication. Threads access each other's data and resources directly: what one changes is immediately available to all. A thread can control its "brethren" in the process, while a process controls exclusively its "daughters". Therefore, switching between threads is faster and communication between them is easier to organize.

What is the conclusion from this? If you need to process a large amount of data as quickly as possible, break it into chunks that can be processed by separate threads, and then put the result together. This is better than creating resource-hungry processes.
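
This article's examples are in Java, but the "split, process, combine" idea is language-neutral; here is a hedged C++ sketch that sums a large array by giving each thread its own chunk and then adding up the partial results (the function name and chunking scheme are invented for the example).

#include <thread>
#include <vector>
#include <numeric>

long long parallel_sum(const std::vector<int>& data, unsigned nthreads) {
    std::vector<long long> partial(nthreads, 0);
    std::vector<std::thread> workers;
    size_t chunk = data.size() / nthreads;

    for (unsigned i = 0; i < nthreads; ++i) {
        size_t from = i * chunk;
        size_t to = (i + 1 == nthreads) ? data.size() : from + chunk;
        // each thread works on its own chunk and writes to its own slot: no locking needed
        workers.emplace_back([&, i, from, to] {
            partial[i] = std::accumulate(data.begin() + from, data.begin() + to, 0LL);
        });
    }
    for (auto& t : workers) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}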

But why does such a popular application as Firefox go the route of creating multiple processes? Because for a browser, isolated operation of the tabs is more reliable and flexible. If something goes wrong with one process, it is not necessary to terminate the entire program; at least part of the data can be saved.

What is multithreading

Now we come to the main thing. Multithreading is when the application process is divided into threads that are processed in parallel - in one unit of time - by the processor.

The computing load is shared between two or more cores so that the interface and other program components do not slow down each other.

Multi-threaded applications can also run on single-core processors, but then the threads execute in turn: the first runs for a while and its state is saved, then the second is allowed to run and its state is saved, then execution returns to the first or moves on to a third, and so on.

Busy people complain that they only have two hands. Processes and programs can have as many hands as needed to complete a task as quickly as possible.

Wait for the signal: synchronization in multi-threaded applications

Imagine multiple threads trying to modify the same data area at the same time. Whose changes will ultimately be accepted and whose changes will be reversed? To avoid confusion when working with shared resources, threads need to coordinate their actions. To do this, they exchange information using signals. Each thread tells the others what it is doing now and what changes to expect. This way, data from all threads about the current state of resources is synchronized.

Basic synchronization tools

Mutex (mutual exclusion) - a "flag" that passes to the thread that currently has the right to work with shared resources. It prevents other threads from accessing the occupied memory area. There can be several mutexes in an application, and they can be shared between processes. There is a catch: a mutex forces the application to go to the operating system kernel each time, which is expensive.

Semaphore - allows you to limit the number of threads accessing a resource at a given moment. This reduces the CPU load when executing code that has bottlenecks. The problem is that the optimal number of threads depends on the user's machine (a small C++ sketch of a semaphore is given after this list).

Event - you define a condition upon the occurrence of which control is transferred to the desired thread. Threads exchange event data in order to develop and logically continue each other's actions: one received the data, another checked its correctness, a third saved it to the hard drive. Events differ in how the signal is cancelled. If you need to notify several threads about an event, you will have to reset the signal manually to stop it. If there is only one target thread, you can create an event with automatic reset: it will stop the signal itself once it reaches the thread. For flexible thread control, events can be queued.

Critical section - a more complex mechanism that combines a spin counter and a semaphore. The counter allows you to delay engaging the semaphore for the desired time. The advantage is that the kernel is involved only if the section is busy and the semaphore wait has to be engaged; the rest of the time the thread runs in user mode. Alas, the section can only be used within one process.
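
The primitives above are described abstractly; as one concrete, hedged illustration, C++20 offers std::counting_semaphore, and limiting concurrent access to a resource can be sketched like this (the limit of 4 and the worker body are arbitrary choices for the example):

#include <semaphore>
#include <thread>
#include <vector>

// at most 4 threads may hold the "resource" at the same time
std::counting_semaphore<4> slots(4);

void worker() {
    slots.acquire();          // blocks if 4 threads are already inside
    // ... use the limited resource (DB connection, file handle, ...) ...
    slots.release();
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    return 0;
}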

How to Implement Multithreading in Java

The Thread class is responsible for working with threads in Java. To create a new thread to perform a task means to create an instance of the Thread class and associate it with the desired code. This can be done in two ways:

    subclass Thread;

    implement the Runnable interface in your class, and then pass instances of the class to the Thread constructor.

We will not touch on deadlock situations, where threads block each other's work and freeze; we will leave that for the next article. Now let's move on to practice.

Example of multithreading in Java: ping pong with mutexes

If you think something terrible is about to happen, breathe out. We will look at working with synchronization objects almost in game form: two threads will pass a mutex between them. But in essence, you will see a real application where, at any given time, only one thread can work with shared data.

First, let's create a class that inherits the properties of Thread already known to us, and write a “kickBall” method:

public class PingPongThread extends Thread {

    PingPongThread(String name) {
        this.setName(name); // override the thread name
    }

    @Override
    public void run() {
        Ball ball = Ball.getBall();
        while (ball.isInGame()) {
            kickBall(ball);
        }
    }

    private void kickBall(Ball ball) {
        if (!ball.getSide().equals(getName())) {
            ball.kick(getName());
        }
    }
}

Now let's take care of the ball. Ours will not be simple, but memorable: so that it can tell who hit it, from which side and how many times. To do this, we use a mutex: it will collect information about the work of each thread - this will allow isolated threads to communicate with each other. After the 15th hit, we will take the ball out of play so as not to seriously injure it.

public class Ball {

    private int kicks = 0;
    private static Ball instance = new Ball();
    private String side = "";

    private Ball() {}

    static Ball getBall() {
        return instance;
    }

    synchronized void kick(String playername) {
        kicks++;
        side = playername;
        System.out.println(kicks + " " + side);
    }

    String getSide() {
        return side;
    }

    boolean isInGame() {
        return (kicks < 15);
    }
}

And now two player threads enter the scene. Let's call them, without further ado, Ping and Pong:

public class PingPongGame {

    PingPongThread player1 = new PingPongThread("Ping");
    PingPongThread player2 = new PingPongThread("Pong");
    Ball ball;

    PingPongGame() {
        ball = Ball.getBall();
    }

    void startGame() throws InterruptedException {
        player1.start();
        player2.start();
    }
}

“The stadium is full of people - it's time to start the match.” Let's announce the opening of the match officially, in the main class of the application:

public class PingPong {

    public static void main(String[] args) throws InterruptedException {
        PingPongGame game = new PingPongGame();
        game.startGame();
    }
}

As you can see, there is nothing mind-blowing here. This is just an introduction to multithreading, but you already have an idea of how it works and can experiment - for example, by limiting the duration of the game by time rather than by the number of hits. We will return to the topic of multithreading later: we will look at the java.util.concurrent package, the Akka library and the volatile mechanism. We will also talk about implementing multithreading in Python.

Threads and processes are related concepts in computing. Both are a sequence of instructions that must be executed in a specific order. Instructions in separate threads or processes, however, can be executed in parallel.

Processes exist in the operating system and correspond to what users see as programs or applications. A thread, on the other hand, exists within a process. For this reason, threads are sometimes called "lightweight processes". Each process consists of one or more threads. The existence of multiple processes allows a computer to perform multiple tasks "simultaneously." The existence of multiple threads allows a process to share work for parallel execution. On a multiprocessor computer, processes or threads can run on different processors. This allows for truly parallel work.

Fully parallel processing is not always possible. Threads sometimes need to be synchronized. One thread may be waiting for another thread's result, or one thread may need exclusive access to a resource that is being used by another thread. Synchronization issues are a common cause of errors in multi-threaded applications. Sometimes a thread can end up waiting for a resource that will never become available. This ends in a condition called deadlock.

The first thing you need to understand is that a process consists of at least one thread. In the OS, each process has an address space and a single thread of control; in fact, that is what defines a process.

On the one hand, a process can be viewed as a way of combining related resources into one group. A process has an address space containing program text and data, as well as other resources. Resources include open files, child processes, unhandled alarm messages, signal handlers, accounting information, and more. It is much easier to manage resources by combining them in the form of a process.

On the other hand, a process can be viewed as a stream of executed commands, that is, a thread. A thread has a program counter that keeps track of the order in which actions are executed. It has registers that store current variables. It has a stack containing the execution history, with a separate frame for each procedure that has been called but has not yet returned. Although a thread must execute within a process, a distinction must be made between the concepts of thread and process. Processes are used to group resources, and threads are the objects that take turns executing on the CPU.

The concept of threads adds to the process model the ability to execute, within the same process environment, several largely independent programs at once. Multiple threads running in parallel in one process are analogous to multiple processes running in parallel on the same computer. In the first case, the threads share an address space, open files, and other resources. In the second case, the processes share physical memory, disks, printers, and other resources. Threads have some of the properties of processes, so they are sometimes called lightweight processes. The term multithreading is also used to describe the use of multiple threads in a single process.

Every thread consists of two components:

a kernel object, through which the operating system controls the thread; statistics about the thread are also stored there (additional threads are likewise created by the kernel);
a thread stack, which contains the parameters of all functions and the local variables the thread needs to execute its code.

To sum up: the main difference between processes and threads is that processes are isolated from each other and use different address spaces, whereas threads can use the same space (within a process) while performing their work without interfering with each other. This is where the convenience of multi-threaded programming lies: by dividing the application into several sequential threads, we can increase performance, simplify the user interface and achieve scalability (if your application is installed on a multiprocessor system, executing threads on different processors will make your program run amazingly fast =)).

1. A thread defines a sequence of code execution within a process.

2. A process does not execute anything itself; it simply serves as a container for threads.

3. Threads are always created in the context of some process, and their entire lifetime is confined to its boundaries.

4. Threads can execute the same code and manipulate the same data, and they share kernel object handles, since the handle table belongs to the process rather than to individual threads.

5. Since threads consume significantly fewer resources than processes, try to solve your problems with additional threads and avoid creating new processes (but approach this wisely).

Multitasking is the property of an operating system or programming environment to provide the possibility of parallel (or pseudo-parallel) processing of several processes. True multitasking of an operating system is possible only in distributed computing systems.


The desktop of a modern operating system, reflecting the activity of several processes.

There are 2 types of multitasking:

· Process multitasking (process-based: simultaneously running programs). Here a program is the smallest piece of code that the operating system scheduler can manage. This kind is better known to most users (working in a text editor while listening to music).

· Thread multitasking (thread-based). The smallest unit of managed code is a thread (one program can perform two or more tasks simultaneously).

Multithreading is a specialized form of multitasking.

Properties of a multitasking environment

Primitive multitasking environments provide pure "resource sharing", when each task is assigned a certain area of memory, and the task is activated at strictly defined time intervals.

More advanced multitasking systems allocate resources dynamically, with a task starting in memory or leaving memory depending on its priority and system strategy. This multitasking environment has the following features:

· Each task has its own priority, according to which it receives processor time and memory

· The system organizes task queues so that all tasks receive resources, depending on the priorities and strategy of the system

· The system organizes interrupt processing, by which tasks can be activated, deactivated and deleted

· At the end of the specified time slice, the kernel temporarily transfers the task from the running state to the ready state, giving resources to other tasks. If there is insufficient memory, pages of unexecuted tasks can be pushed out to disk (swapping), and then after a time determined by the system, restored in memory

· The system protects the task address space from unauthorized interference of other tasks

· The system protects the address space of its kernel from unauthorized task interference

· The system recognizes failures and freezes of individual tasks and stops them

· The system resolves conflicts over access to resources and devices, avoiding deadlocks and system-wide freezes caused by waiting for blocked resources

· The system guarantees each task that sooner or later it will be activated

· The system processes real-time requests

· The system provides communication between processes

Difficulties in implementing a multitasking environment

The main difficulty in implementing a multitasking environment is its reliability, expressed in memory protection, handling of failures and interruptions, protection from freezes and deadlocks.

In addition to being reliable, the multitasking environment must be efficient. The expenditure of resources on its maintenance should not: interfere with processes, slow down their work, or sharply limit memory.

Multithreading is a property of a platform (for example, an operating system, a virtual machine, etc.) or an application, consisting in the fact that a process spawned in the operating system can consist of several threads executing "in parallel," that is, without a prescribed order in time. For some tasks, such a division can achieve more efficient use of computer resources.

Such streams of execution are formally called threads of execution; informally they are often simply called "threads".

The essence of multithreading is quasi-multitasking at the level of one executable process, that is, all threads are executed in the address space of the process. In addition, all threads of a process not only have a common address space, but also common file descriptors. A running process has at least one (main) thread.

Multithreading (as a programming doctrine) should not be confused with either multitasking or multiprocessing, despite the fact that operating systems that implement multitasking usually also implement multithreading.

The advantages of multithreading in programming include the following:

· Simplification of the program in some cases due to the use of a common address space.

· Less time spent on creating a thread compared to the process.

· Increasing process performance by parallelizing processor calculations and I/O operations.

Types of thread implementation

· Threads in user space. Each process has its own thread table, similar to the kernel's process table.

The disadvantages of this approach are as follows:

1. No timer interrupts within a single process

2. When a blocking system call is made by one thread, all threads of the process are blocked

3. Complexity of implementation

· Threads in kernel space. Along with the process table, the kernel maintains a thread table.

· "Fibers" fibers). Multiple user-mode threads running in a single kernel-mode thread. The kernel space thread consumes noticeable resources, most notably physical memory and the kernel mode address range for the kernel mode stack. Therefore, the concept of “fiber” was introduced - a lightweight thread that runs exclusively in user mode. Each thread can have multiple "fibers".

Interaction of threads

In a multi-threaded environment, problems often arise when concurrent threads share the same data or devices. To solve such problems, thread interaction methods such as mutual exclusions (mutexes), semaphores, critical sections and events are used.

· Mutex (mutex) is a synchronization object that is set to a special signaled state when not occupied by any thread. Only one thread owns this object at any time, hence the name of such objects (from mutually exclusive access): simultaneous access to a shared resource is excluded. After all necessary actions are completed, the mutex is released, allowing other threads to access the shared resource. An object may support recursive acquisition by the same thread: a second acquisition increments a counter without blocking the thread, and the object must then be released the same number of times. This is the case, for example, for the critical section in Win32. However, there are also implementations that do not support this and lead to a thread deadlock when a recursive acquisition is attempted; FAST_MUTEX in the Windows kernel is one of them.

· Semaphores are available resources that can be acquired by multiple threads at the same time until the resource pool is empty. Then additional threads must wait until the required amount of resources is available again. Semaphores are very efficient because they allow simultaneous access to resources. A semaphore is a logical extension of a mutex - a semaphore with a count of 1 is equivalent to a mutex, but the count can be greater than 1.

· Events. An object that stores 1 bit of information “signaled or not”, over which the operations “signal”, “reset to unsignaled state” and “wait” are defined. Waiting on a signaled event is the absence of an operation with immediate continuation of the thread's execution. Waiting on an unsignaled event causes the thread to suspend execution until another thread (or the second phase of the interrupt handler in the OS kernel) signals the event. It is possible to wait for several events in the “any” or “all” modes. It is also possible to create an event that is automatically reset to an unsignaled state after waking up the first - and only - waiting thread (such an object is used as the basis for implementing a “critical section” object). They are actively used in MS Windows, both in user mode and in kernel mode. There is a similar object in the Linux kernel called kwait_queue.

· Critical sections provide synchronization similar to mutexes, except that the objects representing critical sections are accessible within the same process. Events, mutexes, and semaphores can also be used in a single-process application, but critical section implementations in some operating systems (such as Windows NT) provide a faster and more efficient mutually exclusive synchronization mechanism—the "get" and "release" operations on the critical section are optimized for the case of a single thread (no competition) in order to avoid any system calls leading to the OS kernel. Like mutexes, an object that represents a critical section can only be used by one thread at a time, making them extremely useful for delimiting access to shared resources.

· Condition variables (condvars). They are similar to events, but they are not objects that occupy memory: only the address of a variable is used, the concept of "variable contents" does not exist, and the address of an arbitrary object can be used as a condition variable. Unlike events, setting a condition variable to a signaled state has no consequences if there are no threads currently waiting on the variable. Setting an event in a similar case stores a "signaled" state within the event itself, after which threads that subsequently wish to wait for the event continue executing immediately without stopping. To make full use of such an object, the operation "atomically release the mutex and wait on the condition variable" is also necessary. Condition variables are actively used in UNIX-like operating systems. Discussions about the advantages and disadvantages of events and condition variables are a prominent part of the discussions about the advantages and disadvantages of Windows and UNIX. A minimal C++ sketch of this pattern is given after this list.

· IO completion port (IOCP). Implemented in the OS kernel and accessible through system calls, it is a "queue" object with the operations "put a structure at the tail of the queue" and "take the next structure from the head of the queue"; the latter call suspends the execution of the thread if the queue is empty, until another thread makes the "put" call. The most important feature of IOCP is that structures can be placed into it not only by an explicit system call from user mode, but also implicitly inside the OS kernel as a result of completing an asynchronous I/O operation on one of the file descriptors. To achieve this effect, you must use the "associate file descriptor with IOCP" system call. In this case, the structure placed in the queue contains the error code of the I/O operation and also, if this operation is successful, the number of bytes actually transferred. The completion port implementation also limits the number of threads executing on a single processor/core after receiving a structure from the queue. The object is specific to MS Windows and allows processing incoming connection requests and data chunks in server software in an architecture where the number of threads can be less than the number of clients (there is no requirement to create a separate thread, with its resource costs, for each new client).

· ERESOURCE. A mutex that supports recursive acquisition, with shared or exclusive acquisition semantics. Semantics: an object can be either free, or owned by an arbitrary number of threads in a shared manner, or owned by just one thread in an exclusive manner. Any attempt to make an acquisition that violates this rule results in the thread blocking until the object is released so that the acquisition becomes possible. There are also operations like TryToAcquire: it never blocks the thread; it either acquires the object or, if blocking would be required, returns FALSE without doing anything. It is used in the Windows kernel, especially in file systems: for example, any disk file opened by someone is associated with an FCB structure, which contains two such objects for synchronizing access to the file size. One of them, the paging IO resource, is acquired exclusively on the file truncation path and ensures that there is no active caching or memory-mapped I/O on the file at the time of truncation.

· Rundown protection. A semi-documented (calls are present in header files, but not in the documentation) object in the Windows kernel. A counter with the operations "increment", "decrement" and "wait". The wait blocks the thread until decrement operations bring the counter down to zero. Additionally, an increment operation can fail, and a wait that is currently in progress causes all further increment operations to fail.
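
As promised above, here is a hedged C++ sketch of the condition-variable pattern: wait() atomically releases the mutex and suspends the thread, and the predicate re-check guards against waking up when there is nothing to do (the task queue is just an illustrative stand-in for any shared state).

#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex mtx;
std::condition_variable cv;
std::queue<int> tasks;

void producer(int value) {
    {
        std::lock_guard<std::mutex> lock(mtx);
        tasks.push(value);
    }
    cv.notify_one();  // has no effect if nobody is waiting yet; the predicate covers that case
}

int consumer() {
    std::unique_lock<std::mutex> lock(mtx);
    // wait() atomically releases mtx and suspends the thread;
    // the predicate re-checks shared state to handle spurious wakeups
    cv.wait(lock, [] { return !tasks.empty(); });
    int value = tasks.front();
    tasks.pop();
    return value;
}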

Clay Breshears

Introduction

Intel's methodology for introducing multithreading includes four main stages: analysis, design and implementation, debugging, and performance tuning. This is the approach used to create a multi-threaded application from sequential code. Work with software during the first, third and fourth stages is covered quite widely, while information on how to carry out the second stage is clearly lacking.

Many books have been published on parallel algorithms and parallel computing. However, these publications mainly deal with message passing, distributed-memory systems, or theoretical parallel computing models that are sometimes inapplicable to real multi-core platforms. If you're ready to get serious about multithreaded programming, you'll probably need knowledge about developing algorithms for these models too. Of course, the applicability of these models is rather limited, so many software developers may never get to use them in practice.

Without exaggeration, we can say that the development of multi-threaded applications is first and foremost a creative activity, and only then a scientific activity. In this article, you'll learn eight simple rules that will help you expand your base of parallel programming practices and improve the efficiency of implementing thread computing in your applications.

Rule 1. Identify the operations in the program code that are performed independently of each other

Parallel processing is applicable only to those sequential code operations that are performed independently of each other. A good example of how actions independent of each other lead to a real single result is the construction of a house. It involves workers of many specialties: carpenters, electricians, plasterers, plumbers, roofers, painters, masons, landscapers, etc. Of course, some of them cannot start working before others finish their work (for example, roofers will not start work until the walls are built, and painters will not paint those walls unless they are plastered). But in general we can say that all people involved in construction act independently of each other.

Let's consider another example: the work cycle of a DVD rental store, which receives orders for certain films. Orders are distributed among the store workers, who search for these films in the warehouse. Naturally, if one of the workers takes from the warehouse a disc with an Audrey Hepburn film on it, this will in no way affect another worker looking for the next action movie with Arnold Schwarzenegger, and certainly will not affect their colleague who is looking for discs with the new season of Friends. In our example, we assume that all out-of-stock issues have been resolved before orders arrive at the rental location, and that packaging and shipping of any order will not affect the processing of others.

In your work, you will probably encounter calculations that can only be processed in a certain sequence, and not in parallel, since the various iterations or steps of the loop depend on each other and must be performed in a strict order. Let's take a living example from the wild. Imagine a pregnant deer. Since gestation lasts on average eight months, no matter how you look at it, the fawn will not appear in a month, even if eight deer become pregnant at the same time. However, eight reindeer at the same time would do a great job if you harnessed them all to Santa Claus's sleigh.

Rule 2: Implement concurrency at the highest level possible (coarse granularity)

There are two approaches to parallel partitioning of sequential program code: bottom-up and top-down. First, at the code analysis stage, code segments (so-called “hot” spots) are identified, which take up a significant portion of the program’s execution time. Separating these code segments in parallel (if possible) will provide the greatest performance gains.

The bottom-up approach implements multi-threaded processing of the code's hot spots. If parallel partitioning of the found points is not possible, you need to examine the application's call stack to determine other segments that are available for parallel partitioning and take a long time to run. Let's say you're working on an application that compresses graphics. Compression can be implemented using several independent parallel threads that process individual image segments. However, even if you manage to implement multi-threading for the hot spots, do not neglect the analysis of the call stack, which may reveal segments available for parallel division that sit at a higher level of the program code. This way you can coarsen the granularity of the parallel processing.

In the top-down approach, the work of the program code is analyzed, and its individual segments are identified whose execution leads to the completion of the entire task. If the major code segments are not clearly independent, analyze their component parts to look for independent calculations. By analyzing your code, you can identify the code modules that take the most CPU time to execute. Let's look at the implementation of threading in an application designed for video encoding. Parallel processing can be implemented at the lowest level, for independent pixels of one frame, or at a higher level, for groups of frames that can be processed independently of other groups. If the application is being created to process multiple video files simultaneously, parallel division at this level may be even simpler, and the granularity will be the coarsest.

The granularity of parallel computation is the amount of computation performed between synchronizations of the threads. In other words, the less frequently synchronization occurs, the coarser the granularity. Fine-grained threading can cause the system overhead of managing the threads to exceed the useful computation performed by those threads; increasing the number of threads while keeping the amount of computation the same only complicates processing. Coarse-grained multithreading incurs less overhead and has greater potential for scalability, which can be realized by introducing additional threads. To implement coarse-grained parallel processing, it is recommended to use the top-down approach and organize threads at a high level of the call stack.

Rule 3: Build scalability into your code so that its performance increases as the number of cores increases.

Not so long ago, in addition to dual-core processors, quad-core processors appeared on the market. Moreover, Intel has already announced the creation of a processor with 80 cores, capable of performing a trillion floating point operations per second. Since the number of cores in processors will only increase over time, your code must have adequate scalability potential. Scalability is a parameter by which one can judge the ability of an application to adequately respond to changes such as an increase in system resources (number of cores, memory size, bus frequency, etc.) or an increase in data volume. Considering that the number of cores in future processors will increase, write scalable code that will increase performance due to increased system resources.

To paraphrase one of C. Northcote Parkinson's laws, we can say that "data processing expands to occupy all available system resources." This means that as computing resources (such as the number of cores) increase, all of them will likely be used for data processing. Let's return to the video compression application discussed above. The appearance of additional processor cores is unlikely to affect the size of processed frames; instead, the number of threads processing a frame will increase, which will lead to a decrease in the number of pixels per thread. As a result, due to the organization of additional threads, the amount of overhead will increase and the granularity will become finer. Another more likely scenario would be an increase in the size or number of video files that need to be encoded. In this case, organizing additional threads that will process larger (or additional) video files will allow you to divide the entire amount of work directly at the stage where the increase occurred. In turn, an application with such capabilities will have high potential for scalability.

Designing and implementing parallel processing using data decomposition provides increased scalability compared to using functional decomposition. The number of independent functions in the program code is most often limited and does not change during application execution. Since each independent function is allocated a separate thread (and, accordingly, a processor core), then with an increase in the number of cores, additionally organized threads will not cause a performance increase. So, parallel partitioning models with data decomposition will provide increased potential for application scalability due to the fact that with an increase in the number of processor cores, the volume of processed data will increase.

Even if the program code organizes threaded processing of independent functions, it is likely that additional threads can be used when the input load increases. Let's return to the example of building a house discussed above. The end goal of the construction is to complete a limited number of independent tasks. However, if you are ordered to build twice as many floors, you will probably want to hire additional workers in some specialties (painters, roofers, plumbers, etc.). Therefore, you need to develop applications that can adapt to the data decomposition that results from an increased workload. If your code implements functional decomposition, consider organizing additional threads as the number of processor cores increases.

Rule 4: Use thread-safe libraries

If you might need a library to handle data in hot spots in your code, be sure to consider using pre-built functions instead of your own code. In short, don't try to reinvent the wheel by developing code segments whose functionality is already provided in optimized library procedures. Many libraries, including the Intel® Math Kernel Library (Intel® MKL) and Intel® Integrated Performance Primitives (Intel® IPP), already contain multi-threaded functions optimized for multi-core processors.

It is worth noting that when using procedures from multithreaded libraries, you must make sure that calling a particular library will not affect the normal operation of the threads. That is, if procedure calls are made from two different threads, each call must return the correct results. If procedures access and update shared library variables, a "data race" may occur, which will have a detrimental effect on the reliability of the calculation results. To work correctly with threads, a library procedure must be reentrant (that is, it must not update anything other than its local variables) or must use synchronization to protect access to shared resources. Conclusion: before using any third-party library in your program code, read the documentation attached to it to make sure it works correctly with threads.

Rule 5: Use an appropriate threading model

Let's say that the functions from multithreaded libraries are clearly not enough to split all the relevant code segments in parallel, and you had to think about organizing threads. Don't rush to create your own (cumbersome) threading structure if the OpenMP library already contains all the functionality you need.

The drawback of implicit multithreading of this kind is that it does not give you precise control over the threads.

If you only need parallel separation of resource-intensive loops, or the additional flexibility that explicit threads provide is of secondary importance to you, then in this case there is no point in doing extra work. The more complex the implementation of multithreading, the greater the likelihood of errors in the code and the more difficult its subsequent modification.

The OpenMP library is focused on data decomposition and is especially well suited for threaded processing of loops that work with large amounts of information. Even when data decomposition alone would suit the application, there may be additional requirements (for example, from an employer or customer) that rule out the use of OpenMP, in which case multithreading has to be implemented with explicit methods. Even then, OpenMP can be used for preliminary threading to estimate the potential performance gains, the scalability, and the approximate effort that subsequent partitioning of the code with explicit threads will require.
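
For the common case mentioned here (threaded processing of a loop over a large amount of data), the entire OpenMP "implementation" can be a single pragma; a hedged sketch, assuming the compiler's OpenMP flag is enabled (for example -fopenmp for GCC and Clang), with an invented function name:

#include <vector>

double total_energy(const std::vector<double>& samples) {
    double sum = 0.0;
    // OpenMP splits the iterations among threads and combines the per-thread sums
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)samples.size(); ++i) {
        sum += samples[i] * samples[i];
    }
    return sum;
}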

Rule 6. The result of the program code should not depend on the execution sequence of parallel threads

In sequential code, it is easy to point to a statement that will always be executed after some other statement. In multi-threaded code, the order of execution of threads is not defined and depends on the decisions of the operating system scheduler. Strictly speaking, it is almost impossible to predict the sequence in which threads will be launched to perform an operation, or to determine which thread the scheduler will run next. The scheduler makes these decisions primarily to reduce processor idle time, especially when the application runs on a platform whose processor has fewer cores than there are threads. If a thread is blocked because it needs to access an area that is not in the cache, or because it must perform an I/O request, the scheduler suspends it and starts a thread that is ready to run.

A direct result of uncertainty in thread scheduling is data race situations. Assuming that one thread will change the value of a shared variable before another thread reads that value may be wrong. With luck, the order of execution of threads for a specific platform will remain the same across all runs of the application. However, small changes in the state of the system (for example, the location of data on the hard drive, memory speed, or even a deviation from the nominal AC frequency of the power supply) can trigger a different order of thread execution. Thus, for program code that works correctly only with a certain sequence of threads, problems associated with data race situations and deadlocks are likely.

From a performance point of view, it is preferable not to limit the order in which threads are executed. A strict sequence of thread execution is allowed only in cases of extreme necessity, determined by a predetermined criterion. If such circumstances occur, threads will be launched in the order specified by the provided synchronization mechanisms. For example, imagine two friends reading a newspaper that is laid out on the table. Firstly, they can read at different speeds, and secondly, they can read different articles. And here it doesn’t matter who reads the newspaper spread first - in any case, he will have to wait for his friend before turning the page. At the same time, there are no restrictions on the time or order of reading articles - friends read at any speed, and synchronization between them occurs immediately when turning the page.

Rule 7: Use thread-local storage. If necessary, assign locks to individual data areas

Synchronization inevitably increases the load on the system, which in no way speeds up the process of obtaining the results of parallel calculations, but ensures their correctness. Yes, synchronization is necessary, but it should not be abused. To minimize synchronization, thread local storage or allocated memory areas (for example, array elements marked with the identifiers of the corresponding threads) are used.

The need to share temporary variables between different threads arises quite rarely. Such variables must be declared or allocated locally to each thread. Variables whose values ​​are intermediate results of thread execution must also be declared local to the corresponding threads. Synchronization will be required to summarize these intermediate results in some common memory area. To minimize possible stress on the system, it is preferable to update this general area as rarely as possible. Explicit multithreading methods provide thread-local storage APIs that ensure the integrity of local data from the start of one multi-threaded code segment to the next (or from one multi-threaded function call to the next execution of the same function).

If thread-local storage is not possible, access to shared resources is synchronized using various objects, such as locks. It is important to correctly assign locks to specific data blocks, which is easiest to do if the number of locks is equal to the number of data blocks. A single locking mechanism that synchronizes access to multiple memory regions is used only when all these regions reside in the same critical section of the program code.

What should you do if you need to synchronize access to a large amount of data, for example, an array consisting of 10,000 elements? Organizing a single lock for the entire array is likely to create a bottleneck in the application. Do you really have to organize locking for each element separately? Then, even if 32 or 64 parallel threads access the data, you will have to prevent access conflicts over a fairly large memory area, and the probability of such conflicts occurring is about 1%. Fortunately, there is a kind of golden mean, the so-called "modulo locks". If N modulo locks are used, each lock will synchronize access to one N-th of the total data area. For example, if two such locks are organized, one of them will guard access to the even array elements, and the second to the odd elements. In this case, a thread accessing the required element determines its parity and takes the appropriate lock. The number of modulo locks is selected taking into account the number of threads and the probability of simultaneous access by several threads to the same memory area.
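
A hedged sketch of the modulo-lock idea just described: N mutexes guard an array of 10,000 elements, and a thread takes the lock whose index equals the element index modulo N (N = 8 is an arbitrary choice for the example).

#include <array>
#include <mutex>
#include <vector>

constexpr size_t N_LOCKS = 8;                 // chosen from the thread count and contention estimate
std::array<std::mutex, N_LOCKS> locks;
std::vector<double> data(10000, 0.0);

void update(size_t index, double delta) {
    // lock i protects every element whose index % N_LOCKS == i
    std::lock_guard<std::mutex> guard(locks[index % N_LOCKS]);
    data[index] += delta;
}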

Note that multiple locking mechanisms must never be used simultaneously to synchronize access to a single memory area. Let's remember Segal's law: "A person who has one watch knows exactly what time it is. A person who has several watches is not sure of anything." Suppose that access to a variable is controlled by two different locks. In this case, the first lock may be used by one code segment and the second by another segment. Then the threads executing these segments will find themselves in a race over the shared data they are simultaneously accessing.

Rule 8. If necessary, change the software algorithm to make multithreading possible

The criterion for assessing the performance of applications, both sequential and parallel, is execution time. The asymptotic order of growth is a suitable estimate of an algorithm; using this theoretical measure, it is almost always possible to compare the performance of applications. That is, all other things being equal, an application whose running time grows as O(n log n) (quicksort) will run faster than one whose running time grows as O(n^2) (selection sort), even though the results of these applications are the same.

The better the asymptotic execution order, the faster the parallel application runs. However, even the most productive sequential algorithm cannot always be divided into parallel threads. If a program hotspot is too difficult to split, and there is no way to implement multithreading at a higher level of the hotspot's call stack, you should first consider using a different sequential algorithm that is easier to split than the original one. Of course, there are other ways to prepare program code for thread processing.

To illustrate the last statement, consider the multiplication of two square matrices. Strassen's algorithm has one of the best asymptotic execution orders, O(n^2.81), which is much better than the O(n^3) order of the ordinary triple-nested-loop algorithm. In Strassen's algorithm, each matrix is divided into four submatrices, after which seven recursive calls are made to multiply n/2 × n/2 submatrices. To parallelize the recursive calls, new threads can be created that perform the seven independent submatrix multiplications, recursing until the submatrices reach a given size. In this case, the number of threads grows exponentially, and the granularity of the computations performed by each newly created thread becomes ever finer as the size of the submatrices decreases. Let's consider another option: organizing a pool of seven threads working simultaneously, each performing one submatrix multiplication. When the thread pool has finished running, the Strassen method is called recursively to multiply the submatrices (as in the sequential version of the code). If the system running such a program has more than eight processor cores, some of them will be idle.

The matrix multiplication algorithm is much easier to parallelize using the triple nested loop. In this case data decomposition is used: the matrices are divided into rows, columns, or submatrices, and each thread performs its own part of the calculation. Such an algorithm can be implemented with OpenMP pragmas inserted at one of the loop levels, or by explicitly creating threads that divide up the matrix. Implementing this simpler sequential algorithm requires far fewer modifications to the code than implementing a multi-threaded Strassen algorithm.
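For illustration, here is a minimal std::thread sketch of that data decomposition (the matrix size, thread count and the row-based split are arbitrary choices for this example): each thread fills its own block of rows of the result matrix, so no locking is needed at all.

#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// multiply rows [row_begin, row_end) of a by b, writing into c
void multiply_rows(const Matrix& a, const Matrix& b, Matrix& c,
                   std::size_t row_begin, std::size_t row_end) {
    const std::size_t n = a.size();
    for (std::size_t i = row_begin; i < row_end; i++)
        for (std::size_t j = 0; j < n; j++)
            for (std::size_t k = 0; k < n; k++)
                c[i][j] += a[i][k] * b[k][j];
}

int main() {
    const std::size_t n = 512;
    const std::size_t num_threads = 4;
    Matrix a(n, std::vector<double>(n, 1.0));
    Matrix b(n, std::vector<double>(n, 2.0));
    Matrix c(n, std::vector<double>(n, 0.0));

    std::vector<std::thread> threads;
    const std::size_t rows_per_thread = n / num_threads;
    for (std::size_t t = 0; t < num_threads; t++) {
        std::size_t begin = t * rows_per_thread;
        std::size_t end = (t + 1 == num_threads) ? n : begin + rows_per_thread;
        // each thread gets its own, non-overlapping range of rows
        threads.emplace_back(multiply_rows, std::cref(a), std::cref(b),
                             std::ref(c), begin, end);
    }
    for (auto& thr : threads)
        thr.join();
    return 0;
}

The same row-based decomposition maps directly onto an OpenMP pragma placed on the outer loop.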

So now you know eight simple rules for effectively converting sequential code into parallel code. Following them will help you create multi-threaded solutions faster, with greater reliability, better performance, and fewer bottlenecks.


An example of building a simple multi-threaded application.

Written in response to the large number of questions about building multi-threaded applications in Delphi.

The purpose of this example is to demonstrate how to properly structure a multi-threaded application in which long-running work is moved to a separate thread, and how to organize interaction between the main thread and the worker thread so that data can be passed from the form (visual components) to the thread and back.

The example does not claim to be complete; it only demonstrates the simplest ways for threads to interact - ways that let you "quickly knock together" (if only you knew how much I dislike that) a correctly working multi-threaded application.
Everything is commented in detail (in my opinion), but if you have any questions, ask.
I will warn you once more, though: threads are not easy. If you have no idea how all of this works, there is a great danger that your code will usually work fine and only sometimes behave more than strangely. The behavior of an incorrectly written multi-threaded program depends on a great many factors that can be impossible to reproduce while debugging.

So, the example. For convenience, the code is given inline, and an archive with the unit and form source is attached.

unit ExThreadForm;

interface

uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
Dialogs, StdCtrls;

// constants used when transferring data from the thread to the form by
// sending window messages
const
WM_USER_SendMessageMetod = WM_USER+10;
WM_USER_PostMessageMetod = WM_USER+11;

type
// description of the thread class, a descendant of tThread
tMyThread = class(tThread)
private
SyncDataN:Integer;
SyncDataS:String;
procedure SyncMetod1;
protected
procedure Execute; override;
public
Param1:String;
Param2:Integer;
Param3:Boolean;
Stopped:Boolean;
LastRandom:Integer;
IterationNo:Integer;
ResultList:tStringList;

Constructor Create(aParam1:String);
destructor Destroy; override;
end;

// description of the class of the form using the stream
TForm1 = class(TForm)
Label1: TLabel;
Memo1: TMemo;
btnStart: TButton;
btnStop: TButton;
Edit1: TEdit;
Edit2: TEdit;
CheckBox1: TCheckBox;
Label2: TLabel;
Label3: TLabel;
Label4: TLabel;
procedure btnStartClick(Sender: TObject);
procedure btnStopClick(Sender: TObject);
private
{ Private declarations }
MyThread:tMyThread;
procedure EventMyThreadOnTerminate(Sender:tObject);
procedure EventOnSendMessageMetod(var Msg: TMessage); message WM_USER_SendMessageMetod;
procedure EventOnPostMessageMetod(var Msg: TMessage); message WM_USER_PostMessageMetod;

public
{ Public declarations }
end;

var
Form1: TForm1;

implementation

{$R *.dfm}

{
Stopped - demonstrates the transfer of data from a form to a thread.
Does not require additional synchronization, since it is a simple
single-word type and is written by only one thread.
}

procedure TForm1.btnStartClick(Sender: TObject);
begin
Randomize(); // seed the Random() sequence - this has nothing to do with threading

// Create an instance of a thread object, passing it an input parameter
{
ATTENTION!
The thread constructor is written so that the thread is created
suspended, because that allows us to:
1. Control the moment it is started. This is almost always more convenient,
since it lets us configure the thread before launch, pass it input
parameters, and so on.
2. Bear in mind that a reference to the created object is stored in a form
field, and because the thread self-destructs (see below) - which for a running
thread can happen at any moment - that reference can become invalid.
}
MyThread:= tMyThread.Create(Form1.Edit1.Text);

// However, since the thread was created suspended, if any error occurs
// during its initialization (before it is started) we must destroy it ourselves,
// which is why we use a try / except block
try

// Assign a thread-completion handler in which we will pick up the
// results of the thread's work and "wipe" the reference to it
MyThread.OnTerminate:= EventMyThreadOnTerminate;

// Since we take the results in OnTerminate, i.e. before the thread
// self-destructs, we relieve ourselves of the chore of destroying it
MyThread.FreeOnTerminate:= True;

// An example of passing input parameters through the fields of the thread object
// at the point where the instance is created and not yet running.
// Personally, I prefer to do this through the parameters of the overridden
// constructor (tMyThread.Create)
MyThread.Param2:= StrToInt(Form1.Edit2.Text);

MyThread.Stopped:= False; // also a kind of parameter, but one that changes while
// the thread is running
except
// since the thread has not yet started and cannot self-destruct, let's destroy it "manually"
FreeAndNil(MyThread);
// and then let the exception be processed as usual
raise;
end;

// Since the thread object has been successfully created and configured, it's time to run it
MyThread.Resume;

ShowMessage('Thread started');
end;

procedure TForm1.btnStopClick(Sender: TObject);
begin
// If the thread instance still exists, ask it to stop.
// And we only "ask". In principle we could also "force" it, but that would be
// strictly an emergency measure requiring a clear understanding of the whole
// threading kitchen, so it is not considered here.
if Assigned(MyThread) then
MyThread.Stopped:= True
else
ShowMessage('The thread is not running!');
end;

procedure TForm1.EventOnSendMessageMetod(var Msg: TMessage);
begin
// method for processing a synchronous message
// in WParam the address of the tMyThread object, in LParam the current LastRandom value of the thread
with tMyThread(Msg.WParam) do begin
// the original argument list was lost in formatting; these fields are a plausible reconstruction
Form1.Label3.Caption:= Format('%d %d %d', [IterationNo, Param2, LastRandom]);
end;
end;

procedure TForm1.EventOnPostMessageMetod(var Msg: TMessage);
begin
// method for processing an asynchronous message
// in WParam the current value of IterationNo, in LParam the current value of the LastRandom of the thread
Form1.Label4.Caption:= Format('%d %d', [Msg.WParam, Msg.LParam]);
end;

procedure TForm1.EventMyThreadOnTerminate(Sender:tObject);
begin
// IMPORTANT!
// The OnTerminate event handling method is always called in the context of the main
// thread - this is guaranteed by the tThread implementation. Therefore, you can freely
// use any properties and methods of any objects

// Just in case, make sure that the object instance still exists
if not Assigned(MyThread) then Exit; // if it is not there, then there is nothing to do

// getting the results of the work of a thread of an instance of a thread object
// the original argument was lost in formatting; Param2 (the "work" counter) is a plausible value
Form1.Memo1.Lines.Add(Format('The thread ended with result %d', [(Sender as tMyThread).Param2]));
Form1.Memo1.Lines.AddStrings((Sender as tMyThread).ResultList);

// Destroy the reference to the thread object instance.
// Since our thread is self-destructing (FreeOnTerminate:= True)
// then after the OnTerminate handler completes, the thread object instance will be
// destroyed (Free), and all references to it will become invalid.
// To avoid accidentally running into such a link, delete MyThread
// Let me note again - we will not destroy the object, but only delete the link. An object
// will destroy itself!
MyThread:= Nil;
end;

constructor tMyThread.Create(aParam1:String);
begin
// Create an instance of a SUSPENDED thread (see comment when creating an instance)
inherited Create(True);

// Create internal objects (if necessary)
ResultList:= tStringList.Create;

// Obtaining initial data.

// Copying the input data passed through the parameter
Param1:= aParam1;

// An example of receiving input data from VCL components in the constructor of a thread object
// This is acceptable in this case, since the constructor is called in the context
// main thread. Therefore, VCL components can be accessed here.
// But I don't like it, because I think it is bad when a thread knows anything
// about some form. But what won't you do for the sake of a demonstration.
Param3:= Form1.CheckBox1.Checked;
end;

destructor tMyThread.Destroy;
begin
// destruction of internal objects
FreeAndNil(ResultList);
// destroying the base tThread
inherited;
end;

procedure tMyThread.Execute;
var
t:Cardinal;
s:String;
begin
IterationNo:= 0; // results counter (cycle number)

// In my example the body of the thread is a loop that ends either
// on an external "request" passed through the Stopped field,
// or simply after a fixed number of iterations.
// I find it nicer to write this as an "eternal" loop.

While True do begin

Inc(IterationNo); // next cycle number

LastRandom:= Random(1000); // random number - to demonstrate passing data from the thread to the form

T:= Random(5)+1; // time for which we will fall asleep if we are not terminated

// Dummy work (depends on the input parameter)
if not Param3 then
Inc(Param2)
else
Dec(Param2);

// Generate an intermediate result
// the original argument list was lost in formatting; a plausible reconstruction:
s:= Format('%s %5d %s %d %d',
[Param1, IterationNo, TimeToStr(Time), LastRandom, Param2]);

// Add the intermediate result to the list of results
ResultList.Add(s);

//// Examples of passing intermediate results to the form

//// Passing through a synchronized method - the classic way
//// Flaws:
//// - the synchronized method is usually a method of the thread class (for access
//// to the fields of the thread object), but to access the form fields it has to
//// "know" about the form and its fields (objects), which is usually not great
//// from the point of view of program organization.
//// - the current thread will be suspended until execution is completed
//// synchronized method.

//// Advantages:
//// - standard and universal
//// - in a synchronized method you can use
//// all fields of the thread object.
// first, if necessary, save the data being passed into
// dedicated fields of the thread object.
SyncDataN:= IterationNo;
SyncDataS:= "Sync"+s;
// and then provide a synchronized method call
Synchronize(SyncMetod1);

//// Transmission via synchronous message sending (SendMessage)
//// in this case data can be passed both through the message parameters (LastRandom)
//// and through the fields of the object, by passing the address of the thread
//// object instance in a message parameter - Integer(Self).
//// Flaws:
//// - the thread must know the handle of the form window
//// - as with Synchronize, the current thread will be suspended until
//// complete processing of the message by the main thread
//// - requires significant CPU time for each call
//// (for switching threads) so a very frequent call is undesirable
//// Advantages:
//// - as with Synchronize, when processing the message you can use
//// all fields of the thread object (provided, of course, its address was passed)


SendMessage(Form1.Handle,WM_USER_SendMessageMetod,Integer(Self),LastRandom);

//// Transmission via asynchronous message sending (PostMessage)
//// because by the time the main thread processes the message, the sending
//// thread may already have finished, so passing the address of the thread
//// object instance is not valid here!
//// Flaws:
//// - the thread must know the handle of the form window;
//// - due to the asynchrony, data can only be passed through the message
//// parameters, which makes it much harder to transfer anything larger
//// than two machine words. It is convenient for passing Integers and the like.
//// Advantages:
//// - unlike previous methods, the current thread will NOT
//// suspended, but will immediately resume execution
//// - unlike a synchronized call, the message handler is a method of the form;
//// it either needs to know about the thread object, or needs to know nothing
//// at all about the thread if data is passed only through the message
//// parameters. Conversely, the thread need not know anything about the form
//// at all - only its Handle, which can be passed to it as a parameter before
//// the thread is started.
PostMessage(Form1.Handle,WM_USER_PostMessageMetod,IterationNo,LastRandom);

//// Check for possible completion

// Check completion by parameter
if Stopped then Break;

// Check completion on occasion
if IterationNo >= 10 then Break;

Sleep(t*1000); // Fall asleep for t seconds
end;
end;

procedure tMyThread.SyncMetod1;
begin
// this method is called using the Synchronize method.
// That is, despite the fact that it is a method of the tMyThread thread,
// it runs in the context of the application's main thread.
// Therefore it can do anything, or almost anything :)
// But remember, do not "dawdle" here for long

// The passed parameters can be taken from the dedicated fields where we
// saved them before the call.
Form1.Label1.Caption:= SyncDataS;

// or from other fields of the thread object, for example ones reflecting its current state
// (the original argument list was lost in formatting; these fields are a plausible reconstruction)
Form1.Label2.Caption:= Format('%d %d', [IterationNo, LastRandom]);
end;

In general, the example was preceded by the following thoughts of mine on the topic...

Firstly:
THE MOST IMPORTANT rule of multi-threaded programming in Delphi:
In the context of a non-main thread, you cannot access the properties and methods of forms, and indeed all components that “grow” from tWinControl.

This means (somewhat simplified) that neither in the Execute method overridden from TThread, nor in any other methods/procedures/functions called from Execute, may you directly access any properties or methods of visual components.

How to do it correctly?
There is no universal recipe here. More precisely, there are so many different options that the choice depends on the specific case - which is why people refer you to the article: having read and understood it, a programmer can work out how best to proceed in a given situation.

In a nutshell:

Most often, an application becomes multi-threaded either when it is necessary to do some long-term work, or when you can do several things at the same time that do not heavily load the processor.

In the first case, implementing work inside the main thread leads to “slowing down” of the user interface - while the work is being done, the message processing loop is not executed. As a result, the program does not respond to user actions, and the form is not drawn, for example, after the user moves it.

In the second case, when the work involves active exchange with the outside world, there are periods of forced "downtime": while waiting for data to be received or sent, you can do something else in parallel, for example send or receive other data.

There are other, less common cases, but that is not the point right now.

Now, how is all this written? Naturally, a certain most common case, somewhat generalized, is considered. So.

Work carried out in a separate thread, in general, has four entities (I don’t know what to call it more precisely):
1. Initial data
2. The actual work itself (it may depend on the source data)
3. Intermediate data (for example, information about the current state of work)
4. Output (result)

Most often, visual components are used to input and display most of this data. But, as mentioned above, you cannot access visual components directly from a thread. What can you do?
Delphi developers suggest using the Synchronize method of the TThread class. Here I will not describe how to use it - there is the above-mentioned article for that. I will only say that its use, even correct, is not always justified. There are two problems:

First, the body of a method called via Synchronize is always executed in the context of the main thread, and therefore, while it is executing, the window message processing loop is again not executed. Therefore, it must be executed quickly, otherwise we will get all the same problems as with a single-threaded implementation. Ideally, the method called via Synchronize should generally only be used to access the properties and methods of visual objects.

Secondly, executing a method via Synchronize is an “expensive” pleasure caused by the need for two switches between threads.

Moreover, both problems are interconnected and cause a contradiction: on the one hand, to solve the first, it is necessary to “shred” the methods called through Synchronize, and on the other, they then have to be called more often, losing precious processor resources.

Therefore, as always, you need to approach wisely, and for different cases, use different ways of interacting with the outside world:

Initial data
All data that is passed to the thread and does not change while it runs must be passed before the thread is started, i.e. when it is created. To use it in the body of the thread, you should make a local copy of it (usually in the fields of the TThread descendant).
If there are source data that can change while the thread is running, then access to such data must be done either through synchronized methods (methods called via Synchronize) or through the fields of a thread object (a descendant of TThread). The latter requires some caution.

Intermediate and output data
Here again there are several ways (in order of my preference):
- A method for asynchronously sending messages to the main application window.
Typically used to send messages to the main application window about the status of a process, transmitting a small amount of data (for example, percentage of completion)
- A method for synchronously sending messages to the main application window.
It is usually used for the same purposes as asynchronous sending, but allows you to transfer a larger amount of data without creating a separate copy.
- Synchronized methods, where possible, combining the transfer of as much data as possible into one method.
Can also be used to receive data from a form.
- Through the fields of the thread object, with mutually exclusive access ensured.
You can read more in the article.

Eh. Once again it didn't come out short.