runtime

.NET memory model

ECMA 335 vs. .NET memory models

The ECMA-335 standard defines a very weak memory model. After two decades, the desire to have a flexible model has not resulted in considerable benefits, largely because hardware has become more strict. At the same time, programming against the ECMA model requires extra complexity to handle scenarios that are hard to comprehend and impossible to test.

Over the course of multiple releases, .NET runtime implementations settled on a memory model that is a practical compromise between what can be implemented efficiently on current hardware and what stays reasonably approachable by developers. This document rationalizes the invariants provided and expected by the .NET runtimes in their current implementation, with the expectation that they will carry forward to future releases.

Alignment

When managed by the .NET runtime, variables of built-in primitive types are properly aligned according to the data type size. This applies to both heap and stack allocated memory.

1-byte, 2-byte and 4-byte variables are stored at 1-byte, 2-byte and 4-byte boundaries, respectively. 8-byte variables are 8-byte aligned on 64-bit platforms. Native-sized integer types and pointers have alignment that matches their size on the given platform.

The alignment of fields is not guaranteed when FieldOffsetAttribute is used to explicitly adjust field offsets.
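As a sketch of how explicit layout breaks the alignment guarantee (the struct and field names here are hypothetical), the following places a 4-byte field at offset 1, so it is no longer 4-byte aligned and the runtime no longer guarantees atomic access to it:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
struct Misaligned
{
    [FieldOffset(0)] public byte Tag;

    // Explicitly placed at offset 1: this 4-byte field is not 4-byte
    // aligned, so the runtime does not guarantee atomic access to it.
    [FieldOffset(1)] public int Value;
}

static class AlignmentDemo
{
    public static int ValueOffset()
    {
        // Marshal.OffsetOf reports the explicit offset of the field.
        return (int)Marshal.OffsetOf<Misaligned>(nameof(Misaligned.Value));
    }
}
```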

Atomic memory accesses

Memory accesses to properly aligned data of primitive and enum types with sizes up to the platform pointer size are always atomic. The value that is observed is always a result of complete read and write operations.

Primitive types: bool, char, int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32, float64, native int, native unsigned int.

Values of unmanaged pointers are treated as native integer primitive types. Memory accesses to properly aligned values of unmanaged pointers are atomic.

Managed references are always aligned to their size on the given platform and accesses are atomic.

The following methods perform atomic memory accesses regardless of the platform when the location of the variable is managed by the runtime: the methods of System.Threading.Interlocked and the methods of System.Threading.Volatile.

Example: Volatile.Read<double>(ref location) on a 32 bit platform is atomic, while an ordinary read of location may not be.
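As a sketch of that distinction (the class and member names are hypothetical), a 64-bit counter that must be read without tearing on 32-bit platforms can go through Volatile.Read and Interlocked:

```csharp
using System.Threading;

class Counter64
{
    // Ordinary reads/writes of a long may tear on 32-bit platforms.
    private long _value;

    // Volatile.Read of a 64-bit value is atomic even on 32-bit platforms.
    public long Read() => Volatile.Read(ref _value);

    // Interlocked operations are atomic regardless of the platform word size.
    public long Increment() => Interlocked.Increment(ref _value);
}
```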

Unmanaged memory access

As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. Example: memory accesses through pointers whose target address is not properly aligned to the data access size may not be atomic or may cause faults, depending on the platform and hardware configuration.

Although rare, unaligned access is a realistic scenario, and thus there is some limited support for unaligned memory accesses, such as the unaligned. IL prefix and the Unsafe.ReadUnaligned, Unsafe.WriteUnaligned and Unsafe.CopyBlockUnaligned methods.

These facilities ensure fault-free access to potentially unaligned locations, but do not ensure atomicity.
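For example, Unsafe.ReadUnaligned can extract a value from an arbitrary byte offset without faulting on platforms that trap misaligned loads; atomicity is still not guaranteed. The helper below is a hypothetical sketch:

```csharp
using System.Runtime.CompilerServices;

static class UnalignedDemo
{
    public static int ReadInt32At(byte[] buffer, int offset)
    {
        // A plain pointer dereference at buffer + offset could fault on
        // hardware that requires natural alignment; ReadUnaligned does not,
        // but the read may be torn if another thread writes concurrently.
        return Unsafe.ReadUnaligned<int>(ref buffer[offset]);
    }
}
```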

As of this writing, there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by means of pointer operations or native interop results in undefined behavior.

Side-effects and optimizations of memory accesses

The .NET runtime assumes that the side-effects of memory reads and writes include only observing and changing values at the specified memory locations. This applies to all reads and writes, volatile or not. This is different from the ECMA model.

As a consequence:

- Speculative writes are not allowed: the runtime must not write a value that the program itself does not write.
- Reads cannot be introduced: the runtime must not read a location that the program does not read.
- Unused non-volatile reads can be elided, and adjacent non-volatile accesses to the same location can be coalesced, since the values at the specified locations are the only observable side-effects.

The practical motivations for these rules are:

Thread-local memory accesses

It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such a case it is permitted to perform further optimizations, such as duplicating or removing memory accesses.

Cross-thread access to local variables

Order of memory operations

- Ordinary (non-volatile) memory accesses can be reordered with respect to each other by the optimizing compiler or the hardware.
- Volatile reads have acquire semantics: no memory operation that follows a volatile read in program order can be moved ahead of it.
- Volatile writes have release semantics: no memory operation that precedes a volatile write in program order can be moved after it.
- Full-fence operations, such as the Interlocked methods and Thread.MemoryBarrier, are ordered with respect to all memory operations that precede and follow them.

Note that volatile semantics does not by itself imply that an operation is atomic or has any effect on how soon the operation is committed to coherent memory. It only specifies the order of effects when they eventually become observable.

volatile. and unaligned. IL prefixes can be combined where both are permitted.

It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such a case it is permitted to omit volatile semantics when accessing such data.
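A minimal publish/consume sketch of the volatile ordering rules (the class and member names are illustrative): the release write to _ready cannot move ahead of the write to _data, and the acquire read of _ready cannot move after the read of _data.

```csharp
using System.Threading;

class Publisher
{
    private int _data;
    private bool _ready;

    public void Publish(int value)
    {
        _data = value;                    // ordinary write
        Volatile.Write(ref _ready, true); // release: ordered after the write of _data
    }

    public bool TryConsume(out int value)
    {
        if (Volatile.Read(ref _ready))    // acquire: ordered before the read of _data
        {
            value = _data;
            return true;
        }
        value = 0;
        return false;
    }
}
```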

C# volatile feature

One common way to introduce volatile memory accesses is the C# volatile language feature. Declaring a field as volatile does not have any effect on how the .NET runtime treats the field. The declaration works as a hint to the C# compiler itself (and to compilers for other .NET languages) to emit reads and writes of such a field with the volatile. prefix.
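For example, the C# compiler emits every access to the _stop field below with the volatile. prefix; the Worker class itself is a hypothetical sketch:

```csharp
class Worker
{
    // The C# 'volatile' keyword is a compiler hint: accesses to this field
    // are emitted with the volatile. prefix. The runtime does not treat
    // the field itself specially.
    private volatile bool _stop;

    public void RequestStop() => _stop = true; // volatile write (release)

    public void Run()
    {
        // Volatile read (acquire): cannot be hoisted out of the loop,
        // so a stop request from another thread is eventually observed.
        while (!_stop)
        {
            // do a unit of work
        }
    }
}
```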

Process-wide barrier

The process-wide barrier (Interlocked.MemoryBarrierProcessWide) has full-fence semantics, with an additional guarantee: each thread in the program effectively performs a full fence at an arbitrary point, synchronized with the process-wide barrier in such a way that the effects of writes that precede both barriers are observable by memory operations that follow the barriers.

The actual implementation may vary depending on the platform. For example, interrupting the execution of every core in the current process's affinity mask could be a suitable implementation.
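A sketch of the API surface, using Interlocked.MemoryBarrierProcessWide; the surrounding scenario is hypothetical:

```csharp
using System.Threading;

static class BarrierDemo
{
    private static int _sharedState;

    public static void UpdateAndSynchronize(int newValue)
    {
        _sharedState = newValue;

        // Every thread in the process effectively performs a full fence
        // synchronized with this barrier; after it returns, the write above
        // is observable to memory operations that follow any thread's fence.
        Interlocked.MemoryBarrierProcessWide();
    }

    public static int Read() => _sharedState;
}
```

This is considerably more expensive than an ordinary full fence, so it is typically used in asymmetric schemes where the fast path performs no fence at all.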

Synchronized methods

Methods decorated with the MethodImpl(MethodImplOptions.Synchronized) attribute have the same memory access semantics as if a lock were acquired at the entrance to the method and released upon leaving the method.
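A sketch of the equivalence (the counter class is hypothetical):

```csharp
using System.Runtime.CompilerServices;

class SyncCounter
{
    private int _count;

    // For an instance method, this behaves like wrapping the body in
    // lock (this): the lock is acquired on entry and released on exit,
    // with the same acquire/release memory semantics as an explicit lock.
    [MethodImpl(MethodImplOptions.Synchronized)]
    public int Increment() => ++_count;

    [MethodImpl(MethodImplOptions.Synchronized)]
    public int Read() => _count;
}
```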

Data-dependent reads are ordered

Memory ordering honors data dependency. When performing indirect reads from a location derived from a reference, it is guaranteed that reading of the data will not happen ahead of obtaining the reference. This guarantee applies to both managed references and unmanaged pointers.

Example: reading a field will not use a cached value fetched from the location of the field prior to obtaining a reference to the instance.

var x = nonlocal.a.b;
var y = nonlocal.a;
var z = y.b;

// cannot have execution order as:

var x = nonlocal.a.b;
var y = nonlocal.a;
var z = x;

Object assignment

Object assignment to a location potentially accessible by other threads is a release with respect to accesses to the instance’s fields/elements and metadata. An optimizing compiler must preserve the order of object assignment and data-dependent memory accesses.

The motivation is to ensure that storing an object reference to shared memory acts as a “committing point” for all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (for example, the method table and necessary flags are set) by the time other threads, including background GC threads, are able to access the instance. The reading thread does not need to perform an acquiring read before accessing the content of an instance, since the runtime guarantees ordering of data-dependent reads.

The ordering side-effects of reference assignment should not be used for general ordering purposes because:

There has been a lot of ambiguity around the guarantees provided by object assignments. Going forward, the runtimes will only provide the guarantees described in this document.

It is believed that compiler optimizations do not violate the ordering guarantees in sections about data-dependent reads and object assignments, but further investigations are needed to ensure compliance or to fix possible violations. That is tracked by the following issue: https://github.com/dotnet/runtime/issues/79764

Instance constructors

The .NET runtime does not specify any ordering effects for instance constructors.

Static constructors

All side-effects of static constructor execution will become observable no later than effects of accessing any member of the type. Other member methods of the type, when invoked, will observe complete results of the type’s static constructor execution.
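A sketch illustrating the guarantee (the type and member names are hypothetical):

```csharp
class Config
{
    public static readonly int MaxItems;

    // The static constructor's side-effects are observable no later than
    // any access to a member of the type.
    static Config()
    {
        MaxItems = 100;
    }

    // Any caller of this member method is guaranteed to observe the
    // complete results of the static constructor, i.e. MaxItems == 100.
    public static int GetLimit() => MaxItems;
}
```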

Hardware considerations

Currently supported implementations of the .NET runtime and system libraries make a few assumptions about the hardware memory model. These conditions are present on all supported platforms and are transparently passed through to the user of the runtime. Future supported platforms will likely meet these assumptions as well, because the large body of preexisting software would make it burdensome to break common assumptions.

Examples and common patterns

The following examples work correctly on all supported implementations of .NET runtime regardless of the target OS or architecture.


static MyClass obj;

// thread #1
void ThreadFunc1()
{
    while (true)
    {
        obj = new MyClass();
    }
}

// thread #2
void ThreadFunc2()
{
    while (true)
    {
        obj = null;
    }
}

// thread #3
void ThreadFunc3()
{
    MyClass localObj = obj;
    if (localObj != null)
    {
        // accessing members of the local object is safe because
        // - reads cannot be introduced, thus localObj cannot be re-read and become null
        // - publishing assignment to obj will not become visible earlier than write operations in the MyClass constructor
        // - indirect accesses via an instance are data-dependent reads, thus we will see results of constructor's writes
        System.Console.WriteLine(localObj.ToString());
    }
}

public class Singleton
{
    private static readonly object _lock = new object();
    private static Singleton _inst;

    private Singleton() { }

    public static Singleton GetInstance()
    {
        if (_inst == null)
        {
            lock (_lock)
            {
                // taking a lock is an acquire, the read of _inst will happen after taking the lock
                // releasing a lock is a release, if another thread assigned _inst, the write will be observed no later than the release of the lock
                // thus if another thread initialized the _inst, the current thread is guaranteed to see that here.

                if (_inst == null)
                {
                    _inst = new Singleton();
                }
            }
        }

        return _inst;
    }
}

public class Singleton
{
    private static Singleton _inst;

    private Singleton() { }

    public static Singleton GetInstance()
    {
        Singleton localInst = _inst;
        if (localInst == null)
        {
            // unlike the example with the lock, we may construct multiple instances
            // only one will "win" and become a unique singleton object
            Interlocked.CompareExchange(ref _inst, new Singleton(), null);

            // since Interlocked.CompareExchange is a full fence,
            // we cannot possibly read null or some other spurious instance that is not the singleton
            localInst = _inst;
        }

        return localInst;
    }
}

internal class Program
{
    static bool flag;

    static void Main(string[] args)
    {
        Task.Run(() => flag = true);

        // the repeated read will eventually see that the value of 'flag' has changed,
        // but the read must be volatile to ensure that the reads are not coalesced
        // into one read prior to entering the while loop.
        while (!Volatile.Read(ref flag))
        {
        }

        System.Console.WriteLine("done");
    }
}