
.NET Cross-Plat Performance and Eventing Design

Introduction

As we bring up CoreCLR on the Linux and OS X platforms, it’s important that we determine how we’ll measure and analyze performance on these platforms. On Windows we use an event-based model that depends on ETW, and we have a good amount of tooling that builds on this approach. Ideally, we can extend this model to Linux and OS X and reuse much of the Windows tooling.

Requirements

Ideally, we’d like to have the following functionality on each OS that we bring up:

Scoping to the Current Landscape

Given that we’ve built up a rich set of functionality on Windows, much of which depends on ETW and is specific to the OS, we’re going to see some differences across the other operating systems.

Our goal should be to do the best job that we can to enable data collection and analysis across the supported operating systems by betting on the right technologies, so that as the landscape across these operating systems evolves, .NET is well positioned to take advantage of the changes without needing to change the fundamental technology choices that we’ve made. While this choice will likely make some types of investigations more difficult because features that we depend upon on Windows are absent, it is likely to position us better for the future and align us with the OS communities.

Linux

Proposed Design

Given that the performance and tracing tool space on Linux is quite fragmented, there is not one tool that meets all of our requirements. As such, we’ll use two tools when necessary to gather both performance data and tracing data.

For performance data collection we’ll use perf_events, an in-tree performance tool that provides access to hardware counters, software counters and system call tracing. perf_events will be the primary provider of system-wide performance data such as CPU sampling and context switches.
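To make the kind of data perf_events exposes more concrete, here is a minimal sketch that reads one software counter (context switches) for the calling thread through the `perf_event_open` system call. The function name `count_context_switches_during_sleep` is hypothetical and exists only for illustration. Counting context switches typically requires sufficient privileges (for example, a permissive `kernel.perf_event_paranoid` setting), so the sketch returns -1 when the counter cannot be opened. It does not represent how CoreCLR tooling would actually collect data.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for syscall() */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#ifdef __linux__
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#endif

/*
 * Hypothetical helper: count context switches of the calling thread while
 * it sleeps for `ms` milliseconds. Returns the count, or -1 if the counter
 * cannot be opened or read (for example, insufficient privileges or a
 * non-Linux platform).
 */
long long count_context_switches_during_sleep(unsigned ms)
{
#ifdef __linux__
    struct perf_event_attr attr;
    struct timespec delay;
    uint64_t count = 0;
    ssize_t n;
    int fd;

    assert(ms < 60000u);        /* keep the illustration short-running */

    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
    attr.disabled = 1;          /* start stopped; enabled explicitly below */

    /* glibc has no wrapper for perf_event_open, so invoke the syscall
       directly. pid = 0 and cpu = -1: the calling thread on any CPU. */
    fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    delay.tv_sec = ms / 1000;
    delay.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&delay, NULL);    /* sleeping normally switches the thread out */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    n = read(fd, &count, sizeof count);
    close(fd);
    return n == (ssize_t)sizeof count ? (long long)count : -1;
#else
    (void)ms;
    return -1;
#endif
}
```

A caller would treat -1 as "counter unavailable" rather than as an error in the measured code. System-wide collection, as described above, would more likely go through the `perf` command-line tool than through direct system calls.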

For tracing we’ll use LTTng. LTTng supports user-mode tracing with no kernel-space requirements. It allows for strongly typed static events with PID and TID information. The system is very configurable and allows individual events to be enabled and disabled.

Tools Considered

Perf_Events

Pros

Cons

LTTng

Pros

Cons

SystemTap

Pros

Cons

DTrace4Linux

Pros

Cons

FTrace

Pros

Cons

Extended Berkeley Packet Filter (eBPF)

Pros

Cons

Infrastructure Bring-Up Action Items

OS X

Proposed Design

On OS X, the performance tooling space is much less fragmented than on Linux. However, this also means that there are far fewer options.

For performance data collection and tracing, we’ll use Instruments. Instruments is the Apple-built and supported performance tool for OS X. It has a wide range of collection abilities including CPU sampling, context switching, system call tracing, power consumption, memory leaks, etc. It also has support for custom static and dynamic tracing using DTrace as a back-end, which we can take advantage of to provide a logging mechanism for CLR events and EventSource.

Unfortunately, there are some features that Instruments and DTrace do not provide, such as resolution of JIT-compiled call frames. Given the existing tooling choices and the profiler preferences of the OS X developer community, it likely makes the most sense to use Instruments as our collection and analysis platform, even though it does not support the full set of features that we would like. The number of OS X-specific performance issues is also likely to be much smaller than the set of all performance issues, so in many cases Windows or Linux, which provide a more complete story for investigating performance issues, can be used instead.

Tools Considered

Instruments

Pros

Cons

DTrace

Pros

Cons

Infrastructure Bring-Up Action Items

CLR Events

On Windows, the CLR has a number of ETW events that are used for diagnostic and performance purposes. These events need to be enabled on Linux and OS X so that we can collect and use them for performance investigations.

Platform Agnostic Action Items

Linux Action Items

OS X Action Items

EventSource Proposal

Ideally, EventSource operates on Linux and OS X just like it does on Windows. Namely, there is no special registration of any kind that must occur. When an EventSource is initialized, it does everything necessary to register itself with the appropriate logging system (ETW, LTTng, DTrace), such that its events are stored by the logging system when configured to do so.

EventSource should emit events to the appropriate logging system on each operating system. Ideally, we can support the following functionality on all operating systems:

Supporting all of these requirements will mean a significant investment. Today, LTTng and DTrace support all of these requirements, but do so for tracepoints that are defined statically at compile time. This is done by providing tooling that takes a tool-specific manifest and generates C code that can then be compiled into the application.

As an example of the kind of work we’ll need to do: LTTng generates helpers that are then called as C module constructors and destructors to register and unregister tracepoint provider definitions. If we want to provide the same level of functionality for EventSource events, we’ll need to understand the generated code and then write our own helpers and register/unregister calls.
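The constructor/destructor mechanism mentioned above can be sketched in a few lines of C. This is an illustration of the general pattern only, using the GCC/Clang `constructor` and `destructor` function attributes. The descriptor type, the provider name `MyCompany-MyEventSource`, and the `tracer_register`/`tracer_unregister` functions are hypothetical stand-ins and are not LTTng’s actual generated code or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical provider descriptor; not LTTng's real data structure. */
struct provider_desc {
    const char *name;
    int registered;
};

static struct provider_desc my_provider = { "MyCompany-MyEventSource", 0 };

/* Stand-ins for a tracer's register/unregister entry points
   (hypothetical names, for illustration only). */
static void tracer_register(struct provider_desc *p)
{
    assert(p != NULL);
    p->registered = 1;
}

static void tracer_unregister(struct provider_desc *p)
{
    assert(p != NULL);
    p->registered = 0;
}

/* Runs automatically when the module is loaded, before main(). */
__attribute__((constructor))
static void provider_init(void)
{
    tracer_register(&my_provider);
}

/* Runs automatically when the module is unloaded or the process
   exits normally. */
__attribute__((destructor))
static void provider_fini(void)
{
    tracer_unregister(&my_provider);
}

/* Lets callers observe whether registration has happened. */
int provider_is_registered(void)
{
    return my_provider.registered;
}
```

Because registration happens at module load, the application never calls a registration function itself. EventSource instances, by contrast, are created at run time, which is why equivalent helpers and register/unregister calls would have to be written by hand.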

While doing this work puts us in an ideal place from a performance and logging verbosity point of view, we should make sure that the work delivers a proportionate benefit (i.e., that it is pay-for-play). As such, we should start with a much simpler design, and move forward with this more complex solution once we’ve proven that the benefit is clear.

Step # 1: Static Event(s) with JSON Payload

As a simple stop-gap solution to get EventSource support on Linux and OS X, we can implement a single EventSource event (or one event per verbosity level) that is used to emit all EventSource events regardless of the EventSource that emits them. The payload will be a JSON string that represents the arguments of the event.
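As a rough illustration of the JSON payload idea, the sketch below serializes a list of event arguments into a single JSON object that could be carried as the string field of one generic static event. The `event_arg` representation and the `format_payload_json` function are assumptions made for this example, and the exact payload schema is not specified by this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical argument representation: a name plus a string or integer. */
struct event_arg {
    const char *name;
    const char *str_value;   /* used when non-NULL */
    long        int_value;   /* used when str_value is NULL */
};

struct writer { char *buf; size_t cap; size_t len; int overflow; };

/* Append one character, keeping the buffer NUL-terminated. */
static void put_char(struct writer *w, char c)
{
    if (w->len + 1 < w->cap) {
        w->buf[w->len++] = c;
        w->buf[w->len] = '\0';
    } else {
        w->overflow = 1;
    }
}

static void put_raw(struct writer *w, const char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i < n; ++i)
        put_char(w, s[i]);
}

/* Write s as a JSON string literal, escaping quotes, backslashes and
   control characters. */
static void put_json_string(struct writer *w, const char *s)
{
    put_char(w, '"');
    for (; *s; ++s) {
        unsigned char c = (unsigned char)*s;
        if (c == '"' || c == '\\') {
            put_char(w, '\\');
            put_char(w, (char)c);
        } else if (c < 0x20) {
            char esc[7];
            snprintf(esc, sizeof esc, "\\u%04x", (unsigned)c);
            put_raw(w, esc);
        } else {
            put_char(w, (char)c);
        }
    }
    put_char(w, '"');
}

/* Serialize the arguments as one JSON object.
   Returns 0 on success, -1 if buf is too small. */
int format_payload_json(char *buf, size_t cap,
                        const struct event_arg *args, size_t count)
{
    struct writer w = { buf, cap, 0, 0 };
    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    put_char(&w, '{');
    for (size_t i = 0; i < count; ++i) {
        if (i > 0)
            put_char(&w, ',');
        put_json_string(&w, args[i].name);
        put_char(&w, ':');
        if (args[i].str_value != NULL) {
            put_json_string(&w, args[i].str_value);
        } else {
            char num[32];
            snprintf(num, sizeof num, "%ld", args[i].int_value);
            put_raw(&w, num);
        }
    }
    put_char(&w, '}');
    return w.overflow ? -1 : 0;
}
```

For arguments `url = "/home"` and `status = 200`, the resulting payload is `{"url":"/home","status":200}`. The trade-off of this approach is that consumers must parse the string to recover typed values, which is the cost that Step # 2 aims to remove.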

Step # 2: Static Event Generation with Strongly-Typed Payloads

Once we have basic EventSource functionality working, we can continue the investigation into how we’d register/unregister and use strongly typed static tracepoints using LTTng and DTrace, and how we’d call them when an EventSource fires the corresponding event.

Compatibility Concerns

In general, we should be transparent about this plan, and not require any compatibility between the two steps other than to ensure that our tools continue to work as we transition.

Step # 1 Bring-Up Action Items

Proposed Priorities

Given the significant work required to bring all of this infrastructure up, this is likely to be a long-term investment. As such, it makes sense to aim at the most impactful items first, and continually evaluate where we are along the road.

Scenarios

We’ll use the following scenarios when defining priorities:

To support these scenarios, we need the following capabilities:

We expect that the following assumptions will hold for the majority of developers and applications:

Work Items

Priority 1

Priority 2

Future: