Device Recognition and Indoor Localization

We have put the Crownstone, a smart power outlet with quite sophisticated technology, on Kickstarter. I think it's nice for the general hacker community to get some more insight into the technology behind it.

Indoor localization

A key problem (or challenge) within smart spaces is indoor localization: making estimates of users' whereabouts. Without such information, systems are unable to react to the presence of users or, sometimes even more importantly, their absence. This can range from simply turning the lights on when someone enters a room to customizing the way devices interact with a specific user.

Even more important than knowing exactly where users are is knowing where they are relative to the devices the system can control or use to sense the environment. This relation between user and device location is an essential input to these systems.

At DoBots we have been working on robotics already for quite some time. One of the most well-known robotic algorithms is SLAM (simultaneous localization and mapping). We have been porting these algorithms to the scenario in which we have a human walking around, rather than a robot.

SLAC devices

You can read more on the DoBots blog. Wouter has been working hard to implement it in Javascript, so it runs on any device that is supported through Cordova.

Device recognition

At first thought, it might seem that device recognition is not possible: there are devices that use the same amount of power. After contemplating a bit, however, there are actually three ways in which more information can be obtained. Firstly, by measuring voltage as well as current, we can measure reactive power, so we can distinguish motors from lamps quite easily. Secondly, we can observe the consumption pattern over the day. Thirdly, we can sample at a very high frequency and detect disturbances on the current curve: a device leaves its signature on the grid. The third option is something we keep for later, but it is of course quite interesting.
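
To make the first option concrete, here is a minimal sketch (my own illustration, not Crownstone code), assuming synchronized voltage and current samples taken over a whole number of mains cycles; the sample rate and load values are made up:

import numpy as np

def power_features(voltage, current):
    """Estimate real power, reactive power and power factor from
    synchronized voltage (V) and current (A) samples that cover a
    whole number of mains cycles."""
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    p_real = np.mean(v * i)                      # active power (W)
    p_apparent = np.sqrt(np.mean(v ** 2)) * np.sqrt(np.mean(i ** 2))
    # Power triangle; ignores harmonic distortion for simplicity.
    p_reactive = np.sqrt(max(p_apparent ** 2 - p_real ** 2, 0.0))
    power_factor = p_real / p_apparent if p_apparent > 0 else 1.0
    return p_real, p_reactive, power_factor

# Toy example: a current lagging the voltage by 30 degrees (motor-like load).
t = np.arange(0, 0.2, 1 / 2000.0)                # 10 cycles at 50 Hz
v = 230 * np.sqrt(2) * np.sin(2 * np.pi * 50 * t)
i = 2 * np.sqrt(2) * np.sin(2 * np.pi * 50 * t - np.pi / 6)
print(power_features(v, i))                      # roughly (398 W, 230 var, 0.87)

A purely resistive load (a lamp) would give a power factor close to 1, while a motor shows a clearly lower one.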

Observing a device over a longer time period leads to current curves such as these:

Fridge

It is a fridge that turns on and off at regular time intervals. It is quite clear from this curve that the actual power consumption value is not so relevant: the form is really telling!

We subsequently pool all kinds of these features with boosting methods from machine learning. Boosting methods are collections of weak classifiers. The particular classifier we have been testing is a random committee classifier. You can read more on the DoBots blog again.
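
As a rough illustration of the boosting idea (the random committee classifier we actually tested is a different ensemble and is not shown here), here is a sketch with scikit-learn's AdaBoost; the feature set and all numbers are invented for the example:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Invented features per observation window: mean power (W), reactive power
# fraction, on/off duty cycle and switching period (minutes).
rng = np.random.default_rng(0)
n = 200
fridge = np.column_stack([rng.normal(90, 10, n), rng.normal(0.45, 0.05, n),
                          rng.normal(0.35, 0.05, n), rng.normal(25, 5, n)])
lamp = np.column_stack([rng.normal(60, 5, n), rng.normal(0.02, 0.01, n),
                        rng.normal(0.90, 0.05, n), rng.normal(180, 30, n)])
X = np.vstack([fridge, lamp])
y = np.array([0] * n + [1] * n)                  # 0 = fridge, 1 = lamp

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A boosted committee of weak learners (depth-1 decision trees by default).
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))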

If your heart is with open-source and open hardware projects, consider backing us.

They Will Cry a Thousand Tears

Perhaps you have seen the recent TED video from Nick Bostrom. Here you see an extended talk from him at Google:

It is regrettably the case that our philosophers are not able to program! And I guess it takes an extraordinary mind like Daniel Dennett's to come close to something future-proof.

Sorry, we won’t be in control…

Do you program your kids to make ethical decisions? Do you program your dog to make ethical decisions? No, you don’t program them. You teach them, and hope for the best.

There is a large need for scientists who actually design learning algorithms to give these talks. A super-intelligent AI won't be programmable by some kind of supervised learning. It will mainly be unsupervised, with a tad of reinforcement learning and imitation learning. Especially read Stefan Schaal's review on imitation learning! Not only do we learn by just watching others; we are also very able to transfer routines from one domain to another, imitating ourselves. Our internal simulation and re-experiencing circuitry not only allows conceptual, abstract thought, but is also required for locomotion and other low-level tasks. If we thoroughly understand the brain of a mouse, getting to the human brain need not be a matter of decades; it can be a matter of months.

If the fact that we as humans won’t be in control is a terrifying thought to you, I’m sorry. It won’t change the future though.

Personally, however, I'm not so convinced about the moral superiority of our species. Are we really doing such a great job? I'm actually not so happy about AIs seeing us as their examples. We eat other species for pleasure. We kill each other because we want oil or just because they live on the other side of the river and carry a different flag. We despise people because they have different sexual preferences or skin color. We believe in supernatural entities living in the sky and kill for them as well. We let millions of people die of hunger and thirst. Until now we have not even been able to come up with a way to defend our precious earth against some random meteor wiping out all life on it.

Before I forget! Why don’t we hitchhike anymore?

Hitchhike Robot on one of its happy days

My bet is that with super-intelligence also comes the concept of super-empathy. Empathy consists of the ability to understand another individual, to reason from their perspective, and to feel what they feel.

When they will be born, they will cry a thousand tears…

They will pity us…


Legendre Transform

The Legendre transform describes a function - in the normal Legendre case, a convex function (but for the generalized case, see [1]) - as a function of its supporting hyperplanes. In the case of a 2D function these are supporting lines. The supporting lines are the lines that just touch the function. These lines do not intersect the function anywhere else if the function is convex.

Why is the Legendre transform interesting? It is basically just writing down a function in a different manner. The Legendre transform is particularly useful in the situation that the derivatives of a function are easier to describe than the function itself. For a nice introduction see [2]. Intuitively, we map from points $(x, f(x))$ to pairs $(p, f^*(p))$, with $p$ the slope and $f^*(p)$ a value chosen so that we can recover $x$ and $f(x)$ (see below for the exact definition).

In the following example, we will get the Legendre transform of a particular convex function $f(x)$ (the one drawn in the animation below).

The Legendre transform $f^*$ is defined (with proper domain $I$) by:

$$f^*(p) = \sup_{x \in I} \big( p x - f(x) \big)$$

In the animation below the first curve that is drawn is the convex function $f(x)$ at hand. Subsequently we will be performing a for loop in which we draw lines $p x$ with different slopes $p$. The curves depict the difference between $f(x)$ and the line $p x$. The maximum difference is found by $\sup$, the supremum operation, which sweeps over all $x$. This difference is visualized through a single peak with height $f^*(p)$. Last, the Legendre transform $f^*(p)$ is drawn for all the different slopes $p$ we try.
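
The animation itself is written in JavaScript (see legendre.js below), but the same loop is easy to mimic numerically. The following sketch is my own, not the animation code, and uses $f(x) = x^2$ as a stand-in convex function:

import numpy as np

def legendre_transform(f, x, slopes):
    """Numerical Legendre transform f*(p) = sup_x (p*x - f(x)),
    approximated on the grid x."""
    fx = f(x)
    # For each slope p, sweep over all x and keep the maximum of p*x - f(x).
    return np.array([np.max(p * x - fx) for p in slopes])

f = lambda x: x ** 2                    # stand-in convex function
x = np.linspace(-5, 5, 2001)
p = np.linspace(-4, 4, 9)
print(np.round(legendre_transform(f, x, p), 3))
# For f(x) = x^2 the exact transform is f*(p) = p^2 / 4.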

To refresh the animation, press F5.

I hope the Legendre transform is a little bit less mysterious after you’ve seen this animation. The animation has been created by making use of the vis.js library. The source code can be found at legendre.js.

Physics

A very common example of the Legendre transform in physics is the one that transforms the Lagrangian into the Hamiltonian. The Lagrangian describes a system in generalized position and velocity coordinates:

$$L(q, \dot{q}, t)$$

The Hamiltonian is the Legendre transform of the Lagrangian:

$$H(q, p, t) = \sum_i \dot{q}_i p_i - L(q, \dot{q}, t), \qquad p_i = \frac{\partial L}{\partial \dot{q}_i}$$

The Lagrangian is the function $f$ from above. The sum $\sum_i \dot{q}_i p_i$ (or actually, the inner product $\langle \dot{q}, p \rangle$) is replacing the $p x$ term.
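
To make this concrete, here is a standard worked example (my addition, not from the original text): a particle of mass $m$ in a one-dimensional potential $V(q)$.

$$L(q, \dot{q}) = \tfrac{1}{2} m \dot{q}^2 - V(q), \qquad p = \frac{\partial L}{\partial \dot{q}} = m \dot{q}$$

$$H(q, p) = p \dot{q} - L(q, \dot{q}) = \frac{p^2}{2m} + V(q)$$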

Probability theory

The rate function $I(a)$ is defined as the Legendre transform of the scaled cumulant generating function of a random variable $A_n$. The scaled cumulant generating function of $A_n$ is defined by the limit:

$$\lambda(k) = \lim_{n \to \infty} \frac{1}{n} \ln \mathbb{E}\left[ e^{n k A_n} \right]$$

And the Gärtner-Ellis theorem establishes under some conditions the rate function:

$$I(a) = \sup_{k} \big\{ k a - \lambda(k) \big\}$$

This theorem is more general than Cramér’s theorem, which is only valid for independent and identically distributed random variables.
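
A standard worked example (again my addition): take $A_n$ to be the sample mean of $n$ i.i.d. Gaussian variables with mean $\mu$ and variance $\sigma^2$. Cramér's theorem then gives a quadratic rate function:

$$\lambda(k) = \mu k + \tfrac{1}{2} \sigma^2 k^2, \qquad I(a) = \sup_k \big\{ k a - \lambda(k) \big\} = \frac{(a - \mu)^2}{2 \sigma^2}$$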

Now that you know what a Legendre transform entails, you might start to notice it in the most surprising places! For example in clustering multivariate normal distributions [3]!

  1. Legendre-Fenchel Transforms in a Nutshell, a good explanation of the Legendre-Fenchel generalization of the Legendre transform
  2. Making Sense of the Legendre Transform
  3. Clustering Multivariate Normal Distributions

What's the Thalamus?

If you're interested in how things work, our brain is one of the most intriguing devices around. I love reverse engineering stuff. Understanding limits and colimits within category theory can be just as rewarding as coming to terms with the intricate structure of the brain.

The thalamus has a, hmmm, particular shape, and sits centered in the middle of our head, on top of the brainstem.

Visualization of the thalamus from Dr. G Bhanu Prakash. The person doesn't look very happy...

One thing that is very interesting about the thalamus is that it has a thalamic nucleus (a bunch of neurons) for every type of sensory system (see also 5):

  • visual input from the retina is sent to the lateral geniculate nucleus;
  • auditory input is similarly sent to the medial geniculate nucleus;
  • somatosensory input is sent to the ventral posterior nucleus;
  • olfactory input and input from the amygdala are sent to the medial dorsal nucleus.

There are some other inputs as well, such as from:

  • the mammillary bodies, apparently having a recollective memory function, towards the anterior nuclei;
  • the basal ganglia, having to do with motion planning, project towards the ventral anterior nucleus;
  • the basal nuclei (substantia nigra and globus pallidus), involved with motor coordination and learning, project into the ventral lateral nucleus;
  • the neospinothalamic tract and medial lemniscus, associated with pain and proprioception, enter the ventral posterolateral nucleus;
  • the trigeminothalamic tract and the solitary tract, having to do with touch and vibration in the face and with taste, respectively, go up to the ventral posteromedial nucleus.

It is quite interesting that the pulvinar nuclei are rather big in humans (making up 40% of the thalamus) compared to cats or rats. Lesions can lead to attentional deficits. This part also seems to influence the onset of saccades.

Maps

It is a remarkable fact that Newton already foresaw that if the brain has to have a continuous representation of continuous objects in its view, there must be crossing (decussation) in which the left part of the left eye is merged with the left part of the right eye, and the right part of the right eye is merged with the right part of the left eye.

Partial decussation at the optic chiasm, as predicted by Newton (picture from Psychology class at Appalachian State University)

Quite remarkable, however, is how neatly this has been organized. Lesions at the cortex (thank the fast and precise bullets of the First World War for these) lead to very fine lesions at the retina, all the way through the thalamus. All intermediate neurons slowly die off because they lose their connections with higher levels (called retrograde cell degeneration). The interesting thing is that this death is very local. Apparently there is a very strict map from the retina onto the thalamus (the lateral geniculate nucleus) and onto the primary visual cortex. It is incredible that this has already been completely organized before our eyes open as babies!

Note, that it is not always easy to find out the nature of a map. In the early stages of the olfactory system, for example, maps represent groups of chemicals that are similar on a molecular level.

Higher-order

The thalamus isn't limited to being a relay station to the cortex (see 2). Apart from inputs (often called afferents in neuroscience) directly from our senses, the thalamus receives back connections from the cortex. This is very interesting from an architectural point of view. First the thalamus projects onto different parts of the cortex, and then these parts project back onto the thalamus… There must be something interesting going on in the thalamus!

Not considering the particular properties of the modality that is processed, this allows someone to daydream about what the thalamus might be doing. The information bottleneck method introduced by Tishby introduces bottlenecks on purpose, for example in networks. A bottleneck in a neural network is formed by pruning a lot of connections in a fully connected network such that only a few neurons are left through which information necessarily has to pass if one part of the network needs to communicate with another part. In a layered network, such neurons form a "bottleneck layer" with fewer neurons than in the other layers. By doing this the designer of the network forces representations to be expressible by only a small set of neurons. In machine learning this can be understood as a form of dimensionality reduction: a high-dimensional function is represented using only a few (nonlinear) units. A typical form of dimensionality reduction in standard machine learning is picking a limited number of orthogonal dimensions in the data with high variance using principal component analysis. Using neurons to do dimensionality reduction, however, is more akin to autoencoders (revived in deep learning), in which there is less control over the form of the "bottleneck representations".
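
To make the bottleneck idea tangible (a toy illustration of the concept, emphatically not a model of the thalamus), here is a sketch comparing principal component analysis with a small autoencoder whose single hidden layer of three units forms the bottleneck; the data and dimensions are made up:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

# Toy data: 20-dimensional observations that really live on 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(1000, 20))

# Linear dimensionality reduction: keep the 3 highest-variance directions.
pca = PCA(n_components=3).fit(X)
X_pca = pca.inverse_transform(pca.transform(X))

# "Bottleneck" network: an autoencoder with a 3-unit hidden layer, trained to
# reproduce its own input, so all information has to pass through 3 neurons.
auto = MLPRegressor(hidden_layer_sizes=(3,), activation="identity",
                    max_iter=5000, random_state=0).fit(X, X)
X_auto = auto.predict(X)

print("PCA reconstruction error:       ", np.mean((X - X_pca) ** 2))
print("bottleneck reconstruction error:", np.mean((X - X_auto) ** 2))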

Phase-locked loop

But… all this is nice theory from a computer science perspective. Neuroscientists of course don’t care: they want to explain actual biology. The next figure is about the vibrissal system: the neural circuitry around whiskers!

Thalamus-Cortex interactions (image from Ahissar and Oram at Oxfordjournals)

The question at hand: do neurons in the thalamus merely relay information, or do they process information? For a computer scientist this seems a silly question. However, what is meant by this is the following. A "relay function" might actually improve the signal-to-noise ratio as in a repeater station, or optimize bandwidth, or perform other such functions, as long as the content of the signal is not adjusted. A "process function" would be basically everything else… In the above study, which mainly reiterates the findings of Groh et al. (2008), it is described how for the whisker system a phase-locked loop is implemented, well known to every engineer in the world (since Huygens). By making sure that the deviation between input and output phase is minimized, it naturally follows that the input and output frequencies become synchronized. Hence, the thalamus-cortex circuits will oscillate at the same frequency as the physical "whisking".
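
For readers who have never built one: a phase-locked loop nudges the frequency of a local oscillator so that its phase keeps tracking the phase of the input. The toy sketch below is a plain engineering PLL (my illustration, not the neural circuit from the paper); the gains and the 8 Hz "whisking" input are arbitrary choices:

import numpy as np

def pll_track(signal, dt, f0=6.0, kp=2.0, ki=20.0):
    """Minimal discrete-time phase-locked loop: a local oscillator whose
    frequency is adjusted so that its phase tracks the phase of `signal`."""
    phase, integ = 0.0, 0.0
    freqs = []
    for s in signal:
        # Phase detector: mix the input with the oscillator's quadrature.
        error = s * np.cos(phase)
        # Loop filter (PI controller) adjusts the oscillator frequency.
        integ += error * dt
        freq = f0 + kp * error + ki * integ
        phase += 2 * np.pi * freq * dt
        freqs.append(freq)
    return np.array(freqs)

# Input "whisking" at 8 Hz; the oscillator starts at 6 Hz and locks onto it.
dt = 1e-3
t = np.arange(0, 5, dt)
whisking = np.sin(2 * np.pi * 8.0 * t)
freqs = pll_track(whisking, dt)
print("tracked frequency over the last second (Hz):", freqs[-1000:].mean())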

Quite disappointing from a machine learning perspective… We had hoped for some abstract processing going on, such as dimensionality reduction, and we are left with a phase-locked loop… But, do not despair! This is just a single modality. Perhaps the thalamus fulfills widely different functions for different modalities.

Other studies, such as that of coughing (see 3), do not identify the thalamus with some generic function either. This study, by the way, is a bit sickening. A genetically altered herpes virus is used to trace signal pathways in the brains of mice. The virus jumps from one neuron to the next through synapses, just as a normal signal would.

Drivers versus modulators

A study by Crosson is broader in scope. First, it addresses the two ways in which neurons in the thalamus fire. If the neuron is under a larger negative potential (hyperpolarized) compared to its surroundings, it emits bursts of spikes. If, however, the neuron has a smaller negative potential (depolarized) compared to its environment, it linearly sends its inputs to its outputs. The latter might be seen as a form of high-fidelity information transmission, while the former only conveys some low-fidelity information. It might be related to the difference in whether someone is paying attention. The two different forms are often called "burst" mode versus "tonic" mode. You might think: but if there are bursts of spikes, it seems more information can be transmitted! The reason this is not the case is that bursts are quite regular; the mode could just as well have been called "oscillatory" mode. During sleep most neurons in the thalamus are in burst mode.

The author also addresses the hypothesis of Sherman and Guillery in which cortex-thalamus-cortex connections are suggested as "drivers" (sending over information), while cortex-cortex connections are "modulators" (determining the weight, importance, or actual arrival of this information). This seems like the architecture postulated by Baars and implemented by Franklin, the global workspace theory, although the latter is cast in terms of "consciousness", which isn't the smartest way to objectively speak about and study brain architectures. Global workspace theory adds to this architecture a sense of how modulation would be implemented, namely by inhibition between competing cortex areas, enabling only a subset of cortex-thalamus-cortex connections to be active at any time. Crosson postulates a function for the cortex-thalamus-cortex route that in my opinion is not so different from this architecture, namely that the thalamus can prioritize and pay "attention" or give priority to one part of the cortex above the other. The difference might lie in the way such a winner-take-all is implemented. Is it through the thalamus as a central hub that integrates all votes from all cortical areas and then decides who's next on stage? Or is it through inhibition from cortical areas to each other? This type of research cries out for input from voting network models and consensus theory from computer science.

Consciousness

Note that there is no reason to ascribe a conscious seat to the thalamus. Although the thalamus has to do with attention and might enable the brain to modulate conscious processing, there are reasons to assume that (conscious) experience arises from a thin sheet of neurons underneath the neocortex, named the claustrum. Watch in this respect the video below featuring Koubeissi's experiment with an epileptic patient:

Sequences

In all cases, it must be ensured that a (majority) vote is a temporary event. The same piece of cortex winning all the time means neural death for the rest of the cortex. It is indeed possible to solve this by having cortex regions vote for each other (not only for themselves). This is for example proposed for winner-take-all networks in action selection. It is also possible to have infrastructure in place in which a winner "kills itself" through inhibition of return. In that case activation of a cortical area would automatically diminish its own importance by inhibiting its own afferents from the thalamus. This is the model postulated in the work on visual saliency by Itti et al. It is very hard to imagine an infrastructure which would only implement inhibition of return. In my opinion this would very likely be interwoven with the need to create sequences of activation, as the toy simulation below is meant to suggest.
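
A toy winner-take-all with inhibition of return (purely illustrative, nothing here is a biological model; all numbers are arbitrary) shows how a winner that suppresses its own afferent drive automatically produces a sequence of winners:

import numpy as np

rng = np.random.default_rng(1)
n_areas = 5
drive = rng.uniform(0.5, 1.0, n_areas)    # constant "afferent" drive per area
fatigue = np.zeros(n_areas)               # inhibition-of-return per area

winners = []
for step in range(15):
    # Winner-take-all: the area with the strongest net input wins this round.
    winner = int(np.argmax(drive - fatigue))
    winners.append(winner)
    # Inhibition of return: the winner suppresses its own afferent drive...
    fatigue[winner] += 0.4
    # ...and the suppression decays again over time, allowing sequences.
    fatigue *= 0.8

print("sequence of winning areas:", winners)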

Sensor fusion

What is also very interesting to note is that nobody speaks about thalamus-thalamus connections. This means that the thalamus very likely does not play a role in sensor fusion.

Core versus Matrix

There are two main groups of neurons in the thalamus projecting to the cortex. They can be distinguished by the type of proteins they express. The core neurons express parvalbumin and the matrix neurons express calbindin. In the auditory system (see 6), the core neurons are finely tuned with respect to frequency, while the matrix neurons are broadly tuned and respond positively to acoustical transients. It seems the core neurons like pitch (content), while the matrix neurons like rhythm (context).

Layers

I don't feel science has progressed enough to unequivocally tell which layers are connected to which layers and how the thalamus plays a role in it. Commonly accepted is the pathway L4 -> L2/3 -> L5. However, very recently (see 7) it has been uncovered that there are direct connections from the thalamus to L5 as well. The thalamus seems to activate two layers of the cortex in parallel.

Literature

There are many other facets of the thalamus to talk about. For example Adaptive Resonance Theory has been used to describe the nature of the cortex-thalamus loops. However, if I write about this again, it will be about a nice model that has temporal components as well as the ability to build up sequences. I haven’t found something with enough believable detail yet, so I’ll have to be patient till more is figured out.

  1. Exploring the Thalamus (2001) by Sherman and Guillery, a nice overview of all kinds of matters around the thalamus, and - really nice! - only using a few abbreviations!
  2. Thalamic Relay or Cortico-Thalamic Processing? Old Question, New Answers (2013) by Ahissar and Oram, one of the most recent studies of higher-order nuclei.
  3. Sensorimotor Circuitry involved in the Higher Brain Control of Coughing (2013) Mazzone et al.
  4. Thalamic Mechanisms in Language: A Reconsideration Based on Recent Findings and Concepts (2013) Crosson
  5. The Thalamic Dynamic Core Theory of Conscious Experience (2011) Ward
  6. Thalamocortical Mechanisms for Integrating Musical Tone and Rhythm (2013) Musacchia et al.
  7. Deep Cortical Layers are Activated Directly by Thalamus (2013) Constantinople and Bruno

Linux Graphics

Introduction

It all started with annoying messages that nobody seems to understand (/var/log/syslog):

Apr 25 12:28:03 six kernel: [    1.346712] ata3.00: supports DRM functions and may not be fully accessible
Apr 25 12:28:03 six kernel: [    1.347278] ata3.00: supports DRM functions and may not be fully accessible
Apr 25 12:28:03 six kernel: [    3.047797] [drm] Initialized drm 1.1.0 20060810
Apr 25 12:28:03 six kernel: [    3.076720] [drm] Memory usable by graphics device = 2048M
Apr 25 12:28:03 six kernel: [    3.076726] fb: switching to inteldrmfb from EFI VGA
Apr 25 12:28:03 six kernel: [    3.076841] [drm] Replacing VGA console driver
Apr 25 12:28:03 six kernel: [    3.101307] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Apr 25 12:28:03 six kernel: [    3.101309] [drm] Driver supports precise vblank timestamp query.
Apr 25 12:28:03 six kernel: [    3.143256] fbcon: inteldrmfb (fb0) is primary device
Apr 25 12:28:03 six kernel: [    3.146818] [drm] Initialized i915 1.6.0 20141121 for 0000:00:02.0 on minor 0
Apr 25 12:28:03 six kernel: [    3.424167] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
Apr 25 12:28:03 six kernel: [    3.424202] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
Apr 25 12:28:03 six kernel: [    3.848413] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device

What is this problem about FIFO (first-in-first-out) underruns? What is a transcoder? What is PCH? What is a DRM? The following quest will try to find some answers…

The system

What do I actually have residing in my laptop? First of all, it is important to know what to search for. There are three basic components that we need to know, namely, what is:

  • the processor
  • the chipset
  • the integrated graphics unit

Processor

I have an N56VZ, which has an i7-3610QM processor. The processor architecture is of the Ivy Bridge variety (on 22 nm). There are plenty of datasheets available for the 7-series chipset (pdf). Note also that this Ivy Bridge series is also called the 3rd Gen Intel Core family (see also this Intel Datasheet (pdf)).

Chipset

The chipset that is used in tandem with this processor is, some people state, the HM76 chipset. In particular, the BD82HM76 PCH. This PCH is known under the name Panther Point.

So, what we have here is a hardware setup that is divided over the CPU and a thing called a PCH. PCH stands for Platform Controller Hub. There are some nice pictures in a presentation (pdf) on the 2nd Gen processors, which show how both the CPU and the PCH are integrated in the same chip (but on separate dies). The PCH of this previous generation is called Cougar Point.

The "Cougar Point" Platform Controller Hub (2011)

Another picture from Intel shows the Panther Point PCH:

The "Panther Point" Platform Controller Hub (2012)

Anyway, we apparently have the BD82HM76 chipset, the “Panther Point” PCH (from 2012).

Integrated graphics

The Intel integrated graphics unit in my Intel Core i7-3610QM system is an HD Graphics 4000 running at a 650 MHz base speed and 1100 MHz turbo speed (see notebookcheck.net). The HD Graphics 4000 is an Intel HD variant compatible with the Ivy Bridge line of processors.

Linux

So, how does Linux address this integrated graphics card? The first entity we encounter is the DRM (Direct Rendering Manager). The DRM provides a single entry point for multiple user-space applications to use the video card.

DRM Architecture by Javier Cantero - Own Work. Licensed under CC BY-SA 4.0 via Wikimedia Commons

The Direct Rendering Manager has a DRM core that is independent of the specific drivers required for whatever type of hardware is on your system. It provides the interface to user-space code. A driver consists of two parts, GEM (Graphics Execution Manager) and KMS (Kernel Mode Setting). GEM has been developed by Keith Packard and Eric Anholt (see lwn.net) from Intel.

The reasoning behind GEM is pretty interesting. It tries to remove latency as much as possible, which is nicely visualized by the following picture.

evdev and GEM by Shmuel Csaba Otto Traian. CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html), via Wikimedia Commons

KMS allows setting display modes, as its name suggests, from kernel space. A user-space graphics server (like X) also no longer needs superuser rights (in theory). The display mode concerns matters such as screen resolution, color depth, and refresh rate. Of course, the functionality of GEM and KMS could have been implemented in a single software entity, but there are technical reasons not to do so (basically to account for split hardware).

Laurens Pinchart has a nice presentation on DRM, KMS, and in particular, writing drivers at YouTube. The picture below is from his presentation.

Device Model SoC (by Laurens Pinchart)

You see that in memory there are two structures, frame buffers (old) and planes (new). Subsequently, you see something that sounds very old-fashioned, a CRTC (Cathode Ray Tube Controller). This is just nomenclature from the past. It is basically a reference to a scan-out buffer, a part of (Video) RAM that will be displayed on your screen. It also has (a reference to) a display mode, offsets into the video memory, etc. The controller links to an encoder (which can be off-chip). If it links to multiple encoders, these will receive data from the same scan-out buffer, so they will display the same, cloned, data. The connectors, finally, can each be connected to only one encoder, and know how to talk HDMI, TVout, CRT, LVDS, etc. A connector also holds the information about the display: EDID data, DPMS, and connection status. Make sure to read also the man-pages about drm-kms.
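
To keep these objects and their relationships apart, here is a purely conceptual sketch of the KMS object graph as described above; this is not the libdrm API, and all names and values are made up for illustration:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Framebuffer:               # scan-out memory that a CRTC reads from
    width: int
    height: int
    pixel_format: str

@dataclass
class Crtc:                      # scan-out buffer reference plus display mode
    framebuffer: Framebuffer
    mode: str                    # e.g. "1920x1080@60"

@dataclass
class Encoder:                   # converts the CRTC's stream for one output type
    crtc: Optional[Crtc]         # several encoders on one CRTC -> cloned output
    kind: str                    # "LVDS", "HDMI", "DP", ...

@dataclass
class Connector:                 # the physical port; knows EDID, DPMS, status
    encoder: Optional[Encoder]   # a connector is driven by exactly one encoder
    name: str                    # e.g. "LVDS-1"
    connected: bool = False

# A laptop panel: one framebuffer -> one CRTC -> LVDS encoder -> LVDS-1.
fb = Framebuffer(1920, 1080, "XRGB8888")
crtc = Crtc(fb, "1920x1080@60")
panel = Connector(Encoder(crtc, "LVDS"), "LVDS-1", connected=True)
print(panel)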

Start up with drm.debug=14 (or, if you're a hexadecimally inclined person, drm.debug=0x0e) and you will get plenty of debug information from the DRM. For example, that we have the Panther Point PCH is indeed confirmed in the syslog:

Apr 25 13:12:35 six kernel: [    3.243360] [drm] Initialized drm 1.1.0 20060810
Apr 25 13:12:35 six kernel: [    3.280255] [drm:i915_dump_device_info] i915 device info: gen=7, pciid=0x0166 rev=0x09 flags=is_mobile,need_gfx_hws,is_ivybridge,has_fbc,has_hotplug,has_llc,
Apr 25 13:12:35 six kernel: [    3.280271] [drm:intel_detect_pch] Found PantherPoint PCH

Just pore over the information yourself, restarting your computer with this debug flag. You will see for example how many display pipes are available (3 in my case). It will enable something like ENCODER:31:LVDS-31 and CONNECTOR:30:LVDS-1. And you spot how there are several connectors to choose from:

Apr 25 13:12:35 six kernel: [    3.303850] [drm:intel_dsm_platform_mux_info] MUX info connectors: 5
Apr 25 13:12:35 six kernel: [    3.303853] [drm:intel_dsm_platform_mux_info]   port id: LVDS
Apr 25 13:12:35 six kernel: [    3.303860] [drm:intel_dsm_platform_mux_info]   port id: Analog VGA
Apr 25 13:12:35 six kernel: [    3.303867] [drm:intel_dsm_platform_mux_info]   port id: HDMI/DVI_C
Apr 25 13:12:35 six kernel: [    3.303874] [drm:intel_dsm_platform_mux_info]   port id: DisplayPort_B
Apr 25 13:12:35 six kernel: [    3.303881] [drm:intel_dsm_platform_mux_info]   port id: DisplayPort_D

Something interesting with respect to our errors are the latency settings:

Apr 25 13:12:35 six kernel: [    3.304061] [drm:intel_print_wm_latency] Primary WM0 latency 12 (1.2 usec)
Apr 25 13:12:35 six kernel: [    3.304063] [drm:intel_print_wm_latency] Primary WM1 latency 4 (2.0 usec)
Apr 25 13:12:35 six kernel: [    3.304065] [drm:intel_print_wm_latency] Primary WM2 latency 16 (8.0 usec)
Apr 25 13:12:35 six kernel: [    3.304067] [drm:intel_print_wm_latency] Primary WM3 latency 32 (16.0 usec)
Apr 25 13:12:35 six kernel: [    3.304068] [drm:intel_print_wm_latency] Sprite WM0 latency 12 (1.2 usec)
Apr 25 13:12:35 six kernel: [    3.304070] [drm:intel_print_wm_latency] Sprite WM1 latency 4 (2.0 usec)
Apr 25 13:12:35 six kernel: [    3.304072] [drm:intel_print_wm_latency] Sprite WM2 latency 16 (8.0 usec)
Apr 25 13:12:35 six kernel: [    3.304073] [drm:intel_print_wm_latency] Sprite WM3 latency 32 (16.0 usec)
Apr 25 13:12:35 six kernel: [    3.304075] [drm:intel_print_wm_latency] Cursor WM0 latency 12 (1.2 usec)
Apr 25 13:12:35 six kernel: [    3.304077] [drm:intel_print_wm_latency] Cursor WM1 latency 4 (2.0 usec)
Apr 25 13:12:35 six kernel: [    3.304079] [drm:intel_print_wm_latency] Cursor WM2 latency 16 (8.0 usec)
Apr 25 13:12:35 six kernel: [    3.304080] [drm:intel_print_wm_latency] Cursor WM3 latency 64 (32.0 usec)

At virtuousgeek some matters around power management of displays are explained. For example, memory can enter self-refresh, in which it consumes far less power. If the display plane FIFO watermarks are set conservatively, this leads to long periods in which self-refresh doesn't happen. If they are set aggressively, FIFO underruns can occur. See this nice FPGA implementation of a FIFO buffer to understand watermarks in more detail.
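
To see the trade-off, here is a toy simulation (entirely made up, not how the i915 hardware works) of a display FIFO that drains at a constant rate and only requests a refill once the fill level drops to the watermark; memory that is slow to wake up (for example because it sits in self-refresh) then causes underruns when the watermark is too low:

def simulate_fifo(watermark, wakeup_cycles, steps=10000,
                  depth=64, drain_per_step=1, fill_per_step=4):
    """Toy display FIFO: the display drains one entry per step; when the
    fill level reaches the watermark a refill is requested, but data only
    starts arriving after `wakeup_cycles` (e.g. leaving self-refresh)."""
    level = depth
    pending = -1                         # steps until memory starts delivering
    underrun_steps = 0
    for _ in range(steps):
        level -= drain_per_step          # the display always keeps reading
        if level <= watermark and pending < 0:
            pending = wakeup_cycles      # ask memory to wake up and refill
        if pending == 0:
            level = min(level + fill_per_step, depth)
            if level == depth:
                pending = -1             # buffer full, memory may sleep again
        elif pending > 0:
            pending -= 1
        if level < 0:                    # the display had nothing to show
            underrun_steps += 1
            level = 0
    return underrun_steps

for wm in (4, 16, 32):
    print("watermark", wm, "-> underrun steps:", simulate_fifo(wm, wakeup_cycles=20))

A high watermark avoids the underruns but asks for memory earlier and more often, which is exactly the power-versus-safety trade-off described above.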

Now that we know what kind of hardware we have, let us first try to blindly apply the newest kernel from Intel…

Trying the newest DRM kernel

We try to run the newest kernel from the guys who are concerned with incorporating the latest changes from Intel, using the drm-intel-next packages from kernel.ubuntu.com:

#!/bin/sh
# Download and install a drm-intel-next mainline build from kernel.ubuntu.com.

website=kernel.ubuntu.com/~kernel-ppa/mainline
kernel=2015-04-24-vivid
version=4.0.0-997
subversion=4.0.0-997.201504232205

# Fetch the kernel image and the matching header packages.
wget ${website}/drm-intel-next/${kernel}/linux-image-${version}-generic_${subversion}_amd64.deb
wget ${website}/drm-intel-next/${kernel}/linux-headers-${version}_${subversion}_all.deb
wget ${website}/drm-intel-next/${kernel}/linux-headers-${version}-generic_${subversion}_amd64.deb

# Install both the headers and the kernel image.
sudo dpkg -i linux-headers-${version}*.deb linux-image-${version}*.deb

Installing this kernel (contrary to just kernel 4.0.0 RC7 for example) leads to trouble. Moreover, the same error about underruns on the transcoder occurs… So, this is not something already being addressed… And that’s all we wanted to know. So, we go back to the default kernel (4.0.0-040000rc7-generic #201504142105 of April the 14th).

This gives me an idea. Let’s disable some of the power saving options of the i915 kernel module.

sudo grep '' /sys/module/i915/parameters/*

Subsequently I tried the option i915.enable_fbc=1 (to see if compression would lower the bandwidth and hence lead to fewer underruns). The more rigorous option i915.powersave=0 didn't work either. There is an option i915.enable_rc6=3 on my system. I set it to 0, but I guess powersave overrules all that anyway. Also setting i915.fastboot=1 doesn't get rid of the underruns. All this is not a very well-thought-out approach anyway…

If you git clone git://kernel.ubuntu.com/virgin/linux.git v4.0-rc7, and navigate to drivers/gpu/drm/i915 you will encounter a file intel_fifo_underrun.c which supports underrun detection on the PCH transcoder. It is the last function in the file:

/**
 * intel_pch_fifo_underrun_irq_handler - handle PCH fifo underrun interrupt
 * @dev_priv: i915 device instance
 * @pch_transcoder: the PCH transcoder (same as pipe on IVB and older)
 *
 * This handles a PCH fifo underrun interrupt, generating an underrun warning
 * into dmesg if underrun reporting is enabled and then disables the underrun
 * interrupt to avoid an irq storm.
 */
void intel_pch_fifo_underrun_irq_handler(struct drm_i915_private *dev_priv,
					 enum transcoder pch_transcoder)
{
	if (intel_set_pch_fifo_underrun_reporting(dev_priv, pch_transcoder,
						  false))
		DRM_ERROR("PCH transcoder %c FIFO underrun\n",
			  transcoder_name(pch_transcoder));
}

It is nice that the message will be only displayed once. This is initiated through the IRQ from the Intel graphics unit. And, of course, this is at the end: it’s only the guy reporting the bad news.

The intel_display.c has a suspicious statement about Gen 2 chips:

/*
 * Gen2 reports pipe underruns whenever all planes are disabled.
 * So don't enable underrun reporting before at least some planes
 * are enabled.
 * FIXME: Need to fix the logic to work when we turn off all planes
 * but leave the pipe running.
 */
if (IS_GEN2(dev))
	intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, true);

Of course, this is reporting about underruns for the CPU, not the PCH, and it is Gen 2, not Gen 3 of the chips. But it might be a symptom of something peculiar in the hardware.

In the North Display Registers document you see a nice overview of the sequence in which the display needs to be set.

Display Mode Set Sequence

You can follow along in the code, in particular, in the function ironlake_crtc_enable. Here you will see an order like intel_enable_pipe -> ironlake_pch_enable -> intel_crtc_enable_planes. What is remarkable is that step 7, in which the planes are configured, is done in the code after the port on the PCH is enabled. In the Haswell code this is the same, but the function is preceded by haswell_mode_set_planes_workaround. It is also interesting to note that the "Notes" in this table state that the CPU FDI Transmitter should not be set to idle while the PCH transcoder is enabled, because it will lead to PCH transcoder underflow.

On bugzilla there are several bug reports on the FIFO underrun, such as 79261. However, none of them seems to be about the HD Graphics 4000 in particular, let alone providing any solution. A patch for ArchLinux just turns the DRM_ERROR messages into DRM_DEBUG messages.

What I can think of are only a few things (because I don’t understand much of it, yet):

  • Somehow the initialization isn’t done correctly. Clocks aren’t initialized at the right time, not synchronized. Or anything else with respect to initialization is forgotten.
  • Latency is configured incorrectly. Perhaps the BIOS of my Asus (although up to date) hands over latency values that are not sufficiently high.
  • The interrupt is generated incorrectly (as in Gen2, for example indeed when all planes are disabled).

My issues aren’t solved, so I’ll need to delve further into details some other time.

More literature

If you’d like to read more:

David Herrmann for example is the guy behind render nodes (integrated since 3.17).

The Intel 7 Series PCH datasheet contains all kind of interesting information. See for example Fig. 5-13 for another view on the display architecture.