Why (concurrency != concurrency) is actually true: Part 1

Everyone talks about concurrency nowadays. …and sadly, everyone knows how to speed up each and every poor sequential algorithm by throwing multiple threads at the problem. - Right?

This series of posts covers some of the most common mistakes programmers make when they try to implement a concurrent system - and believe me when I tell you, there are a lot of senior code monkeys out there who are not aware of the issues covered here. All my examples are based on wannabe-concurrent code I’ve actually seen out in the wild; I’ve just simplified them to give prominence to the actual problem. So let’s get into it!

Y U no do what I told ya?!

If you’ve implemented a concurrent system in the past you probably know the “Hey, I didn’t tell you to… oh wait.” syndrome. No? Ok, let me explain.
Consider the following C++ code.

int a = 0, b = 0;

void func()
{
    a = b + 1;
    b = 0;
}

Now that looks pretty straightforward, doesn’t it? We initialize the globals a and b to 0 and, assuming someone calls func, we add 1 to b, write the result to a and then write 0 to b. Right? … No!
The very cool thing about high-level languages like C++ is that we don’t have to write assembler. We don’t have to tell the CPU instruction by instruction what it has to do in order to produce the output we desire. An expression or statement written in a high-level language is usually translated into multiple CPU instructions: a C++ statement typically expands to a handful of machine instructions, while a statement in an interpreted language like Python can easily end up executing hundreds of them. - So let’s take a look at the unoptimized compiler output of the example above.

a:
	.zero	4
b:
	.zero	4
func():
	pushq	%rbp
	movq	%rsp, %rbp
	movl	b(%rip), %eax
	addl	$1, %eax
	movl	%eax, a(%rip)
	movl	$0, b(%rip)
	popq	%rbp
	ret

Ha! Just as you would expect. a and b are statically initialized to 0. pushq %rbp and movq %rsp, %rbp basically set up the stack. We can ignore them for now since we don’t actually create anything on the stack.
movl b(%rip), %eax loads b into the accumulator register and addl $1, %eax adds 1 to the contents of eax. So far so good. movl %eax, a(%rip) then writes the contents of the accumulator register to a. Next, movl $0, b(%rip) writes 0 to b. The last two lines tear down the stack frame and return; we can again ignore them for now.
Alright, this seems pretty much like what we would expect the compiler output to be. Right? Right. However, nobody actually sends out debug or unoptimized builds to customers. Right? Right. So let’s now have a look at what the compiler does when we enable the full set of optimizations.

func():
	movl	b(%rip), %eax
	movl	$0, b(%rip)
	addl	$1, %eax
	movl	%eax, a(%rip)
	ret
b:
	.zero	4
a:
	.zero	4

“Whoops! That’s so totally not what I coded.” - “…well bro, it sort of is, actually.” Let’s analyze the compiler’s output once again.
The first thing we notice is that the compiler removed everything related to the stack, which is cool because we didn’t really need those instructions anyway. When entering func we first load b into the accumulator. - We basically have the same code so far. - However, the next instruction, movl $0, b(%rip), writes 0 into b! “But… but… but wait. You didn’t add 1 yet :(“

To understand why the compiler has reordered this instruction we will need to take a look at how (main) memory interactions are actually handled by modern CPUs.
A CPU can do memory reads and writes more or less asynchronously, meaning that an instruction can trigger a load from main memory into a register and the CPU will fetch the memory content in the background. This way it can continue executing instructions until the requested data is actually needed.
When we load b into the accumulator we actually request a portion of memory to be loaded into a CPU register. The CPU will see the instruction and tell the memory controller to start fetching the memory. While the memory controller is fetching some bits and bytes from main memory, the actual processing unit can keep executing until the data is needed. If, however, the next instruction already requires the data to sit ready in the accumulator, the CPU has to wait until everything has been fetched. A waiting CPU is sad, very sad. In fact, it will start to cry. Violently.
So be nice to your CPU. - Anyway, back to the example.

After writing 0 to b the processor adds 1 to the accumulator content and then writes the accumulator to a. This reordering is safe from the compiler’s point of view because the old value of b is already sitting in the accumulator by the time the 0 is written, so the single-threaded result doesn’t change.

When working with concurrent systems it is extremely important to keep in mind that the compiler (and the CPU too, more on that later) can reorder instructions however it likes as long as the observable result stays the same from the point of view of the current thread. Other threads peeking at our memory at arbitrary points in time do not count as observers.

“Alright. Cool. Now I know, the compiler optimizes memory writes. So what?” - Assume the following pseudo-concurrent code.

int* someData = nullptr;
bool preprocessingDone = false;

void enqueueThread() {
  someData = new int(5);
  preprocessingDone = true;
}

int workerThread() {
  while(!preprocessingDone);

  return *someData;
}

And the compiler-optimized output.

workerThread():
	cmpb	$0, preprocessingDone(%rip)
	jne	.L2
.L5:
	jmp	.L5
.L2:
	movq	someData(%rip), %rax
	movl	(%rax), %eax
	ret
enqueueThread():
	subq	$8, %rsp
	movl	$4, %edi
	call	operator new(unsigned long)
	movb	$1, preprocessingDone(%rip)
	movl	$5, (%rax)
	movq	%rax, someData(%rip)
	addq	$8, %rsp
	ret
someData:
	.zero	8
preprocessingDone:
	.zero	1

Ok, let’s disassemble this assembly (Haha, ha, hahaha, get the pun? Disassemble the assembly. #lol #yolo).
The first two instructions check whether preprocessingDone has already been set. If it has, the CPU jumps straight to .L2, dereferences someData and returns the dereferenced value according to the standard calling convention. If it hasn’t, the CPU drops into the busy-loop at .L5. (Notice, by the way, that the optimized loop never rereads the flag, so .L5 would spin forever; for the scenario below this doesn’t matter, because the flag will already be set when workerThread performs its single check.) That detail aside, the reading side looks rock solid so far, doesn’t it?
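
Translated back into C++, the optimized workerThread behaves roughly like the sketch below. This is just an illustration of the single check, not actual compiler output; someData and preprocessingDone are the globals from the example above.

int workerThread() {
    if (!preprocessingDone)   // the flag is read exactly once...
        for (;;);             // ...and never rechecked inside the loop
    return *someData;         // reached only if the flag was set on entry
}
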
The real problem lies in enqueueThread. Let’s ignore setting up the stack again and go straight to call operator new(unsigned long). This is where we allocate memory.

In C++ the statement a = new B(c) does multiple things. It allocates memory, initializes this memory (by executing a constructor, for example) and assigns the pointer to the allocated memory to a variable. As we’ve already learned, the compiler is free to reorder instructions as long as the observable result is the same. So all we can be certain about is that when we actually use a later on in the same thread, all three steps mentioned above have been executed. The order in which they are executed can be more or less arbitrary. (Of course, we have to allocate the memory before we can write anything into it.)
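
Spelled out as code, the three steps behind a = new B(c) could look roughly like this sketch; assignExample and the trivial definition of B are made up purely for illustration.

#include <new>  // for placement new

struct B { explicit B(int) {} };

B* a = nullptr;

void assignExample(int c) {
    // Conceptually, `a = new B(c)` breaks down into three steps:
    void* raw = operator new(sizeof(B)); // 1. allocate raw memory
    B* obj = new (raw) B(c);             // 2. construct the object in that memory
    a = obj;                             // 3. publish the pointer by assigning it
    // The compiler may shuffle step 3 relative to surrounding code as long as
    // this thread cannot tell the difference.
}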

Back to the example. Since the allocation and the subsequent writes take a while to complete, the compiler gives the CPU some breathing room by pulling up instructions that don’t depend on the allocated memory. That is why the next instruction is movb $1, preprocessingDone.
Whoops. We didn’t set someData yet. All we have is uninitialized memory and we’re already setting preprocessingDone to true. The two following instructions then do the actual memory initialization and the pointer assignment. - What does this mean?

Assume both threads are executed concurrently and enqueueThread gets some CPU time from the scheduler. The thread gets interrupted just before movl $5, (%rax). So we have allocated memory which hasn’t been initialized, yet preprocessingDone has already been set. The OS now decides to give workerThread some CPU time. The check passes because our flag has already been set by the other thread, and as soon as we dereference someData we crash, because enqueueThread hasn’t gotten to the point where it sets someData to point to the allocated memory - the pointer is still null. BOOM!! I can tell you from experience that this sort of bug is extremely hard to track down because it only happens if the instructions are executed in this exact pattern, which will not always be the case since CPU, OS and scheduler might be in completely different states every time you execute your code.
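
For the sake of completeness, here is a minimal sketch of one way the example could be repaired, assuming a C++11 compiler and using std::atomic with release/acquire ordering so that the flag actually publishes someData. It is just one possible fix; a mutex or a condition variable would also do the job.

#include <atomic>

int* someData = nullptr;
std::atomic<bool> preprocessingDone{false};

void enqueueThread() {
  someData = new int(5);
  // Release store: every write above becomes visible to a thread
  // that observes preprocessingDone == true via an acquire load.
  preprocessingDone.store(true, std::memory_order_release);
}

int workerThread() {
  // Acquire load: pairs with the release store above. Neither the
  // compiler nor the CPU may move the dereference ahead of this check,
  // and the loop rereads the flag on every iteration.
  while (!preprocessingDone.load(std::memory_order_acquire));

  return *someData;
}
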

Alright, so far we have covered a simple synchronization issue. Most programmers who have looked into multi-threading before probably already knew this. However, I still find myself looking at code which works completely fine until an updated compiler decides to reorder some writes here and some reads there, and suddenly all hell breaks loose because things fail …occasionally.

The next post in this series will be about memory access patterns and how to avoid unintentionally serializing your threads through careless memory access.

IntelliJ IDEA to go, please!

I’m not going to talk much about what IntelliJ IDEA (just IntelliJ from now on) is or what it does in this post, since this article is meant to be a guide on how to make your copy of IntelliJ portable.

Step One: Bare Installation

Note: You can skip this step if you already have a copy of IntelliJ. On Windows, simply go to your Program Files directory and copy the IntelliJ IDEA folder (usually C:\Program Files (x86)\JetBrains) to your pen drive. On UNIX, simply copy the whole IntelliJ installation directory.

The first step is - kinda obvious - to download the installer from JetBrains’ download page. Execute the installer as usual until you get to the step where you choose your installation destination. - Change this to a path on your pen drive and hit Next until the installation is done. You might want to disable automatic shortcut creation because the shortcuts won’t work unless your pen drive is actually plugged in.
Installation on UNIX platforms is quite a bit easier since all you need to do is extract the downloaded archive to your pen drive.

Portable IntelliJ IDEA - Installing

Don’t start IntelliJ yet. We have to modify some configuration files first. …well, we have to modify one configuration file.

Step Two: Modifying Where IntelliJ Stores Your Personal Data

This is basically the step where we make IntelliJ “portable”. Go to the directory where you installed IntelliJ, browse to <InstallDirectory>/bin and open the file idea.properties. Make sure you don’t use Windows’ built-in Notepad. (As a side note: a really good text editor on Windows is Notepad++.)

The head of your file should look something like this:

#---------------------------------------------------------------------
# Uncomment this option if you want to customize path to IDE config folder. Make sure you're using forward slashes.
#---------------------------------------------------------------------
# idea.config.path=${user.home}/.IntelliJIdea/config

#---------------------------------------------------------------------
# Uncomment this option if you want to customize path to IDE system folder. Make sure you're using forward slashes.
#---------------------------------------------------------------------
# idea.system.path=${user.home}/.IntelliJIdea/system

These are the settings we care about. They define the paths IntelliJ reads its files from and writes them to. We basically want to uncomment and change all of them.

A comment at the top of the file suggests that we can use ${idea.home} to specify a location relative to IntelliJ’s installation path. - Let’s do it! ;D

First, uncomment the options like this (note: I will omit unnecessary lines from now on):

idea.config.path=${user.home}/.IntelliJIdea/config
idea.system.path=${user.home}/.IntelliJIdea/system
idea.plugins.path=${idea.config.path}/plugins
idea.log.path=${idea.system.path}/log

Next, we will change the paths to where we want IntelliJ to store user-specific files (mostly configuration, plugins and logs). I want a separate var directory inside my installation directory where the IDE stores these files. - Simple!

Change the config options like this:

idea.config.path=${idea.home}/var/config
idea.system.path=${idea.home}/var/system
idea.plugins.path=${idea.config.path}/plugins
idea.log.path=${idea.system.path}/log

The only important thing is that you specify your paths relative to ${idea.home}; everything else doesn’t really matter. If we start IntelliJ now, it will probably ask whether we want to import settings from another version of IntelliJ. After importing old settings (or not) we get to the welcome screen. The only thing left to do now is to add a portable Java SDK.

Step Three: Setting up the JDK

Again, we start by downloading a JDK from its download page. Continue by installing the JDK as usual. Use the default installation target. - You can, of course, skip the download and installation if you already have a JDK installed.

Next, go and find your JDK installation in your Program Files directory. On my system it is C:\Program Files\Java\jdk1.7.0_45. Copy the JDK folder (jdk1.7.0_45 in my case) to your pen drive and open IntelliJ’s project settings panel. Go to Platform Settings > SDKs and hit the big green plus. Add a JDK and set its home path to where you copied the JDK files.

Portable IntelliJ IDEA - Setting up the JDK

Aaaand done. Happy coding on the road!