Parsing infix to postfix in C++11

Pursuant to my interest in programming language design, I have been giving a lot of thought to mathematical expressions. Making sense of a mathematical expression written by a human is difficult from an algorithmic point of view. It’s certainly not impossible.

Humans read and write mathematical expressions in a notation called infix1  however this requires sorting operators by precedence2, which humans are very good at, but computers not so much. To be able to do it, a computer has to first parse and identify all parts of a token stream until the end, and then group those parts, and then sort them based on the operator precedence. That’s an awful lot of work. There is however another type of notation[actually there are several.] called postfix3 which is very easy for a computer to operate on because each time it encounters an operator4 it simply collapses the last two numbers and replaces the operator with the answer. In this way the token stack reduces to a single number by a series of linear actions requiring no lookahead5 or knowledge of what is to come.

I decided today to implement a mini calculator language that takes an infix expression6 and returns the value after parsing it into postfix to operate on it.

Here is the code:

Most of this code is pretty self-explanitory, though I notice that my highlighter is messing up the template declarations, adding T=”” when it should just be T.

This program has an infinite loop that you can break out of by typing “q” or “quit”. You can type any simple mathematical expression and get it to return to you the value.

It does this by looking at each token in the input:

And testing to see if it is a number, if so it just pushes it onto the stack, in this case, we use a vector because it’s more malleable. I am sure there is a way to do it with stacks, and I tried, but it was more complicated in my opinion. Because sometimes you can get 3 numbers in a row, which meant I would need two stacks and it was just a mess.

All of the clauses lead to pushing the operator token onto the  tmp_ops stack, but the else if clause dumps the stack out to the nums stack. This only happens when we hit an operator of a lower status. The algorithm is surprisingly robust7.

Exponents as a general rule have right affinity, that is they should collect from right to left, so 2 ^ 2 ^ 2 is equal to 2 ^ 4 i.e. 16. I hadn’t actually considered this when I wrote the algorithm, and tried it only a moment ago with some trepidation. Nevertheless it worked out.

After I did this, I asked myself “Wouldn’t it be cool if we could support some variables? That way I could type pi * r ^ 2.”

As it turns out, this was pretty simple to do.

In this case we need two additional methods, define and retrieve which set a unordered_map with an string,string pair. Once we have that, in the token parsing, we consider any string that isn’t an operator or a number to be a variable.

That way we can write something like this:

Instead of just writing ‘pi’ we write the actual π symbol which makes it more fun.

As you can see, retrieve and define are very simple methods. We pass in a reference to a local copy of the map, and that is where variables are set. If they exist, then the value is updated.

At the time of token parsing, we’ve added an else if clause and rewritten the else clause.  At the point of else, instead of giving up like we used to, we just assume it’s a variable, and push that onto the nums stack.

That was hardly any trouble at all. Obviously it’s not a full featured calculator language. It was just a bit of fun to see if I could do it. I kind of like the result. I may decide to add in support for parentheses at some point.

  1. that is operators are written between numbers, like 5 + 4 = 9
  2. as in the case 5 + 4 * 3. The multiplication needs to be done first. The same if it is written 3 * 4 + 5
  3. operators are written after the operands. 3 4 * 5 + and so on.
  4. +,-,/,*,^ etc
  5. looking ahead is when you parse forward an indefinite number of tokens until one is either found or not found in the stream that would invalidate or influence the way you operate on the current token.
  6. mine doesn’t accept parentheses, but that could easily be added in.
  7. as a lay programmer without the benefit of a CS education, I am perpetually surprised when my software actually works…

About Jason

I am a 31 year old programmer living in the south of France. I currently work actively in the fields of Ruby/PHP/Javascript and server/website administration.
This entry was posted in c++0x/c++11, Programming Language Design, Topics and tagged , , , . Bookmark the permalink.