Jolie Rouge


all ur parentheses Я belong to me


Parsing infix to postfix in C++11

Pursuant to my interest in programming language design, I have been giving a lot of thought to mathematical expressions. Making sense of a mathematical expression written by a human is difficult from an algorithmic point of view. It’s certainly not impossible.

Humans read and write mathematical expressions in a notation called infix1  however this requires sorting operators by precedence2, which humans are very good at, but computers not so much. To be able to do it, a computer has to first parse and identify all parts of a token stream until the end, and then group those parts, and then sort them based on the operator precedence. That’s an awful lot of work. There is however another type of notation[actually there are several.] called postfix3 which is very easy for a computer to operate on because each time it encounters an operator4 it simply collapses the last two numbers and replaces the operator with the answer. In this way the token stack reduces to a single number by a series of linear actions requiring no lookahead5 or knowledge of what is to come.

I decided today to implement a mini calculator language that takes an infix expression6 and returns the value after parsing it into postfix to operate on it.

Here is the code:

#include 
#include 
#include 
#include 
#include 
#include 
template
void puts(T x) {
    std::cout << x << std::endl;
}
template 
std::string ToString(T i) {
    std::string dest;
    std::stringstream ss;
    ss << i;
    dest.assign(ss.str());
    return dest;
}

double ToNumber(std::string& s) {
    return atof(s.c_str());
}
bool IsNumber(std::string& s) {
    std::string dict = "1234567890.";
    int size = dict.size();
    for (int i = 0; i < s.size(); i++) {
        if (s[i] == '\0') {
            continue;
        }
        unsigned int found = dict.find(s[i]);
        if (found < size) {             continue;         } else {             return false;         }     }     return true; } bool IsOperator(std::string& s) {     std::string dict = "+-*/^";     int size = dict.size();     if (s.size() > 1)
        return false;
    for (int i = 0; i < dict.size(); i++) {
        if (s[i] == ' ' || s[i] == '\0') {
            continue;
        }
        unsigned int found = dict.find(s[i]);
        if (found < size) {
            return true;
            continue;
        } else {
            return false;
        }
    }
    return true;
}
template  
void PrintVector(T s) {
    for (typename T::iterator it = s.begin(); it != s.end(); ++it) {
        puts(*it);
    }
    puts("----");
}
int Priority(std::string & s) {
    if (s == "^") {
        return 3;
    } else if (s == "*" || s == "/") {
        return 2;
    }
    return 1;
}

static std::vector nums;
static std::vector tmp_ops;
int main(int argc, char* argv[]) {
    std::string expression;
    std::string token;
    while (true) {
        std::cout << "Enter infix: ";         std::getline(std::cin,expression);         if (expression == "q" || expression == "quit") {             break;         }         std::stringstream ss(expression);         while(std::getline(ss,token,' ')) {             if (IsNumber(token)) {                 nums.push_back(token);             } else if (IsOperator(token)) {                 if (!tmp_ops.empty() && (Priority(token) > Priority(tmp_ops.back()) ) ) {
                    tmp_ops.push_back(token);
                } else if (!tmp_ops.empty() && ( Priority(token) < Priority(tmp_ops.back()) ) ) {
                   while(!tmp_ops.empty()) {
                    nums.push_back(tmp_ops.back());
                    tmp_ops.pop_back();
                   }
                   tmp_ops.push_back(token);
                } else {
                    tmp_ops.push_back(token);
                }
            }  else {
                goto bye_bye;
            }
        } // end token parse
        while(!tmp_ops.empty()) {
            nums.push_back(tmp_ops.back());
            tmp_ops.pop_back();
        }
restart:
        //PrintVector(nums);
        for (std::vector::iterator it = nums.begin(); it != nums.end(); ++it) {
            if (IsOperator(*it)) {
                std::string op = *it;
                std::string lpart = *(it - 2);
                std::string rpart = *(it - 1);
                double l = ToNumber(lpart);;
                double r = ToNumber(rpart);
                double result;
                if (op == "+")
                    result = l + r;
                else if (op == "-")
                    result = l - r;
                else if (op == "*")
                    result = l * r;
                else if (op == "/")
                    result = l / r;
                else if (op == "^")
                    result = pow(l,r);
                std::string sresult;
                *it = ToString(result);
                nums.erase(it - 2,it);
                goto restart; // because we modified the vector...
            }
        }
        std::cout << "Answer: " << nums.back() << std::endl;
        nums.clear();
    } // end main loop
    bye_bye:
    std::cout << "Bye Bye" << std::endl;    
    return 0;
}

Most of this code is pretty self-explanitory, though I notice that my highlighter is messing up the template declarations, adding T=”” when it should just be T.

This program has an infinite loop that you can break out of by typing “q” or “quit”. You can type any simple mathematical expression and get it to return to you the value.

jason@berserker:~/work/examples/infix2postfix$ clang++ main.cpp -o in2po.bin --std=c++11
jason@berserker:~/work/examples/infix2postfix$ ./in2po.bin 
Enter infix: 3 + 4
Answer: 7
Enter infix: 5 + 3 * 4
Answer: 17
Enter infix: 4 + 3 ^ 3 - 1
Answer: 30
Enter infix: q
Bye Bye
jason@berserker:~/work/examples/infix2postfix$

It does this by looking at each token in the input:

while(std::getline(ss,token,' ')) {
            if (IsNumber(token)) {
                nums.push_back(token);
            } else if (IsOperator(token)) {
                if (!tmp_ops.empty() && (Priority(token) > Priority(tmp_ops.back()) ) ) {
                    tmp_ops.push_back(token);
                } else if (!tmp_ops.empty() && ( Priority(token) < Priority(tmp_ops.back()) ) ) {
                   while(!tmp_ops.empty()) {
                    nums.push_back(tmp_ops.back());
                    tmp_ops.pop_back();
                   }
                   tmp_ops.push_back(token);
                } else {
                    tmp_ops.push_back(token);
                }
            }  else {
                goto bye_bye;
            }
        } // end token parse

And testing to see if it is a number, if so it just pushes it onto the stack, in this case, we use a vector because it’s more malleable. I am sure there is a way to do it with stacks, and I tried, but it was more complicated in my opinion. Because sometimes you can get 3 numbers in a row, which meant I would need two stacks and it was just a mess.

All of the clauses lead to pushing the operator token onto the  tmp_ops stack, but the else if clause dumps the stack out to the nums stack. This only happens when we hit an operator of a lower status. The algorithm is surprisingly robust7.

jason@berserker:~/work/examples/infix2postfix$ ./in2po.bin 
Enter infix: 2 * 2 ^ 2 ^ 2
Answer: 32
Enter infix:

Exponents as a general rule have right affinity, that is they should collect from right to left, so 2 ^ 2 ^ 2 is equal to 2 ^ 4 i.e. 16. I hadn’t actually considered this when I wrote the algorithm, and tried it only a moment ago with some trepidation. Nevertheless it worked out.

After I did this, I asked myself “Wouldn’t it be cool if we could support some variables? That way I could type pi * r ^ 2.”

As it turns out, this was pretty simple to do.

#include <string>
#include <iostream>
#include <sstream>
#include <vector>
#include <unordered_map>
#include <stdlib.h>
#include <math.h>
template<class T>
void puts(T x) {
    std::cout << x << std::endl;
}
template <class T>
std::string ToString(T i) {
    std::string dest;
    std::stringstream ss;
    ss << i;
    dest.assign(ss.str());
    return dest;
}

double ToNumber(std::string& s) {
    return atof(s.c_str());
}
bool IsNumber(std::string& s) {
    std::string dict = "1234567890.";
    int size = dict.size();
    for (int i = 0; i < s.size(); i++) {
        if (s[i] == '\0') {
            continue;
        }
        unsigned int found = dict.find(s[i]);
        if (found < size) {
            continue;
        } else {
            return false;
        }
    }
    return true;
}
bool IsOperator(std::string& s) {
    std::string dict = "+-*/^";
    int size = dict.size();
    if (s.size() > 1)
        return false;
    for (int i = 0; i < dict.size(); i++) {
        if (s[i] == ' ' || s[i] == '\0') {
            continue;
        }
        unsigned int found = dict.find(s[i]);
        if (found < size) {
            return true;
            continue;
        } else {
            return false;
        }
    }
    return true;
}
template <class T> 
void PrintVector(T s) {
    for (typename T::iterator it = s.begin(); it != s.end(); ++it) {
        puts(*it);
    }
    puts("----");
}
int Priority(std::string & s) {
    if (s == "^") {
        return 3;
    } else if (s == "*" || s == "/") {
        return 2;
    }
    return 1;
}
void define(std::unordered_map<std::string,std::string> &vars,std::string& name, std::string& value) {
   auto hit = vars.find(name);
   if (hit != vars.end()) {
        hit->second = value;
   } else {
        std::pair<std::string,std::string> nv(name,value);
        vars.insert(nv);
   }
}
std::string retrieve(std::unordered_map<std::string,std::string> &vars,std::string& name) {
   auto hit = vars.find(name);
   if (hit != vars.end()) {
       return hit->second;
   }
   puts(name + " not found");
   return "";
}
static std::vector<std::string> nums;
static std::vector<std::string> tmp_ops;
static std::unordered_map<std::string,std::string> vars;
int main(int argc, char* argv[]) {
    std::string expression;
    std::string token;
    while (true) {
    begin:
        std::cout << "Enter infix: ";
        std::getline(std::cin,expression);
        if (expression == "q" || expression == "quit") {
            break;
        }
        std::stringstream ss(expression);
        while(std::getline(ss,token,' ')) {
            if (IsNumber(token)) {
                nums.push_back(token);
            } else if (IsOperator(token)) {
                if (!tmp_ops.empty() && (Priority(token) > Priority(tmp_ops.back()) ) ) {
                    tmp_ops.push_back(token);
                } else if (!tmp_ops.empty() && ( Priority(token) < Priority(tmp_ops.back()) ) ) {
                   while(!tmp_ops.empty()) {
                    nums.push_back(tmp_ops.back());
                    tmp_ops.pop_back();
                   }
                   tmp_ops.push_back(token);
                } else {
                    tmp_ops.push_back(token);
                }
            } else if (token == "set") {
                std::string value;
                std::getline(ss,token,' ');
                std::getline(ss,value,' ');
                define(vars,token,value);
                goto begin;
            } else {
                std::string val = retrieve(vars,token);
                nums.push_back(val);
            }
        } // end token parse
        while(!tmp_ops.empty()) {
            nums.push_back(tmp_ops.back());
            tmp_ops.pop_back();
        }
restart:
        //PrintVector(nums);
        for (std::vector<std::string>::iterator it = nums.begin(); it != nums.end(); ++it) {
            if (IsOperator(*it)) {
                std::string op = *it;
                std::string lpart = *(it - 2);
                std::string rpart = *(it - 1);
                double l = ToNumber(lpart);;
                double r = ToNumber(rpart);
                double result;
                if (op == "+")
                    result = l + r;
                else if (op == "-")
                    result = l - r;
                else if (op == "*")
                    result = l * r;
                else if (op == "/")
                    result = l / r;
                else if (op == "^")
                    result = pow(l,r);
                *it = ToString(result);
                nums.erase(it - 2,it);
                goto restart; // because we modified the vector...
            }
        }
        std::cout << "Answer: " << nums.back() << std::endl;
        nums.clear();
    } // end main loop
    std::cout << "Bye Bye" << std::endl;    
    return 0;
}

In this case we need two additional methods, define and retrieve which set a unordered_map with an string,string pair. Once we have that, in the token parsing, we consider any string that isn’t an operator or a number to be a variable.

That way we can write something like this:

jason@berserker:~/work/examples/infix2postfix$ clang++ with_vars.cpp -o withVars.bin --std=c++11
jason@berserker:~/work/examples/infix2postfix$ ./withVars.bin 
Enter infix: set π 3.14
Enter infix: set r 1.5
Enter infix: π * r ^ 2
Answer: 7.065
Enter infix: q
Bye Bye
jason@berserker:~/work/examples/infix2postfix$

Instead of just writing ‘pi’ we write the actual π symbol which makes it more fun.

void define(std::unordered_map<std::string,std::string> &vars,std::string& name, std::string& value) {
   auto hit = vars.find(name);
   if (hit != vars.end()) {
        hit->second = value;
   } else {
        std::pair<std::string,std::string> nv(name,value);
        vars.insert(nv);
   }
}
std::string retrieve(std::unordered_map<std::string,std::string> &vars,std::string& name) {
   auto hit = vars.find(name);
   if (hit != vars.end()) {
       return hit->second;
   }
   puts(name + " not found");
   return "";
}

As you can see, retrieve and define are very simple methods. We pass in a reference to a local copy of the map, and that is where variables are set. If they exist, then the value is updated.

At the time of token parsing, we’ve added an else if clause and rewritten the else clause.  At the point of else, instead of giving up like we used to, we just assume it’s a variable, and push that onto the nums stack.

             } else if (token == "set") {
                std::string value;
                std::getline(ss,token,' ');
                std::getline(ss,value,' ');
                define(vars,token,value);
                goto begin;
            } else {
                std::string val = retrieve(vars,token);
                nums.push_back(val);
            }

That was hardly any trouble at all. Obviously it’s not a full featured calculator language. It was just a bit of fun to see if I could do it. I kind of like the result. I may decide to add in support for parentheses at some point.

  1. that is operators are written between numbers, like 5 + 4 = 9
  2. as in the case 5 + 4 * 3. The multiplication needs to be done first. The same if it is written 3 * 4 + 5
  3. operators are written after the operands. 3 4 * 5 + and so on.
  4. +,-,/,*,^ etc
  5. looking ahead is when you parse forward an indefinite number of tokens until one is either found or not found in the stream that would invalidate or influence the way you operate on the current token.
  6. mine doesn’t accept parentheses, but that could easily be added in.
  7. as a lay programmer without the benefit of a CS education, I am perpetually surprised when my software actually works…