Backbone.js + Rails

Written by Jason on March 11, 2013 Categories: Information News, Ruby on Rails, Topics, Web Design Tags: , ,

You all know that my opinion of Ruby On Rails couldn’t be any lower if it was under the basement carpet of a Pompeii1 house, so this post won’t be very suprising.

So, awhile back I was having a discussion with a  friend about how, as a general rule, about 90% of a web app should really be taking place on the browser, and this was vindicated in my opinion with several other developers coming to much the same conclusion and releasing some very interesting application frameworks for javascript Single Page Applications. Of course this conversation took place when jQuery was just really starting to take off. The problem at that time was that no one was really talking about a client side application in javascript methodology, how such a thing could or would work.

In hindsight it all seems obvious, but actually, Backbone and other such frameworks represent some really great out of the box creativity. Anyway, after my last Rails project, I more or less swore I would never work with that ass backwards platform ever again. I decided to revisit this idea of Single Page Applications and low and behold I stumble onto Backbone.js, and I decide I am going to learn to use this tool. While looking around for some books on the topics of jQuery and Backbone (I’ve more or less decided to give up the ghost and use jQuery ui elements in all future applications cause rolling your own is just too much work.) low an behold I find a link to this: http://www.backbonerails.com. I thought to myself, you have to be shitting me. Mixing Backbone.js and Ruby on Rails is like taking a sexy girl back to your apartment and making her diddle herself across the room while you try to solve a Rubik’s Cube with your toes.

Rails is the big shining Mecca of Shit2 that passes for a software platform these days, so I understand that all the new kids want to make use of cool by association until graduation is over, and everyone realizes that the flashy douche bag will end up a used car salesman, slowly killing his soul, beating his wife and kids, and crying himself to sleep in a second hand recliner while he reminisces about the glory days over a bottle of cheap scotch.

Normally I would find such an ironic situation truly sad, but must admit the idea of a pot-bellied DHH sobbing like a little girl into a dixie cup filled with Tambowie is strangely satisfying.

The whole idea of Rails, the fundamental flaw of Rails can be summed up by drawing you a nice little picture.backbonevsrails

Using Rails on the Server side of a Backbone Application is like putting on astroglide3 before attempting to service4 a meat grinder. Deploying Rails Applications is actually MORE difficult than deploying a Java Application (go figure). Any and all time and money saved during development of a Rails application will quickly be consumed in the cost of server and the hours of headaches of deployment. It’s such a headache that tools have been created to automate the deployment of Rails Applications, and those tools themselves are a Headache to use, with confusing pointless options and defaults that would only work on trivially simple applications.

Backbone and Ruby aren’t just apples and oranges, they are Apples and a Pile of Steaming Dog Poo (PSDP).

This is because MVC was a stupid stupid idea for web applications on the server. MVC in the browser is right inline with the kinds of issues it was meant to solve.  The extent of the code on your server should be: 1) can you do this, and 2) here’s a collection object of what you asked for and 3) What you asked me to save has been saved.

The webserver should become nothing more than a database layer.

1 Comment

Parsing infix to postfix in C++11

Written by Jason on March 9, 2013 Categories: c++0x/c++11, Programming Language Design, Topics Tags: , , ,

Pursuant to my interest in programming language design, I have been giving a lot of thought to mathematical expressions. Making sense of a mathematical expression written by a human is difficult from an algorithmic point of view. It’s certainly not impossible.

Humans read and write mathematical expressions in a notation called infix1  however this requires sorting operators by precedence2, which humans are very good at, but computers not so much. To be able to do it, a computer has to first parse and identify all parts of a token stream until the end, and then group those parts, and then sort them based on the operator precedence. That’s an awful lot of work. There is however another type of notation[actually there are several.] called postfix3 which is very easy for a computer to operate on because each time it encounters an operator4 it simply collapses the last two numbers and replaces the operator with the answer. In this way the token stack reduces to a single number by a series of linear actions requiring no lookahead5 or knowledge of what is to come.

I decided today to implement a mini calculator language that takes an infix expression6 and returns the value after parsing it into postfix to operate on it.

Here is the code:

#include 
#include 
#include 
#include 
#include 
#include 
template
void puts(T x) {
    std::cout << x << std::endl;
}
template 
std::string ToString(T i) {
    std::string dest;
    std::stringstream ss;
    ss << i;
    dest.assign(ss.str());
    return dest;
}

double ToNumber(std::string& s) {
    return atof(s.c_str());
}
bool IsNumber(std::string& s) {
    std::string dict = "1234567890.";
    int size = dict.size();
    for (int i = 0; i < s.size(); i++) {
        if (s[i] == '\0') {
            continue;
        }
        unsigned int found = dict.find(s[i]);
        if (found < size) {             continue;         } else {             return false;         }     }     return true; } bool IsOperator(std::string& s) {     std::string dict = "+-*/^";     int size = dict.size();     if (s.size() > 1)
        return false;
    for (int i = 0; i < dict.size(); i++) {
        if (s[i] == ' ' || s[i] == '\0') {
            continue;
        }
        unsigned int found = dict.find(s[i]);
        if (found < size) {
            return true;
            continue;
        } else {
            return false;
        }
    }
    return true;
}
template  
void PrintVector(T s) {
    for (typename T::iterator it = s.begin(); it != s.end(); ++it) {
        puts(*it);
    }
    puts("----");
}
int Priority(std::string & s) {
    if (s == "^") {
        return 3;
    } else if (s == "*" || s == "/") {
        return 2;
    }
    return 1;
}

static std::vector nums;
static std::vector tmp_ops;
int main(int argc, char* argv[]) {
    std::string expression;
    std::string token;
    while (true) {
        std::cout << "Enter infix: ";         std::getline(std::cin,expression);         if (expression == "q" || expression == "quit") {             break;         }         std::stringstream ss(expression);         while(std::getline(ss,token,' ')) {             if (IsNumber(token)) {                 nums.push_back(token);             } else if (IsOperator(token)) {                 if (!tmp_ops.empty() && (Priority(token) > Priority(tmp_ops.back()) ) ) {
                    tmp_ops.push_back(token);
                } else if (!tmp_ops.empty() && ( Priority(token) < Priority(tmp_ops.back()) ) ) {
                   while(!tmp_ops.empty()) {
                    nums.push_back(tmp_ops.back());
                    tmp_ops.pop_back();
                   }
                   tmp_ops.push_back(token);
                } else {
                    tmp_ops.push_back(token);
                }
            }  else {
                goto bye_bye;
            }
        } // end token parse
        while(!tmp_ops.empty()) {
            nums.push_back(tmp_ops.back());
            tmp_ops.pop_back();
        }
restart:
        //PrintVector(nums);
        for (std::vector::iterator it = nums.begin(); it != nums.end(); ++it) {
            if (IsOperator(*it)) {
                std::string op = *it;
                std::string lpart = *(it - 2);
                std::string rpart = *(it - 1);
                double l = ToNumber(lpart);;
                double r = ToNumber(rpart);
                double result;
                if (op == "+")
                    result = l + r;
                else if (op == "-")
                    result = l - r;
                else if (op == "*")
                    result = l * r;
                else if (op == "/")
                    result = l / r;
                else if (op == "^")
                    result = pow(l,r);
                std::string sresult;
                *it = ToString(result);
                nums.erase(it - 2,it);
                goto restart; // because we modified the vector...
            }
        }
        std::cout << "Answer: " << nums.back() << std::endl;
        nums.clear();
    } // end main loop
    bye_bye:
    std::cout << "Bye Bye" << std::endl;    
    return 0;
}

Most of this code is pretty self-explanitory, though I notice that my highlighter is messing up the template declarations, adding T=”" when it should just be T.

This program has an infinite loop that you can break out of by typing “q” or “quit”. You can type any simple mathematical expression and get it to return to you the value.

jason@berserker:~/work/examples/infix2postfix$ clang++ main.cpp -o in2po.bin --std=c++11
jason@berserker:~/work/examples/infix2postfix$ ./in2po.bin 
Enter infix: 3 + 4
Answer: 7
Enter infix: 5 + 3 * 4
Answer: 17
Enter infix: 4 + 3 ^ 3 - 1
Answer: 30
Enter infix: q
Bye Bye
jason@berserker:~/work/examples/infix2postfix$

It does this by looking at each token in the input:

while(std::getline(ss,token,' ')) {
            if (IsNumber(token)) {
                nums.push_back(token);
            } else if (IsOperator(token)) {
                if (!tmp_ops.empty() && (Priority(token) > Priority(tmp_ops.back()) ) ) {
                    tmp_ops.push_back(token);
                } else if (!tmp_ops.empty() && ( Priority(token) < Priority(tmp_ops.back()) ) ) {
                   while(!tmp_ops.empty()) {
                    nums.push_back(tmp_ops.back());
                    tmp_ops.pop_back();
                   }
                   tmp_ops.push_back(token);
                } else {
                    tmp_ops.push_back(token);
                }
            }  else {
                goto bye_bye;
            }
        } // end token parse

And testing to see if it is a number, if so it just pushes it onto the stack, in this case, we use a vector because it’s more malleable. I am sure there is a way to do it with stacks, and I tried, but it was more complicated in my opinion. Because sometimes you can get 3 numbers in a row, which meant I would need two stacks and it was just a mess.

All of the clauses lead to pushing the operator token onto the  tmp_ops stack, but the else if clause dumps the stack out to the nums stack. This only happens when we hit an operator of a lower status. The algorithm is surprisingly robust7.

jason@berserker:~/work/examples/infix2postfix$ ./in2po.bin 
Enter infix: 2 * 2 ^ 2 ^ 2
Answer: 32
Enter infix:

Exponents as a general rule have right affinity, that is they should collect from right to left, so 2 ^ 2 ^ 2 is equal to 2 ^ 4 i.e. 16. I hadn’t actually considered this when I wrote the algorithm, and tried it only a moment ago with some trepidation. Nevertheless it worked out.

After I did this, I asked myself “Wouldn’t it be cool if we could support some variables? That way I could type pi * r ^ 2.”

As it turns out, this was pretty simple to do.

#include <string>
#include <iostream>
#include <sstream>
#include <vector>
#include <unordered_map>
#include <stdlib.h>
#include <math.h>
template<class T>
void puts(T x) {
    std::cout << x << std::endl;
}
template <class T>
std::string ToString(T i) {
    std::string dest;
    std::stringstream ss;
    ss << i;
    dest.assign(ss.str());
    return dest;
}

double ToNumber(std::string& s) {
    return atof(s.c_str());
}
bool IsNumber(std::string& s) {
    std::string dict = "1234567890.";
    int size = dict.size();
    for (int i = 0; i < s.size(); i++) {
        if (s[i] == '\0') {
            continue;
        }
        unsigned int found = dict.find(s[i]);
        if (found < size) {
            continue;
        } else {
            return false;
        }
    }
    return true;
}
bool IsOperator(std::string& s) {
    std::string dict = "+-*/^";
    int size = dict.size();
    if (s.size() > 1)
        return false;
    for (int i = 0; i < dict.size(); i++) {
        if (s[i] == ' ' || s[i] == '\0') {
            continue;
        }
        unsigned int found = dict.find(s[i]);
        if (found < size) {
            return true;
            continue;
        } else {
            return false;
        }
    }
    return true;
}
template <class T> 
void PrintVector(T s) {
    for (typename T::iterator it = s.begin(); it != s.end(); ++it) {
        puts(*it);
    }
    puts("----");
}
int Priority(std::string & s) {
    if (s == "^") {
        return 3;
    } else if (s == "*" || s == "/") {
        return 2;
    }
    return 1;
}
void define(std::unordered_map<std::string,std::string> &vars,std::string& name, std::string& value) {
   auto hit = vars.find(name);
   if (hit != vars.end()) {
        hit->second = value;
   } else {
        std::pair<std::string,std::string> nv(name,value);
        vars.insert(nv);
   }
}
std::string retrieve(std::unordered_map<std::string,std::string> &vars,std::string& name) {
   auto hit = vars.find(name);
   if (hit != vars.end()) {
       return hit->second;
   }
   puts(name + " not found");
   return "";
}
static std::vector<std::string> nums;
static std::vector<std::string> tmp_ops;
static std::unordered_map<std::string,std::string> vars;
int main(int argc, char* argv[]) {
    std::string expression;
    std::string token;
    while (true) {
    begin:
        std::cout << "Enter infix: ";
        std::getline(std::cin,expression);
        if (expression == "q" || expression == "quit") {
            break;
        }
        std::stringstream ss(expression);
        while(std::getline(ss,token,' ')) {
            if (IsNumber(token)) {
                nums.push_back(token);
            } else if (IsOperator(token)) {
                if (!tmp_ops.empty() && (Priority(token) > Priority(tmp_ops.back()) ) ) {
                    tmp_ops.push_back(token);
                } else if (!tmp_ops.empty() && ( Priority(token) < Priority(tmp_ops.back()) ) ) {
                   while(!tmp_ops.empty()) {
                    nums.push_back(tmp_ops.back());
                    tmp_ops.pop_back();
                   }
                   tmp_ops.push_back(token);
                } else {
                    tmp_ops.push_back(token);
                }
            } else if (token == "set") {
                std::string value;
                std::getline(ss,token,' ');
                std::getline(ss,value,' ');
                define(vars,token,value);
                goto begin;
            } else {
                std::string val = retrieve(vars,token);
                nums.push_back(val);
            }
        } // end token parse
        while(!tmp_ops.empty()) {
            nums.push_back(tmp_ops.back());
            tmp_ops.pop_back();
        }
restart:
        //PrintVector(nums);
        for (std::vector<std::string>::iterator it = nums.begin(); it != nums.end(); ++it) {
            if (IsOperator(*it)) {
                std::string op = *it;
                std::string lpart = *(it - 2);
                std::string rpart = *(it - 1);
                double l = ToNumber(lpart);;
                double r = ToNumber(rpart);
                double result;
                if (op == "+")
                    result = l + r;
                else if (op == "-")
                    result = l - r;
                else if (op == "*")
                    result = l * r;
                else if (op == "/")
                    result = l / r;
                else if (op == "^")
                    result = pow(l,r);
                *it = ToString(result);
                nums.erase(it - 2,it);
                goto restart; // because we modified the vector...
            }
        }
        std::cout << "Answer: " << nums.back() << std::endl;
        nums.clear();
    } // end main loop
    std::cout << "Bye Bye" << std::endl;    
    return 0;
}

In this case we need two additional methods, define and retrieve which set a unordered_map with an string,string pair. Once we have that, in the token parsing, we consider any string that isn’t an operator or a number to be a variable.

That way we can write something like this:

jason@berserker:~/work/examples/infix2postfix$ clang++ with_vars.cpp -o withVars.bin --std=c++11
jason@berserker:~/work/examples/infix2postfix$ ./withVars.bin 
Enter infix: set π 3.14
Enter infix: set r 1.5
Enter infix: π * r ^ 2
Answer: 7.065
Enter infix: q
Bye Bye
jason@berserker:~/work/examples/infix2postfix$

Instead of just writing ‘pi’ we write the actual π symbol which makes it more fun.

void define(std::unordered_map<std::string,std::string> &vars,std::string& name, std::string& value) {
   auto hit = vars.find(name);
   if (hit != vars.end()) {
        hit->second = value;
   } else {
        std::pair<std::string,std::string> nv(name,value);
        vars.insert(nv);
   }
}
std::string retrieve(std::unordered_map<std::string,std::string> &vars,std::string& name) {
   auto hit = vars.find(name);
   if (hit != vars.end()) {
       return hit->second;
   }
   puts(name + " not found");
   return "";
}

As you can see, retrieve and define are very simple methods. We pass in a reference to a local copy of the map, and that is where variables are set. If they exist, then the value is updated.

At the time of token parsing, we’ve added an else if clause and rewritten the else clause.  At the point of else, instead of giving up like we used to, we just assume it’s a variable, and push that onto the nums stack.

             } else if (token == "set") {
                std::string value;
                std::getline(ss,token,' ');
                std::getline(ss,value,' ');
                define(vars,token,value);
                goto begin;
            } else {
                std::string val = retrieve(vars,token);
                nums.push_back(val);
            }

That was hardly any trouble at all. Obviously it’s not a full featured calculator language. It was just a bit of fun to see if I could do it. I kind of like the result. I may decide to add in support for parentheses at some point.

No Comments

Looking up variables based on a context.

Written by Jason on March 6, 2013 Categories: Programming Language Design, Topics

The general caveat for all these articles I am writing on Programming Language Design is: I don’t know. I am just messing around and trying to figure out how they work and how they can be implemented. The internals of most programming languages are scurry scurry1 places. When you read the source code for say the v8 engine you are struck by the apparent elegance of its design, it’s quite fun to read the source, and you can even abstractly understand what it going on. But an abstract understanding is not the same as a pragmatic understanding.

Pragmatic Understanding

To see if you have a pragmatic understanding of something, unplug your computer from the internet, then write an algorithm or program in your preferred language that implements that understanding, accepting a series of inputs and consistently producing the expected outputs. You can only use printed references that detail libraries, i.e. you can look up the details of some STL functions and the like. When you can do this, you have a pragmatic understanding.

It’s easy to describe the process of creating a programming language like this:

Well, first you scan for tokens, then you create a parse tree, then you navigate that parse tree and either a) emit byte codes or b) execute core functions.2

What's involved.

What’s involved.

Nebulous Bit

The Nebulous Bit® is not really a single thing, but a grouping  of things3. We can look at it a bit like an engine. In most programming languages you have the Byte Code Emitter(Fuel injector), The Virtual Machine(The Crank Shaft), The Object Store(Oil Pump), The Garbage Collector(Catalytic Converter). I don’t think it’s an accident that v8′s VM is called Crank Shaft.

Byte Code Emitter

Most modern programming languages (there might be a few exceptions with really simple languages like TCL or AWK, I don’t know those, MRI executes on the tree, but YARV doesn’t) are implemented as compilers. If someone says differently they are selling you a bill of goods. An interpreter, today, is little more than a really sophisticated dynamic compiler. It only compiles what you actually use in a program run, but more often than not, the “Fuel Injector” of the programming language’s “Engine” converts an AST4 into some intermediate language5.

For instance, Lua will “compile” your code to it’s intermediate language that might look something like this contrived and non-functioning example.

MOVE A B
FORLOOP A sBx
CALL A B C

I think, but do not know6, that it should be possible to even skip the AST stage entirely and go straight to emitting byte code. The codes that you emit7 are either a) for an intermediate language(regist/stack based) that is platform agnostic, i.e. like the byte code for the JVM, or code for LLVM, or b) directly into executable memory (v8 does this). If you are making a real compiler, you will probably emit something like Intel or At&t Assembly or some such thing.

You can execute on the tree, which is in and of itself a small amount of compilation, or at the very least, simplification. Creating your tree is less about organizing what the programmer said and more about organizing what the programmer meant. For instance, there is only one loop, sure languages provide you with various structures for looping that come with specific semantics but eventually those can all be implemented with a single kind of loop. It is usually the Parsers job to encapsulate the linguistic semantics and to produce a consistent and simplistic tree. Executing on the tree is considered very slow. Ruby was considered slow for this reason, MRI8 executed, apparently, directly on the tree. Later a virtual machine was added called YARV, which was faster than MRI but still slower than everything else. They keep claiming that Ruby is getting faster and I keep not noticing much of a difference. It’s the common wisdom that compiling to intermediate codes and executing that is better than direct execution on the tree, this is the common wisdom, but common wisdom is worth about as much as an asshole on your elbow.

Of course, when someone says this, most newbies have no idea what they mean. Essentially this means creating an array of byte code, here’s an example:

a = 1
print a

# Becomes:

 [
     create_variable a
     assign_value a 1
     push_value $R1 a
     call print
]

You might wonder, as I did, how exactly is this more efficient? It is because of two things, a) operations on some array or doubly linked list are very very fast. Moving forward is a simple matter of adding an offset, or incrementing a pointer. b) offsets in the array or list can be “cached” or saved, or references inside the code can be replaced with direct accesses. c) it’s easier to jump and return and keep track of where you jumped from so that you can restore the frame/stack/wtfe.

Essentially, this is mimicking, in a certain sense, how the computer itself runs commands. That is why it is called a Virtual Machine. Virtual Machines also simplify, to a degree, the implementation of a programming language because you break everything down into already well established machine codes, that have already proven capable of representing every kind of algorithm ever.

Algorithms for optimizing such simple languages are very good, very clever, and very well researched. So once you have the intermediate format, you can apply some general algorithms, or if you emit to a specific engine like LLVM it can run those for you, to optimize  the code. What kind of Virtual Machine you design will depend on what kinds of things you want to do, but only a little. All computer programs are already represented fully by the processor machine codes. So creating a subset of that is all you could do anyway. Considering the features of your language, you might only have to define a few dozen codes.

Generally you will need:

  1. SETSOMETHING
  2. GETSOMETHING
  3. CALLSOMETHING
  4. PUSHSOMETHING
  5. POPSOMETHING
  6. MOVESOMETHING
  7. LOOPSOME
  8. RETURNTOSOMEWHERE
  9. ADDSOMETHING
  10. SUBSOMETHING
  11. MULSOMETHING
  12. DIVSOMETHING
  13. POWSOMETHING
  14. COMPARESOMETHING
  15. JUMPSOMEWHERE

These will tie directly into your core implementation, what does it mean to SET a variable? To GET a variable? To CALL a function, to JUMP to an instruction and so on and so forth. That leads us to your:

Virtual Machine

A virtual machine receives some emitted codes and keeps track of them in some kind of list structure, for the sake of argument we’ll talk about it like it’s an array.

The Virtual Machine is just a Big API that glues together all of the parts inside the Nebulous Bit® of the language implementation. SET and GET will link to some kind of ObjectStore or Allocator that will provide non-terminal values for the operations of the machine. CALL may also link into core functions implemented deep in the internals of the programming language, or it will simply move the execution to some predetermined section of the list and start running codes from there until it hits an END or RESTORE, or RETURN code in which case it will go back to it’s original postion + 1 and continue.

The Virtual Machine will need to keep track of multiple restore points so that you can have a call stack and know where to return to when you are done.

Object Store, AKA the Context

Regardless of what language you are actually implementing, you will need some form of context. You could have one context, the Global Context, and leave it at that. But people will hate you and won’t use your language.

The idea of a context can be summed up like this: Where the fuck is x defined? I’ve got x, where the fuck do I put it? The answer is the Current Context. The Virtual Machine needs to keep track of the current context, as you enter a function, a new context needs to be created that is linked to the previous context.

When you enter almost any kind of block you end up creating some kind of new context. Even a for loop can or does create a new context, or scope. You’ll notice there are a lot of words to describe functionally the same things. On the higher levels they can and perhaps do have shades of meaning, or can if you want them to, but at the lowest level, you pretty much implement them the same kind of way.

You will need to create some kind of object(context, scope, receiver, object, whatever), and on that object there will be at least one bucket which has statefull data in the form of variables at least, possibly even functions. In Ruby, and probably other languages, you have something called the default receiver which is just a fancy way of saying the current context. You can specify a context by prefixing a def with the class name, like so:

class MyClass
  # here self is the receiver, i.e. MyClass
end
# Now we are back to the root context
def MyClass.someMethod
   # The above line causes the machine to swap the default
   # receiver to MyClass self is now MyClass again etc.
   # But we also entered a "Block" So now we have another
   # context which links back to MyClass
   x = 1
   # define(someMethod,x,1)
   @x = 1
   # define (MyClass,x,1)

end
x = 1
# Default receiver/context is the root, so it's equivalent to 
# Root.x = 1 or define(Root,x,1)

This may seem complicated, but essentially what we are doing is appending a new context to a context chain when we enter a block, and closing that context when we exit. You may notice that we can has[sic] two different kinds of contexts. One is closeable, or deletable in sofar as it only needs to exist while we are in the block and once we leave it, it can safely be deleted, even more it would be unsafe NOT to delete it. In the case of someMethod, that context should be deleted when we return from the method, but the context of MyClass should not be deleted.

It is my thinking that considering a context as a separate thing is really unnecessary. If you take the perspective that everything is an Object, or a Function (FunctionObject much?, much too much), then you can do something.

The description of an object suits the concept of a context perfectly: Something that encapsulates state (variables), behavior (functions) and can send and receive messages (Where the fuck is x?) to itself and other objects.

For fun today, I slapped together a little proof of concept for disposable contexts using clang++ and c++11 features. Here’s a trivial and naive implementation of a small language that allows you to define and retrieve variables and navigate into and out of contexts.

#include <string>
#include <iostream>
#include <sstream>
#include <unordered_map>
static std::string Undefined = "Undefined";

template <class T>
void ToString(T i, std::string& dest) {
    std::stringstream ss;
    ss << i;
    dest.assign(ss.str());
}
template<class T>
void puts(T x) {
    std::cout << x << std::endl;
}
struct Receiver {
    Receiver * parent;
    Receiver * child;
    ~Receiver() {
        for (auto it = vars.begin(); it != vars.end(); ++it) {
            puts(it->second->value);
            delete it->second;
        }
    };
    std::string value;
    std::unordered_map<std::string,Receiver*> vars;
};

void define(Receiver* r,std::string& name, std::string& value) {
   auto hit = r->vars.find(name);
   if (hit != r->vars.end()) {
        hit->second->value = value;
   } else {
        auto n = new Receiver;
        n->value = value;
        std::pair<std::string,Receiver*> nr(name,n);
        r->vars.insert(nr);
   }
}
std::string& retrieve(Receiver * r, std::string& name, int level=1) {
   auto hit = r->vars.find(name);
   if (hit != r->vars.end()) {
       return hit->second->value;
   } else if (r->parent != NULL) {
       std::string lv;
       ToString(level,lv);
       lv = " !searching next level " + lv;
       puts(lv);
       return retrieve(r->parent,name,++level);
   } else {
      return Undefined;
   }
}
void next(Receiver *& default_receiver) {
    if (default_receiver->child != NULL) {
        default_receiver = default_receiver->child;
    } else {
        auto new_rec = new Receiver;
        new_rec->child = NULL;
        new_rec->parent = default_receiver;
        default_receiver->child = new_rec;
        default_receiver = new_rec;
    }
}
void previous(Receiver *& default_receiver) {
    if (default_receiver->parent != NULL) {
        auto c = default_receiver;
        default_receiver = default_receiver->parent;
        default_receiver->child = NULL;
        delete c;
    } else {
        puts("  !end of chain");
    }
}
void freechildren(Receiver *& r) {
    Receiver * c;
    while (r->child != NULL) {
        //puts("Freeing receiver");
        c = r->child;
        r->child = c->child;
        delete c;
    }
}

int main(int argc, char* argv[]) {
    std::string name;
    std::string value;
    Receiver* DejahThoris = new Receiver;
    DejahThoris->parent = NULL;
    DejahThoris->child = NULL;
    DejahThoris->value = "DejahThoris";
    auto default_receiver = DejahThoris;
    int state = 0;
    while (std::cin >> name) {
        if (name == "Q" || name == "q") {
            goto cleanup;
        } else if (name == "def") {
            state = 1;
            continue;
        } else if (name == "retr") {
            state = 2;
            continue;
        } else if (name == "next") {
            next(default_receiver);           
        } else if (name == "previous") {
           previous(default_receiver); 
        }
        switch(state) {
            case 1:
                std::cin >> value;
                define(default_receiver,name,value);
                state = 0;
                break;
            case 2:
                puts(retrieve(default_receiver,name));
                state = 0;
            default:
                break;
        }
    }
cleanup:
    freechildren(DejahThoris);
    delete DejahThoris;
    return 0;
}

Here is an example session:

jason@berserker:~/work/examples/buckets$ clang++ main.cpp -o hash.bin --std=c++0x
jason@berserker:~/work/examples/buckets$ ./hash.bin 
retr x
Undefined
def x 1
retr x
1
next
def x 2
retr x
2
next 
retr x
 !searching next level 1
2
q
2
1
jason@berserker:~/work/examples/buckets$

This sample code illustrates some core concepts, creating new scopes and linking them to their parent scope, recursively searching up the scopes until you hit the sentinel at which point we return a default Undefined. In a real language, we would throw an exception. It also illustrates occlusion.

proglang2

When we enter into Context B, we define z = 4 and this definition should occlude the one in the previous scope. Just as we did in the example, where we called next to create a new context and then defined x again with the value of 2. Then when we retrieved it, there was no need to search the parent contexts for the value.

So let’s do a little worse case scenario, here’s a variation on that code:

#include <string>
#include <iostream>
#include <sstream>
#include <unordered_map>
static std::string Undefined = "Undefined";

template <class T>
void ToString(T i, std::string& dest) {
    std::stringstream ss;
    ss << i;
    dest.assign(ss.str());
}
template<class T>
void puts(T x) {
    std::cout << x << std::endl;
}
struct Receiver {
    Receiver * parent;
    Receiver * child;
    ~Receiver() {
        for (auto it = vars.begin(); it != vars.end(); ++it) {
            //puts(it->second->value);
            delete it->second;
        }
    };
    std::string value;
    std::unordered_map<std::string,Receiver*> vars;
};

void define(Receiver* r,std::string& name, std::string& value) {
   auto hit = r->vars.find(name);
   if (hit != r->vars.end()) {
        hit->second->value = value;
   } else {
        auto n = new Receiver;
        n->value = value;
        std::pair<std::string,Receiver*> nr(name,n);
        r->vars.insert(nr);
   }
}
std::string& retrieve(Receiver * r, std::string& name, int level=1) {
   auto hit = r->vars.find(name);
   if (hit != r->vars.end()) {
       return hit->second->value;
   } else if (r->parent != NULL) {
       std::string lv;
       ToString(level,lv);
       lv = " !searching next level " + lv;
       //puts(lv);
       return retrieve(r->parent,name,++level);
   } else {
      return Undefined;
   }
}
void next(Receiver *& default_receiver) {
    if (default_receiver->child != NULL) {
        default_receiver = default_receiver->child;
    } else {
        auto new_rec = new Receiver;
        new_rec->child = NULL;
        new_rec->parent = default_receiver;
        default_receiver->child = new_rec;
        default_receiver = new_rec;
    }
}
void previous(Receiver *& default_receiver) {
    if (default_receiver->parent != NULL) {
        auto c = default_receiver;
        default_receiver = default_receiver->parent;
        default_receiver->child = NULL;
        delete c;
    } else {
        puts("  !end of chain");
    }
}
void freechildren(Receiver *& r) {
    Receiver * c;
    while (r->child != NULL) {
        //puts("Freeing receiver");
        c = r->child;
        r->child = c->child;
        delete c;
    }
}
void definesomevars(Receiver *r,int s) {
    std::string iv;
    std::string value;
    for (int i = 0; i < 20; i++) {
       value = "x";
       ToString(s+i,iv); 
       value = value + iv;
       define(r,value,iv);
    }
}
int main(int argc, char* argv[]) {
    std::string name;
    Receiver* DejahThoris = new Receiver;
    DejahThoris->parent = NULL;
    DejahThoris->child = NULL;
    DejahThoris->value = "DejahThoris";
    auto default_receiver = DejahThoris;
    for (int i = 0; i < 20000; i++) {
       next(default_receiver);
       definesomevars(default_receiver,i); 
    }
    name = "x0";
    puts(retrieve(default_receiver,name));

    freechildren(DejahThoris);
    delete DejahThoris;
    return 0;
}

Here we create 20,000 contexts, or receivers9 each context has 20 local variables, so that’s 400,000 variables defined. Let’s give it a run and see:

jason@berserker:~/work/examples/buckets$ clang++ prefill.cpp -o pre.bin --std=c++0x
jason@berserker:~/work/examples/buckets$ ./pre.bin 
0
jason@berserker:~/work/examples/buckets$ time ./pre.bin 
0

real    0m0.926s
user    0m0.856s
sys     0m0.056s
jason@berserker:~/work/examples/buckets$

Not too shabby, it can certainly be optimized, or re-designed in all manner of ways, right up to us manually allocating the contexts and having a class keep track of all the positions. We could implement some interesting caching schemes, but there is one important point.

You have to search at least once

No matter how you slice it, you will have to search for the definition of a variable or function at least once, even if for all successive calls you cache it. At some point during your program, you will have to search SOME kind of hash map to get a pointer to the location which you can then cache.

This was a serious point of confusion for me because it seemed, from the way people discussed it, that there was something evil about the searching, they would talk about ghost classes, and inline caching and all that jazz, but what they really mean is that once you do some work, i.e. searching for the variable, you should try your damndest to never have to do that work again.

There are no real ways to optimize a linear search for a variable definition, because at some point you will have to iterate over some structure and compare some kind of key to get at some kind of value. You cannot get better than O(n). You can find clever ways to make n smaller. Like separating things off into buckets or bins and using some clever mechanism like sorting all bins as the first letter of the variable name so that you only have to search those smaller buckets etc. Which would suck if some programmer started all of his variable names with ‘a’. you could sort them by the length of the variable name? The alphabet one is best, you could even create a 27 x 27 bin that uses the first and second letter of the variable name to decide the bucket it should be placed in. Why 27? The 27th bin is for weird/non alphabet character starts, like ‘_’ and such. What if the first and second character are ‘__’?

This kind of optimization would only make sense when you have lots and lots of variables. I think in the end, optimization routines should be selectable by the user, too few languages give users the ability to help the implementation figure out what was meant and what is needed. A lot of effort goes into complex heuristics that could probably be solved by asking the user to type a couple of extra letters.

Generally, how a language is implemented, and how a language pretends to be implemented, and what features it offers have more to do with Fads10 than with practicality.

What I wonder is this: Is it possible to create a programming language that allows the user to customize via code or options like command line switches the types of features she wants? Is it possible to run in imperative mode, object mode, functional mode? Maybe even logical mode? Is it possible to be able to extend the syntax from within the language, but also to extend the semantics? What about a language that can compile user code into modules then recompile itself and link against those modules? Would that be code that evolves? I am not simply talking about self rewriting code, but code where use by a user leads to recompilation automatically to optimize the language for their uses?

Most languages are implemented in such a way that the implementation is really written in stone, though I wonder if that is really necessary because it is intrinsic to language design, or that it is intrinsic to dogmatic theories of language implementation and optimization. Some food for thought.

No Comments