From Zero to App in ? Days

This is the first of several posts concerning the research & development of a small app concept that I’ve decided to make. The need for this app arose out of a conversation with the development manager at my current place of work. The work this particular company is involved with is maintenance work for a monolithic VB6/VB.NET application (in-memory size of over 700 MB). With a high turnover rate, the current situation at that company is that no one has a full sense of the application, and everyone is essentially paralyzed by lack of information. I was suggesting that with a wiki, we could begin to build a knowledge-base that would support bringing people up-to-speed quickly.

My manager raised the very real issue of wiki markup and document interaction being a significant barrier to entry compared to putting documentation in Word documents. At this company, screenshots form a significant portion of the needed documentation for our app and the interaction with it. In a word document, or in an Outlook email, you can simply paste the screenshot right into the document you are working on. Of course, this is not possible with current wikis, because editing of wikis is done in plaintext. To add an image to an article, you might do something like this from the Fractal entry on wikipedia:

[[Image:Mandel zoom 00 mandelbrot set.jpg|300px|right|thumb|The [[Mandelbrot set]] is a famous example of a ”’fractal”’.]]

When you add this image link, you can upload an image from a form accessible by clicking the link. The problem with this is it takes several steps to put a screenshot in.

For Word:

  1. Browse in the document to where you want the image
  2. Get the screenshot (Alt+PrintScreen or just PrintScreen)
  3. Bring the document into focus (click on it from the start bar, or Alt+Tab)
  4. Paste the image

For the Wiki:

  1. Make the image stub on the wiki
  2. Get the screenshot
  3. Open a picture editor capable of saving images in web format (png, jpg, gif)
  4. Paste the image
  5. Save the image to a temporary location
  6. Click the image stub to access the upload form
  7. Click the upload button to bring up the filesystem browse
  8. Upload the image

In addition to taking longer, the wiki version requires you to do two namings of middle products: one on the wiki when you name the image stub, and one on the filesystem when you save the screenshot to a temporary location. The wiki naming is good, because it supports versioning and having multiple wiki pages point to the same image, but the local naming is entirely un-necessary.

It’s easy to see that this takes significantly longer. My goal with this small app is to make an app that sits in the system tray and allows ‘pasting’ to the wiki, so that putting an image in the wiki is as simple as the following workflow:

  1. Get the screenshot
  2. Click on the system tray icon for the app, bringing up a window
  3. Paste, which brings up the image, handles putting it into web format, resampling to web-reasonable sizes, and allows cropping
  4. Either:
    • Paste New, creating a name for the image on the wiki
    • Paste Into, selecting the name from the available names on the wiki

Mingw Ocaml on Vista

So, earlier this past summer two friends and I used the ICFP contest as an opportunity to try to learn ocaml. For ease of installation, we just used ocaml under linux, but now I’m trying to set up a working environment on my new desktop, running Vista (oh no!)…

First off was installing MinGW/MSYS. This install worked pretty well. Next was downloading the Ocaml distribution for MinGW and installing it.

Then I tried to install findlib and extlib, two libraries/utilities we found valuable during the week of the competition, and consequently two things our makefile depends on…

This was where the problems started. First, findlib’s configure script wouldn’t work because ‘dirname’ kept on erroring out. The problem turned out to be a space in my install location (C:/Languages/Objective Caml). So, I reinstalled it to a more sane place (C:/MinGW/lang/ocaml). After fixing this, the configure script got farther, but died with the following error:

Detecting compiler arguments: FAILED

Upon some googling and inspection, I found that more info on the error got dumped into an ‘ocargs.log’ file in the same directory as the configure script. Inspecting this yielded the following error:

gcc: installation problem, cannot exec `cc1': 
No such file or directory

This, as it turns out, is a MinGW installation error on Vista. I found the fix courtesy of this message. I added the following line at the end of my /etc/profile:

export PATH=/mingw/libexec/gcc/mingw32/3.4.5:$PATH

This time, the configure script worked, and findlib was installed… On to extlib! A simple make, make opt, then make install seemed to work reasonably well.

Now, our makefile worked most of the way, creating the non-optimized version just fine, but still with an error on the optimized version:

ocamlc -o dna2rna -I `ocamlfind query extlib`  -g  extLib.cma
bigarray.cma unix.cma util.cmo genetic.cmo baseString.cmo 
matchreplace.cmo dnaLexer.cmo main.cmo dna2rnaprog.ml
ocamlopt -o dna2rna-opt -I `ocamlfind query extlib`  -p  
extLib.cmxa bigarray.cmxa unix.cmxa util.cmx genetic.cmx 
baseString.cmx matchreplace.cmx dnaLexer.cmx main.cmx 
dna2rnaprog.ml
Cannot find file std_exit.p.cmx

So, where was the problem? This message would seem to indicate that it was in trying to compile with profiling info on. So, for now I drop the -p, later, try to find a way to profile under mingw.

Dropping the -p got a little further. Now the error message is this:

ocamlopt -o dna2rna-opt -I `ocamlfind query extlib`    
extLib.cmxa bigarray.cmxa unix.cmxa util.cmx genetic.cmx 
baseString.cmx matchreplace.cmx dnaLexer.cmx main.cmx 
dna2rnaprog.ml
gcc: @C:/Users/user/AppData/Local/Temp\camlrespfefe77: 
Invalid argument
Error during linking
make: *** [dna2rnap] Error 2

A linker error on the optimized version… first, let’s see whether any optimized program will compile by making a test program. Here’s the code, taken from the Ocaml Tutorial example grtest1:

(* File: grtest1.ml *)

open Graphics;;

open_graph " 640x480";;
for i = 12 downto 1 do
  let radius = i * 20 in
  set_color (if (i mod 2) = 0 then red else yellow);
  fill_circle 320 240 radius
done;;
read_line ();;

Compiling this with ocamlc and ocamlopt yielded results that were non-deterministic. The code above should make a graphic of concentric yellow and red rings. Example results are shown below:

Example A Example B Example C Example D

As is easily seen, there’s clearly nondeterminacy going on here… For those who might have a clue why this is happening, I’m running 32-bit Vista on a Intel Core 2 Duo @ 2.66 GHz.

Well, I’m out for the time being. Still not sure why I’m encountering the linker error when building the optimized version…

Recursive generators, multisets, and Randall Munroe

Generators are an interesting language feature, somewhat related to the concept of lazy evaluation. Essentially, they are a way to lazily generate a sequence of values using a persistent local function state. Generator functions ‘pause’ by yielding a value, and ‘resume’ at the next line when another value is needed.

The classic example of a generator in Python is the following:

def fib():
    a, b = 0, 1
    while 1:
        yield b
        a, b = b, a+b

This fibonacci sequence generator is an example of an infinite lazy sequence. If we used this in a loop, the iterator would never terminate, so we would have to break out of the loop. To make this infinite sequence terminate, let’s wrap it in another iterator:

def fibs_up_to(n):
    for f in fib():
        if f > n:
            break
        else:
            yield f

When the infinite fib() generator yields a value greater than our threshold, the fibs_up_to() generator breaks out of the loop, and reaches the end of the function. When evaluation reaches the function bottom, the generator terminates itself and any enclosing ‘for’ loop.

In the following snip from the python shell, we see that the next fibonacci number (610) caused our wrapper iteration to break out of its loop and terminate the outer print loop.

>>> for f in fibs_up_to(500):
...     print '%d ' % (f,),
...
1  1  2  3  5  8  13  21  34  55  89  144  233  377

Now, with the background info out of the way, let’s dive into the fun part: recursive generators! The first time I noticed this pattern was while working on a fun problem Randall Munroe posted on his xkcd webcomic:

image of comic

One (inefficient) way of attacking this problem is to generate all sequences of orders, and test each to see whether it totals the target price. As it turns out, generating all sequences of a set is surprisingly easy using a recursive generator:

def all_seqs(size, alphabet):
    if size == 1:
        for c in alphabet:
            yield [c]
    else:
        for c in alphabet:
            for s in all_seqs(size - 1, alphabet):
                yield [c]+s

>>> for s in all_seqs(3, ‘ABC’):
…     print s
…
['A', 'A', 'A']
['A', 'A', 'B']
['A', 'A', 'C']
['A', 'B', 'A']
['A', 'B', 'B']
['A', 'B', 'C']
['A', 'C', 'A']
['A', 'C', 'B']
['A', 'C', 'C']
['B', 'A', 'A']

The generator starts with the terminal case: if the size of the sequence is one, simply yield each option element, one at a time. The recursive case is only slightly more complicated: prepend each option element to all of the sequences one smaller than the current size. Recursively calling the generator allows this to be represented very simply, without needing to waste memory on creating a complete list of all the one-smaller sequences.

While interesting, this approach has a problem when applied to the xkcd problem: it generates duplicates…

[('Fruit', 215), ('Wings', 355), 
    ('Wings', 355), ('Sampler', 580)]
[('Fruit', 215), ('Wings', 355), 
    ('Sampler', 580), ('Wings', 355)]
[('Fruit', 215), ('Sampler', 580), 
    ('Wings', 355), ('Wings', 355)]
…

These repeated results (12 in all) are all the permutations of the multiset with 1 Mixed Fruit, 2 Hot Wings, and 1 Sampler. So, what we’re really interested in is a generator that will yield successive multisets, thus making sure we don’t waste time evaluating multiple equivalent orders.

Like most abstract data structures, a multiset has several different concrete representations. To investigate this, we will look at how to represent several multisets taken from the set {A, B, C} in three different forms:

Element List Item Counts Count List
[A,A,A,C] {A:3, C:1} [3,0,1]
[A,B,B,C] {A:1, B:2, C:1} [1,2,1]
[B,B,B,C,C,C] {B:3, C:3} [0,3,3]

The item counts and count list both represent sets with repeated elements, related in the same way that sparse and dense matrices are. For sparser multisets, the item counts are better, but for denser sets, the count list saves a little space by not carrying a reference to the element around with each count.

First, we’ll take a look at how to generate the simple element list with a recursive generator. Here’s the code:

def all_multisets(cardinality, options):
    def all_multi_rec(card, ind):
        if card==0:
            yield []
        else:
            for i in xrange(ind, len(options)):
                for m in all_multi_rec(card-1,i):
                    m.append(options[i])
                    yield m
    for m in all_multi_rec(cardinality, 0):
        yield m

This is slightly more complicated than generating sequences, but still fairly simple. The recursive generator is the inner function ‘all_multi_rec’. The parameters to the outer function are the desired multiset cardinality (number of items in the multiset), and the options from which to take values. The recursion hinges on maintaining an index which is used to limit the already-used items from being re-included. in the recursive path, the ‘else’ in the above code, only items at the current index and beyond are added.

How close are we to being able to write a simple solution now? This close:

menu = [('Fruit', 215), ('Fries', 275),
        ('Salad', 335), ('Wings', 355),
        ('Sticks', 420), ('Sampler', 580)]

for c in xrange(1,8):
    for m in all_multisets(c, menu):
        if sum(x[1] for x in m) == 1505:
            print m

…
…
[('Sampler', 580), ('Wings', 355), 
    ('Wings', 355), ('Fruit', 215)]
[('Fruit', 215), ('Fruit', 215), 
    ('Fruit', 215), ('Fruit', 215), 
    ('Fruit', 215), ('Fruit', 215), 
    ('Fruit', 215)]

Admittedly, hard-coding the cardinality limits is clunky… maybe more on that later. For something more to chew on, here are versions for the other two representations of multisets:

def all_multiset_counts(cardinality, length):
    def all_multi_rec(card, ind):
        if card == 0:
            yield [0,]*length
        else:
            for i in range(ind, length):
                for m in all_multi_rec(card-1, i):
                    m[i] += 1
                    yield m
    for m in all_multi_rec(cardinality, 0):
        yield m

def all_multisets_dict(cardinality, options):
    def all_multi_rec(card, ind):
        if card==0:
            yield {}
        else:
            for i in xrange(ind, len(options)):
                for m in all_multi_rec(card-1, i):
                    item = options[i]
                    if m.has_key(item):
                        m[item] += 1
                    else:
                        m[item] = 1
                    yield m
    for m in all_multi_rec(cardinality, 0):
        yield m

Thoughts anyone? Hope you enjoyed,

-DH