This discussion really belongs in the other forum now, but anyway...
Whatever gave you the idea that you had to store the "actual" character frequencies? What about storing approximations that give you the same tree? You can also skip storing the character itself since you are storing an array. There are several other optimizations I can think of that would get it very close to the 10 bits per character that recursively storing it does, if not below. Also, think about the true entropy in the trees we have - a complete binary tree is very constrained in its structure. If we know how many nodes it has total, or how many leafs it has total, how many more bits do we need to store to be able to reconstruct it? Since ultimately we can do some tinkering with the tree before we finalize it in compress(), what could we do to it to make it easier to store? Without changing its optimality (sp?) we could easily swap children in a single node... There are a lot of questions we need to answer before we know whether the recursive solution is best. I suspect if it isn't, it's close enough to not matter much.
I did, however, do it the same way you did, mainly because at the time the elegance of the solution rather than the compression it gets was my primary concern. |