After constructing reading the HuffmanCharFrequencies, and constructing the huffman tree itself, it is recommended to go traverse the tree once, and store all the information from the tree in a simpler data structure.
I chose to store the values from the Huffman tree in a Vector.. my vector has 256 elements, one for each possible ASCII character. (Each element is initialized to null before the function which traverses the tree begins.)
I then call a function which begins at the root node of the Huffman tree, and traverses through it, adding the appropriate Bit value for each existing character/leaf in the tree.
For example:
(root) / \ a (branch) / \ c b
^ in the above tree, the code for 'a' is '0', 'c' is '10', and 'b' is '11'
as my traverse function is called, i have debugging statements throughout it
for the simple tree above, the following output occurs during runtime:
a traverse has occurred: added "0" at element 97 (representing 'a') to the code vector a traverse has occurred: added "10" at element 99 (representing 'c') to the code vector added "11" at element 98 (representing 'b') to the code vector
^^ this output occurs *as the traversal function is running* then, obviously, the traversal function ends, and the next part of my program is going to use the info in this Vector in order to compress (or decompress) the tree.
however, when I print out the vector *immediately after the function ends*, it contains the following: (Note: elements which are null are not printed out, only elements containing bitcodes)
The output is: " Element 97 (representing 'a') contains: 0 Element 98 (representing 'b') contains: 11 Element 100 (representing 'd') contains 10 "
^ notice, "10" was *not* placed in element 99 (c), even though the in traversal function, the code vector has "10" in its 99th element...
it seems to me, that between actually existing in the recurisions of the traverse method, and then being used at the next stage of the program, the data in this vector is completely altered in some way.
For larger test cases (3+ different characters), more characters are off. (If I have, say, abcde in my test file, the final output will say that my code vector has bit codes in elements 97, 98, then much higher numbers, obviously not the ones representing 'c', 'd', or 'e')
Does anyone know what is going on? I don't understand why this problem with the vector seeming to 'mutate' is going on..
Thanks in advance for any info, -Adam |