besides, an approximation method would require some way to scale down the numbers to 10 bits. Subtraction would not be acceptable as this would change the relationships between the sums of the minimum numbers, thus changing the tree:
Some numbers: 10 + 12 > 20 Subtract 8: (10 - 8) + (12 - 8) > (20 - 8) Now look: 2 + 3 > 12
Well we know 5 is not greater than 12. So you have to use division. But then you run into the same risk.
Take, for example, a strange file with the following characters and frequencies:
'a' -> 2^14 'b' -> 2^13 'c' -> 1 'd' -> 2 'e' -> 2^14 'f' -> 5
I don't feel like drawing trees. But approximating those numbers in 10 bits each definitely alters the tree. Well, should you approximate the smaller numbers, or the greater numbers? 'a', 'b', and 'e' require 14 or 15 bits to store. You could keep the least significant 10 bits of the numbers, and just set the ones larger than 2^10-1 to 2^10-1... so:
'a' -> 2^10-1 'b' -> 2^10-1 'c' -> 1 'd' -> 2 'e' -> 2^10-1 'f' -> 5
That definitely changes the tree. You could shift everything to the right 5 bits. Then set the things that turn to 0 equal to 1. But then:
'a' -> 2^9 'b' -> 2^8 'c' -> 1 'd' -> 1 'e' -> 2^9 'f' -> 1
Also changing the tree. Therefore approximations do not work. |