On to huffman...

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • KWSW
    New Member
    • May 2007
    • 72

    On to huffman...

    Thanks to the help over at http://www.thescripts.com/forum/thread686347.html on the character count program, I can now move onto getting the huffman code done. Unfortunately I am still rather new to Java and would again need help on getting started.

    Not gonna ask for the whole code here cos its against the rules and its not gonna help me learn the language. I know that if I use C++, I can use the heap container which is present in the STL but for java is there something like that?

    Went to try to google "java heap container" but all I got was some out of memory error...

    Will still carry on searching and I hope someone will point me in the right direction.

    Thanks In Advance :)
  • JosAH
    Recognized Expert MVP
    • Mar 2007
    • 11453

    #2
    Originally posted by KWSW
    Thanks to the help over at http://www.thescripts.com/forum/thread686347.html on the character count program, I can now move onto getting the huffman code done. Unfortunately I am still rather new to Java and would again need help on getting started.

    Not gonna ask for the whole code here cos its against the rules and its not gonna help me learn the language. I know that if I use C++, I can use the heap container which is present in the STL but for java is there something like that?

    Went to try to google "java heap container" but all I got was some out of memory error...

    Will still carry on searching and I hope someone will point me in the right direction.

    Thanks In Advance :)
    If you think of a Huffman tree there are two types of nodes: simple character
    nodes; they are the leaves of the tree, and 'internal' nodes which are the combined
    nodes of the Huffman tree.

    Both nodes can return their frequency: character (leaf) nodes simply return the
    frequence of the character they represent and the combined nodes return the
    sum of the frequencies of their child nodes.

    A node X is smaller than a node Y if X's freqency is smaller than Y's frequency.
    This makes those nodes ideal to store them in a SortedSet (TreeSet).
    The sorted set is the equivalent of your heap structure.

    Building a Huffman tree out of such a SortedSet is breeze: simply combine
    the two nodes with the lowest frequency, remove them from the set and add the
    newly created combined node to the set again. Keep going until there's only
    one node left in the set. That'll be the root of the Huffman tree.

    An abstract node class and two extending classes (character node and a
    combined node) are all that is needed.

    kind regards,

    Jos

    Comment

    • KWSW
      New Member
      • May 2007
      • 72

      #3
      Originally posted by JosAH
      If you think of a Huffman tree there are two types of nodes: simple character
      nodes; they are the leaves of the tree, and 'internal' nodes which are the combined
      nodes of the Huffman tree.

      Both nodes can return their frequency: character (leaf) nodes simply return the
      frequence of the character they represent and the combined nodes return the
      sum of the frequencies of their child nodes.

      A node X is smaller than a node Y if X's freqency is smaller than Y's frequency.
      This makes those nodes ideal to store them in a SortedSet (TreeSet).
      The sorted set is the equivalent of your heap structure.

      Building a Huffman tree out of such a SortedSet is breeze: simply combine
      the two nodes with the lowest frequency, remove them from the set and add the
      newly created combined node to the set again. Keep going until there's only
      one node left in the set. That'll be the root of the Huffman tree.

      An abstract node class and two extending classes (character node and a
      combined node) are all that is needed.

      kind regards,

      Jos
      Thanks for the explaination... will go and try it out... think I have a rough idea how now...

      Many thanks again :)

      Comment

      • KWSW
        New Member
        • May 2007
        • 72

        #4
        question about compareTo()

        so in this case, I would then need to use that to compare two nodes for sorting for the sortedSet.

        would it look something like this?

        Code:
        public int commpareTo(Object obj)
            {        
                node aNode = (node)obj;
                return this.getFeq() - aNode.getFeq();
            }
        sorry still in the planning stage thats why I dun have the full code yet...

        Comment

        • JosAH
          Recognized Expert MVP
          • Mar 2007
          • 11453

          #5
          Originally posted by KWSW
          question about compareTo()

          so in this case, I would then need to use that to compare two nodes for sorting for the sortedSet.

          would it look something like this?

          Code:
          public int commpareTo(Object obj)
              {        
                  node aNode = (node)obj;
                  return this.getFeq() - aNode.getFeq();
              }
          sorry still in the planning stage thats why I dun have the full code yet...
          No need to apologize; that compareTo method is about right but beware that
          you never want the return value of that method to be zero. You can get around
          that by comparing the huffman code for the nodes built so far when the
          frequencies compare equal. You'll bump into that later; keep my remark in mind ;-)

          kind regards,

          Jos

          Comment

          • KWSW
            New Member
            • May 2007
            • 72

            #6
            Originally posted by JosAH
            No need to apologize; that compareTo method is about right but beware that
            you never want the return value of that method to be zero. You can get around
            that by comparing the huffman code for the nodes built so far when the
            frequencies compare equal. You'll bump into that later; keep my remark in mind ;-)

            kind regards,

            Jos

            yup just ran into that problem. WIll try out the comparing of the huffman code length.

            Hmm... any advise on how I can set the code of each node?

            Comment

            • JosAH
              Recognized Expert MVP
              • Mar 2007
              • 11453

              #7
              Originally posted by KWSW
              yup just ran into that problem. WIll try out the comparing of the huffman code length.

              Hmm... any advise on how I can set the code of each node?
              Comparing on the code length won't help: two nodes can have the same frequency
              and the same code length; you have to compare on the code itself or some
              other artificial criteria; as long as no two nodes compare equal in a consistent way.

              The actual comparison doesn't matter much; when the nodes are taken out
              of the set to be combined, a new node is entered to the set afterwards and
              the two individual nodes never will occur in that set anymore by themselves.

              When you combine two nodes, prepend a 0 to the code of the left node and
              prepend a 1 to the code of the right node. You can append too, it depends on
              how you build up those codes and in what order you want to read the bits.

              In both cases you have to keep track of the length of the code (which equals
              the depth of the the node in the tree).

              kind regards,

              Jos

              Comment

              • KWSW
                New Member
                • May 2007
                • 72

                #8
                hmm I keep getting missing nodes... is it something to do with the compareTo() not being able to tell which is bigger?

                Code:
                public int compareTo(Object obj)
                    {        
                        Node aNode = (Node)obj;        
                        
                        if(this.getFeq() == aNode.getFeq())
                        {
                            return this.codeLen - aNode.codeLen;
                        }
                        else
                        {
                            return this.feq - aNode.feq;
                        }
                        
                    }
                my huffman algo
                Code:
                while (t.size() > 1)
                        {
                            Node left = (Node)t.first();
                            t.remove(left);
                            Node right = (Node)t.first();
                            t.remove(right);
                            
                            t.add(new Branch(left,right));
                            
                            count++;
                        }

                Comment

                • JosAH
                  Recognized Expert MVP
                  • Mar 2007
                  • 11453

                  #9
                  Originally posted by KWSW
                  hmm I keep getting missing nodes... is it something to do with the compareTo() not being able to tell which is bigger?

                  Code:
                      return this.codeLen - aNode.codeLen;
                  As I wrote above: don't compare the code lenghts; compare the actual codes
                  themselves. If the frequencies equal it is very well possible that the code lengths
                  compare equal as well; don't compare those lengths.

                  kind regards,

                  Jos

                  Comment

                  • KWSW
                    New Member
                    • May 2007
                    • 72

                    #10
                    Ok I have run out of ideas..
                    Here is my code

                    Code:
                    import java.util.*;
                    import java.io.*;
                    
                    abstract class Node implements Comparable
                    {        
                        // For CompareTo() to use
                        protected int x = 0;
                        
                    
                        // The code to represent the symbol
                        protected String code = "";
                        
                        // The fequency of the symbol that this node represents
                        protected int feq;    
                        
                        // Return fequency
                        public int getFeq()
                        {
                            return feq;
                        }
                        
                        public void increaseFeq()
                        {
                            feq++;
                        }                  
                        
                        // Constructor that takes in the fequency
                        public Node(int feq, int x)
                        {
                            this.feq = feq;
                            this.x = x;
                        }
                        
                        abstract protected void addCode(int b);
                        
                        
                        public int compareTo(Object obj)
                        {        
                            Node aNode = (Node)obj;                
                            
                            if(this.getFeq() == aNode.getFeq())
                            {            
                                return (this.x+this.feq+this.code.length()) - (aNode.x+this.feq+this.code.length());
                            }
                            else
                            {
                                return this.feq - aNode.feq;
                            }
                            
                        }
                            
                    }
                    
                    class Leaf extends Node
                    {
                        private int ascii;
                        
                        public void addCode(int i)
                        {
                           code += i;       
                        }
                        
                        public Leaf(int ascii, int feq)
                        {
                            super(feq,ascii);
                            this.ascii = ascii;        
                        }        
                        
                        public String toString() 
                        { 
                            return "\n["+ascii+":"+code+","+feq+","+code.length()+"]"; 
                        }
                    }
                    
                    class Branch extends Node
                    {
                        private Node left;
                        private Node right;
                    	
                    	public Branch(Node left, Node right) 
                            {
                    		super(left.getFeq()+right.getFeq(),left.getFeq()+right.getFeq());
                                    this.left = left;
                                    this.right = right;
                                    		
                                    left.addCode(0);		
                                    right.addCode(1);
                    	}
                    	
                    	protected void addCode(int i) 
                            { 
                                left.addCode(i); 
                                right.addCode(i); 
                            }
                     
                    	public String toString()
                            {
                                return "Total: "+ (left.getFeq()+right.getFeq())+ left + right;
                            }
                    }
                    
                    class Tree
                    {
                        Map<Integer,Integer> m = new HashMap<Integer,Integer>();
                        
                        public void add(int i)
                        {
                            int count;
                            
                            if(m.containsKey(i))
                            {
                                count = m.get(i);
                                count++;
                                m.put(i,count);
                            }
                            else
                            {
                                m.put(i,1);
                            }
                        }
                        
                        public void create()
                        {
                            ArrayList al = new ArrayList();                
                            
                            for(int i : m.keySet()) 
                            {            
                                Leaf l = new Leaf(i,m.get(i));
                                al.add(l);            
                            }                                                
                            
                            TreeSet t = new TreeSet();        
                            
                            for(int i = 0; i < al.size(); i++)
                            {
                                t.add(al.get(i));            
                            }
                            
                            do
                            {
                                Node left = (Node)t.first(); 
                                t.remove(left);
                                
                                Node right = (Node)t.first(); 
                                t.remove(right);
                    
                                t.add(new Branch(left, right));
                                
                                System.out.println(t.size());
                                
                            }while(t.size() > 1);
                            
                            System.out.println(t.first());
                            
                        }
                        
                    }
                    I tried adding everything I can think off to the compareTo()...

                    my input file just has "123456789"
                    but my output is just 3 nodes.... please help... been at it for about 4 hours...
                    TIA

                    Comment

                    • JosAH
                      Recognized Expert MVP
                      • Mar 2007
                      • 11453

                      #11
                      Your class hierarchy is (almost?) identical to mine, so I guess that's quite alright
                      (he said arrogantly ;-) but your compareTo() method still is incorrect, i.e. you're
                      still trying to compare something using the *length* of the code. That's not correct.
                      Here's my version (ripped straight from the code):

                      [code=java]
                      public int compareTo(Objec t obj) {
                      HuffmanNode that= (HuffmanNode)ob j;
                      if (this.freq == that.freq)
                      return this.code-that.code;

                      return this.freq-that.freq;
                      }
                      [/code]

                      the 'code' member represents the actual Huffman code constructed so far.
                      The 'freq' member is (as you can guess) the frequency count of the node.

                      kind regards,

                      Jos

                      Comment

                      • KWSW
                        New Member
                        • May 2007
                        • 72

                        #12
                        Originally posted by JosAH
                        Your class hierarchy is (almost?) identical to mine, so I guess that's quite alright
                        (he said arrogantly ;-) but your comparyTo() method still is incorrect, i.e. you're
                        still trying to compare something using the *length* of the code. That's not correct.
                        Here's my version (ripped straight from the code):

                        [code=java]
                        public int compareTo(Objec t obj) {
                        HuffmanNode that= (HuffmanNode)ob j;
                        if (this.freq == that.freq)
                        return this.code-that.code;

                        return this.freq-that.freq;
                        }
                        [/code]

                        the 'code' member represents the actual Huffman code constructed so far.
                        The 'freq' member is (as you can guess) the frequency count of the node.

                        kind regards,

                        Jos
                        Much appreciated... will give it a try... :)

                        Comment

                        • KWSW
                          New Member
                          • May 2007
                          • 72

                          #13
                          Originally posted by KWSW
                          Much appreciated... will give it a try... :)
                          hmm it seems that I cant do that as my code is a string type...

                          Comment

                          • JosAH
                            Recognized Expert MVP
                            • Mar 2007
                            • 11453

                            #14
                            Originally posted by KWSW
                            Much appreciated... will give it a try... :)
                            Actually it's not much of a problem: it's just that the SortedSet can't store
                            non-unique keys: it can't deal with different nodes having just equal frequencies,
                            and are thus considered equal.

                            If the frequencies of two disjunct nodes differ, everything is fine. Just in case
                            those freqencies are equal you have to define, what they call, your own PTO
                            (Partial Total Ordering). *Any* ordering will do as long as it is consistent.

                            Being the lazy bastard that I am I simply picked the 'code' (which is always
                            unique) member variable but you can also assign a unique sequence number
                            to every node you create: 0, 1, 2, 3, ... and compare those:

                            [code=java]
                            public int compareTo(Objec t obj) {
                            HuffmanNode that= (HuffmanNode)ob j;
                            if (this.freq == that.freq)
                            return this.seqno-that.seqno;
                            else
                            return this.freq-that.freq;
                            }
                            [/code]

                            This little trick (as well as my lazy little trick) makes *every* node unique so the
                            SortedSet can store all of them, i.e. there are no two nodes X and Y for which
                            X == Y holds. They're either different because of their different frequencies, and
                            if they're not, they're still different because of some other criteria such as sequence
                            numbers of because of what I did: compare the unique Huffman codes so far.

                            kind regards,

                            Jos

                            Comment

                            • JosAH
                              Recognized Expert MVP
                              • Mar 2007
                              • 11453

                              #15
                              Originally posted by KWSW
                              hmm it seems that I cant do that as my code is a string type...
                              Use your imagination:

                              [code=java]
                              ...
                              return this.code.compa reTo(that.code) ;
                              ...
                              [/code]

                              assuming that you're updating your 'code' variable on the fly while you're constructing
                              that Huffman tree.

                              kind regards,

                              Jos

                              Comment

                              Working...