hash increases after exists function

  • Karel03
    New Member
    • Oct 2008
    • 2

    hash increases after exists function

    Hello All,

    I've run into a problem I am not able to solve myself because I don't know exactly what Perl does when I use the exists function.

    My script does the following:
    First I use a database to build up a hash; this hash then has around 800000 values divided over 2300 keys.
    Then I read a file and check whether each component exists in that hash. If the value exists, the script simply adds 1 to the number of times the value was found. A value can exist in combination with multiple keys.
    When I count the number of values before and after the counting step, the computer comes up with different numbers, which should, in my view, be impossible.

    A piece of my code looks like this:
    Code:
    $chrompos = $chromosome."_".$position;
    $test_loc = $position+$lengthreads-1;
    $test_pos = $chromosome."_".$test_loc;
    foreach $exon_id (keys %exon_hash){
        if (exists $exon_hash{$exon_id}{$chrompos}){
            if (exists $exon_hash{$exon_id}{$test_pos}){
                for ($i=0;$i<$lengthreads;$i++){
                    $next_position = $position+$i;
                    $x = $chromosome."_".$next_position;
                    $exon_hash{$exon_id}{$x}{amount}++;
                }
            }
        }
    }
    When I print out a chromosomal location ($chrompos), the value has changed after the second exists call. This happens only in very rare cases, but when it does, values get added to my hash.

    Does anyone know what goes wrong here and how to solve it?
    Thanks in advance.

    Regards
    Karel
    Last edited by numberwhun; Oct 20 '08, 12:48 PM. Reason: Please use code tags
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    Maybe you want to start this loop at 1 instead of 0:

    Code:
    for ($i=0;$i<$lengthreads;$i++){
    When $i is 0, $next_position equals $position, so $x is the same key as $chrompos, and you then increment the value of:

    Code:
    $exon_hash{$exon_id}{$x}{amount}++;

    So it looks possible that you count the above key twice for the same value. I could be totally wrong, but try using 1 as the initial value instead of 0:

    Code:
    for ($i=1;$i<$lengthreads;$i++){

    • Karel03
      New Member
      • Oct 2008
      • 2

      #3
      I tried what you suggested, but it didn't solve the problem: extra entries still suddenly get created in my hash.

      Any other suggestions?

      Regards
      Karel

      • numberwhun
        Recognized Expert Moderator Specialist
        • May 2007
        • 3467

        #4
        Can you show what is in your hash and what was expected? Also, can you show your data source?

        Regards,

        Jeff

        • KevinADC
          Recognized Expert Specialist
          • Jan 2007
          • 4092

          #5
          OK, let's look at these two lines:

          Code:
          if (exists $exon_hash{$exon_id}{$chrompos}){
                   if (exists $exon_hash{$exon_id}{$test_pos}){
          When you check for the existence of the key $chrompos in the first line, if the key $exon_id did not already exist it will spring into "life". The same happens in the next line: if the $test_pos key did not exist, $exon_id will spring into life if it did not already exist. This is called autovivification. The only key that does not get autovivified is the deepest key ($chrompos and $test_pos in this case).

          I don't know if that is the problem but I can't tell what the problem might be just by looking at the code you posted besides the two suggestions I have now given you.
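
The autovivification behaviour described in this post can be reproduced with a small, self-contained sketch. The keys exonA, exonB, exonC and chr1_100 are made up for illustration and are not taken from the original poster's data. The last part shows one common way to avoid the side effect, by testing each level before descending.

```perl
use strict;
use warnings;

# Hypothetical data: one exon with one known position.
my %exon_hash = ( exonA => { chr1_100 => { amount => 0 } } );

# Testing a second-level key under a top-level key that does not exist
# creates the top-level key as a side effect (autovivification).
my $found = exists $exon_hash{exonB}{chr1_100} ? 1 : 0;

my $top_level_created = exists $exon_hash{exonB}           ? 1 : 0;
my $inner_created     = exists $exon_hash{exonB}{chr1_100} ? 1 : 0;

# Guarded test: check the top level first, so nothing is created.
my $guarded = ( exists $exon_hash{exonC}
             && exists $exon_hash{exonC}{chr1_100} ) ? 1 : 0;
my $exonC_created = exists $exon_hash{exonC} ? 1 : 0;

print "found=$found top_level_created=$top_level_created ",
      "inner_created=$inner_created exonC_created=$exonC_created\n";
# prints: found=0 top_level_created=1 inner_created=0 exonC_created=0
```

Only the intermediate key (exonB) is created; the deepest key being tested is not.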

          • pawanrpandey
            New Member
            • Feb 2007
            • 14

            #6
            If we consider the top-level 'foreach' loop here, autovivification should not arise:

            Code:
            foreach $exon_id(keys %exon_hash)
            {      
                  if (exists $exon_hash{$exon_id}{$chrompos})
                  {   
                           if (exists $exon_hash{$exon_id}{$test_pos}){

            Because the loop takes from %exon_hash only those keys that already exist in the hash, the second line checks a second-level key under a first-level key that is already present. The same holds for the next 'exists' check. So I feel that in this tight checking mode, where there is clear navigation through key levels 1, 2, 3, autovivification should not come into the picture.

            Please correct me if my understanding is wrong.

            Regards,
            Pawan

            • KevinADC
              Recognized Expert Specialist
              • Jan 2007
              • 4092

              #7
              After reading your clear explanation above, I agree with you. Autovivification should not be occurring, as the loop advances from the level-one key to the level-two key. As long as a level is not skipped, everything should be OK.

              Good observation Pawan,
              Kevin
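
The two exists tests in the posted loop do not skip a level, but the increment inside the inner for loop is an assignment, and an assignment creates any key on its path that is missing. The thread does not confirm that this is the cause of the extra entries; the sketch below only shows that it can happen. It assumes hypothetical data in which a position between $chrompos and $test_pos is absent from an exon, and the helper build_exons is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical exon with a gap: chr1_101 is missing.
sub build_exons {
    return ( exonA => { chr1_100 => { amount => 0 },
                        chr1_102 => { amount => 0 } } );
}

my %unguarded = build_exons();
my %guarded   = build_exons();

my ( $chromosome, $position, $lengthreads ) = ( 'chr1', 100, 3 );

for my $i ( 0 .. $lengthreads - 1 ) {
    my $x = $chromosome . "_" . ( $position + $i );

    # As in the posted loop: creates chr1_101 because it is missing.
    $unguarded{exonA}{$x}{amount}++;

    # Guarded variant: only count positions that are already present.
    $guarded{exonA}{$x}{amount}++ if exists $guarded{exonA}{$x};
}

my $unguarded_count = scalar keys %{ $unguarded{exonA} };
my $guarded_count   = scalar keys %{ $guarded{exonA} };

print "unguarded=$unguarded_count guarded=$guarded_count\n";
# prints: unguarded=3 guarded=2
```

Whether skipping missing positions is the right behaviour depends on what the counts are meant to represent; the guard is shown only to make the difference visible.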
