Using sscanf() to parse a buffer string containing multiple fixed-length sub-strings

  • IgorXX
    New Member
    • May 2013
    • 5

    Using sscanf() to parse a buffer string containing multiple fixed-length sub-strings

    I used the %Width[^]s format specifier, in which "Width" specifies the maximum number of characters to be read into the associated variable. Either it does not work properly or I am using it incorrectly.


    Code:
    char inbuf[128] = "\0";            //input string just read from infile
    char obs_sta[32] = "\0";           //name of observation station
    char sky_wx[16] = "\0";            //sky and weather conditions
    char tmp[8] = "\0";                //dry bulb temperature (°F)
    char dp[8] = "\0";                 //dew point temperature (°F)
    char rh[8] = "\0";                 //relative humidity (%)
    char wind[16] = "\0";              //wind speed/direction/gust_speed
    char pres[16] = "\0";              //barometric pressure (in Hg)
    char *rise_fall = '\0';            //pressure rising (R) or falling (F)
                                       //indicator
    char remarks[16] = "\0";
    
    //read a record from text file
    fgets (inbuf, sizeof(inbuf), fp1);
    
    //Five examples of fixed-length fields record layout:
    //CITY           SKY/WX    TMP DP  RH WIND      PRES   REMARKS
    //ELLINGTON FLD  PTSUNNY   90  75  62 SW10G18   29.94F HAZE    HX 100
    //*ZAPATA        SUNNY     99  61  28 SE13G20   29.74F HX 100
    //FORT STOCKTON  SUNNY    102 -17   1 SW24G30   29.78F HX 93
    //GUYMON         NOT AVBL
    //SANTA FE       SUNNY     81   7   6 VRB6G23   30.02F SMOKE   HX 75
    //field lengths are as in the sscanf format statement below
    //The CITY field is 15 characters long, the SKY/WX field is 8 characters long;
    //and the REMARKS field contains those remaining characters to the newline.
    //Each of these fields can contain a string with embedded spaces.
    //inbuf correctly contains the entire record
    
    //parse the record into its components (all treated as non-numeric values)
    sscanf(inbuf, "%15[^]s%8[^]s%4s%4s%4s%10s%6s%c%[^]s", 
           obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
    
    
    //print a parsed line
    printf("full line:   [%15s][%8s][%4s][%4s][%4s][%10s][%6s][%c][%s]\n",
            obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
    
    //Five examples of parsed output:
    //[ELLINGTON FLD  ][        ][    ][    ][    ][          ][      ][ ][]
    //[*ZAPATA        ][        ][    ][    ][    ][          ][      ][ ][]
    //[FORT STOCKTON  ][        ][    ][    ][    ][          ][      ][ ][]
    //[GUYMON         ][        ][    ][    ][    ][          ][      ][ ][]
    //[SANTA FE       ][        ][    ][    ][    ][          ][      ][ ][]
    Last edited by Rabbit; May 30 '13, 10:12 PM. Reason: Please use code tags when posting code.
  • Oralloy
    Recognized Expert Contributor
    • Jun 2010
    • 988

    #2
    IgorXX,

    Try changing your format a little bit, I think it's trying to find a really screwball set of strings, which is not what you want....
    Code:
    sscanf(inbuf, "%15[^]%8[^]%4s%4s%4s%10s%6s%c%[^]", 
           obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
    It looked like you had extraneous "s" characters after the "scanset" elements. These "s" characters would have to be explicitly matched, and they aren't in your input.

    Luck!
    Oralloy
    Last edited by Oralloy; May 31 '13, 04:15 PM. Reason: Fix typo in CODE block


    • IgorXX
      New Member
      • May 2013
      • 5

      #3
      Oralloy,

      Thanks for your help. I must really be missing something. Have worked the format string down to

      "%15[^]%8[^]%8[^]%4[^]%4s%6s%16s%8c%[^]"

      with the following results:

      [ELLINGTON FLD ][PTSUNNY ][90][75][62][ SW10G18][][ ][]
      [*ZAPATA ][SUNNY ][99][61][28][ SE13G20][][ ][]
      [FORT STOCKTON ][SUNNY ][102][-17][1][ SW24G30][][ ][]
      [GUYMON ][NOT AVBL][81][7][6][ VRB6G23][][ ][]
      [SANTA FE ][SUNNY ][81][7][6][ VRB6G23][][ ][]

      but can advance no further. The 3rd, 6th, 7th, and 8th "Width" specifiers are totally screwball. The 3rd one was rigged to get "NOT" and "AVBL" to be treated as one string; the rest followed.
      P.S. These data were clipboarded into Notepad from the NOAA NWS hourly weather roundup for a particular state. E.g., http://www.nws.noaa.gov/view/prodsBy...rodtype=hourly for Wyoming.

      IgorXX


      • Oralloy
        Recognized Expert Contributor
        • Jun 2010
        • 988

        #4
        Hey IgorXX,

        All of your "%s" formats are still variable-width, blank terminated. I expect that this is having catastrophic consequences on your parsing.

        Let's try one of the following format options, instead:
        Code:
        //INPUT EXAMPLE for WIDTH CHECKING
        //....1..../....2..../....3..../....4..../....5..../....6..../....7..../
        //....15........|....8..|.4.|.4.|.4.|...10....|...6.||.................
        //ELLINGTON FLD  PTSUNNY   90  75  62 SW10G18   29.94F HAZE    HX 100
        
        //Using your format methodology
        // NUL terminates all output strings.
        // Last string is of variable length to end of input string.
        // Added error check
        int count = sscanf(inbuf, "%15[^]%8[^]%4[^]%4[^]%4[^]%10[^]%6[^]%c%[^]", 
                           obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
        if (9 != count)
          printf("ERROR: failed to parse input correctly, count = %d\n", count);
        
        //Alternately you might try this format
        // Does NOT insert NUL byte at end of each output string, except for remarks.
        // Last string is of variable length to end of input string.
        // Added error check
        count = sscanf(inbuf, "%15c%8c%4c%4c%4c%10c%6c%1c%[^]",
                       obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
        if (9 != count)
          printf("ERROR: failed to parse input correctly, count = %d\n", count);
        Observe that I also included error checking. The return value of sscanf can be very illuminating, when there are problems.

        Good Luck!
        Oralloy


        • IgorXX
          New Member
          • May 2013
          • 5

          #5
          Oralloy

          Thanks again. At your suggestion, I used error checking. The output lines below are from a debug display using printf("[%s][%s][%s][%s][%s][%s][%s][%c][%s]\n" . The brackets clearly show what is assigned to each variable.

          Your suggested format string "%15c%8c%4c%4c%4c%10c%6c%1c%[^]" works better but gives:
          [ELLINGTON FLD ][PTSUNNY ][ 90][ 75][ 62][ SW10G18 ][ 29.94][ ][]
          [*ZAPATA ][SUNNY ][ 99][ 61][ 28][ SE13G20 ][ 29.74][ ][]
          [FORT STOCKTON ][SUNNY ][ 102][ -17][ 1][ SW24G30 ][ 29.78][ ][]
          [SANTA FE ][SUNNY ][ 81][ 7][ 6][ VRB6G23 ][ 30.02][ ][]
          Note: I have eliminated the GUYMON observation station line because it is a distraction to the immediate problem. Still unable to pick up the single character (field #8) and the remarks field (#9).
          Tried the format string "%15c%8c%4c%4c%4c%10c%6c%2c%[^]" and oddly got:
          [ELLINGTON FLD ][PTSUNNY ][ 90][ 75][ 62][ SW10G18 ][ 29.94][ ][HAZE]
          [*ZAPATA ][SUNNY ][ 99][ 61][ 28][ SE13G20 ][ 29.74][ ][HX]
          [FORT STOCKTON ][SUNNY ][ 102][ -17][ 1][ SW24G30 ][ 29.78][ ][HX]
          [SANTA FE ][SUNNY ][ 81][ 7][ 6][ VRB6G23 ][ 30.02][ ][SMOKE]
          The field #8 Width specifier matters not for that field. No Width specifier for field #9 changes things. It is clear to me that the library implementation of the format string in sscanf() does not act as [I think] intended. Other than my initial ignorance of the proper use of "%[^]", I doubt I would have had problems with BSD or AT&T implementations. Maybe I should tilt at the fread() windmill.... :)

          Alas I rue having access to a UNIX machine where nawk or sed, egrep and cut would make short work of the entire file. On the modern variation on the original proverb below, attributed to the British playwright Ben Jonson in his 1598 play, "Every Man in His Humour", first performed by William Shakespeare: "Curiosity killed the cat, satisfaction brought it back." Well, curiosity was framed; ignorance killed the cat. And the same is true for programmers and software engineers.

          IgorXX


          • Oralloy
            Recognized Expert Contributor
            • Jun 2010
            • 988

            #6
            IgorXX,

            Use not thyne ancient tales of cats on me, for I shall hear them and laugh.

            This all harkens back to the days of FORTRAN card-based input, with fixed length fields in the input records. Were I truly evil, I would suggest that we do the I/O in FORTRAN, but that would be a bit over the top, don't you think?

            I/O is the pits in any project. I just hate it when something is way overboard like this is.

            From the looks of it, the last group of formats that I suggested may be off by a character or two. You know your input field widths, so you can double check that.

            Did you try the format that I suggested in line 10 of my previous post?

            Code:
            //Using your format methodology
             7. // NUL terminates all output strings.
             8. // Last string is of variable length to end of input string.
             9. // Added error check
             10. int count = sscanf(inbuf, "%15[^]%8[^]%4[^]%4[^]%4[^]%10[^]%6[^]%c%[^]", 
             11.                    obs_sta, sky_wx, tmp, dp, rh, wind, pres, rise_fall, remarks);
             12. if (9 != count)
             13.   printf("ERROR: failed to parse input correctly, count = %d\n", count);


            • IgorXX
              New Member
              • May 2013
              • 5

              #7
              Oralloy,

              I will have to wait until maybe Wednesday before working on this any more, as Microsoft has been trying to clear up my machine (Trojans and a trashed registry file among other things). Their server was up and down today so they couldn't finish. Now must "wait til the morrow". The mention of FORTRAN brings back memories to an old, toothless programmer. It was the first language I learned, even before proper English. Still have the old MS PowerStation Development System for Windows and MS-DOS ver 1.0 from 1993 (supporting Fortran77). Haven't tried to start it up, so don't know if it would even run on a more-recent version of Windows.

              Not sure if I should pursue the sscanf() any further as I feel I have exhausted reasonable format specifiers (string, character, and width), including many variations on your suggestion. Sometimes in the blissful glow of ignorance, a meat axe is the only recourse. Stir-fry kitten anyone?

              IgorXX


              • IgorXX
                New Member
                • May 2013
                • 5

                #8
                Oralloy

                Notes on my last reply:
                1. Much earlier in our conversation, I had incorrectly defined rise_fall as
                char *rise_fall = '\0';
                It should have been
                char rise_fall = '\0';
                2. My last sentence beginning with "Alas I rue having" should have read "Alas I rue not having".

                Having abruptly awoken at 2 AM with drool running down the side of my mouth by a Bronco "Chicken Gristle Grinder" info-mercial on TV, I remembered that a char is like an unsigned int (and all numeric values) and when used in the scanf() family of functions, requires its address. I.e., &rise_fall, not the value stored in rise_fall. A string name, on the other hand, stores the address of the string array's first element, so the "address of" indicator '&', is not used. Now, if that isn't the cow's "Mooo".

                C is a beguiling, rigid and unforgiving sort, always allowing one to have their way with her. And if one strays too far from her, there will be consequences. Consequences, indeed.

                char rise_fall = '\0'; //pressure rising (R) or falling (F) indicator

                count = sscanf(inbuf, "%15c%8c%4c%4c%4c%10c%6c%c%[^]",
                obs_sta, sky_wx, tmp, dp, rh, wind, pres, &rise_fall, remarks);
                [ELLINGTON FLD ][PTSUNNY ][ 90][ 75][ 62][ SW10G18 ][ 29.94][F][]
                ERROR: failed to parse input correctly, count = 8
                [FORT STOCKTON ][SUNNY ][ 102][ -17][ 1][ SW24G30 ][ 29.78][F][]
                ERROR: failed to parse input correctly, count = 8
                [*ZAPATA ][SUNNY ][ 99][ 61][ 28][ SE13G20 ][ 29.74][F][]
                ERROR: failed to parse input correctly, count = 8
                [SANTA FE ][SUNNY ][ 81][ 7][ 6][ VRB6G23 ][ 30.02][F][]
                ERROR: failed to parse input correctly, count = 8

                Now how to capture that last field....

                IgorXX
