taking a "word" as input

  • arnuld

    taking a "word" as input

    C takes input character by character. I did not find any Standard Library
    function that can take a word as input. So I want to write one of my own
    to be used with "Self Referential Structures" of section 6.5 of K&R2. K&R2
    has their own version of <getword> which, I think, is quite different
    from what I need:

    <getword> will have following properties:


    1.) If the word contains any number like "beauty1" or "win2e" it will
    discard it, K&R2's <getword> does not. My <getword> will only take
    pure-words like "beauty", "wine" etc.


    2.) we can store each word by using <array of pointers> pointing to those
    words and since words themselves are strings, which in
    reality, are <arrays of chars>, so we will have <array of pointers> to
    those <arrays of chars>.


    or you think using a 2D array is a better idea ?



    --

    my email ID is at the above address

  • Nick Keighley

    #2
    Re: taking a "word" as input

    On 29 Apr, 13:27, arnuld <NoS...@NoPain.com> wrote:
    > C takes input character by character.
    nope. It can read lines (fgets()) or arbitrary blocks (fread())
    > I did not find any Standard Library
    > function that can take a word as input.
    correct, there aren't any.

    > So I want to write one of my own
    > to be used with "Self Referential Structures" of section 6.5 of K&R2. K&R2
    > has their own version of <getword> which, I think, is quite different
    > from what I need:
    >
    > <getword> will have following properties:
    >
    > 1.) If the word contains any number like "beauty1" or "win2e" it will
    > discard it, K&R2's <getword> does not. My <getword> will only take
    > pure-words like "beauty", "wine" etc.
    take a look at isalpha()

    > 2.) we can store each word by using <array of pointers> pointing to those
    > words and since words themselves are strings, which in
    > reality, are <arrays of chars>, so we will have <array of pointers> to
    > those <arrays of chars>.
    char *word_table [100];

    > or you think using a 2D array is a better idea ?
    are all your words the same size?
    If you use the array of pointers you'll have to get the memory
    for each word from somewhere (eg. malloc())
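
A minimal sketch of the array-of-pointers idea, assuming a fixed table of 100 slots as in word_table above. The names struct word_table, table_add and table_free are hypothetical, not from the thread; each word gets its own malloc'd copy, so words may have any length.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 100   /* assumed limit, matching word_table[100] above */

/* Table of pointers; each used slot points at a separately malloc'd word. */
struct word_table {
    char  *words[MAX_WORDS];
    size_t count;
};

/* Copy `word` into freshly allocated storage and record it in the table.
 * Returns 1 on success, 0 if the table is full or malloc fails. */
int table_add(struct word_table *t, const char *word)
{
    char *copy;
    size_t len;

    if (t->count >= MAX_WORDS)
        return 0;

    len = strlen(word) + 1;          /* +1 for the terminating '\0' */
    copy = malloc(len);
    if (copy == NULL)
        return 0;

    memcpy(copy, word, len);
    t->words[t->count++] = copy;
    return 1;
}

/* Release every stored word and reset the table. */
void table_free(struct word_table *t)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        free(t->words[i]);
    t->count = 0;
}
```

A caller would start from a zero-initialised table, call table_add once per word read, and call table_free when done.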


    --
    Nick Keighley

    there may have been other things between sliced bread and Java


    • santosh

      #3
      Re: taking a "word" as input

      arnuld wrote:
      > C takes input character by character. I did not find any Standard
      > Library function that can take a word as input. So I want to write one
      > of my own to be used with "Self Referential Structures" of section 6.5
      > of K&R2. K&R2
      > has their own version of <getword> which, I think, is quite different
      > from what I need:
      >
      > <getword> will have following properties:
      >
      >
      > 1.) If the word contains any number like "beauty1" or "win2e" it will
      > discard it, K&R2's <getword> does not. My <getword> will only take
      > pure-words like "beauty", "wine" etc.
      What about words with other characters like hyphen? What about
      constructs like "get_name"? Will you discard them too. What about words
      that end with a ; or ...? What about words that contain symbols like #@
      etc? Or words that end with an exclamation mark? Or words within
      parenthesis or braces?

      Just giving you some food for thought as to what exactly you are going
      to consider a word and what you will reject. This can be far trickier
      than one first imagines.
      > 2.) we can store each word by using <array of pointers> pointing to
      > those words and since words themselves are strings, which in
      > reality, are <arrays of chars>, so we will have <array of pointers> to
      > those <arrays of chars>.
      That's one way yes, suitable when you don't know the lengths of words in
      advance, or you don't want to possibly waste storage with statically
      allocated arrays.
      > or you think using a 2D array is a better idea ?
      Depends on your requirements really, and the type and frequency of input
      you expect. Will you put an upper limit on the length of words? It
      hardly makes sense to accept words longer than about 64 characters if
      you are dealing with normal English text. Static 2D arrays are
      undoubtedly easier to work with but are less flexible than dynamically
      allocated arrays. Since statically allocated arrays are of fixed size
      it's possible for some elements to remain unused and hence wasted. OTOH
      a large number of small allocations may lead to memory fragmentation
      and also some wastage due to malloc bookkeeping and possibly also a
      slowdown in speed if you'll be reading a very large number of words
      from a file. For input from a human it will not matter.

      One efficient method is to use a single dynamically allocated array in
      which words are stored sequentially. The length of each word could be
      specified by either one or two bytes prefixing the word itself. This
      results in very efficient storage, but is grossly inefficient if you
      want to insert and delete words at random. For this a hash table based
      approach is probably the best. OTOH a tree is very convenient for quick
      searching and sorting.
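
The single-buffer layout described above can be sketched as follows, assuming a one-byte length prefix (so words are limited to 255 characters). The names struct packed_words, packed_add and packed_get are made up for illustration. Note how fetching the n-th word means hopping over every earlier word, which is why random insertion and deletion are expensive with this layout.

```c
#include <stdlib.h>
#include <string.h>

/* One growable buffer; each word is stored as a 1-byte length followed by
 * its characters (no terminating '\0'). All names are illustrative. */
struct packed_words {
    unsigned char *data;
    size_t used;
    size_t cap;
};

/* Append a word. Returns 1 on success, 0 if the word is empty, longer than
 * 255 characters, or memory runs out. */
int packed_add(struct packed_words *p, const char *word)
{
    size_t len = strlen(word);
    size_t need;

    if (len == 0 || len > 255)
        return 0;

    need = p->used + 1 + len;
    if (need > p->cap) {
        size_t new_cap = p->cap ? p->cap * 2 : 64;
        unsigned char *tmp;

        while (new_cap < need)
            new_cap *= 2;
        tmp = realloc(p->data, new_cap);   /* realloc(NULL, n) acts like malloc */
        if (tmp == NULL)
            return 0;
        p->data = tmp;
        p->cap = new_cap;
    }

    p->data[p->used++] = (unsigned char)len;   /* length prefix */
    memcpy(p->data + p->used, word, len);      /* the letters themselves */
    p->used += len;
    return 1;
}

/* Copy the n-th word (0-based) into out as a C string.
 * Returns 1 on success, 0 if n is out of range or out is too small. */
int packed_get(const struct packed_words *p, size_t n, char *out, size_t outsize)
{
    size_t pos = 0;

    while (pos < p->used) {
        size_t len = p->data[pos];

        if (n == 0) {
            if (len + 1 > outsize)
                return 0;
            memcpy(out, p->data + pos + 1, len);
            out[len] = '\0';
            return 1;
        }
        pos += 1 + len;   /* hop over the prefix and the word */
        n--;
    }
    return 0;
}
```

Storing "like" and "that" this way uses 10 bytes in total: two prefix bytes plus eight letters.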

      If you tell us more details about the type and volume of input you
      expect and the facilities (like searching, insertion, etc.) you plan to
      implement, perhaps a tailored approach can be suggested.


      • arnuld

        #4
        Re: taking a "word" as input

        On Tue, 29 Apr 2008 02:47:18 -0700, Nick Keighley wrote:

        > are all your words the same size?
        It depends on the user, what he likes to input at run-time.

        > If you use the array of pointers you'll have to get the memory
        > for each word from somewhere (eg. malloc())
        yes. I came up with this code and as you can see it does not do what I
        want. I want to take every word of the input but it only takes the first, for
        obvious reasons. I am not able to think of a way to take all the words
        of the input:




        #include <stdio.h>
        #include <ctype.h>


        enum MAXSIZE { MAXWORD = 100 };

        char *getword( char *, int );


        int main(void) {

            char buffer[MAXWORD];

            getword( buffer, MAXWORD );

            printf("--------------------\n");
            printf("%s\n", buffer);

            return 0;
        }



        char *getword( char *word, int max )
        {
            int c, i;

            i = 0;

            while( isalpha(c = getchar()) && i < max - 1 )
            {
                word[i++] = c;
            }

            word[i] = '\0';

            return word;
        }

        ============= OUTPUT =================
        /home/arnuld/programs/C $ gcc -ansi -pedantic -Wall -Wextra test.c
        /home/arnuld/programs/C $ ./a.out
        like that
        --------------------
        like
        /home/arnuld/programs/C $



        --

        my email ID is at the above address


        • arnuld

          #5
          Re: taking a "word" as input

          On Tue, 29 Apr 2008 20:53:50 +0530, santosh wrote:

          > What about words with other characters like hyphen? What about
          > constructs like "get_name"? Will you discard them too. What about words
          > that end with a ; or ...? What about words that contain symbols like #@
          > etc? Or words that end with an exclamation mark? Or words within
          > parenthesis or braces?
          all of them will be discarded. Only words containing letters like
          "santosh" will be considered, nothing else.



          > That's one way yes, suitable when you don't know the lengths of words in
          > advance, or you don't want to possibly waste storage with statically
          > allocated arrays.
          yes, exactly, input will be at run-time only.

          > Depends on your requirements really, and the type and frequency of input
          > you expect. Will you put an upper limit on the length of words? It
          > hardly makes sense to accept words longer than about 64 characters if
          > you are dealing with normal English text.
          ok, make the upper limit to 64 :) , I usually take it 100 as my style.


          > Static 2D arrays are
          > undoubtedly easier to work with but are less flexible than dynamically
          > allocated arrays. Since statically allocated arrays are of fixed size
          > it's possible for some elements to remain unused and hence wasted. OTOH
          > a large number of small allocations may lead to memory fragmentation and
          > also some wastage due to malloc bookkeeping and possibly also a slowdown
          > in speed if you'll be reading a very large number of words from a file.
          > For input from a human it will not matter.

          you want to say that there will be 2 types of implementations if
          efficiency is my concern:

          1.) input from human
          2.) input from a text-file

          ??

          couldn't there be a single implementation for both types of inputs ?

          > One efficient method is to use a single dynamically allocated array in
          > which words are stored sequentially. The length of each word could be
          > specified by either one or two bytes prefixing the word itself. This
          > results in very efficient storage, but is grossly inefficient if you
          > want to insert and delete words at random. For this a hash table based
          > approach is probably the best. OTOH a tree is very convenient for quick
          > searching and sorting.
          > If you tell us more details about the type and volume of input you
          > expect and the facilities (like searching, insertion, etc.) you plan to
          > implement, perhaps a tailored approach can be suggested.

          The basic problem is to sort, count and print the sorted words. We are
          not going to save a word in an array if it has already appeared, we will
          just increase the count for that word.

          K&R2 seems to suggest that a doubly-linked list using binary search is
          the most efficient method to use, described in section 6.5 and is already
          solved. Actually I am not able to understand the <getword> function of the
          authors which actually is different from what I want, hence I need to
          create one of my own.
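
For reference, the word-counting program in K&R2 section 6.5 is built on a binary tree of self-referential structs rather than a doubly-linked list. The following is a rough sketch of that idea and not the book's code; the names tree_add, tree_count, tree_print and tree_free are chosen here for illustration. A word that has already appeared only has its count increased, and an in-order walk prints the words sorted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node per distinct word; smaller words go left, larger go right. */
struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* Insert `w`, or bump its count if it is already in the tree.
 * Returns the subtree root; the tree is left unchanged if malloc fails. */
struct tnode *tree_add(struct tnode *p, const char *w)
{
    int cmp;

    if (p == NULL) {
        struct tnode *n = malloc(sizeof *n);

        if (n == NULL)
            return NULL;
        n->word = malloc(strlen(w) + 1);
        if (n->word == NULL) {
            free(n);
            return NULL;
        }
        strcpy(n->word, w);
        n->count = 1;
        n->left = n->right = NULL;
        return n;
    }

    cmp = strcmp(w, p->word);
    if (cmp == 0)
        p->count++;                      /* seen before: just count it */
    else if (cmp < 0)
        p->left = tree_add(p->left, w);
    else
        p->right = tree_add(p->right, w);
    return p;
}

/* Return how many times `w` was added, or 0 if it is absent. */
int tree_count(const struct tnode *p, const char *w)
{
    while (p != NULL) {
        int cmp = strcmp(w, p->word);

        if (cmp == 0)
            return p->count;
        p = (cmp < 0) ? p->left : p->right;
    }
    return 0;
}

/* In-order walk: prints the words in sorted order with their counts. */
void tree_print(const struct tnode *p, FILE *out)
{
    if (p != NULL) {
        tree_print(p->left, out);
        fprintf(out, "%4d %s\n", p->count, p->word);
        tree_print(p->right, out);
    }
}

/* Free every node and its word. */
void tree_free(struct tnode *p)
{
    if (p != NULL) {
        tree_free(p->left);
        tree_free(p->right);
        free(p->word);
        free(p);
    }
}
```

The calling loop would be along the lines of root = tree_add(root, word) for each word returned by whatever getword is in use, followed by tree_print(root, stdout).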





          --

          my email ID is at the above address


          • arnuld

            #6
            Re: taking a "word" as input

            On Thu, 01 May 2008 03:02:23 +0500, arnuld wrote:
            .... SNIP...
            The basic problem is to sort, count and print the sorted words. We are
            not going to save a word in an array if it has already appeared, we will
            just increase the count for that word.
            .....SNIP....
            by accident, it is actually exercise 6-4 of K&R2 :)



            --

            my email ID is at the above address


            • arnuld

              #7
              Re: taking a "word" as input

              On Wed, 30 Apr 2008 20:54:48 +0500, arnuld wrote:
              > by accident, it is actually exercise 6-4 of K&R2 :)

              How about this code. It works fine:


              /* A program that takes a single word as input. It will discard
              * the whole input if it contains anything other than the 26 alphabets
              * of English. If the input word contains more than 30 letters then only
              * the extra letters will be discarded . For general purpose usage of
              * English it does not make any sense to use a word larger than this size.
              * Nearly every general purpose word can be expressed in a word with less
              * than or equal to 30 letters.
              *
              * version 1.1
              *
              */


              #include <stdio.h>
              #include <stdlib.h>
              #include <ctype.h>


              enum MAXSIZE { WORDSIZE = 30 };

              int getword( char *, int );


              int main( void )
              {
                  char ac[WORDSIZE];

                  if( getword( ac, WORDSIZE ) )
                  {
                      printf("%s\n", ac);
                  }

                  return EXIT_SUCCESS;

              }


              int getword( char *word, int max_length )
              {
                  int c;
                  char *w = word;


                  while( isspace( c = getchar() ) )
                  {
                      ;
                  }

                  while( --max_length )
                  {
                      if( isalpha( c ) )
                      {
                          *w++ = c;
                      }
                      else if( c == '\n' || c == EOF || isspace( c ) )
                      {
                          *w = '\0';
                          break;
                      }
                      else
                      {
                          return 0;
                      }

                      c = getchar();
                  }

                  /* I can simply ignore the if condition and directly write the '\0'
                  onto the last element because in worst case it will only rewrite
                  the '\n' that is put in there by else if clause.

                  or in else if clause, I could replace break with return word[0].

                  I thought these 2 ideas will be either inefficient or
                  a bad programming practice, so I did not do it.
                  */
                  if( *w != '\0' )
                  {
                      *w = '\0';
                  }



                  return word[0];
              }


              ========== OUTPUT ============
              Welcome to the Emacs shell

              /home/arnuld/programs/C $ gcc -ansi -pedantic -Wall -Wextra getword.c
              /home/arnuld/programs/C $ ./a.out
              like this
              like
              /home/arnuld/programs/C $ ./a.out
              like3
              /home/arnuld/programs/C $ ./a.out
              9like
              /home/arnuld/programs/C $ ./a.out
              like ll
              like
              /home/arnuld/programs/C $



              --

              my email ID is @ the above address


              • Nick Keighley

                #8
                Re: taking a "word" as input

                On 30 Apr, 23:02, arnuld <NoS...@NoPain.com> wrote:
                > On Tue, 29 Apr 2008 02:47:18 -0700, Nick Keighley wrote:
                >> are all your words the same size?
                >
                > It depends on the user, what he likes to input at run-time.
                in other words, no.

                santosh has pointed out some of the design drivers for this.
                So decide do you want a fixed size (limits word size and wastes space)
                or a variable size (harder to program).

                >> If you use the array of pointers you'll have to get the memory
                >> for each word from somewhere (eg. malloc())
                Note Well

                > yes. I came up with this code and as you can see it does not do what I
                > want. I want to take every word into the input but it only takes 1st for
                > obvious reasons. I am not able to think of the way to take all the words
                > of the input:
                1. after you read a word you need to skip to the next word.

                eg. read until you get a letter

                2. you need somewhere to store the words. Either a 2D array or
                use malloc().
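
The two steps above can be sketched roughly like this, assuming the 2D-array option and made-up limits MAXNUMWORDS and MAXWORD. The helper names read_word and read_all_words are hypothetical, and the functions take a FILE * (pass stdin for keyboard input) so the same code serves files and people. This sketch treats any non-letter as a separator, so "beauty1" yields "beauty"; it does not implement the discard-impure-words rule.

```c
#include <ctype.h>
#include <stdio.h>

#define MAXNUMWORDS 50    /* assumed limits, for illustration only */
#define MAXWORD     100

/* Read one run of letters from `in` into word (max must be at least 2).
 * Returns the number of letters stored, or 0 at end of input. */
int read_word(FILE *in, char *word, int max)
{
    int c, i = 0;

    /* Step 1: skip until we get a letter (or run out of input). */
    while ((c = getc(in)) != EOF && !isalpha(c))
        ;

    /* Collect letters, leaving room for '\0'; extra letters are dropped. */
    while (c != EOF && isalpha(c)) {
        if (i < max - 1)
            word[i++] = (char)c;
        c = getc(in);
    }
    word[i] = '\0';
    return i;
}

/* Step 2: store each word in its own row of a 2D array.
 * Returns how many words were stored. */
int read_all_words(FILE *in, char table[][MAXWORD], int max_words)
{
    int n = 0;

    while (n < max_words && read_word(in, table[n], MAXWORD) > 0)
        n++;
    return n;
}
```

With char table[MAXNUMWORDS][MAXWORD]; declared in main, a call such as read_all_words(stdin, table, MAXNUMWORDS) fills the rows and reports how many were used.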

                > #include <stdio.h>
                > #include <ctype.h>
                >
                > enum MAXSIZE { MAXWORD = 100 };
                >
                > char *getword( char *, int );
                >
                > int main(void) {
                >
                >   char buffer[MAXWORD];
                this only holds one word

                char buffer[MAXNUMWORDS][MAXWORD];
                OR char* buffer [MAXWORD]

                >   getword( buffer, MAXWORD );
                pass the appropriate argument

                <snip>

                --
                Nick Keighley


                • Nick Keighley

                  #9
                  Re: taking a "word" as input

                  On 30 Apr, 23:02, arnuld <NoS...@NoPain.com> wrote:
                  On Tue, 29 Apr 2008 20:53:50 +0530, santosh wrote:
                  What about words with other characters like hyphen? What about
                  constructs like "get_name"? Will you discard them too. What about words
                  that end with a ; or ...? What about words that contain symbols like #@
                  etc? Or words that end with an exclamation mark? Or words within
                  parenthesis or braces?
                  >
                  all of them will be discarded. Only words containing letters like
                  "santosh" will be considered, nothing else.
                  >
                  That's one way yes, suitable when you don't know the lengths of words in
                  advance, or you don't want to possibly waste storage with statically
                  allocated arrays.
                  >
                  yes, exactly, input will be at run-time only.
                  I don't understand what you mean here

                  Depends on your requirements really, and the type and frequency of input
                  you expect. Will you put an upper limit on the length of words?   It
                  hardly makes sense to accept words longer than about 64 characters if
                  you are dealing with normal English text.
                  >
                  ok, make the upper limit to 64 :) , I usually take it 100 as my style.
                  >
                  Static 2D arrays are
                  undoubtedly easier to work with but are less flexible than dynamically
                  allocated arrays. Since statically allocated arrays are of fixed size
                  it's possible for some elements to remain unused and hence wasted. OTOH
                  a large number of small allocations may lead to memory fragmentation and
                  also some wastage due to malloc bookkeeping and possibly also a slowdown
                  in speed if you'll be reading a very large number of words from a file.
                  For input from a human it will not matter.
                  >
                  you want to say that there will be 2 types of implementations if
                  efficiency is my concern:
                  >
                    1.) input from human
                    2.) input from a text-file
                  >
                   ??
                  >
                  couldn't there be a single implementation for both types of inputs ?
                  yes. But file or human might influence your design. People type
                  v e r y s l o w l y so a human input only program doesn't need to
                  be fast (for this problem). The file input one should work just fine
                  with people.

                  One efficient method is to use a single dynamically allocated array in
                  which words are stored sequentially. The length of each word could be
                  specified by either one or two bytes prefixing the word itself. This
                  results in very efficient storage, but is grossly inefficient if you
                  want to insert and delete words at random. For this a hash table based
                  approach is probably the best. OTOH a tree is very convenient for quick
                  searching and sorting.
                  If you tell us more details about the type and volume of input you
                  expect and the facilities (like searching, insertion, etc.) you plan to
                  implement, perhaps a tailored approach can be suggested.
                  >
                  The basic problem is to sort, count and print the sorted words.  We are
                  not going to save a word in an array if it has already appeared, we will
                  just increase the count for that word.  
                  that didn't really answer the question...
                  K&R2 seems to suggest that a  doubly-linked list using binary search is
                  the most efficient method to use, described in section 6.5 and is already
                  solved. Actually I am not able to understand the <getword> function of the
                  authors which actually is different from what I want, hence I need to
                  create one of my own.

                  --
                  Nick Keighley


                  • Ben Bacarisse

                    #10
                    Re: taking a "word" as input

                    arnuld <NoSpam@NoPain.com> writes:
                    >> On Wed, 30 Apr 2008 20:54:48 +0500, arnuld wrote:
                    >>
                    >> by accident, it is actually exercise 6-4 of K&R2 :)
                    I don't have K&R2 so I don't know the end point of this exercise, so I
                    may have this wrong...
                    > How about this code. It works fine:
                    >
                    > /* A program that takes a single word as input. It will discard
                    > * the whole input if it contains anything other than the 26 alphabets
                    > * of English. If the input word contains more than 30 letters then only
                    > * the extra letters will be discarded . For general purpose usage of
                    > * English it does not make any sense to use a word larger than this size.
                    > * Nearly every general purpose word can be expressed in a word with less
                    > * than or equal to 30 letters.
                    > *
                    > * version 1.1
                    > *
                    > */
                    >
                    >
                    > #include <stdio.h>
                    > #include <stdlib.h>
                    > #include <ctype.h>
                    >
                    >
                    > enum MAXSIZE { WORDSIZE = 30 };
                    >
                    > int getword( char *, int );
                    >
                    >
                    > int main( void )
                    > {
                    > char ac[WORDSIZE];
                    >
                    > if( getword( ac, WORDSIZE ) )
                    > {
                    > printf("%s\n", ac);
                    > }
                    >
                    > return EXIT_SUCCESS;
                    >
                    > }
                    >
                    >
                    > int getword( char *word, int max_length )
                    > {
                    > int c;
                    > char *w = word;
                    >
                    >
                    > while( isspace( c = getchar() ) )
                    > {
                    > ;
                    > }
                    I find { ; } a messy way of saying nothing, but that is a style
                    point. More important, if this will be used to read more than one
                    word (eventually) you need to skip anything that you don't count as a
                    word character, not just spaces.
                    > while( --max_length )
                    > {
                    > if( isalpha( c ) )
                    > {
                    > *w++ = c;
                    > }
                    > else if( c == '\n' || c == EOF || isspace( c ) )
                    > {
                    > *w = '\0';
                    > break;
                    > }
                    > else
                    > {
                    > return 0;
                    When the word ends because of this condition, why do you return 0
                    rather than the word you have read? You do have a word to return.
                    > }
                    >
                    > c = getchar();
                    > }
                    >
                    > /* I can simply ignore the if condition and directly write the '\0'
                    > onto the last element because in worst case it will only rewrite
                    > the '\n' that is put in there by else if clause.
                    I think the comment is confusing. Without the if below, you re-write
                    a 0 that is already there. A \n is never put into the buffer.
                    > or in else if clause, I could replace break with return word[0].
                    >
                    > I thought these 2 ideas will be either inefficient or
                    > a bad programming practice, so I did not do it.
                    > */
                    > if( *w != '\0' )
                    > {
                    > *w = '\0';
                    > }
                    I'd just write *w = '\0';
                    > return word[0];
                    That's a char. Given what you said about conversions and clarity, you
                    should really write return word[0] != '\0'; or maybe return !!word[0];
                    > }
                    --
                    Ben.


                    • arnuld

                      #11
                      Re: taking a "word" as input

                      On Wed, 30 Apr 2008 18:54:45 +0100, Ben Bacarisse wrote:
                      >arnuld wrote:
                      I don't have K&R2 so I don't know the end point of this exercise, so I
                      may have this wrong...
                      I knew this ;)


                      > while( isspace( c = getchar() ) )
                      > {
                      > ;
                      > }
                      I find { ; } a messy way of saying nothing, but that is a style
                      point. More important, if this will be used to read more than one
                      word (eventually) you need to skip anything that you don't count as a
                      word character, not just spaces.
                      It is for the trailing spaces, any white-spaces, that come before the
                      word.


                      > while( --max_length )
                      > {
                      > if( isalpha( c ) )
                      > {
                      > *w++ = c;
                      > }
                      > else if( c == '\n' || c == EOF || isspace( c ) )
                      > {
                      > *w = '\0';
                      > break;
                      > }
                      > else
                      > {
                      > return 0;
                      When the word ends because of this condition, why do you return 0 rather
                      than the word you have read? You do have a word to return.
                      The word does not end here. If the next character we read is anything
                      other than a letter, whitespace or EOF, e.g. the '2' in "Ben2" or the
                      '@' in "usen@et", then I am going to discard the whole word.



                      > if( *w != '\0' )
                      > {
                      > *w = '\0';
                      > }
                      > }
                      I'd just write *w = '\0';
                      ok, fine, will do that.

                      > return word[0];
                      That's a char. Given what you said about conversions and clarity, you
                      should really write return word[0] != '\0'; or maybe return !!word[0];

                      I don't understand your point. word[0] is char but the function is
                      supposed to return an integer and hence there is an implicit conversion
                      from char to int. This conversion is useful in the while loop that I am
                      writing as part of a doubly-linked list program. For the full program see
                      my other thread titled: "sorting using a doubly-linked list"
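
The discard rule described earlier in this post (reject "Ben2" or "usen@et" as a whole) could be sketched as below. This is an illustration under assumptions, not the code from the thread: the name get_pure_word and the GW_* status codes are invented here, and the function reads from a FILE * (stdin for keyboard input). Unlike returning word[0], it consumes the entire rejected token and reports end of input separately, so a caller can keep reading the following words.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical status codes: end of input, token rejected, word accepted. */
enum { GW_EOF = -1, GW_REJECTED = 0, GW_OK = 1 };

/* Read the next whitespace-delimited token from `in` (max must be >= 1).
 * If it consists only of letters, store it in word and return GW_OK.
 * If it contains anything else, store "" and return GW_REJECTED.
 * Letters beyond max - 1 are silently dropped. */
int get_pure_word(FILE *in, char *word, int max)
{
    int c, i = 0, pure = 1;

    /* Skip leading whitespace. */
    while ((c = getc(in)) != EOF && isspace(c))
        ;
    if (c == EOF) {
        word[0] = '\0';
        return GW_EOF;
    }

    /* Consume the whole token so the next call starts at the next one. */
    while (c != EOF && !isspace(c)) {
        if (!isalpha(c))
            pure = 0;                 /* e.g. the '2' in "Ben2" */
        else if (i < max - 1)
            word[i++] = (char)c;
        c = getc(in);
    }

    if (!pure)
        i = 0;                        /* discard the whole token */
    word[i] = '\0';
    return pure ? GW_OK : GW_REJECTED;
}
```

A reading loop would call it until it returns GW_EOF and only store the word when the result is GW_OK.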



                      --

                      my email ID is @ the above address


                      • Ben Bacarisse

                        #12
                        Re: taking a "word" as input

                        arnuld <NoSpam@NoPain.com> writes:
                        On Wed, 30 Apr 2008 18:54:45 +0100, Ben Bacarisse wrote:
                        >arnuld wrote:
                        >
                        >> while( isspace( c = getchar() ) )
                        >> {
                        >> ;
                        >> }
                        >
                        >I find { ; } a messy way of saying nothing, but that is a style
                        >point. More important, if this will be used to read more than one
                        >word (eventually) you need to skip anything that you don't count as a
                        >word character, not just spaces.
                        >
                        It is for the trailing spaces, any white-spaces, that come before the
                        word.
                        Yes I know what it is for. I was suggesting that you could do
                        better. If this is all you need, then fine, but the usual goal is
                        to make flexible functions.
                        >> while( --max_length )
                        >> {
                        >> if( isalpha( c ) )
                        >> {
                        >> *w++ = c;
                        >> }
                        >> else if( c == '\n' || c == EOF || isspace( c ) )
                        >> {
                        >> *w = '\0';
                        >> break;
                        >> }
                        >> else
                        >> {
                        >> return 0;
                        >
                        >
                        >When the word ends because of this condition, why do you return 0 rather
                        >than the word you have read? You do have a word to return.
                        >
                        word does not end here. If the next character we are reading is other than
                        a character, any whitespace or EOF, then it will not be a letter e.g.
                        "Ben2" or "usen@et" and in that case I am going to discard the whole
                        word.
                        Again, I know that. My reading of the exercise is that the program
                        would take the input:

                        Can you count these words?
                        "Yes, I can".

                        and report eight words none occurring more than once (Can != can for
                        the moment). If you want to just stop on punctuation, fine, but that
                        seems an odd choice. That is all I was saying.
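
To make the eight-word reading concrete, here is a small sketch that counts maximal runs of letters in a string, treating everything else (spaces, punctuation, quotes) as a separator. The function name count_words is made up for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Count maximal runs of letters in s; every non-letter ends a word. */
size_t count_words(const char *s)
{
    size_t n = 0;
    int in_word = 0;

    for (; *s != '\0'; s++) {
        /* Cast: passing a negative plain char to isalpha is undefined. */
        if (isalpha((unsigned char)*s)) {
            if (!in_word) {
                n++;              /* first letter of a new word */
                in_word = 1;
            }
        } else {
            in_word = 0;
        }
    }
    return n;
}
```

Applied to the two example lines above, the runs are Can, you, count, these, words, Yes, I and can, which gives eight.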
                        >> return word[0];
                        >
                        >That's a char. Given what you said about conversions and clarity, you
                        >should really write return word[0] != '\0'; or maybe return !!word[0];
                        >
                        I don't understand your point. word[0] is char but the function is
                        supposed to return an integer and hence there is an implicit conversion
                        from char to int.
                        I should have added a smiley. You stated in another message that you
                        wanted all conversions to be explicit. In that case I'd used an int
                        where a char was needed. Here, you do the reverse quite happily!

                        C's implicit conversions are good and there is no need to make them all
                        explicit. Your return statement is fine just as it is. I'll remember
                        to make my jokes stand out more!

                        --
                        Ben.
