A string collection abstract data type

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • jacob navia

    A string collection abstract data type

    Abstract:

    Continuing the discussion about abstract data types, in this
    discussion group, a string collection data type is presented,
    patterned after the collection in C# and similar languages (Java).
    It stores character strings, and resizes itself to accommodate
    new strings when needed.

    Interface:
    ----------

    The data structure uses a table of function pointers to provide an
    extensible basis for many functions without cluttering the
    user's workspace with too many names.

    This architecture is not only extensible at the API level, but
    it allows "subclassin g". If the user is unsatisfied with some
    of the functions of the API, he/she can:

    o Replace one or more of the function pointers in the table
    with a function of his/her own.

    o Store somewhere the old function pointer, and replace it with
    a function of his own that after doing some work calls the
    original function pointer, optionally doing some work after the
    original function returns.

    This type of extensibility can only be achieved with function pointers.
    The basic problem with many of the proposals presented here is that
    they present too much names that can conflict with user names.

    This hiding of names is useful in another way, since it allows the API
    to use completely generic names like "Add" or similar without any
    ambiguity.

    Since those generic names can be used in *other* abstract data types
    that use the same type of structure, the user code can be made truly
    general, and it is easy to change from a string collection to a list
    without too much trouble.

    The objective would be to make a standard way of naming things within
    a container, so that user code is shielded from change when passing from
    one container to another.

    The name of the table of functions member is "lpVtbl" following
    an old function naming convention under the windows OS and in
    the COM system. It can be changed of course.



    The interface is described in the header file <strcollection. h>
    ---------------------------------------------------------------------
    #include <string.h>
    // Forward declaration of the string collection type
    typedef struct _StringCollecti on StringCollectio n;
    typedef struct {
    // Returns the number of elements stored
    int (*GetCount)(Str ingCollection *SC);

    // Is this collection read only?
    int (*IsReadOnly)(S tringCollection *SC);

    // Sets this collection read-only or unsets the read-only flag
    int (*SetReadOnly)( StringCollectio n *SC,int flag);

    // Adds one element at the end. Given string is copied
    int (*Add)(StringCo llection *SC,char *newval);

    // Adds a NULL terminated table of strings
    int (*AddRange)(Str ingCollection *SC,char **newvalues);

    // Clears all data and frees the memory
    int (*Clear)(String Collection *SC);

    //Case sensitive search of a character string in the data
    int (*Contains)(Str ingCollection *SC,char *str);

    // Copies all strings into a NULL terminated vector
    char **(*CopyTo)(Str ingCollection *SC);

    //Returns the index of the given string or -1 if not found
    int (*IndexOf)(Stri ngCollection *SC,char *SearchedString );

    // Inserts a string at the position zero.
    int (*Insert)(Strin gCollection *SC,char *);

    // Inserts a string at the given position
    int (*InsertAt)(Str ingCollection *SC,int idx,char *newval);

    // Returns the string at the given position
    char *(*IndexAt)(Str ingCollection *SC,int idx);

    // Removes the given string if found
    int (*Remove)(Strin gCollection *SC,char *);

    //Removes the string at the indicated position
    int (*RemoveAt)(Str ingCollection *SC,int idx);

    // Frees the memory used by the collection
    int (*Finalize)(Str ingCollection *SC);

    // Returns the current capacity of the collection
    int (*GetCapacity)( StringCollectio n *SC);

    // Sets the capacity if there are no items in the collection
    int (*SetCapacity)( StringCollectio n *SC,int newCapacity);

    // Calls the given function for all strings.
    // "Arg" is a user supplied argument (that can be NULL)
    // which is passed to the function to call
    void (*Apply)(String Collection *SC,int (*Applyfn)(char *,void *
    arg),void *arg);

    // Calls the given function for each string and saves
    // all results in an integer vector
    int *(*Map)(StringC ollection *SC,int (*Applyfn)(char *));

    // Pushes a string, using the collection as a stack
    int (*Push)(StringC ollection *SC,char *str);

    // Pops the last string off the collection
    char * (*Pop)(StringCo llection *SC);

    // Replaces the character string at the given position
    // with a new one
    char *(*ReplaceAt)(S tringCollection *SC,int idx,char *newval);

    } StringCollectio nFunctions;

    // Definition of the String Collection type
    struct _StringCollecti on {
    StringCollectio nFunctions *lpVtbl; // The table of functions
    size_t count; /* in element size units */
    char **contents; /* The contents of the collection */
    size_t capacity; /* in element_size units */
    unsigned int flags; // Read-only or other flags
    };


    // This is the only exported function from this module
    StringCollectio n * newStringCollec tion(int startsize);

    ------------------------------------------end of stringcollectio n.h

    Implementation
    --------------

    I haven't had the time to comment this more in depth. I hope that
    this will be done in the ensuing discussion.

    ------------------------------------------stringcollectio n.c
    #include "strcollection. h"
    // Forward definitions
    static int GetCount( struct _StringCollecti on *SC);
    static int IsReadOnly( struct _StringCollecti on *SC);
    static int SetReadOnly( struct _StringCollecti on *SC,int newval);
    static int Add( struct _StringCollecti on *SC,char *newval);
    static int AddRange( struct _StringCollecti on *SC,char **newvalues);
    static int Clear( struct _StringCollecti on *SC);
    static int Contains( struct _StringCollecti on *SC,char *str);
    static char **CopyTo( struct _StringCollecti on *SC);
    static int IndexOf( struct _StringCollecti on *SC,char *SearchedString );
    static int Insert( struct _StringCollecti on *SC,char *);
    static int InsertAt( struct _StringCollecti on *SC,int idx,char *newval);
    static char *IndexAt( struct _StringCollecti on *SC,int idx);
    static int Remove( struct _StringCollecti on *SC,char *);
    static int RemoveAt( struct _StringCollecti on *SC,int idx);
    static int Finalize( struct _StringCollecti on *SC);
    static int GetCapacity( struct _StringCollecti on *SC);
    static int SetCapacity( struct _StringCollecti on *SC,int newCapacity);
    static void Apply(struct _StringCollecti on *SC,int(*Applyf n)(char *,void
    *),void *arg);
    static int *Map(struct _StringCollecti on *SC,int (*Applyfn)(char *));
    static int Push(struct _StringCollecti on *SC,char *str);
    static char *Pop(struct _StringCollecti on *SC);
    static char *ReplaceAt(stru ct _StringCollecti on *SC,int idx,char *newval);
    static StringCollectio nFunctions lpVtableSC = {
    GetCount, IsReadOnly, SetReadOnly, Add, AddRange,
    Clear, Contains, CopyTo, IndexOf, Insert,
    InsertAt, IndexAt, Remove, RemoveAt,
    Finalize, GetCapacity, SetCapacity, Apply,
    Map, Push, Pop, ReplaceAt,
    };

    static char *DuplicateStrin g(char *str)
    {
    char *result;
    if (str == NULL)
    return NULL;
    result = MALLOC(strlen(s tr)+1);
    if (result == NULL)
    return NULL;
    strcpy(result,s tr);
    return result;
    }

    #define SC_READONLY 1
    #define CHUNKSIZE 20

    StringCollectio n * newStringCollec tion(int startsize)
    {
    StringCollectio n *result = MALLOC(sizeof(S tringCollection ));
    if (result == NULL)
    return NULL;
    result->count = 0;
    if (startsize == 0)
    startsize = DEFAULT_START_S IZE;
    result->contents = MALLOC(startsiz e*sizeof(char *));
    if (result->contents == NULL) {
    FREE(result);
    return NULL;
    }
    memset(result->contents,0,siz eof(char *)*startsize);
    result->capacity = startsize;
    result->count = 0;
    result->flags = 0;
    result->lpVtbl = &lpVtableSC;
    return result;
    }

    static int GetCount(struct _StringCollecti on *SC)
    {
    return SC->count;
    }
    static int IsReadOnly(stru ct _StringCollecti on *SC)
    {
    return SC->flags * SC_READONLY ? 1 : 0;
    }
    static int SetReadOnly(str uct _StringCollecti on *SC,int newval)
    {
    int oldval = SC->flags * SC_READONLY ? 1 : 0;
    if (newval)
    SC->flags |= SC_READONLY;
    else
    SC->flags &= ~SC_READONLY;
    return oldval;
    }

    static int Resize(struct _StringCollecti on *SC)
    {
    int newcapacity = SC->capacity + CHUNKSIZE;
    char **oldcontents = SC->contents;
    SC->contents = MALLOC(newcapac ity*sizeof(char *));
    if (SC->contents == NULL) {
    SC->contents = oldcontents;
    return 0;
    }
    memset(SC->contents,0,siz eof(char *)*newcapacity) ;
    memcpy(SC->contents,oldco ntents,SC->count*sizeof(c har *));
    SC->capacity = newcapacity;
    return 1;
    }

    static int Add(struct _StringCollecti on *SC,char *newval)
    {
    if (SC->flags & SC_READONLY)
    return -1;
    if (SC->count >= SC->capacity) {
    if (!Resize(SC))
    return 0;
    }

    if (newval) {
    SC->contents[SC->count] = DuplicateString (newval);
    if (SC->contents[SC->count] == NULL) {
    return 0;
    }
    }
    else
    SC->contents[SC->count] = NULL;
    SC->count++;
    return SC->count;
    }

    static int AddRange(struct _StringCollecti on * SC,char **data)
    {
    int i = 0;
    if (SC->flags & SC_READONLY)
    return 0;
    while (data[i] != NULL) {
    int r = Add(SC,data[i]);
    if (r <= 0)
    return r;
    i++;
    }
    return SC->count;
    }

    static int Clear(struct _StringCollecti on * SC)
    {
    int oldval = SC->count,i;
    if (SC->flags & SC_READONLY)
    return 0;
    for (i=0; i<SC->count;i++) {
    FREE(SC->contents[i]);
    SC->contents[i] = NULL;
    }
    SC->count = 0;
    return oldval;
    }

    static int Contains(struct _StringCollecti on * SC,char *str)
    {
    int c,i;
    if (str == NULL)
    return 0;
    c = *str;
    for (i=0; i<SC->count;i++) {
    if (c == SC->contents[i][0] && !strcmp(SC->contents[i],str))
    return 1;
    }
    return 0;
    }

    static char **CopyTo(struct _StringCollecti on *SC)
    {
    char **result = MALLOC((1+SC->count)*sizeof( char *));
    int i;
    if (result == NULL)
    return NULL;
    for (i=0; i<SC->count;i++) {
    result[i] = DuplicateString (SC->contents[i]);
    }
    result[i] = NULL;
    return result;
    }

    #ifdef __LCC__
    /* The lcc-win compiler allows operator overloading, and allows using
    this abstract data type with the more natural [ ] operators */
    char * __declspec(nake d) operator[](StringCollecti on *SC, int idx)
    {
    }
    #endif
    static int IndexOf(struct _StringCollecti on *SC,char *str)
    {
    int i;

    for (i=0; i<SC->count;i++) {
    if (!strcmp(SC->contents[i],str)) {
    return i;
    }
    }
    return -1;
    }


    static char *IndexAt(struct _StringCollecti on *SC,int idx)
    {
    if (idx >=SC->count || idx < 0)
    return NULL;
    return SC->contents[idx];
    }

    static int InsertAt(struct _StringCollecti on *SC,int idx,char *newval)
    {
    if (SC->flags & SC_READONLY)
    return 0;
    if (SC->count >= (SC->capacity-1)) {
    if (!Resize(SC))
    return 0;
    }
    if (idx < 0 || idx SC->count)
    return -1;
    if (idx == 0) {
    if (SC->count 0)
    memmove(SC->contents+1,S C->contents,SC->count*sizeof(c har *));
    SC->contents[0] = newval;
    }
    else if (idx == SC->count) {
    SC->contents[idx] = newval;
    }
    else if (idx < SC->count) {

    memmove(SC->contents+idx+1 ,SC->contents+idx,( SC->count-idx+1)*sizeof(c har
    *));
    SC->contents[idx] = newval;
    }
    SC->count++;
    return SC->count;
    }

    static int Insert(struct _StringCollecti on *SC,char *newval)
    {
    if (SC->flags & SC_READONLY)
    return 0;
    return InsertAt(SC,0,n ewval);
    }

    static int RemoveAt(struct _StringCollecti on *SC,int idx)
    {
    if (idx >= SC->count || idx < 0)
    return -1;
    if (SC->count == 0)
    return -2;
    FREE(SC->contents[idx]);
    memmove(SC->contents+idx,S C->contents+idx+1 ,(SC->count-idx)*sizeof(cha r
    *));
    SC->contents[SC->count-1]=NULL;
    SC->count--;
    return SC->count;
    }

    static int Remove(struct _StringCollecti on *SC,char *str)
    {
    int idx = IndexOf(SC,str) ;
    if (idx < 0)
    return idx;
    return RemoveAt(SC,idx );
    }

    static int Push(struct _StringCollecti on *SC,char *str)
    {
    char *r;
    if (SC->flags&SC_READO NLY)
    return 0;
    if (SC->count >= SC->capacity) {
    if (!Resize(SC))
    return 0;
    }
    r = DuplicateString (str);
    if (r == NULL)
    return 0;
    SC->contents[SC->count++] = r;
    return SC->count;
    }

    static char * Pop(struct _StringCollecti on *SC)
    {
    char *result;
    if ((SC->flags&SC_READO NLY) || SC->count == 0)
    return NULL;
    SC->count--;
    result = SC->contents[SC->count];
    SC->contents[SC->count] = NULL;
    return result;
    }

    static int Finalize(struct _StringCollecti on *SC)
    {
    int result = SC->count,i;
    for (i=0; i<SC->count;i++) {
    FREE(SC->contents[i]);
    }
    FREE(SC->contents);
    FREE(SC);
    return result;
    }

    static int GetCapacity(str uct _StringCollecti on *SC)
    {
    return SC->capacity;
    }

    static int SetCapacity(str uct _StringCollecti on *SC,int newCapacity)
    {
    if (SC->count != 0)
    return 0;
    FREE(SC->contents);
    SC->contents = MALLOC(newCapac ity*sizeof(char *));
    memset(SC->contents,0,siz eof(char *)*newCapacity) ;
    SC->capacity = newCapacity;
    return 1;
    }

    static void Apply(struct _StringCollecti on *SC,int (*Applyfn)(char
    *,void *),void *arg)
    {
    int i;
    for (i=0; i<SC->count;i++) {
    Applyfn(SC->contents[i],arg);
    }
    }

    static int *Map(struct _StringCollecti on *SC,int (*Applyfn)(char *))
    {
    int *result = MALLOC(SC->count*sizeof(i nt)),i;

    if (result == NULL)
    return NULL;
    for (i=0; i<SC->count;i++) {
    result[i] = Applyfn(SC->contents[i]);
    }
    return result;
    }

    #ifdef __LCC__
    /* The lcc-win compiler allows operator overloading, and allows using
    this abstract data type with the more natural [ ]= operators */
    char * __declspec(nake d) operator[]=(StringCollect ion *SC,int idx,char
    *newval)
    {
    }

    static char *ReplaceAt(Stri ngCollection *SC,int idx,char *newval)
    {
    if (SC->flags & SC_READONLY)
    return NULL;
    if (idx < 0 || idx SC->count)
    return NULL;
    FREE(SC->contents[idx]);
    SC->contents[idx] = newval;
    return newval;
    }
    #ifdef TEST
    #include <stdio.h>
    static void PrintStringColl ection(StringCo llection *SC)
    {
    int i;
    printf("Count %d, Capacity %d\n",SC->count,SC->capacity);
    for (i=0; i<SC->count;i++) {
    printf("%s\n",S C->lpVtbl->IndexAt(SC,i)) ;
    }
    printf("\n");
    }
    int main(void)
    {
    StringCollectio n *SC = newStringCollec tion(10);
    char *p;
    SC->lpVtbl->Add(SC,"Martin ");
    SC->lpVtbl->Insert(SC,"Jak ob");
    printf("Count should be 2, is %d\n",SC->lpVtbl->GetCount(SC) );
    PrintStringColl ection(SC);
    SC->lpVtbl->InsertAt(SC,1, "Position 1");
    SC->lpVtbl->InsertAt(SC,2, "Position 2");
    PrintStringColl ection(SC);
    SC->lpVtbl->Remove(SC,"Jak ob");
    PrintStringColl ection(SC);
    SC->lpVtbl->Push(SC,"pushe d");
    PrintStringColl ection(SC);
    SC->lpVtbl->Pop(SC);
    PrintStringColl ection(SC);
    p = SC->lpVtbl->IndexAt(SC,1 );
    printf("Item position 1:%s\n",p);
    PrintStringColl ection(SC);
    }
    #endif

    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique

  • jacob navia

    #2
    Re: A string collection abstract data type

    Charlie Gordon wrote:
    "jacob navia" <jacob@nospam.o rga écrit dans le message de news:
    4724e895$0$2737 8$ba4acef3@news .orange.fr...
    >
    <original post snipped>
    >
    Thank you for your contribution Jacob.
    The code was flushed left probably because of tabs, I reformated it and
    fixed a number of small and not so small issues. See below:
    >
    Problems fixed:
    >
    added API remarks
    fixed missing #endif
    fixed indentation and spacing
    used typedef instead of struct tag in implementation
    added const on unmodified string parameters
    #include <stddef.hinstea d of <string.h>
    fixed some bogus bit tests: SC->flags * SC_READONLY ?
    added definition for DEFAULT_START_S IZE, allowed it to be 0
    removed redundant initialization of result->count
    used safer MALLOC idiom (à la c.l.c)
    fixed off by one error in ReplaceAt
    fixed memory leak in Resize
    fixed crash on NULL strings in Contains and IndexOf
    used same semantics and method in IndexOf and Contains and shared code
    fixed two off by one errors in InsertAt
    simplified InsertAt
    removed test for condition never reached in RemoveAt
    fixed off by one bug in RemoveAt
    make SetCapacity work for a non empty collection
    added test for MALLOC failure in SetCapacity
    duplicate string in Insert, InsertAt, ReplaceAt; handle NULL and failure
    allow ReplaceAt with an index of SC->count
    used int instead of size_t for count and size for consistency with API.
    improved PrintStringColl ection and test main output
    added missing return 0 in main
    >
    Thanks a lot!

    The original version of this code is written for
    my compiler system and distributed with the source code
    in the lcc-win distribution.

    I eliminated the lcc-win specific parts in this code.
    When doing this, I may have introduced some of the bugs
    above. Specifically the replacing of references with pointers
    intoruced the bug replacing the "&" in an AND operation
    with a multiplication sign!

    When I comment about it then, I will refer to this version.

    [snip code]

    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique

    Comment

    • Roland Pibinger

      #3
      Re: A string collection abstract data type

      On Sun, 28 Oct 2007 20:52:11 +0100, jacob navia wrote:
      >Abstract:
      >Continuing the discussion about abstract data types, in this
      >discussion group, a string collection data type is presented,
      >patterned after the collection in C# and similar languages (Java).
      >It stores character strings, and resizes itself to accommodate
      >new strings when needed.
      A collection for strings certainly can be useful in many cases. I have
      two major objection to the current design:

      1. The function pointers are unnecessary. I can hardly imagine anyone
      wants to override/replace some of the container functions. OTOH, the
      function pointers make usage clumsy. One has to write:
      SC->lpVtbl->Remove(SC,"Jak ob");
      instead of just:
      Remove(SC,"Jako b");

      Actually, I don't see why the vtable is exposed to the user at all.
      Even if you insist on function pointers you can implement something
      like:

      static
      int Remove (StringCollecti on *SC, const char *str) {
      return SC->lpVtbl->Remove(SC, str);
      }

      2. Memory management:
      Strings in the container are managed by the container (via
      Finalize()). But in some cases the returned strings must be freed by
      the caller. A clear and unambiguous strategy would prevent memory
      leaks.


      --
      Roland Pibinger
      "The best software is simple, elegant, and full of drama" - Grady Booch

      Comment

      • jacob navia

        #4
        Re: A string collection abstract data type

        Roland Pibinger wrote:
        On Sun, 28 Oct 2007 20:52:11 +0100, jacob navia wrote:
        >Abstract:
        >Continuing the discussion about abstract data types, in this
        >discussion group, a string collection data type is presented,
        >patterned after the collection in C# and similar languages (Java).
        >It stores character strings, and resizes itself to accommodate
        >new strings when needed.
        >
        A collection for strings certainly can be useful in many cases. I have
        two major objection to the current design:
        >
        1. The function pointers are unnecessary. I can hardly imagine anyone
        wants to override/replace some of the container functions. OTOH, the
        function pointers make usage clumsy. One has to write:
        SC->lpVtbl->Remove(SC,"Jak ob");
        instead of just:
        Remove(SC,"Jako b");
        >
        This is certainly possible woth some macros.

        However, the key idea of the design is precisely to give several
        containers the SAME interface, so that if you decide to
        replace a string collection with a list, you can still have

        SC->lpVtbl->Remove(SC,"Jac ob");

        without changing a single line from your user code besides a few
        declarations.

        An added advantage is that it could very well exist some "Remove"
        function in a LOT of user code, so that we would be forced to
        use SOME kind of identifier prefix like

        CLC_LIST_Remove

        or similar anyway.

        Those macros are so trivial to write that they could be proposed but
        not be a forced part of the interface.
        Actually, I don't see why the vtable is exposed to the user at all.
        Even if you insist on function pointers you can implement something
        like:
        >
        static
        int Remove (StringCollecti on *SC, const char *str) {
        return SC->lpVtbl->Remove(SC, str);
        }

        The advantages of exposing the vtable is to give the user the freedom to
        change to behavior of the containers as he/she wishes WITHOUT changing
        a single line in the user code.

        For instance if you want to store only unique strings in the table,
        and keep it sorted at all times, you just change the table of
        functions and add the new functionality WITHOUT any change
        to all your code at all!!

        Besides, this gives the user the possibility of changing the behavior
        of only ONE instance or changing the behavior of ALL instances
        without any limitations with a VERY fast pointer assignment operation!

        >
        2. Memory management:
        Strings in the container are managed by the container (via
        Finalize()). But in some cases the returned strings must be freed by
        the caller. A clear and unambiguous strategy would prevent memory
        leaks.
        >
        >
        This is true, but I do not see any easy way out. When I return a string
        should I allocate a new copy and free the one in the container?

        But maybe there is a better solution. You have a valid point.


        --
        jacob navia
        jacob at jacob point remcomp point fr
        logiciels/informatique

        Comment

        Working...