3D Array of (Array of Strings) -- my malloc attempts fail miserably

  • Chris3020
    New Member
    • May 2022
    • 15


    I'm trying to make a sort-of-dictionary (don't think "hashmap" when you see that d-word, my dictionary is just simple lists of words in a 3D array)

    Code:
    #define MAX_WORDLENGTH 20
    
    struct dict{
    	int wordcount;
    	char **words;
    };
    
    struct dict dicts[MAX_WORDLENGTH + 1]; // +1 to get an index for MAX_WORDLENGTH
    e.g. dicts[7] is the struct dict for 7-letter words (7 excludes \0).
    dicts[0] .. [2] will always be unused, and depending on the problem-instance, some (or many) other dicts[N] may also be unused.

    After some preparatory work (examine problem-instance to learn necessary word-lengths, read words of necessary lengths from disk through a series of regex-like filters to get word-count for each word-length), I'm ready to malloc.

    Imagine 123 words of 7-letters.
    What I want to be able to do is iterate through those 7-letter words: i.e. use something like
    Code:
    dicts[7].words[5]
    to set/get the 6th 7-letter word with strncpy().

    Here is just one of my 99 failed attempts:

    Code:
    printf(" .words: %p before malloc\n", dicts[wordlen].words);
    // .words: 0x0 before malloc
    
    int wordlen = 7;
    char (*p)[wordlen+1]; // +1 for \0
    dicts[wordlen].words = malloc(dicts[wordlen].wordcount * sizeof *p);
    
    printf(" .words: %p after malloc\n", dicts[wordlen].words);
    // .words: 0x14c606690 after malloc
    
    printf(" .words[5]: %p after malloc\n", dicts[wordlen].words[5]);
    // .words[5]: 0x0 after malloc
    ...malloc did something, but certainly not what I wanted it to do :(

    Tweaking around with variations **, *, & juggles segfaults/errors/warnings, but nothing I've tried has actually worked!

    I've got my pointers in a twist: please help.
    Chris
  • dev7060
    Recognized Expert Contributor
    • Mar 2017
    • 656

    #2
    Code:
    printf(" .words: %p before malloc\n", dicts[wordlen].words);
    // .words: 0x0 before malloc
     
    int wordlen = 7;
    char (*p)[wordlen+1]; // +1 for \0
    dicts[wordlen].words = malloc(dicts[wordlen].wordcount * sizeof *p);
     
    printf(" .words: %p after malloc\n", dicts[wordlen].words);
    // .words: 0x14c606690 after malloc
     
    printf(" .words[5]: %p after malloc\n", dicts[wordlen].words[5]);
    // .words[5]: 0x0 after malloc
    How does it point to the addresses of the words?

    Imagine 123 words of 7-letters.
    What I want to be able to do is iterate through those 7-letter words: i.e. use something like
    Code:
    dicts[7].words[5]
    to set/get the 6th 7-letter word with strncpy().
    Code:
    dicts[5].wordcount = 3;
    
    dicts[5].words    = malloc(dicts[5].wordcount * sizeof(char *));
    dicts[5].words[0] = malloc(6 * sizeof(char));
    dicts[5].words[1] = malloc(6 * sizeof(char));
    dicts[5].words[2] = malloc(6 * sizeof(char));
    
    strncpy(dicts[5].words[0], "sieve", 6); // 6 = 5 letters + the terminating \0
    strncpy(dicts[5].words[1], "mango", 6);
    strncpy(dicts[5].words[2], "hello", 6);
    Check every malloc result for NULL before using it, in case the allocation fails.
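
A minimal sketch of the same per-word allocation with the NULL checks filled in and a loop instead of one line per word. It assumes the struct dict from post #1; the helper names dict_alloc_words and dict_free_words are made up for illustration and are not from the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dict {
    int wordcount;
    char **words;
};

/* Hypothetical helper: room for `count` words of `wordlen` letters each
 * (plus one byte for '\0'). Returns 0 on success, -1 if any malloc fails. */
static int dict_alloc_words(struct dict *d, int count, int wordlen)
{
    assert(d != NULL && count > 0 && wordlen > 0);
    d->words = malloc((size_t)count * sizeof *d->words);
    if (d->words == NULL)
        return -1;
    for (int i = 0; i < count; ++i) {
        /* calloc zeroes the buffer, so each word is always terminated */
        d->words[i] = calloc((size_t)wordlen + 1, 1);
        if (d->words[i] == NULL) {
            while (i-- > 0)              /* undo the allocations made so far */
                free(d->words[i]);
            free(d->words);
            d->words = NULL;
            return -1;
        }
    }
    d->wordcount = count;
    return 0;
}

/* Hypothetical helper: release everything dict_alloc_words() set up. */
static void dict_free_words(struct dict *d)
{
    for (int i = 0; i < d->wordcount; ++i)
        free(d->words[i]);
    free(d->words);
    d->words = NULL;
    d->wordcount = 0;
}
```

With this in place, something like dict_alloc_words(&dicts[5], 3, 5) followed by strncpy(dicts[5].words[0], "sieve", 5) leaves a properly terminated string, because the sixth byte was already zero.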


    • Chris3020
      New Member
      • May 2022
      • 15

      #3
      Thanks for the suggestion ...sorry, not tested yet - but it makes sense.
      Will experiment tomorrow morning.

      Is there no way to get this done with a single malloc per dicts[n] ?
      ...many thousands of malloc() calls seem less "efficient".

      All this happens in a "setup-phase", so it doesn't need to be fast ...run-times for the application that uses dicts[] can be seconds to hours depending on the problem instance, so a few seconds here or there at setup is totally irrelevant.

      The client application will pull some number (a tweak not yet defined, but somewhere around 10...15) of x-length words from dicts[x]: again thinking efficiency, a single malloc() might aid locality for cache purposes.

      Chris

      Below is a compilable single malloc() approach that DOES NOT WORK.

      Code:
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      
      #define MAX_WORDLEN 10	// maximum word-length in test ( sans \0 )	
      
      struct dict{
      	int wordcnt;	// count of words in dict
      	int nexti;	// reserved
      	char *words;	// wannabe array of fixed-length strings
      };
      
      struct dict dicts[MAX_WORDLEN + 1];	// +1 to get index MAX_WORDLEN
      
      int main(void) {
      	// just a mock-up with: fake word-reads from file; magic-numbers; no free;
      	// no check on malloc success; and the whole mess in-line in main().
      
      	char *read50 = "ALPHA"; // some faked linebuffers from file read
      	char *read51 = "BRAVO";
      	char *read52 = "CATCH";
      	char *read53 = "DRINK";
      	char *read70 = "EXTINCT";
      	char *read71 = "FLAVOUR";
      
      	int wordlen;	// length ( sans \0 ) of word being handled
      	int wordcnt;	// number of word of length wordlen
      	
      	// do stuff...that provides:
      	wordlen = 5;
      	wordcnt = 4; // ...the fake Linebuffers read5n
      	
      	char *aword[wordlen + 1];
      	dicts[wordlen].words = malloc(wordcnt * sizeof(*aword));
      	// populate using the faked linebuffers:	
      	strncpy(&dicts[wordlen].words[0], read50, wordlen+1);
      	strncpy(&dicts[wordlen].words[1], read51, wordlen+1);
      	strncpy(&dicts[wordlen].words[2], read52, wordlen+1);
      	strncpy(&dicts[wordlen].words[3], read53, wordlen+1);
      	
      	// do more stuff...that provides:
      	wordlen = 7;
      	wordcnt = 2; // ...the fake linebuffers read7n
      	
      	char *bword[wordlen + 1];
      	dicts[wordlen].words = malloc(wordcnt * sizeof(*bword));
      	// populate using the faked linebuffers
      	strncpy(&dicts[wordlen].words[0], read70, wordlen);
      	strncpy(&dicts[wordlen].words[1], read71, wordlen);
      	
      	puts(" what happens?"); // using magic-numbers
      	for (int i = 0; i <  4; ++i) {
      		printf(" &dicts[5].words[%d]: %s\n", i, &dicts[5].words[i]);
      	}
      	for (int i = 0; i < 2; ++i) {
      		printf(" &dicts[7].words[%d]: %s\n", i, &dicts[7].words[i]);
      	}
      		
      	return 0;
      }
      /* OUTPUT:
       what happens?
       &dicts[5].words[0]: ABCDRINK
       &dicts[5].words[1]: BCDRINK
       &dicts[5].words[2]: CDRINK
       &dicts[5].words[3]: DRINK
       &dicts[7].words[0]: EFLAVOUR
       &dicts[7].words[1]: FLAVOUR
      */
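
The output above follows from pointer arithmetic on a char *: words[i] is a single char, so &words[1] is only one byte past &words[0], and each strncpy overwrites all but the first letter of the previous word. A minimal sketch of the missing stride, assuming each word occupies wordlen + 1 bytes in one block; word_at and alloc_block are hypothetical names, not from the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: address of word number i in a flat block of
 * fixed-length words, each occupying wordlen + 1 bytes (letters plus '\0'). */
static char *word_at(char *block, int wordlen, int i)
{
    assert(block != NULL && wordlen > 0 && i >= 0);
    return block + (size_t)i * ((size_t)wordlen + 1);
}

/* Hypothetical helper: one allocation for `wordcnt` words. calloc zeroes the
 * block, so every slot is already terminated. Returns NULL on failure. */
static char *alloc_block(int wordcnt, int wordlen)
{
    assert(wordcnt > 0 && wordlen > 0);
    return calloc((size_t)wordcnt, (size_t)wordlen + 1);
}
```

With these, strncpy(word_at(block, wordlen, 1), read51, wordlen) lands six bytes after word 0 for wordlen 5, and printf("%s", word_at(block, wordlen, i)) prints one word at a time.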


      • dev7060
        Recognized Expert Contributor
        • Mar 2017
        • 656

        #4
        Allocating large chunks to limit malloc calls

        1)
        Code:
        struct dict{
            int wordcount;
            char **words;
        };
        Code:
        char * mylist;
        
        dicts[5].wordcount = 3;
        dicts[5].words = malloc(dicts[5].wordcount * sizeof(char * ));
        
        mylist = malloc(sizeof(char) * 6 * dicts[5].wordcount);
        
        dicts[5].words[0] = mylist + 0;
        dicts[5].words[1] = mylist + 6;
        dicts[5].words[2] = mylist + 12;
        
        strncpy(dicts[5].words[0], "sieve", 6); // 6 = 5 letters + the terminating \0
        strncpy(dicts[5].words[1], "mango", 6);
        strncpy(dicts[5].words[2], "hello", 6);

        2)
        Code:
        struct dict{
            int wordcount;
            char *words;
        };
        Code:
        dicts[5].wordcount = 3;
        dicts[5].words = malloc(sizeof(char) * 6 * dicts[5].wordcount);
        
        strncpy(dicts[5].words + 0, "sieve", 6); // 6 = 5 letters + the terminating \0
        strncpy(dicts[5].words + 6, "mango", 6);
        strncpy(dicts[5].words + 12, "hello", 6);
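
Variant 1) generalises to any word count by computing the offsets in a loop. This is a sketch only, assuming the char **words struct shown under 1); dict_alloc_two and dict_free_two are illustrative names, not from the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dict {
    int wordcount;
    char **words;
};

/* Hypothetical helper: two allocations in total, one for the pointer table
 * and one for the letters; words[i] points into the letter block.
 * Returns 0 on success, -1 on allocation failure. */
static int dict_alloc_two(struct dict *d, int count, int wordlen)
{
    assert(d != NULL && count > 0 && wordlen > 0);
    size_t stride = (size_t)wordlen + 1;            /* letters + '\0' */
    char **table = malloc((size_t)count * sizeof *table);
    char *block = calloc((size_t)count, stride);    /* zeroed letter block */
    if (table == NULL || block == NULL) {
        free(table);
        free(block);
        return -1;
    }
    for (int i = 0; i < count; ++i)
        table[i] = block + (size_t)i * stride;      /* 0, 6, 12, ... for wordlen 5 */
    d->words = table;
    d->wordcount = count;
    return 0;
}

/* Hypothetical helper: words[0] is the start of the letter block, so
 * freeing it releases all the words at once. */
static void dict_free_two(struct dict *d)
{
    if (d->words != NULL)
        free(d->words[0]);
    free(d->words);
    d->words = NULL;
    d->wordcount = 0;
}
```

The words stay contiguous in memory, as in 2), while dicts[5].words[i] still reads like an ordinary array of strings.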


        • Chris3020
          New Member
          • May 2022
          • 15

          #5
          Mmmmmm interesting!
          **words just has to be more correct in the struct: I have dozens of variations using ** (and hundreds of errors/warnings to go with them).
          I will study (and experiment with) your stuff over the weekend.
          Thank you!
          Chris

          My latest effort is below. It still uses *words in the struct, it compiles, it even works (...but it is not pretty):
          Code:
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          
          #define MAX_WORDLEN 10	// maximum word-length in test ( sans \0 )	
          
          struct dict{
          	int wordcnt;	// count of words in dict
          	int nexti;		// reserved
          	char *words;	// wannabe array of fixed-length strings
          };
          
          struct dict dicts[MAX_WORDLEN + 1];	// +1 to get index MAX_WORDLEN
          
          int main(void) {
          	// just a mock-up with: fake word-reads from file; magic-numbers; no free;
          	// no check on malloc success; and the whole mess in-line in main().
          
          	char *read50 = "ALPHA"; // some faked linebuffers from file read
          	char *read51 = "BRAVO";
          	char *read52 = "CATCH";
          	char *read53 = "DRINK";
          	char *read70 = "EXTINCT";
          	char *read71 = "FLAVOUR";
          
          	int wordlen;	// length ( sans \0 ) of word being handled
          	int wordcnt;	// number of word of length wordlen
          	
          	// do stuff...that provides:
          	wordlen = 5;
          	wordcnt = 4; // ...the fake linebuffers read5n
          	
          	// drop the terminating \0 from the stored no-longer-strings
          	// +	makes the dictionary smaller
          	// -	makes printing the "string" more complicated (don't need print!)
          	dicts[wordlen].words = malloc(wordcnt * wordlen);
          	// populate using the faked linebuffers:	
          	strncpy(&dicts[wordlen].words[0*wordlen], read50, wordlen);
          	strncpy(&dicts[wordlen].words[1*wordlen], read51, wordlen);
          	strncpy(&dicts[wordlen].words[2*wordlen], read52, wordlen);
          	strncpy(&dicts[wordlen].words[3*wordlen], read53, wordlen);
          	
          	// do more stuff...that provides:
          	wordlen = 7;
          	wordcnt = 2; // ...the fake linebuffers read7n
          	
          	// same layout as the 5-letter case: wordlen chars per word, no \0
          	dicts[wordlen].words = malloc(wordcnt * wordlen);
          	// populate using the faked linebuffers
          	strncpy(&dicts[wordlen].words[0*wordlen], read70, wordlen);
          	strncpy(&dicts[wordlen].words[1*wordlen], read71, wordlen);
          	
          	puts(" what happens?");
          	// any eventual print function would have a fixed buffer name, but because
          	// we have the whole mess in-line in main(), test needs two buff names.
          	// outside of development, no use for printing words from dicts!
          	puts("\n wordlen: 5");
          	wordlen = 5;
          	wordcnt = 4;
          	char buff5[wordlen+1]; // in a print function (if wanted), name is fixed
          	buff5[wordlen] = '\0'; // strncpy below copies no terminator: set it once
          	for (int i = 0; i < wordcnt; ++i) {
          		strncpy(buff5, &dicts[wordlen].words[i*wordlen], wordlen);
          		printf(" &dicts[wordlen].words[%d*wordlen]: %s\n", i, buff5);
          	}
          	
          	puts("\n wordlen: 7");
          	wordlen = 7;
          	wordcnt = 2;
          	char buff7[wordlen+1]; // print function (if wanted) has fixed name
          	buff7[wordlen] = '\0'; // strncpy below copies no terminator: set it once
          	for (int i = 0; i < wordcnt; ++i) {
          		strncpy(buff7, &dicts[wordlen].words[i*wordlen], wordlen);
          		printf(" &dicts[wordlen].words[%d*wordlen]: %s\n", i, buff7);
          	}
          		
          	return 0;
          }
          /* OUTPUT:
           what happens?
          
           wordlen: 5
           &dicts[wordlen].words[0*wordlen]: ALPHA
           &dicts[wordlen].words[1*wordlen]: BRAVO
           &dicts[wordlen].words[2*wordlen]: CATCH
           &dicts[wordlen].words[3*wordlen]: DRINK
          
           wordlen: 7
           &dicts[wordlen].words[0*wordlen]: EXTINCT
           &dicts[wordlen].words[1*wordlen]: FLAVOUR
          */


          • Chris3020
            New Member
            • May 2022
            • 15

            #6
            I went with 2)
            It was very close to my "latest effort" above.

            I didn't like the extra 8 bytes per word in dicts[] of 1), though it was "pretty".
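
The size difference between the two layouts can be worked out directly. This is a sketch under the assumption that 1) stores a pointer table plus terminated words and 2) stores letters only, as in the "latest effort" above; the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes used by layout 1): one char * per word plus wordlen + 1 chars each. */
static size_t bytes_layout1(size_t wordcnt, size_t wordlen)
{
    assert(wordlen > 0);
    return wordcnt * sizeof(char *) + wordcnt * (wordlen + 1);
}

/* Bytes used by layout 2) without terminators: letters only, no table. */
static size_t bytes_layout2(size_t wordcnt, size_t wordlen)
{
    assert(wordlen > 0);
    return wordcnt * wordlen;
}
```

For the 123 seven-letter words from post #1, on a platform with 8-byte pointers, that is 123 * 8 + 123 * 8 = 1968 bytes for 1) against 123 * 7 = 861 bytes for 2).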

            Thanks for all your suggestions, Chris

            Current test rig looks like this:

            Code:
            #include <stdio.h>
            #include <stdlib.h>
            #include <string.h>
            
            #define MAX_WORDLEN 10 // maximum word-length in test ( sans \0 )   
            
            struct dict{
              unsigned wordcnt;  // count of words in dict
              unsigned next;     // put: next empty slot / get: next unread
              char *words;       // fixed-length array of chars ( sans \0 !!not strings)
            };
            
            static struct dict dicts[MAX_WORDLEN + 1];  // +1 to have index MAX_WORDLEN
            
            static void dict_allocate(unsigned wordlen,  // char-count ( sans \0 )
                                      unsigned wordcnt   // count of wordlen-letter words
                                     ) {
              dicts[wordlen].words = malloc(wordlen * wordcnt);
              if ( dicts[wordlen].words == NULL ) {
                puts(" ERROR, dictionary memory allocation failed");
                printf(" for %u words of %u-letters ...will exit\n", wordcnt, wordlen);
                exit(EXIT_FAILURE);
              }
              dicts[wordlen].wordcnt = wordcnt; // remains 0 if malloc fail
            }
            
            static void dict_putword( // add word at .next index in .words
                                     unsigned wordlen, // char-count ( sans \0 )
                                     char *word 
                                    ) {
              if ( dicts[wordlen].next >= dicts[wordlen].wordcnt ) {
                puts(" ERROR, attempted putword beyond upper-bound");
                printf(" of %u-letter dictionary ...will exit\n", wordlen);
                exit(EXIT_FAILURE);
              }
              strncpy(&dicts[wordlen].words[dicts[wordlen].next*wordlen], word, wordlen);
              dicts[wordlen].next++;
            }
            
            static void dict_restart_all(void) {
              for (int i = 0; i < MAX_WORDLEN + 1; ++i) {
                dicts[i].next = 0;
              }
            }
            
            static void dict_free(void) {
              for (int i = 0; i < MAX_WORDLEN; ++i) {
                if ( dicts[i].wordcnt ) {
                  free(dicts[i].words);
                  dicts[i].words = NULL;
                  printf("DEV\tfreed %d\n", i);
                }
              }
            }
            
            //////////////////////////// external ////////////////////////////
            
            int dict_getword( // return .next word in caller's buffer
                             unsigned wordlen, // char-count ( sans \0 )
                             char *buff        // buffer for wordlen chars (sans \0)
                            ) {
              if ( dicts[wordlen].next >= dicts[wordlen].wordcnt ) {
                return 0;   // no word got: we hit upper-bound
                            // caller decides to dict_restart_dict()   ...or not
              }
              strncpy(buff, &dicts[wordlen].words[dicts[wordlen].next*wordlen], wordlen);
              dicts[wordlen].next++;
              return 1;
            } // returns 1: word in buff  or  0: empty buff (hit upper bound of dict)
            
            int dict_getnwords( // does NOT wrap around to 0 at upper-bound
                               unsigned wordlen, // char-count ( sans \0 )
                               unsigned start,   // index of first word to get
                               unsigned count,   // desired number of words 
                               char *buff        // caller's buffer count*wordlen char   
                              ) {
              unsigned got = 0;   // count of words copied to buffer = next buffer index               
              puts(" dict_getnwords() NOT YET IMPLEMENTED");
              return -1;
            
              return got;
            } // returns 0...count = the number of words copied to buff, -ve is error
            
            void dict_restart_dict(unsigned wordlen) {
              dicts[wordlen].next = 0;
            }
            
            void dict_printdict(unsigned wordlen) { // DEV purposes only
              char buff[wordlen + 1];
              buff[wordlen] = '\0'; // stored words carry no terminator, so add one here
              for (unsigned i = 0; i < dicts[wordlen].wordcnt; i++) {
                strncpy(buff, &dicts[wordlen].words[i*wordlen], wordlen);
                printf("\t%u: %s\n", i, buff);
              }
            }
            
            int main(void) {
            
              char *read50 = "ALPHA"; // some faked "linebuffers from file read"
              char *read51 = "BRAVO";
              char *read52 = "CATCH";
              char *read53 = "DRINK";
              char *read70 = "EXTINCT";
              char *read71 = "FLAVOUR";
            
              unsigned wordlen;  // length ( sans \0 ) of word being handled
              unsigned wordcnt;  // number of words of length wordlen
               
              // do stuff...that provides:
              wordlen = 5;
              wordcnt = 4; // ...the fake linebuffers read5n
            
              dict_allocate(wordlen, wordcnt);
              // populate using the fake linebuffers:
              dict_putword(wordlen, read50);
              dict_putword(wordlen, read51);
              dict_putword(wordlen, read52);
              dict_putword(wordlen, read53);
               
              // do more stuff...that provides:
              wordlen = 7;
              wordcnt = 2; // ...the fake linebuffers read7n
               
              dict_allocate(wordlen, wordcnt);
              // populate using the fake linebuffers
              dict_putword(wordlen, read70);
              dict_putword(wordlen, read71);
               
              // done writing dicts. Time to re-init all .next for the reading-phase
              dict_restart_all();
               
              puts(" dictionary contents:");
            
              wordlen = 5;
              printf(" wordlen = %u\n", wordlen);
              dict_printdict(wordlen);
                
              wordlen = 7;
              printf(" wordlen = %u\n", wordlen);
              dict_printdict(wordlen);
                
              puts("\n 5-letter words from dict_getword() ...and exceed upper-bound:");
              wordlen = 5;
              char gotbuf[wordlen + 1];
              gotbuf[wordlen] = '\0'; // dict_getword() copies letters only
              for (int i = 0; i < 10; ++i) { // tries to go out-of-bounds
                if ( !dict_getword(wordlen, gotbuf) ) {
                  puts("DEV\texceeded upper-bound of dict!");
                  break;
                }
                printf(" %d: %s\n", i, gotbuf);
              }
                
              puts("\n try free():");
              dict_free();
                
              puts("\n done!\n");
              return 0;
            }
            
            /* OUTPUT:
             dictionary contents:
             wordlen = 5
              0: ALPHA
              1: BRAVO
              2: CATCH
              3: DRINK
             wordlen = 7
              0: EXTINCT
              1: FLAVOUR
            
             5-letter words from dict_getword() ...and exceed upper-bound:
             0: ALPHA
             1: BRAVO
             2: CATCH
             3: DRINK
            DEV  exceeded upper-bound of dict!
            
             try free():
            DEV  freed 5
            DEV  freed 7
            
             done!
               
            */
            Last edited by zmbd; May 8 '22, 03:16 PM. Reason: [Chris3020[Reason: kill all the wretched <tab>s]] [Z{had a go at it - removed another 30 tabs, not sure about any missing _ }]
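
The test rig leaves dict_getnwords() unimplemented. One possible body, given purely as a sketch: it assumes the same struct dict and dicts[] as the rig (repeated here so the block stands alone) and assumes the caller's buffer holds at least count * wordlen chars. The -1 return for a bad wordlen is also an assumption, not something the thread specifies.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDLEN 10

struct dict {
    unsigned wordcnt;   /* count of words in dict */
    unsigned next;      /* put: next empty slot / get: next unread */
    char *words;        /* wordcnt * wordlen chars, no '\0' */
};

static struct dict dicts[MAX_WORDLEN + 1];

/* Copy up to `count` words, starting at index `start`, into buff.
 * Stops at the upper bound (no wrap-around). Returns the number of words
 * copied, or -1 if wordlen is out of range. */
int dict_getnwords(unsigned wordlen, unsigned start, unsigned count, char *buff)
{
    assert(buff != NULL);
    if (wordlen == 0 || wordlen > MAX_WORDLEN)
        return -1;
    const struct dict *d = &dicts[wordlen];
    if (d->words == NULL || start >= d->wordcnt)
        return 0;                                   /* nothing to copy */
    unsigned avail = d->wordcnt - start;            /* words left from start */
    unsigned got = count < avail ? count : avail;
    /* words are contiguous, so one memcpy moves all of them */
    memcpy(buff, d->words + (size_t)start * wordlen, (size_t)got * wordlen);
    return (int)got;
}
```

Because the words sit back to back with no terminators, the copied run in buff has the same layout as the dictionary itself.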


            • Chris3020
              New Member
              • May 2022
              • 15

              #7
              mmm; straight copy/paste code from BBEdit is __NOT__ a good idea: the <tab>s went crazy. Sorry!
              ...will try to edit!


              • Chris3020
                New Member
                • May 2022
                • 15

                #8
                Apparently I can't edit stuff inside code ... /code tags.
                Ho hum!

                Chris


                • zmbd
                  Recognized Expert Moderator Expert
                  • Mar 2012
                  • 5501

                  #9
                  Chris,
                  You can edit within the [CODE] [/CODE] tags; however, it takes a bit of extra effort! 👾
                  >As you discovered [Tabs] do not work

                  What I have found easier for myself is to pull the text over into something like Notepad++, do a replace on all of the tabs (\t) with several spaces, and then cut and paste the revised block back into the post.

                  Table layouts are more difficult within the [CODE] [/CODE] tags; it usually takes me a few extra tries to get things to line up nicely.

                  Please try to edit the code again... if you continue to have issues let one of us know what you're trying to do and we'll see if we can help you out.


                  • Chris3020
                    New Member
                    • May 2022
                    • 15

                    #10
                    Thanks zmbd.
                    Had a go at fixing the <tab> issue.
                    Chris

                    ...just noticed: code seems to have dropped some underscores! between copy and paste.
                    ...ah no! some underscores don't show, but copy from posted code to text-editor and they are back again.
                    [#r!§
                    Last edited by Chris3020; May 8 '22, 09:39 AM. Reason: "disappeared" underscores! ...are actually (invisibly) there.


                    • zmbd
                      Recognized Expert Moderator Expert
                      • Mar 2012
                      • 5501

                      #11
                      Originally posted by chris3020
                      @chris3020 Had a go at fixing the <tab> issue. [...] ...just noticed: Code seems to have dropped some underscores! Between copy and paste.
                      How absolutely frustrating.... sorry that's happening.
                      I had a go at the editor myself to see if there was anything I could do... purged another 30ish [Tab] characters from the text; however, comparing what shows online to what I had in the text editor didn't show any obvious differences. Please take a moment to see if the underscores are still missing; if so, maybe give me the line numbers from the margin, please.
                      Last edited by zmbd; May 8 '22, 03:34 PM.


                      • Chris3020
                        New Member
                        • May 2022
                        • 15

                        #12
                        !!ASHAMED!! that you found 30 residual <tab>s ...I went through line-by-line, manually replacing <tab> with <space><space>.
                        I was using Mac TextEdit for this exercise (an app that I don't normally use, so I didn't try auto find/replace).
                        My only hope for salvation is that those errant <tab> were between meaningful char and \n.

                        !!BAFFLED!! by the underscores.
                        Looking again now, they are all there, but earlier they were not (evidence in .png but no way to post .png here without revealing to *.world the server concerned).

                        Even when the underscores were NOT visible chez vous in my browser, copy/paste to a text editor happily revealed the underscores - they were there!

                        Chris


                        • Chris3020
                          New Member
                          • May 2022
                          • 15

                          #13
                          Bah!
                          It just happened again:

                          open page: missing some (not all) underscores!
                          refresh page: hey ho, underscores are back!
                          eh?

                          Is it me?
                          MacBook Air (M1 2020)
                          MacOS BigSur 11.6.5
                          Safari 15.4 (16613.1.17.1.1 3, 16613)
                          // this is not the appropriate forum in which to discuss the most absurd version nomenclature on the planet!

                          Or you?

                          Chris


                          • Chris3020
                            New Member
                            • May 2022
                            • 15

                            #14
                            silly error:

                            Code:
                            static void dict_free(void) {
                              for (int i = 0; i < MAX_WORDLEN; ++i)...  // should be:  i < MAX_WORDLEN + 1;
                            Last edited by zmbd; May 8 '22, 10:30 PM.
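
One way to make this kind of off-by-one harder to repeat is to take the loop bound from the array itself instead of retyping MAX_WORDLEN + 1. A sketch, assuming the dicts[] declaration from the test rig; DICT_SLOTS and dict_count_used are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORDLEN 10

struct dict {
    unsigned wordcnt;
    unsigned next;
    char *words;
};

static struct dict dicts[MAX_WORDLEN + 1];

/* Element count derived from the array, so it follows the declaration. */
#define DICT_SLOTS (sizeof dicts / sizeof dicts[0])

/* Hypothetical helper: visit every slot, including index MAX_WORDLEN,
 * and count the dictionaries that hold at least one word. */
static size_t dict_count_used(void)
{
    assert(DICT_SLOTS == MAX_WORDLEN + 1);
    size_t used = 0;
    for (size_t i = 0; i < DICT_SLOTS; ++i) {
        if (dicts[i].wordcnt != 0)
            ++used;
    }
    return used;
}
```

The same bound works for dict_free() and dict_restart_all(), so both loops stay in step if the array size ever changes.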


                            • zmbd
                              Recognized Expert Moderator Expert
                              • Mar 2012
                              • 5501

                              #15
                              Originally posted by Chris3020
                              @Chris3020{Safari 15.4 (16613.1.17.1.1 3, 16613)}
                              My guess... it's a Safari thing - doesn't seem to be happening in Edge, Firefox, nor Chrome
                              Of course, the Edge version I'm using has Chrome as the core engine... (shrug)
                              You can try to use Firefox; however, from what I understand, it uses the Safari core engine...
                              At this point - I think we leave it as is :)

                              Originally posted by Chris3020
                              {silly error:}
                              It's the little things that will kill you
                              I was working on some CSS and my body wouldn't update: "varBodyBg" is not the same as "VarBodyBg". It took me ages to figure out that one mistake,
                              and that's why we're here - that second (third, ... thousandth) set of eyes.
