pointers to rows in Burrows-Wheeler Transform

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • phil_w@uk-businessdirectory.co.uk

    pointers to rows in Burrows-Wheeler Transform

    To practice some C I have been looking at the Burrows-Wheeler
    Transform.

    Ive got a small program that builds the initial table of possible
    combinations of the original string, used to find the compressed
    result, but there are some notes about using BWT without having to
    build a table of strings, instead using pointers to the offset in the
    original and just sorting those.

    My question is how do you return a pointer to "bcda" from "abcd"

    I started by doing something like

    char * ptr[input_str];

    for i =0 ; i < len(input_str); i++) {
    ptr[i] = (input_str + i);
    }

    but then that only gives "abcd" -"bcd " when i = 1.
  • Szabolcs Borsanyi

    #2
    Re: pointers to rows in Burrows-Wheeler Transform

    On Wed, May 28, 2008 at 01:56:59AM -0700, phil_w@uk-businessdirecto ry.co.uk wrote:
    >To practice some C I have been looking at the Burrows-Wheeler Transform.
    [snip]
    char * ptr[input_str];
    Is input_str a pointer to char or a character array? Then this is unlikely
    to do what you might intend. I guess you wanted to write one plus
    the maximal expected length of the string. (Or the actual one, in C99)
    >
    for i =0 ; i < len(input_str); i++) {
    Is for a tricky macro, or you just missed a parethesis?
    Is len a synonym for strlen? I am just guessing...
    ptr[i] = (input_str + i);
    }
    >
    but then that only gives "abcd" -"bcd " when i = 1.
    I wonder how you got that space appended to the end of the string...
    By the way: "bcda" never occurs in the memory, so you have to write
    it somehow somewhere (either by modifying the original string, or
    doing it out of place), just pointer arithmetics won't do that.

    You may want to place two adjacent copies of the string in one
    character array. And then you may indeed sort the pointers where
    the comparison rule that you pass to qsort sensibly accesses the relevant
    pair of substrings and calls e.g. memcmp()

    Szabolcs

    Comment

    • Szabolcs Borsanyi

      #3
      Re: pointers to rows in Burrows-Wheeler Transform

      On Wed, May 28, 2008 at 01:56:59AM -0700, phil_w@uk-businessdirecto ry.co.uk wrote:
      >To practice some C I have been looking at the Burrows-Wheeler Transform.
      [snip]
      char * ptr[input_str];
      Is input_str a pointer to char or a character array? Then this is unlikely
      to do what you might intend. I guess you wanted to write one plus
      the maximal expected length of the string. (Or the actual one, in C99)
      >
      for i =0 ; i < len(input_str); i++) {
      Is for a tricky macro, or you just missed a parethesis?
      Is len a synonym for strlen? I am just guessing...
      ptr[i] = (input_str + i);
      }
      >
      but then that only gives "abcd" -"bcd " when i = 1.
      I wonder how you got that space appended to the end of the string...
      By the way: "bcda" never occurs in the memory, so you have to write
      it somehow somewhere (either by modifying the original string, or
      doing it out of place), just pointer arithmetics won't do that.

      You may want to place two adjacent copies of the string in one
      character array. And then you may indeed sort the pointers where
      the comparison rule that you pass to qsort sensibly accesses the relevant
      pair of substrings and calls e.g. memcmp()

      Szabolcs

      Comment

      • Szabolcs Borsanyi

        #4
        Re: pointers to rows in Burrows-Wheeler Transform

        Sorry for double posting. Szabolcs

        Comment

        • Jens Thoms Toerring

          #5
          Re: pointers to rows in Burrows-Wheeler Transform

          phil_w@uk-businessdirecto ry.co.uk wrote:
          To practice some C I have been looking at the Burrows-Wheeler
          Transform.
          Ive got a small program that builds the initial table of possible
          combinations of the original string, used to find the compressed
          result, but there are some notes about using BWT without having to
          build a table of strings, instead using pointers to the offset in the
          original and just sorting those.
          My question is how do you return a pointer to "bcda" from "abcd"
          Not directly.
          I started by doing something like
          char * ptr[input_str];
          I have no idea what 'input_str' is ment to be, you use it
          here as if it would be some integer value, but in the rest
          as if it would be a string. Perhaps you meant

          char (*p)[ strlen( input_str ) ];

          I will assume that for the rest.
          for i =0 ; i < len(input_str); i++) {
          I guess you meant 'strlen(input_s tr)', didn't you?
          ptr[i] = (input_str + i);
          }
          but then that only gives "abcd" -"bcd " when i = 1.
          You now have for the input string "abcd" four pointers which
          point to strings as in

          p[0] - "abcd"
          p[1] - "bcd"
          p[2] - "cd"
          p[3] - "d"

          Something like "bcd " can't occur (unless there's some other
          mistake in the code you're not showing) since there's no
          space in the input string.

          By using all combinations of these pointers (and only
          using the character pointed to directly, i.e. the first
          char of the strings), you can construct all possible com-
          binations of the original string successively.

          E.g. one possible combiattion could be constructed, by

          char r[ strlen( input_str ) + 1 ];

          r[0] = *p[1];
          r[1] = *p[2];
          r[2] = *p[d];
          r[3] = *p[0];
          r[4] = '\0';

          resulting in the 'r' holding the string "bcda". I guess
          that's about what you wrote above to intended to do.

          Regards, Jens
          --
          \ Jens Thoms Toerring ___ jt@toerring.de
          \______________ ____________ http://toerring.de

          Comment

          • Richard Heathfield

            #6
            Re: pointers to rows in Burrows-Wheeler Transform

            phil_w@uk-businessdirecto ry.co.uk said:
            To practice some C I have been looking at the Burrows-Wheeler
            Transform.
            >
            Ive got a small program that builds the initial table of possible
            combinations of the original string, used to find the compressed
            result, but there are some notes about using BWT without having to
            build a table of strings, instead using pointers to the offset in the
            original and just sorting those.
            >
            My question is how do you return a pointer to "bcda" from "abcd"
            >
            I started by doing something like
            >
            char * ptr[input_str];
            >
            for i =0 ; i < len(input_str); i++) {
            ptr[i] = (input_str + i);
            }
            >
            but then that only gives "abcd" -"bcd " when i = 1.
            This looks very shaky. Where is the memory for the strings? Do you "own"
            it, or do the pointers (for char *ptr[input_str] is an array of pointers,
            not an array of strings) point to string literals? If they do, modifying
            the strings invokes undefined behaviour - all you might usefully do is
            reorder the pointers.

            If you own memory for a string, you can "rotate" it like this:

            #include <string.h>

            void rotate_string_o ne_place(char *s, size_t len)
            {
            char t = *s;
            memmove(s, s + 1, len - 1);
            s[len - 1] = t;
            }

            /* sample driver */
            #include <stdio.h>
            #include <string.h/* not needed if you're doing this all in one file
            because it's already included above */
            int main(void)
            {
            char sample[] = "Yes, I own the memory for this string";
            size_t len = strlen(sample);
            size_t i;
            for(i = 0; i < len; i++)
            {
            rotate_string_o ne_place(sample , len);
            printf("%s\n", sample);
            }
            return 0;
            }

            Sample output:

            es, I own the memory for this stringY
            s, I own the memory for this stringYe
            , I own the memory for this stringYes
            I own the memory for this stringYes,
            I own the memory for this stringYes,
            own the memory for this stringYes, I
            own the memory for this stringYes, I
            wn the memory for this stringYes, I o
            n the memory for this stringYes, I ow
            the memory for this stringYes, I own
            the memory for this stringYes, I own
            he memory for this stringYes, I own t
            e memory for this stringYes, I own th
            memory for this stringYes, I own the
            memory for this stringYes, I own the
            emory for this stringYes, I own the m
            mory for this stringYes, I own the me
            ory for this stringYes, I own the mem
            ry for this stringYes, I own the memo
            y for this stringYes, I own the memor
            for this stringYes, I own the memory
            for this stringYes, I own the memory
            or this stringYes, I own the memory f
            r this stringYes, I own the memory fo
            this stringYes, I own the memory for
            this stringYes, I own the memory for
            his stringYes, I own the memory for t
            is stringYes, I own the memory for th
            s stringYes, I own the memory for thi
            stringYes, I own the memory for this
            stringYes, I own the memory for this
            tringYes, I own the memory for this s
            ringYes, I own the memory for this st
            ingYes, I own the memory for this str
            ngYes, I own the memory for this stri
            gYes, I own the memory for this strin
            Yes, I own the memory for this string


            --
            Richard Heathfield <http://www.cpax.org.uk >
            Email: -http://www. +rjh@
            Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
            "Usenet is a strange place" - dmr 29 July 1999

            Comment

            • Ben Bacarisse

              #7
              Re: pointers to rows in Burrows-Wheeler Transform

              phil_w@uk-businessdirecto ry.co.uk writes:
              To practice some C I have been looking at the Burrows-Wheeler
              Transform.
              <snip>
              there are some notes about using BWT without having to
              build a table of strings, instead using pointers to the offset in the
              original and just sorting those.
              Usually a good idea, yes.
              My question is how do you return a pointer to "bcda" from "abcd"
              You've have lots of C answers, so I won't repeat them, but there is
              one algorithm question that does impact on the C. You have to decide
              how to represent the "end marker" because this has to included in the
              sort. C gives you an obvious choice (the terminating null of the
              string) but if you go that route you must remember that the result of
              the BWT will (most often) not be a string, so care will be needed when
              do anything with it.

              --
              Ben.

              Comment

              • James Dow Allen

                #8
                Re: pointers to rows in Burrows-Wheeler Transform

                On May 28, 3:56 pm, phi...@uk-businessdirecto ry.co.uk wrote:
                To practice some C I have been looking at the Burrows-Wheeler
                Transform.
                >
                My question is how do you return a pointer to "bcda" from "abcd"
                Start by duplicating the input string, i.e. get
                "abcdabcd".
                The substrings won't be null-terminated but
                you won't need them to be.

                If you wish you can look at my old implementation
                Burrows-Wheeler Transform Programming example text compression

                though I see it has stylistic problems.

                James

                Comment

                Working...