Splitting a long string

  • Sheriffo
    New Member
    • Dec 2010
    • 3

    Hi All,
    I have a string of 1500 or more characters. I want to go through this string and split it at the 160th character, but it should not split in the middle of a word. If the 160th character is not the end of a word, it should go back to the next word and split from there. My code is below, but it's not working well.

    Code:
    sub get_horoscope {
        my($dbh,$thetype, $hname, $phone)=@_;
        my($val,$sth,$rec,$horoscope,$subset,$i);
        $sth=$dbh->prepare("SELECT ? FROM horoscope_table WHERE `horoscope_name`  = '$hname'");
        my($more)=$dbh->prepare("INSERT INTO chat_outbox (`sender`, `phone`,`text`,`insertdate`) VALUES ('466',TRIM(?),TRIM(?),NOW())");
        $sth->execute($thetype);
        $horoscope="";
        my(@types);
        my($start, $end) = (0, 160);
        my $set = length($rec->{'$thetype'}) % 160;
        while($rec=$sth->fetchrow_hashref()) {
        if(length($rec->{'$thetype'}) > 160){
            for($i = 0; $i <= $set; $i++){
                $subset = substr($rec->{'$thetype'}, $start, $end);
                $start+=160;
                $end+=160;
                $more->execute($subset);
                }else{
                    $horoscope = "$horoscope\n$rec->{'$thetype'}";  
                }    
            }
        }
        $sth->finish();
        syslog('info',__LINE__."::[get_horoscope] $horoscope");
        return ($horoscope);
    }
  • numberwhun
    Recognized Expert Moderator Specialist
    • May 2007
    • 3467

    #2
    Would you be able to provide a sample of the data with which you are working? That way we can have something to work with that you are also using.

    Regards,

    Jeff


    • toolic
      Recognized Expert New Member
      • Sep 2009
      • 70

      #3
      Text::Wrap
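For anyone who wants to see what that looks like in practice, here is a minimal sketch, assuming the text contains no newlines and that 160 is the desired maximum. The sample text is made up for illustration. According to the Text::Wrap documentation, every resulting line is at most `$Text::Wrap::columns - 1` characters long, hence the 161.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Lines produced by wrap() are at most $columns - 1 characters long,
# so 161 gives pieces of up to 160 characters.
$Text::Wrap::columns  = 161;
# Keep spaces as spaces instead of converting runs of them to tabs.
$Text::Wrap::unexpand = 0;

# Made-up sample text of roughly 2000 characters (an assumption for this sketch).
my $text = join ' ', ('word') x 400;

# No indentation for the first line or for the following lines.
my $wrapped = wrap('', '', $text);

# wrap() returns one string with newlines at the break points.
my @chunks = split /\n/, $wrapped;

printf "%d chunks, longest is %d characters\n",
    scalar(@chunks),
    (sort { $b <=> $a } map { length } @chunks)[0];
```

Note that Text::Wrap drops the whitespace at each break point, so the pieces have to be joined with a space to get the original text back.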


      • raja goutham
        New Member
        • Dec 2010
        • 1

        #4
        Code:
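        use strict;
        use warnings;

        my $i = 22;
        my $String = "A Computer is an Electronic Device";
        # Move forward to the next space, but stop at the end of the string
        until ($i >= length($String) || substr($String, $i, 1) eq " ") {
            $i++;
        }
        print substr($String, 0, $i);
        
        # Here the 22nd character is the 't' (or the 'r', counting from zero) in the word "Electronic", but the whole "A Computer is an Electronic" is printed as output. Just try it.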
        Last edited by numberwhun; Dec 7 '10, 03:55 PM. Reason: Emails in post are a violation of site policy. Also, PLEASE USE CODE TAGS!!
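The loop above moves forward to the next space, so the piece can end up longer than the limit. If the piece must never exceed the limit, one option is to search backward instead. The following is only a sketch: `split_at_words` is a hypothetical helper name, and it assumes that words are separated by plain spaces and that a word longer than the limit may be cut in the middle.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into pieces of at most $max characters,
# breaking at the last space at or before the limit.
sub split_at_words {
    my ($text, $max) = @_;
    my @pieces;
    while (length($text) > $max) {
        # Position of the last space at or before index $max.
        my $cut = rindex($text, ' ', $max);
        if ($cut <= 0) {
            # No usable space: cut inside the word.
            push @pieces, substr($text, 0, $max, '');
        } else {
            # Take everything before the space and remove it from $text.
            push @pieces, substr($text, 0, $cut, '');
            $text =~ s/^ +//;    # drop the space(s) at the break
        }
    }
    push @pieces, $text if length $text;
    return @pieces;
}

my @parts = split_at_words("A Computer is an Electronic Device", 22);
print "$_\n" for @parts;
# prints:
# A Computer is an
# Electronic Device
```

With a limit of 160, each piece could then be passed to the INSERT statement one at a time.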


        • Sheriffo
          New Member
          • Dec 2010
          • 3

          #5
          Thanks for that, I really appreciate your efforts.


          • chaarmann
            Recognized Expert Contributor
            • Nov 2007
            • 785

            #6
            You can do it with a regular expression.
            I have developed the following regular expression and used it successfully in Java (I haven't tried it in Perl yet, but it should work there too). The regular expression got a bit complicated because I wanted it to work for any line length (not only 160) and for any text (including empty text, short text, and text with words that are longer than a line), so I commented it well. Spaces are preserved, so that concatenating all parts gives back the original string exactly as it was. Where possible, the split is made after a space.

            If the string contains newline characters, split it at the newlines first and then apply the regular expression to each part.

            With the help of the regular expression I insert a newline character wherever a split should occur. That way I avoid passing arrays around (and concatenating array parts later on when saving to the database, searching inside the text, etc.).

            In the explanation below, "4" is an example value and should be replaced with the value of the constant MAX_LINE_LENGTH.
            The following regular expression splits the string into parts of at most 4 characters, applying the following rules in this order:
            1. Don't split if the whole line (or the remainder) is at most 4 characters long. regEx="(?s).{1,4}$"
            2. If a word is longer than 4 characters, split inside the word. regEx="[^\s]{4}"
            3. Split before a space if the space follows exactly after 4 characters. regEx="(?s).{4}(?=\s)"
            4. Otherwise, split after the last space that occurs within the first 4 characters. regEx="(?s).{0,3}\s"

            Example: splitting "12345 67 8 9ab c de wxyz1 2 4 5 678" yields "1234", "5 67", " 8 ", "9ab ", "c de", " ", "wxyz", "1 2 ", "4 5 ", "678".

            In Java:
            Code:
            int maxLineLength=4; // usually you will get this value from your configuration file
            String oldText="12345 67 8 9ab c de wxyz1 2 4 5 678"; // this is the String you want to split
            
            // Verify parameters.
            // The text should already have been split at newline characters.
            if (maxLineLength < 1) throw new Exception("ERROR: maxLineLength is " + maxLineLength + ", but must be greater than 0!");
            if (oldText == null) throw new Exception("ERROR: text must not be null!");
            if (oldText.indexOf("\n") != -1) throw new Exception("ERROR: text must not contain newline characters!");
               
            // quick way out to increase performance
            if (oldText.isEmpty()) return "";
            
            final String regularExpression = "(?s).{1," + maxLineLength + "}$|[^\\s]{" + maxLineLength + "}|(?s).{" + maxLineLength + "}(?=\\s)|(?s).{0," + (maxLineLength - 1) + "}\\s";
            
            // insert newlines where we want to split the string into parts.
            // Note: Trailing empty strings are not included in the resulting array of the split() method. So the newline-char at the end of newText will have no effect. 
            String newText = oldText.replaceAll(regularExpression, "$0\n"); // append newline  
            
            return newText;
            Now that you know the logic, I leave it to you as an exercise to convert this program to Perl. Sorry, but I have no more time to do it myself today. If you have difficulties, come back to me tomorrow and I will help you.


            • chaarmann
              Recognized Expert Contributor
              • Nov 2007
              • 785

              #7
              deleted deleted


              • rovf
                New Member
                • May 2010
                • 41

                #8
                it should go back to the next word

                Do you mean: "go back to the previous word" or "go forward to the next word"?


                • Sheriffo
                  New Member
                  • Dec 2010
                  • 3

                  #9
                  Back to the previous word, because each piece should not be more than 160 characters.


                  • rovf
                    New Member
                    • May 2010
                    • 41

                    #10
                    Then, as toolic already suggested, Text::Wrap should do it.


                    • chaarmann
                      Recognized Expert Contributor
                      • Nov 2007
                      • 785

                      #11
                      The Text::Wrap module does the same thing that my solution does. So if this module is not installed on production environment or you are not allowed to install it, you can use my solution instead. My solution can also be modified more easily in case you want its behaviour slightly changed.

                      Here it is in Perl:
                      Code:
                      #!/usr/local/perl-5.9-64/bin/perl
                      
                      package test;
                      
                      use warnings; 
                      use strict;
                      
                      # main program
                      my $newText = wrap(4, "12345 67 8 9ab c de wxyz1 2 4 5 678");
                      print "splitted String:\n";
                      my @stringParts = split(/\n/, $newText);
                      print "$_\n" for @stringParts;
                      
                      sub wrap {
                      	my ($maxLineLength, $oldText) = @_;
                      	
                      	my $regularExpression = '.{1,' . $maxLineLength . '}$|[^\\s]{' . $maxLineLength . '}|.{' . $maxLineLength . '}(?=\\s)|.{0,' . ($maxLineLength - 1) . '}\\s';	
                      	my $newText = $oldText;
                      	$newText =~ s/$regularExpression/$&\n/gs;
                      		 
                      	return $newText;
                      }
                      And here is the output when you run it:
                      Code:
                      splitted String:
                      1234
                      5 67
                       8
                      9ab
                      c de
                      
                      wxyz
                      1 2
                      4 5
                      678


                      • rovf
                        New Member
                        • May 2010
                        • 41

                        #12
                        So if this module is not installed on production environment

                        At least on Perl 5.10 or later, Text::Wrap is a standard module.
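If there is any doubt about a particular installation, the script can check at run time whether the module loads. This is only a sketch; what to do in the fallback branch is up to you.

```perl
use strict;
use warnings;

# Try to load Text::Wrap at run time; eval returns 1 only if require succeeded.
my $have_text_wrap = eval { require Text::Wrap; 1 } ? 1 : 0;

if ($have_text_wrap) {
    # VERSION is the universal class method every loaded module supports.
    print "Text::Wrap ", Text::Wrap->VERSION, " is available\n";
} else {
    print "Text::Wrap is not installed; use a hand-written splitter instead\n";
}
```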
