Regular expression for capturing the first alpha string in a document

  • keydrive
    New Member
    • Oct 2007
    • 57

    Hi,

    I need a regular expression that captures the first alphabetic string in a document. The string is not necessarily on the first line, and only the first instance should match.

    Here are a few permutations I have tried:

    Code:
    [:alpha:]*
    
    [:alpha:]*.[:alpha:]*
    
    [A-Z][a-z]*.[A-Z][a-z]*\r
  • Rabbit
    Recognized Expert MVP
    • Jan 2007
    • 12517

    #2
    It would help to see some example inputs and expected results.

    • keydrive
      New Member
      • Oct 2007
      • 57

      #3
      Hi,

      Here is an example document. I only want the first string in the document captured, nothing else. Note that the first string could be more than two words, in mixed case. Matching on the end of line does not work for me.

      Code:
      %20
      Banff Springs
      Banff Sprints Hotel
      %20
      Mountain Biking 34.78
      %20
      Table # 55  holiday forward
      $1,899,999.00
      
      55555
      Togo Hot Specials

      • Rabbit
        Recognized Expert MVP
        • Jan 2007
        • 12517

        #4
        So in that example you just want the word Banff? We also need to know what language you're programming in or which regex engine you're using.

        • keydrive
          New Member
          • Oct 2007
          • 57

          #5
          JRegex is fine.

          I need to capture the title, be it on the first line, the second, or the third, after a single return character or several. So if there are three blank lines and then text, that text is assumed to be the title, and all the text on that line is required. In this example it would be Banff Springs.

          Thanks for your help.

          • Rabbit
            Recognized Expert MVP
            • Jan 2007
            • 12517

            #6
            Well, assuming multiline is turned on, the regex you would use is "[^\w]*\w[^\n]*". So basically: any number of non-word characters, followed by a word character (\w matches letters, digits, and underscore, not only letters), followed by any number of non-newline characters.
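
            A quick way to check this pattern against the sample from post #3 is sketched below with Python's re module, used here only as a stand-in (JRegex may differ in details). It assumes the %20 lines are literal text in the document. Because \w accepts digits as well as letters, the first hit on that sample is the %20 line rather than the title.

```python
import re

# Sample document copied from post #3.
# Assumption: the %20 lines are literal text, not encoded blank lines.
SAMPLE = "\n".join([
    "%20",
    "Banff Springs",
    "Banff Sprints Hotel",
    "%20",
    "Mountain Biking 34.78",
    "%20",
    "Table # 55  holiday forward",
    "$1,899,999.00",
    "",
    "55555",
    "Togo Hot Specials",
])

# Pattern from post #6: non-word chars, one word char, rest of the line.
PATTERN_W = re.compile(r"[^\w]*\w[^\n]*")

# search() returns only the first match in the text.
first_hit = PATTERN_W.search(SAMPLE).group(0)
print(repr(first_hit))  # '%20'
```

            If the title must start with a letter, the word-character class has to be narrowed to letters only.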

            • keydrive
              New Member
              • Oct 2007
              • 57

              #7
              The first line is now being captured as the first hit in the test-text area of QuickREx. The other text below is also matching as secondary hits (in yellow), if you are familiar with QuickREx. I tried to include a line break so that it would only highlight the first instance, the title, but it doesn't seem to work.

              Is there an option so that only the first instance is captured? Like, read all information until end of line and nothing else?

              Thanks for your help!

              • Rabbit
                Recognized Expert MVP
                • Jan 2007
                • 12517

                #8
                Turns out you can just use [a-zA-Z][^\n]*. If you have global turned off, it'll return only the first result; if you have global turned on, you get every match and just need to take the first one.
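
                As a rough illustration of the two cases, here is the same idea in Python's re module (a stand-in for whichever engine you use; the sample text is copied from post #3, and the %20 lines are assumed to be literal). re.search plays the role of global off, and re.findall the role of global on. The class is spelled [a-zA-Z] with a capital Z, since a range ending in lowercase z would also admit a few punctuation characters such as [, ], ^ and _.

```python
import re

# Sample document copied from post #3.
# Assumption: the %20 lines are literal text, not encoded blank lines.
SAMPLE = "\n".join([
    "%20",
    "Banff Springs",
    "Banff Sprints Hotel",
    "%20",
    "Mountain Biking 34.78",
    "%20",
    "Table # 55  holiday forward",
    "$1,899,999.00",
    "",
    "55555",
    "Togo Hot Specials",
])

# One ASCII letter, then everything up to (not including) the newline.
TITLE = re.compile(r"[a-zA-Z][^\n]*")

# "Global off": search() stops at the first match.
first = TITLE.search(SAMPLE)
title = first.group(0) if first else None

# "Global on": findall() returns every match, so take element 0.
all_hits = TITLE.findall(SAMPLE)
title_from_all = all_hits[0] if all_hits else None

print(title)
print(title_from_all)
```

                Both routes give the same line; the first simply avoids scanning the rest of the document.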

                • keydrive
                  New Member
                  • Oct 2007
                  • 57

                  #9
                  Thanks for your help Dr. Rabbit

                  Appreciate the assistance!

                  • rampdv
                    New Member
                    • Feb 2013
                    • 7

                    #10
                    Hi keydrive
                    Here is a solution for your example document, using a regular expression in Perl.
                    It reads the file line by line, pushes every alphabetic word into an array, and prints the first element of the array.

                    If you have any doubts or find an error in the code, please follow up.
                    Code:
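                    use strict;
                    use warnings;

                    # Open the input file and stop with a message if that fails.
                    open(my $fh, '<', 'alphastring.txt') or die "Cannot open alphastring.txt: $!";

                    my @array;
                    while (my $line = <$fh>) {
                        # /g steps through every run of letters on the line.
                        while ($line =~ /([a-zA-Z]+)/g) {
                            push @array, $1;
                        }
                    }
                    close($fh);

                    # The first element is the first alphabetic word in the file.
                    print " $array[0] \n" if @array;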
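
                    For comparison, a minimal Python sketch of the same word-by-word idea that stops at the first hit instead of collecting every word. The helper name first_alpha_word is made up for this illustration, and it reads from a string rather than alphastring.txt to stay self-contained (the sample is copied from post #3). Note that a letters-only pattern yields the first word, Banff, not the whole title line.

```python
import re

# Sample document copied from post #3.
SAMPLE = "\n".join([
    "%20",
    "Banff Springs",
    "Banff Sprints Hotel",
    "%20",
    "Mountain Biking 34.78",
    "%20",
    "Table # 55  holiday forward",
    "$1,899,999.00",
    "",
    "55555",
    "Togo Hot Specials",
])


def first_alpha_word(text):
    """Hypothetical helper: return the first run of ASCII letters, or None."""
    match = re.search(r"[a-zA-Z]+", text)
    return match.group(0) if match else None


first_word = first_alpha_word(SAMPLE)
print(first_word)
```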
