VBA macro for removing unwanted paragraph break

  • fngb3
    New Member
    • Jun 2016
    • 3

    I am working in Word with a lot of text pulled from a PDF, which has inserted breaks in the middle of lines. For example, it may go:

    "The quick brown fox jumps
    over the lazy dog."

    ...and what I want, obviously, is that sentence all on one line:

    e.g. "The quick brown fox jumps over the lazy dog."

    Essentially, every time there is a paragraph break (I believe they are Chr(10) or vbLf) followed by a lowercase letter, I want to change that paragraph break to a single space.

    I am completely new to Visual Basic. I have some Frankenstein's-monster-esque chunks of code that I've pulled from various places on the internet, but I'm afraid that sharing them would limit people's recommendations. This looks like a fairly simple task, and I don't want anyone trying to use or fix the structure of what I've got when I'm not the least bit attached to it.

    Can someone please point me in the right direction here?
    Much obliged
  • Luk3r
    Contributor
    • Jan 2014
    • 300

    #2
    Go ahead and share the code you've currently got and you'll get more help instead of us working in the dark. :)


    • fngb3
      New Member
      • Jun 2016
      • 3

      #3
      Sub ChangeCase1()
      Dim vFindLineBreak As Variant
      Dim vReplace As Variant
      Dim orng As Range

      'This is what I found that seems to detect a lowercase character from a to z.
      Function FirstLower(strIn As String) As String
      Dim objRegex As Object
      Dim objRegM As Object
      Set objRegex = CreateObject("vbscript.regexp")
      With objRegex
      .Pattern = "[a-z]"
      .ignorecase = False
      If .test(strIn) Then
      Set objRegM = .Execute(strIn)(0)
      FirstLower = objRegM.FirstIndex + 1
      Else
      FirstLower = "no match"
      End If
      End With
      End Function

      'I want this to find instances where a line break is followed by a lowercase character, and replace those two things with a space.
      vFindText = vbLf + FirstLower

      vReplaceText = " "

      Next
      End Sub


      • fngb3
        New Member
        • Jun 2016
        • 3

        #4
        Or this, which seems more promising:

        Sub findlower()

        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
        .Text = "^13 + ([a-z])"
        .Replacement.Text = " \1"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        End With
        Selection.Find.Execute
        With Selection
        If .Find.Forward = True Then
        .Collapse Direction:=wdCollapseStart
        Else
        .Collapse Direction:=wdCollapseEnd
        End If
        .Find.Execute Replace:=wdReplaceOne
        If .Find.Forward = True Then
        .Collapse Direction:=wdCollapseEnd
        Else
        .Collapse Direction:=wdCollapseStart
        End If
        End If
        .Find.Execute
        End With
        End Sub


        • Luk3r
          Contributor
          • Jan 2014
          • 300

          #5
          I think you can accomplish what you're after by changing this line:
          Code:
          Pattern = "[a-z]"
          to this:
          Code:
          Pattern = "[a-z][\r\n]"
          In case you're wondering how I tested this, I used the following code in Visual Studio using VB.NET, but the regex should still be the same.
          Code:
          'Create a test string with a line feed (vbLf) in the middle
          Dim testString As String = "The quick brown fox jumps" & vbLf & " over the lazy dog."
          
          'Display test string in a window
          MsgBox(testString)
          
          'Using regular expressions, remove the line feed if it follows a lowercase letter.
          'The letter is captured as $1 and written back so that only the line feed is dropped.
          testString = System.Text.RegularExpressions.Regex.Replace(testString, "([a-z])[\r\n]", "$1")
          
          'Display new test string format in a window
          MsgBox(testString)
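
          On the Word side, the rule from the original question (a paragraph break followed by a lowercase letter becomes a space) could probably be done with Word's own wildcard Find and Replace rather than a regex object. The following is only a minimal sketch under some assumptions: the breaks are real paragraph marks (written as ^13 in wildcard mode), the whole document should be processed, and the macro name JoinBrokenLines is made up for illustration.

          ```vba
          Sub JoinBrokenLines()
              ' Hypothetical macro name; a sketch, not a tested solution for every document.
              ' Replaces each paragraph mark that is directly followed by a
              ' lowercase letter with a single space, across the whole document.
              With ActiveDocument.Content.Find
                  .ClearFormatting
                  .Replacement.ClearFormatting
                  .Text = "^13([a-z])"        ' ^13 = paragraph mark; ([a-z]) captures the next letter
                  .Replacement.Text = " \1"   ' a space, then the captured letter
                  .Forward = True
                  .Wrap = wdFindStop
                  .Format = False
                  .MatchWildcards = True      ' required for [a-z] and \1 to be interpreted
                  .Execute Replace:=wdReplaceAll
              End With
          End Sub
          ```

          Note that the pattern has no spaces or "+" in it, and MatchWildcards is True. Wildcard searches in Word are case-sensitive, so [a-z] should only match lowercase letters. If the PDF import produced manual line breaks instead of paragraph marks, ^13 would not match them and the search text would need adjusting.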
