Ruby regex for removing C/Java-style /* ... */ comments

  • beatTheDevil
    New Member
    • Nov 2006
    • 16

    Hey guys,

    As the title says, I'm trying to write a regular expression (regex/regexp) to strip the comments from code. This particular regex is meant to match /* ... */ comments.

    I'm using Ruby 1.8.6.

    Here's my regex:
    Code:
    multiline_comments = /\/\*(.*?)\*\//
    When I try
    Code:
    myStr.gsub(multiline_comments, "")
    I see no effect, even though the string has big, fat comments in it. I tried the regex in irb with a couple of test strings, and it works perfectly. This leads me to think I'm missing some subtlety of file I/O, so here's all my code (a cop-out, I know). I'm trying to write a very simple JavaScript compactor, but I want to preserve readability, so I'm only removing unnecessary newlines, spaces between tokens, and comments. I DON'T want the whole file on one line the way some compactors do it. Anyway, here goes:

    Code:
    # Non-destructive JavaScript Packer
    # =================================
    #
    # Reduces overall script filesize by removing comments
    # and unnecessary whitespace. Does not affect variable naming,
    # indentation, or line-by-line formatting in order to maximize
    # readability.
    
    def pack_line(file_line)
    
    	return '' unless file_line
    
    	#puts "The next line: " + file_line
    
    	#kill one-line (//...) comments
    	line_comments = /(\S*)\s*\/\/.*/
    	intermed = file_line.gsub(line_comments, '\1')
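    	#String#[] with a single index returns a character code in Ruby 1.8,
    	#so take a one-character substring before comparing against "\n"
    	intermed += "\n" if intermed[-1, 1] != "\n"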
    	
    	#puts "\tAfter one-liner removal: " + intermed
    	
    	#kill unnecessary whitespace
    	extra_whitespace = /([^(var|function|return|\s*)])[ \t]+(.*?)/
    	intermed = intermed.gsub(extra_whitespace, '\1\2')
    	
    	#puts "\tAfter extra whitespace removal: " + intermed
    	
    	intermed
    end
    
    #performs the packing operation, returns a single string
    #representing the packed document
    def pack(file)
    	lines = Array.new
    	
    	file.each_line do |line|
    		lines.push pack_line(line)
    	end
    	
    	intermed = lines.join
    	
    	#puts "\tBefore multi-liner removal: " + intermed
    	
    	#kill multi-line (/* ... */) comments
    	multiline_comments = /\/\*(.*?)\*\//
    	intermed = intermed.gsub(multiline_comments, '')
    	
    	#puts "\tAfter multi-liner removal: " + intermed
    	
    	#kill extra new lines
    	extra_newlines = /(\r?\n){2,}/
    	intermed = intermed.gsub(extra_newlines, "\n")
    	
    	#puts "\tFinally: " + intermed + "\n"
    	
    	intermed
    end
    
    #open file for reading and pass it to pack()
    def init(in_file, out_file)
    	file = File.new(in_file, "r")
    
    	newDoc = pack(file)
    	
    	file.close
    	
    	if out_file then
    		file = File.new(out_file, "w")
    		
    		file.puts(newDoc)
    		
    		file.close
    	else
    		puts newDoc
    	end
    end
    
    #start the script with the command-line arg file names
    #(init does its own output, so its return value is not printed)
    init(ARGV[0], ARGV[1])
    Any ideas? Thanks for all your help.
  • beatTheDevil
    New Member
    • Nov 2006
    • 16

    #2
    More specifically: the regex works fine on one-line strings, but it seems unable to match this style of comment when the comment spans newlines ("\n"). Is there a way around this? I thought '.' matched any character whatsoever...

    • roguesheep
      New Member
      • Aug 2007
      • 1

      #3
      http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/11137
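
      For reference: in Ruby, '.' matches any character except a newline unless the regex carries the /m flag. Below is a minimal sketch of the difference. The sample string and the variable names other than multiline_comments are made up for illustration, and it is not a full comment stripper (it would also eat a /* ... */ that appears inside a string literal).

```ruby
# Sketch only: the sample input and most names here are illustrative assumptions.
source = "var a = 1; /* one-line */\n/* spans\n   two lines */\nvar b = 2;\n"

# Without /m, '.' stops at "\n", so only comments that fit on one line match.
single_line_only = /\/\*.*?\*\//

# With /m, '.' also matches "\n", so comments spanning several lines match too.
multiline_comments = /\/\*.*?\*\//m

without_m = source.gsub(single_line_only, '')
with_m    = source.gsub(multiline_comments, '')

puts "Without /m:"
puts without_m
puts "With /m:"
puts with_m
```

      In the pack method from the first post, the same change would presumably amount to adding the m after the closing slash of multiline_comments.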
