LZW Compression Algorithm in VBScript

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Rabbit
    Recognized Expert MVP
    • Jan 2007
    • 12517

    LZW Compression Algorithm in VBScript

    Introduction
    This article shows you how to implement the LZW lossless compression algorithm in VBScript. It can also be used in VBA as is or almost as is.



    The LZW Algorithm
    The LZW algorithm is a compression technique that results in no loss of data. It builds a dictionary of codes and values used in the compression on the fly. The dictionary is not stored with the compressed file and is discarded after compression. During decompression, the dictionary is rebuilt from the compressed data.

    The LZW algorithm functions by:
    1. Initialize the dictionary to contain all strings of length one
    2. Find the longest string in the dictionary that matches the current input
    3. Output the dictionary code for that matching input
    4. Append the next character from the input to the matching input string and add it as a new dictionary value with a new code
    5. Go to step 2


    The Code and How to Use It
    The code below is an example implementation of the LZW Algorithm in VBScript and is easily portable to VBA. The functions are LZWCompress and LZWUncompress and take the file path as the parameter.

    The dictionary is initialized to the full range of 8 bit values and uses 16 bits for each key. The dictionary reinitializes after reaching 65535 keys I did this for ease of implementation even though that also means it's not as compressed as it can be.

    My tests on large Access databases have shown an 86% compression level compared to 93% compression using "ultra" level compression with the LZMA algorithm in 7zip.

    My implementation of the algorithm is also slow due to the fact that I read the file 1 byte at a time. Again, this was due to ease of implementation. Would be more efficient to read a large amount of the file into memory rather than byte by byte.

    Code:
    Option Explicit
    Const ForReading = 1, ForWriting = 2, ForAppending = 8
    
    Function LZWCompress(strPath)
    	Dim oFS, oFRead, oFWrite, oDict, strNext, strCurrent, intMaxCode, i
    	
    	Set oDict = CreateObject("Scripting.Dictionary")
    	Set oFS = CreateObject("Scripting.FileSystemObject")
    	Set oFRead = oFS.OpenTextFile(strPath, ForReading)
    	Set oFWrite = oFS.OpenTextFile(strPath & ".lzw", ForWriting, True)
    	Set oFS = Nothing
    	intMaxCode = 255
    	strCurrent = oFRead.Read(1)
    	
    	For i = 0 To 255
    		oDict.Add Chr(i), i
    	Next
    	
    	Do Until oFRead.AtEndOfStream
    		strNext = oFRead.Read(1)
    		
    		If oDict.Exists(strCurrent & strNext) Then
    			strCurrent = strCurrent & strNext
    		Else
    			oFWrite.Write(Chr(CByte(oDict.Item(strCurrent) \ 256)) & Chr(CByte(oDict.Item(strCurrent) Mod 256)))
    			
    			intMaxCode = intMaxCode + 1
    			oDict.Add strCurrent & strNext, intMaxCode
    			strCurrent = strNext
    			
    			If intMaxCode = 65535 Then
    				intMaxCode = 255
    				oDict.RemoveAll
    				
    				For i = 0 To 255
    					oDict.Add Chr(i), i
    				Next
    			End If
    		End If
    	Loop
    
    	oFWrite.Write(Chr(CByte(oDict.Item(strCurrent) \ 256)) & Chr(CByte(oDict.Item(strCurrent) Mod 256)))
    	
    	oFRead.Close
    	oFWrite.Close
    	Set oFRead = Nothing
    	Set oFWrite = Nothing
    	Set oDict = Nothing
    End Function
    
    Function LZWUncompress(strPath)
    	Dim oFS, oFRead, oFWrite, oDict, intNext, intCurrent, intMaxCode, i, strNext
    	
    	Set oDict = CreateObject("Scripting.Dictionary")
    	Set oFS = CreateObject("Scripting.FileSystemObject")
    	Set oFRead = oFS.OpenTextFile(strPath, ForReading)
    	Set oFWrite = oFS.OpenTextFile(strPath & ".unc", ForWriting, True)
    	Set oFS = Nothing
    	intMaxCode = 255
    	strNext = oFRead.Read(2)
    	intCurrent = 0
    	For i = 1 To Len(strNext)
    		intCurrent = intCurrent + 256 ^ (Len(strNext) - i) * Asc(Mid(strNext, i, 1))
    	Next
    	
    	For i = 0 To 255
    		oDict.Add i, Chr(i)
    	Next
    	
    	Do Until oFRead.AtEndOfStream
    		oFWrite.Write(oDict.Item(intCurrent))
    		intMaxCode = intMaxCode + 1
    		
    		strNext = oFRead.Read(2)
    		intNext = 0
    		For i = 1 To Len(strNext)
    			intNext = intNext + 256 ^ (Len(strNext) - i) * Asc(Mid(strNext, i, 1))
    		Next
    		
    		If oDict.Exists(intNext) Then
    			oDict.Add intMaxCode, oDict.Item(intCurrent) & Left(oDict.Item(intNext), 1)
    		Else
    			oDict.Add intMaxCode, oDict.Item(intCurrent) & Left(oDict.Item(intCurrent), 1)
    		End If
    		
    		If intMaxCode = 65535 Then
    			intMaxCode = 255
    			oDict.RemoveAll
    			
    			For i = 0 To 255
    				oDict.Add i, Chr(i)
    			Next
    		End If
    		
    		intCurrent = intNext
    	Loop
    	oFWrite.Write(oDict.Item(intCurrent))
    	
    	oFRead.Close
    	oFWrite.Close
    	Set oFRead = Nothing
    	Set oFWrite = Nothing
    	Set oDict = Nothing
    End Function
Working...