Introduction
This article shows you how to implement the LZW lossless compression algorithm in VBScript. It can also be used in VBA as is or almost as is.
The LZW Algorithm
The LZW algorithm is a compression technique that results in no loss of data. It builds a dictionary of codes and values used in the compression on the fly. The dictionary is not stored with the compressed file and is discarded after compression. During decompression, the dictionary is rebuilt from the compressed data.
The LZW algorithm functions by:
The Code and How to Use It
The code below is an example implementation of the LZW Algorithm in VBScript and is easily portable to VBA. The functions are LZWCompress and LZWUncompress and take the file path as the parameter.
The dictionary is initialized to the full range of 8 bit values and uses 16 bits for each key. The dictionary reinitializes after reaching 65535 keys I did this for ease of implementation even though that also means it's not as compressed as it can be.
My tests on large Access databases have shown an 86% compression level compared to 93% compression using "ultra" level compression with the LZMA algorithm in 7zip.
My implementation of the algorithm is also slow due to the fact that I read the file 1 byte at a time. Again, this was due to ease of implementation. Would be more efficient to read a large amount of the file into memory rather than byte by byte.
This article shows you how to implement the LZW lossless compression algorithm in VBScript. It can also be used in VBA as is or almost as is.
The LZW Algorithm
The LZW algorithm is a compression technique that results in no loss of data. It builds a dictionary of codes and values used in the compression on the fly. The dictionary is not stored with the compressed file and is discarded after compression. During decompression, the dictionary is rebuilt from the compressed data.
The LZW algorithm functions by:
- Initialize the dictionary to contain all strings of length one
- Find the longest string in the dictionary that matches the current input
- Output the dictionary code for that matching input
- Append the next character from the input to the matching input string and add it as a new dictionary value with a new code
- Go to step 2
The Code and How to Use It
The code below is an example implementation of the LZW Algorithm in VBScript and is easily portable to VBA. The functions are LZWCompress and LZWUncompress and take the file path as the parameter.
The dictionary is initialized to the full range of 8 bit values and uses 16 bits for each key. The dictionary reinitializes after reaching 65535 keys I did this for ease of implementation even though that also means it's not as compressed as it can be.
My tests on large Access databases have shown an 86% compression level compared to 93% compression using "ultra" level compression with the LZMA algorithm in 7zip.
My implementation of the algorithm is also slow due to the fact that I read the file 1 byte at a time. Again, this was due to ease of implementation. Would be more efficient to read a large amount of the file into memory rather than byte by byte.
Code:
Option Explicit
Const ForReading = 1, ForWriting = 2, ForAppending = 8
Function LZWCompress(strPath)
Dim oFS, oFRead, oFWrite, oDict, strNext, strCurrent, intMaxCode, i
Set oDict = CreateObject("Scripting.Dictionary")
Set oFS = CreateObject("Scripting.FileSystemObject")
Set oFRead = oFS.OpenTextFile(strPath, ForReading)
Set oFWrite = oFS.OpenTextFile(strPath & ".lzw", ForWriting, True)
Set oFS = Nothing
intMaxCode = 255
strCurrent = oFRead.Read(1)
For i = 0 To 255
oDict.Add Chr(i), i
Next
Do Until oFRead.AtEndOfStream
strNext = oFRead.Read(1)
If oDict.Exists(strCurrent & strNext) Then
strCurrent = strCurrent & strNext
Else
oFWrite.Write(Chr(CByte(oDict.Item(strCurrent) \ 256)) & Chr(CByte(oDict.Item(strCurrent) Mod 256)))
intMaxCode = intMaxCode + 1
oDict.Add strCurrent & strNext, intMaxCode
strCurrent = strNext
If intMaxCode = 65535 Then
intMaxCode = 255
oDict.RemoveAll
For i = 0 To 255
oDict.Add Chr(i), i
Next
End If
End If
Loop
oFWrite.Write(Chr(CByte(oDict.Item(strCurrent) \ 256)) & Chr(CByte(oDict.Item(strCurrent) Mod 256)))
oFRead.Close
oFWrite.Close
Set oFRead = Nothing
Set oFWrite = Nothing
Set oDict = Nothing
End Function
Function LZWUncompress(strPath)
Dim oFS, oFRead, oFWrite, oDict, intNext, intCurrent, intMaxCode, i, strNext
Set oDict = CreateObject("Scripting.Dictionary")
Set oFS = CreateObject("Scripting.FileSystemObject")
Set oFRead = oFS.OpenTextFile(strPath, ForReading)
Set oFWrite = oFS.OpenTextFile(strPath & ".unc", ForWriting, True)
Set oFS = Nothing
intMaxCode = 255
strNext = oFRead.Read(2)
intCurrent = 0
For i = 1 To Len(strNext)
intCurrent = intCurrent + 256 ^ (Len(strNext) - i) * Asc(Mid(strNext, i, 1))
Next
For i = 0 To 255
oDict.Add i, Chr(i)
Next
Do Until oFRead.AtEndOfStream
oFWrite.Write(oDict.Item(intCurrent))
intMaxCode = intMaxCode + 1
strNext = oFRead.Read(2)
intNext = 0
For i = 1 To Len(strNext)
intNext = intNext + 256 ^ (Len(strNext) - i) * Asc(Mid(strNext, i, 1))
Next
If oDict.Exists(intNext) Then
oDict.Add intMaxCode, oDict.Item(intCurrent) & Left(oDict.Item(intNext), 1)
Else
oDict.Add intMaxCode, oDict.Item(intCurrent) & Left(oDict.Item(intCurrent), 1)
End If
If intMaxCode = 65535 Then
intMaxCode = 255
oDict.RemoveAll
For i = 0 To 255
oDict.Add i, Chr(i)
Next
End If
intCurrent = intNext
Loop
oFWrite.Write(oDict.Item(intCurrent))
oFRead.Close
oFWrite.Close
Set oFRead = Nothing
Set oFWrite = Nothing
Set oDict = Nothing
End Function