/*************************************************************************1* Name: lz.c2* Author: Marcus Geelnard3* Description: LZ77 coder/decoder implementation.4* Reentrant: Yes5*6* The LZ77 compression scheme is a substitutional compression scheme7* proposed by Abraham Lempel and Jakob Ziv in 1977. It is very simple in8* its design, and uses no fancy bit level compression.9*10* This is my first attempt at an implementation of a LZ77 code/decoder.11*12* The principle of the LZ77 compression algorithm is to store repeated13* occurrences of strings as references to previous occurrences of the same14* string. The point is that the reference consumes less space than the15* string itself, provided that the string is long enough (in this16* implementation, the string has to be at least 4 bytes long, since the17* minimum coded reference is 3 bytes long). Also note that the term18* "string" refers to any kind of byte sequence (it does not have to be19* an ASCII string, for instance).20*21* The coder uses a brute force approach to finding string matches in the22* history buffer (or "sliding window", if you wish), which is very, very23* slow. I recon the complexity is somewhere between O(n^2) and O(n^3),24* depending on the input data.25*26* There is also a faster implementation that uses a large working buffer27* in which a "jump table" is stored, which is used to quickly find28* possible string matches (see the source code for LZ_CompressFast() for29* more information). The faster method is an order of magnitude faster,30* but still quite slow compared to other compression methods.31*32* The upside is that decompression is very fast, and the compression ratio33* is often very good.34*35* The reference to a string is coded as a (length,offset) pair, where the36* length indicates the length of the string, and the offset gives the37* offset from the current data position. To distinguish between string38* references and literal strings (uncompressed bytes), a string reference39* is preceded by a marker byte, which is chosen as the least common byte40* symbol in the input data stream (this marker byte is stored in the41* output stream as the first byte).42*43* Occurrences of the marker byte in the stream are encoded as the marker44* byte followed by a zero byte, which means that occurrences of the marker45* byte have to be coded with two bytes.46*47* The lengths and offsets are coded in a variable length fashion, allowing48* values of any magnitude (up to 4294967295 in this implementation).49*50* With this compression scheme, the worst case compression result is51* (257/256)*insize + 1.52*53*-------------------------------------------------------------------------54* Copyright (c) 2003-2006 Marcus Geelnard55*56* This software is provided 'as-is', without any express or implied57* warranty. In no event will the authors be held liable for any damages58* arising from the use of this software.59*60* Permission is granted to anyone to use this software for any purpose,61* including commercial applications, and to alter it and redistribute it62* freely, subject to the following restrictions:63*64* 1. The origin of this software must not be misrepresented; you must not65* claim that you wrote the original software. If you use this software66* in a product, an acknowledgment in the product documentation would67* be appreciated but is not required.68*69* 2. Altered source versions must be plainly marked as such, and must not70* be misrepresented as being the original software.71*72* 3. This notice may not be removed or altered from any source73* distribution.74*75* Marcus Geelnard76* marcus.geelnard at home.se77*************************************************************************/787980/*************************************************************************81* INTERNAL FUNCTIONS *82*************************************************************************/838485/*************************************************************************86* _LZ_ReadVarSize() - Read unsigned integer with variable number of87* bytes depending on value.88*************************************************************************/8990static int _LZ_ReadVarSize( unsigned int * x, const unsigned char * buf )91{92unsigned int y, b, num_bytes;9394/* Read complete value (stop when byte contains zero in 8:th bit) */95y = 0;96num_bytes = 0;97do98{99b = (unsigned int) (*buf ++);100y = (y << 7) | (b & 0x0000007f);101++ num_bytes;102}103while( b & 0x00000080 );104105/* Store value in x */106*x = y;107108/* Return number of bytes read */109return num_bytes;110}111112113114/*************************************************************************115* PUBLIC FUNCTIONS *116*************************************************************************/117118119/*************************************************************************120* LZ_Uncompress() - Uncompress a block of data using an LZ77 decoder.121* in - Input (compressed) buffer.122* out - Output (uncompressed) buffer. This buffer must be large123* enough to hold the uncompressed data.124* insize - Number of input bytes.125*************************************************************************/126127unsigned int LZ_Uncompress( const unsigned char *in, unsigned char *out,128unsigned int insize )129{130unsigned char marker, symbol;131unsigned int i, inpos, outpos, length, offset;132133/* Do we have anything to uncompress? */134if( insize < 1 )135{136return 0;137}138139/* Get marker symbol from input stream */140marker = in[ 0 ];141inpos = 1;142143/* Main decompression loop */144outpos = 0;145do146{147symbol = in[ inpos ++ ];148if( symbol == marker )149{150/* We had a marker byte */151if( in[ inpos ] == 0 )152{153/* It was a single occurrence of the marker byte */154out[ outpos ++ ] = marker;155++ inpos;156}157else158{159/* Extract true length and offset */160inpos += _LZ_ReadVarSize( &length, &in[ inpos ] );161inpos += _LZ_ReadVarSize( &offset, &in[ inpos ] );162163/* Copy corresponding data from history window */164for( i = 0; i < length; ++ i )165{166out[ outpos ] = out[ outpos - offset ];167++ outpos;168}169}170}171else172{173/* No marker, plain copy */174out[ outpos ++ ] = symbol;175}176}177while( inpos < insize );178179return outpos;180}181182183