|
We can distinguish different ways of storing raster data, which basically vary in storage size and consequently in the geometric organisation of the storage. The following types of geometric elements are identified:
Raster data are managed easily in computers; all commonly used programming languages well support array handling. However, a raster when stored in a raw state with no compression can be extremely inefficient in terms of computer storage space.
As already said the way of improving raster space efficiency is data compression.
Illustrations and short texts are used to describe different methods of raster data storage and raster data compression techniques.
By convention, raster data is normally stored row by row from the top left corner.
Example: The Swiss Digital elevation model (DHM25-Matrixmodell in decimeters)
In the header file are stored the following information:
ncols 480 nrows 450 xllcorner 878923 yllcorner 207345 cellsize 25 nodata_value -9999 6855 6855 6855 6851 6851 6837 6824 6815
6808 6855 6857 6858 6858 6850 6839 6826 6814 6809 6854 6863 6865 6865 6849 6840 6826 6812 6803 6853 6852 6873 6886 6886 6853
6822 6804 6748 6847 6848 6886 6902 6904 6855 6808 6762 6686 6850 6859 6903 6903 6881 6806 6739 6681 6615 6845 6857 6879 6856
6795 6706 6638 6589 6539 6801 6827 6825 6769 6670 6597 6562 6522 6497 6736 6760 6735 6661 6592 6546 6517 6492 6487 ...
PROBLEM: Big amount of data!
Geographical data tends to be "spatially autocorrelated", meaning that objects which are close to each other tend to have similar attributes:
"All things are related, but nearby things are more related than distant things" (Tobler 1970) Because of this principle, we expect neighboring pixels to have similar values. Therefore, instead of repeating pixel values, we can code the raster as pairs of numbers - (run length, value).
The runlength coding is a widely used compression technique for raster data. The primary data elements are pairs of values or tuples, consisting of a pixel value and a repetition count which specifies the number of pixels in the run. Data are built by reading successively row by row through the raster, creating a new tuple every time the pixel value changes or the end of the row is reached.
Describes the interior of an area by run-lengths, instead of the boundary.
In multiple attribute case there a more options available:
We can note in Codes - III, that if a run is not required to break at the end of each line we can compress data even further.
NOTE: run coding would be of little use for DEM data or any other type of data where neighboring pixels almost always have different values.
See Rasterising vector data and the Freeman coding
This method is a generalization of run-length encoding to two dimensions. Instead of sequences of 0s or 1s, square blocks are counted. For each square the position, the size and, the contents of the pixels are stored.
The quadtree compression technique is the most common compression method applied to raster data. Quadtree coding stores the information by subdividing a square region into quadrants, each of which may be further subdivided in squares until the contents of the cells have the same values.
Reading versusExample 1:
Positioncode of 10: 3,2
Example 2:
On the following figure we can see how an area is represented on a map and the corresponding quadtree representation. More information on constructing and addressing quadtrees, see Lesson "Spatial partitioning and indexing", Unit 2
Huffmann coding compression technique involves preliminary analysis of the frequency of occurrence of symbols. Huffman technique creates, for each symbol, a binary data code, the length of which is inversely related to the frequency of occurrence.
LZ77 compression is a loss-less compression method, meaning that the values in your raster are not changed. Abraham Lempel and Jacob Ziv first introduced this compression method in 1977. The theory behind this compression method is relatively simple: when you find a match (a data value that has already been seen in the input file) instead of writing the actual value, the position and length (number of bytes) of the value is written to the output (the length and offset - where it is and how long it is).
Some image-compression methods often referred to as LZ (Lempel Ziv) and its variants such as LZW (Lempel Ziv Welch). With this method, a previous analysis of the data is not required. This makes LZ77 Method applicable to all the raster data types.
The JPEG-compression process:
JPEG-Compression (left to right: decreasing quality setting results in a 8x8 block generation of pixels)
Text and graphic from: Wikipedia 2010
In the following example, we can see the difference in data storing, between an uncompressed raster file and a compressed one.
Full raster coding 256 values | runlength coding 132 values |
Compress the following raster file with the compression techniques that you have learnt in this unit.
How much space is needed for storage, if we assume that bytes (8 bits) can be used to store the data values?