Variogram

Let us first get acquainted with the problem by looking at an example. Imagine a digital terrain model from which you take samples; the value of each sample is its height above sea level. Two adjacent samples may happen to lie on a valley floor of the same altitude. Another pair of samples, approximately the same distance apart, may lie on a ridge. If you compare the values within each of these pairs, you will notice a match or at least a similarity. Now compare samples with a greater distance between them: they may still have similar values, but it is more likely that their values (i.e. their heights above sea level) are dissimilar.

Variography is a method that performs this pairwise comparison for all of our samples: every point is compared to every other point. Depending on the number of samples, this can add up to a lot of pairs of points. To be exact, it adds up to n*(n-1)/2 pairs, where n is the number of samples. You might ask, "Where does the distance come into play?" While each point is compared to every other point, the distance (and direction) between the two points of each pair is determined as well.
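The pairing step described above can be sketched in a few lines of Python. This is only an illustration: the sample coordinates and values are made up, the function name `all_pairs` is hypothetical, and the direction is measured counter-clockwise from the x-axis, which is an assumption rather than a convention taken from this lesson.

```python
from itertools import combinations
from math import atan2, degrees, hypot

# Hypothetical samples as (x, y, value); coordinates and values are made up.
samples = [
    (0.0, 0.0, 58.0),
    (3.0, 4.0, 65.0),
    (6.0, 8.0, 91.0),
    (0.0, 5.0, 54.0),
]

def all_pairs(samples):
    """Return (distance, direction_deg, value_i, value_j) for every unordered pair."""
    pairs = []
    for (x1, y1, v1), (x2, y2, v2) in combinations(samples, 2):
        dx, dy = x2 - x1, y2 - y1
        distance = hypot(dx, dy)
        # Assumed convention: angle counter-clockwise from the x-axis, folded to [0, 180)
        direction = degrees(atan2(dy, dx)) % 180.0
        pairs.append((distance, direction, v1, v2))
    return pairs

pairs = all_pairs(samples)
n = len(samples)
print(len(pairs), n * (n - 1) // 2)  # prints "6 6"
```

Because every pair is visited once, the number of pairs grows roughly with the square of the number of samples.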

Precipitation

Three lags (Lag 0, Lag 1, Lag 2) for one data point (value 58) are shown. The numbers are values of Swiss precipitation monitoring stations.

Pairs of values:
for Lag 0
58,65
58,91
58,54
58,72

for Lag 1
58,45
58,64
58,82
etc.

From these numerous pairs of values the so-called "semivariance" is calculated. It measures how dissimilar the paired values are (a low semivariance means a high similarity), and we can also interpret it as an indicator of spatial "dependency".

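Formula for semivariance

$$\gamma(h) = \frac{1}{2 N(h)} \sum_{(i,j)} (v_{i} - v_{j})^{2}$$

where the sum runs over all N(h) pairs (i, j) within distance h, and

γ(h) ... Semivariance for the distance h
N(h) ... Number of pairs within distance h
v_{i}, v_{j} ... Values at position i and j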
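As a worked illustration, the semivariance (half the mean squared difference of the value pairs) can be computed for the four Lag 0 pairs listed above for the data point 58. This is only a sketch: a real lag contains the pairs of all data points, not just those of a single point, so the result is not the actual Lag 0 semivariance of the precipitation data.

```latex
\gamma = \frac{(58-65)^2 + (58-91)^2 + (58-54)^2 + (58-72)^2}{2 \cdot 4}
       = \frac{49 + 1089 + 16 + 196}{8}
       = \frac{1350}{8}
       = 168.75
```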

In simple words, the differences between the paired values are squared, averaged, and halved. This parameter is calculated for each distance interval h; only the value pairs within this distance interval are included in the calculation. This distance interval h is called a "lag". Plot all value pairs within one lag on a scatter plot and you will get the so-called h-scatterplot.
From the semivariances per lag, the empirical (or experimental) semivariogram is created as a line graph.
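The step from value pairs to an empirical semivariogram might look like the following minimal sketch. It makes several assumptions that do not come from this lesson: lag k is taken to cover the distances from k times the lag width up to (but not including) k+1 times the lag width, all directions are pooled, each lag is reported at the mean distance of its pairs, and the function name `empirical_semivariogram` and the sample data are hypothetical.

```python
from itertools import combinations
from math import hypot

def empirical_semivariogram(samples, lag_width, n_lags):
    """Return one (mean_distance, semivariance, n_pairs) tuple per non-empty lag.

    samples   : list of (x, y, value) tuples
    lag_width : width of one distance interval (lag)
    n_lags    : number of lags to evaluate, starting with lag 0
    """
    # One bucket per lag: summed distances, summed squared differences, pair count
    dist_sum = [0.0] * n_lags
    sq_sum = [0.0] * n_lags
    count = [0] * n_lags

    for (x1, y1, v1), (x2, y2, v2) in combinations(samples, 2):
        d = hypot(x2 - x1, y2 - y1)
        k = int(d // lag_width)  # assumed binning: lag k covers [k*w, (k+1)*w)
        if k < n_lags:
            dist_sum[k] += d
            sq_sum[k] += (v1 - v2) ** 2
            count[k] += 1

    result = []
    for k in range(n_lags):
        if count[k] > 0:
            mean_distance = dist_sum[k] / count[k]
            semivariance = sq_sum[k] / (2 * count[k])  # half the mean squared difference
            result.append((mean_distance, semivariance, count[k]))
    return result

# Hypothetical transect of five samples, 10 units apart (made-up values)
samples = [(0, 0, 58), (10, 0, 65), (20, 0, 54), (30, 0, 72), (40, 0, 91)]
for mean_d, gamma, n in empirical_semivariogram(samples, lag_width=15, n_lags=3):
    print(f"distance {mean_d:5.1f}  semivariance {gamma:7.2f}  pairs {n}")
```

Plotting the semivariance against the mean distance of each lag gives the line graph; the pairs collected in one bucket are exactly the points of that lag's h-scatterplot.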

Semivariogram with the h-scatterplots of lags 0 to 7

Can you imagine why there are clearly fewer points in the h-scatterplots of low distances than in those of the higher lags?

The x-axis shows the increasing distance between pairs of points; the y-axis shows the semivariance per lag. The circular symbols on the curve mark the individual lags. In this example, the lag interval is 15'000. How do we interpret a curve like that? The more similar the pairs of values in a lag are, the lower the semivariance for this lag; the more dissimilar they are, the higher the semivariance, and thus the curve rises. This curve confirms that the values of our data are more similar at short distances: there is a direct connection between the distance between data points and the similarity of their values. Two key figures help you describe this curve:

Range – the distance h at which the curve flattens
Sill – the semivariance value that the curve reaches at its range
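Reading these two figures off an empirical semivariogram can be approximated in code. The following is a crude heuristic and not a method from this lesson: it assumes that the last few lags lie on the flat part of the curve, takes their mean as the sill, and reports the first lag distance at which the curve reaches 95 % of that sill as the range. The function name, the parameters `plateau_lags` and `fraction`, and the semivariogram points are all hypothetical.

```python
def estimate_sill_and_range(lags, plateau_lags=3, fraction=0.95):
    """Crude read-off of sill and range from (distance, semivariance) points.

    Assumption: the last `plateau_lags` points lie on the flat part of the curve.
    """
    plateau = [gamma for _, gamma in lags[-plateau_lags:]]
    sill = sum(plateau) / len(plateau)
    # Range: first distance at which the curve reaches `fraction` of the sill
    for distance, gamma in lags:
        if gamma >= fraction * sill:
            return sill, distance
    return sill, None

# Made-up (distance, semivariance) points, for illustration only
lags = [(8000, 40.0), (22000, 75.0), (37000, 96.0),
        (52000, 101.0), (67000, 99.0), (82000, 100.0)]
sill, range_ = estimate_sill_and_range(lags)
print(sill, range_)
```

If the curve never flattens within the evaluated lags, the plateau assumption fails and the result is not meaningful.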

If the lag interval in the example above is 15'000, then why is the first lag (= lag 0 or h0) not at the coordinate origin? Simply because the pairs of points in lag 0 are at a certain distance from one another. Their average distance is used as the position of lag 0 on the x-axis. But why does the curve not start at semivariance 0, i.e. on the x-axis? Because the values of the pairs in lag 0 are usually not all identical. That is the reason why the start of the semivariogram curve usually lies just above the x-axis. This is called the nugget effect. The term comes from the use of this method in geological exploration: in samples of gold, nuggets can occur sporadically, i.e. the values of immediately adjacent samples may differ considerably.

a) Lag 0 includes all pairs of points within the first lag. The average distance between the points marks the lag on the x-axis; b) The pairs of points in lag 0 show different values; therefore, the semivariance is not equal to 0 but slightly above the x-axis (=nugget effect)

In the simplest form, pairs are formed between each point and all other points regardless of direction, and an isotropic semivariogram is created. As an extension and refinement of this method, variogram programs can restrict the pairs of points to specific directions. By doing this, you can examine whether the values in your dataset have higher spatial dependencies in certain directions. Think about the terrain example above: if there is a valley in your data running in N-S direction, points along this direction will be more similar than points in E-W direction. The result is then an anisotropic semivariogram. If you have lost sight of the overall goal: all this information about spatial dependencies and their structure can be used to estimate the unknown values.
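Restricting the pairs to one direction can be sketched as follows. The sketch rests on assumptions that are not part of this lesson: directions are measured counter-clockwise from the x-axis (east), so that N-S corresponds to 90 degrees, a pair counts as belonging to a direction if it deviates by no more than an angular tolerance, and only a single distance class up to `max_distance` is evaluated. The function name and the grid data are hypothetical.

```python
from itertools import combinations
from math import atan2, degrees, hypot

def directional_semivariance(samples, direction_deg, tolerance_deg, max_distance):
    """Semivariance of all pairs up to max_distance whose direction lies within
    tolerance_deg of direction_deg. Returns (semivariance, n_pairs) or (None, 0)."""
    sq_sum, count = 0.0, 0
    for (x1, y1, v1), (x2, y2, v2) in combinations(samples, 2):
        dx, dy = x2 - x1, y2 - y1
        if hypot(dx, dy) > max_distance:
            continue
        # Assumed convention: angle counter-clockwise from east, folded to [0, 180)
        angle = degrees(atan2(dy, dx)) % 180.0
        # Smallest angular difference between two undirected axes
        diff = abs(angle - (direction_deg % 180.0))
        diff = min(diff, 180.0 - diff)
        if diff <= tolerance_deg:
            sq_sum += (v1 - v2) ** 2
            count += 1
    if count == 0:
        return None, 0
    return sq_sum / (2 * count), count

# Hypothetical 3 x 3 grid: values constant along N-S (a "valley"), changing E-W
samples = [(x, y, 100 * x) for x in range(3) for y in range(3)]
print(directional_semivariance(samples, 90, 10, 1.5))  # N-S pairs: (0.0, 6)
print(directional_semivariance(samples, 0, 10, 1.5))   # E-W pairs: (5000.0, 6)
```

In this made-up grid the N-S pairs are identical in value while the E-W pairs differ, which is the kind of directional difference an anisotropic semivariogram is meant to reveal.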

Use the following interactive semivariance calculator and enter pairs of values. First, choose similar values (up to 99), then vary the values and let them be more dissimilar. Observe how the semivariance changes! Note how easily the semivariance formula is implemented.

Semivariance Calculator

What happens if you enter the same value for every point? Does the order in which you enter the pairs of values matter?
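For experimenting without the interactive calculator, the semivariance of a list of value pairs can be computed with a small function. This is a minimal sketch assuming the usual definition (half the mean squared difference of the pairs); the function name `semivariance` and the example values are made up.

```python
def semivariance(pairs):
    """Half the mean squared difference of a list of (value_i, value_j) pairs."""
    if not pairs:
        raise ValueError("at least one pair of values is required")
    return sum((vi - vj) ** 2 for vi, vj in pairs) / (2 * len(pairs))

# Made-up pairs: first similar values, then clearly dissimilar ones
similar = [(50, 52), (51, 49), (48, 50)]
dissimilar = [(10, 90), (25, 70), (99, 5)]

print(semivariance(similar))     # 2.0
print(semivariance(dissimilar))  # much larger
```

The same function can be used to explore the two questions above, for example by passing pairs of identical values or by swapping the two values within each pair.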