Data fusion is a wide subject area, so we will concentrate on the most powerful method in this, which is Bayesian or probabilistic data fusion for the application of sensor fusion or state estimation. This is where the state of the dynamic system to be estimated is encoded as a probability distribution. This probability distribution is then updated with the probability distributions of sensor measurements or other sources of information to form the most optimum estimate of the state of the system.
Probability provides a very powerful framework to build data fusion algorithms upon, it helps encapsulate the inherent uncertainty about the model and measurement processes and also helps us to know how accurate are the state estimates.
Sensor data fusion is usually split into two phases and most estimation processes follow a similar process of breaking the problem down into two recursive steps:
Prediction Step: to calculate the best estimate of the state for the current time using all the available information we have at that time. This is where the uncertainty in the estimates grows.
Update Step: where the current best state estimate is updated measurements or information when they become available, to produced an even better estimate. This is where the uncertainty in the estimate shrinks.
Now these two steps don't have to be run in sequence, you might have different sensors or measurement being made at different rates, so you might make multiple prediction steps before the next measurement update is fused in. You can also fuse in multiple sensors at the same time.

Now to carry out the mathematical operations required to do all the above calculations, it can become numerically difficult and computationally slow, especially if we start to estimate more than just one state.
This is where the power of the Kalman Filter comes into play. By making a few assumptions about the probability distributions, we can dramatically simplify the calculations, which opened up a wide range of applications, with some of the first being navigating a spaceship to the moon and back with less computational power as your mobile phone in your pocket!
Represent the estimated state as a probability distribution.
Represent the measurements or observations of the current state (or function of the states) as a probability distribution.
Fuse the two distributions to get a better estimate (Bayes’ Theorem)
Prediction process increases the uncertainty with time.
Update/Measurement process decreases the uncertainty.
Kalman Filter allows use to do this numerically and mathematically simply, by making a few assumptions about the probability distributions and a few other properties of the dynamic system.