Stochastic approaches to integration#
Here we will revisit our discussion of numerical integration, and think about how we can use sampling to approximate integrals.
Expectation values from sampling#
Last lecture we discussed how to sample from probability distributions, but we avoided the question of why we might want to do such sampling. The obvious answer is that we want to understand some property of the distribution.
For example, when we were discussing the hard disc gas we argued that, because of the ergodic dynamics, all valid configurations were equally likely. Thus we should be able to calculate the pair correlation functions by uniformly sampling valid configurations.
In the end, anything we calculate can be expressed as some sort of expectation value. Suppose we have a set of states \(j\), which occur with probability \(p_j\), and some quantity \(X_j\) which depends on the state. The expectation value is

\[
\langle X\rangle = \sum_j p_j X_j.
\]
We can approximate this expectation value by taking a sample of \(N\) states,

\[
X^{\rm est}_N = \frac{1}{N}\sum_{\alpha=1}^{N} X_{j_\alpha},
\]

where \(\alpha\) labels the realizations, and \(j_\alpha\) is the state in that realization. We will often shorten this to

\[
X^{\rm est}_N = \frac{1}{N}\sum_{\alpha} X_{\alpha}.
\]
This notation is somewhat confusing, though, as the probability distribution doesn’t appear anywhere in it. The point is, however, that the \(j\)’s are drawn from it. One often calls \(X^{\rm est}_N\) the sample mean, and \(\langle X\rangle\) the population mean.
Here is an example. We are going to generate \(j\)'s between 1 and 10, with probability distribution \(p_j\propto j\):
using StatsBase
function ts()
    sample(1:10,Weights([j for j in 1:10]))
end
ts (generic function with 1 method)
using Plots
histogram([ts() for j in 1:1000])
Let's now estimate the mean,

\[
\langle j\rangle = \sum_{j=1}^{10} j\,p_j = \frac{\sum_{j=1}^{10} j^2}{\sum_{j=1}^{10} j}.
\]
# Exact answer
exactmean=sum(j^2 for j in 1:10)/sum(j for j in 1:10)
7.0
function approxmean(numsamples)
    result=0.
    for j in 1:numsamples
        result+=ts()
    end
    return result/numsamples
end
approxmean(100)
7.08
Every time we do this we get a different answer. Here is the distribution of answers
histogram([approxmean(100) for j in 1:10000])
If I sample more times, I get a smaller spread
histogram([approxmean(100) for j in 1:10000],label="100 realizations",normed=true)
histogram!([approxmean(400) for j in 1:10000],label="400 realizations",
normed=true,color=RGBA(1,0,0,0.5))
We can understand this result by looking at the expectation value and standard deviation of our sample estimator. The expectation value is

\[
\left\langle X^{\rm est}_N\right\rangle = \frac{1}{N}\sum_{\alpha=1}^{N}\left\langle X_{j_\alpha}\right\rangle = \langle X\rangle.
\]

Thus on average the estimator is a good approximation to the mean. The spread is

\[
\left\langle \left(X^{\rm est}_N-\bar{X}\right)^2\right\rangle
= \frac{1}{N^2}\sum_{\alpha\beta}\left\langle \left(X_{j_\alpha}-\bar{X}\right)\left(X_{j_\beta}-\bar{X}\right)\right\rangle,
\]

where for notational simplicity we have written \(\bar{X}=\langle X\rangle\). Now if the samples are uncorrelated, the only non-zero terms have \(\alpha=\beta\), and the sum becomes

\[
\left\langle \left(X^{\rm est}_N-\bar{X}\right)^2\right\rangle = \frac{\sigma^2}{N},
\]

where

\[
\sigma=\sqrt{\sum_j p_j\left(X_j-\bar{X}\right)^2}
\]

is the population standard deviation.
Let's see if this works.
sigma=sqrt(sum(j*(j-exactmean)^2 for j in 1:10)/sum(j for j in 1:10))
2.449489742783178
histogram([approxmean(100) for j in 1:10000],label="100 realizations",normed=true)
histogram!([approxmean(400) for j in 1:10000],label="400 realizations",
normed=true,color=RGBA(1,0,0,0.5))
plot!(j->exp(-(j-exactmean)^2/2(sigma^2/100))*sqrt(100/(2*pi*sigma^2)),6,8,label="")
plot!(j->exp(-(j-exactmean)^2/2(sigma^2/400))*sqrt(400/(2*pi*sigma^2)),6,8,label="")
Now typically we don't know the population standard deviation, so we need to estimate it from our data. Intuitively, it should be well approximated by the sample standard deviation.
More precisely, however, if we define the sample variance as

\[
s_N^2 = \frac{1}{N}\sum_{\alpha=1}^{N}\left(X_{j_\alpha}-X^{\rm est}_N\right)^2,
\]

then on average we have

\[
\left\langle s_N^2\right\rangle = \frac{N-1}{N}\,\sigma^2.
\]

Thus our best estimate of the population standard deviation is

\[
\sigma \approx \sqrt{\frac{1}{N-1}\sum_{\alpha=1}^{N}\left(X_{j_\alpha}-X^{\rm est}_N\right)^2}.
\]
We are always going to be using a large enough \(N\) that we don’t really need to worry about the difference between \(N\) and \(N-1\).
To reiterate: to estimate the error of our mean, we first calculate the standard deviation of our sample, then divide it by the square root of the number of realizations (or \(N-1\)).
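As a concrete sketch of this recipe (the helper meanwitherror is our own name, not something defined in the lecture), we can package the sample mean and its estimated error for the weighted distribution from above:

using Statistics

# Estimate the mean of the distribution p_j proportional to j on 1:10,
# with an error bar given by the sample standard deviation divided by sqrt(N).
function meanwitherror(numsamples)
    samples=[ts() for j in 1:numsamples]
    return mean(samples), std(samples)/sqrt(numsamples)  # std uses the N-1 convention
end

meanwitherror(400)

With 400 realizations the error bar should come out near \(\sigma/\sqrt{400}\approx 0.12\).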
Calculating Integrals#
We now have the background to begin thinking about evaluating integrals. There are a number of different approaches, and we will not be able to touch on them all. Suppose we want to calculate

\[
I=\int_\Omega f(\vec{r})\,d\vec{r}.
\]

The typical strategy is to take a probability distribution on \(\Omega\), \(P_\Omega(\vec{r})\), and write

\[
I=\int_\Omega \frac{f(\vec{r})}{P_\Omega(\vec{r})}\,P_\Omega(\vec{r})\,d\vec{r}.
\]

One then notes that this is just

\[
I=\left\langle \frac{f(\vec{r})}{P_\Omega(\vec{r})}\right\rangle_{P_\Omega},
\]

which you then calculate by sampling \(\vec{r}\) from \(P_\Omega\).
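Before specializing, here is a minimal generic sketch of this recipe (the function mcintegrate and its arguments are our own illustration, not something defined in the lecture): draw points from \(P_\Omega\) and average \(f/P_\Omega\) over them.

# Monte Carlo estimate of the integral of f over Omega.
# `sampler()` returns one point drawn from P_Omega; `pdf(r)` evaluates P_Omega at r.
function mcintegrate(f, sampler, pdf, nsamples)
    values = map(1:nsamples) do j
        r = sampler()
        f(r)/pdf(r)
    end
    return mean(values), std(values)/sqrt(nsamples)
end

For example, mcintegrate(sin, ()->pi*rand(), x->1/pi, 1000) estimates \(\int_0^\pi\sin(x)\,dx\approx 2\) with uniform sampling, which is what calcI below does by hand.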
In the simplest examples, one just takes \(P_\Omega\) to be the uniform distribution, \(P_\Omega(\vec{r})=1/V_\Omega\), where \(V_\Omega\) is the volume of \(\Omega\). As a simple example, let's do

\[
I=\int_0^\pi \sin(x)\,dx=2,
\]

where we sample \(x\) uniformly on \([0,\pi]\), so that \(I=\langle \pi\sin(x)\rangle\).
function calcI(nsamples)
    xvalues=pi*rand(nsamples)                   # uniform samples on [0,pi]
    yvalues=@. pi*sin(xvalues)                  # f(x)/P(x) with P(x)=1/pi
    Iest=mean(yvalues)
    Ierrest=std(yvalues)/sqrt(length(yvalues))  # error of the mean: sigma/sqrt(N)
    return Iest,Ierrest
end
calcI (generic function with 1 method)
calcI(1000)
(2.0379315034289838, 0.02831971367614405)
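As a quick consistency check (a sketch of ours, not part of the original notes), the spread of repeated estimates should match the error bar that calcI reports:

# Standard deviation of many independent estimates of the integral
estimates=[calcI(1000)[1] for j in 1:1000]
std(estimates)

This should come out at a few hundredths, comparable to the error estimate returned above.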
If \(P_\Omega(\vec r)\propto f(\vec r)\), then all of the samples have exactly the same value, and we get a zero-variance estimator, i.e. exactly the right answer. This motivates trying to use a probability distribution which matches \(f\) as closely as possible: importance sampling.
To make our life easier, let's use the symmetry of the integrand to write

\[
I=\int_0^\pi \sin(x)\,dx = 2\int_0^{\pi/2}\sin(x)\,dx.
\]

Let's then sample on \([0,\pi/2]\) with

\[
P(x)=\frac{8x}{\pi^2},
\]

in which case

\[
I=\left\langle \frac{2\sin(x)}{P(x)}\right\rangle = \left\langle \frac{\pi^2}{4}\,\frac{\sin(x)}{x}\right\rangle.
\]

We know we can sample such a linear ramp by uniformly sampling \(x^2\):
function icalcI(nsamples)
    xvalues=(pi/2)*sqrt.(rand(nsamples))        # linear-ramp samples on [0,pi/2]
    yvalues=@. (pi^2/4)*sin(xvalues)/xvalues    # 2*sin(x)/P(x) with P(x)=8x/pi^2
    Iest=mean(yvalues)
    Ierrest=std(yvalues)/sqrt(length(yvalues))  # error of the mean: sigma/sqrt(N)
    return Iest,Ierrest
end
icalcI (generic function with 1 method)
icalcI(1000)
(2.0097948258655625, 0.00204590966078176)
calcI(1000)
(2.0067186650666584, 0.028960214200107994)
Indeed – the error bars are noticeably smaller: importance sampling has reduced the variance of the integrand by roughly an order of magnitude.
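As a final sanity check on the linear-ramp trick (again a sketch of ours, not part of the original notes), we can compare a histogram of the transformed uniform samples against the target density \(P(x)=8x/\pi^2\):

histogram((pi/2)*sqrt.(rand(10000)),normed=true,label="samples")   # x = (pi/2)*sqrt(u), u uniform
plot!(x->8x/pi^2,0,pi/2,label="8x/pi^2")                           # target linear ramp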