# Estimating π using Monte-Carlo method

## Idea

Let's compute an approximation of π using the Monte Carlo method. The idea is to draw points from the uniform distribution on a unit square and count the ratio of points that are inside the quadrant of the unit circle. Since the probably $p$ for a point to be inside the quadrant is the area of the quadrant ($= π / 4$) divided by the area of the unit square ($= 1$), we can estimate $π$ by estimating the probability $p = π / 4$:

xs = rand(10_000)
ys = rand(10_000)
p = count(xs.^2 .+ ys.^2 .< 1) / length(xs)
4 * p
3.1524

## RNG on GPU

We try to do this computation on a GPU using FoldsCUDA.jl:

using CUDA
using FLoops
using FoldsCUDA

As of writing, CUDA.CURAND does not provide the API usable inside the loop body (i.e., the device API). However, we can use pure-Julia pseudo number generator quite easily. In particular, we use a Counter-based random number generator (CBRNG) provided by Random123.jl (documentation).

using Random123

In this example, we use Random123.Philox2x. This RNG gives us two UInt64s for each counter which wraps around at typemax(UInt64):

rng_a = Philox2x(0)
rng_b = Philox2x(0)
set_counter!(rng_b, typemax(UInt64))
rand(rng_b, UInt64, 2)
@assert rng_a == rng_b

## Using counter-based RNG

Let's create a helper function that divides UInt64(0):typemax(UInt64) into n equal intervals:

function counters(n)
stride = typemax(UInt64) ÷ n
return UInt64(0):stride:typemax(UInt64)-stride
end

This lets us use "independent" RNG for each ctr-th iteration:

function monte_carlo_pi(n, m = 10_000, ex = has_cuda_gpu() ? CUDAEx() : ThreadedEx())
@floop ex for ctr in counters(n)
rng = set_counter!(Philox2x(0), ctr)
nhits = 0
for _ in 1:m
x = rand(rng)
y = rand(rng)
nhits += x^2 + y^2 < 1
end
@reduce(tot = 0 + nhits)
end
return 4 * tot / (n * m)
end
πₐₚₚᵣₒₓ = monte_carlo_pi(2^12)
3.1411634765625