Heating a Square Plate
20 Nov 2020 | python, CUDA, numba
Problem Statement
We will consider a square plate occupying the domain \( \left[ -1,1 \right] \times \left[ -1,1 \right]\). The plate starts at temperature $u_0=0$ everywhere, and from time $t=0$ onward we impose the following boundary conditions:
- The temperature of three sides is held at $u=0$.
- The temperature of the fourth side is held at $u=5$.
We would like to find the time \( t = t^{'} \) such that \( u(t^{'}) = 1 \) at the center of the plate.
We are given the correct time \(t^{'} = 0.424011387033 \).
To do this, we will implement a finite difference scheme with both forward and backward time-stepping.
Solution
First, we note that the temperature of the plate obeys the heat equation: $$ \frac{\partial u}{\partial t} = \alpha \Delta u $$ where $\Delta$ is the standard Laplacian operator (we work in coordinates \( (x, y, t) \)). Furthermore, \( \alpha \) is some positive coefficient of the medium. We will use \( \alpha = 1\) for simplicity.
We denote \( u(x,y,t) = u_{i,j}^{k}\) where \( i, j\) represent the spatial coordinates and \(k\) represents the time-step.
We can solve this problem using two different implementations: a forward Euler method (both CPU and GPU) and a backward Euler method (CPU).
Implementation 1 - Forward Euler Method
The forward difference is given by
$$ \frac{du}{dx} \approx \frac{u_{i+1} - u_{i}}{h} $$
Further, second-order derivatives are approximated by:
$$ \frac{d}{dx} \left [\frac{d}{dx} u \right] = \frac{u_{i+1} - 2u_{i} + u_{i-1}}{h^2} $$
Applying these two relations to the heat equation, we get:
$$ \frac{u_{i,j}^{k+1} - u_{i,j}^{k}}{\Delta t} = (\frac{u_{i+1,j}^{k} - 2u_{i,j}^{k} + u_{i-1,j}^{k}}{h^2} + \frac{u_{i,j+1}^{k} - 2u_{i,j}^{k} + u_{i,j-1}^{k}}{h^2}) $$
where $\Delta t$ is the discrete time step and $h$ is the spatial step. Rearranging, we obtain our final update equation:
$$ u_{i,j}^{k+1} = (1-4C)u_{i,j}^{k} + C(u_{i+1,j}^{k} + u_{i-1,j}^{k} + u_{i,j+1}^{k} + u_{i,j-1}^{k}) $$
where \( C = \frac{\Delta t}{h^2} \).
This method is stable if: \( \Delta t \leq \frac{h^2}{4} \).
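With the largest stable step \( \Delta t = \frac{h^2}{4} \) (the choice used in the code below), we have \( C = \frac{1}{4} \), and the update simply averages the four neighbouring values:
$$ u_{i,j}^{k+1} = \frac{1}{4}\left( u_{i+1,j}^{k} + u_{i-1,j}^{k} + u_{i,j+1}^{k} + u_{i,j-1}^{k} \right) $$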
A CPU Implementation
First, we have a standard CPU implementation of the forward Euler method. We use Numba's parallel processing to accelerate the performance.
CPU Code
import numpy as np
from numba import njit, prange

def initialize(N, iterations):
    """
    Creates the initial array for the plate
    ---
    N = plate size (number of grid points per side)
    iterations = number of time steps
    """
    # Initializing the plate: everything starts at u = 0
    plate = np.zeros((iterations, N, N))
    # The fourth side is held at u = 5 for every time step
    plate[:, N-1, :] = 5
    return plate

@njit(parallel=True)
def forward_step(plate, N, iterations):
    """
    Performs the forward Euler method
    ----
    plate = array holding the plate temperatures at every time step
    N = plate dimensions (spatial)
    iterations = number of time steps we wish to perform
    """
    alpha = 1
    h = 2/(N-1)
    time_step = (h**2)/(4*alpha)   # largest stable time step
    C = time_step/(h**2)
    middle = (N-1)//2
    # Time stepping is sequential; only the spatial loop is parallelized
    for k in range(0, iterations-1):
        if plate[k, middle, middle] >= 1.0:
            print('CPU Iterations: ', k)
            return k
        for i in prange(1, N-1):
            for j in range(1, N-1):
                plate[k+1, i, j] = (1-4*C)*plate[k, i, j] + C*(plate[k, i+1, j] + plate[k, i-1, j] + plate[k, i, j+1] + plate[k, i, j-1])
    # Sentinel value if the centre never reaches u = 1
    return 999
Initializing the plate and running forward_step will let us evaluate this implementation.
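For example, a minimal driver along these lines (the grid size and iteration budget here are illustrative choices, smaller than the $N = 181$ used for the results reported below):

# Illustrative run: a small grid so that it finishes quickly
N = 81
iterations = 4000          # comfortably more steps than needed to reach u = 1
plate = initialize(N, iterations)
result_time = forward_step(plate, N, iterations)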
A GPU Implementation
We also present a GPU implementation of the forward Euler scheme. The CPU drives the time-step loop, while the GPU performs the stencil update over the whole plate at each time step.
GPU Code
## Our GPU Kernel ##
import numba
from numba import cuda

@cuda.jit
def gpu_forward(N, input_array, C, result_array):
    ## Shared memory array: a 32 x 32 tile plus a one-cell halo on each side ##
    shared = cuda.shared.array((34, 34), numba.float32)
    ## Positioning - Local ##
    tx = cuda.threadIdx.x
    ty = cuda.threadIdx.y
    ## Positioning - Global (interior grid of (N-2) x (N-2) points) ##
    px, py = cuda.grid(2)
    index = (N-2)*py + px
    # Threads outside the interior grid still take part in the barrier below
    inside = (px < N-2) and (py < N-2)
    ## Loading data into shared memory ##
    if inside:
        # Middle
        shared[tx+1, ty+1] = input_array[index]
        # Filling in the halo from the fixed plate boundaries
        if px == 0:
            shared[tx, ty+1] = 0
        if px == N-3:
            shared[tx+2, ty+1] = 0
        if py == 0:
            shared[tx+1, ty] = 0
        if py == N-3:
            shared[tx+1, ty+2] = 5
        # Filling in the halo from neighbouring tiles
        if (tx == 0) and (px != 0):
            shared[tx, ty+1] = input_array[(N-2)*py + px - 1]
        if (tx == 31) and (px != N-3):
            shared[tx+2, ty+1] = input_array[(N-2)*py + px + 1]
        if (ty == 0) and (py != 0):
            shared[tx+1, ty] = input_array[(N-2)*(py-1) + px]
        if (ty == 31) and (py != N-3):
            shared[tx+1, ty+2] = input_array[(N-2)*(py+1) + px]
    # Sync threads: every thread in the block must reach the barrier
    cuda.syncthreads()
    ## Performing the stencil ##
    if inside:
        result_array[index] = (1-4*C)*shared[tx+1, ty+1] + C*(shared[tx, ty+1] + shared[tx+2, ty+1] + shared[tx+1, ty] + shared[tx+1, ty+2])

## A function to help evaluate the GPU kernel ##
def eval_gpu_forward(gpu_plate, N, iterations):
    """
    Evaluates the GPU Euler forward step on an already-initialized plate
    """
    SX = 32
    SY = 32
    nblocks = (N + SX - 1) // SX
    blockspergrid = (nblocks, nblocks)
    threadsperblock = (SX, SY)
    h = 2/(N-1)
    time_step = (h**2)/4
    C = time_step/(h**2)
    middle = (N-1)//2
    for k in range(0, iterations-1):
        # Interior of the current time step, flattened for the kernel
        gpu_prev_plate = gpu_plate[k, 1:(N-1), 1:(N-1)].reshape((N-2)**2)
        gpu_result = np.zeros((N-2)**2, dtype=np.float32)
        gpu_forward[blockspergrid, threadsperblock](N, gpu_prev_plate, C, gpu_result)
        gpu_plate[k+1, 1:(N-1), 1:(N-1)] = gpu_result.reshape(N-2, N-2)
        if gpu_plate[k+1, middle, middle] >= 1.0:
            gpu_time = k + 1
            print('GPU Iterations: ', gpu_time)
            return gpu_plate[k+1, :, :], gpu_time
    print('The center never reached u = 1')
    return gpu_plate[iterations-1, :, :], 999999999999
Initializing the plate and running eval_gpu_forward will let us evaluate this implementation.
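A short driver mirroring the CPU one (same illustrative grid size) runs the GPU path so the two iteration counts can be compared:

# Illustrative GPU run on the same problem as the CPU driver above
N = 81
iterations = 4000
gpu_plate = initialize(N, iterations)
final_plate, gpu_time = eval_gpu_forward(gpu_plate, N, iterations)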
Implementation 2: Backward Euler Method
The backward difference is given by:
$$ \frac{du}{dx} \approx \frac{u_{i} - u_{i-1}}{h} $$
Further, second-order derivatives are approximated by:
$$ \frac{d}{dx} \left [\frac{d}{dx} u \right] = \frac{u_{i+1} - 2u_{i} + u_{i-1}}{h^2} $$
Applying these two relations to the heat equation, we get:
$$ \frac{u_{i,j}^{k} - u_{i,j}^{k-1}}{\Delta t} = (\frac{u_{i+1,j}^{k} - 2u_{i,j}^{k} + u_{i-1,j}^{k}}{h^2} + \frac{u_{i,j+1}^{k} - 2u_{i,j}^{k} + u_{i,j-1}^{k}}{h^2}) $$
Therefore, we get our final equation:
$$ u_{i,j}^{k-1} = (1 + 4C) u_{i,j}^{k} - C (u_{i+1,j}^{k} + u_{i-1,j}^{k} + u_{i,j+1}^{k} + u_{i,j-1}^{k} ) $$
where \( C = \frac{\Delta t}{h^2} \).
We note that this method is unconditionally stable.
Unlike the forward method, this relation cannot be applied point by point, since the unknown values at step $k$ appear on the right-hand side. Collecting all grid points into a vector, we may express the same equation as a matrix equation:
$$ u^{k-1} = A u^{k} $$
Applying the inverse matrix on both sides, we get:
$$ u^{k} = A^{-1} u^{k-1} $$
so every time step amounts to solving a linear system with the constant matrix $A$.
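Concretely, the row of this system belonging to an interior node reproduces the backward stencil, while rows belonging to boundary nodes are simply the identity, so the fixed edge temperatures are carried over unchanged from one step to the next:
$$ (1 + 4C)\,u_{i,j}^{k} - C \left( u_{i+1,j}^{k} + u_{i-1,j}^{k} + u_{i,j+1}^{k} + u_{i,j-1}^{k} \right) = u_{i,j}^{k-1}, \qquad u_{\text{edge}}^{k} = u_{\text{edge}}^{k-1} $$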
A CPU Implementation
In this implementation, we generate the sparse matrix $A$ once and then solve the matrix equation with spsolve at every time iteration to find the next configuration of the plate.
CPU Code
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import spsolve

def generate_inverse(N):
    """Assembles the sparse backward Euler matrix A (not its inverse)"""
    # 5 entries per interior node, 1 per boundary node
    nelements = 5 * N**2 - 16 * N + 16
    row_ind = np.empty(nelements, dtype=np.int64)
    col_ind = np.empty(nelements, dtype=np.int64)
    data = np.empty(nelements, dtype=np.float64)
    alpha = 1
    h = 2/(N-1)
    time_step = (h**2)/(4*alpha)
    C = time_step/(h**2)
    count = 0
    for j in range(N):
        for i in range(N):
            if i == 0 or i == N - 1 or j == 0 or j == N - 1:
                # Boundary node: identity row keeps the edge temperature fixed
                row_ind[count] = col_ind[count] = j * N + i
                data[count] = 1
                count += 1
            else:
                # Interior node: (1 + 4C) on the diagonal, -C on the four neighbours
                row_ind[count : count + 5] = j * N + i
                col_ind[count] = j * N + i
                col_ind[count + 1] = j * N + i + 1
                col_ind[count + 2] = j * N + i - 1
                col_ind[count + 3] = (j + 1) * N + i
                col_ind[count + 4] = (j - 1) * N + i
                data[count] = 1 + 4*C
                data[count + 1 : count + 5] = -C
                count += 5
    return coo_matrix((data, (row_ind, col_ind)), shape=(N**2, N**2)).tocsr()

def cpu_backward(backward_plate, N, iterations):
    middle = (N-1)//2
    A = generate_inverse(N)
    for k in range(iterations - 1):
        # Solve A u^{k+1} = u^{k} for the next plate configuration
        sol = spsolve(A, backward_plate[k, :, :].reshape(N*N))
        backward_plate[k+1, :, :] = sol.reshape((N, N))
        if (backward_plate[k+1, middle, middle] >= 1) and (backward_plate[k, middle, middle] < 1):
            print("Iterations: ", k+1)
            cpu_back_time = k+1
            return backward_plate[k+1, :, :], cpu_back_time
    # Sentinel if the centre never reaches u = 1 within the iteration budget
    return backward_plate[iterations-1, :, :], 999
Initializing the plate and running cpu_backward will let us evaluate this implementation.
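A corresponding illustrative driver, on the $N = 81$ grid used for the backward results reported below:

# Illustrative run of the backward Euler method
N = 81
iterations = 4000
backward_plate = initialize(N, iterations)
final_plate, back_time = cpu_backward(backward_plate, N, iterations)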
Note: We have set $h = \frac{2}{N-1}$ in all of our implementations. The plate spans $[-1, 1]$ in each direction, so each side has length 2, and placing $N$ grid points along a side gives a spacing of $h = \frac{2}{N-1}$.
Validation
These are the errors I got for all three implementations:
- Euler Forward (CPU) Error: 7.053204587920749e-05
- Euler Forward (GPU) Error: 7.053204587920749e-05
- Euler Backward (CPU) Error: 0.0004890281047001464
These errors are fairly small. For the forward methods I ran grids up to $N = 181$, but for the backward method I only managed $N = 81$, since solving the sparse system at every time step is significantly more time-consuming; the coarser grid also explains why the backward method has a noticeably larger error.
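For reference, reusing N and result_time from the CPU driver above, a minimal sketch of how such an error can be obtained, assuming the error is measured as the absolute difference between the recovered time $k \, \Delta t$ and the reference time:

# Convert the recovered iteration count into a physical time and compare it
# with the reference value (assumed error measure: |k * dt - t_ref|)
t_ref = 0.424011387033
h = 2 / (N - 1)
dt = h**2 / 4
print('Recovered time:', result_time * dt)
print('Error:', abs(result_time * dt - t_ref))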
Visualization
It might be interesting to visualise the heat dynamics of the plate at the moment the center reaches $u(t') = 1$. All three methods yielded the same image, shown below.
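The figure can be reproduced with a minimal matplotlib sketch (assuming matplotlib is available and using the final_plate array returned by one of the drivers above; the styling of the original image may differ):

import matplotlib.pyplot as plt

# Display the temperature field over the [-1, 1] x [-1, 1] plate
plt.imshow(final_plate, origin='lower', extent=[-1, 1, -1, 1], cmap='hot')
plt.colorbar(label='u')
plt.title('Plate temperature when the centre first reaches u = 1')
plt.show()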