Memory access

Memory access patterns are important for good cache usage. If you access memory sequentially, the cache can usually prefetch the next word that you are likely to read. However, if you access memory out of order, you'll have to wait longer for the memory to be read since it may not already be in the cache.

Example

Here is a C program that accesses memory in various ways to show how to traverse data most efficiently when you have a multidimensional array.

#include <stdio.h>

#define NX 4
#define NY 3
#define NZ 2
#define INDEX3(I,J,K) (((K)*NY*NX) + ((J)*NX) + (I))

int main(int argc, char *argv[])
{
    int i,j,k, ijk, ijkaddr, kjiaddr;
    float ijkdata[NX][NY][NZ], kjidata[NZ][NY][NX];
    printf("You want the pattern that gives linear memory access for the \n"
           "best cache usage.\n");

    printf("K,J,I loop traversal\n------------------------n");
    for(k = 0; k < NZ; ++k)
    for(j = 0; j < NY; ++j)
    for(i = 0; i < NX; ++i)
    {
        ijk = INDEX3(i,j,k);
        ijkaddr = &ijkdata[i][j][k] - &ijkdata[0][0][0];
        kjiaddr = &kjidata[k][j][i] - &kjidata[0][0][0];

        printf("ijk index: %2d\tijk addr: %2d\tkji addr: %2d\n", ijk, ijkaddr, kjiaddr);
    }
    printf(    "Good            Bad             Good\n\n");

    printf("\nI,J,K loop traversal\n------------------------n");
    for(i = 0; i < NX; ++i)
    for(j = 0; j < NY; ++j)
    for(k = 0; k < NZ; ++k)
    {
        ijk = INDEX3(i,j,k);
        ijkaddr = &ijkdata[i][j][k] - &ijkdata[0][0][0];
        kjiaddr = &kjidata[k][j][i] - &kjidata[0][0][0];

        printf("ijk index: %2d\tijk addr: %2d\tkji addr: %2d\n", ijk, ijkaddr, kjiaddr);
    }
    printf(    "Bad             Good            Bad\n");

    return 0;
}

Results

You want the pattern that gives linear memory access for the 
best cache usage.
K,J,I loop traversal
------------------------
ijk index:  0	ijk addr:  0	kji addr:  0
ijk index:  1	ijk addr:  6	kji addr:  1
ijk index:  2	ijk addr: 12	kji addr:  2
ijk index:  3	ijk addr: 18	kji addr:  3
ijk index:  4	ijk addr:  2	kji addr:  4
ijk index:  5	ijk addr:  8	kji addr:  5
ijk index:  6	ijk addr: 14	kji addr:  6
ijk index:  7	ijk addr: 20	kji addr:  7
ijk index:  8	ijk addr:  4	kji addr:  8
ijk index:  9	ijk addr: 10	kji addr:  9
ijk index: 10	ijk addr: 16	kji addr: 10
ijk index: 11	ijk addr: 22	kji addr: 11
ijk index: 12	ijk addr:  1	kji addr: 12
ijk index: 13	ijk addr:  7	kji addr: 13
ijk index: 14	ijk addr: 13	kji addr: 14
ijk index: 15	ijk addr: 19	kji addr: 15
ijk index: 16	ijk addr:  3	kji addr: 16
ijk index: 17	ijk addr:  9	kji addr: 17
ijk index: 18	ijk addr: 15	kji addr: 18
ijk index: 19	ijk addr: 21	kji addr: 19
ijk index: 20	ijk addr:  5	kji addr: 20
ijk index: 21	ijk addr: 11	kji addr: 21
ijk index: 22	ijk addr: 17	kji addr: 22
ijk index: 23	ijk addr: 23	kji addr: 23
Good            Bad             Good


I,J,K loop traversal
------------------------
ijk index:  0	ijk addr:  0	kji addr:  0
ijk index: 12	ijk addr:  1	kji addr: 12
ijk index:  4	ijk addr:  2	kji addr:  4
ijk index: 16	ijk addr:  3	kji addr: 16
ijk index:  8	ijk addr:  4	kji addr:  8
ijk index: 20	ijk addr:  5	kji addr: 20
ijk index:  1	ijk addr:  6	kji addr:  1
ijk index: 13	ijk addr:  7	kji addr: 13
ijk index:  5	ijk addr:  8	kji addr:  5
ijk index: 17	ijk addr:  9	kji addr: 17
ijk index:  9	ijk addr: 10	kji addr:  9
ijk index: 21	ijk addr: 11	kji addr: 21
ijk index:  2	ijk addr: 12	kji addr:  2
ijk index: 14	ijk addr: 13	kji addr: 14
ijk index:  6	ijk addr: 14	kji addr:  6
ijk index: 18	ijk addr: 15	kji addr: 18
ijk index: 10	ijk addr: 16	kji addr: 10
ijk index: 22	ijk addr: 17	kji addr: 22
ijk index:  3	ijk addr: 18	kji addr:  3
ijk index: 15	ijk addr: 19	kji addr: 15
ijk index:  7	ijk addr: 20	kji addr:  7
ijk index: 19	ijk addr: 21	kji addr: 19
ijk index: 11	ijk addr: 22	kji addr: 11
ijk index: 23	ijk addr: 23	kji addr: 23
Bad             Good            Bad

Conclusion

If you have created a multidimensional array in C with dimensions [NZ][NY][NX] then you must access it in k,j,i order in your for loop in order to get the best memory access pattern. Note that i corresponds to the X dimension, which is changing the fastest. This pattern of traversal is also best if you have created a linear array with size NX*NY*NZ and you want to index into it using a macro like 'INDEX3'.

If you have created a multidimensional array in C with dimensions [NX][NY][NZ] then you probably think that you are programming in Fortran. To get the best memory access pattern with that arrangement of data, you'll need to nest your for loops in i,j,k order. This is all fine but you will have the Z dimension varying most quickly, which may not be what you'd want if you're sharing code with VisIt, or Fortran for that matter where X varies most quickly. If you will be passing memory back and forth between C and Fortran, use the [NZ][NY][NX] layout and a k,j,i traversal when you are in C code.

These memory layouts are equivalent in C and Fortran, the order of the indexing just being a syntactic difference between languages:

C

int data[NZ][NY][NX];

Fortran equivalent

integer data(NX, NY, NZ)