A block-wise decomposition approach is used to implement KIVA-3 on distributed-memory machines. Dependencies in the code extend only one layer in each direction, and the ghost cells and cell-face boundary arrays already present in the code suit a distributed-memory implementation well. Temporal differencing appears to create no dependency, since variables are computed from quantities at the previous iteration or time step. Spatial differencing requires estimating variables, and sometimes their gradients (diffusion terms), on the cell faces, which leads to communication between adjacent processors that share a cell face. Momentum cells around the boundary vertices are split between processors, so each processor computes only its share of the vertex momentum. Advection (via either QSOU or PDC) involves fluxing through regular and momentum cell faces. Cell-face values found via upwind differencing require quantities and their derivatives on both sides of the face. Spray dynamics and chemistry require little communication, but a non-uniform distribution of particles can lead to slight load-balancing problems. Particles cross processor boundaries, so they must be destroyed on one processor and created on another. Grid points on faces shared between processors must have the same structure so that communication patterns are predictable. The algorithms are being tested on the Intel Paragon, and a speedup of 1.92 has been obtained for a baseline engine case on 2 processors. Further results (scalability, MP node performance, etc.) will be presented at an upcoming meeting.
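
The one-layer dependency described above can be illustrated with a minimal sketch. It is not KIVA-3 code: it assumes a 1-D field, a simple explicit diffusion stencil, and zero-gradient physical boundaries. All function names are hypothetical, and an in-memory copy stands in for the message passing between processors. Each block carries one ghost cell per side, the ghost cells are filled from the neighbouring block, and the update then uses only old-time values.

```python
# Illustrative sketch only: 1-D block decomposition with one ghost layer.
# Function names and the diffusion stencil are assumptions, not KIVA-3 routines.


def split_into_blocks(field, nblocks):
    """Split cell values into contiguous blocks, each padded with one ghost cell per side."""
    n = len(field)
    size = n // nblocks
    blocks = []
    for b in range(nblocks):
        start = b * size
        stop = n if b == nblocks - 1 else start + size
        blocks.append([0.0] + list(field[start:stop]) + [0.0])
    return blocks


def exchange_ghosts(blocks):
    """Fill ghost cells from neighbouring blocks (stand-in for message passing)."""
    last = len(blocks) - 1
    for b, block in enumerate(blocks):
        # Interior edge cell of the neighbour, or a zero-gradient copy at a physical boundary.
        block[0] = blocks[b - 1][-2] if b > 0 else block[1]
        block[-1] = blocks[b + 1][1] if b < last else block[-2]


def diffuse_block(block, coeff):
    """One explicit diffusion step on the interior cells, using old-time values only."""
    return [
        block[i] + coeff * (block[i - 1] - 2.0 * block[i] + block[i + 1])
        for i in range(1, len(block) - 1)
    ]


def diffuse_serial(field, coeff):
    """Reference single-block update with zero-gradient boundaries."""
    padded = [field[0]] + list(field) + [field[-1]]
    return diffuse_block(padded, coeff)


def diffuse_decomposed(field, coeff, nblocks):
    """Same update, computed block by block after a single ghost-cell exchange."""
    blocks = split_into_blocks(field, nblocks)
    exchange_ghosts(blocks)
    result = []
    for block in blocks:
        result.extend(diffuse_block(block, coeff))
    return result


if __name__ == "__main__":
    field = [float(i * i) for i in range(8)]
    print(diffuse_decomposed(field, 0.1, 2) == diffuse_serial(field, 0.1))
```

Because the stencil reaches only one cell in each direction and reads only previous-step values, one exchange per step is enough for the decomposed update to reproduce the single-block result in this sketch.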