We have developed a parallel MD algorithm that performs efficient atomistic simulations of large systems, including a system of 400 million particles on 1024 processors. By using an asynchronous approach to message passing, our algorithm maintains its efficiency regardless of the number of processors used. In addition, our dynamic load-balancing method adapts to any domain geometry or particle distribution, enabling us to simulate a wider variety of problems than MD codes without load balancing. Both our communication scheme and our load-balancing technique are general enough to be applied to other types of parallel applications. We are currently using our MD algorithm to simulate the manufacturing processes of thin films for planar and non-planar substrates. In the future, we plan to apply our methods to a finite-difference algorithm.