We report the design and performance of a molecular dynamics (MD) code that simulates 400 million particles interacting through the standard pairwise 6--12 Lennard-Jones potential on a 1024-node Intel Paragon, a distributed-memory MIMD parallel computer. The initially recorded ``particle-step time'' was 0.4 microseconds. A new inter-node communication strategy ensures high parallel efficiency on large numbers of nodes. In addition to handling large problems, our implementation incorporates a novel method for dynamic load balancing. These communication and load-balancing enhancements increase the efficiency and flexibility of our MD code, yet are general enough for use in other parallel algorithms.
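
For reference, the standard 6--12 Lennard-Jones pair potential named above is conventionally written as follows. The symbols $\epsilon$ (well depth), $\sigma$ (separation at which the potential crosses zero), and $r$ (pair separation) are notation assumed here, not taken from the text.
\begin{equation}
V(r) = 4\epsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]
\end{equation}
As a rough illustration of the timing figure, assume that the ``particle-step time'' denotes the wall-clock time of one time step divided by the number of particles $N$ (an assumption about the definition, not stated above). Under that reading, one step of the full system would take approximately
\begin{equation}
t_{\mathrm{step}} \approx N \times 0.4~\mu\mathrm{s} = \left( 4 \times 10^{8} \right) \times \left( 0.4 \times 10^{-6}~\mathrm{s} \right) = 160~\mathrm{s}.
\end{equation}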