The parallel method used here to study fluid motion, and the contraction flow problem in particular, is based on a finite element code (FLUCODE) developed earlier by Bernstein, Malkus and Olsen [3]. Parallel FLUCODE solves the weak form of the steady-state governing equations

along with an integral constitutive equation for the stress. The original method uses an integral constitutive equation, the form that many rheologists regard as the most realistic. An advantage of integral methods is that they are naturally suited to parallelization, because each element can keep track of its own strain history independently. Solving differential constitutive equations, on the other hand, requires matrix solutions at each time step, which brings with it some communication overhead.
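The independence argument can be made concrete with a deliberately simplified sketch. It assumes (none of this is stated in the source) that every element sits in steady simple shear at its own rate and that the stress follows a Lodge-type integral model with a single exponential memory function m(s) = (G/λ) e^(-s/λ). The shear stress is then the integral of m(s) times the accumulated shear strain over the element's past, whose exact value is Gλ times the shear rate. Each element integrates over its own history with no data from any other element, so the loop maps directly onto independent workers. The names `memory`, `shear_stress`, `G` and `lam` are hypothetical, and a thread pool merely stands in for distributed nodes.

```python
import math
from concurrent.futures import ThreadPoolExecutor

G = 1.0e3   # hypothetical modulus (Pa)
lam = 0.5   # hypothetical relaxation time (s)

def memory(s):
    """Single-mode exponential memory function m(s) = (G/lam) exp(-s/lam) (assumed model)."""
    return (G / lam) * math.exp(-s / lam)

def shear_stress(shear_rate, n_steps=2000, s_max_factor=20.0):
    """Integrate m(s) * gamma(s) over the element's past, with gamma(s) = shear_rate * s.

    Composite trapezoidal rule on [0, s_max_factor * lam]; the tail beyond
    that is negligible because m(s) decays exponentially.
    """
    s_max = s_max_factor * lam
    h = s_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        s = i * h
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * memory(s) * shear_rate * s
    return total * h

# Each element evaluates its own strain history; no element needs another's
# data, so the evaluations can be farmed out to independent workers.
element_shear_rates = [0.1, 1.0, 5.0, 20.0]  # hypothetical per-element rates (1/s)
with ThreadPoolExecutor() as pool:
    stresses = list(pool.map(shear_stress, element_shear_rates))

for rate, tau in zip(element_shear_rates, stresses):
    print(f"rate={rate:5.1f}  tau={tau:10.3f}  exact={G * lam * rate:10.3f}")
```

In a real viscoelastic code the history integral runs along particle paths with the full finite-strain tensor, but the data dependence is the same: only the element's own past enters.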

The discrete form of Eq. (1) results in a matrix system, which is solved here by a direct method [4] on the Intel supercomputers. In many problems a steady solution exists and can be computed directly. In other problems steady solutions do not appear to exist, so a fully dynamic analysis may become necessary. Fully dynamic solutions with an integral constitutive equation, however, would require storing a large number of velocity fields to keep track of the strain history. Our current efforts have therefore focused on problems in which a steady state is assumed a priori. In the future we plan to apply the techniques developed here to unsteady flows with differential constitutive equations; there an efficient matrix solver may be essential, because the small time steps of a fully dynamic calculation mean that the solver is called many times.
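The source does not say which direct method [4] is used. As a hedged illustration of why a direct solver suits repeated solves, the sketch below uses dense LU factorization with partial pivoting: the O(n³) factorization is done once, and each further right-hand side costs only O(n²) forward and back substitution. Production finite element codes would use banded, skyline or frontal variants of the same idea; the function names `lu_factor` and `lu_solve` are hypothetical.

```python
def lu_factor(a):
    """Factor a square matrix as P A = L U (Doolittle, partial pivoting).

    Returns (lu, perm): L (unit diagonal, strictly below) and U (on and above
    the diagonal) share `lu`; perm[i] is the original row now in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # choose the largest pivot in column k for numerical stability
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier stored in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm

def lu_solve(lu, perm, b):
    """Solve A x = b by reusing a factorization from lu_factor."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]  # apply the row permutation
    for i in range(n):                  # forward substitution: L y = P b
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in reversed(range(n)):        # back substitution: U x = y
        x[i] = (x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x

# Factor once, then reuse for several right-hand sides; this pays off only
# when the matrix itself is unchanged between solves (an assumption here).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
lu, perm = lu_factor(A)
for b in ([3.0, 2.0, 3.0], [1.0, 0.0, 0.0]):
    print(lu_solve(lu, perm, b))
```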

When applied to the contraction flow problem, our parallel code
is 40% efficient on 8 nodes without a load balancing scheme.
With a load balancing strategy, the efficiency doubles.
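Taking the usual definition of parallel efficiency on p nodes (an assumption; the text does not define it), with T_1 the single-node time, T_p the time on p nodes and S_p the speedup, the quoted figures translate into speedups as follows:

```latex
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}, \qquad
S_8 = 0.40 \times 8 = 3.2 \;\; \text{(no balancing)}, \qquad
S_8 \approx 0.80 \times 8 = 6.4 \;\; \text{(with balancing)}.
```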
In each iteration, every node does its own work and lets the host manage
load redistribution. Each node posts a *help wanted* message when it is done.
The host then sends messages to both the working and the idle nodes so that
these nodes can cooperate directly with each other. In this way, every node
always has work to do, apart from time spent passing messages and waiting.
However, the communication needed to keep the nodes balanced soon
becomes a significant overhead. After several iterations, the load
distribution becomes almost steady, which suggests redistributing the
loads early on and then turning off the balancing scheme. Doing so
greatly reduced the message traffic.
Using 400 elements, a numerical experiment was carried out with load
balancing *on* all the time, and with load balancing *turned off* after
the initial steps. Table 1 lists the results.
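The host-managed scheme, and the option of switching it off once the load settles, can be explored with a toy discrete-event simulation. Everything here is a hypothetical model, not the code behind Table 1: element costs are invented (a cluster of expensive elements stands in for the region near the contraction), communication latency is ignored, each *help wanted* request counts as one message plus two host notifications when work is found, and the host hands the idle node half of the busiest node's unstarted elements. Because per-element costs are held fixed across iterations, the assignment left by one balanced iteration can simply be frozen afterwards.

```python
import heapq

def run_iteration(assignment, cost, balance):
    """Simulate one iteration of element work on len(assignment) nodes.

    Returns (makespan, elements processed by each node in order, messages).
    """
    p = len(assignment)
    queues = [list(q) for q in assignment]  # unstarted elements per node
    done = [[] for _ in range(p)]           # processing order per node
    finish = [0.0] * p                      # finish time of each node's last element
    messages = 0
    events = [(0.0, n) for n in range(p)]   # (time node becomes free, node)
    heapq.heapify(events)
    while events:
        t, n = heapq.heappop(events)
        if not queues[n] and balance:
            messages += 1                   # node n posts "help wanted" to the host
            donor = max(range(p), key=lambda k: sum(cost[e] for e in queues[k]))
            if len(queues[donor]) >= 2:     # host pairs idle node with busiest node
                messages += 2               # host notifies donor and idle node
                half = len(queues[donor]) // 2
                queues[n] = queues[donor][half:]
                queues[donor] = queues[donor][:half]
        if queues[n]:
            e = queues[n].pop(0)
            done[n].append(e)
            finish[n] = t + cost[e]
            heapq.heappush(events, (finish[n], n))
    return max(finish), done, messages

def simulate(cost, p, iterations, balance_iters):
    """Run several iterations; balancing is on only for the first balance_iters."""
    n = len(cost)
    assignment = [list(range(k * n // p, (k + 1) * n // p)) for k in range(p)]
    total = sum(cost)
    history = []
    for it in range(iterations):
        makespan, assignment, msgs = run_iteration(assignment, cost, it < balance_iters)
        history.append((total / (p * makespan), msgs))  # (efficiency, messages)
    return history

# Hypothetical mesh: 400 elements, the first 50 ten times as expensive.
cost = [10.0] * 50 + [1.0] * 350
for label, balance_iters in (("never", 0), ("always", 4), ("first only", 1)):
    runs = simulate(cost, 8, 4, balance_iters)
    print(label, [(round(eff, 3), msgs) for eff, msgs in runs])
```

In this toy model, keeping balancing on keeps generating requests every iteration even after the loads have settled, while freezing the assignment after the first iteration keeps the same efficiency with no balancing messages at all, which mirrors the trade-off described above.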

**Table 1:** Processor efficiency with load balancing on all the time and with load balancing turned off after the initial steps.

Tue Jan 21 16:43:41 EST 1997