In general the recommended way to build PC clusters is to use switches and not the point-to-point (P-to-P) parallel architecture. It is true that point to point can be very efficient and cheap for medium-sized clusters, but there are some drawbacks which require special expertise to handle. Let's name a few of them:

- if one node dies, a lot of others (or all of them) are affected (it is possible to cheat here a little bit, like we do)
- in order to prevent the cable nightmare, one has to make the cables on the spot oneself
- special programs have to be written to build routing tables for each specific architecture. It is not possible to connect more than 10 machines with hand-written routing tables. One such program was written here for the hypercube, and it could be extended to other architectures fairly easily, but you must spend some time on the problem. (A small sketch of how such a table can be computed is appended at the end of this message.)

>> - What brand/model of NIC?

The NICs we are using are about $10. They are RTL8139B based.

>> - If not, do you think a VRANA-4-type cluster *could* work with multi-port
>> NICs?

It could, but I don't think you can get 6 ports for $60.

>> - What topology exactly are you using? If you tested different ones, which
>> one(s) did you find the best for what programs?

The reason we decided to connect the machines in a hypercube is that if one wants to connect 64 machines, one must have 64 equivalent ports in one switch. The `cheap' switches usually have just 24 ports, which means one has to connect several switches to get that many ports. Then there is the bottleneck in the system: you have to connect the switches to each other. The best I have seen is a 1 Gbit connection between 2 switches, but this is about 5 times slower than what is needed for a full non-blocking connection with 3 switches. Any proper solution will cost as much as or more than the CPU boxes do.

>> - I'll very soon have a bunch of P III 866MHz nodes here. As far as I can
>> tell, there are 5 free PCI slots left in each. What topology would you
>> build with such systems?

6 PCI slot motherboards are quite popular now; that's why we decided to build the hypercube. But CHARMM could also run pretty efficiently on a 3-D torus, meaning each node has a connection to its left-right-up-down neighbours, with the ends wrapped around. This means 4 NICs. But if one node dies, the others are no longer all connected, i.e. the routing tables have to be modified. It also becomes a problem to partition such a system; a hypercube is much more flexible to partition. Let's say 3 jobs want to share a 3-D torus: it is impossible to do this efficiently unless you reconnect the cables. (A sketch of this torus wiring is also appended below.)

>> - Did you have to change CHARMM to get it to work well (or at all?)
>> with this system?

CHARMM runs on almost any parallel architecture. For an efficient use of the 3-D torus I would have to write new communication routines, but that is easy because one could use the ring topology routines (they are in CHARMM) in 2 dimensions.

>> - From the (preliminary) benchmarks on VRANA-4 on http://kihp6.cmm.ki.si/
>> parallel/summary.html, I saw that up to 16 nodes, it scales quite
>> well, but for 32 nodes, there seems to be a noticeable dropoff.
>> Do you think you will be able to improve this?

Not really with this hardware. I am hoping to do something for PME, and maybe then it could scale better. We are also using these machines for QM/MM calculations. With an average system (30 - 50 QM atoms) one can get a speedup of 20 on 32 processors.

>> - Anything else that might be important to observe when building a
>> VRANA-4 type system?
In short, the guideline is like this:

~5 boxes: you just connect everyone to everyone (hypernet). No routing tables are needed, no special software. If there are only 4 boxes, one can connect them in a ring, and this is basically equivalent to any other parallel architecture: hypercube, torus, etc. Routing tables can be written by hand or generated with the hypercube program.

~10-20 boxes: buy the $1000 (or cheaper) switch.

more than 20 boxes: start thinking big and go the LoBoS way, or make 2 or more loosely connected clusters, where each group of up to 20 is still fully connected. Anyway, CHARMM will not scale to more than 20 machines...

The bottom line: don't try this (hypercube, or other P-to-P) at home unless you know what you are doing. But then you don't need to read the above :-)
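
P.S. Since the routing-table program came up above, here is a minimal sketch in C of the idea behind it, for a hypercube. This is not the actual program we wrote; the 6-dimensional size (64 nodes) and the node numbering are just assumptions for the example. Node i is wired to every node that differs from i in exactly one bit, and the next hop towards a destination is found by flipping the lowest bit in which source and destination differ (dimension-order routing), so the whole table can be generated mechanically instead of by hand.

/* Hypothetical sketch: generate the routing table of one hypercube node.
 * Assumes nodes 0..2^DIM-1, with a direct link between any two nodes that
 * differ in exactly one bit. Not the VRANA-4 program, just an illustration. */
#include <stdio.h>

#define DIM   6              /* 6-dimensional hypercube -> 64 nodes (assumed) */
#define NODES (1 << DIM)

/* Next hop from src towards dst: flip the lowest differing bit. */
static int next_hop(int src, int dst)
{
    int diff = src ^ dst;
    if (diff == 0)
        return src;                /* already at the destination */
    return src ^ (diff & -diff);   /* neighbour reached over that NIC */
}

int main(void)
{
    int me = 5;              /* print the table for one (arbitrary) node */
    for (int dst = 0; dst < NODES; dst++)
        printf("node %2d -> dest %2d via neighbour %2d\n",
               me, dst, next_hop(me, dst));
    return 0;
}

In practice the output of such a table would be turned into one routing entry per destination on each node.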
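
And here is a similar hypothetical sketch of the torus wiring described above: each node talks to its left/right/up/down neighbours, the ends wrap around, and every node therefore needs exactly 4 NICs. The 8x8 grid size and the row-major node numbering are my assumptions, not how VRANA-4 is wired.

/* Hypothetical sketch: list the 4 wrapped-around neighbours of every node
 * in an NX x NY torus. Grid size and numbering are assumptions. */
#include <stdio.h>

#define NX 8
#define NY 8

static int node_id(int x, int y) { return y * NX + x; }

int main(void)
{
    for (int y = 0; y < NY; y++)
        for (int x = 0; x < NX; x++)
            printf("node %2d: left %2d right %2d down %2d up %2d\n",
                   node_id(x, y),
                   node_id((x + NX - 1) % NX, y),
                   node_id((x + 1) % NX, y),
                   node_id(x, (y + NY - 1) % NY),
                   node_id(x, (y + 1) % NY));
    return 0;
}

One can see from the output why losing a single node forces the routing tables of its neighbours to change, and why carving such a grid up for several independent jobs is harder than splitting a hypercube into sub-cubes.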