How to handle multiple mpi libraries
mpich
lam
openmpi
bccd-switch-mpi (changes environment variables and .profile); if NFS is in use, then it can write once rather than doing something remote
build binaries for all supported MPIs, how to handle apt-get building for more than one
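A rough sketch of the PATH handling bccd-switch-mpi might do. The install prefixes under /opt and both function names are assumptions for illustration; the notes do not say where each MPI is installed or how the tool is written.

```python
# Hypothetical install prefixes; the notes do not specify these locations.
MPI_PREFIXES = {
    "mpich": "/opt/mpich",
    "lam": "/opt/lam",
    "openmpi": "/opt/openmpi",
}


def switch_mpi_path(path, choice, prefixes=MPI_PREFIXES):
    """Return a PATH with the chosen MPI's bin dir first and the others removed."""
    if choice not in prefixes:
        raise ValueError(f"unsupported MPI: {choice}")
    mpi_bins = {f"{prefix}/bin" for prefix in prefixes.values()}
    # Drop empty entries and every known MPI bin dir, keep the rest in order.
    kept = [d for d in path.split(":") if d and d not in mpi_bins]
    return ":".join([f"{prefixes[choice]}/bin"] + kept)


def profile_line(choice, prefixes=MPI_PREFIXES):
    """Line that could be written to .profile so the choice survives a new login."""
    return f'export PATH="{prefixes[choice]}/bin:$PATH"  # set by bccd-switch-mpi'
```

With a shared home directory the .profile line only needs to be written once; without one, the same line would have to reach every node some other way.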
Community Mode
login/inittab listens for a community leader; if one or more is present, it prompts the user to join one or go standalone.
Standalone - local password file, no shared disk; the user is prompted for the username they would like to use, we create it, and they go. Polls every N seconds for a community and updates the list of available ones.
Community Member - join auth protocol, mount distributed file system, user creates creds, workshop leader approves.
Community Leader - starts bccd-community and is prompted for which directory should be exported as /bccd/... and the name of the community. Tool to sync locally created users with Edu-Grid LDAP server. Must have a persistent local disk to save u/p, etc.
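The login decision described above could be sketched roughly as follows. The function name, the return strings, and the behaviour when several communities are present and nobody answers are assumptions; the notes only cover the single-community auto-join case.

```python
from typing import Optional, Sequence

STANDALONE = "standalone"


def choose_mode(communities: Sequence[str], answer: Optional[str] = None) -> str:
    """Return 'standalone', 'prompt', or 'join:<name>'.

    communities: names of the community leaders currently heard on the network.
    answer: what the user typed at the prompt, or None if nobody answered
            before the timeout.
    """
    if not communities:
        # No leader heard: local password file, no shared disk.
        return STANDALONE
    if answer is None:
        # Nobody logged in: auto-join only when the choice is unambiguous.
        # Staying at the prompt otherwise is an assumption.
        if len(communities) == 1:
            return f"join:{communities[0]}"
        return "prompt"
    if answer == STANDALONE:
        return STANDALONE
    if answer in communities:
        return f"join:{answer}"
    # Unknown community name: ask again.
    return "prompt"
```

A standalone node would call this again every N seconds with a fresh list of communities, which matches the polling described for Standalone.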
Shared file system
Optional, enables "community mode", pkbcast and bccd-switch-mpi work the same with and without it. Command line tool to join later.
When you boot up in community mode you name your island; if the bccd login program then detects one island, it just joins; if more than one island is available, it gives the user a choice of which to join.
nodes have to go up and down easily.
on start of a community, prompted for its name and the usernames to use: manual list entry, or a prefix with a range.
how to make this persistent? BCCD community then must have a writable disk of some sort!
detected with bccd-community-start, if there is an "image" available option
mpirun is a wrapper that builds the machines file in the correct format based on what the environment variables are. we sanitize it before we call the real mpirun.
when clients run login they are prompted to join a community (if one or more exists). If one exists and nobody logs in, then we join that automatically after 1 minute or so.
NFS and LDAP?
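A minimal sketch of the machines-file step of the mpirun wrapper. The BCCD_MPI variable name, the function name, and the exact sanitizing rules are assumptions; the per-MPI line formats are the usual ones (host:n for MPICH machine files, host cpu=n for LAM boot schemas, host slots=n for Open MPI hostfiles).

```python
import os

# One line template per supported MPI.
FORMATS = {
    "mpich": "{host}:{cpus}",
    "lam": "{host} cpu={cpus}",
    "openmpi": "{host} slots={cpus}",
}


def machines_file(hosts, mpi=None):
    """Render (host, cpus) pairs in the format the selected MPI expects.

    If mpi is not given it is read from the hypothetical BCCD_MPI
    environment variable.
    """
    mpi = mpi or os.environ.get("BCCD_MPI", "openmpi")
    if mpi not in FORMATS:
        raise ValueError(f"unsupported MPI: {mpi}")
    cleaned = {}
    for host, cpus in hosts:
        host = host.strip()
        if not host or cpus < 1:
            continue  # sanitize: skip blank hosts and bad CPU counts
        cleaned[host] = cpus  # duplicates collapse; first position is kept
    return "".join(FORMATS[mpi].format(host=h, cpus=c) + "\n" for h, c in cleaned.items())
```

The wrapper would write this text to a temporary file and pass it to the real mpirun for whichever MPI is active.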
bccd-
allow-all - no
build-info - yes
checkem - subsumed under mpirun wrapper
deny-all - subsumed under login/community
join-group - subsumed under login/community
leave-group - subsumed under login/community
snarf-hosts - subsumed under mpirun wrapper
syncdir - no
pkbcast -
Edu-Grid interface on the BCCD (web pointer)
Torque - front-end to Edu-Grid queues and ultimately a local queue that comes with the community
USB boot of this image. Use the USB key with not only the image but credentials which allow community membership.
boot modes
automode - subsumed under login/community
c3mode - subsumed under login/community
intelfb - replaced by knoppix
i810fb - replaced by knoppix
nohotplug - replaced by knoppix
quickboot - yes
runinram - no, NFS root later
startdhcp - subsumed under login/community
Later
NFS root to replace runinram
Review high level BCCD list of software
Yes
gnu - gcc, gfortran, gdb, gprof, gcov, gmp
ATLAS, GotoBLAS
OpenMP
icc/MKL/idb - Wilf about license issues
Java - license ok now, PJ (from RIT)
Condor
Apache
Ganglia running and configured on community leader for monitoring emerging islands of activity.
R and Rmpi
Octave
PAPI/PERFCTR
FFTW
Gromacs
Firefox
mpich, lam, openmpi
Torque - check license compatibility
Python
Ruby
C3 tools
xpdf
xmpi
sl
robotfindskitten
xgaliga
Later
PVFS/Lustre
Sage
POVray
Blender
Ogre
No
openmosix
wulfd
rdesktop
Testing plan - Alex and Kevin with old ACLs
Liberation model - Skylar's working on it now
Kevin's Notes
Charliep: Material is kind of a stack
- 3 layers
Paul: Sprints
Material
- Condor (next level)
- BCCD (fundamental)
-
BCCD:
- pkbcast
- some kind of broadcast "I'm here"
- static keys? absolutely not. must be created /at least/ once per boot
- supposed to clobber old keys
- what happens if machine reboots to new ip address?
- each machine keeps own list of who's out there
- user wants to run mpi
- need to grab current list of machines file
-
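One way the per-machine list could work, as a sketch only. The class name, the TTL, and keying peers by hostname are assumptions, chosen so that a machine rebooting to a new IP address with fresh keys clobbers its old entry.

```python
class PeerTable:
    """Each machine's own view of who is out there, fed by "I'm here" broadcasts."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl  # seconds a peer stays listed without a new broadcast
        self._peers = {}  # hostname -> (ip, public_key, last_seen)

    def heard(self, name, ip, key, now):
        # A repeat announcement from the same hostname replaces the old
        # address and key, so a reboot with new keys clobbers stale data.
        self._peers[name] = (ip, key, now)

    def machines(self, now):
        # Current list of addresses, e.g. for building an MPI machines file.
        return sorted(
            ip for ip, _key, seen in self._peers.values() if now - seen <= self.ttl
        )
```

Peers that stop broadcasting simply age out of the list, which fits the requirement that nodes go up and down easily.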
BCCD: new features
- OpenMP
- gfortran
- icc? get their permission
- java? <---- license? okay /if/ extending?
- mpich
- goto ... what hoops? ask J (John B TACC guy)