Preface
The MicroGrid is a tool for emulating virtual grid infrastructures, enabling scientific study of grid resource management issues. The MicroGrid emulation tool enables repeatable, controllable experimentation with dynamic resource management techniques, a critical activity in the investigation of the flexible resource sharing environments posited for future Global Grid scenarios. MicroGrid experimentation complements experimentation with physical resources by providing the capability to emulate both resources not yet deployed ("what if" scenarios) and rare events such as catastrophic failures. As such, the MicroGrid supports scientific evaluation of adaptive software, resource management algorithms and implementations, and general resource and application control algorithms over a wide variety of Grid configurations.
For more information, the interested reader is referred to: Xin Liu, Huaxia Xia, and Andrew Chien, "Validating and Scaling the MicroGrid: A Scientific Instrument for Grid Dynamics," to appear in the Journal of Grid Computing.
The MicroGrid
offers many advantages for grid researchers, such as a tunable process
scheduler, pervasive instrumentation for system profiling, and scalability from
local to wide area networks.
Assumptions
Readers should be proficient in the use of a standard UNIX shell, gzip, tar, and the build tool GNU make. In addition, general familiarity with the principles of IP networking is helpful. Finally, if you are not comfortable with XML, developing useful configuration files and network topologies may be difficult, although you should still be able to build simple experiments easily.
Font Conventions Used in This Manual
Courier Font is used for:
· Commands, file names, and code, which should be typed exactly as shown.
Italic Courier Font is used for:
· Arguments to MicroGrid functions, since they could be typed in code as shown but are arbitrary.
Italic Arial is used for:
· MicroGrid-specific terms.
Boldface is used for:
· Chapter and section headings.
2.1 Contents
The MicroGrid package contains example MicroGrid resource files, Globus patches, configuration files, and a number of scripts to build common applications. Download the MicroGrid distribution from the distribution website (http://www-csag.ucsd.edu/projects/grid/download.html) and unpack it using 'gunzip' and 'tar'. This will create a directory called 'mgrid2' containing these files.
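For example, assuming the downloaded tarball is named mgrid2.tar.gz (your file name may differ):
% gunzip mgrid2.tar.gz
% tar -xf mgrid2.tar
% cd mgrid2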
In order to
successfully install and use the MicroGrid, edit the MicroGrid resource file (mgridrc) and declare the following
variables needed by the build scripts and emulation system.
· $MGRID_ROOT: MicroGrid installation path
· $PERL5LIB: Path to your installation of Perl5
· $MPICHP4_PATH: Path to your installed MPICH ch_p4 device (e.g., /usr/x86-local/mpich/ch_p4)
· $MPICHG2_PATH: Path to your local installation of the MPICH Globus device (e.g., /usr/x86-local/mpich/globus2)
· $GLOBUS_BUNDLE_DIR: Path where the Globus bundles can be found. These source bundles are needed to build MicroGrid-enabled Globus services.
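A minimal sketch of an mgridrc, assuming Bourne-shell syntax (tcsh users have a separate file; see the note below); all paths are illustrative:
# mgridrc -- sourced into your shell environment (illustrative values)
export MGRID_ROOT=/home/ksmith/mgrid2
export PERL5LIB=/usr/lib/perl5
export MPICHP4_PATH=/usr/x86-local/mpich/ch_p4
export MPICHG2_PATH=/usr/x86-local/mpich/globus2
export GLOBUS_BUNDLE_DIR=/home/ksmith/globus-bundles
# Install target for the mgrid-patched Globus (see Section 2.3):
export GLOBUS_LOCATION=/home/ksmith/globus-mgrid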
Now run 'source mgridrc' to load the variables into your environment.
Note: If you are a tcsh user, edit and source the mgrid.tcsh file instead of mgridrc.
2.3 Building the Emulation Libraries
To build the MicroGrid emulation library with Globus support (mgrid_globus), enter the mgrid2 directory and type './build_mgrid_globus'. If your environment settings are correct, gmake will begin building the emulator libraries and binaries; be patient, as the build may take a while to finish.
This script should build and install the mgrid-patched Globus 2 to $GLOBUS_LOCATION defined in your mgridrc file. Under $GLOBUS_LOCATION, you should find the following mgrid-globus specific files:
bin/mgridrun
sbin/mgrid-gatekeeper
libexec/mgrid-job-manager
lib/libmgrid.a
This MicroGrid enabled Globus installation will not interfere with your original Globus installation (if any), as long as you set the $GLOBUS_LOCATION environment variable correctly before using it. See the official Globus website for information regarding Globus installation (http://www.globus.org).
If you don’t need Globus support, you can build the standalone MicroGrid emulation library (mgrid). Enter the mgrid2 directory and type './build_mgrid'. Again, be patient since the project may take a while to finish.
2.4 User Deployment
Before you can begin using the MicroGrid, it must first be deployed to a shared NFS/SMB filesystem accessible to all the nodes in your system. To deploy the MicroGrid, first source your mgridrc (mgridrc.tcsh) file; this picks up any variables that may have changed in your mgridrc after the initial build. Now run the mgrid_deploy script under your $MGRID_ROOT directory. This script deploys your build of the MicroGrid into the directory specified by the $MG_CONF environment variable. If this variable has not been declared, the mgrid_deploy script will create a directory named '.mgrid' in the user's home directory. This directory contains a number of configuration files and other temporary state files the emulator may need at runtime, including an updated MicroGrid resource file mgridrc, the MicroGrid configuration file mg_conf, and a number of internal temporary directories.
After deployment, each user must edit the mg_conf file and change each of the variables below to a unique set of values. If the values are not unique, emulations run by different users may interfere with one another in confusing ways.
· NSE_ADMPORT: The emulator administration port, through which MicroGrid-enabled applications send administrative messages to the emulation.
· MAP_ADMPORT: The MapServer port, through which MicroGrid-enabled applications add, remove, and query virtual/real IP address mappings.
· MAP_ADMADDR: The IP address of the physical machine on which the MapServer daemon will run.
· SCH_PORT: The IP port through which the command-line job dispatch client 'schapi' sends job activation, termination, and control messages to MicroGrid process Schedulers.
If you plan to use Globus with the MicroGrid, you must find the correct values for the following variables and edit them in mg_conf appropriately.
· LDAP_SERVER: IP address or hostname of the physical machine serving as the LDAP server (GIIS).
· LDAP_PORT: Port used by the GIIS.
· LDAP_PASSWORD: Plaintext of the GIIS password.
· basedn: This optional string contains your Globus Domain Name.
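A minimal sketch of these entries, with purely illustrative values and assuming a simple name=value layout (consult the example file referenced below for the authoritative syntax):
NSE_ADMPORT=7600
MAP_ADMPORT=7601
MAP_ADMADDR=192.168.1.10
SCH_PORT=7602
LDAP_SERVER=giis.example.edu
LDAP_PORT=2135
LDAP_PASSWORD=secret
basedn=Mds-Vo-name=local, o=Grid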
Note: See $MASSF_DIR/control/mg_conf.example for a more verbose example.
3.1 Building a Virtual Network
First, source the mgridrc file in your $MG_CONF directory. Now, create a directory where you will store your network configuration files and launch your experiments. For the remainder of this section we refer to this directory as your Working Project Directory (WPD). We recommend that you place your WPD on an NFS/SMB shared filesystem, so that all machines participating in the emulation can get copies of the necessary virtual network files. After creating a WPD, change to (cd) this directory and copy the project template from $MGRID_ROOT/examples/template_mgrid_globus/* into your current working directory (a command sketch follows the file list below). If you are targeting a non-Globus application, use the files in $MGRID_ROOT/examples/template_mgrid/* instead. You should now have the following files in your WPD:
MYGRID.dml
machine.dml
MYGRID.phost
MYGRID.vhost
Makefile
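For example, a typical setup sequence (the WPD path is illustrative):
% mkdir /nfs/shared/ksmith/myproject
% cd /nfs/shared/ksmith/myproject
% cp $MGRID_ROOT/examples/template_mgrid_globus/* .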
Now, begin editing the sample MYGRID.dml file and save it to a file named <GRIDNAME>.dml. The <GRIDNAME> you have chosen will be used to identify your network configuration. Make a note of this <GRIDNAME>, as you will use it in the following procedures.
Next, edit the machine.dml file in your current WPD and list the physical machines available to the network emulator. This file contains a sequence of entries, of the following format:
MACHINE [IPADDR hostname ARCHITECTURE os_type PARTITION processor_number]
The hostname field specifies the hostname, or IP address, of a physical host that will run the emulator. The os_type and processor_number fields specify, respectively, the operating system and the number of processors available on the target machine.
Note: Due to an implementation limitation, processor_number must be set to 1.
Note: The MicroGrid options used with mpirun do not allow you to include the machine from which you launch the MicroGrid in your machine.dml file.
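A hypothetical machine.dml entry, assuming the bracketed keyword/value layout shown above (the hostname is illustrative):
MACHINE [IPADDR compute-0-1.example.edu ARCHITECTURE LINUX PARTITION 1]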
Next, edit the <GRIDNAME>.phost file and list the physical machines available to the application. For each physical machine, include one line as below:
machine_name CPU_rate CPU_count memory
CPU_rate is a relative CPU speed; you can use MHz, FLOPS, or even a ratio to a "standard" CPU.
Note: Due to an implementation limitation, CPU_count must be set to 1 (only uniprocessor modeling is supported); the memory field is not currently used.
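An illustrative <GRIDNAME>.phost line (the machine name and numbers are hypothetical):
compute-0-1.example.edu 1000 1 512
Here 1000 is the relative CPU rate (e.g., MHz), 1 is the required CPU_count, and 512 fills the currently unused memory field.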
Edit the <GRIDNAME>.vhost file to define the capabilities of each virtual host you are going to use. The first line of the file defines the CPU downgrade, the factor by which the emulation rate is slower than wall-clock time. For example:
cpu_downgrade=5
Then, for each virtual host, include one line as below:
virtual_name virtual_ID CPU_rate CPU_count memory [gatekeeper]
Here virtual_name is the name the user gives the virtual machine in order to identify it; a "*" can be used if you don't care about the name. The virtual_ID field is the host ID from the topology file. The CPU_rate field specifies the relative speed of the virtual CPU and should be comparable to the CPU_rate of the physical CPU. The optional last item, gatekeeper, marks the virtual machines on which you want to run a gatekeeper.
Note: Due to an implementation limitation, CPU_count must be set to 1; the memory field is not currently used.
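An illustrative <GRIDNAME>.vhost (names and numbers are hypothetical; the virtual_ID values must come from your topology file):
cpu_downgrade=5
vhost-a 1 1000 1 512 gatekeeper
* 2 500 1 256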
Now, edit the Makefile, set the $GRIDNAME variable to your chosen <GRIDNAME>, and then execute make.
Note: The MicroGrid emulator uses a virtual network topology file that complies with the original SSFNET Domain Modeling Language (DML). You can find more information about the SSFNET DML file format at http://www.ssfnet.org/InternetDocs/ssfnetTutorial-1.html. Though MicroGrid DML files follow the same format used by SSFNET, they have been extended with an emulation_rate setting that specifies the CPU scaling factor. This value indicates the rate at which time progresses in the application; larger numbers result in a slower-than-real-time rate of progress.
If make successfully builds your emulated network, you will find the following files in your WPD:
· emulator-LINUX: The emulator executable.
· mpirun-emulator: The script used to launch the emulator.
· <GRIDNAME>.xml: The virtual host information to be uploaded to the GIIS server.
· FWTB: A mapping from the virtual IP addresses in the virtual network to the physical host that provides emulation for each virtual host. At present, virtual IP addresses are assigned automatically from the network DML file. You should examine this FWTB file to learn how virtual IPs and DML addresses have been assigned.
· <GRIDNAME>.mapfile: The physical/virtual mapping information.
· <GRIDNAME>-app.xml: The configuration used to run the applications.
3.2 Launching the Emulator
Congratulations, you
are now ready to launch the emulator!
% mpirun-emulator
This starts the emulator process, and you can now run experiments that communicate through the virtual network. There are a variety of parameters for mpirun-emulator; examples can be found in $MGRID_ROOT/examples/template_mgrid/run.
4.1 MicroGrid Enabling through Static Linker Interception
Now prepare your application to run with mgrid by following these three steps.
1.) Modify your application makefile to include defs.mk as follows:
include ${MASSF_DIR}/wrapper/defs.mk
2.) Add ${WRAP_LIST} to the linker options in your makefile.
3.) Delete the preexisting application binary, and rebuild using the makefile.
Your application is now MicroGrid enabled.
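For instance, a minimal sketch of a makefile after these changes (the application name and object files are hypothetical; recipe lines must be indented with a tab):
include ${MASSF_DIR}/wrapper/defs.mk

myapp: myapp.o
	$(CC) -o myapp myapp.o ${WRAP_LIST}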
You can also run original binaries on the MicroGrid by dynamically binding to the MicroGrid library; refer to Section 4.7 for instructions.
4.2 Virtual Compute Resources
The forwarding table (FWTB) and mapping table (<GRIDNAME>.mapfile) provide useful information about how the virtual resources are mapped onto the physical resources.
The forwarding table contains
translations from virtual IP addresses in the emulation to DML node names in
the virtual network description (these are not IP addresses). The emulator assigns IP addresses and
configures routing tables appropriately, consistent with CIDR hierarchical
address management. The mgrid system uses the FWTB
translations to inject/extract application messages to/from the network
emulator.
Each line of the FWTB file
has the following format:
IPAddress/Netmask Physical-Node Node-Type DML-Address
For example:
0.0.1.46/30 machine-23 Router 3:3(1)
This entry indicates that the DML node with address 3:3(1) has been assigned the virtual IP address 0.0.1.46 with a 30-bit netmask. This DML node is hosted on the physical node machine-23 and functions as a router.
There are two Node-Types, router and host. Applications and jobs run only on nodes of type host.
The mapping table contains more
detailed information for virtual hosts.
Each line of the mapping table has the following format:
Virtual-name IPAddress Physical-Node CPU-percentage memory
For example:
n10.caltech.teragrid.org 0.0.2.40 compute-0-5 32 512
This entry indicates that the virtual host n10.caltech.teragrid.org is assigned IP address 0.0.2.40 and will run on physical machine compute-0-5. The virtual host will use at most 32% of available CPU cycles and 512MB memory.
There is another file that is generated: <GRIDNAME>-app.xml. This file defines the mapping information in XML format and is used by the scheduler. It includes a sequence of entry elements called a hostmap.
<hostmap>
<entry>
<v_host>0.0.1.10</v_host>
<p_host>compute-1-0</p_host>
<speed>90</speed>
<memory>512</memory>
</entry>
...
</hostmap>
The hostmap example above specifies that jobs for virtual host 0.0.1.10 will be executed on physical host compute-1-0 and use 90% of the available CPU cycles and 512MB memory on that physical host.
4.3 Configuring Your Application
Having already configured your virtual resource environment, your next step is to write the job description file that specifies where and how each application process will be executed.
An example file, named jobs-cs.xml, can be found in your WPD (a copy of this file can be found in $MGRID_ROOT/examples/template_mgrid). This file is used to specify a sequence of job entities known as a joblist.
<joblist>
<stdout>/home/ksmith/tmp/TcpServer.out</stdout>
<stderr>/home/ksmith/tmp/TcpServer.err</stderr>
<job>
<exec>/home/ksmith/tcptest/TcpServer</exec>
<startime>0</startime>
<param>5234</param>
<v_host>0.0.1.130</v_host>
</job>
<job>
<exec>/home/ksmith/tcptest/TcpClient</exec>
<startime>2</startime>
<param>0.0.1.130 5234</param>
<v_host>0.0.0.34</v_host>
</job>
</joblist>
Each job specifies an application executable and the number of seconds after the joblist is submitted at which the job should start. Each virtual host is declared using a v_host tag, and command-line arguments may be specified using a param tag.
The example above redirects stdout to /home/ksmith/tmp/TcpServer.out and stderr to /home/ksmith/tmp/TcpServer.err. It then starts a TcpServer application with the single parameter 5234; the application is assigned to v_host 0.0.1.130. Finally, it starts a TcpClient application with parameters 0.0.1.130 and 5234 on v_host 0.0.0.34, 2 seconds after the joblist is submitted.
You are almost ready to start your application, but first you have to start the service daemons.
4.4 Starting the Service Daemons
To start the MicroGrid service daemons, execute the following command.
% start_daemon.pl <GRIDNAME>
You have now started a MapServer on the local machine and a process Scheduler on each physical resource defined in the resource XML file (see Section 4.2).
4.5 Launching the Application
Congratulations, you can now start
your application with the following command.
% submit_job.pl <GRIDNAME> MYJOB.xml
4.6 Launching the Application Manually
For most users, launching the
application through the submit_job.pl script is the most convenient approach. But
if you need more control (for example, you want to debug), you can launch each
single process manually by following these steps:
1) Log in to the machine that will execute this process.
2) Set the environment variable $MGHOSTNAME to the IP address of the virtual host for this process. For example:
% export MGHOSTNAME=0.0.1.20
By setting this environment variable, you have turned this shell into a console of the virtual host 0.0.1.20. All mgrid jobs executed under this shell will use this virtual IP address as their host IP address.
3) Launch the job just as you normally would.
Remember, the MicroGrid Scheduler does not control jobs launched in this manner; as a result, such experiments may be somewhat less accurate.
4.7 Running Applications through Dynamic Binding to the MicroGrid Library
You can run original binaries (without relinking to the MicroGrid library) through dynamic binding. Using this method, you can run almost any program, including Java and Python programs.
First, define the environment variables LD_PRELOAD and MGHOSTNAME:
% export LD_PRELOAD=$MGRID_ROOT/massf/wrapper/libmgrid.so
% export MGHOSTNAME=0.0.1.20
Then you can run your binaries as usual. For example:
% /bin/hostname
will print "0.0.1.20" instead of your physical host name.
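Because the interception is injected at load time, interpreted programs work the same way. A hypothetical one-shot invocation (the script name is illustrative):
% LD_PRELOAD=$MGRID_ROOT/massf/wrapper/libmgrid.so MGHOSTNAME=0.0.1.20 python myscript.py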
Note: You must compile the MicroGrid using gcc 3.2 or a later version to use this feature.
Note: See $MGRID_ROOT/examples/template_mgrid/ld_exec.sh for an example.
5.1 MicroGrid Enabling Globus
To prepare your
Globus application for use with the MicroGrid, you must perform the following
three steps.
1.) Modify your application makefile to include defs.mk as follows:
include ${MASSF_DIR}/wrapper/defs.mk
2.) Add ${WRAP_LIST} to the linker options in your makefile.
3.) Delete the preexisting application binary, and then rebuild using the makefile.
Your Globus
application is now MicroGrid enabled.
5.2 Virtual Compute Resources
This section discusses three files
that provide useful information about how the virtual resources are mapped to
the physical resources: the forwarding table (FWTB), mapping table (<GRIDNAME>.mapfile), and the application mapping
table (<GRIDNAME>-app.xml).
If you have already read subsection 4.2, you may skip most of the remainder of this subsection. The only difference is the '<gatekeeper>' entry in the application mapping table.
The forwarding table contains
translations from virtual IP addresses in the emulation to DML node names in
the virtual network description (these are not IP addresses). The emulator assigns IP addresses and
configures routing tables appropriately, consistent with CIDR hierarchical
address management. The mgrid system uses the FWTB
translations to inject/extract application messages to/from the network
emulator.
Each line of the FWTB file
has the following format:
IPAddress/Netmask
Physical-Node Node-Type DML-Address
For example:
0.0.1.46/30 machine-23 Router 3:3(1)
This entry indicates that the DML node with address 3:3(1) has been assigned the virtual IP address 0.0.1.46 with a 30-bit netmask. This DML node is hosted on the physical node machine-23 and functions as a router.
There are two Node-Types, router and host. Applications and jobs run only on nodes of type host.
The mapping table contains more
detailed information for virtual hosts.
Each line of the mapping table has the following format:
Virtual-name IPAddress Physical-Node CPU-percentage memory
For example:
n10.caltech.teragrid.org 0.0.2.40 compute-0-5 32 512
This entry indicates that the virtual host n10.caltech.teragrid.org is assigned IP address 0.0.2.40 and will run on physical machine compute-0-5. The virtual host will use at most 32% of total CPU cycles and 512MB memory.
Another file is generated: <GRIDNAME>-app.xml, which defines the mapping information in XML format and is used by the scheduler. It includes a sequence of entry elements called a hostmap.
<hostmap>
<entry>
<v_host>0.0.1.10</v_host>
<p_host>compute-1-0</p_host>
<speed>90</speed>
<memory>512</memory>
<gatekeeper>6767</gatekeeper>
</entry>
...
</hostmap>
This hostmap entry specifies that jobs for virtual host 0.0.1.10 will be executed on physical host compute-1-0 and will use at most 90% of the total CPU cycles and 512MB of memory on that physical host. If the gatekeeper entry is present, a port number must be specified; the daemon startup script will automatically start a gatekeeper daemon on the indicated virtual machine and port.
Note: See $MGRID_ROOT/examples/template_mgrid_globus/campus8-app.xml for an example.
5.3 Application Configuration
To configure your application for mgrid_globus you will need to create an XML job description file and a Globus RSL file under your WPD.
Note: mgrid_globus uses a new job manager for job submission, named jobmanager-mgrid. This job manager is comparable to the PBS or LoadLeveler job managers in that it can control multiple virtual hosts. To use it properly, you must create a separate job description file and modify the RSL file to dispatch this new mgrid job. At present, this is the only job manager we support.
5.3.1 mgrid Job Description File
Each gatekeeper needs a job description file, which you create in your WPD. An example file, named jobs-cs.xml, can be found in your WPD. This file is used to specify a sequence of job entities known as a joblist.
<joblist>
<stdout>/home/ksmith/tmp/NPB.out</stdout>
<stderr>/home/ksmith/tmp/NPB.err</stderr>
<job>
<exec>/home/ksmith/NPB2.3/bin/mg.A.8</exec>
<startime>0</startime>
<param>-h</param>
<v_host>0.0.0.34</v_host>
</job>
<!-- …other job entries… -->
</joblist>
Each job specifies an application executable and the number of seconds after the joblist is submitted at which the job should start. Each virtual host is declared using a v_host tag, and command-line arguments may be specified using a param tag.
The example above redirects stdout to /home/ksmith/tmp/NPB.out and stderr to /home/ksmith/tmp/NPB.err. It then starts /home/ksmith/NPB2.3/bin/mg.A.8 with the single parameter -h, and the application is assigned to v_host 0.0.0.34.
Note: If you use multiple gatekeepers, you must have a job description file for each gatekeeper.
Note: See $MGRID_ROOT/examples/template_mgrid_globus for examples.
5.3.2 The mgrid RSL Configuration File
The mgrid RSL file defines the Globus application to be executed, in a fashion similar to standard Globus RSL files. However, mgrid RSL files must use jobmanager-mgrid, and they include a few special entries.
First, every Globus application must include the service type "jobmanager-mgrid" in the resource manager contact string.
If the gatekeeper is running on virtual IP address 0.0.1.10, listening on port 7549, and running under user name "K Smith", its contact string might look like:
resourceManagerContact="0.0.1.10:7549/jobmanager-mgrid:/C=US/O=Globus/O=University
of California San Diego/OU=Computer Science and Engineering/OU=Concurrent
Systems and Architecture Group/CN=K Smith"
You also need to specify the virtual network to be used for this job with the '(project=)' entry. For example, if you use campus8.dml as the virtual network configuration file, you need to add:
(project=campus8)
As usual, use '(count=)' to specify the number of processes launched. You should also specify the job_type entry; currently it must be "multiple", even if you run only one process. For example,
(count=4) (job_type=multiple)
indicates that four instances of the program will be launched.
Then it is time to specify the application to be executed. Remember that you created mgrid job description files in Section 5.3.1; these are now used as input files to the MicroGrid job manager ($MASSF_DIR/control/submit_job.pl). So, in the RSL file, the directory and executable are fixed for your MicroGrid deployment, and the arguments entry is just the job description file. For example, the following entries submit the jobs defined in jobs-mg.A.8a.xml to the gatekeeper:
(directory=/home/ksmith/mgrid2/massf/control)
(executable=submit_job.pl)
(arguments=/home/ksmith/template2/jobs-mg.A.8a.xml)
To sum up, here is an example from which you might want to derive a command line:
resourceManagerContact="0.0.1.146:2119/jobmanager-mgrid:/C=US/O=Globus/O=University
of California San Diego/OU=Computer Science and Engineering/OU=Concurrent
Systems and Architecture Group/CN=K Smith")
(count=4)
(label="subjob 0")
(directory=/home/ksmith/mgrid2/massf/control)
(executable=submit_job.pl)
(arguments=/home/ksmith/template2/jobs-mg.A.8a.xml) (project=campus8)
(job_type=multiple)
(environment = (LD_LIBRARY_PATH
"/home/ksmith/globus2.2/lib")(GLOBUS_DUROC_SUBJOB_INDEX
"0")) (stdout =
/home/ksmith/tmp/mgridrun0.out)
(stderr = /home/ksmith/tmp/mgridrun0.err))
Remember, the stdout and stderr of globusrun are no longer the output of your application (they are the log of submit_job.pl). Instead, set the application's output files in the job description XML file.
Note: See $MGRID_ROOT/examples/template_mgrid_globus/campus8.mg.A.8.rsl for examples.
5.4 Daemon Startup
You may now start the service daemons. First launch the virtual network emulation modules that you built in Section 3.1; after they have started, you may start the service daemons.
% start_daemon.pl <GRIDNAME>
At this point you have started the MicroGrid MapServer, Scheduler, and the mgrid-gatekeeper on the
specified virtual hosts.
5.5 Submitting Globus Jobs
The mgridrun executable is a MicroGrid-enabled version of globusrun and is used to submit Globus jobs in a similar fashion. Be sure that you have set $MGHOSTNAME appropriately before using this tool. ($MGHOSTNAME defines the virtual host IP address for any jobs running under that shell console; see Section 4.6 for more details.)
5.5.1 Submitting Globus Jobs with an RSL File
You may submit jobs using an RSL file; this is required for DUROC jobs. An example RSL file can be found in $MGRID_ROOT/examples/template_mgrid_globus/campus8.mg.A.8.rsl. This RSL file submits the jobs defined in the jobs-mg.A.8a.xml and jobs-mg.A.8b.xml files.
Example command:
% mgridrun -f campus8.mg.A.8.rsl
5.5.2 Submitting Jobs without an RSL File
Submitting a Globus job to the MicroGrid-enabled gatekeeper without an RSL file is quite similar to submitting one with an RSL file. You should use jobmanager-mgrid and define the entries described in Section 5.3.2. An example looks like:
% mgridrun -o -r
"0.0.1.146:2119/jobmanager-mgrid:/C=US/O=Globus/O=University of California
San Diego/OU=Computer Science and Engineering/OU=Concurrent Systems and
Architecture Group/CN=K Smith")
(count=4)
(label="subjob 0")
(directory=/home/ksmith/mgrid2/massf/control)
(executable=submit_job.pl)
(arguments=/home/ksmith/template2/jobs-mg.A.8a.xml) (project=campus8)
(job_type=multiple)
(environment = (LD_LIBRARY_PATH
"/home/ksmith/globus2.2/lib")(GLOBUS_DUROC_SUBJOB_INDEX
"0")) (stdout =
/home/ksmith/tmp/mgridrun0.out)
(stderr = /home/ksmith/tmp/mgridrun0.err))
5.6 Termination
The kill_job.pl script terminates jobs. Example command:
% kill_job.pl <GRIDNAME> jobs.xml
After your job has been terminated, you may restart it or shut down the network emulator using the kill_emulator script. Example command:
% kill_emulator
These scripts must be launched from your current WPD.
6. Testing the MicroGrid Installation
The MicroGrid package comes with two example applications that can be used to test the correctness of your installation: a simple TCP client/server program to test the mgrid installation, and the NPB2.3 mg benchmark to test mgrid_globus. You should read through this document before you start testing.
6.1 Testing the mgrid Installation
To test your installation, copy all the files under $MGRID_ROOT/examples/template_mgrid to your working directory.
1. Edit machine.dml and MYGRID-app.xml to use your local resources.
2. Edit jobs-cs.xml to use the correct path names for the executable and output files.
3. Build with make.
4. Launch the emulator with mpirun-emulator.
5. Launch the daemons with start_daemon.pl MYGRID.
6. Submit the jobs with submit_job.pl MYGRID jobs-cs.xml.
This experiment uses the TcpClient/TcpServer programs under $MASSF_DIR/wrapper/tcptest. The server services requests from 3 different clients and then quits. If everything goes correctly, you will see the message
TcpServer exits successfully!
at the end of the server stdout file (tmp/TcpServer.out.0).
6.2 Testing the mgrid_globus Installation
To test your installation, copy all the files under $MGRID_ROOT/examples/template_mgrid_globus to your working directory.
1. Edit machine.dml and campus8-app.xml to use your local resources.
2. Edit campus8.mg.A.8.rsl to use the correct path names for the executable and output files.
3. Also modify the job description files jobs-mg.A.8a.xml and jobs-mg.A.8b.xml to reflect your current paths.
4. Build with make.
5. Go to the $MGRID_ROOT/examples/NPB2.3 directory.
6. Build mg.A.8 using make suite.
7. Return to your working directory.
8. Launch the emulator with mpirun-emulator.
9. Launch the daemons with start_daemon.pl campus8.
10. Submit the job using mgridrun -f campus8.mg.A.8.rsl.
This experiment submits an 8-process, class A mg job to 2 gatekeepers; each gatekeeper controls 4 virtual machines. If everything goes correctly, you will find the message
Verification = SUCCESSFUL
in the stdout file (tmp/mg.A.8.out.0).
7. Troubleshooting
7.1 TODO
Many things can go wrong, and most of your time will be spent diagnosing what is actually failing. Known issues (more will be added):
1. Memory leaks (nse_res, nse_msg, and the receive buffer messages at Rcv_Window). The emulator works well with small experimental setups.
2. If MGHOSTNAME is not set, the application dumps core without telling the user what is wrong.
3. If the FWTB file is missing, the application dumps core without telling the user what is wrong.
4. If the mg_conf file is missing, the application fails silently, without telling the user what is wrong.
5. If the user leaves blank lines in the <GRIDNAME>.phost file, the build fails without telling the user what is wrong.
7.2 FAQ
Q.) Where can I find Perl 5?
A.) See: http://www.perl.com
Q.) Where can I find MPICH v1.2.5.1?
A.) ftp://ftp.mcs.anl.gov/pub/mpi/mpich-1.2.5-1a.tar.gz; older archives can be found at ftp://ftp.mcs.anl.gov/pub/mpi/old/.
Q.) What does this error mean?
TCPServer::Listen: bind error (107): There is another application using Port 5494 on this Computer; Please close the offending program.
p0_11576: p4_error: interrupt SIGSEGV: 11
Killed by signal 2.
A.) There is a port conflict. Either you forgot to kill a previously running process, or someone else is running an emulator with the same port. To solve this:
(1) Make sure that you run kill_emulator after you finish an emulation (see Section 5.6 for its usage).
(2) Make sure that different users have different port numbers assigned (see Section 2.4).
Q.) What does it mean when my virtual network build (Section 3.1) reports:
/home/ksmith/MicroGrid/mgrid2/massf/dassf/bin/partition
/home/ksmith/MicroGrid/mgrid2/massf/ssfnet/config/runtime.dml LINUX
submodel.tmp .machprof mpirun-emulator KNet.dml
make: *** [model.dml] Error 1
A.) This error usually means that $MG_CONF is not set correctly. Remember, the $MG_CONF variable refers to the local user configuration directory, not the mg_conf file itself.
Q.) After editing machine.dml and campus8-app.xml, I cleaned out the previous build, rebuilt, and got these errors:
/home/x86-local/mgrid2/massf/dassf/lib/libssf-linux-thread.a(x-ssf-main.kernel.o):
In function `global_init()':
/home/x86-local/mgrid2/massf/dassf/src/kernel/ssf-main.cc:56:
undefined
reference to `__gxx_personality_v0'
/home/x86-local/mgrid2/massf/dassf/lib/libssf-linux-thread.a(x-cmdline.kernel.o):
In function `CommandLine::parse(int,
char**)':
/home/x86-local/mgrid2/massf/dassf/src/kernel/cmdline.cc:540:
undefined
reference to `_Unwind_Resume'
/home/x86-local/mgrid2/massf/dassf/src/kernel/cmdline.cc:599:
undefined
reference to `_Unwind_Resume'
/home/x86-local/mgrid2/massf/dassf/lib/libssf-linux-thread.a(x-cmdline.kernel.o):
In function `DML_SubModel::operator!()':
/usr/include/sys/stat.h(.eh_frame+0x12):
undefined reference to
`__gxx_personality_v0'
A.) This problem indicates that you may have used a version of gcc or mpicc that does not meet our requirements. Make sure that massf is compiled with MPICH ch_p4 and gcc 3.0.4.
Q.) When I try to start my application, the process returns immediately.
A.) Check your $MG_CONF directory and make sure it contains a valid mg_conf configuration file.
Q.) How do I determine the appropriate emulation_rate to use in the network DML file?
A.) At present there are no rules limiting your choice. It depends mostly on your network topology and the expected traffic load. Typical emulation_rate values range from 5 to 15. When the emulator cannot keep up with real time, it prints a warning to the testout-* log files. You may need to increase the emulation_rate setting if you see the following message in the log files:
"Warning: Emulator Cannot catch up with the realtime network speed. Please increase the emulation_rate int dml file."
Q.) What should I do if I get the following error when I try to build the emulator in my WPD?
"/home/aoo/mgrid2/massf/control/script_conf.pl campus8.phost campus8.vhost FWTB campus8.mapfile campus8-app.xml
Wrong format:
/home/aoo/mgrid2/massf/control/script_conf.pl campus8.phost campus8.vhost FWTB campus8.mapfile campus8-app.xml
Wrong format:"
A.) Check your <GRIDNAME>.phost file to make sure that it does not contain blank lines.
Appendix A: Intercepted Function List
The MicroGrid redirects live traffic by intercepting the socket and other I/O-related function calls (e.g., system or libc functions) issued by the application. To work correctly, we try to intercept all network-related function calls and to anticipate all common programming models. In WrapSocket, we have tried our best to capture these function calls, but the list will never be complete. So, whenever the MicroGrid is used with a new application, you should first check that the relevant network-related function calls are intercepted, and then test the application. If you find any missing functions, please contact the MicroGrid team and we will try to add them to our WrapSocket library.
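To illustrate the mechanism, here is a simplified sketch in C (not MicroGrid's actual WrapSocket code) of how an interposed gethostname() can report the virtual identity taken from $MGHOSTNAME and fall through to libc otherwise:

/* sketch.c -- build: gcc -shared -fPIC -o libsketch.so sketch.c -ldl */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Interposed via LD_PRELOAD: report the virtual host identity
   (MGHOSTNAME) instead of the physical host name. */
int gethostname(char *name, size_t len)
{
    const char *vhost = getenv("MGHOSTNAME");
    if (vhost != NULL) {
        strncpy(name, vhost, len);        /* substitute virtual identity */
        if (len > 0)
            name[len - 1] = '\0';         /* ensure NUL termination */
        return 0;
    }
    /* MGHOSTNAME unset: defer to the real libc implementation. */
    int (*real_gethostname)(char *, size_t) =
        (int (*)(char *, size_t)) dlsym(RTLD_NEXT, "gethostname");
    return real_gethostname ? real_gethostname(name, len) : -1;
}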
The currently intercepted functions are:

accept             bind               close              connect
dup                dup2               execv              execve
execvp             fclose             fcntl              feof
ferror             fgetc              fgets              fork
fprintf            fputc              fputs              fread
freeaddrinfo       fwrite             getaddrinfo        gethostbyaddr
gethostbyaddr_r    gethostbyname      gethostbyname2     gethostbyname2_r
gethostbyname_r    gethostname        getnameinfo        getpeername
getsockname        getsockopt         gettimeofday       _IO_getc
_IO_putc           listen             main               MPI_Wtime
poll               printf             pthread_create     pthread_exit
read               readv              recv               recvfrom
recvmsg            select             send               sendfile
sendmsg            sendto             setsockopt         socket
uname              usleep             write              writev