CSE 225 Course Project Suggested Topics
Each of you will participate in a course project involving 2-4 students.
Everyone taking the course will be involved in a project. The projects
will come from a short list of types (see below) and will involve a model
problem for Grids. All projects will be developed and approved in advance
in discussions with Professor Chien. Typical projects will use an
existing grid or distributed systems infrastructure and include hands-on
computer systems experiments involving new systems or software.
You should begin planning your project right away.
Some possible CSE 225 projects:
- Resource Selection and Binding #1: Using realistic resource
data and the three basic models for resource sharing, perform
experiments to evaluate how well different selection and binding
strategies work. How well can we do if system utilization is the
primary goal? If application performance is the goal? If turnaround
(completion latency) is the goal? How well do the strategies work as
the level of competition for resources increases in the system?
- Resource Selection and Binding #2: Using realistic resource
data and one of the three models for resource sharing (cycle stealing,
batch scheduling, and slicing), perform experiments to evaluate how well
different selection and binding strategies work. Use several synthetic
distributed application models (which involve a set of resources, a set
of computations, data movement, etc.). How well can we do if system
utilization is the primary goal? If application performance is the goal?
If turnaround (completion latency) is the goal? How well do the
strategies work as the level of competition for resources increases in
the system?
- Dynamic Applications: Using the Virtual Grid infrastructure
(vgES), construct an application which has a specific set of resource
requirements (uses a specification to select), and then uses monitoring
and adaptation to improve some aspect of its performance. Quantify the dependence of behavior on
the dynamic information (e.g., from the Network Weather Service) and its quality. Explore ranges of resource environments
and properties for which stable (and unstable) behavior is realized. Are there any general principles that can
be derived from the adaptation methods or resource behaviors explored?
- Open Resource Sharing #1: Using the statistical
characterization of resources we have studied, compare the efficacy of the
three models for resource sharing (cycle stealing, batch scheduling, and
slicing) for three classes of applications: compute intensive,
memory intensive, and data intensive, where these terms are defined by the
resource that limits the performance of a process in the overall
computation. Explore quantitatively
how the properties of the applications affect the utility of each resource
sharing model.
- Open Resource Sharing #2: Using the statistical
characterization of resources we have studied, compare the efficacy of the
three models for resource sharing (cycle stealing, batch scheduling, and
slicing) for applications with a range of coupling between processes:
embarrassingly parallel, master-worker, workflow, master-worker and
workflow with tightly-coupled subjobs, and tightly-coupled parallel
computations. Explore quantitatively how the coupling
properties of the applications affect the utility of each resource sharing
model.
- Configurable Networks #1: Compare the three models of use (intelligent
network, asynchronous file transfer, and distributed virtual computer) for
an application example where we know all of the computation times and
communication quantities. For example, you could use the NAS Parallel
Benchmarks and Grid Parallel Benchmarks. Other example applications are
also of interest. Vary the parameters of time to detect a connection,
connection setup delay, the cost to set up and tear down a connection, and
the cost of “having a connection” per unit time. For what range of
application properties does each model make sense? For what range of costs
does each model make sense?
- Configurable Networks #2: Based on public estimates of the
topology of major ISPs such as Qwest, AT&T, MCI, Sprint, and
others, drawn from their publicly available maps, explore the following
questions. Using a backdrop of the top 50 population centers in the
United States, explore the number of topologies and competitive providers
available to 5-city and 10-city subsets of these centers. Then, using the
top 50 population centers in the world, explore the number of topologies
and competitive providers. What is the spectrum of providers available for
each subset? If one wanted to expand the subset of possible topologies,
what is the impact on connection latency (the increase due to
speed-of-light flight time)? Explore how these realities might affect
application performance and competition in future dynamic configurable
network environments.
- Application-driven Evaluation of Grid Infrastructures: From the
perspective of an important computational science application (e.g.,
climate modeling, protein folding, or toxic chemical diffusion), analyze
the capabilities of current and future grid hardware infrastructures and
technologies. Working with application experts who are well versed in the
computational issues (we have several such volunteers), develop a
performance model and simulation that includes a distributed application
architecture, a resource description used to acquire resources,
performance models for each element, and scaling characteristics. Use
this simulation infrastructure to evaluate the achievable performance of
Grid deployments of these applications.
- Other Topics: Many other topics are possible and should be discussed
with Professor Chien as early as possible.
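For the latency question in Configurable Networks #2, a first-order estimate needs only city coordinates and a propagation speed. The sketch below is one possible starting point, not a prescribed method. It computes great-circle (haversine) distances and converts path length to one-way propagation delay. The fiber speed of about 200,000 km/s (roughly two thirds of c), the function names, the approximate city coordinates, and the example route are all illustrative assumptions. Real fiber routes are longer than great-circle paths, so the results are lower bounds.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius
FIBER_SPEED_KM_PER_S = 200_000.0  # assumption: light in fiber travels at about 2/3 of c


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def path_latency_ms(path):
    """One-way propagation delay in ms along a list of (lat, lon) hops."""
    total_km = sum(great_circle_km(*a, *b) for a, b in zip(path, path[1:]))
    return total_km / FIBER_SPEED_KM_PER_S * 1000.0


# Hypothetical example route; coordinates are approximate.
SAN_DIEGO = (32.72, -117.16)
CHICAGO = (41.88, -87.63)
NEW_YORK = (40.71, -74.01)

direct = path_latency_ms([SAN_DIEGO, NEW_YORK])
detour = path_latency_ms([SAN_DIEGO, CHICAGO, NEW_YORK])
print(f"direct: {direct:.1f} ms, via Chicago: {detour:.1f} ms, "
      f"penalty: {detour - direct:.1f} ms")
```

Replacing the hypothetical coordinates with hops read from a provider's published map gives the latency penalty of choosing one topology over another.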
Possible infrastructures for use include:
1) Grid modeling tools such as the MicroGrid and SimGrid,
2) Desktop Grid computing systems such as XtremeWeb or BOINC,
3) Grid toolkits such as the Globus Toolkit 3.x and Globus Toolkit 4,
4) The Virtual Grid Execution System from UCSD,
5) ProActive Java, and
6) Other software infrastructures.
We have source code for many of these systems. However, it is important
to understand that not all projects need involve modification to the
source code.
Computer system resources that will be available to support course
projects include:
- a number of Linux workstations (100) and disk (50 TB) as part of the FWGrid project,
- access to SDSC and TeraGrid resources,
- GradLab resources, and
- any other resources you may have access to.