CSE 225 Course Project Suggested Topics


Everyone taking the course will participate in a course project carried out by a team of 2-4 students.  The projects will come from a short list of types (see below) and will involve a model problem for Grids.  All projects will be developed and approved in advance in discussions with Professor Chien.  Typical projects will build on an existing grid or distributed systems infrastructure and include hands-on computer systems experiments involving new systems or software.

You should begin planning your project right away.   Some possible CSE 225 Projects:

  • Resource Selection and Binding #1: Using realistic resource data and the three basic models for resource sharing (cycle stealing, batch scheduling, and slicing), perform experiments to evaluate how well different selection and binding strategies work.  How well can we do if system utilization is the primary goal?  Application performance?  Turnaround (completion latency)?  How well do the strategies work as the level of competition for resources increases in the system?
  • Resource Selection and Binding #2: Using realistic resource data and one of the three models for resource sharing (cycle stealing, batch scheduling, and slicing), perform experiments to evaluate how well different selection and binding strategies work.  Use several synthetic distributed application models (each involving a set of resources and a set of computations, data movement, etc.).  How well can we do if system utilization is the primary goal?  Application performance?  Turnaround (completion latency)?  How well do the strategies work as the level of competition for resources increases in the system?
  • Dynamic Applications:  Using the Virtual Grid infrastructure (vgES), construct an application that has a specific set of resource requirements (and uses a specification to select resources), and that then uses monitoring and adaptation to improve some aspect of its performance.  Quantify how the application's behavior depends on the dynamic information (e.g., from the Network Weather Service) and on the quality of that information.  Explore ranges of resource environments and properties for which stable (and unstable) behavior is realized.  Are there any general principles that can be derived from the adaptation methods or resource behaviors explored?
  • Open Resource Sharing #1: Using the statistical characterization of resources we have studied, compare the efficacy of the three models for resource sharing (cycle stealing, batch scheduling, and slicing) for three classes of applications – compute intensive, memory intensive, and data intensive – where each class is defined by the resource that limits the performance of a process in the overall computation.  Explore quantitatively how the properties of the applications affect the utility of each resource sharing model.
  • Open Resource Sharing #2: Using the statistical characterization of resources we have studied, compare the efficacy of the three models for resource sharing (cycle stealing, batch scheduling, and slicing) for applications with a range of coupling between processes: embarrassingly parallel, master-worker, workflow, master-worker and workflow with tightly-coupled subjobs, and tightly-coupled parallel computations.  Explore quantitatively how the coupling properties of the applications affect the utility of each resource sharing model.
  • Configurable Networks #1:  Compare the three models of use (intelligent network, asynchronous file transfer, and distributed virtual computer) for an application example where we know all of the computation times and communication quantities.  For example, you could use the NAS Parallel Benchmarks and Grid Parallel Benchmarks.  Other example applications are also of interest.  Vary the parameters of time to detect a connection, connection setup delay, the cost to set up and tear down a connection, and the cost of “having a connection” per unit time.  For what range of application properties does each model make sense?  For what range of costs does each model make sense?
  • Configurable Networks #2:  Using estimates of the network topologies of major ISPs such as Qwest, AT&T, MCI, Sprint, and others, derived from their publicly available maps, explore the following questions.  Using a backdrop of the top 50 population centers in the United States, explore the number of topologies and competitive providers available to 5-city and 10-city subsets of these centers.  Repeat the exploration using the top 50 population centers in the world.  What is the spectrum of providers available for each subset?  If one wanted to expand the set of possible topologies, what is the impact on connection latency (the increase due to speed-of-light flight time)?  Explore how these realities might affect application performance and competition in future dynamic configurable network environments.
  • Application-driven Evaluation of Grid Infrastructures:  From the perspective of an important computational science application (e.g., climate modeling, protein folding, or toxic chemical diffusion), analyze the capabilities of current and future grid hardware infrastructures and technologies.  Working with application experts who are well versed in the computational issues (we have several such volunteers), develop a performance model and simulation that includes a distributed application architecture, a resource description used to acquire resources, performance models for each element, and scaling characteristics.  Use this simulation infrastructure to evaluate the achievable performance of Grid deployments of these applications.
  • Other Topics:  Many other topics are possible, and should be discussed with Professor Chien as early as possible.
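
As a starting point for the latency question in Configurable Networks #2, the following is a minimal sketch of how the speed-of-light flight time of a candidate route could be estimated.  It is only an illustration under stated assumptions: links are taken to follow great-circle paths (real fiber routes are longer), signals are assumed to travel at about two-thirds of the vacuum speed of light, and the city coordinates are approximate values chosen for the example.  The function names and the sample routes are hypothetical and do not come from any provider's map.

```python
import math

C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_VELOCITY_FACTOR = 0.67      # assumption: typical factor for optical fiber

# Approximate (latitude, longitude) in degrees; illustrative values only.
CITIES = {
    "San Diego": (32.72, -117.16),
    "Dallas": (32.78, -96.80),
    "Chicago": (41.88, -87.63),
    "New York": (40.71, -74.01),
}

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def route_km(route, coords=CITIES):
    """Total length of a route given as an ordered list of city names."""
    return sum(
        great_circle_km(*coords[a], *coords[b])
        for a, b in zip(route, route[1:])
    )

def one_way_delay_ms(path_km, velocity_factor=FIBER_VELOCITY_FACTOR):
    """Propagation delay only; ignores switching, queueing, and setup costs."""
    return path_km / (C_VACUUM_KM_PER_S * velocity_factor) * 1000.0

if __name__ == "__main__":
    direct = ["San Diego", "New York"]
    detour = ["San Diego", "Dallas", "Chicago", "New York"]
    for name, route in (("direct", direct), ("detour", detour)):
        km = route_km(route)
        print(f"{name}: {km:.0f} km, {one_way_delay_ms(km):.1f} ms one way")
```

Comparing the delay of a direct route with that of a route forced through intermediate cities gives a lower bound on the latency penalty of choosing an alternative topology.  A project would replace the great-circle assumption with link lengths estimated from the providers' maps.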

Possible infrastructures for use include:

1)      Grid modeling tools such as MicroGrid and SimGrid,

2)      Desktop Grid computing systems such as XtremeWeb or BOINC,

3)      Grid toolkits such as Globus Toolkit 3.x and Globus Toolkit 4,

4)      The Virtual Grid Execution System from UCSD,

5)      ProActive (Java), and

6)      Other software infrastructures.

We have source code for many of these systems.  However, it is important to understand that not all projects need involve modification to the source code.  Computer system resources that will be available to support course projects include:

  • 100 Linux workstations and 50 TB of disk as part of the FWGrid project,
  • access to SDSC and TeraGrid resources,
  • GradLab resources, and
  • any other resources you may have access to.