\documentstyle[class,11pt]{article}
\def\ClassName{Computer~Organization}
\def\ClassNumber{CS~333}
\handedout{(9/13/95) Handout \#5}
%\vspace{0.5in}

\begin{center}
\large{\bf Solutions to Homework \#1} \\
\end{center}

\begin{document}

Common Error: Work was not shown. {\it Please make sure to show
how you obtained your solution.  If you used the right formula,
but made mistakes in calculation, you can still get some points !
If you did the right thing and got the right solution; but showed
no work -- you will not be given points !!
Etc. Etc...}

\begin{problems}
\problem \Points{20}
{\bf Amdahl's Law} -- H\&P Exercise 1.2

{\bf Purpose of the Exercise:}  To familiarize you with the specific
meaning of Amdahl's law.

{\bf Solution:} \\
a.  [12 pts] Assume that the time taken to execute some perfectly
representative program P on the computer in question
is 100 seconds {\it when the enhancement is in use}.  The amount of
time during which enhanced mode is in use is 50\% of this time,
i.e. 50 seconds.  Now the enhanced mode gives us a speedup factor of
10x.  This means the original time (in unenhanced mode) to execute
these 50 seconds would be 500 seconds.  The total original
time would therefore be 500 + 50 = 550 seconds.

Thus $$Speedup = {Original\ time \over Enhanced\ time} = {550 \over 100} = 5.5$$

b. [8 pts] The amount of original time converted to fast mode is 500 seconds.
The total original time is 550 seconds.
Thus the percentage of original time converted to fast mode is 90.91\%.

{\bf Common Mistakes:} \\
1. Assuming that the 50\% is a percentage of the original (unenhanced)
time.

\problem \Points{30}
{\bf Performance} -- H\&P Exercise 1.7

{\bf Purpose of the Exercise:}  To explore MIPS and MFLOPS in order
to illustrate, among other things, how MIPS can sometimes be misleading.

{\bf Solution:} \\
We know:
	$$Number\ of\ FP\ operations = 195{,}578$$
	$$Clock\ Rate = 16.67\ MHz$$
	$$Whetstone\ Time_{software} = 13.6\ seconds$$
	$$Whetstone\ Time_{coprocessor} = 1.08\ seconds$$
	$$CPI_{software} = 6\ cycles$$
	$$CPI_{coprocessor} = 10\ cycles$$

a. [5 pts] In P\&H Section 1.8 we are given $$ MIPS = {Clock\ Rate \over CPI * 10^6}$$

Thus 
$$MIPS_{software} = { 16.67 * 10^6 \over 6 * 10^6 } = 2.778 = 2.8$$
$$MIPS_{coprocessor} = { 16.67 * 10^6 \over 10 * 10^6 } = 1.667 = 1.7$$

So we see that the software has higher MIPS than the coprocessor;
although we know that the software actually takes about 13 times longer !

\newpage

b. [15 pts] In P\&H Section 1.8 we have 
	$$ Execution\ Time = {Instruction\ Count \over MIPS * 10^6}$$

Thus 
$$Instruction\ Count_{software} = 13.6 * 2.8 * 10^6 = 3.8 * 10^7 = 38 * 10^6$$
$$Instruction\ Count_{coprocessor} = 1.08 * 1.7 * 10^6 = 1.8 * 10^6$$

The orders of magnitude difference between the above two results
is the main reason why the software has greater MIPS -- i.e. it
executes many quick instructions while the coprocessor executes
fewer (but more powerful) instructions.

c. [5 pts] The total number of instructions to run the Whetstone
Benchmark with the coprocessor is:
	$$Instruction\ Count_{coprocessor} = 1.8 * 10^6$$

Assuming that the coprocessor executes one FP operation in a single
instruction we have:
	$$Number\ of\ FP\ instructions_{coprocessor} 
		= Number\ of\ FP\ operations = 195{,}578$$

Therefore $$ Number\ of\ Non{-}FP\ Instructions_{coprocessor} 
		= 1.8 * 10^6 - 195{,}578 = 1.6 * 10^6 $$

And $$Number\ of\ FP\ Instructions_{software}
	= Total\ Instructions_{software} $$
$$ - Number\ of\ Non{-}FP\ Instructions_{software}$$
$$= Instruction\ Count_{software} -
		Number\ of\ Non{-}FP\ Instructions_{coprocessor} $$
$$= 3.8 * 10^7 - 1.6 * 10^6 = 3.6 * 10^7$$

The software version executes $3.6 * 10^7$ instructions to perform
195,578 FP operations.  Thus,
$$Instructions\ per\ FP\ Operation_{software} 
	= {3.6 * 10^7 \over 195{,}578} = 185$$

{\bf Common Mistakes:} \\
1.  Many students did not subtract the number of NonFP instructions
from the numerator. \\
2.  Many students divided by the coprocessor instruction count
rather than by the number of floating point operations in the Whetstone
benchmark.

d. [5 pts] On P\&H Page 46 we have:
$$ MFLOPS = {Number\ of\ FP\ operations\ in\ a\ program
		\over Execution\ Time\ in\ Seconds * 10^6 }$$
$$ MFLOPS_{coprocessor} = {195{,}578 \over 1.08 * 10^6} = 0.18 $$

\problem \Points{20}
{\bf Performance Summary} -- H\&P Exercise 1.11

{\bf Purpose of the Exercise:}
To compare geometric mean with weighted arithmetic mean, as ways
of summarizing several benchmark results.

{\bf Solution:}

a. [10 pts] On P\&H Page 26 (Fig. 1.12) we have:
	$$ Weight_i = {1 \over Time_i * \sum_{j=1}^N {1 \over Time_j} } $$

Computing $\sum_{j=1}^N {1 \over Time_j}$ yeilds 0.00237.
Applying the above equation to each VAX time yields:

\begin{center}
\begin{tabular}{|l|r|}
\hline 
Benchmark	& Weight \\
\hline \hline
spice2g6	& 0.0176 \\
doduc		& 0.2268 \\
mdljdp2		& 0.0595 \\
wave5		& 0.1143 \\
tomcatv		& 0.1592 \\
ora		& 0.0569 \\
alvinn		& 0.0549 \\
ear		& 0.0165 \\
mdljsp2		& 0.1260 \\
swm256		& 0.0332 \\
su2cor		& 0.0327 \\
hydro2d		& 0.0308 \\
nasa7		& 0.0251 \\
fpppp		& 0.0459 \\
\hline
\end{tabular}
\end{center}

b. [10 pts] We have to calculate the weighted arithmetic mean (WAM) for
each of the three machines given in the table using the
weights computed above.  However, we need to convert the
ratios provided to actual values in seconds.  

For the DEC the spice2g6 time is $23,944 / 97 = 246.845\ seconds$,
the doduc time is $1860/137 = 13.577\ seconds$, the mdljdp2 time is 
$7084/154 = 56\ seconds$, etc.
The DEC WAM is $246.845 * 0.0176 + 13.577 * 0.2268 + 56 * 0.0595
	+ ... = 2.44$.

The final WAMs are: $DEC\ 3000 = 2.44 {;\ } IBM\ 590 = 1.93 {;\ } 
	Intel\ Pentium = 5.68$.  So the DEC has 2.32x, and the IBM
has 2.94x the WAM of the Intel.

The Geometric Means are 188, 256, and 81 respectively.  The DEC has 2.32x,
and the IBM has 3.16x the GM SPECratio of the Pentium for 
the given benchmarks.

{\bf Common Mistakes:} \\
1. Very few students solved this question correctly.  It is a
very simple number-crunching application, using two formulas
from the text.  It appears that many students did not find the
formula for 3a tucked away in Fig.1.12.  The H\&P text contains
tremendous information in the figure descriptions and these
should be read carefully. \\
2. For 3b, many did not convert the ratios in the table under the columns
headed by DEC, IBM and Intel, into seconds. \\
3. Most students did not list what formulas were used, (and came
up with enormous numbers for the WAMs.)  Thus, partial
credit could not be given.


\problem \Points{30}
{\bf Technology Evolution:} Because the technologies that drive
computers are evolving at
geometric rates, the rate of technology evolution often catches
designers of both computer hardware and software by surprise.
Consider the following rates of technological evolution:

\begin{center}
\begin{tabular}{|l|r|r|l|}\hline 
Technology	& Multiplier	&	Period 	& 1995 Typical \\ \hline \hline
IC Logic	& 3x		&	2 years	& Intel Pentium (3-4M
trans) \\ \hline 
Memory (DRAM)	& 4x		& 	3 years & 64MB using 16Mbit
DRAMs \\ \hline
Disk		& 4x		& 	3 years & 1GB disk	\\ \hline
\end{tabular}
\end{center}

Where these increases are in transistor density per chip
(microprocessors), bit density per chip (DRAM), and storage density
(disk).

a. [5 pts] Ignoring cost for the moment, consider a PC system for 2001, based
on core technology improvements, what are the likely parameters of the
system? (e.g. millions of transistors in the CPU, megabytes of memory
and gigabytes of disk storage)

{\bf Solution:}

$CPU density = 4\ million * 3 * 3 * 3 = 108\ million$.

$Memory = 64\ MB * 4 * 4 = 1024\ MB = 1\ GB$.

$Disk\ Capacity = 1\ GB * 4 * 4 = 16\ GB$.

b. [10 pts] Ignoring cost, given the relative rates of technology evolution, 
DISCUSS how the
balance of the computer system is likely to change over the time
between now and 2001.  That is explain how the system balance will be
different for PC's in 2001.  Based on this scenario, give two possible
impacts this could have on the applications that are feasible, or how
designers should structure their software.

{\bf Solution:}

{\it Balance:} We have more than 25 Pentiums on one chip.  This means that
processing power is extremely plentiful, and furthermore
multiprocessing may become common.  We could therefore increase
the on-chip cache memory and improve access time for memory.
Since memory is now 1GB, we can keep all applications and (substantial
portions of) data in main memory.  This will increase the need for
reliability, i.e. the system must not hang or crash, but must
degrade gracefully.  Disk capacity is certainly very large,
but (see P\&H Fig. 1.15) access time in 1995 compared to main memory
is 50x; and compared to cache is 500x.

{\it Possible impacts on feasible applications:}
Applications that are too compute intensive for today's PCs and
therefore run sluggishly, will simply blitz through on a 2001 PC
giving real-time performance.
These applications include variations on features that require
fast processing, and also features that are information intensive,
such as:
\begin{enumerate}\itemsep=-3pt
\item  Emulation of the instruction set architectures of other chips.
(Perhaps, compatibility issues will be a thing of the past !)
\item  Improved User Interfaces, and Data Visualization Techniques.
\begin{itemize}
\item  Voice and Character Recognition; perhaps even limited image recogition.
\item  Voice Synthesis.
\item  Virtual Reality Interfaces.
\end{itemize}
\item  Multimedia video.
\item  Embedded applications.
\item  Miniaturized/Portable calculator sized computers.
\end{enumerate}

{\it Some specific examples of applications:}
\begin{enumerate}\itemsep=-3pt
\item  Real-time human-language translation.
\item  Video-conferencing.
\item  Personal Expert Systems (to filter information, select news, etc.)
\end{enumerate}

{\it How developers should structure software:}
\begin{enumerate}\itemsep=-3pt
\item Neither memory nor processing capacity is a problem.  But
for highest performance, maintaining locality of reference
is still critical.  This principle is being strained by the
advent of object-oriented languages, with their runtime binding,
numerous minor function calls, etc.
\item  Given the large storage capacity, especially if one considers
multiple disks, software will likely become immense in size.  Software
Engineering will become crucial, and software re-use vital for
developers to provide a reliable product in a reasonable timeframe.
Towards this end, Object Oriented Programming is the way software
is being developed.  Developers need to focus on software composition,
responsiveness and ensure rapid development.
\item Very high level programming languages and environments
will be needed.  Designers will have to construct software
out of existing databases of modules.  Software interface
standardization is likely to accelerate.
\end{enumerate}

c. [5 pts] In the above scenarios, we have considered only the rate of
technology evolution.  Of course, the cost at which the technology is
available also has an impact on how much of it is
exploited/incorporated.  Though historically the price of
semiconductor memory (DRAM) has fallen on exponential curves (see pg
9), there are some that believe that in the future, memory prices will
not drop as fast.  Suppose that the technology improves as above, but
the price of memory only falls 2x every 3 years.  Then what are the
likely parameters of a 2001 computer system? (HINT: see pg 14 for 
a typical system cost breakdown).

{\bf Solution:}

P\&H Page 7 tells us:

\begin{quotation}
Traditionally, cost has decreased very closely to the
rate at which density increases.
\end{quotation}

So, for CPU and Disk the rates of density increase
specified in the table above can be afforded for the same \$\$'s.
But to maintain the same system cost breakdown, we can only afford
to upgrade memory by 2x every 3 years.  So,

$CPU density = 4\ million * 3 * 3 * 3 = 108\ million$.

$Memory = 64\ MB * 2 * 2 = 256\ MB$.

$Disk\ Capacity = 1\ GB * 4 * 4 = 16\ GB$.

{\bf Common Mistakes:} \\
1.  Many students went off on the wrong track, and were trying
to calculate dollar costs and percentages of configuration
rather that realizing that if the costs go up, and you wish to
maintain the same configuration, you buy less of what is more
expensive.  Also, many failed to appreciate the close relationship,
quoted above, between cost and density.\\
2.  Some students used linear rather than geometric ratio to
calculate the densities.  Perhaps they were filled with disbelief at
the amazing growth rate of geometric progression !

d. [10 pts] Based on the scenario in part (c) above, give two possible
impacts this could have on the applications that are
feasible, or how designers should structure their software.

{\bf Solution:}

Feasible applications are as above, but might run slower.

Main memory is relatively small, and the average memory access time
for very large applications is likely to drop down as the number
of disk accesses increases.  This means that when there is a
choice between more computation or more storage, the former is
to be strongly preferred.  So,
rather than storing results, if they can be computed in less that
the average memory access time, they should be re-computed.  For
example, in loops we currently pull invariants out and store them
in temporary variables or registers rather than re-calculating
them.  In the above scenario, performance might
improve if the invariant sub-expressions are re-calculated at
each iteration.

Developers will have to tune application data set sizes to
optimize virtual memory managemant.

\end{problems}

\end{document}

