ExpReport.TXT
- Assumptions:
1. The penalties for each memory operation (taken from the input file) are all
   charged to the totals for the type of operation being performed, regardless
   of the kind of memory access actually required. For example, a read miss
   where the line to be evicted is dirty requires the old line to be written
   out and the new line to be read in; all of these transfers and latency
   times are added to the READ totals. This is deliberate: it gives a very
   clear picture of where the latency penalties occur. In the example above,
   quite a lot of writing is going on, which slows down read operations
   because cache lines tend to be dirty on eviction.
2. Since the bus width is 1 word, every word (i.e. every 2 bytes) transferred
   requires a separate bus transfer, and hence incurs a memory latency penalty.
3. The cache incurs a latency penalty every time it is read from or written
   to. Hence:
   a. A read HIT incurs a single cache latency.
   b. For a read MISS:
      i. If the line to be replaced is CLEAN, only one cache latency is incurred.
      ii. If the line to be replaced is DIRTY, a line must be written as well as
          one read, so the cache incurs 2 latencies.
4. Cache and memory cannot be read / written in parallel. Hence, the total
   latency is always the sum of the memory latencies (one for each transfer)
   plus the cache latencies calculated above. (A small sketch of this latency
   model follows the list.)
5. A DIRTY cache line is written out of the cache to memory only when it has
   to be evicted from the cache.
6. Although the simulator allows any cache and memory latencies to be set, we
   always assume they are 5 and 200 nanoseconds respectively.
7. In the program, the time cost of checking a tag in a low-associativity
   cache entry is the same as in a high-associativity one. That is, no matter
   what the associativity is (1-way or 32-way), checking whether a tag is
   present in a cache line costs the same 5 nanoseconds. Although this is not
   very realistic, it simplifies the programming.
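The latency accounting in assumptions 1-6 can be summarised by the following
minimal sketch (Python, with illustrative names only; this is not the
simulator's actual code):

    # Sketch of the READ latency model from assumptions 2-4 and 6.
    WORD_BYTES = 2     # bus width = 1 word = 2 bytes (assumption 2)
    LINE_BYTES = 32    # cache line size used in the experiments
    CACHE_LAT = 5      # ns per cache access (assumption 6)
    MEM_LAT = 200      # ns per word transferred (assumptions 2 and 6)

    def read_latency(hit, victim_dirty):
        """All penalties for a READ are charged to READ (assumption 1)."""
        words_per_line = LINE_BYTES // WORD_BYTES          # 16 transfers per line
        if hit:
            return CACHE_LAT                               # 3a: one cache latency
        if victim_dirty:
            # 3b-ii: write the dirty victim out, then read the new line in
            return 2 * CACHE_LAT + 2 * words_per_line * MEM_LAT
        # 3b-i: victim is clean, only the new line is read in
        return CACHE_LAT + words_per_line * MEM_LAT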
- Important details of how 'WRITE BACK' / 'WRITE THROUGH' and
'WRITE ALLOCATE' / 'WRITE NO ALLOCATE' are implemented in my program:
Only a WRITE HIT under the 'WRITE BACK' policy can mark a line in the cache
as 'DIRTY'. Furthermore, in my program, if a WRITE MISS happens and 'WRITE
ALLOCATE' is in effect, one word is written to memory before the required
line is brought into the cache. As a result, after the line has been brought
in, it is CLEAN, not DIRTY. So it is quite simple in my program to keep
track of DIRTY lines, since DIRTY lines only arise under the 'WRITE BACK'
policy, never from 'WRITE ALLOCATE' itself. It also means that 'WRITE
THROUGH' never meets DIRTY lines, so it never needs to consider them. A
minimal sketch of this write handling follows.
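The write handling described above, sketched in Python (the helper names are
illustrative only, not the simulator's actual code):

    # Sketch of the write handling described above.
    class Line:
        def __init__(self):
            self.valid = False
            self.dirty = False

    def write_word_to_memory():
        pass                                 # stand-in for one word transfer to memory

    def fetch_line_into_cache(line):
        line.valid = True
        line.dirty = False                   # a freshly fetched line is CLEAN

    def handle_write(line, hit, hit_policy, miss_policy):
        if hit:
            if hit_policy == "WRITE BACK":
                line.dirty = True            # the only way a line becomes DIRTY
            else:                            # WRITE THROUGH
                write_word_to_memory()       # memory updated at once, line stays CLEAN
        else:                                # WRITE MISS
            write_word_to_memory()           # the word is written to memory first
            if miss_policy == "WRITE ALLOCATE":
                fetch_line_into_cache(line)  # the line arrives CLEAN, not DIRTY
            # under WRITE NO ALLOCATE the cache is left untouched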
- The experiments:
I. How does the degree of cache associativity affect cache performance?
I used 5 files from the spec92 directory for this experiment. To explore the
question, I divided the experiment into 3 groups using 3 different cache
sizes: 16 KB, 64 KB and 128 KB. Every run used a fixed set of parameters: a
32-byte cache line, 'WRITE BACK' as the write hit policy and 'WRITE
ALLOCATE' as the write miss policy, so that all the runs are comparable. To
check how well the cache works, two figures are used to measure performance:
the miss ratio and the average time. As described in the 'README.TXT' file,
every run reports 3 kinds of data: READ, WRITE and TOTAL. To compare the
overall effect, only the TOTAL data are analysed.
NOTE: All the raw experiment data are in the file 'ExpMaterial.txt'.
1. 16 KB cache.
1) traces/spec92/008.espresso.din.gz
table 1.1
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 0.645 32.421
2 0.217 15.431 52.4%
4 0.160 12.859 16.67% Y
8 0.162 12.904 -0.35%
16 0.162 12.906 -0.0155%
32 0.164 12.971 -0.504%
64 0.165 12.983 -0.0925%
128 0.166 13.020 -0.285%
256 0.166 13.014 0.0461%
------------------------------------------------------------------------
Note, in the above table:
1 'Associative' indicates the associativity used in the run.
2 '%' means the miss ratio.
3 'Average time' is the average cost of each operation, i.e. the time spent
  by all the operations divided by the number of 'Accesses'.
4 'Improved' is how much the performance improved, measured by 'Average
  time', compared with the previous row (see the short sketch after these
  notes).
5 'Best' marks whether the row is the best performance in the series of runs
  for that particular data file. If so, it is marked 'Y'.
6 All the remaining tables use the same format as this one.
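The three derived columns simply restate the definitions above; a small
sketch (illustrative names only):

    # How the table columns are derived.
    def miss_ratio(misses, accesses):
        return 100.0 * misses / accesses                 # the '%' column

    def average_time(total_time_ns, accesses):
        return total_time_ns / accesses                  # 'Average time(ns)'

    def improved(prev_avg, curr_avg):
        return 100.0 * (prev_avg - curr_avg) / prev_avg  # 'Improved' vs previous row

For example, the 2-way row of table 1.1 gives improved(32.421, 15.431) = 52.4%.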
2) traces/spec92/023.eqntott.din.gz
table 1.2
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 1.663 81.229
2 1.191 62.438 23.13%
4 1.150 60.933 2.41% Y
8 1.155 61.216 -0.464%
16 1.161 61.486 -0.441%
32 1.164 61.634 -0.241%
64 1.164 61.666 -0.0519%
128 1.165 61.727 -0.0989%
------------------------------------------------------------------------
3) traces/spec92/026.compress.din.gz
table 1.3
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 7.539 319.324
2 7.127 299.986 6.06%
4 7.076 297.968 0.673%
8 7.051 297.127 0.282%
16 7.034 296.565 0.189%
32 6.964 294.092 0.834%
64 6.928 293.050 0.354%
128 6.914 292.610 0.15%
256 6.908 292.371 0.0817%
512 6.895 291.934 0.149% Y
------------------------------------------------------------------------
4) traces/spec92/047.tomcatv.din.gz
table 1.4
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 2.986 158.146
2 2.267 136.867 13.46%
4 2.387 141.582 -3.445%
8 1.881 121.386 14.26%
16 1.610 104.543 13.88%
32 1.609 104.457 0.0823% Y
64 1.610 104.468 -0.0105%
128 1.610 104.472 -0.00383%
256 1.610 104.463 0.00861%
512 1.610 104.464 -0.000957%
------------------------------------------------------------------------
5) 056.ear.din.gz
table 1.5
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 1.036 63.494
2 0.209 15.934 74.9%
4 0.150 13.281 16.66%
8 0.147 13.152 0.971%
16 0.146 13.130 0.167%
32 0.146 13.099 0.236%
64 0.146 13.096 0.0229%
128 0.146 13.096 0.00%
256 0.146 13.096 0.00%
512 0.146 13.094 0.0153% Y
------------------------------------------------------------------------
2. 64 KB cache.
1) traces/spec92/008.espresso.din.gz
table 2.1
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 0.163 11.266
2 0.090 8.037 28.66%
4 0.090 8.028 0.112%
8 0.090 8.022 0.0747% Y
16 0.090 8.022 0.00% Y
32 0.090 8.022 0.00% Y
64 0.090 8.022 0.00% Y
------------------------------------------------------------------------
2) traces/spec92/023.eqntott.din.gz
table 2.2
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 0.684 40.400
2 0.565 35.695 17.4%
4 0.538 34.528 3.27%
8 0.526 33.955 1.66%
16 0.524 33.820 0.398%
32 0.523 33.785 0.103% Y
64 0.525 33.907 -0.361%
128 0.525 33.900 0.0206%
256 0.525 33.903 -0.00885%
512 0.526 33.937 -0.1%
1024 0.526 33.926 0.0324%
2048 0.527 33.999 -0.215%
------------------------------------------------------------------------
3) traces/spec92/026.compress.din.gz
table 2.3
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 5.215 238.524
2 4.922 228.231 4.32%
4 4.790 224.009 1.85%
8 4.743 222.313 0.757%
16 4.734 222.014 0.134%
32 4.733 221.965 0.0221%
64 4.726 221.732 0.105%
128 4.672 219.787 0.877%
256 4.656 219.163 0.234%
512 4.649 218.913 0.114%
1024 4.649 218.906 0.0032%
2048 4.645 218.767 0.635% Y
------------------------------------------------------------------------
4) traces/spec92/047.tomcatv.din.gz
table 2.4
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 2.523 137.264
2 1.504 100.481 26.80%
4 1.488 99.495 0.931%
8 1.488 99.478 0.0171%
16 1.488 99.478 0.00%
32 1.488 99.344 0.135%
64 1.488 99.337 0.00705%
128 1.488 99.317 0.0201%
256 1.488 99.304 0.0131%
512 1.488 99.302 0.00201%
1024 1.488 99.298 0.00403% Y
2048 1.488 99.299 -0.00101%
------------------------------------------------------------------------
5) 056.ear.din.gz
table 2.5
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 0.147 11.692
2 0.133 11.123 4.87%
4 0.133 11.118 0.045% Y
8 0.133 11.127 -0.081%
16 0.133 11.124 0.027%
32 0.133 11.130 -0.54%
64 0.133 11.129 0.00899%
128 0.133 11.140 -0.0989%
256 0.133 11.141 -0.00898%
512 0.133 11.141 0.00%
1024 0.133 11.141 0.00%
2048 0.133 11.141 0.00%
------------------------------------------------------------------------
3. 128 KB cache.
1) traces/spec92/008.espresso.din.gz
table 3.1
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 0.137 10.314
2 0.090 8.024 22.2%
4 0.090 8.022 0.0249% Y
8 0.090 8.022 0.00% Y
16 0.090 8.022 0.00% Y
32 0.090 8.022 0.00% Y
64 0.090 8.022 0.00% Y
------------------------------------------------------------------------
2) traces/spec92/023.eqntott.din.gz
table 3.2
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 0.469 28.230
2 0.404 25.101 11.08%
4 0.353 22.084 12.06% Y
8 0.364 22.771 -3.11%
16 0.360 22.531 1.05%
32 0.361 22.617 -0.381%
64 0.361 22.610 0.031%
128 0.362 22.673 -0.531%
256 0.362 22.682 -0.0397%
------------------------------------------------------------------------
3) traces/spec92/026.compress.din.gz
table 3.3
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 3.655 178.384
2 3.391 170.698 4.31%
4 3.216 164.297 3.75%
8 3.149 161.811 1.51%
16 3.130 161.033 0.481%
32 3.128 160.900 0.0826%
64 3.126 160.858 0.0261%
128 3.126 160.895 -0.023%
256 3.103 159.969 0.576%
512 3.102 159.863 0.0663%
1024 3.098 159.698 0.103%
2048 3.098 159.696 0.00125%
4096 3.097 159.677 0.0119% Y
------------------------------------------------------------------------
4) traces/spec92/047.tomcatv.din.gz
table 3.4
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 2.441 131.365
2 1.491 98.116 25.31%
4 1.486 97.827 0.295%
8 1.486 97.661 0.17%
16 1.486 97.523 0.141% Y
32 1.486 97.523 0.00% Y
64 1.486 97.526 -0.00308%
128 1.486 97.538 -0.0123%
------------------------------------------------------------------------
5) 056.ear.din.gz
table 3.5
------------------------------------------------------------------------
Associative % Average time(ns) Improved Best
------------------------------------------------------------------------
1 0.135 9.770
2 0.131 9.455 3.22%
4 0.131 9.438 0.18%
8 0.131 9.434 0.0424%
16 0.131 9.423 0.117%
32 0.131 9.419 0.0424% Y
64 0.131 9.419 0.00% Y
128 0.131 9.419 0.00% Y
256 0.131 9.419 0.00% Y
512 0.131 9.419 0.00% Y
------------------------------------------------------------------------
* Conclusion:
In all the experiments, the data show that the direct-mapped cache (1-way
associative) performs really poorly, and a 2-way associative cache performs
much better than the direct-mapped one; the improvement even reached 74.9%
in table 1.5. The reason, I think, is that in a direct-mapped cache too many
memory lines compete for the same cache slot, because there is only one slot
in every cache entry. As a result, cache lines are evicted very frequently
in a direct-mapped cache, which drives the miss ratio up and the efficiency
down. We can also see that the behaviour differs from file to file: some
files reached their best performance with a 4-way associative cache, others
with a 32-way one. Why the difference? I think it is because the real effect
depends on how the data in each file access memory. File
'026.compress.din.gz' provides an example of this point: in my experiments
it always reached its best performance at the highest associativity.
The statistics show: over all the experiments, there are 5 best performances
with a 4-way associative cache; 1 with 8-way; 1 with 16-way; 3 with 32-way;
2 with 512-way; and 1 each with 1024-, 2048- and 4096-way. According to
these statistics, 4-way associativity appears to be the ideal choice for us.
But it is not absolute, since 5 out of the total is not a dominant share.
One thing we can be sure of is that multi-way associativity improves the
performance greatly compared with 1-way. We also need to point out that high
associativities cannot improve the performance very much (usually less than
1%), even when they do improve it. In some cases the highly associative
caches even perform worse than the low-associativity ones, for example in
tables 1.1, 1.2, 1.4, 2.5, 3.2 and 3.4. Based on the above, I suggest that
low-associativity caches are the better choice.
NOTE: since, in Assumption 7, I have assumed that the access time is exactly
the same for every associativity, in a real cache the highly associative
ones should perform somewhat worse than they do in my experiments.
II. How do write-back caches compare with write-through caches?
For this question, I selected 6 files from the java directory and 4 files
from the spec92 directory. For every file I designed 4 runs to check the
effect. The first 2 use the 'WRITE ALLOCATE' policy for a WRITE MISS but
different WRITE HIT policies: one 'WRITE BACK', the other 'WRITE THROUGH'.
The last 2 use the 'WRITE NO ALLOCATE' policy for a WRITE MISS, again with
each WRITE HIT policy in turn (the four configurations are sketched below).
A running environment of a 64 KB cache with 4-way associativity and 32-byte
cache lines is used for all the runs. In order to compare the overall
performance, only the TOTAL data of every run are taken into account.
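The four runs per trace file are simply the cross product of the two policy
dimensions; a small sketch (the config keys and the run_simulation call are
illustrative only, not the simulator's actual interface):

    # The 4 runs per trace: each WRITE MISS policy paired with each
    # WRITE HIT policy, on a 64 KB, 4-way, 32-byte-line cache.
    from itertools import product

    miss_policies = ["WRITE ALLOCATE", "WRITE NO ALLOCATE"]
    hit_policies = ["WRITE BACK", "WRITE THROUGH"]

    for miss_policy, hit_policy in product(miss_policies, hit_policies):
        config = {"cache_kb": 64, "assoc": 4, "line_bytes": 32,
                  "write_miss": miss_policy, "write_hit": hit_policy}
        # run_simulation(trace_file, config)   # hypothetical driver call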
* java files
1. kawa.java.pi100.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.597 35.339
WRITE THROUGH 0.597 66.058 1.87
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.995 33.794
WRITE THROUGH 0.995 65.292 1.93
Note:
In the above table, 'T/B' is the Average time of 'WRITE THROUGH' divided by
the Average time of 'WRITE BACK'. The same applies to the remaining tables.
2. linpack.java.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.474 31.702
WRITE THROUGH 0.474 45.527 1.44
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.544 31.534
WRITE THROUGH 0.544 45.427 1.44
3. matmult.java.64.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.218 14.385
WRITE THROUGH 0.218 28.127 1.96
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.979 13.596
WRITE THROUGH 0.979 27.120 1.99
4. qsort.java.50000.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.158 13.747
WRITE THROUGH 0.158 35.362 2.57
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.474 11.979
WRITE THROUGH 0.474 34.269 2.86
5. queens.java.10.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.322 19.521
WRITE THROUGH 0.322 41.770 2.14
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 1.056 20.390
WRITE THROUGH 1.056 41.700 2.05
6. wordfreq.java.rev.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.391 23.660
WRITE THROUGH 0.391 63.876 2.70
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.742 22.367
WRITE THROUGH 0.742 63.667 2.85
* spec92 files
1. 008.espresso.din.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.090 8.028
WRITE THROUGH 0.090 46.070 5.74
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.890 9.383
WRITE THROUGH 0.890 45.963 4.90
2. 023.eqntott.din.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 0.538 34.528
WRITE THROUGH 0.538 70.080 2.03
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 7.619 29.756
WRITE THROUGH 7.619 60.290 2.03
3. 026.compress.din.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 4.790 224.009
WRITE THROUGH 4.790 229.582 1.02
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 10.197 159.140
WRITE THROUGH 10.197 175.740 1.10
4. 047.tomcatv.din.gz
WRITE ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 1.488 99.495
WRITE THROUGH 1.488 88.072 0.885
WRITE NO ALLOCATE:
------------------------------------------------------------------------
HIT POLICY % Average time(ns) T/B
------------------------------------------------------------------------
WRITE BACK 11.143 30.303
WRITE THROUGH 11.143 43.262 1.43
* conclusion:
The experiment data show that 'WRITE THROUGH' performed considerably worse
than 'WRITE BACK' in the experiments I have done. The ratio of the average
time of 'WRITE THROUGH' to that of 'WRITE BACK' is usually between 1.1 and
2.86, and it even reached 5.74 for the file '008.espresso.din.gz'. There is
only one exception, the file '047.tomcatv.din.gz': when using 'WRITE
ALLOCATE', 'WRITE THROUGH' actually performs better than 'WRITE BACK'.
The difference between 'WRITE BACK' and 'WRITE THROUGH' is this: with 'WRITE
BACK', the cache line is updated and marked 'DIRTY', and when it needs to be
evicted later the whole line has to be written back to memory; with 'WRITE
THROUGH', after the cache line is updated, the line in memory is also
updated immediately and the cache line stays 'CLEAN'. So the performance
depends on the data file's memory access pattern. If a line that takes a
write hit under 'WRITE BACK' is rarely write-hit again before it is evicted,
the penalty for 'WRITE BACK' is really heavy, since it only saves one access
to memory but needs 16 accesses to memory to write the whole line back
(because the bus width is one word, i.e. 2 bytes, and the cache line is 32
bytes). But if the line keeps being write-hit heavily (more than 15 further
times, because writing the DIRTY line back to memory costs 16 accesses and
the line has already been write-hit once) before it is evicted, 'WRITE BACK'
beats 'WRITE THROUGH'. Under 'WRITE THROUGH' the line must write one word to
memory on every write hit, so the total number of memory accesses for these
write hits exceeds 16, while under 'WRITE BACK' that number is always 16. In
this case 'WRITE THROUGH' therefore needs more accesses to memory than
'WRITE BACK', so 'WRITE BACK' wins; the more heavily the line is write-hit,
the more 'WRITE BACK' wins. It seems likely that in most cases a line will
be write-hit frequently, since the miss ratio is very low in most of my
experiments, so 'WRITE BACK' is more likely to perform better. That is the
pattern in my experiments. A small arithmetic sketch of this break-even
point follows.
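The break-even argument above can be put into a small sketch (assuming, as
in all the experiments, a 2-byte bus word and a 32-byte line, i.e. 16
transfers per line; the function names are illustrative only):

    # Memory word-transfers caused by the write hits to ONE cache line
    # before it is evicted.
    WORDS_PER_LINE = 32 // 2                 # 32-byte line, 2-byte word -> 16

    def write_back_transfers(write_hits):
        # no memory traffic on the hits themselves; the whole (now dirty)
        # line is written back once, on eviction
        return WORDS_PER_LINE if write_hits > 0 else 0

    def write_through_transfers(write_hits):
        # one word goes to memory on every write hit
        return write_hits

    # fewer than 16 write hits: WRITE THROUGH is cheaper; exactly 16: a tie;
    # more than 16: WRITE BACK wins, and the gap grows with every extra hit.
    for hits in (1, 15, 16, 17, 100):
        print(hits, write_back_transfers(hits), write_through_transfers(hits))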