Thesis projects a guide for students in computer science and information systems 9781848000087 29968 by Christian Mendez

10 Presenting and Analysing your Data

[Figure 10.6: column plot of execution time (Seconds, y-axis starting at 0.75) against Tuples (x 1,000), with columns for the Old and New algorithms]

Fig. 10.6 Misleading column plot, which makes the difference between the old and new algorithms seem larger by letting the y-axis begin at 0.75 s

for the old algorithm, although the real difference is that the new algorithm takes 1.1 s and the old one 0.8 s. The new algorithm thus takes 37% longer to execute than the old one, which means that its column should be only 37% higher. This discrepancy decreases further along the x-axis, but it is present throughout the whole graph. At 20,000 tuples, the column for the new algorithm is 55% higher than the column for the old algorithm, but Table 10.2 shows that the new algorithm took only 31% longer to execute than the old one (2.1 s and 1.6 s, respectively).

There are tools available today for drawing graphs very rapidly by automating part of the process. Some of these tools are easy to learn and very efficient to use for generating graphs for your report. A drawback, however, is that they can sometimes create misleading graphs automatically. Some tools may, for example, shorten the y-axis automatically, as in Fig. 10.6. You must therefore inspect each graph carefully and correct any mistakes made by the tool.
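The percentages discussed above follow from simple arithmetic, which the short Python sketch below reproduces. It is an illustration only: the function names `real_increase` and `apparent_increase` are hypothetical, and the inputs are the rounded times for 10,000 tuples given in the text (0.8 s for the old algorithm, 1.1 s for the new one). The key point is that a column drawn on an axis starting at `y_min` is only `value - y_min` tall, so the visible difference can be far larger than the real one.

```python
"""Real versus apparent difference between two columns in a column plot.

Illustrative sketch only: the function names are hypothetical and the
example inputs are the rounded timings quoted in the text.
"""


def real_increase(old: float, new: float) -> float:
    """Relative increase of `new` over `old` (0.375 means 37.5 % longer)."""
    return (new - old) / old


def apparent_increase(old: float, new: float, y_min: float) -> float:
    """Relative increase in *visible* column height when the y-axis starts at y_min.

    A column for value v is drawn from y_min up to v, so its visible height is
    v - y_min. With y_min = 0 this equals real_increase(old, new).
    """
    if y_min >= min(old, new):
        raise ValueError("y_min must lie below both plotted values")
    return (new - y_min) / (old - y_min) - 1


if __name__ == "__main__":
    old, new = 0.8, 1.1  # seconds for 10,000 tuples (rounded values from the text)
    print(f"real increase:             {real_increase(old, new):.1%}")
    print(f"visible, axis from 0.75 s: {apparent_increase(old, new, 0.75):.1%}")
    print(f"visible, axis from 0 s:    {apparent_increase(old, new, 0.0):.1%}")
```

With these rounded values the real increase is 37.5%, while on an axis starting at 0.75 s the new column appears seven times as tall as the old one (a visible increase of about 600%). Passing `y_min=0.0` makes the two figures agree, which offers a quick sanity check on a graph produced by an automatic tool.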

10.2.3 Significance Tests

You may be presenting a numerical comparison of data from experiments or simulations in which you have varied one parameter. In such a case, if the system you have studied is stochastic, i.e. if the outcome of an individual run depends to some degree on chance, then you will need to repeat the runs for each parameter setting a number of times and present the average result. This is because a single run can produce an atypical result purely by chance. It is therefore important to ensure that any results used when drawing conclusions are not just random effects, which can be done by applying a test for statistical significance. As an example, we can observe that the advantages of the old algorithm in Fig. 10.4 were not statistically significant, since the error bars overlap for every plotted point in the interval 10,000–60,000 tuples. The differences between the