DUJS 13S by Dartmouth Undergraduate Journal of Science

Figure 5.4: Percent of examples misclassified by our models as a function of the prediction time: (a) precipitation examples, (b) non-precipitation examples, (c) total error rate.

lead to some over-classification of precipitation events, resulting in a higher overall error. The RBF error in classifying precipitation events in the training set can be seen in Figure 5.5, the error for non-precipitation events in Figure 5.6, and the overall error in Figure 5.7. These figures contain results from both the Gaussian and thin-plate spline basis functions, for k-values of 100, 500 and 1000, and time intervals of 6, 12, 18 and 24 hours. The plots for the classification error of non-precipitation events and for the overall classification error are very similar, which makes sense: there are so many more samples of non-precipitation than precipitation events that non-precipitation events make up a very large proportion of the entire test set, so the total error rate closely tracks the non-precipitation error rate. A similar correlation can be seen in other algorithms that produced very low total error rates, which were initially convincing, until it was determined that the algorithm had simply classified everything as non-precipitation.

The over-classification done by the RBF network can be seen in the opposite trends of the error in precipitation events and non-precipitation events as the timeframe increased. To maintain robust accuracy in predicting precipitation events, the algorithm classified more examples as precipitation over longer timeframes, and clearly not all of these classifications were correct. Thus the accuracy of precipitation classification actually increased slightly as the timeframe increased, while the overall accuracy and the accuracy of non-precipitation classification simultaneously declined linearly.
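The effect of class imbalance on the total error rate can be illustrated with a small sketch. The data here are hypothetical (5 precipitation examples out of 100, chosen only for illustration) and the helper `error_rates` is our own name, not part of the models described in this paper. It shows how a model that labels everything as non-precipitation can still report a low total error.

```python
def error_rates(y_true, y_pred):
    """Return (precip_error, non_precip_error, total_error).

    Labels: 1 = precipitation, 0 = no precipitation.
    """
    # Predictions grouped by the true class of each example.
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]

    precip_err = sum(1 for p in pos if p != 1) / len(pos)
    non_precip_err = sum(1 for p in neg if p != 0) / len(neg)
    total_err = sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)
    return precip_err, non_precip_err, total_err


# Hypothetical test set: 5 precipitation examples out of 100 (an assumption).
y_true = [1] * 5 + [0] * 95

# A degenerate model that labels every example as non-precipitation.
always_dry = [0] * 100

precip_err, non_precip_err, total_err = error_rates(y_true, always_dry)
print(precip_err, non_precip_err, total_err)
```

In this toy case every precipitation example is missed, yet the total error is only as large as the share of precipitation examples in the set. This is why the per-class error rates are reported separately alongside the total.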

5.3 Comparison

To compare the results of our networks to a baseline, we also ran our data through a few simpler models. We chose to run a forest of 20 randomly generated decision trees,

Table 5.1

                  RBF
                  Gaussian                          Thin-Plate Spline
Error             Precip.   Non-Precip.   Total     Precip.   Non-Precip.   Total
hr,k = 6,100      0.473     0.035         0.059     0.482     0.026         0.051
hr,k = 6,500      0.314     0.043         0.058     0.343     0.043         0.06
hr,k = 6,1000     0.302     0.048         0.062     0.294     0.043         0.057
hr,k = 12,100     0.319     0.144         0.164     0.346     0.138         0.161
hr,k = 12,500     0.288     0.132         0.149     0.302     0.15          0.167
hr,k = 12,1000    0.263     0.126         0.141     0.327     0.144         0.164
hr,k = 18,100     0.291     0.272         0.275     0.281     0.285         0.284
hr,k = 18,500     0.264     0.259         0.259     0.303     0.277         0.281
hr,k = 18,1000    0.259     0.25          0.251     0.303     0.291         0.293
hr,k = 24,100     0.278     0.368         0.349     0.28      0.407         0.381
hr,k = 24,500     0.234     0.36          0.334     0.239     0.418         0.381
hr,k = 24,1000    0.255     0.355         0.334     0.26      0.404         0.374

SPRING 2013
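The relationship between the three error columns of Table 5.1 can be checked with a short calculation. The sketch below assumes that all three rates in a row were measured on the same set of examples, so that the total error is the weighted average of the two class errors, total = p * precip + (1 - p) * non_precip, where p is the share of precipitation examples. The function name `implied_precip_share` is ours, and the two rows used are the Gaussian results for k = 100 at 6 and 24 hours.

```python
def implied_precip_share(precip_err, non_precip_err, total_err):
    """Solve total = p * precip + (1 - p) * non_precip for p.

    Assumes the three rates come from the same set of examples.
    """
    return (total_err - non_precip_err) / (precip_err - non_precip_err)


# (hours, k) -> (precip. error, non-precip. error, total error),
# copied from the Gaussian columns of Table 5.1.
rows = {
    (6, 100): (0.473, 0.035, 0.059),
    (24, 100): (0.278, 0.368, 0.349),
}

shares = {key: implied_precip_share(*vals) for key, vals in rows.items()}
for (hours, k), share in shares.items():
    print(f"{hours} h, k={k}: implied precipitation share ~ {share:.2f}")
```

Under this assumption the 6-hour row implies that roughly 5% of the examples are precipitation events, and the 24-hour row roughly 21%. These are rough estimates, since the table values are rounded to three decimals. The calculation becomes unstable for rows where the two class errors are nearly equal, as in the 18-hour rows.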