Journal of Computer Science and Information Security January 2011


(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 1, 2011

The 1D ANNaBell was a map only in concept, because there was very little to observe visually: just a jagged line. A particular jag in the line represented a node whose only BMN hits were the IP addresses of two local computers infected with a certain kind of bot. This node was then used to find additional infected computers. The SOM hound dog had the scent, but it was not clear to technicians what the scent was; hence the black box effect.
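How a single node can be used to find more infected hosts can be illustrated with a minimal sketch. It assumes Euclidean distance and represents a trained 1D map simply as a list of weight vectors; the function names, the toy weights, and the documentation-range IP addresses are hypothetical and are not taken from 1D ANNaBell. Any host whose vector has the same BMN as a known infected host becomes a candidate for inspection.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

def best_matching_node(nodes: Sequence[Vector], x: Vector) -> int:
    """Index of the node whose weight vector is nearest to x (Euclidean distance assumed)."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))

def hosts_on_node(nodes: Sequence[Vector], vectors_by_ip: Dict[str, Vector], node: int) -> List[str]:
    """Hypothetical helper: every IP address whose BMN is the given node."""
    return [ip for ip, v in vectors_by_ip.items() if best_matching_node(nodes, v) == node]

# Toy 1D map with three nodes and two-dimensional weights (illustrative values only).
nodes = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
hosts = {"192.0.2.1": [0.1, 0.0], "192.0.2.2": [0.95, 1.0], "192.0.2.3": [0.9, 0.8]}

# Suppose 192.0.2.2 is a known infected host: its BMN is the "suspect" node.
suspect_node = best_matching_node(nodes, hosts["192.0.2.2"])
print(hosts_on_node(nodes, hosts, suspect_node))  # ['192.0.2.2', '192.0.2.3']
```

The sketch also shows why the result felt like a black box: it reports which hosts share a node, but nothing in the output explains what the node's weights mean.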

with http as the primary characteristic. See that paper for the entire graphic and an explanation of it. Fig. 2 shows an 11x3 sample cutout of an 18x14 SOM from Hoglund [11] in a hexagonal format. The map represents user behaviors such as CPU times, characters transmitted, and blocks read in a U-matrix display: every other hexagon is a node (marked in Fig. 2 with either a dot or a numerical label), and the intervening hexagons are shaded in grey scale to indicate the distances between neighboring nodes, darker meaning a larger distance and lighter meaning a smaller distance. The labels indicate a user number and the number of Best Matching Node (BMN) hits in a node for that user. For example, 127_8 means that User 127 had 8 BMN hits on the node with that label. As reported in that paper, a single user can have hits on nodes in numerous areas of the map. The hexagons provide a better representation than a standard square-grid layout, but arranging the hexagons in a rectangular outline limits this potential. The U-matrix display has the advantage of visually highlighting clusters of nodes. The researchers on that project probably have a good idea of the characteristics of the various clusters, but these characteristics are not readily apparent from the displayed map. See that paper for the entire graphic and an explanation of it. Rectangular U-matrix hexagonal maps were also used by Cho [12] in 2002.
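The two quantities shown in a display like Fig. 2, the grey-scale distances between neighboring nodes and the per-user BMN hit labels, can be computed with a short sketch. It assumes a rectangular grid of nodes and Euclidean distance, so it is a simplification rather than Hoglund's hexagonal implementation; the function names, the averaging used for node cells, and the toy weights are hypothetical. The same hit counts, without the per-user breakdown, are what a histogram map encodes as the amount of filling in each hexagon.

```python
import math
from collections import Counter
from statistics import mean

def u_matrix(weights):
    """Classic rectangular U-matrix for an R x C grid of weight vectors.

    weights[i][j] is the weight vector of the node at row i, column j. The
    result has (2R-1) x (2C-1) cells: cells with two even indices stand for
    nodes, and the cells between them hold distances between neighbors.
    """
    rows, cols = len(weights), len(weights[0])
    height, width = 2 * rows - 1, 2 * cols - 1
    u = [[0.0] * width for _ in range(height)]
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # distance to the right-hand neighbor
                u[2 * i][2 * j + 1] = math.dist(weights[i][j], weights[i][j + 1])
            if i + 1 < rows:  # distance to the neighbor below
                u[2 * i + 1][2 * j] = math.dist(weights[i][j], weights[i + 1][j])
            if i + 1 < rows and j + 1 < cols:  # center of a 2x2 block: mean of both diagonals
                u[2 * i + 1][2 * j + 1] = mean([
                    math.dist(weights[i][j], weights[i + 1][j + 1]),
                    math.dist(weights[i][j + 1], weights[i + 1][j]),
                ])
    for r in range(0, height, 2):  # node cells: mean of the adjacent distance cells
        for c in range(0, width, 2):
            around = [u[rr][cc] for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                      if 0 <= rr < height and 0 <= cc < width]
            u[r][c] = mean(around) if around else 0.0
    return u

def bmn(flat_nodes, x):
    """Index of the best matching node (Euclidean distance assumed)."""
    return min(range(len(flat_nodes)), key=lambda k: math.dist(flat_nodes[k], x))

def hit_histogram(flat_nodes, samples):
    """BMN hits per node: the quantity a histogram map shows as hexagon filling."""
    counts = Counter(bmn(flat_nodes, x) for x in samples)
    return [counts[k] for k in range(len(flat_nodes))]

def user_hit_labels(flat_nodes, samples_by_user):
    """Map node index -> labels such as '127_8' (user 127 had 8 BMN hits on that node)."""
    labels = {k: [] for k in range(len(flat_nodes))}
    for user, samples in samples_by_user.items():
        for node, hits in sorted(Counter(bmn(flat_nodes, x) for x in samples).items()):
            labels[node].append(f"{user}_{hits}")
    return labels

# Toy 1x3 map with one-dimensional weights (illustrative values only).
grid = [[[0.0], [0.1], [5.0]]]
print(u_matrix(grid)[0])  # cells 1 and 3 hold the neighbor distances (about 0.1 and 4.9)
flat = [w for row in grid for w in row]
print(user_hit_labels(flat, {127: [[0.0]] * 8, 42: [[4.8], [0.12]]}))
```

Larger U-matrix values would be drawn darker, so in this toy map the cell between the second and third nodes marks a cluster boundary.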

1D ANNaBell was therefore redesigned, using the same data, as a hexagonal map, with the intent of producing a visual display that would help technicians understand the SOM process. Part of the methodology for this hexagonal ANNaBell, using grey scale, was described by Langin [18 and 19]. This paper continues that methodology by showing how colorization influenced the map, and it also presents the map as a 3D island. Look ahead to Fig. 14 for the full color map and Fig. 23 for the 3D island to see where this is leading. The source data for ANNaBell Island comes from firewall logs and takes the form of a six-dimensional vector for each local IP address. These are the pertinent features, listed here as a reference for the rest of this paper:

Fig. 3 is a sample cutout of a SOM from Kayacik [13] in 2003 based on network traffic, where each hexagon is a node and the amount of filling in the hexagon represents how many BMN hits the corresponding node has (the more hits, the larger the filling). This can create different patterns for different types of traffic, for example attack versus normal traffic. See that paper for the entire graphic and an explanation of it. A similar histogram map was used by Yeloglu [14] in 2007. This type of map produces useful visual patterns, but it indicates neither the distances between nodes nor the characteristics of nodes. Fig. 4 is a sample cutout of a U-matrix SOM from Kayacik [15] in 2006 which has been labeled with acronyms and has boundaries drawn to enclose clusters. MHP, for example, stands for multihop and lies in a region of this cutout labeled hostbased attack group. See that paper for the full graphic and an explanation of it. This type of map provides more information than the previous ones, but it is still somewhat cryptic.

III. BACKGROUND

1. tot_norm: Total normalized. The total number of log entries in a 24-hour period, normalized. The lowest number of entries in the source data for a local IP address was 0 and the highest was 2,020,349; these counts were normalized to a range of 0 to 1.

2. src_rat: Source ratio. The ratio of unique source (external) IP addresses to the total number of log entries.

3. port_rat: Port ratio. The ratio of unique destination (local) ports to the total number of log entries.

4. lo_norm: Lowest port normalized. The lowest attempted destination (local) port, normalized to a range of 0 to 1, where the lowest possible port is 0 and the highest possible port is 65,535.

5. hi_norm: Highest port normalized. The highest attempted destination (local) port, normalized to a range of 0 to 1, where the lowest possible port is 0 and the highest possible port is 65,535.

6. udp_rat: UDP ratio. The ratio of UDP network traffic to all network traffic.
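The six feature definitions can be sketched as a function from a per-host daily summary to a vector. The HostDay record and its field names are hypothetical; the normalizing constants come from the text, and tot_norm is treated as min-max normalization with a minimum of 0, as the stated range implies. How the original handled a host with no log entries is not stated, so this sketch simply rejects that case.

```python
from dataclasses import dataclass
from typing import List

MAX_PORT = 65_535        # highest possible port number
MAX_ENTRIES = 2_020_349  # highest daily entry count for one local IP in the source data (lowest was 0)

@dataclass
class HostDay:
    """Hypothetical 24-hour summary of firewall log entries for one local IP address."""
    entries: int         # total log entries
    unique_sources: int  # unique external (source) IP addresses
    unique_ports: int    # unique local (destination) ports
    lowest_port: int     # lowest attempted local port
    highest_port: int    # highest attempted local port
    udp_entries: int     # log entries for UDP traffic

def feature_vector(h: HostDay) -> List[float]:
    """Return [tot_norm, src_rat, port_rat, lo_norm, hi_norm, udp_rat]."""
    if h.entries == 0:
        raise ValueError("ratios are undefined for a host with no log entries")
    return [
        h.entries / MAX_ENTRIES,       # tot_norm: (count - 0) / (2,020,349 - 0)
        h.unique_sources / h.entries,  # src_rat
        h.unique_ports / h.entries,    # port_rat
        h.lowest_port / MAX_PORT,      # lo_norm
        h.highest_port / MAX_PORT,     # hi_norm
        h.udp_entries / h.entries,     # udp_rat
    ]

# Inputs of the worked example in the text.
v = feature_vector(HostDay(1548, 139, 58, 22, 61_123, 1345))
print([round(x, 9) for x in v[:4]])  # matches the first four values quoted in the example
```

Every component falls between 0 and 1, which keeps any one feature from dominating the Euclidean distances a SOM computes.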

For example, a local IP address with 1,548 log entries in a 24-hour period, from 139 external IP addresses, directed at 58 local ports ranging from Port 22 to Port 61,123, with 1,345 of the log entries being for UDP traffic, would have a vector of 0.000766204, 0.089793282, 0.0374677, 0.000335698,

This research evolved from a one-dimensional SOM, now called 1D ANNaBell, reported by Langin [16 and 17]. 1D ANNaBell has discovered numerous real-life instances of malicious network traffic. On March 29, 2008, it became, as far as the authors know, the first self-trained computational intelligence to find feral malware, and it is still in production after more than two years.

Figure 3, Histogram

Figure 2, U-matrix


Figure 4, Acronyms

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

