
International Journal Of Computational Engineering Research (Vol. 3 Issue. 1)

Memory Efficient Bit Split Based Pattern Matching For Network Intrusion Detection System

Borra Sudhakiran 1, G. Nalini 2

1 M.Tech Student, 2 Asst. Professor, Lenora College of Engineering, Rampachodavaram, E.G. Dt., India


Abstract: Hardware-based network intrusion detection systems (NIDS) are now used to inspect packet contents against thousands of predefined malicious or suspicious patterns in order to keep up with high-speed Internet traffic. Because traditional software-only pattern-matching approaches can no longer meet the throughput of today's networks, many hardware approaches have been proposed to accelerate pattern matching. Among hardware approaches, memory-based architectures have attracted a lot of attention because of their easy reconfigurability and scalability. To accommodate the increasing number of attack patterns and meet network throughput requirements, a successful network intrusion detection system must have a memory-efficient pattern-matching algorithm and hardware design. In this paper, we propose a memory-efficient pattern-matching algorithm that can significantly reduce the memory requirement. The proposed bit-split-based pattern matching can match more patterns than ASCII-value-based matching.

1. Introduction

Network Intrusion Detection Systems (NIDS) perform deep packet inspection: they scan each packet's payload for patterns that would indicate security threats. Matching every incoming byte against thousands of pattern characters at wire rate, though, is a complicated task. Measurements on SNORT show that 31% of total processing is due to string matching, and the percentage rises to 80% for Web-intensive traffic [20]. String matching can therefore be considered one of the most computationally intensive parts of a NIDS, and in this paper we focus on payload matching.

Many different algorithms, or combinations of algorithms, have been introduced and implemented on general-purpose processors (GPP) for fast string matching [16, 20, 42, 35, 3, 2], mostly using the SNORT open-source NIDS rule set [38, 41]. However, intrusion detection systems running on a GPP can serve only up to a few hundred Mbps of throughput. Hardware-based solutions are therefore probably the only way to reach speeds above a few hundred Mbps. Several commercial ASIC products have been developed so far [31, 30, 27, 28, 29, 32]. These systems support high throughput but are a relatively expensive solution. FPGA-based systems, on the other hand, offer higher flexibility with performance comparable to ASICs. FPGA-based platforms can exploit the fact that NIDS rules change relatively infrequently, and use reconfiguration to reduce implementation cost. They can also exploit parallelism to achieve satisfactory processing throughput. Several architectures have been proposed for FPGA-based NIDS, using regular expressions (NFAs/DFAs) [40, 34, 36, 22, 14, 15], CAM [23], discrete comparators [13, 12, 7, 6, 5, 43, 44], and approximate filtering techniques [4, 18]. In general, FPGA systems show promising performance, indicating that FPGAs can support the increasing needs of network security.
FPGAs are flexible and reconfigurable, and they provide hardware speed; they are therefore suitable for implementing such systems. There are, however, several issues to face. Large designs are complex and therefore hard to operate at high frequency. In addition, matching a large number of patterns has a high area cost, so sharing logic is critical: it can save a significant amount of resources and make designs smaller and faster.

2. Software-Based Packet Inspection

Network Intrusion Detection Systems (NIDS) attempt to detect attacks by monitoring incoming traffic for suspicious contents. They collect data from the network, monitor activity across the network, analyze packets, and report any intrusive behavior in an automated fashion. Intrusion detection systems apply advanced pattern-matching techniques (e.g. Boyer-Moore, Aho-Corasick, Fisk-Varghese) to network packets to identify known attacks. Much like virus detection software, they use simple rules (or search patterns) to identify possible security threats, and report offending packets to the administrators for further action. A NIDS should be updated frequently, since signatures may be added or changed on a weekly basis.

A. SNORT RULE:

SNORT is a widely used open-source NIDS. Based on a rule database, SNORT monitors network traffic and detects intrusion events. Many researchers have developed string-matching algorithms, combinations of algorithms, and techniques such as pre-filtering to improve SNORT's performance. A SNORT rule can contain header and content fields. The header part checks the protocol, the source and destination IP addresses, and the ports. The content part scans the packet payload for one or more patterns. A pattern may be written in ASCII, in HEX, or in a mixed format; HEX parts are enclosed between vertical bar symbols '|'. An example of a SNORT rule is:
||Issn 2250-3005(online)||

||January || 2013


alert tcp any any -> 111 (content: "idc|3a3b|"; msg: "mountd access";)

The above rule looks for a TCP packet with any source IP and port, a specific destination IP, and destination port 111. To match this rule, the packet payload must contain the pattern "idc|3a3b|", i.e. the ASCII characters "i", "d" and "c" followed by the bytes 3a and 3b in HEX format.

Intrusion Detection Systems: intrusion detection systems are able to perform protocol analysis and stateful inspection. Unlike traditional firewalls, they also detect content-based security threats. Their major bottleneck is pattern matching [17], which limits NIDS performance.
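To make the mixed ASCII/HEX notation concrete, the following Python sketch converts a SNORT content string into the raw bytes an engine would search for. The function names `parse_content` and `payload_matches` are hypothetical, and the sketch assumes no escaped '|' characters; real SNORT rule parsing also handles escaping and content modifiers that are ignored here.

```python
def parse_content(content: str) -> bytes:
    """Convert a SNORT-style content string (ASCII with |hex| runs) to bytes.

    Sketch only: assumes no escaped '|' characters and no rule modifiers.
    """
    out = bytearray()
    # Splitting on '|' alternates ASCII pieces (even index) and hex pieces (odd index).
    for i, piece in enumerate(content.split("|")):
        if i % 2 == 0:
            out += piece.encode("ascii")
        else:
            out += bytes.fromhex(piece)  # fromhex() ignores spaces, e.g. "3a 3b"
    return bytes(out)


def payload_matches(payload: bytes, content: str) -> bool:
    """Hypothetical helper: does the payload contain the rule's content pattern?"""
    return parse_content(content) in payload


print(parse_content("idc|3a3b|"))                            # b'idc:;'
print(payload_matches(b"...mountd idc:; ...", "idc|3a3b|"))  # True
```

Since 0x3a and 0x3b are the ASCII codes of ':' and ';', the example pattern is simply the five bytes "idc:;".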

Fig 1. Character occurrence in patterns

Pattern lengths range from 1 to 107 characters, with an average of 12.3 characters per pattern. Most patterns contain fewer than 20 characters: 80% of the patterns are 1 to 17 characters long, and almost all of them (99.5%) are shorter than 40 bytes. Half of the matched characters belong to patterns shorter than 15 bytes, and patterns shorter than 50 bytes contain almost all of the matching characters (99%).

B. FPGA-BASED STRING MATCH:

One of our first ideas for FPGA-based string matching was to recode, or encode, the incoming data (e.g. with Huffman encoding [26]). This idea would be interesting only if the most frequently used characters could be encoded in 4 bits or less. The reason lies in the FPGA structure: the smallest logic element of these devices implements a logic function of up to 4 input bits in a 4-input LUT, and wider functions need two or more logic cells. So, to use fewer logic cells for matching, each encoded character must be at most 4 bits wide. The 16 most frequently used characters (which can be encoded in 4 bits) account for 61% of the total number of characters. However, Huffman encoding would probably not offer considerable benefit: even if a designer could halve the matching cost for these most frequent characters, the overhead for matching the remaining characters would roughly cancel out the logic saved.

C. ALGORITHMS IN MISUSE DETECTION:
• Simple string matching
• State machine matching

Simple string matching: The Boyer-Moore algorithm [5] uses two different heuristics to determine the maximum possible shift distance after a mismatch: the “bad character” heuristic and the “good suffix” heuristic.
The first heuristic, the bad character heuristic, works as follows: if the search pattern contains a mismatching character (one that differs from the corresponding character in the text), the pattern is shifted so that the mismatching text character is aligned with its rightmost occurrence inside the pattern. The second heuristic, the good suffix heuristic, works as follows: if a mismatch is found in the middle of the pattern, the pattern is shifted to the next occurrence of the already matched suffix in the pattern. Both heuristics can lead to a shift distance of m, the pattern length. For the bad character heuristic this happens if the first comparison causes a mismatch and the corresponding text symbol does not occur in the pattern at all. For the good suffix heuristic this happens if only the first comparison was a match, but that symbol does not occur elsewhere in the pattern. With the preprocessed “bad character” and “good suffix” tables, the shift applied after a mismatch is the maximum of the two values.

State Machine Matching: The Aho-Corasick string-matching automaton for a given finite set P of patterns is a (deterministic) finite automaton G that accepts the set of all words containing a word of P. For every state it stores where to jump for each character a ∈ Σ. The engine simply traverses the input string, making transitions via δ, the transition function that gives the next state for each character a ∈ Σ. Whenever it reaches a state in F, a match is reported. For simple (single-pattern) string matching it does not perform very well, but when there are multiple patterns, or when matching is done at the regular-expression level, it is one of the best options for pattern matching.
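The two Boyer-Moore heuristics described above can be sketched in Python as follows. The bad character table records the rightmost position of each symbol, and the good suffix table is computed with the common "border position" preprocessing; the function names and this particular preprocessing are illustrative choices, not taken from the cited implementations.

```python
def bad_character_table(pat):
    """Rightmost index of every symbol that occurs in the pattern."""
    return {c: i for i, c in enumerate(pat)}


def good_suffix_table(pat):
    """shift[j]: safe shift when pat[j:] matched but pat[j-1] mismatched."""
    m = len(pat)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)   # border[i]: start of the widest border of pat[i:]
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        # Case 1: the matched suffix occurs again, preceded by a different symbol.
        while j <= m and pat[i - 1] != pat[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Case 2: only a prefix of the pattern matches part of the matched suffix.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def boyer_moore(text, pat):
    """Return every start position of pat in text."""
    m, n = len(pat), len(text)
    last = bad_character_table(pat)
    good = good_suffix_table(pat)
    hits, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pat[j] == text[s + j]:   # compare right to left
            j -= 1
        if j < 0:
            hits.append(s)
            s += good[0]
        else:
            # Take the larger of the two heuristic shifts (bad char may be <= 0).
            bad = j - last.get(text[s + j], -1)
            s += max(good[j + 1], bad)
    return hits


print(boyer_moore("hello world, hello", "hello"))  # [0, 13]
```

Taking the maximum is safe because each heuristic on its own never skips a possible occurrence, and the good suffix shift is always at least 1, so the search always makes progress.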


3. NFA/DFA Implementation At Hardware Level

Sidhu and Prasanna [18] were the first to implement NFA matching on programmable logic, using O(n^2) logic while still providing O(1) access time. They implemented a One-Hot Encoding (OHE) scheme, in which one flip-flop is associated with each state and a 1 in a flip-flop marks its state as active. Combinational logic associated with each flip-flop transfers this 1 bit to the flip-flop of the next state. To fit the existing patterns into logic, a DFA is formed first and then an NFA, and each transition is mapped onto this flip-flop structure. ε-transitions in the NFAs are handled by feeding the same input to the next state as well, and LUTs are used to compare the input character; in this way the patterns are mapped onto the FPGA.

A. Hardware-based String Matching & Packet Inspection

Given the processing-bandwidth limits of general-purpose processors (GPP), which can serve only a few hundred Mbps of throughput, hardware-based NIDS (ASIC or FPGA) are an attractive alternative. Many ASIC intrusion detection systems store their rules in large memory blocks and examine incoming packets in integrated processing engines. ASIC programmable security co-processors are generally expensive and complicated, and although they support higher throughput than a GPP, their performance is not impressive. The memory blocks that store the NIDS rules are reloaded whenever an updated rule set is available. The most common pattern-matching technique in ASIC intrusion detection systems is the use of regular expressions. Updating the rule set is not a trivial procedure, since the system must support a variety of rules, sometimes with complex syntax and special features. FPGAs, on the other hand, are more suitable, because they are reconfigurable, provide hardware speed, and exploit parallelism.

B. FPGA-based String Matching

One of the first attempts at string matching on FPGAs was presented in 1993 by Pryor, Thistle and Shirazi. Their algorithm, implemented on the Splash-2 platform, performed a case-insensitive dictionary search over patterns consisting of English alphabet characters (26 characters). Pryor et al. achieved high performance and performed a low-overhead AND-reduction of the match indicators using hashing. Since 1993, many others have worked on implementing FPGA-based string-matching systems.
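The one-hot encoding idea of Sidhu and Prasanna can be imitated in software by keeping one bit per NFA state in an integer, which is essentially the bit-parallel Shift-And algorithm. The sketch below handles a single literal pattern rather than a full regular expression, and the function name is hypothetical. Bit i plays the role of the flip-flop for the state "first i+1 characters matched", and several bits may be set at once when partial matches overlap.

```python
def one_hot_nfa_scan(pattern: bytes, data: bytes):
    """Simulate a one-hot NFA for an unanchored literal pattern.

    Returns the start positions of all matches.
    """
    m = len(pattern)
    # Comparator outputs per input character: bit i set if pattern[i] == c
    # (in hardware these would be the LUT-based character comparators).
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)        # flip-flop of the final state
    state, hits = 0, []
    for pos, c in enumerate(data):
        # Every active flip-flop forwards its 1 to the next state; the start
        # state is always active (unanchored search), hence the "| 1".
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits


print(one_hot_nfa_scan(b"aba", b"ababa"))  # [0, 2]
```

In the example, after the third byte both the first and the third state bits are set, which is how the overlapping second occurrence is found without backtracking.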

Fig 2. Hardware NFA implementation of the following regular expression

4. Proposed Work

A. Memory-Based Bit-Split DFA

From the definition in [11], a DFA is an FSM in which there is one and only one transition to a next state for each pair of state and input symbol. A DFA can be represented as a five-tuple: a finite set of states Q, a finite set of input symbols Σ, a transition function δ: Q × Σ → Q, an initial state q0 ∈ Q, and a set of output states F ⊆ Q. The identification index of a target pattern is an individual keyword used to distinguish a match of that target pattern. The memory requirement of a DFA is proportional to |Q| × |Σ|, since the transition table holds one next-state entry for every pair of state and input symbol.

B. Pattern Identification

Fig 3. State diagram of an AC machine.

For each target pattern, a unique identification index should be provided in order to distinguish its match from the matches of other patterns. If multiple target patterns are mapped onto one DFA, a target pattern may be a subpattern of other target patterns. For example, assume that four target patterns {“abc,” “abcd,” “ac,” “bcd”} are mapped onto a DFA, where the target pattern lengths range from 2 to 4.


Fig 4. Merging similar states.

The fourth target pattern is a suffix of the second target pattern. If the second target pattern is matched, the fourth target pattern is always matched as well, but not vice versa.
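Pattern identification in an AC machine can be illustrated with the four example patterns. In the Python sketch below (a standard dictionary-based Aho-Corasick construction; the function names are hypothetical), each state's output set is extended with the output set of its failure state, which is why reaching the end of “abcd” also reports the suffix pattern “bcd”, while reaching the end of “bcd” alone does not report “abcd”.

```python
from collections import deque


def build_ac(patterns):
    """Aho-Corasick machine: goto trie, failure links, merged output sets."""
    goto, fail, out = [{}], [0], [set()]
    for pid, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pid)                      # identification index of the pattern
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]           # inherit matches of the suffix state
            queue.append(t)
    return goto, fail, out


def ac_scan(patterns, text):
    """Return (end_position, pattern) for every match in text."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for pos, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((pos, patterns[pid]) for pid in sorted(out[s]))
    return hits


patterns = ["abc", "abcd", "ac", "bcd"]
print(ac_scan(patterns, "abcd"))  # [(2, 'abc'), (3, 'abcd'), (3, 'bcd')]
print(ac_scan(patterns, "bcd"))   # [(2, 'bcd')]
```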

5. Basic Memory Architecture

In terms of reconfigurability and scalability, the memory architecture has attracted a lot of attention because it allows on-the-fly pattern updates in memory without resynthesis and re-layout. The basic memory architecture works as follows. First, the (attack) string patterns are compiled into a finite-state machine (FSM) whose output is asserted when any substring of the input string matches one of the string patterns.

Fig 5. DFA for matching “bcdf” and “pcdg”.

Then, the corresponding state transition table of the FSM is stored in memory. For instance, Fig. 5 shows the state transition graph of the FSM that matches the two string patterns “bcdf” and “pcdg”, where all transitions to state 0 are omitted. States 4 and 8 are the final states, indicating matches of “bcdf” and “pcdg”, respectively. Fig. 2 presents a simple memory architecture implementing the FSM. In this architecture, the memory address register consists of the current state and the input character; the decoder converts the memory address to the corresponding memory location, which stores the next state and the match vector information. A “0” in the match vector indicates that no “suspicious” pattern is matched; otherwise the value in the match vector indicates which pattern is matched.
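The memory organization described above can be sketched in Python: the DFA for “bcdf” and “pcdg” is compiled into a flat table addressed by the current state and the input byte, and each word holds the next state plus a match-vector bitmask (bit k is set when pattern k ends at that byte). The address layout `state * 256 + byte`, the tuple-per-word representation, and the function names are assumptions made for illustration.

```python
def build_memory(patterns):
    """Compile byte-string patterns into a flat (next_state, match_vector) memory."""
    # States are pattern prefixes, numbered in insertion order as in Fig 5:
    # 0 = "", 1..4 = b, bc, bcd, bcdf, 5..8 = p, pc, pcd, pcdg.
    states, index = [b""], {b"": 0}
    for p in patterns:
        for i in range(1, len(p) + 1):
            if p[:i] not in index:
                index[p[:i]] = len(states)
                states.append(p[:i])

    def match_vector(w):
        # Bit k is set when pattern k is a suffix of the text read so far.
        return sum(1 << k for k, p in enumerate(patterns) if w.endswith(p))

    memory = []
    for w in states:                      # address = state * 256 + byte
        for byte in range(256):
            u = w + bytes([byte])
            # Next state: the longest suffix of u that is still a pattern prefix.
            nxt = next(u[k:] for k in range(len(u) + 1) if u[k:] in index)
            memory.append((index[nxt], match_vector(nxt)))
    return memory, len(states)


def scan(memory, data):
    """Walk the memory one byte per step and report (position, match_vector)."""
    state, hits = 0, []
    for pos, byte in enumerate(data):
        state, mv = memory[state * 256 + byte]
        if mv:                            # non-zero vector: some pattern matched
            hits.append((pos, mv))
    return hits


memory, n_states = build_memory([b"bcdf", b"pcdg"])
print(n_states, len(memory))              # 9 2304
print(scan(memory, b"xxbcdfpcdg"))        # [(5, 1), (9, 2)]
```

Even for these two short patterns the table needs 9 × 256 words, because every state must store a next state for all 256 byte values; this |Q| × |Σ| growth is what motivates splitting the input into fewer bits per engine.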

Fig 5. Proposed architecture

Fig 6. Simulated output


Fig 7. States used by the proposed method
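The bit-split idea behind the proposed architecture (Section 4.A, following [11]) can be sketched in Python. The byte-level DFA is projected onto groups of 2 input bits; each small engine is built by subset construction over the original DFA states and carries a partial match vector, and a pattern is reported only when all engines agree. The 2-bit grouping, the prefix-based DFA construction, and all function names are assumptions for illustration, and the printed state and word counts describe only this toy pattern set, not the rule sets discussed in the paper.

```python
def build_dfa(patterns):
    """Byte-level DFA whose states are pattern prefixes (as in an AC machine)."""
    states, index = [b""], {b"": 0}
    for p in patterns:
        for i in range(1, len(p) + 1):
            if p[:i] not in index:
                index[p[:i]] = len(states)
                states.append(p[:i])
    delta = []
    for w in states:
        row = []
        for byte in range(256):
            u = w + bytes([byte])
            # Longest suffix of u that is still a pattern prefix.
            row.append(next(index[u[k:]] for k in range(len(u) + 1) if u[k:] in index))
        delta.append(row)
    # Output set: ids of patterns that end at this state.
    out = [{k for k, p in enumerate(patterns) if w.endswith(p)} for w in states]
    return delta, out


def bit_split(delta, out, width=2):
    """Build 8 // width small DFAs, each reading `width` bits of every byte."""
    mask = (1 << width) - 1
    engines = []
    for shift in range(0, 8, width):
        subsets, index = [frozenset([0])], {frozenset([0]): 0}
        table, pmv = [], []
        i = 0
        while i < len(subsets):               # subset construction
            cur = subsets[i]
            # Partial match vector: patterns any member state could report.
            pmv.append(set().union(*(out[s] for s in cur)))
            row = []
            for v in range(mask + 1):
                nxt = frozenset(delta[s][b] for s in cur for b in range(256)
                                if (b >> shift) & mask == v)
                if nxt not in index:
                    index[nxt] = len(subsets)
                    subsets.append(nxt)
                row.append(index[nxt])
            table.append(row)
            i += 1
        engines.append((shift, table, pmv))
    return engines


def bit_split_scan(engines, data, width=2):
    """Run all engines in lockstep; AND (intersect) their partial match vectors."""
    mask = (1 << width) - 1
    cur = [0] * len(engines)
    hits = []
    for pos, byte in enumerate(data):
        partial = []
        for k, (shift, table, pmv) in enumerate(engines):
            cur[k] = table[cur[k]][(byte >> shift) & mask]
            partial.append(pmv[cur[k]])
        hits.extend((pos, pid) for pid in sorted(set.intersection(*partial)))
    return hits


def dfa_scan(delta, out, data):
    """Reference: the original byte-level DFA."""
    s, hits = 0, []
    for pos, byte in enumerate(data):
        s = delta[s][byte]
        hits.extend((pos, pid) for pid in sorted(out[s]))
    return hits


patterns = [b"bcdf", b"pcdg"]
delta, out = build_dfa(patterns)
engines = bit_split(delta, out)
data = b"xxbcdfpcdgbcdg"
print(dfa_scan(delta, out, data))        # [(5, 0), (9, 1)]
print(bit_split_scan(engines, data))     # expected to equal the line above
print("byte-level DFA words:", len(delta) * 256)
print("bit-split words:", sum(len(t) * len(t[0]) for _, t, _ in engines))
```

Each engine's subset state contains every original state reachable by some input that agrees with the real input on that engine's bits, so a pattern survives the intersection only if the last bytes agree with it on all 8 bits, which keeps the split engines exact.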

6. Conclusion

The proposed DFA-based parallel string-matching scheme reduces the total memory requirement. The problem of varying pattern lengths can be mitigated by dividing long target patterns into subpatterns of a fixed length. The memory-efficient bit-split FSM architectures further reduce the total memory requirement. Considering the reduced memory requirements for real rule sets, we conclude that the proposed string-matching scheme is useful for reducing the total memory requirement of parallel string-matching engines.

References

[1] P.-C. Lin, Y.-D. Lin, T.-H. Lee, and Y.-C. Lai, “Using String Matching for Deep Packet Inspection,” IEEE Computer, vol. 41, no. 4, pp. 23-28, Apr. 2008.
[2] Snort, Ver. 2.8, Network Intrusion Detection System, http://, 2011.
[3] Clam AntiVirus, Ver. 0.95.3, 2011.
[4] C.-H. Lin, Y.-T. Tai, and S.-C. Chang, “Optimization of Pattern Matching Algorithm for Memory Based Architecture,” Proc. Third ACM/IEEE Symp. Architecture for Networking and Comm. Systems, pp. 11-16, 2007.
[5] Deterministic Finite-State Machine, wiki/Deterministic_finite_state_machine, 2011.
[6] H. Kim, H. Hong, H.-S. Kim, and S. Kang, “A Memory-Efficient Parallel String Matching for Intrusion Detection Systems,” IEEE Comm. Letters, vol. 13, no. 12, pp. 1004-1006, Dec. 2009.
[7] Virtex-4 FPGA User Guide, http://www.xilin m/support/documentation/user_guides/ug070.pdf, 2011.
[8] F. Yu, Z. Chen, Y. Diao, T.V. Lakshman, and R.H. Katz, “Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection,” Proc. Second ACM/IEEE Symp. Architecture for Networking and Comm. Systems, pp. 93-102, 2006.
[9] A.V. Aho and M.J. Corasick, “Efficient String Matching: An Aid to Bibliographic Search,” Comm. ACM, vol. 18, no. 6, pp. 333-340, 1975.
[10] L. Tan and T. Sherwood, “A High Throughput String Matching Architecture for Intrusion Detection and Prevention,” Proc. 32nd IEEE/ACM Int'l Symp. Computer Architecture, pp. 112-122, 2005.
[11] L. Tan, B. Brotherton, and T. Sherwood, “Bit-Split String-Matching Engines for Intrusion Detection and Prevention,” ACM Trans. Architecture and Code Optimization, vol. 3, no. 1, pp. 3-34, Mar. 2006.
