SYN-dog: Sniffing SYN Flooding Sources Λ - PDF

Description
SYN-dog: Sniffing SYN Flooding Sources Λ Haining Wang Danlu Zhang Kang G. Shin Real-Time Computing Laboratory Department of Electrical Engineering and Computer Science The University of Michigan Ann Arbor,

Please download to get full document.

View again

of 8
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Information
Category:

Automobiles

Publish on:

Views: 14 | Pages: 8

Extension: PDF | Download: 0

Share
Transcript
SYN-dog: Sniffing SYN Flooding Sources Λ Haining Wang Danlu Zhang Kang G. Shin Real-Time Computing Laboratory Department of Electrical Engineering and Computer Science The University of Michigan Ann Arbor, MI fhxw, danlu, ABSTRACT This paper presents a simple and robust mechanism called SYN-dog to sniff SYN flooding sources. We install SYN-dog as a software agent at leaf routers that connect stub networks to the Internet. The statelessness and low computation overhead of SYN-dog make itself immune to any flooding attacks. The core mechanism of SYN-dog is based on the protocol behavior of TCP SYN SYN/ACK pairs, and is an instance of the Sequential Change Detection []. To make SYN-dog insensitive to site and access pattern, a non-parametric Cumulative Sum (CUSUM) method [4] is applied, thus making SYNdog much more generally applicable and its deployment much easier. Due to its proximity to the flooding sources, SYN-dog can trace the flooding sources without resorting to expensive IP traceback.. INTRODUCTION The growing DDoS attacks have imposed a significant threat on the availability of network services [2]. Due to the readily available tools and its simple nature, flooding packets is the most common and effective DoS attack. More than 9% of the DoS attacks use TCP [8], and TCP SYN flooding dominates in the available attacking tools and the number of known DoS attacks [6]. The SYN flooding consists of a stream of spoofed SYN packets directed to a listening TCP port of the victim, which exploits the TCP three-way handshake mechanism and its limitation in maintaining half-open connections. Under the normal condition, when a server receives a SYN request, it sends a SYN/ACK packet back to the client and waits for client s acknowledgment. Before the SYN/ACK packet is acknowledged by the client, the connection remains in half-open state for a period of up to the TCP connection timeout. The half-open connection is not closed until the failure of two retransmissions, which typically lasts for 75 seconds. The server has built in its system memory a backlog queue to maintain all half-open connections. However, if a SYN request is spoofed, the victim server will never receive the final ACK packet from the client to complete the threeway handshake. Since this backlog queue is of finite size, the flooding of spoofed SYN requests can easily exhaust the victim server s backlog queue, causing all of new incoming SYN requests to be dropped. The stateless and destination-based nature of Internet routing infrastructure cannot differentiate a legitimate SYN from a spoofed one, and TCP does not offer strong authentication on its SYN packets. Therefore, under SYN flooding attacks, the victim server cannot respond only to legitimate connection requests while ignoring the Λ Haining Wang and Kang G. Shin were supported in part by Samsung Electronics, Inc. and by the Office of Naval Research under Grant No. N spoofed. Note that the spoofed source address must be an invalid IP address so that it can t be reachable from the victim; otherwise, any endhost that receives the SYN/ACKs from the victim would send a RST to the victim. A RST packet is issued when the receiving host does not know what to do with the received packet. The arrival of RST causes the connection to be reset, foiling the flooding attack. Most of previous work in countering SYN flooding attacks focused on mitigating the flooding effect on the victim, such as Syn cookies [3], SynDefender [6], Syn proxying [9] and Synkill [24]. All of these defense mechanisms are stateful, i.e., states are maintained for each TCP connection or state computation is required like Syn cookies, which makes the defense mechanism itself vulnerable to SYN flooding attacks. Moreover, the defense mechanisms installed at the firewall of the victim server or inside the victim server can not give any hint about the SYN flooding sources, and hence, must rely on the expensive IP traceback [2, 2, 23, 26, 27, 32] to trace the flooding sources. In this paper, we propose a simple and robust mechanism called SYN-dog to sniff SYN flooding sources without resorting to expensive IP traceback. The statelessness and low computation overhead of SYN-dog make itself immune to any flooding attacks. Instead of monitoring the ongoing traffic at the front end or the victim server itself, we install SYN-dog as a software agent at leaf routers that connect end hosts to the Internet. The key feature of SYN-dog is to utilize the inherent TCP SYN SYN/ACK pair s behavior for sniffing SYN flooding sources. (active open) SYN_SENT ESTABLISHED (active close) FIN_WAIT FIN_WAIT TIME_WAIT Client SYN J SYN K ACK J+ ACK K+ FIN M ACK M+ FIN N ACK N+ Server LISTEN (passive open) SYN_RCVD ESTABLISHED CLOSE_WAIT (passive close) LAST_ACK CLOSED Figure : TCP states corresponding to normal connection establishment and teardown (from [29]) The SYN and SYN/ACK packets signal the start of a TCP connec- Proceedings of the 22 nd International Conference on Distributed Computing Systems (ICDCS 2) /2 $7. 22 IEEE tion establishment in each direction. As shown in Figure that is borrowed from [29], in the ideal case, one appearance of a SYN packet results in the corresponding transmission of a SYN/ACK packet in the reverse direction within one round-trip time (RTT). Although there is no strict one-to-one match between SYN and SYN/ACK packets due to SYN losses and subsequent retransmissions, under the normal condition, a very strong positive correlation between SYN and SYN/ACK does exist as shown in Section 4.. The discrepancy between the number of SYNs and SYN/ACKs mostly happens for the following two reasons. ffl Some TCP servers are overloaded, and drop the SYN requests without generating SYN/ACK responses. ffl The forwarding path of SYNs is congested, and as a result, some SYNs are dropped before they reach their destinations, so no corresponding SYN/ACKs are generated. Due to its proximity to the flooding sources, SYN-dog can trace the flooding sources without resorting to expensive IP traceback. The flooding sources must be inside the subnet to which the leaf router is connected. Neither state nor state computation is involved in our SYN-dog. Only two new variables are introduced to measure the number of received SYN and SYN/ACK packets at the inbound and outbound interfaces, respectively. We refer to the traffic flowing from the Internet to the Intranet as inbound, and the traffic in the other direction as outbound. Based on this SYN SYN/ACK pair s behavior, the dynamics of the difference between the number of SYN and SYN/ACK packets can be viewed as a stationary, ergodic random process, and SYN-dog is an instance of the Sequential Change Detection []. To make SYN-dog independent of sites and access patterns, the difference between the number of SYNs and SYN/ACKs is normalized by an estimated average number of SYN/ACKs. The non-parametric Cumulative Sum (CUSUM) method [4] is applied, making SYN-dog much more generally applicable and its deployment much easier. The efficacy of SYN-dog is validated by trace-driven simulations. The evaluation results show that SYN-dog has short detection time and high detection accuracy. Due to its proximity to the flooding sources, SYN-dog mechanism not only alarms on the ongoing SYN flooding attacks but also reveals the location of the flooding sources. It is also incrementally deployable and works without requiring a wide installation of SYN-dogs. The remainder of this paper is organized as follows. Section 2 discusses the issues related to SYN-dog. Section 3 describes the proposed detection algorithm based on the TCP SYN SYN/ACK pair s behavior. Section 4 validates and evaluates the performance of SYN-dog using trace-driven simulations. Finally, conclusions are drawn in Section ISSUES RELATED TO SYN-DOG Two issues closely related to the SYN-dog mechanism are discussed in this Section. One is the structure of SYN-dog, and the other is packet classification. The SYN-dog consists of two Sniffers, which are installed at the inbound and outbound interfaces of a leaf router, respectively. The one installed at the outbound interface is the outbound Sniffer that counts the outgoing SYNs, while the one installed at the inbound interface is the inbound Sniffer that counts the incoming SYN/ACK. Figure 2 illustrates the structure of SYN-dog at a leaf router. The two sniffers coordinate with each other via shared memory, or IPC inside the router, and periodically exchange the counting information. Intranet Outbound Inferface Out-bound Sniffer In-bound Sniffer Inbound Interface Internet Figure 2: The structure of SYN-dog at a leaf router SYN-dog is, in some sense, a by-product of the router infrastructure that differentiates TCP control packets from data packets [3]. This packet classification was originally motivated by the desire of providing fine-grained service differentiation to IP flows. Largescale packet classification mechanisms [4, 5, 28] have been proposed, making it possible to distinguish the TCP SYN and SYN/ACK packets at leaf routers at a very high speed. To identify TCP SYNs and SYN/ACKs, the TCP header needs to be accessed. This identification is performed at leaf routers, which are usually the trusted entities for the clients in the same intranet. A multi-layer IPSec protocol [33] has been proposed, which allows trusted routers to access the transport-layer information. Therefore, the network-level security of IPSec can not be an obstacle to the identification and counting of TCP SYNs and SYN/ACKs at leaf routers. Briefly, packets are classified as follows. First, we check if the IP packet contains a TCP header. The IP packet that contains the TCP header must have zero fragmentation offset. Then we compute the offset of TCP flag bits in the IP packet. Finally, the six TCP flag bits are read to determine the type of the TCP segment. The detailed description of the packet-classification algorithm is given in [3]. 3. STATISTICAL FLOODING DETECTION The rationale behind SYN-dog is that a flooding attacker s behavior is noticeably different from that of a legitimate user. It compares the observed sequence with the profile representing the user s normal behavior, and detects any significant discrepancy. Moreover, as will be seen in Section 3.2, SYN-dog can detect flooding attacks even when the normal connection arrivals are bursty and time-varying. 3. Sniffing Mechanism The total number of outgoing SYNs and incoming SYN/ACKs are recorded during every observation period, t, at leaf routers. The setting of the observation period t must balance the sniffing resolution and the algorithm s stability; t is set to 2 seconds in our CUSUM algorithm. Note, however, that our algorithm is insensitive to this choice. At the end of each observation period, the number of outgoing SYNs counted by the outbound Sniffer and the number of incoming SYN/ACKs counted by the inbound Sniffer are reported to the SYN-dog s CUSUM algorithm. Under the normal condition, the difference between the above collected numbers of outgoing SYNs and incoming SYN/ACKs is bounded compared to the total number of active TCP connections. This observation holds in spite of the fact that the total number of active TCP connections may be bursty on a small time scale, and slowly-varying on a large time scale. According to the specification of TCP/IP protocol [22, 29], an outgoing SYN is paired with an incoming SYN/ACK within one RTT. In other words, the strong correlation between the number of SYNs and SYN/ACKs is not sensitive to the request ar- Proceedings of the 22 nd International Conference on Distributed Computing Systems (ICDCS 2) /2 $7. 22 IEEE rival process. Figures in Section 4. clearly show that the consistent synchronization between the SYNs and SYN/ACKs is independent of the sample time, sites and time-of-day. However, under SYN flooding attacks, there will be much more outgoing SYNs than incoming SYN/ACKs collected by the sniffers of SYN-dog. The SYN flooding traffic has significant regularity and semantics that can be filtered out. Recent experiments with SYN attacks on commercial platforms have shown that the minimum flooding rate to overwhelm an unprotected server is 5 SYN packets per second. With a specialized firewall designed to resist against SYN floods, a server can be disabled by a flood of 4, SYNs per second [8]. To shut down the victim server for minutes, for example, the group of attackers need to inject at least a total of 3, SYN packets. During the same time period, however, the number of SYN/ACKs counted by the inbound Sniffer remains largely unchanged. Therefore, the difference between the number of outgoing SYNs and incoming SYN/ACKs will dramatically increase, and remain large during the whole flooding period that typically lasts for several minutes [8]. Therefore, the occurrence of a large difference between the number of SYNs and SYN/ACKs implies the existence of flooding sources in its stub network. 3.2 The CUSUM Algorithm Let f n;n =; ; gbe the number of outgoing SYNs minus that of incoming SYN/ACKs collected from the above sampling. To alleviate its dependence on the time, access pattern and size of the network, f ng is normalized by the average number K μ of incoming SYN/ACKs during the sampling period t. K μ can be estimated in real time and updated periodically. An example of recursive estimation and update of K μ is: μk(n) =ff μ K(n ) + ( ff)syn/ack(n); () where n is the discrete time index and ff is a constant lying strictly between and that represents the memory in the estimation. Define X n = n= μ K. The mean of X n, denoted as c,ismuchless than. fx ng is not dependent on the network size or time-of-day. Its dynamics are solely the consequence of the TCP protocol specification. So, we can consider fx ng as a stationary random process. Our flooding source sniffing algorithm is based on the Sequential Change Detection []. The objective of Change Detection is to determine if the observed time series is statistically homogeneous. It has been studied extensively by statisticians. See [] and [4] for a good survey. The existing algorithms can be largely divided into two categories: posterior and sequential. Posterior tests are done off-line where the whole data segment is collected first and then a decision about homogeneity is made based on the analysis of all the collected data. On the other hand, sequential tests are done on-line with the data presented sequentially and the decisions are made on the fly. We adopt a sequential test for a quicker response when a flooding attack occurs. It also saves memory and computation. Despite of the numerous works on the modeling of the arrival process of TCP connection requests [5, 7,, 3, 2, 25], there is no consensus on whether it should be modeled as self-similar or Poisson. For such a dynamic and complicated entity like the Internet, it may not be possible to model the total number of TCP connections at all times by a simple parametric model. Therefore, we seek robust tests which are not model-specific. Non-parametric methods fit this requirement very well. In particular, we apply the non-parametric CUSUM (Cumulative Sum) method [4] toour attack sniffing. This method enjoys all the virtues of sequential and non-parametric test, and the computation load is very light. When the time series is i.i.d. with a parametric model, CUSUM is asymptotically optimal for a wide range of Change Detection problems [, 4]. fx ng is assumed to satisfy some regularity conditions. The details can be found in [4]. In practice, they are very mild and easily satisfied even by long range dependent arrival processes. In general, E(X n) = c . We choose a parameter a cand define ~X n = X n a so that it has a negative mean during normal operation. When a flooding attack takes place, ~ X n will quickly become a large positive. Suppose, during an attack, the increase in the mean of ~ X n can be lower-bounded by h. Our change detection is based on the observation of h fl c. Define S k = P k i= ~ X i, with S =at the beginning. Let y n = (y n + ~ X n) + ; (2) y = ; where x + is equal to x if x and otherwise. It can be shown that y n = S n min»k»n S k; (3) i.e., y n is the maximum continuous increment until time n. Alarge fy ng is a strong indication of a flooding attack. Since Eq. (2) is iterative and much easier to compute than Eq. (3), we will use it in making detection decisions. Let d N (:) be the decision at time n: for normal operation (homogeneity) and for attack (a change occurs). Here N represents the flooding threshold: d N (y n)=ρ if yn» N; if y n N: In other words, d N (y n)=i(y n N), wherei(:) is the indicator function. The effect of introducing a is to offset the possible positive mean in fx ng so that the test statistic y n will be reset to zero frequently and will not accumulate with time. In this algorithm, there are two parameters involved: a, the upper bound in case of normal operation and N, the flooding threshold. These values affect the performance. Let P m(e m) be the probability measure (expectations) of f ~ X ng with the attack occurring at time m and P(E) be the counterparts of f ~ X ng without any attack. There are two fundamental performance measures for the sequential change detection. False alarm time (the time without false alarm): the time duration with no false alarm reported when there is no attack. Detection time: the detection delay after the attack started. One would want the second measure to be as short as possible while keeping the first measure as long as possible. However, they are conflicting goals and cannot be simultaneously met. Therefore, the design philosophy of a statistical change detection is to minimize the detection time subject to a certain false alarm tolerance, like average time between two consecutive false alarms, worst-case false alarm time, and so on. The CUSUM rule has been shown to be asymptotically optimal with respect to the worst-case mean false alarm time when the parametric model is known for the data and the observations are independent. Due to the lack of a complete model for f ~ X ng, itisdifficult to discuss optimality. The choice of CUSUM is based on its simplicity in computation and non-parametric implementation, as well as its (4) Proceedings of the 22 nd International Conference on Distributed Computing Systems (ICDCS 2) /2 $7. 22 IEEE generally excellent performance. It has been shown in [4] that, with the choice of a and N, asn!,wehave Pfd N (n) =g = c exp ( c 2N): (5) In other words, the time between consecutive false alarms grows exponentially with N. c and c 2 are constants, depending on the marginal distribution and mixing coefficients of f ~ X ng. The burstiness of the traffic is reflected by the mixing coefficients ψ(s), and thus, does impact the detection performance. However, the constants c and c 2 only play a secondary role and can be ignored in practice. In order to study the detection time, let s define fi N = fi (d N ) = inffn : d N (:) =g; (6) (fin m)+ ρ N = ; N where ρ N represents the normalized detection time after a change occurs and m represents the starting time of the attack. In CUSUM, for any m, ifh and a represent accurate values instead of bounds, we have ρ N! fl = h jc aj ; (7) where h jc aj is the mean of f X ~ ng when n m(after a flooding attack starts). The above is a conservative estimation (upper bound) of the actual detection time when a and h are bounds rather than the true values. To ensure a long false alarm time, we set h =2ainour design. Since a potential attacker may initiate the attack from many sites simultaneously, only part of the flooding SYN packets can be seen by each SYN-dog. To balance the detection sensitivity and false alarm time, we set a = :35 and h = :7. Note that the choices of a and h are independent of the network size and access pattern. In doing so, a universal false
Related Search
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks