
Monday, August 17, 2015

Towards Situational Awareness of Large-Scale Botnet Probing Events

Abstract:

                   Botnets dominate today’s attack landscape. In this work, we investigate ways to analyze collections of malicious probing traffic in order to understand the significance of large-scale “botnet probes.” In such events, an entire collection of remote hosts together probes the address space monitored by a sensor in some sort of coordinated fashion. Our goal is to develop methodologies by which sites receiving such probes can infer—using purely local observation—information about the probing activity: What scanning strategies does the probing employ?
Is this an attack that specifically targets the site, or is the site only incidentally probed as part of a larger, indiscriminate attack? Our analysis draws upon extensive honeynet data to explore the prevalence of different types of scanning, including properties such as trend, uniformity, coordination, and darknet avoidance. In addition, we design schemes to extrapolate the global properties of
scanning events (e.g., total population and target scope) as inferred from the limited local view of a honeynet. Cross-validating with data from DShield shows that our inferences exhibit promising accuracy.

Introduction:
                When a site receives probes from the Internet—whether basic attempts to connect to its services, or apparent attacks directed at those services, or simply peculiar spikes in seemingly benign activity—often what the site’s security staff most wants to know is not “are we being attacked?” (since the answer to that is almost always “yes, all the time”) but rather “what is the significance of this activity?” Is the site being deliberately targeted? Or is the site simply receiving one small part of much broader probing activity? For example, suppose a site with a /16 network receives malicious probes from a botnet. If the site can determine that the botnet probed only their /16, then they can conclude that the attacker may well have a special interest in their enterprise. On the other hand, if the botnet probed a much larger range, e.g., a /8, then very likely the attacker is not specifically targeting the enterprise.
                     The answers to these questions greatly influence the resources the site will choose to employ in responding to the activity. Obviously, the site will often care more about the probing if the attacker has specifically targeted it, since such interest may reflect a worrisome level of determination on the part of the attacker. Indeed, such targeted attacks have recently grown in prominence. For example, in an attack targeting the New York Times, an attacker penetrated the site through scanning and then stole more than 3000 social security numbers. Yet given the incessant level of probing all Internet addresses receive, how can a site assess the risk a given event reflects?
                    In this work, we seek to contribute to the types of analysis that sites can apply to gauge such risks. We orient much of our methodology around the assumption that most probing events reflect activity from botnets (i.e., coordinated bots), which dominate today’s Internet attack landscape. Our approach aims to analyze fairly large-scale activity that involves multiple local addresses. As such, our techniques are suitable for sites that deploy darknets (unused subnets), honeynets (subnets in which some addresses are populated by some form of honeypot responder), or in general any monitored networks that receive unexpected traffic, within which botnet probing events can be detected. The main contribution of this paper is a set of techniques for analyzing botnet events, most of which do not require the use of responders. For simplicity, we refer to the collection of sensors as the site’s Sensors. In contrast to previous work on botnets, which has focused on host-level observations of single instances of botnet activity, studies of particular captured botnet binaries, or network-level analysis of command-and-control (C&C) activity, our techniques aim to characterize facets of large-scale botnet probing events regardless of the nature of the botnet. Our analysis does not require assumptions about the internal organization and communication mechanisms of the botnets; we focus on inferring and characterizing botnets through their probing behavior. In addition, our approach has the significant benefit of requiring only local information, although similar inferences might also be achievable through a collaborative effort such as DShield, subject to certain limitations.
                      We frame the contributions of our work as follows. First, we develop a set of statistical approaches to assess the attributes of large-scale probing events seen by the Sensors, including checks for trend, uniformity, coordination, and hit-lists (liveness). Here we mainly focus on detecting a special kind of hit-list, liveness-aware scanning, in which the attackers try to avoid darknets. For trend and uniformity checking, the statistical literature provides apt techniques, but for assessing coordination and the use of hit-lists (liveness) we needed to develop new techniques. We confirmed that the statistical techniques for inferring event properties are consistent with manual inspection and visualization. Applying such statistical testing to massive honeynet traffic reveals some interesting and sophisticated botnet scan behaviors, such as hit-list scans. We then used our suite of tests to frame the scanning strategies employed during different probe events, from which we can further extrapolate the global properties for particular strategies.
                       Second, we devise two algorithms to extrapolate the global properties of a scanning event from a sensor’s limited local view. These algorithms rest on different underlying assumptions and exhibit different accuracies, but both enable us to infer the global scanning scope of a probing event, the total number of bots including those unseen by the Sensors, and the average scanning speed per bot. The global scanning scope enables the site’s operators to assess whether their network is a specific target of botnet activity, or whether the botnet’s scanning targets a large network scope that simply happens to include the site. Estimates of total botnet size can help track trends in how botnets are used, with implications for their command-and-control (C&C) capabilities.
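
To make the extrapolation idea concrete, the following Python sketch shows one way such an inference can work under strong simplifying assumptions: bots scan the global scope uniformly at random and send roughly equal numbers of probes, and an estimate of the global scope is already available. It is an illustrative calculation, not the paper's exact estimator; the function name and example numbers are hypothetical.

def extrapolate_global(observed_bots, local_probes, sensor_size, global_scope,
                       event_duration_s):
    """Estimate total bots, per-bot probe count, and average scan speed.

    Assumes uniform random scanning over global_scope and roughly equal
    probe counts per bot (illustrative simplifications, not the paper's model).
    """
    coverage = sensor_size / global_scope                 # fraction of the scope we observe
    probes_per_bot_local = local_probes / observed_bots
    probes_per_bot_global = probes_per_bot_local / coverage
    # Probability that a bot sending that many uniform probes hits the sensor at all.
    p_seen = 1.0 - (1.0 - coverage) ** probes_per_bot_global
    total_bots = observed_bots / p_seen                   # correct for bots we never saw
    avg_speed = probes_per_bot_global / event_duration_s  # probes per second per bot
    return total_bots, probes_per_bot_global, avg_speed

# Hypothetical example: a 2560-address honeynet, an assumed /8 global scope,
# 500 bots seen sending 1500 local probes over one hour.
print(extrapolate_global(500, 1500, 2560, 2**24, 3600))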

Algorithms Used:
          Monotonic Trend Checking
          Hit List Checking
          Uniformity Checking
          Dependency Checking
Description:
Monotonic Trend Checking:
                   Monotonically scanning destination IP addresses (e.g., sequentially, one after another) is a strategy widely used by network scanning tools. In our evaluation, we did find a few events that use monotonic scanning. Furthermore, for random events, the monotonic trend check can help filter out noise caused by non-bot scanners.
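
As a concrete illustration, the sketch below checks for a monotonic trend by computing Kendall's tau between arrival order and destination offset. This is a standard rank-based trend test, not necessarily the paper's exact statistic; the function name and example data are hypothetical.

import numpy as np
from scipy.stats import kendalltau

def has_monotonic_trend(dest_offsets, alpha=0.05):
    """Return True if destinations trend up or down as the event progresses."""
    order = np.arange(len(dest_offsets))
    tau, p_value = kendalltau(order, dest_offsets)
    return p_value < alpha                 # reject "no trend" => monotonic behavior

# Hypothetical example: a sequential scan with jitter shows a strong trend.
rng = np.random.default_rng(1)
seq = np.arange(0, 2560, 4) + rng.integers(-3, 4, size=640)
print(has_monotonic_trend(seq))                           # True
print(has_monotonic_trend(rng.integers(0, 2560, 640)))    # typically False for a random scan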




Hit List Checking:
                 By hit-list (liveness) scanning, we refer to an event in which the attacker appears to have previously acquired a specific list of targets. Hit-lists are often employed by sophisticated botmasters to achieve high scan efficiency. It is important for network administrators to know whether they are on a hit-list, since that indicates whether they are likely to be scanned again and again. We detect the use of a hit-list based on the observation that such scans should heavily favor “live” addresses (those that respond) over “dark” (nonresponsive) addresses.
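
The sketch below illustrates one simple way to test for such a liveness bias: compare the fraction of probes that land on live addresses against the live fraction of the sensor using a one-sided binomial test. The paper's exact procedure may differ; the function name and numbers are hypothetical.

from scipy.stats import binomtest

def favors_live_addresses(hits_on_live, total_hits, n_live, n_total, alpha=0.05):
    """Return True if probes hit live addresses significantly more often than chance."""
    live_fraction = n_live / n_total          # expected hit rate if no hit-list is used
    result = binomtest(hits_on_live, total_hits, live_fraction, alternative='greater')
    return result.pvalue < alpha

# Hypothetical example: 600 of 700 probes land on the 400 live addresses
# of a 2560-address sensor, a strong liveness bias.
print(favors_live_addresses(600, 700, 400, 2560))   # True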

Uniformity Checking:
               A natural technique for bots is to employ uniform random scanning across the target range. Testing whether the scans are evenly distributed across the honeynet sensor can be framed as a distribution-checking problem. We employ a simple test that is well-suited to the discrete nature of address blocks. When choosing the number of bins for the test, a key requirement is that the expected count for any bin should exceed 5. Accordingly, given that our events contain at least several hundred scans, we divide the 2560 addresses in our honeynet into 40 bins of 64 addresses each.
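
The binning rule above matches the standard chi-square goodness-of-fit setup, so the sketch below uses that test. The bin layout follows the text; the function name and example data are hypothetical.

import numpy as np
from scipy.stats import chisquare

def is_uniform(dest_offsets, n_addresses=2560, n_bins=40, alpha=0.05):
    """Return True if scan destinations look uniformly distributed over the sensor."""
    if len(dest_offsets) / n_bins <= 5:
        raise ValueError("too few scans for a reliable test")   # expected count per bin must exceed 5
    counts, _ = np.histogram(dest_offsets, bins=n_bins, range=(0, n_addresses))
    _, p_value = chisquare(counts)     # null hypothesis: equal expected counts per bin
    return p_value > alpha             # fail to reject => consistent with uniform scanning

# Hypothetical example: 1000 uniformly random scan destinations pass the test.
rng = np.random.default_rng(0)
print(is_uniform(rng.integers(0, 2560, size=1000)))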

Dependency Checking:
             Sophisticated scanning strategies can introduce correlations between the sources in order to divide the work each contributes more efficiently. Since traditional approaches only handle linear dependence or two-variable cases, we develop a new hypothesis-testing approach to detect such coordination. The null hypothesis is that the senders act in a uniform, independent fashion (where we first test for uniformity as discussed above); the alternative hypothesis is that the senders do not act independently.
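
One observable symptom of such coordination is that sources overlap less, and therefore jointly cover more distinct addresses, than independent uniform scanners would. The Monte Carlo sketch below tests for that symptom; it is an illustrative stand-in for the paper's hypothesis test, not its exact formulation, and the function name and example are hypothetical.

import numpy as np

def looks_coordinated(per_source_targets, n_addresses=2560, trials=2000, alpha=0.05):
    """per_source_targets: one array of destination offsets per source."""
    observed_distinct = len(set().union(*[set(t) for t in per_source_targets]))
    counts = [len(t) for t in per_source_targets]
    rng = np.random.default_rng(0)
    null_distinct = np.empty(trials)
    for i in range(trials):
        # Null hypothesis: each source picks its targets uniformly and independently.
        sim = [rng.integers(0, n_addresses, size=c) for c in counts]
        null_distinct[i] = len(set().union(*[set(s) for s in sim]))
    # One-sided p-value: how often independent scanning covers at least as many addresses.
    p_value = np.mean(null_distinct >= observed_distinct)
    return p_value < alpha      # reject independence => evidence of coordination

# Hypothetical example: 10 sources that partition the space with no overlap look coordinated.
parts = np.array_split(np.arange(2560), 10)
print(looks_coordinated([p[:200] for p in parts]))   # True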


