Towards Situational Awareness of Large-Scale Botnet Probing Events
Abstract:
Botnets dominate today’s attack landscape. In this work, we investigate
ways to analyze collections of malicious probing traffic in order to understand
the significance of large-scale “botnet probes.” In such events, an entire
collection of remote hosts together probes the address space monitored by a
sensor in some sort of coordinated fashion. Our goal is to develop
methodologies by which sites receiving such probes can infer—using purely local
observation—information about the probing activity: What scanning
strategies does the probing employ?
Is this an attack that specifically targets the site,
or is the site only incidentally probed as part of a larger, indiscriminate
attack? Our analysis draws upon extensive honeynet data to explore the
prevalence of different types of scanning, including properties such as trend,
uniformity, coordination, and darknet avoidance. In addition, we design schemes
to extrapolate the global properties of
scanning events (e.g., total population and target
scope) as inferred from the limited local view of a honeynet. Cross-validating
with data from DShield shows that our inferences exhibit promising
accuracy.
Introduction:
When a site receives probes from the
Internet—whether basic attempts to connect to its services, or apparent attacks
directed at those services, or simply peculiar spikes in seemingly benign
activity—often what the site’s security staff most wants to know is not “are we
being attacked?” (since the answer to that is almost always “yes, all the
time”) but rather “what is the significance of this activity?” Is the
site being deliberately targeted? Or is the site simply receiving one small
part of much broader probing activity? For example, suppose a site with a /16
network receives malicious probes from a botnet. If the site can determine that
the botnet probed only their
/16, then they can conclude that the attacker may well have a special interest
in their enterprise. On the other hand, if the botnet probed a much larger range, e.g., a /8, then
very likely the attacker is not specifically targeting the enterprise.
The answers to these questions greatly influence the resources the site
will choose to employ in responding to the activity. Obviously, the site will
often care more about the probing if the attacker has specifically targeted the
site, since such interest may reflect a worrisome level of determination on the
part of the attacker. Indeed, such targeted attacks have recently grown in
prominence. For example, an attacker targeting the New York Times penetrated
the site through scanning and then stole more than 3,000 social security numbers.
Yet given the incessant level of probing all Internet addresses receive, how
can a site assess the risk a given event reflects?
In this work, we seek to contribute to the types of analysis that sites
can apply to gauge such risks. We orient much of our methodology around the
assumption that most probing events reflect activity from botnets (i.e.,
coordinated bots) that dominate today’s Internet attack landscape. Our approach
aims to analyze fairly large-scale activity that involves multiple local
addresses. As such, our techniques are suitable for use by sites that deploy darknets
(unused subnets), honeynets (subnets for which some addresses are
populated by some form of honeypot responder), or in general any monitored
networks that observe unexpected access, within which botnet probing
events can be detected. The main contribution of this paper is the development of a set of
events. The main contribution of this paper is the development of a set of
techniques for analyzing botnet events, most of which do not require the use of
responders. For simplicity, we will refer to the collection of sensors as the
site’s Sensors. In contrast to previous work on botnets, which has focused on
either host-level observations of single instances of botnet activity,
studies of particular captured botnet binaries, or network-level analysis of
command-and-control (C&C) activity, our techniques aim to characterize
facets of large-scale botnet probing events regardless of the nature of the
botnet. Our analysis does not require assumptions about the internal
organization and communication mechanisms employed by the botnets. We focus on
inferring and characterizing botnets through their probing behavior. In
addition, our approach has the significant benefit of requiring only local information,
although such inferences might also be achievable through a
collaborative effort such as DShield, subject to certain limitations.
We frame the contributions of our work as follows. First, we develop a
set of statistical approaches to assess the attributes of large-scale probing
events seen in Sensors, including checking for trends, uniformity,
coordination, and hit-lists (liveness). Here we mainly focus on checking a
special kind of hit-list, liveness-aware scanning, in which the attackers try
to avoid the darknets. For trend and uniformity checking, the statistical
literature provides apt techniques, but for assessing coordination and use of
hit-lists (liveness), we needed to develop new techniques. We confirmed via
manual inspection and visualization that the statistical techniques infer
event properties consistently. Applying such statistical testing on
massive honeynet traffic reveals some interesting and sophisticated botnet scan
behaviors such as hit-list scans. We then used our suite of tests to frame the
scanning strategies employed during different probe events, from which we can
further extrapolate the global properties for particular strategies.
Second, we devise two algorithms to extrapolate the global properties of
a scanning event based on a sensor’s limited local view. These algorithms are
based on different underlying assumptions and exhibit different accuracies. But
both enable us to infer the global scanning scope of a probing event, as well
as the total number of bots including those unseen by the Sensors, and the average
scanning speed per bot (Section V). The global scanning scope enables the
site’s operators to assess whether their network is a specific target of botnet
activity, or whether the botnet’s scanning targets a large network scope that
simply happens to include the site. Estimates of total botnet size can help
us track trends in how botnets are used, with implications for their command
and control (C&C) capabilities.
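To make the flavor of such local-to-global extrapolation concrete, here is a minimal sketch in Python. It is not one of the paper's two algorithms: it assumes bots scan the monitored space uniformly and independently, treats the two halves of a sensor as independent "captures" in the classic Lincoln-Petersen estimator, and uses purely hypothetical source sets.

```python
# Sketch: capture-recapture style estimate of total botnet size from a
# purely local view. NOT the paper's algorithm; it only illustrates the
# local-to-global idea under a uniform, independent scanning assumption.

def lincoln_petersen(sources_half_a, sources_half_b):
    """Treat the two halves of the sensor as two independent 'captures'
    of the same bot population."""
    n1, n2 = len(sources_half_a), len(sources_half_b)
    n12 = len(sources_half_a & sources_half_b)  # bots seen in both halves
    if n12 == 0:
        raise ValueError("no overlap; sensor too small for this estimate")
    return n1 * n2 / n12                         # estimated total bots

# Example with hypothetical source sets (overlap of 50 sources):
half_a = {f"10.0.0.{i}" for i in range(0, 150)}
half_b = {f"10.0.0.{i}" for i in range(100, 220)}
print(f"~{lincoln_petersen(half_a, half_b):.0f} bots total")   # ~360
```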
Algorithms Used:
Monotonic Trend Checking
Hit-List Checking
Uniformity Checking
Dependency Checking
Description:
Monotonic Trend Checking:
Monotonically scanning the destination IP addresses (e.g.,
sequentially one after another) is a scan strategy widely used by network
scanning tools. In our evaluation, we did find a few events that use
monotonic scanning. Furthermore, for random events, monotonic trend
checking can help filter out noise caused by non-bot scanners.
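The text above does not fix a particular statistic; one standard nonparametric choice for this check is the Mann-Kendall trend test, sketched below on destination addresses taken in arrival order. The data and the 1.96 threshold are illustrative, and tie corrections are omitted.

```python
import math
import random

def mann_kendall_z(values):
    """Mann-Kendall z-statistic for a monotonic trend in `values`
    (here: destination addresses in arrival order)."""
    n = len(values)
    # S counts concordant minus discordant pairs over all i < j.
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s == 0:
        return 0.0
    return (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)

# |z| > 1.96 rejects "no trend" at the 5% level; a sequential sweep
# yields a very large positive z, while random scanning stays near 0.
print(mann_kendall_z([10 * i for i in range(60)]))            # ~11.3
rng = random.Random(0)
print(mann_kendall_z([rng.randrange(2560) for _ in range(60)]))  # near 0
```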
Hit-List Checking:
By hit-list
(liveness) scanning, we refer to an event for which the attacker appears to
have previously acquired a specific list of targets. Hit-lists are often employed
by sophisticated botmasters to achieve high scan efficiency. It is important
for network administrators to know whether they appear on a hit-list, as this
indicates whether they will be scanned repeatedly. We detect the use of a
hit-list based on the observation that such scans should heavily favor
“live” addresses (those that respond) over “dark” (nonresponsive) addresses.
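As one concrete instance of this observation (a sketch, not necessarily the paper's exact test), a one-sided binomial test can ask whether probes land on live addresses more often than uniform scanning would predict. The live-address count (640 of the 2560 honeynet addresses) and the event sizes below are hypothetical.

```python
from scipy.stats import binom

def hitlist_pvalue(live_hits, total_probes, live_fraction):
    """One-sided test: do probes favor live addresses more than uniform
    scanning would? A small p-value is consistent with liveness-aware
    (hit-list) scanning. `live_fraction` is the share of monitored
    addresses that actually respond."""
    # P[X >= live_hits] under Binomial(total_probes, live_fraction)
    return binom.sf(live_hits - 1, total_probes, live_fraction)

# Example: 2560-address honeynet with 640 responders (25% live); an
# event sends 400 probes, 280 of which target live addresses.
p = hitlist_pvalue(280, 400, 640 / 2560)
print(f"p = {p:.3g}")   # far below 0.05 -> consistent with a hit-list
```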
Uniformity Checking:
A natural technique
for bots is to employ uniform random scanning across the target range. Testing
whether the scans are evenly distributed across the honeynet sensor can be cast
as a distribution-checking problem. We employ a simple chi-square test, which is
well suited to the discrete nature of address blocks. When choosing the number
of bins for the test, a key requirement is that the expected count in each bin
exceed 5. Accordingly, given that our events contain at least several hundred
scans, we divide the 2560 addresses in our honeynet into 40 bins of 64
addresses each.
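A sketch of this binning and test in Python, assuming destinations are expressed as offsets within the 2560-address honeynet (the sample data here are synthetic):

```python
import numpy as np
from scipy.stats import chisquare

def uniformity_pvalue(dst_offsets, n_addrs=2560, n_bins=40):
    """Chi-square goodness-of-fit test: are scan destinations spread
    evenly across the sensor? `dst_offsets` are destination addresses
    as offsets 0..n_addrs-1 within the honeynet. With 40 bins of 64
    addresses and several hundred scans, every expected bin count
    stays above 5, as the rule of thumb requires."""
    counts, _ = np.histogram(dst_offsets, bins=n_bins, range=(0, n_addrs))
    return chisquare(counts).pvalue   # uniform expectation by default

# Example: 800 uniformly random offsets should not be rejected
# (expected count per bin = 800 / 40 = 20 > 5).
rng = np.random.default_rng(0)
print(uniformity_pvalue(rng.integers(0, 2560, size=800)))
```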
Dependency Checking:
Sophisticated scanning
strategies can introduce correlations among the sources in order to divide
the work each contributes more efficiently. Since traditional approaches
only detect linear dependence or handle two-variable cases, we develop a new
hypothesis-testing approach. The null hypothesis is that the senders act in a
uniform, independent fashion (where we first test for uniformity as discussed
above); the alternative hypothesis is that the senders do not act
independently.
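As one concrete instance of such a test (a sketch, not necessarily the paper's statistic), a Monte Carlo comparison against simulated independent uniform senders can use the amount of target overlap across sources: bots that coordinate to partition the space overlap far less than independent ones would.

```python
import numpy as np
from collections import Counter

def overlap_statistic(target_sets):
    """Number of sensor addresses probed by more than one source."""
    counts = Counter(a for targets in target_sets for a in targets)
    return sum(1 for c in counts.values() if c > 1)

def coordination_pvalue(target_sets, n_addrs=2560, trials=1000, seed=0):
    """Monte Carlo test of the null hypothesis that sources pick targets
    uniformly and independently. A small lower-tail p-value (observed
    overlap lower than almost all simulations) suggests coordination."""
    rng = np.random.default_rng(seed)
    sizes = [len(t) for t in target_sets]
    observed = overlap_statistic(target_sets)
    hits = sum(
        overlap_statistic(
            [set(rng.choice(n_addrs, size=k, replace=False)) for k in sizes]
        ) <= observed
        for _ in range(trials)
    )
    return (1 + hits) / (1 + trials)

# Example: 8 bots that partition the space (zero overlap) yield a tiny
# p-value, flagging coordination.
coordinated = [set(range(i * 320, i * 320 + 100)) for i in range(8)]
print(coordination_pvalue(coordinated))   # ~0.001
```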