Botnet Detection and Mitigation

Abstract: This project is to model botnet behavior and build a simulation tool to investigate the topological properties of botnets in large-scale networks. Given the widespread proliferation of botnets and demonstrations of their effectiveness as a tool of cyber crime, the development of effective means to detect and mitigate botnets is imperative to sustain the Internet as a safe place to conduct personal and business communication and transactions. The proposed project will enable the investigator to produce simulated network flow data corresponding to the behavior of botnet controllers, "bots", normal clients and servers, and hosts under attack. The data sets provided through simulation will allow the investigator to study botnet topologies to determine whether it is practical for network providers to use network flow data to detect and mitigate botnets in advance of a large-scale attack on network nodes or infrastructure.

Problem Statement

The Internet and World Wide Web have become vital elements of our day-to-day lives and are critical to business and personal activities in our society. Unfortunately, these technologies have also become an avenue for cyber crime. As academia and industry have developed means to protect network perimeters against a variety of attacks including computer worms, viruses, scanning, and denial of service, cyber criminals have developed more stealthy techniques. Likewise, the motivations of cyber miscreants have evolved from demonstrating technical prowess to exercising economic interests [9].

Lu et al. [7] define botnets as "networks of compromised computers infected with malicious code that can be controlled remotely under a common command and control (C&C) channel." Botnets are recognized as a significant security threat on the Internet, and their growth is driven by the profit motive of botnet operators. Botnets may be used to send spam or to attack a country or enterprise for political or economic gain. Lu et al. assert that advanced botnets are hidden not only in existing well-known network applications but also in new and novel applications. These factors combine to make botnet detection a challenging problem. The current state of the art for detecting botnets uses signature-based techniques on selected network links or honeypots.

This proposed project is to model botnet behavior and build a simulation tool that may be used to investigate the topological properties of botnets in large-scale networks. Behavioral data collected by using this simulator will be used to address the following questions:

Hypothesis

Botnets (and other covert networks) exhibit topological properties that may be detected using network flow data. A detailed analysis of these topological properties will provide computer and network security practitioners with valuable information enabling them to develop effective mitigation techniques in a more timely way.
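
As a minimal sketch of the kind of topological measurement the hypothesis refers to, the following Python fragment counts how many distinct hosts contact each destination in a set of flow records. The host names, the port numbers, the record layout, and the functions fan_in and high_fan_in are all hypothetical and chosen only for illustration; this is not the proposed analysis method.

```python
from collections import defaultdict

# Hypothetical flow records: (source host, destination host, destination port).
flows = [
    ("bot1", "c2", 6667),
    ("bot2", "c2", 6667),
    ("bot3", "c2", 6667),
    ("client1", "web", 80),
    ("client2", "web", 80),
    ("bot1", "web", 80),
]


def fan_in(flows):
    """Count the distinct sources that contacted each destination."""
    sources = defaultdict(set)
    for src, dst, _port in flows:
        sources[dst].add(src)
    return {dst: len(srcs) for dst, srcs in sources.items()}


def high_fan_in(flows, threshold):
    """Return, in sorted order, destinations with at least `threshold` distinct sources."""
    return sorted(dst for dst, n in fan_in(flows).items() if n >= threshold)


print(fan_in(flows))
print(high_fan_in(flows, 3))
```

In this toy data the ordinary web server has the same fan-in as the controller, so a simple degree count would not separate the two. It is only a starting point for the more detailed topological analysis the hypothesis calls for.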

Relevance and Significance of the Project

Given the severity of the problem of botnets and demonstrations of their effectiveness as a tool of cyber crime, the development of effective means to detect and mitigate botnets is imperative to sustain the Internet as a safe place to conduct personal and business communication and transactions.

The proposed project will enable the investigator to produce simulated network flow data corresponding to the behavior of botnet controllers, "bots", normal clients and servers, and hosts under attack. These data sets will allow the investigator to apply the analysis of complex network topologies, together with computer and network security practices, in a novel way to determine whether it is practical for network providers to use network flow data to detect and mitigate botnets in advance of a large-scale attack on network nodes or infrastructure.

Literature Review

There is a wide range of prior research into the emergence of botnets as a significant threat to security on the Internet. Some investigators have published initial results on detection of botnets [1, 2, 5, 7] and on simulation of botnets [6, 12, 14]; however, my initial literature search has not identified a simulation intended to generate network flow data for analysis of botnet behavior.

Symantec’s Global Internet Security threat report for 2009 [10] states: Malicious activity has increasingly become Web-based; attackers are targeting end users instead of computers; the online underground economy has consolidated and matured; and attackers are able to rapidly adapt their attack activities.

Additionally, to show the magnitude of the botnet problem and the rapid proliferation of bot-infected computers, Symantec reports that it observed an average of 75,158 active bot-infected computers per day in 2008, an increase of 31 percent from the previous period. China had the most bot-infected computers in 2008, accounting for 13 percent of the worldwide total; this is a decrease from 19 percent in 2007. Buenos Aires was the city with the most bot-infected computers in 2008, accounting for 4 percent of the worldwide total. In 2008, Symantec [11] identified 15,197 distinct new bot command-and-control servers; of these, 43 percent operated through IRC channels and 57 percent used HTTP. The United States was the location for the most bot command-and-control servers in 2008, with 33 percent of the total, more than any other country.

In developing my initial idea paper [8], I identified a large number of publications potentially relevant to this investigation. This literature review just scratches the surface of this area.

Methodology and Outline of Proposed Project

This project is intended to create the modeling and simulation tools required to:

Estimate of the Research Effort

The project team should gain an understanding of the state of the art in the areas of botnet detection and defense. This is a rapidly evolving technical area and many research groups are focused on related problems. The investigator and project team will collaborate to ensure that the team’s efforts are focused to make an original contribution to the field and that the project is manageable in scope.

The major elements of the proposed project include:

  1. Review existing literature and available botnet source code to understand the evolutionary characteristics that are in use during the infection or recruitment phase and the active phases of the botnet life cycle.
  2. Understand the essential elements of information collected using network flow data and the standard formats of flow data in use (e.g. Cisco NetFlow, Juniper J-Flow, cflowd, sFlow, and IPFIX).
  3. Define or refine a behavioral model of the botnet life cycle observable through network traffic flows.
  4. Design, implement, and test a botnet simulation tool that generates network flow data for botnet controllers, “bots”, normal clients and servers, and hosts under attack.
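
To make the fourth element concrete, the following Python fragment is a minimal sketch of a generator that emits labeled flow records for bots, a controller, normal clients, a server, and a host under attack. Everything in it is an assumption made for illustration: the FlowRecord fields, the host names, the ports (6667 for an IRC-style control channel, 80 for HTTP), the timing distributions, and the simulate function and its parameters. It does not reproduce any of the standard flow formats listed above and is not the design of the proposed tool.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    start: float    # seconds since the start of the simulation
    src: str
    dst: str
    dst_port: int
    protocol: str
    packets: int
    octets: int
    label: str      # ground-truth role of the flow, kept for later evaluation


def simulate(num_bots=3, num_clients=3, attack_at=60.0, duration=120.0, seed=0):
    """Return a time-ordered list of simulated flow records."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    records = []

    # Normal clients browse a server at random intervals (mean 15 seconds).
    for i in range(num_clients):
        t = rng.uniform(0, 10)
        while t < duration:
            packets = rng.randint(5, 50)
            octets = packets * rng.randint(200, 1200)
            records.append(
                FlowRecord(t, f"client{i}", "server", 80, "tcp", packets, octets, "normal"))
            t += rng.expovariate(1 / 15)

    for i in range(num_bots):
        bot = f"bot{i}"
        # Each bot checks in with the controller roughly every 20 seconds.
        t = rng.uniform(0, 10)
        while t < duration:
            records.append(FlowRecord(t, bot, "controller", 6667, "tcp", 4, 256, "c2"))
            t += 20 + rng.uniform(-2, 2)
        # From attack_at onward, each bot also sends small flows to the victim.
        t = attack_at
        while t < duration:
            records.append(FlowRecord(t, bot, "victim", 80, "tcp", 1, 40, "attack"))
            t += rng.uniform(0.5, 1.5)

    records.sort(key=lambda r: r.start)
    return records


records = simulate()
print(len(records), "flow records generated")
```

The label field records the ground truth for each flow so that a detection method can later be scored against it. A real implementation would also need to export the records in one of the standard flow formats.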

The investigator expects that this project will use an agile methodology, particularly Extreme Programming (XP). The team should plan small releases and fast turnarounds in roughly two-week iterations. A set of stories and deliverables will be defined in each meeting between the project team and the investigator.

References

[1] C. Kanich et al., “The Heisenbot Uncertainty Problem: Challenges in Separating Bots from Chaff,” San Francisco, CA, 2008, USENIX, Inc. (https://www.usenix.org/events/leet08/tech/full_papers/kanich/kanich.pdf)
[2] W. Gobel, “Detecting Botnets Using Hidden Markov Models on Network Traces.” (http://www.cse.uconn.edu/~huang/REU_08/Reports/Wade_Gobel.pdf)
[3] J. B. Grizzard et al., “Peer-to-peer botnets: Overview and case study,” 2007. (http://usenix.net/events/hotbots07/tech/full_papers/grizzard/grizzard.pdf)
[4] Y. Junfeng et al., “Structural Robustness in Peer to Peer Botnets,” Networks Security, Wireless Communications and Trusted Computing, International Conference on, 2, 2009, IEEE Computer Society, pp. 860-863. (http://www2.computer.org/portal/web/csdl/doi/10.1109/NSWCTC.2009.289)
[5] A. Karasaridis, B. Rexroad, and D. Hoeflin, “Wide-scale botnet detection and characterization,” 2007. (http://www.usenix.org/event/hotbots07/tech/full_papers/karasaridis/karasaridis.pdf)
[6] J. Li et al., “Simulation and analysis on the resiliency and efficiency of malnets,” 2005, pp. 262-269. (http://www.cs.uoregon.edu/~lijun/pubs/papers/li05malnet.pdf)
[7] W. Lu, M. Tavallaee, and A. A. Ghorbani, “Automatic discovery of botnet communities on large-scale communication networks,” Sydney, Australia, 2009, ACM. (http://portal.acm.org/citation.cfm?id=1533062)
[8] J. R. Massi, Idea Paper: An Investigation Into the Application of Network Topology Analysis to the Detection and Mitigation of Botnets, Pace University, 2009. (http://dps2011.joemassi.net/dissertation/ideapaper/jrm-20090505-IdeaPaper.pdf)
[9] N. Provos, M. A. Rajab, and P. Mavrommatis, “Cybercrime 2.0: when the cloud turns dark,” Communications of the ACM, 53, 2009, ACM New York, NY, USA, pp. 43-47.
[10] “Symantec Internet Security Threat Report,” Volume 14, 2009. (http://www.symantec.com/business/theme.jsp?themeid=threatreport)
[11] “Symantec Internet Security Threat Report,” Volume XIII, 2008. (http://www.symantec.com/business/theme.jsp?themeid=threatreport)
[12] E. Van Ruitenbeek, and W. H. Sanders, “Modeling Peer-to-Peer Botnets,” 2008, IEEE Computer Society Washington, DC, USA, pp. 307-316. (http://perform.csl.illinois.edu/Papers/USAN_papers/08VAN02.pdf)
[13] J. Xu, and H. Chen, “The topology of dark networks,” Commun. ACM, 51, 2008, ACM, pp. 58-65. (http://doi.acm.org/10.1145/1400181.1400198)
[14] J. Yu et al., “Using Simulation to Characterize Topology of Peer to Peer Botnets,” 2009, pp. 78-83. (http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4797359)

Fast Agile XP Deliverables

We will use an agile methodology, particularly Extreme Programming (XP), which involves small releases and fast turnarounds in roughly two-week iterations. Many of these deliverables can be done in parallel by different members or subsets of the team. The following is the current list of deliverables (ordered by the date initiated, deliverable modifications marked in red, deliverable date marked in bold red if programming involved, completion date and related comments marked in green, pseudo-code marked in blue):
  1. 9/24 Sprint 1 (1-2 week duration)