Czech Technical University in Prague

Faculty of Electrical Engineering

Department of Computer Graphics and Interaction

Master’s Thesis

Anomaly detection and analysis of web traffic

Bc. Richard Richter

Supervisor: Martin Rehák, Ph.D.

Study Programme: Open Informatics (Otevřená informatika), structured, follow-up Master's

Field of Study: Software Engineering (Softwarové inženýrství)

May 10, 2011




I would like to thank my thesis advisor, Martin Rehák, Ph.D., for the many ideas behind this work and for his patient help and support throughout the project, and Ing. Martin Bílý for his support in providing data from the school proxy server. I would also like to thank my family and friends for their patience.

Declaration

I hereby declare that I have completed this thesis independently and that I have listed all the literature and publications used.

I have no objection to the use of this work in compliance with §60 of Act No. 121/2000 Coll. (Zákon č. 121/2000 Sb., the Copyright Act), and with the rights connected with the Copyright Act, including the changes in the act.

In Prague 9. 5. 2011

Abstract

This diploma thesis deals with anomaly detection and analysis of web traffic based on GET requests to web servers. The first part addresses how data is obtained from the Squid proxy server and sent to the collector in the form of NetFlow packets. The second part covers the analysis of the obtained data and the implementation of several existing algorithms, which have been adapted to the current state of URLs and GET headers, since conventions for both have changed considerably over the years.

Abstrakt (translated from Czech)

This diploma thesis deals with the analysis and detection of anomalies in web traffic based on GET requests to web servers. The first part addresses how data is obtained from the Squid proxy server and sent to the collector in the form of NetFlow packets; the second part covers the analysis of this data using several existing algorithms, which were adapted to the current state of URLs and GET headers, since the conventions used in them have changed over the years.

Contents

1 Introduction

1.1 CAMNEP Introduction

1.2 Squid analysis

–  –  –

Introduction

Computer systems and networks face a number of different attacks, and every day a considerable number of new threats try to find a loophole in the security of organizations, either to reach classified information or to use the affected machine's computing power as part of a botnet, which will later send out spam or disrupt the global network in other ways. Although there are various tools to keep unwanted intruders out of our systems (virus scanners on client computers, firewalls and other network infrastructure elements), attackers still manage to find loopholes and penetrate seemingly secure computer networks.

Matching traffic against known attack patterns is not always enough to capture new types of attacks, because threats that exploit a newly discovered vulnerability would not be identified. Intrusion detection systems (IDS) address this problem by building models of normal traffic and user behaviour on the network; an attack is then detected as an anomaly, that is, a deviation from normal traffic. For example, if network traffic grows by dozens of percent for no obvious reason, the network is treated as the target of an attack.
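The traffic-growth example can be illustrated with a minimal sketch. It assumes that "normal traffic" is modelled simply as the mean of past traffic volumes and that a growth of more than 30 % counts as an anomaly; both the model and the threshold are illustrative assumptions, not values taken from this work or from CAMNEP.

```python
from statistics import mean


def is_volume_anomaly(history, current, max_growth=0.3):
    """Return True if `current` exceeds the historical mean by more than
    `max_growth` (0.3 means 30 %). The threshold is an assumed value."""
    if not history:
        raise ValueError("history must not be empty")
    baseline = mean(history)
    if baseline <= 0:
        # With no past traffic, any traffic at all is a deviation.
        return current > 0
    growth = (current - baseline) / baseline
    return growth > max_growth
```

A real detector would use a richer model of normal behaviour; the sketch only shows the principle of comparing current traffic with a learned baseline.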

The aim of this work is to find such anomalies using information provided by proxies, based on users' GET requests to web sites. The following section introduces the CAMNEP IDS, for which the results of my work will be used, followed by an analysis of the usable information provided by the Squid proxy server and a description of how the necessary information is sent in the NetFlow format.

The next chapter introduces the structure of the URL and the GET header. It then lists existing anomaly detection algorithms and describes in more detail those I chose as suitable for implementation in this work and useful for analysing the expected data; their implementation is described in the chapter after that. The implementation chapter is followed by a chapter describing the individual experiments and their results.

1.1 CAMNEP Introduction

CAMNEP is a research prototype of a network intrusion detection system. It is based on a collaboration of a community of detection agents, each of which embodies an existing anomaly detection model. The agents use extended trust modelling, a technique established in a multi-agent research field to improve the quality of classification provided by individual models.

The agents process unsampled data acquired by dedicated high-performance NetFlow aggregation cards.

1.2 Squid analysis

Squid is a caching proxy server used to optimize a network's queries to web servers; it is also deployed at CTU FEE. Since the proxy server must serve all client requests for web services and works directly with GET headers, this point in the network is ideal for capturing requests, processing them and passing them on for further analysis.

I was looking for the most efficient way to intercept the data flowing into the Squid proxy server, and there were several options to choose from. As the Squid source code is publicly available, one possible solution would be to create a module that captures the communication and forwards it to the collector. This module would be built directly into Squid. Although this alternative could lead to the most efficient processing, complications may arise, mainly in the maintainability of the module: it would probably have to be updated with every new version of the Squid server. It would also not be possible to deploy this solution on already installed systems without switching to a modified version of the proxy server.

Another solution under consideration was to intercept the incoming packets before they reach the proxy server, process them and pass them to Squid, from where they would continue on their designated path. However, this solution seemed too complicated compared with the one I chose (see below).

After examining existing tools that rely on information obtained from the Squid proxy server, I found that the majority of them parse the access log, and in the end I decided on the same solution. Squid provides very flexible logging configuration options, and the server administrator can choose which information is kept. By default the set of logged information is rather narrow and is not sufficient for analysing the URL and GET headers, so the logformat directive needs to be extended with additional elements. Table 1.2 shows a sufficient configuration, but a different one can also be used. Table 1.3 explains each parameter used in the logformat directive.

–  –  –

Table 1.3: Logformat parameters explanation

Code   Meaning
>a     Client source IP address
<A     Server IP address or peer name
>p     Client source port
st     Request+Reply size including HTTP headers
Ss     Squid request status (TCP_MISS etc.)
ts     Seconds since epoch
tu     Subsecond time (milliseconds)
tr     Response time (milliseconds)
ru     Request URL
rp     Request URL-Path excluding hostname
>h     Original request header. Optional header name argument in the format header[:[separator]element]
rm     Request method (GET/POST etc.)
<h     Reply header. Optional header name argument as for >h
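To illustrate how a collecting script might read such a log, the following sketch parses one access log line. The field order and the space separator are assumptions made for this example (the configuration in Table 1.2 may differ), the request and reply headers are left out for brevity, and the names AccessRecord and parse_line are hypothetical.

```python
from dataclasses import dataclass

# Assumed (hypothetical) squid.conf line that would produce the parsed layout:
#   logformat sketch %ts.%03tu %tr %>a %>p %<A %Ss %st %rm %ru
# The configuration actually used in the thesis may differ.


@dataclass
class AccessRecord:
    timestamp: float   # %ts.%03tu  seconds since epoch with milliseconds
    response_ms: int   # %tr        response time in milliseconds
    client_ip: str     # %>a        client source IP address
    client_port: int   # %>p        client source port
    server: str        # %<A        server IP address or peer name
    status: str        # %Ss        Squid request status
    size: int          # %st        request+reply size including headers
    method: str        # %rm        request method
    url: str           # %ru        request URL


def parse_line(line: str) -> AccessRecord:
    """Parse one log line in the assumed nine-field, space-separated layout."""
    parts = line.strip().split(" ")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    ts, tr, client, port, server, status, size, method, url = parts
    return AccessRecord(float(ts), int(tr), client, int(port),
                        server, status, int(size), method, url)
```

Splitting on spaces relies on the assumption that logged URLs contain no unescaped spaces; a layout that also carries full headers would need a more robust separator or quoting.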

1.3 NetFlow and IPFIX data format

Now that we have decided which data could be useful for analysis, we need a way to transfer it to the collector. The most suitable choice appears to be NetFlow, a network protocol developed by Cisco Systems for collecting IP traffic information.

I first experimented with NetFlow v5, but its data format has a strictly defined structure and does not support the transfer of any text information. It therefore cannot be used to transfer the URL or any other information from the GET headers. The most flexible format is IPFIX, which I had already started to implement in the collecting script; however, the CAMNEP system does not support IPFIX yet.

The final decision was to use NetFlow v9, which is very similar to IPFIX, although it has a few limitations. For example, it does not support extension through enterprise-specific information elements, and because it defines no field for the URL or for any other text data, we need to define our own data fields for them.

Because NetFlow v9 does not support variable-length records and the size of each item must be defined in a template, I had to decide how to transfer text records that do not have a strictly given length. I analysed URLs obtained from the FEE proxy server and decided to use a length that should cover as many requests as possible: 226 characters. This value was chosen based on Figure 1.1, which shows that most URL lengths lie around 70 characters; the higher value was chosen to better cover longer requests. Shorter records are padded with blank characters and longer entries are truncated. I also considered adding an extra information record containing the real length of the transferred record, but in the end I rejected this solution and handle it on the collector side. I followed the same steps for the length of GET headers. Because the data source that I used for analysis and learning trimmed all data to a length of 1024 B, it was not possible to determine the threshold that would cover the maximum number of GET headers appropriately. Based on Figure 1.2, I chose a GET header length of 900 B as the threshold.
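The padding and truncation described above can be sketched as follows. The sketch assumes that the "blank character" is an ASCII space and that text is encoded as UTF-8; the function names fit_field and unfit_field are hypothetical.

```python
URL_FIELD_SIZE = 226  # bytes, as chosen in the text
GET_FIELD_SIZE = 900  # bytes, as chosen in the text


def fit_field(text: str, size: int, pad: bytes = b" ") -> bytes:
    """Encode `text` and force it to exactly `size` bytes:
    longer values are truncated, shorter ones are padded on the right."""
    raw = text.encode("utf-8", errors="replace")
    return raw[:size].ljust(size, pad)


def unfit_field(raw: bytes, pad: bytes = b" ") -> str:
    """Collector side: strip the padding to recover the transferred text."""
    return raw.rstrip(pad).decode("utf-8", errors="replace")
```

Because the real length is not transferred, the collector cannot distinguish padding from trailing blanks that belonged to the original value, and a truncated value cannot be restored.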

Figure 1.1: URL lengths
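A length distribution such as the one in Figure 1.1 can be turned into a field size by measuring how many observed values a given limit covers. The sketch below shows one possible way to do this; it is not the procedure by which the value of 226 was obtained, and the function names are hypothetical.

```python
import math


def coverage(lengths, limit):
    """Fraction of observed lengths that fit into `limit` without truncation."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no lengths given")
    return sum(1 for n in lengths if n <= limit) / len(lengths)


def smallest_limit(lengths, target):
    """Smallest observed length that covers at least `target` (0 < target <= 1)
    of all observed lengths."""
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("no lengths given")
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    index = math.ceil(target * len(ordered)) - 1
    return ordered[index]
```

Applied to the URL lengths from the proxy log, coverage(lengths, 226) would give the share of requests transferred without truncation.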

For our purposes the following items have been defined:

URL - 226 B
GET - 900 B

Sending the whole GET header is not an ideal solution; in future work I would like to identify the most useful parts of the GET header and send only those. It will then probably be possible to reduce the size of the GET header item in the NetFlow packet.
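As an illustration of how the two fixed-size items could be declared, the sketch below builds a NetFlow v9 template FlowSet in the layout defined by RFC 3954. The field type numbers for the URL and GET items (40001 and 40002) and the choice of accompanying address and port fields are assumptions made for this example; the values actually used in this work are not stated here.

```python
import struct

# Standard NetFlow v9 field types (RFC 3954).
L4_SRC_PORT = 7
IPV4_SRC_ADDR = 8
IPV4_DST_ADDR = 12

# Hypothetical field types for the two custom text items.
URL_FIELD = 40001
GET_FIELD = 40002


def build_template_flowset(template_id, fields):
    """Build a template FlowSet holding one template record.

    `fields` is a list of (field_type, length_in_bytes) pairs."""
    if template_id < 256:
        raise ValueError("template IDs below 256 are reserved")
    record = struct.pack("!HH", template_id, len(fields))
    for field_type, length in fields:
        record += struct.pack("!HH", field_type, length)
    # FlowSet ID 0 marks a template FlowSet; the length includes the
    # 4-byte FlowSet header.
    return struct.pack("!HH", 0, 4 + len(record)) + record


IPV4_TEMPLATE = [
    (IPV4_SRC_ADDR, 4),
    (L4_SRC_PORT, 2),
    (IPV4_DST_ADDR, 4),
    (URL_FIELD, 226),
    (GET_FIELD, 900),
]
flowset = build_template_flowset(256, IPV4_TEMPLATE)
```

Every data record sent under this template then has a fixed size equal to the sum of the declared lengths, which is why the text items need a fixed length.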

With IPv6 being gradually deployed, our solution has to take it into account. When detecting source and destination addresses, we have to determine whether each is an IPv4 or an IPv6 address (or a hostname, which the script should translate to an IP address) and choose the appropriate template for sending the data based on the resulting IP versions. There are four templates, which differ in the combination of IP address versions. The first has both the source and the destination address in IPv4 format, the second has both in IPv6 format, and the remaining two each combine one IPv4 address with one IPv6 address, in either order.
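The template selection can be sketched with Python's standard ipaddress and socket modules. The template IDs below are hypothetical, and the resolver is passed in as a parameter so that the hostname case can be exercised without network access.

```python
import ipaddress
import socket

# Hypothetical template IDs for the four address-version combinations,
# keyed by (source version, destination version).
TEMPLATE_IDS = {
    (4, 4): 256,
    (6, 6): 257,
    (4, 6): 258,
    (6, 4): 259,
}


def ip_version(host, resolve=socket.getaddrinfo):
    """Return 4 or 6 for an IP literal, or for the first address a
    hostname resolves to."""
    try:
        return ipaddress.ip_address(host).version
    except ValueError:
        pass  # not an IP literal, so treat it as a hostname
    family = resolve(host, None)[0][0]
    return 6 if family == socket.AF_INET6 else 4


def select_template(src, dst, resolve=socket.getaddrinfo):
    """Pick the template ID matching the IP versions of both endpoints."""
    return TEMPLATE_IDS[(ip_version(src, resolve), ip_version(dst, resolve))]
```

For example, select_template("192.0.2.1", "2001:db8::2") selects the mixed IPv4/IPv6 template.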


–  –  –

Algorithms

This chapter describes algorithms that could be used for anomaly detection in web traffic. I implemented the first three of them (N-Grams, GSAD, DFA). For the others, I either explain why their implementation is not applicable or suggest them as a further extension of this work. First, however, it is necessary to describe the input data processed by these algorithms, which is done in the following subsection.

2.1 Input data

2.1.1 URL structure

Over the years there have been significant changes in the use of web sites and the services around them. In the nineties the Internet was full of static content and web sites had a purely informative function. Over time, the web began to fill with dynamic content and various applications and services.

As the sites themselves kept evolving, a change in the URLs used to access every resource was inevitable. Previously, most of the attributes that specify the requested content were passed to the server in the query part as parameters. This approach became inconvenient because individual URLs grew chaotic and too long and, most importantly, caused many problems for search engine optimization (SEO) and potential caching.
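The difference between the two URL styles can be made concrete by splitting a URL into its parts. The sketch below uses Python's urllib.parse; both example URLs are invented for illustration, and the path-based form is shown as one common alternative to query parameters.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url):
    """Split a URL into host, path segments and query parameters."""
    parts = urlsplit(url)
    return {
        "host": parts.netloc,
        "path_segments": [s for s in parts.path.split("/") if s],
        "query_params": parse_qs(parts.query),
    }


# Older style: the content is selected by parameters in the query part.
old_style = describe_url("http://example.com/index.php?section=news&id=42")

# Path-based style: the same information is carried by the path itself.
new_style = describe_url("http://example.com/news/42/some-title")
```

In the first case the attributes appear in query_params, while in the second they appear in path_segments, so an analysis that inspects only the query part would see nothing for the second URL.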

–  –  –
