FREE ELECTRONIC LIBRARY - Books, dissertations, abstract

«1 Introduction and Related Work Sensors networks are capable of collecting an enormous amount of data over space and time. Often, the ultimate ...»

Data Persistence in Sensor Networks: Towards

Optimal Encoding for Data Recovery in Partial

Network Failures

Abhinav Kamra, Jon Feldman, Vishal Misra and Dan Rubenstein

Email: {kamra, jonfeld, misra, danr}@cs.columbia.edu

1 Introduction and Related Work

Sensors networks are capable of collecting an enormous amount of data over space and time. Often, the ultimate

objective is to “sample, store and forward”, that is to sense the data, store it locally and ultimately forward it to a central host (or “master node”) where data from other sensor nodes is also collected and analyzed. A useful example is a traffic sensing network, there being traffic sensors at each intersection that estimate the traffic and relay it to a central processing station.

Typical sensor nodes are wireless nodes with limited storage and computational power. Furthermore, they are prone to “failure”, by going out of wireless range, interference, running out of battery etc. When a sensor node fails, the data it was storing is lost. In a cooperative sensor network, it is a good idea to have nodes’ data duplicated and spread around the network so it can be recovered from other nodes in case of failure. In particular, every node can store some of its own data as well as data from other nodes up to its storage capacity.

It has been shown [1, 2] that in a distributed network where the sensor nodes don’t exactly know the data other nodes currently possess, it is beneficial to store encoded symbols of data units instead of the original data. The work of Yeung et. al. [3] also gives the idea that with limited storage, the coding may be required to maximize the information content. They further show [4] that Linear Coding suffices. That is, it is sufficient to store linear combinations of the data as the encoded symbols.

The benefits of storing combinations of data instead of original data has been studied in various works [5, 6].

Traditional error-correcting erasure codes can also be used to achieve the goal of encoding data such that if some of the encoding symbols are lost, data can still be recovered. Reed-Solomon codes [7] are block erasure codes that have been traditionally used for error correction. Tornado codes [8] and more recently LT codes [9] are erasure codes that require slightly more encoded symbols to recover the data but have much faster encoding/decoding. These codes focus on the problem of choosing an encoding such that all the data can be recovered using a minimum number of encoded symbols.

If part of the sensor network fails, the data stored in the failed sensor nodes is lost. The only data which remain is the data remaining at the surviving nodes. The surviving nodes have symbols encoded from the data produced by all the sensor nodes, so there is still a chance of recovering data produced by failed nodes using the surviving encoding symbols. We focus on the problem of choosing an encoding to maximize the data that can be recovered from the surviving encoded symbols as opposed to finding the minimum encoding symbols required to recover “all” data as studied in Tornado and LT codes.

2 Problem Formulation Our network model consists of a sensor network with N nodes, each having a limited storage capacity. Each node has some data xi that it has generated (or sensed). We call the xi ’s data units. Since every node wants to spread its data throughout the network and since storing encoded symbols is better than storing unencoded data, there is a data mixing protocol according to which the sensor nodes form encoded symbols, exchange them with other nodes and store them.

As in the case of LT codes, we assume that encoded symbols are XOR-encodings. That is each symbol is formed by taking the bitwise XOR of a subset of data units x. The number of data units XOR-ed to form a symbol is called the ¯ degree of the symbol. Typically, encoded symbols are produced according to a probability distribution π such that an ¯ encoded symbol is degree i with probability πi. The i data units that form the symbol are chosen uniformly randomly.

In practice, the content and spread of encoded symbols will of course depend on the particular data mixing protocol between the nodes and the topology of the network so that the actual encoded symbols of a particular degree may not be uniformly chosen from the set of all symbols of that degree. But for the purpose of analysis, we assume that the encoded symbols have been mixed together and that they are generated according to some degree distribution π with¯ symbols of a particular degree chosen uniformly randomly. Using these abstractions and assumptions, we state the

following computational problem as follows:

Definition 2.1. Given a set S of data units and an encoded symbol s of degree d, the distance of s from set S, dist(s, S), is the number of data units that form the symbol s and are not present in S.

It is easy to see that if S is the set of recovered data units, then a symbol s recovers a new data unit if and only if dist(s, S) is 1.

Definition 2.2. The iterative decoder D works as follows.

1. Decode all degree 1 symbols and add them to set S. So, initially S is the set of distinct data units contained in all degree 1 symbols.

2. From the remaining symbols, choose a symbol s such that dist(s, S) is the minimum.

3. If dist(s, S) = 0, throw away this symbol as a redundant or duplicate symbol.

4. If dist(s, S) = 1, decode a new data unit and add it to S. Goto Step 2.

5. If dist(s, S) 1, stop. The remaining symbols have distance greater than 1 from S and are useless.

This is the decoder used by Tornado and LT codes. It may not decode all possible data units from the given encoded symbols. We can recover more using a Gaussian elimination method. But we will use this decoder since its much faster compared to Gaussian elimination.

Consider another decoder D’ which given a fixed sequence of encoded symbols s1, s2,..., sk, works as follows.

Initially the set S is empty.

1. At the ith step, choose symbol si. If it has a distance 1 from current set S, then decode a new data unit and add to set S.

2. If si has distance 0 or more than 1, throw it away.

The following result holds:

Lemma 2.3.

Given a sequence of symbols s1, s2,..., sk, the number of data units decoder D can recover from this set of symbols is at least as much as that recovered by decoder D’ given any fixed sequence s1, s2,..., sk which is a permutation of the symbols s1, s2,..., sk.

The proof is omitted due to space constraints.

Problem 2.4.

There are N data units x. Given k encoded symbols are recovered, what is the degree distribution ¯ π employed for encoding so that the expected number of xi ’s that can be decoded from the k encoded symbols is ¯ maximized if we use the iterative decoder D described above.

Definition 2.5. If k encoded symbols are generated such that their degrees are chosen according to a probability ¯ distribution π and symbols of a particular degree are chosen uniformly randomly, then π ∗ is optimal for k if the ¯ expected number of data units recovered by decoder D is at least as much if the k symbols had been generated according to some other probability distribution

–  –  –

If s1, s2,..., sk is a set of k encoded symbols, then let f (s1,..., sk ) be the number of data units that can be recovered from these symbols using decoder D.

Let C(i, j) be the expected number of data units that can be recovered from a set of i + j encoded symbols, i of which are of degree 1 and the rest of degrees 2 or more. Specifically, if a1,..., ai are i degree 1 symbols and b1,..., bj are j symbols of degree 2 or more, then C(i, j) = f (a1,..., ai, b1,..., bj ).

Lemma 3.1.

Given k = i + j encoded symbols of which i symbols a1,..., ai are of degree 1 and j symbols b1,..., bj are of degree 2 or more, then ∀j0 C(k − j, j) ≤ C(k, 0) if C(k − j, j) ≤ N (= r1 )

–  –  –

References [1] S. Acedanski, S. Deb, M. Medard and R. Koetter, “How Good is Random Linear Coding Based Distributed Networked Storage,” in Workshop on Network Coding, Theory and Applications, 2005.

[2] A. G. Dimakis, V. Prabhakaran and K. Ramchandran, “Ubiquitous Acess to Distributed Data in Large-Scale Sensor Networks through Decentralized Erasure Codes,” in Symposium on Information Processing in Sensor Networks, 2005.

[3] R. Ahlswede, N. Cai, S. Y. R. Li and R. W. Yeung, “Network Information Flow,” in IEEE Transactions on Information Theory, 2000, vol. 46, pp. 1004–1016.

[4] N. Cai, S. Y. R. Li and R. W. Yeung, “Linear Network Coding,” in IEEE Transactions on Information Theory, 2003, vol. 49, pp. 371–381.

[5] R. Koetter and M. Medard, “An Algebraic Approach to Network Coding,” in ACM/IEEE Transactions on Networking, 2003, vol. 11, pp. 782–795.

[6] C. Gkantsidis and P. Rodriguez, “Network Coding for Large Scale Content Distribution,” in Proceedings of INFOCOM, 2005.

[7] Lin and Costello, Error Control Coding: Fundamentals and Applications, 1983.

[8] M. Luby, M. Mitzenmacher, M. A. Shokrollahi and D. Spielman, “Efficient Erasure Correcting Codes,” in IEEE Transactions on Information Theory, 2001, vol. 47, pp. 569–584.

[9] M. Luby, “LT Codes,” in Symposium on Foundations of Computer Science, 2002.

Similar works:

«Osteoarthritis of the hand and wrist 1 What is osteoarthritis of the hand and wrist? Everyone with osteoarthritis (OA) in their hands and wrists is affected differently. Some people do not experience much discomfort while others may notice that it is difficult to grip and lift things properly. If you have OA in your hands, you will probably have noticed stiffness in your fingers. There may be bumps known as nodules in the joints of your fingers. These occur because the healthy cartilage...»

«AMERICAN HEART ASSOCIATION/AMERICAN STROKE ASSOCIATION GENERAL EMBARGO POLICIES Summary Embargoes play an important part in disseminating breaking science-related news to the media and healthcare professionals. Embargoes allow journalists/healthcare professionals the opportunity to receive and discuss all the facts in a timely manner so that they can develop thoughtful, multiperspective, well-balanced interpretations of study findings. Embargoes are especially important to journalists because...»

«Aus der Klinik für Psychiatrie und Psychotherapie Fakultät 2, Klinische Medizin Der Universität des Saarlandes, Homburg/Saar Komm. Direktor: PD Dr. med. Günter Heinz Angstsensitivität und physiologische Stressreaktivität Dissertation zur Erlangung des Grades eines Doktors der Medizin Der Medizinischen Fakultät Der UNIVERSITÄT DES SAARLANDES Vorgelegt von Stefan Alexander Ruf Geb. am 21.02.1969 in Stuttgart Bad Cannstatt Inhaltsverzeichnis Zusammenfassung 5 Englische Zusammenfassung 6 1....»

«Aus dem Institut für Rechtsmedizin der Medizinischen Fakultät Charité – Universitätsmedizin Berlin DISSERTATION Tödliche Höhenstürze im Land Berlin von 1988-2004 –Verletzungsmuster in Abhängigkeit von der Sturzhöhe zur Erlangung des akademischen Grades Doctor medicinae (Dr. med.) vorgelegt der Medizinischen Fakultät Charité – Universitätsmedizin Berlin von Stefanie Last aus Perleberg Gutachter: 1. Prof. Dr. med. M. Tsokos 2. Prof. Dr. med. C. Meißner 3. Prof. Dr. med. St....»

«Auswirkungen von Gewichtsreduktion und einem kontrollierten Trainingsprogramm auf die Serumkonzentration der Bone morphogenetic proteins (BMPs) -2 und -4 bei Patienten mit Typ-2Diabetes Dissertation zur Erlangung des akademischen Grades Dr. med. an der Medizinischen Fakultät der Universität Leipzig Eingereicht von: Nina Böhler Geboren am 29.09.1979 in Konstanz Angefertigt an: Universität Leipzig, Medizinische Fakultät Department für Innere Medizin Klinik für Endokrinologie und...»

«WORLD DOCTORS World Doctors Orchestra ORCHESTRA 5. Benefizkonzert / 5th Charity Concert World Health Summit 2010 Charity Night 11. Oktober 2010, 19.30 Uhr / October 11, 2010, 7.30 p.m. Konzerthaus am Gendarmenmarkt, Berlin Charité Berlin World Health Summit October 10th -13th, 2010 | Berlin, Germany Programm World Doctors Orchestra Dirigent: Stefan Willich Philharmonischer Chor Berlin Einstudierung: Jörg-Peter Weigle Solisten: Anja Kampe, Sopran Julia Rutigliano, Alt Endrik Wottrich, Tenor...»


«2015 / 2 Annotierte Online–Bibliografie Thomas Siebold, Heike Watermülder Global Health Governance: Gesundheit als globales Politikfeld Global Health Governance: Health as a global policy area 16. März 2015 Herausgeber: GIGA German Institute of Global and Area Studies / Leibniz-Institut für Globale und Regionale Studien GIGA Informationszentrum Neuer Jungfernstieg 21 20354 Hamburg Tel.: +49 (0) 40 42825 598 Fax: +49 (0) 40 42825 512 E-Mail: iz@giga-hamburg.de Web: http://giga-hamburg.de/iz...»

«Aus dem Institut für Sozialmedizin, Epidemiologie und Gesundheitsökonomie der Medizinischen Fakultät Charité – Universitätsmedizin Berlin DISSERTATION Die Abbildung chronischer Schmerzen anhand von validierten Fragebögen – Eine qualitative Studie bei älteren Patienten mit chronischen Schmerzen der Halswirbelsäule zur Erlangung des akademischen Grades Doctor medicinae (Dr. med.) vorgelegt der Medizinischen Fakultät Charité – Universitätsmedizin Berlin von Julia Jerena Karner aus...»

«North Carolina Association of County Commissioners President’s Mental HealtH engageMent task Force rePort and recoMMendations Published August 2015 task Force letter oF transMittal Dear President Beale: On behalf of the NCACC Mental Health Engagement Task Force, I would like to present the NC Association of County Commissioners with our report and recommendations. The members of this Task Force have taken your charge to heart. We have an unfailing commitment to improving access to services...»

<<  HOME   |    CONTACTS
2016 www.book.dislib.info - Free e-library - Books, dissertations, abstract

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.