Prototype Geospatial Data Integration Framework for Police Blotter Crime Analysis
Raytheon-UTD Collaboration
1. OBJECTIVES
The overall objective of this proposal is to model and mine geospatial (GS) patterns in multi-jurisdiction and multi-temporal (MJMT) datasets in order to accurately track, monitor, and predict human activities. Several scientific and technical challenges arise when modeling GS patterns in MJMT datasets because the datasets are spatio-temporal and heterogeneous. These challenges are major barriers to progress in Geospatial Information Science (G.I.Sc.) fields such as environmental criminology.
Thus, we propose the following:
To develop a prototype geospatial data integration framework for MJMT datasets to conduct crime analysis based on police blotters. Our future work will include pattern analysis, social network analysis, security, and uncertainty reasoning.
2. CURRENT STATE OF POLICE BLOTTER CRIME ANALYSIS AND
The critical barrier of modeling MJMT data heterogeneities and rich pattern semantics prevents environmental criminologists (e.g., many at the Ninth Crime Mapping Research Conference, 3/2007, http://www.ojp.usdoj.gov/nij/maps/pittsburgh2007/index.html) from quickly identifying many GS patterns, which are crucial for timely intervention in crime prevention. Three significant limitations of current environmental criminology and geospatial data analysis techniques create this barrier. Let us look at these in further detail.
Figure 2: Activity Levels by Jurisdiction. (a) Jul 19 to Jul 26, 2004; (b) Jul 26 to Aug 2, 2004. Source: http://www.diligencellc.com. Best viewed in color. Caution: use the numeric activity count data on the right for trend analysis; color codes are not directly comparable across Figures 2a and 2b.
First, traditional approaches do not explicitly model temporal semantics such as trends or periodic patterns. For example, Figure 2, particularly the numeric activity count data, shows a diminishing trend in the number of insurgent incidents across multiple provinces from July 19-26 (Figure 2a) to July 26-Aug 2, 2004 (Figure 2b). Notice the highlighted entries in the numeric activity count data in Figure 2, e.g., Anbar, where the number of insurgent incidents diminished in a matter of weeks. Timely identification of such GS patterns is crucial for improving public safety. However, it takes an enormous amount of time and human effort to identify GS patterns using current tools and techniques, particularly for MJMT datasets.
We are particularly interested in police blotter crime analysis. A police blotter is the daily written record of events (such as arrests) at a police station, and every police station releases one. These records are publicly available on the web, which gives us a wealth of information for analyzing crime patterns across multiple jurisdictions. Police blotters that are available to the public or shared between police departments are generated from legacy systems and may also be published as web documents. A police officer who wants to analyze several police blotters to study a pattern (e.g., a spatio-temporal activity pattern) or a trail of events faces major challenges. There is currently no way for an officer to pose a query that is answered on the fly from more than one distributed police blotter. With the advance of Web 2.0, some mashups combine Google Maps with the police blotters of certain counties, but there is no cohesive tool that lets an officer view blotters from different counties, interact with and visualize the trail of crimes, and generate analysis reports. With current tools, blotters can be searched only by keyword; these tools do not support conceptual search, fail to identify spatio-temporal patterns, and cannot connect the various pieces. Therefore, we need a tool that integrates multiple distributed police blotters, extracts semantic information from each blotter, and provides a seamless framework for queries at multiple granularities.
3. PROPOSED APPROACH
To address the limitations discussed above, we will transfer the research we are conducting for Raytheon and augment it by accomplishing the following in developing fully fledged prototype systems. We will use police blotters as our application.
Police blotters come from legacy systems, which causes data integration problems. The blotters may arrive in different data formats, such as HTML and PDF. A Semantic Web Service interface gives us the capability to integrate these varied formats and to semi-automate the integration of different data sources into a unified view. In addition, the crime information reported in police blotters is not in a form that a machine can readily interpret to draw the inferences and assertions needed in a scenario such as the one described below.
Here we consider a very recent real event, the shootings at Virginia Tech, which has again raised the need to put robust emergency response tools in the hands of the police so that they can act on and handle emergencies. Police blotters for university crime are held by different university police departments, and blotters are also available from the counties of major cities such as Dallas [data set 1]. An efficient way is needed to integrate this information, analyze the patterns, and produce a trail of similar events that helps catch the suspect more quickly.
Geo-Spatio-temporal Data Integration
Many environmental criminology techniques assume that data are locally maintained and that the dataset is homogeneous and certain. This assumption is not realistic: GS data are often managed by different jurisdictions, so the analyst may have to spend an unusually large amount of time linking related events across jurisdictions (e.g., the sniper shootings across Washington DC, Virginia and Maryland in October 2002).
A major challenge in integrating heterogeneous crime data sources is semantic heterogeneity. Semantic heterogeneity occurs when there is a definition mismatch across MJMT datasets (e.g., robbery is a kind of crime). Naming heterogeneity may also exist (e.g., theft and burglary are two different terms but convey the same semantic meaning or concept). Current data-integration approaches (e.g., GML, wrappers, GS ontologies) do not adequately model police blotter concepts and heterogeneities. The lack of integrated, geospatially based police blotter activity datasets makes it tedious to analyze MJMT activity patterns. Thus, we propose the development of GS ontologies as a first step toward meeting the challenges of integrating heterogeneous data sources. These GS ontologies will be integrated with other ontologies that provide, for example, definitions of various environmental criminology terms.
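To make the two kinds of heterogeneity concrete, the following minimal Python sketch shows how a small ontology could normalize naming differences and answer is-a questions. The term tables and the helper names (`SYNONYMS`, `IS_A`, `canonical`, `is_kind_of`) are assumptions made for illustration; they are not taken from the GS ontologies proposed here.

```python
# Illustrative only: these tables are hypothetical stand-ins for a real ontology.
# Naming heterogeneity: different source terms that map to one concept.
# (The proposal's own example treats "theft" and "burglary" as one concept.)
SYNONYMS = {"burglary": "theft", "larceny": "theft"}

# Definition hierarchy: child concept -> parent concept (is-a links).
IS_A = {"theft": "crime", "robbery": "crime", "armed robbery": "robbery"}


def canonical(term):
    """Normalize case and whitespace, then resolve synonyms to one concept."""
    cleaned = " ".join(term.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


def is_kind_of(term, concept):
    """Return True if `term` equals `concept` or descends from it via is-a links."""
    current = canonical(term)
    target = canonical(concept)
    while True:
        if current == target:
            return True
        if current not in IS_A:
            return False
        current = IS_A[current]
```

With these tables, `is_kind_of("armed robbery", "crime")` holds, so a query for crimes in one jurisdiction would also match records that another jurisdiction labels more specifically.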
We propose dynamic integration of MJMT data through Semantic Web services (SWS). The SWS framework allows intelligent data retrieval by annotating the WSDL (Web Services Description Language) profile of each agency’s data dissemination point. We have developed a prototype for the dynamic geospatial data integration task that is capable of responding to complex client queries. The prototype will be enhanced to handle non-geospatial data sources such as criminal data (or money transaction audits). We also plan to augment the prototype by building pluggable automated tools that will relieve field agents of having to perform complex queries.
The primary interface for field agents will contain pre-completed queries, which can be run against each individual. However, the field agents’ managers, who have better domain knowledge, can still use a sophisticated query interface.
Our solution assumes that the data from the agencies is encoded in one of the commonly used formats (e.g., PDF, XML, HTML, Relational, RDF). The integration will be a two-step process. In the first step, the SWS component retrieves information from each data source with a very high accuracy level. Since this information is still disparate (e.g., an XML file containing criminal history cannot be immediately combined with an Oracle database containing a denied persons list), a second step has to combine it seamlessly to construct a decision. This is the data mapping part of the integration process. Two methods of data mapping will be used in the second step: the ontology method and the adaptor method. Our ultimate goal is to map all the retrieved data into a single, unified format. An ontology is a formal description of domain concepts and their mutual relationships. Using an ontology, we can map concepts from disparate domains into the unified format. If there is no suitable ontology for a domain, an adaptor (also called a converter or wrapper) is built into the system to convert the source data into the unified format. Adaptors will be used to translate legacy databases into our target format.
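The adaptor method described above can be sketched in Python as one converter per source format, each producing records in a shared layout. This is a minimal sketch under stated assumptions: the unified fields (`offense`, `date`, `jurisdiction`), the XML element names, the CSV column names, and all function names are hypothetical, and CSV text stands in for a relational export.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_adaptor(text, jurisdiction):
    """Convert a hypothetical <blotter><incident>... XML feed to unified records."""
    root = ET.fromstring(text)
    return [
        {
            "offense": incident.findtext("type", default="").strip().lower(),
            "date": incident.findtext("date", default="").strip(),
            "jurisdiction": jurisdiction,
        }
        for incident in root.findall("incident")
    ]


def csv_adaptor(text, jurisdiction):
    """Convert a hypothetical CSV export (OFFENSE, REPORTED columns) to unified records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "offense": row["OFFENSE"].strip().lower(),
            "date": row["REPORTED"].strip(),
            "jurisdiction": jurisdiction,
        }
        for row in reader
    ]


# One adaptor per supported source format (assumed registry).
ADAPTORS = {"xml": xml_adaptor, "csv": csv_adaptor}


def integrate(sources):
    """Map every (format, text, jurisdiction) source into one unified list."""
    unified = []
    for fmt, text, jurisdiction in sources:
        unified.extend(ADAPTORS[fmt](text, jurisdiction))
    return unified
```

Keeping the adaptors behind a single registry means that a new legacy format only needs one additional converter; the code that consumes the unified records does not change.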
4. DATASETS AVAILABLE
1. Police Blotters for Dallas County available online http://www.dallasnews.com/sharedcontent/dws/news/city/collin/blotter/vitindex.html
2. Semantic Access Ontology, GRDF Ontology, Geospatial Services Ontology.
3. ClearForest Semantic Web Services for Text Mining and Analysis:
5. TASKS AND DELIVERABLES
Task 1: Semantic Search Browser for Police Blotters:
To provide a semantic-level browser that integrates the blotters from various counties or geographic regions and gives the police officer an interface for querying with different input criteria, such as (a) by crime type (e.g., rape cases), (b) by time period (e.g., the first week of April 2007), (c) by suspect personal information (e.g., the criminal activity of Mr. X), and (d) by geographic region using ZIP code or city (e.g., list all sex offenders in the City of Dallas).
We have already developed a semantic framework, DAGIS, which can handle queries of this nature. The blotters will be exposed through Web services for general input criteria, and Semantic Web Services built on these exposed Web services will provide the capability to perform more conceptual searches and to compose services dynamically on the fly to handle various complex queries. Therefore, the problem of integrating multiple police blotters will be addressed. In addition, this approach resolves semantic heterogeneities across blotters and provides automated discovery of knowledge.
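The four input criteria listed for Task 1 can be pictured as optional filters over already-integrated blotter entries. The Python sketch below is an illustration only, not the DAGIS interface: the `BlotterEntry` fields and the `search` function are hypothetical names, and the matching is plain equality rather than the conceptual search the proposal targets.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BlotterEntry:
    """Hypothetical unified blotter record."""
    offense: str
    reported: date
    suspect: str
    city: str
    zip_code: str


def _same(a, b):
    """Case-insensitive text comparison."""
    return a.lower() == b.lower()


def search(entries, offense=None, start=None, end=None,
           suspect=None, city=None, zip_code=None):
    """Return entries matching every criterion that was supplied."""
    results = []
    for e in entries:
        if offense is not None and not _same(e.offense, offense):
            continue  # (a) crime type
        if start is not None and e.reported < start:
            continue  # (b) time period, lower bound
        if end is not None and e.reported > end:
            continue  # (b) time period, upper bound
        if suspect is not None and not _same(e.suspect, suspect):
            continue  # (c) suspect information
        if city is not None and not _same(e.city, city):
            continue  # (d) geographic region by city
        if zip_code is not None and e.zip_code != zip_code:
            continue  # (d) geographic region by ZIP code
        results.append(e)
    return results
```

For example, `search(entries, start=date(2007, 4, 1), end=date(2007, 4, 7), city="Dallas")` would return the Dallas entries reported in the first week of April 2007.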
Task 2: Tools for Generating Crime Analysis Concepts from Blotters:
Information available in the blotters will be mined with data mining tools that we develop; these tools will generate the concepts that are mapped to build an ontology for crime analysis across multiple jurisdictions. These tools will also be exposed as Semantic Web services and can be integrated with the semantic search browser developed in Task 1. In DAGIS we have developed an OWL-S based Semantic Web service for the ClearForest Text Mining Semantic Web Services. In the future, we would like to develop techniques that generate a semantic representation of concepts and their relationships from a given police blotter report.
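As a rough illustration of how candidate concepts might be pulled from blotter text, the Python sketch below counts the terms that recur across reports. It is a simple document-frequency heuristic swapped in for illustration, not the ClearForest services or the mining tools planned for Task 2; the stopword list, the `min_reports` threshold, and the function name are assumptions.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real tool would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "was", "were",
             "and", "to", "from", "by"}


def candidate_concepts(reports, min_reports=2):
    """Return terms that appear in at least `min_reports` distinct reports."""
    counts = Counter()
    for text in reports:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Count each term once per report so one long report cannot dominate.
        counts.update({t for t in tokens if t not in STOPWORDS})
    return sorted(term for term, n in counts.items() if n >= min_reports)
```

Terms that survive the threshold are only candidates; deciding which of them become ontology concepts, and how they relate, would still require the mapping step described above.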