«NOAA Environmental Data Management Framework NOAA is, at its foundation, an environmental information generating organization. Fundamental to ...»
This section discusses the overarching themes of the framework. Section 3 introduces the concept of the Data Lifecycle and discusses the interrelated activities that occur during the life of a particular dataset.
Figure 2: The Environmental Data Management Framework includes Principles, Governance, Resources, Standards, Architecture, and Assessment that apply broadly to many classes of data, and individual Data Lifecycles for particular data collections.
2.1. Principles
The following basic principles (full and open access, long-term preservation, information quality, and ease of use) generally apply to all NOAA environmental data, though there may be exceptions for particular datasets on a case-by-case basis (such as proprietary or confidential data).
These principles are further explained in the following subsections.
2.1.1. Full and Open Access
In general, data managed or paid for using federal funds should be available to the public as soon as possible after collection, in a non-discriminatory manner, and at minimum cost. It is not necessary to distribute data to the public directly from the operational data processing systems as long as data are made available at an appropriate point downstream. Exceptions to this principle should be rare and explicitly justified on a case-by-case basis. (For example, data may contain confidential or personally identifiable information; data purchased from commercial vendors may not be redistributable; data distribution may be restricted by Memorandum or other agreement; open access may not apply to every part of a satellite data stream handled by NOAA, because NOAA may be operating satellites owned by other organizations or there may be NOAA instruments on non-NOAA satellites.)
• Timeliness: NOAA data should be made publicly available with minimum time delay after capture. The timeliness may not be the same in all cases -- for example, routine, ongoing observations by automated sensors will be more promptly available than the results of sporadic, labor-intensive data collection. Data calibration, processing, and quality control processes should be automated whenever possible to minimize any delays. In limited circumstances, some scientific investigations may permit a temporary data hold (typically not more than 1-2 years) before distribution.
• Non-discrimination: NOAA data should be made publicly available to the widest community possible. NOAA data should be approved for general release and distributed in a manner that does not unfairly hinder access unless a specific exemption has been granted. Possible exceptions to open access include data whose public dissemination is prohibited by law (e.g., personally identifiable or proprietary information), by commercial agreement, or for reasons of national security (e.g., classified information).
• Minimum cost: NOAA data should be made available free of charge to the greatest extent possible, and certainly free of profit. Data should be made available and accessible online via web services or other internet-based mechanisms whenever possible. In limited circumstances, the cost of reproduction may be charged to the user when it is necessary to ship data on physical media or when specialized or certified products must be created to satisfy a particular request.
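The timeliness and exception rules above can be expressed as a simple check. The following Python sketch is illustrative only: the `Dataset` fields, the `public_release_date` function, and the `MAX_HOLD_DAYS` limit are hypothetical names chosen for this example, and the two-year cap is merely one reading of the "typically not more than 1-2 years" guidance, not an official parameter.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: upper bound for a temporary hold, read from the
# "typically not more than 1-2 years" wording. Not an official value.
MAX_HOLD_DAYS = 2 * 365


@dataclass
class Dataset:
    name: str
    collected: date
    restricted: bool = False             # e.g. confidential, proprietary, classified
    justification: Optional[str] = None  # expected whenever restricted is True
    hold_days: int = 0                   # temporary hold for a scientific investigation


def public_release_date(ds: Dataset) -> Optional[date]:
    """Return the earliest public release date, or None if access is restricted."""
    if ds.restricted:
        # Exceptions should be rare and explicitly justified case by case.
        if not ds.justification:
            raise ValueError(f"{ds.name}: restriction must be explicitly justified")
        return None
    if not 0 <= ds.hold_days <= MAX_HOLD_DAYS:
        raise ValueError(f"{ds.name}: hold of {ds.hold_days} days is outside the allowed range")
    return ds.collected + timedelta(days=ds.hold_days)
```

For instance, a routine automated observation with no hold would be releasable on its collection date, while a dataset flagged as restricted without a stated justification would be rejected by the check rather than silently withheld.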
Version 1.0 2013-03-14
2.1.2. Long-Term Preservation
Earth observations are not reproducible after the moment of measurement has passed, and are often acquired using costly technologies such as satellites, ships, aircraft, advanced sensors, open-ocean buoys, autonomous vehicles, and human observers. These observations should be managed as agency and national assets, preserved for future use, and protected from unintended or malicious modification.
Data should not only be preserved in their original form but should be actively stewarded to ensure continuing usability.
2.1.3. Information Quality
Environmental data and metadata should be of known quality, and ideally of good quality. Explanations of quality control (QC) processes, and the resulting quality assessment itself, should be included or referenced in data documentation. See Sections 3.2.3 and 3.2.4 for further information regarding QC and Data Documentation.
Raw data may be distributed in (near) real time before QC and documentation have been completed, but it must be clearly communicated to prospective users that the quality may not be known when data are provided on an “as-is” basis.
2.1.4. Ease of Use
To encourage the broadest possible use of NOAA data, users should be able to find observations and derived products easily through search engines, catalogs, web portals, or other means. Data should typically be made available and accessible via web services or other internet-based mechanisms rather than by shipping physical media or by establishing dedicated or proprietary linkages. These services should comply with non-proprietary interoperability specifications for geospatial data. Data should be offered in formats that are known to work with a broad range of scientific or decision-support tools.
Common vocabularies, semantics, and data models should be employed. Feedback from users should be gathered and should guide usability improvements. Users should be able to unambiguously cite datasets, both for later reuse and to provide credit and traceability to the originator. These topics are discussed in more detail in Sections 3.2 and 3.3.
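As an illustration of unambiguous dataset citation, the Python sketch below assembles a citation string that carries a persistent identifier. The field names, the ordering of elements, and the `to_text` method are assumptions made for this example; this document does not prescribe a citation format, and the identifier shown in the usage note is a placeholder rather than a real one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCitation:
    originator: str   # person or organization credited with the data
    year: int         # publication or release year
    title: str
    version: str
    archive: str      # the organization maintaining the dataset
    identifier: str   # persistent identifier (placeholder values only here)

    def to_text(self) -> str:
        """Render the citation in one hypothetical, human-readable layout."""
        return (f"{self.originator} ({self.year}). {self.title}, "
                f"version {self.version}. {self.archive}. {self.identifier}")
```

Calling `DatasetCitation("Example Program", 2013, "Example Buoy Observations", "1.0", "Example Data Center", "doi:10.0000/placeholder").to_text()` yields a single line that names the originator, the exact version, and the identifier, which is what later reuse and traceability depend on.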
2.2.1. NOAA bodies with policy or technical authority over data management
Figure 3 illustrates the agency bodies that play a direct role in governance of environmental data management at NOAA. We discuss their activities in this section.
The Environmental Data Management Committee (EDMC)* is a nexus of EDM governance activities at NOAA. EDMC was established in 2010 by NOAA Administrative Order (NAO) 212-15 (2), and reports to both the Chief Information Officers (CIO) Council† and the NOAA Observing Systems Council (NOSC)‡.
EDMC is a voting body with representatives from NESDIS, NMFS, NOS, NWS, OAR, OMAO, PPI, the NOAA Data Management Architect (DMA), and the NOAA Enterprise Architect (EA).
Figure 3: Governance structure for environmental data management at NOAA. Solid lines indicate reporting authority; dashed lines indicate liaison or advisory relations. The NOAA National Data Centers are technically within NESDIS but operate on behalf of the entire agency, and are therefore shown as reporting to NEC & NEP for simplicity.
The Data Management Integration Team (DMIT)§ is a cross-NOAA group composed of technical experts in web services, metadata, archiving, and other relevant fields. DMIT members provide guidance and support via a mailing list and telecons. All Data Centers and significant data-producing or data-management projects should have a DMIT representative.
The NOAA National Data Centers -- the National Climatic Data Center (NCDC), National Geophysical Data Center (NGDC), and National Oceanographic Data Center (NODC) -- have policies and procedures
* https://www.nosc.noaa.gov/EDMC/
† http://www.cio.noaa.gov/IT_Groups/noaa_cio_CIOCouncil.html
‡ https://www.nosc.noaa.gov/
§ https://geo-ide.noaa.gov/wiki/index.php?title=Category:Data_Management_Integration_Team
Individual programs and projects are also responsible for sound data management practices. Leaders of these programs have some discretion regarding technical implementation, but are encouraged to maximize compatibility and reduce development and maintenance costs by coordinating with each other, with the Data Centers, and with EDMC and DMIT.
The Science Advisory Board (SAB)*, particularly through its standing Data Access and Archiving Requirements Working Group (DAARWG)†, performs an external oversight role regarding data management activities. The development of this EDM Framework was recommended by DAARWG and SAB‡.
2.2.2. NOAA policies and documents relating to data management
NOAA's Next Generation Strategic Plan (NGSP) (1) makes numerous references to the need for good data management practices. The NGSP declares that NOAA's Mission is Science, Service and Stewardship, where "Service is the communication of NOAA’s research, data, information, and knowledge for use by the Nation’s businesses, communities, and people’s daily lives." One of NOAA's Objectives is "Accurate and reliable data from sustained and integrated Earth observing systems." The
NOAA will research, develop, deploy, and operate systems to collect remote and in situ observations, and manage and share data through partnerships and standards… Fundamental … is an increased focus on information management standards and strategies to improve access, interoperability, and usability of NOAA’s environmental information resources... Evidence of progress includes … Improved data interoperability and usability through application and use of common data management standards.
The Annual Guidance Memorandum (AGM), AGM Implementation Plans, and the Corporate Portfolio Analysis (CPA) Decision Memorandum, all part of NOAA's Strategic Execution and Evaluation (SEE) process, provide general direction regarding priorities and budget for all NOAA activities including those involving data management. Corporate issues and activities relating to environmental data management are codified in the NGSP Implementation Plan of the Enterprise Objective on Reliable Data from Integrated Earth Observing System. The EDMC will implement activities resulting from SEE decisions via NOSC direction.
* http://www.sab.noaa.gov/
† https://www.nosc.noaa.gov/EDMC/DAARWG/index.php
‡ http://www.sab.noaa.gov/Reports/Reports.html
NOAA Administrative Order (NAO) 212-15 (2) establishes environmental data management policy for NOAA and provides high-level guidance for procedures, decisions and actions regarding EDM. NAO 212-15 provides the EDMC with the authority to develop and approve Procedural Directives (PDs). Four PDs have been issued, and two others are currently in development:
• Data Management Planning Procedural Directive (7): Directs managers of all data production projects and systems to plan in advance for data management, and contains a planning template with questions to be addressed by data production projects.
• Procedure for Scientific Records Appraisal and Archive Approval (8): Defines the process used to identify and appraise scientific records for NOAA archiving.
• Data Documentation Procedural Directive (9): States that all NOAA data collections, and products derived from these data, and services that provide NOAA data and products, shall be documented. Establishes a metadata content standard (International Organization for Standardization [ISO] 19115 Parts 1 and 2) and a recommended representation standard (Extensible Markup Language [XML] formatted per the ISO 19139 schema) for documenting NOAA’s environmental data and information.
• Data Sharing for NOAA Grants Procedural Directive (10): States that all NOAA Grantees must share data produced under NOAA grants and cooperative agreements in a timely fashion, except where limited by law, regulation, policy or security requirements. Grantees must address this requirement formally by preparing a Data Sharing Plan as part of their grant project narrative, and by sharing data from funded projects within not more than two years. Specific language has been approved by NOAA Office of General Counsel for inclusion in announcements of opportunity and notices of award.
• Data Access Procedural Directive (in preparation): States that all NOAA environmental data shall be made accessible via the Internet, except in limited circumstances, and discusses appropriate services and formats. (Expected to be issued in 2013.)
• Data Citation Procedural Directive (in preparation): States that NOAA datasets shall be assigned a persistent identifier, with a corresponding documentation page maintained by a NOAA Data Center. Urges data users to cite datasets used in papers, decisions and other products, and recommends a citation format including the identifier. (Expected to be issued in 2013.)
• External Data Usage Recommended Practice (11): Provides a worksheet of potential issues to consider when using non-NOAA data.
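To make the Data Documentation directive's representation standard more concrete, the following Python sketch builds a skeletal XML fragment using the ISO 19139 `gmd` and `gco` namespaces. It is a minimal illustration under stated assumptions, not a schema-valid record: a real ISO 19115 record requires many more elements (contact, date stamp, abstract, and others), and the helper names `_text_element` and `minimal_record` as well as the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19139 XML encoding of ISO 19115 metadata.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _text_element(parent, tag, text):
    """Add <gmd:tag><gco:CharacterString>text</gco:CharacterString></gmd:tag>."""
    wrapper = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    value = ET.SubElement(wrapper, f"{{{GCO}}}CharacterString")
    value.text = text
    return wrapper


def minimal_record(file_id: str, title: str) -> str:
    """Return a skeletal (not schema-valid) metadata fragment as a string."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    _text_element(root, "fileIdentifier", file_id)
    ident = ET.SubElement(root, f"{{{GMD}}}identificationInfo")
    data_id = ET.SubElement(ident, f"{{{GMD}}}MD_DataIdentification")
    citation = ET.SubElement(data_id, f"{{{GMD}}}citation")
    ci = ET.SubElement(citation, f"{{{GMD}}}CI_Citation")
    _text_element(ci, "title", title)
    return ET.tostring(root, encoding="unicode")
```

A call such as `minimal_record("example-id", "Example buoy observations")` returns the fragment as text; in practice, a record like this would be completed and validated against the ISO 19139 schema before being published.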