«NOAA Environmental Data Management Framework NOAA is, at its foundation, an environmental information generating organization. Fundamental to ...»
Users of NOAA environmental data may create derived or Value-Added Products. These new products may themselves constitute a new dataset that merits its own lifecycle data management process. NOAA or NOAA-funded projects that routinely create new products should establish and follow a data management plan and ensure the products they generate are discoverable, accessible, and archived.
New products should be linked back to the original source data via appropriate documentation and citation of dataset identifiers (see Section 3.2.9).
Data users should have a mechanism to provide Feedback to NOAA regarding usability, suspected quality issues, and other aspects of its data. Agency point-of-contact information should be included in the metadata. Any feedback received should be acted upon if possible and included in the metadata if appropriate in order to help future users. Limited mechanisms for user feedback, notably Help Desks at each data center, have been established. These require that the user have obtained the data from the Data Center and be willing to engage in dialog. Possible additional approaches include mailing lists or social media.
Citation refers to the ability to unambiguously reference a dataset that was used as input to a model, decision, scientific paper, or other result. This is an emerging topic of broad interest* that will be addressed by NOAA's Data Citation Procedural Directive (in preparation). The Earth Science Information Partnership (ESIP) Federation also provides citation guidelines.† The core concepts are (1) persistent
Tagging refers to the ability to identify a dataset as relevant to some event, phenomenon, purpose, program, or agency without needing to modify the original metadata. Existing examples of tagging include the ability of NOAA users of Google Drive to assign documents to multiple collections without modifying the folder hierarchy, or of Facebook users to tag individuals in a photo without editing the file-level metadata. The ability to tag is essential because the current practices of (a) creating new Catalogs and asking people to re-register a relevant subset of their data there, or (b) asking people to add new metadata tags such that an external project can detect them (e.g., the GEOSS DataCORE activity in 2011), are not scalable because they require additional work and lead to the proliferation of duplicate datasets and metadata records. No specific solution is proposed here, but an appropriate use of collection-level catalogs may support tagging.
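The Framework proposes no specific tagging solution, so the following is only a minimal sketch of the general idea: tags live in a separate index keyed by dataset identifier, and the original metadata records are never edited. The class name `TagIndex`, its methods, and the identifiers and tags shown are hypothetical and chosen for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical external tag index; dataset metadata is never modified."""

    def __init__(self):
        # Two lookup directions: dataset -> tags and tag -> datasets.
        self._tags_by_dataset = defaultdict(set)
        self._datasets_by_tag = defaultdict(set)

    def tag(self, dataset_id, tag):
        """Associate a tag with a dataset identifier."""
        self._tags_by_dataset[dataset_id].add(tag)
        self._datasets_by_tag[tag].add(dataset_id)

    def datasets_for(self, tag):
        """Return the sorted dataset identifiers carrying this tag."""
        return sorted(self._datasets_by_tag.get(tag, set()))

    def tags_for(self, dataset_id):
        """Return the sorted tags attached to this dataset identifier."""
        return sorted(self._tags_by_dataset.get(dataset_id, set()))


# Hypothetical identifiers and tags, for illustration only.
index = TagIndex()
index.tag("dataset-0001", "hurricane-response")
index.tag("dataset-0002", "hurricane-response")
index.tag("dataset-0001", "coastal-flooding")
```

Because a dataset can carry any number of tags and the index is kept apart from the catalog records, no re-registration or metadata edit is needed when a new event or program wants to mark relevant data.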
Gap Analysis refers to the determination by users of data or decision-makers that additional data are needed to satisfy operational requirements or to understand a phenomenon -- for example, more frequent coverage, improved spatial or spectral resolution, or observations of other quantities. Gap analysis may also address continuity of observations to meet operational requirements or enable long-term trend analysis. Such a determination influences the Requirements Definition activity, which is the start of a new Data Lifecycle.
4. Summary
NOAA data constitute an irreplaceable national resource that must be well-documented, discoverable, accessible, and preserved for future use. Good data management should be part of NOAA's core business practices, and employees and leadership should be aware of their roles and responsibilities in this arena. The NOAA Environmental Data Management Framework recommends that EDM activities be coordinated across the agency, properly defined and scoped, and adequately resourced. The Framework defines and categorizes the policies, requirements, and technical considerations relevant to NOAA EDM in terms of Principles, Governance, Resources, Standards, Architecture, Assessment, and the Data Lifecycle. The Framework enumerates specific recommendations in Appendix A.
NOAA thanks the Science Advisory Board for its recommendation in March 2012 that an Environmental Data Management Framework be developed.
Appendix A: Recommendations
The following is a partial list of recommendations that would advance the goals of improved environmental data management at NOAA. They are grouped according to who would be primarily responsible for implementing them.
Data Producers and Observing System owners:
1. Write Data Management Plans (DMPs) and submit them to the EDMC DMP repository.
2. Allocate an appropriate percentage of project funds to managing the resulting data.
3. Initiate the negotiation of submission agreements, including relevant budget requirements, with a NOAA Data Center in advance of data collection.
4. Solicit feedback from users regarding the accessibility, usability and quality of NOAA data, make improvements if appropriate, and report improvements or issues to EDMC or DMIT.
5. Support the Observing System of Record* Data Management Assessment.
6. Ensure that observing requirements and capabilities are included and validated in the NOAA Observing System Architecture (NOSA) and Consolidated Observing Requirements List (CORL) databases maintained by TPIO.†
7. Produce ISO metadata natively for new environmental data.
8. Transition metadata from legacy standards (FGDC CSDGM), non-standard formats, and unstructured documentation to correct and complete ISO metadata records, focusing especially on high-value datasets and observing systems of record.
9. Leverage tools already developed for metadata transformation and quality assessment.
Data Management Integration Team and other technical staff:
10. Use existing domestic and international data, metadata, and protocol standards wherever suitable in preference to ad hoc or proprietary methods. If existing standards seem unsuitable, provide feedback to EDMC or the relevant standards body.
11. Coordinate adoption of interoperability standards by working with cross-NOAA groups such as EDMC and DMIT, and with external coordination groups.
12. Document recommended practices, experiences, examples, useful software, teams, events, etc. on the EDM Wiki.
13. Coordinate enhancements to open-source software via DMIT or other cross-NOAA teams to avoid duplication of effort.
14. Publish on the NOAA EDM Wiki (12) the conventions, profiles and examples adopted to specialize standards for particular data types.
15. Develop a NOAA Cloud Strategy to address deployment scenarios, IT security issues, and procurement mechanisms.
* https://www.nosc.noaa.gov/OSC/sor.php
† https://www.nosc.noaa.gov/tpio/
18. Support development of the Metadata Rubric and Data Management Dashboard.
19. Review data management plans of projects that seek funding approval from the IT Review Board (ITRB).*
20. Identify projects that do not properly document, share, or archive their data. Assist them in adopting good data management practices. Bring them to the attention of NOAA Leadership if necessary.
21. Pre-approve IT security Certification and Accreditation (C&A) for standard software packages to maximize compatibility and minimize the administrative hurdles involved in setting up new servers.
22. Promote reusable software and modular systems for reduced development and maintenance cost.
23. Assess investments in new or upgraded infrastructure components prior to approval regarding use of commodity technologies, ability to support multiple projects, and interoperability.
24. Continue and expand efforts for shared hosting of small datasets.
25. Maintain legacy data exchange mechanisms as needed, but consider adoption of common standards as part of the technology refresh cycle.
26. Promote implementation of modern data access services for all NOAA data collections.
27. Revise IT security policies to make Cloud deployments routine and easier to approve than in-house systems.
28. Decline or postpone projects seeking approval from the ITRB for IT funding if data management planning and budgeting are inadequate.
29. Empower Line Offices to designate an EDM Officer (similar to an IT Security Officer) with the authority and responsibility to oversee and enforce EDM compliance within their Office. Include such duties in the individual's performance plan.
30. Update individual performance plans of all employees who produce, document, or manage data to permit, acknowledge and empower their work.
31. Ensure that personnel responsible for environmental data understand the need for data management and are trained in good EDM practices.
32. Identify or establish a process for transferring program or project funds as needed to the designated long-term archival repository or other appropriate data management entities.
33. Ensure that Federal Funding Opportunities (FFOs) plan for archiving of grant-produced data at a NOAA Data Center.
* The NOAA ITRB name and description are currently under revision. Existing description and terms of reference are at http://www.cio.noaa.gov/IT_Groups/noaa_cio_nitrb.html.
Appendix C: Cloud Computing
Cloud computing refers to the use of shared information technology (IT) resources such as storage, processing or software. Cloud resources can be scaled up or down based on demand. Multiple projects can share resources without each needing to have surplus capacity for the maximum expected load.
Projects can acquire and pay for IT resources on an as-needed basis without maintaining in-house computing facilities. These shared IT resources can be operated either externally by commercial Cloud service providers or internally by one division on behalf of the entire agency. Cloud computing is a fundamental shift from the traditional approach of having each project procure and operate dedicated, in-house IT resources.
The US Chief Information Officer has issued a "Cloud-first" policy (6) and a Federal Cloud Computing Strategy (17). NOAA is required to consider Cloud-based approaches in preference to building or maintaining dedicated IT systems. Within NOAA, the Google Unified Messaging System (UMS) contract for email, calendars, and document sharing is an example of migration to the Cloud. Possible Cloud deployment scenarios for environmental data include:
• The master copy of a NOAA dataset is retained internally at a NOAA Data Center, while a public copy is sent via one-way push to a publicly accessible commercial Cloud where external customers (the private sector, the general public, foreign governments) can obtain data and perhaps invoke additional services (subsetting, visualization, transformation, etc.). A digital signature (checksums or hashes) is produced and compared where appropriate to confirm the authoritativeness of the public copy.
• Non-Real-Time Processing: climate product generation, satellite data reprocessing, and other non-real-time computation are performed on commercial Cloud resources. The resulting products are also disseminated via the Cloud. The input data may already reside in the same Cloud.
Such scenarios would reduce the load on NOAA servers and allow capacity to be quickly ramped up during periods of high demand.
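The checksum comparison mentioned in the first scenario can be sketched as follows. This is a minimal illustration, assuming both copies are available as local files and that SHA-256 is an acceptable hash; the function names and the choice of algorithm are assumptions, not something the Framework specifies.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def public_copy_matches(master_path, public_path):
    """True when the public copy is byte-for-byte identical to the master copy."""
    return sha256_of_file(master_path) == sha256_of_file(public_path)
```

A matching hash shows only that the two copies are identical. Demonstrating who produced a file would additionally require a signature made with a private key, which is outside the scope of this sketch.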
Costs and procurement mechanisms must be assessed carefully in Cloud deployments. Monthly charges for data storage, data retrieval, and computing cycles must be budgeted for and must remain payable across fiscal year boundaries.
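As a rough illustration of the budgeting concern, the sketch below totals monthly charges for storage, retrieval, and computing, then groups them by US federal fiscal year (which begins on 1 October). All class names, function names, and rates are hypothetical; actual pricing depends on the provider and contract.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    storage_gb: float
    retrieval_gb: float
    compute_hours: float


@dataclass
class Rates:
    # Hypothetical unit prices in dollars; not taken from any real provider.
    per_gb_stored: float
    per_gb_retrieved: float
    per_compute_hour: float


def monthly_cost(usage, rates):
    """Sum the three usage-based charges for one month."""
    return (usage.storage_gb * rates.per_gb_stored
            + usage.retrieval_gb * rates.per_gb_retrieved
            + usage.compute_hours * rates.per_compute_hour)


def fiscal_year(year, month):
    """US federal fiscal year: starts 1 October and is named for the year it ends in."""
    return year + 1 if month >= 10 else year


def cost_by_fiscal_year(usage_by_month, rates):
    """Group monthly costs, keyed by (calendar year, month), into fiscal-year totals."""
    totals = {}
    for (year, month), usage in usage_by_month.items():
        fy = fiscal_year(year, month)
        totals[fy] = totals.get(fy, 0.0) + monthly_cost(usage, rates)
    return totals


# Illustrative figures only.
rates = Rates(per_gb_stored=0.5, per_gb_retrieved=0.25, per_compute_hour=2.0)
usage = MonthlyUsage(storage_gb=1000, retrieval_gb=200, compute_hours=10)
plan = {(2013, 8): usage, (2013, 9): usage, (2013, 10): usage, (2013, 11): usage}
totals = cost_by_fiscal_year(plan, rates)
```

With these made-up figures, a four-month deployment that starts in August is split evenly between two fiscal years, which is the situation the budgeting caution above is meant to anticipate.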
Appropriate IT security must be considered when NOAA data are hosted on commercial Cloud services.
NOAA servers must comply with NAO 212-13: NOAA Information Technology Security Policy (13). Cloud deployments may reduce information technology (IT) security risks to NOAA systems by placing public-facing servers outside the NOAA security boundary. The General Services Administration (GSA) Federal Risk and Authorization Management Program (FedRAMP)* was established to ensure secure cloud computing for the federal government. Only vendors authorized by FedRAMP may be used. [Note: as of
* http://www.gsa.gov/portal/category/102371
Appendix D: References