Data Tool Kit - Open Data Handbook
This handbook provides guidance on publishing data in an open data format, with the aim of creating a consistent and repeatable set of guidelines and procedures for individuals to follow. It is a living resource that will be updated as new issues are identified or new workflows are created. The handbook is being developed by the Water Board’s Office of Information Management and Analysis (OIMA), which is leading the Water Board’s Open Data Initiative. It will serve as a resource to facilitate open data publishing and to coordinate open data efforts across the Water Boards’ Divisions, Regions, and Programs.
Other organizations have published similar open data handbooks. Some examples include:
- California Health and Human Services (CHHS) Open Data Handbook
- California Open Data Handbook
- New York State Open Data Handbook
- The White House’s Project Open Data
Background
On July 10, 2018, the State Water Board adopted resolution number 2018-0032, titled Adopting Principles of Open Data as a Core Value and Directing Programs and Activities to Implement Strategic Actions to Improve Data Accessibility and Associated Innovation. This resolution commits the State Water Board to following several core principles for open data, including striving to make all critical public data available in machine readable datasets with metadata and data dictionaries. The resolution also directs OIMA to develop a Data Management Strategy that will guide the State Water Board’s efforts to implement practices and procedures consistent with open data principles, and this Open Data Publishing Guide is one part of that Data Management Strategy.
Open data principles generally specify that datasets should be public, accessible, described, reusable, complete, timely, and managed post-release. Managing data as an asset in this manner promotes transparency, accountability, and innovation, by making it as easy as possible for anyone to analyze and utilize publicly available government data, and providing derivative value that often cannot be predicted. Ultimately, making public data more accessible and usable helps facilitate the process of turning the data that we already collect into actionable information, and democratizes access to that data.
Data Quality
In general, data quality assurance should be a part of the data collection and management efforts that occur prior to publishing datasets in an open data format. Therefore, extensive data cleaning should not generally be a significant part of the open data publishing process. However, to the extent possible, the data provider should work with OIMA to develop a plan to perform basic checks for usability before publishing the data, such as:
- checking to make sure that special characters that could cause formatting issues are removed;
- checking to make sure that all records within a field are of a consistent type (for example, if a field should be numeric, ensuring that all records can be treated as a number);
- checking for and addressing obvious outliers; and
- verifying the reasonableness of any geospatial data (for example, making sure that any points with associated location data are within an expected area when plotted on a map).
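The basic checks listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not an OIMA-endorsed procedure. The function names, the field names (`result`, `latitude`, `longitude`), the interquartile-range outlier rule, and the rough California bounding box are all assumptions chosen for the example, and should be adapted to the dataset being published.

```python
import math
import statistics
import unicodedata

# Rough bounding box for California, in decimal degrees.
# Assumption for illustration only; use the area expected for your data.
CA_BOUNDS = {"lat": (32.5, 42.0), "lon": (-124.5, -114.1)}


def clean_text(text):
    """Replace control characters (tabs, newlines, etc.) with spaces
    and collapse repeated whitespace."""
    replaced = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch for ch in text
    )
    return " ".join(replaced.split())


def non_numeric_rows(rows, field):
    """Return indexes of rows whose `field` value is not a finite number."""
    bad = []
    for i, row in enumerate(rows):
        try:
            value = float(row[field])
        except (KeyError, TypeError, ValueError):
            bad.append(i)
            continue
        if not math.isfinite(value):
            bad.append(i)
    return bad


def iqr_outliers(values, k=1.5):
    """Return values more than k interquartile ranges outside the quartiles.
    This is one common rule of thumb, not the only way to flag outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def points_outside(rows, bounds=CA_BOUNDS):
    """Return indexes of rows whose coordinates fall outside `bounds`.
    Assumes the coordinate fields have already passed the numeric check."""
    (lat_min, lat_max), (lon_min, lon_max) = bounds["lat"], bounds["lon"]
    outside = []
    for i, row in enumerate(rows):
        lat, lon = float(row["latitude"]), float(row["longitude"])
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            outside.append(i)
    return outside
```

Each function only reports or cleans values; deciding whether a flagged record is an error still requires review by someone who knows the data. For example, a longitude entered without its negative sign would be reported by `points_outside`, and the data provider could then correct it in the source dataset.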
Despite the data provider’s best efforts, publishing data in an open data format may sometimes expose data quality issues and create a need to address data quality concerns that are raised as a result. Therefore, it is also important to be aware of and/or plan for potential needs to:
- revise the published dataset (e.g., a dataset published on an open data portal);
- apply the revisions to the source dataset (e.g., an internal database used to store the data); and
- notify users of the published dataset of significant changes to previously published data.
Open Data Publishing at the Water Boards
Making data open and accessible to the public is a critical step in meeting the commitments of the Open Data Resolution. The exact process for publishing data openly on the California Open Data Portal or another open platform depends on who is publishing the data and on the condition, type, and source of that data. If you are interested in publishing data in an open data format, we recommend:
- Reviewing the resources on data.ca.gov.
- Documenting metadata (see Recommendations and Guidelines for Data Dictionary Development for more information).
- Contacting OIMA (OIMA-Helpdesk@waterboards.ca.gov) for specific guidance or support with navigating through the open data publication process at the Water Boards.