Data Tool Kit - Glossary
If you are new to the data scene, it can be overwhelming to know where to start. Here we explain some foundational data concepts. If you encounter jargon or terminology you aren’t familiar with, please see the Glossary for definitions.
Do you think more terms should be added to this glossary? Let us know by taking the Water Board Data Tool Kit Feedback Survey!
Glossary
Term | Definition | Relevant Handbook |
---|---|---|
Computer vision | Techniques for gaining understanding from digital images or videos, essentially transforming that imagery into data that can be analyzed and used to develop models and predictions. | Machine Learning Handbook |
Data | A set of values representing a specific concept. Data become “information” when analyzed and interpreted in a way that extracts meaning and provides context. Definition adapted from: Data Standards | People's Data Science Handbook |
Data dictionary | A collection of definitions, attributes, and allowable values for each data element. | Data Management Handbook |
Data science | The process of obtaining information and insights from data. | People's Data Science Handbook |
Dataset | The completed combination of data, metadata, and data dictionary. | Data Management Handbook |
Datathon | Data engagement events that tend to involve more discussion about the question(s) at hand and the data available to answer them; questions may need to be refined and data found before participants can work on answering the questions with data. | Data Engagement Handbook |
Hackathon | Data engagement events where participants use already defined questions or objectives and available data to build prototypes of analyses, visualizations, or interactive tools that answer refined questions. | Data Engagement Handbook |
Human-readable | Data that people can read but a computer cannot automatically read and process. This could be non-digital material (e.g., printed documents or data sheets) or digital material whose contents a computer cannot readily extract (e.g., PDFs or unformatted Excel spreadsheets). | Data Management Handbook |
Machine learning | A field of study that gives computers the capability to learn without being explicitly programmed. | Machine Learning Handbook |
Machine-readable | Data that can be automatically read and processed by a computer (e.g., CSV, JSON, XML). | Data Management Handbook |
Metadata | Data about your data: the who, what, when, where, why, and how of your data. | Data Management Handbook |
Open data | Open data principles generally specify that datasets should be public, accessible, described, reusable, complete, timely, and managed post-release. | Open Data Handbook |
Open source code | Open source code refers to code that is made freely available for anyone to use, modify, and share. Examples of open source software include MySQL, Firefox, and WordPress. | Open Source Code Handbook |
Predictive modeling | Techniques used to gain understanding and make predictions from tabular datasets. | Machine Learning Handbook |
Proprietary source code | Proprietary source code is copyrighted by a company or individual and not shared with others. Examples of proprietary software include Microsoft Office and Adobe Photoshop. | Open Source Code Handbook |
Reinforcement learning | A form of learning where the algorithm learns to react to the environment and is rewarded when it gets something correct. | Machine Learning Handbook |
Supervised learning | A form of learning where we are tasked with finding patterns from a dataset where each observation has a label. | Machine Learning Handbook |
Tidy data | Data that are structured such that the data are easy to manipulate, model, and visualize. | Data Management Handbook |
Unsupervised learning | A form of learning where we are tasked with finding patterns when we don’t know what the “right answers” or labels for the outputs should be. | Machine Learning Handbook |
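To make the Data dictionary entry more concrete, the Python sketch below stores a small, hypothetical dictionary for three data elements (`site_id`, `parameter`, `value`) and checks a record against each element's expected type and allowable values. The field names, allowable values, minimum, and the `validate` helper are illustrative assumptions, not part of any Water Board data standard.

```python
# Hypothetical data dictionary: for each data element, a definition,
# an expected Python type, and (optionally) allowable values or a minimum.
DATA_DICTIONARY = {
    "site_id": {"definition": "Monitoring site identifier", "type": str},
    "parameter": {
        "definition": "Analyte that was measured",
        "type": str,
        "allowed": {"pH", "Dissolved oxygen"},
    },
    "value": {"definition": "Measured result", "type": float, "min": 0.0},
}


def validate(record, dictionary):
    """Return a list of problems found when checking one record against the dictionary."""
    problems = []
    for field, spec in dictionary.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        # Strict type check: e.g., an int is rejected where a float is expected.
        if not isinstance(value, spec["type"]):
            problems.append(f"{field}: expected {spec['type'].__name__}")
            continue
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{field}: {value!r} is not an allowable value")
        if "min" in spec and value < spec["min"]:
            problems.append(f"{field}: below minimum {spec['min']}")
    return problems


good = {"site_id": "S1", "parameter": "pH", "value": 7.2}
bad = {"site_id": "S2", "parameter": "Turbidty", "value": -1.0}
print(validate(good, DATA_DICTIONARY))  # []
print(validate(bad, DATA_DICTIONARY))
```

Keeping the dictionary as data (rather than hard-coding checks) means the same definitions can be published alongside the data and reused for validation.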
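The Machine-readable entry lists CSV and JSON as examples. The Python sketch below, using only the standard library, parses the same two hypothetical pH readings from each format and confirms they yield identical records. A PDF table holding the same numbers would, by contrast, need manual transcription or extra extraction tooling first. The readings and helper names are assumptions for illustration.

```python
import csv
import io
import json

# The same two hypothetical pH readings, written in two machine-readable formats.
CSV_TEXT = "site_id,parameter,value\nS1,pH,7.2\nS2,pH,6.8\n"
JSON_TEXT = (
    '[{"site_id": "S1", "parameter": "pH", "value": 7.2},'
    ' {"site_id": "S2", "parameter": "pH", "value": 6.8}]'
)


def read_csv_records(text):
    """Parse CSV text into dicts; CSV has no types, so numbers arrive as strings."""
    records = list(csv.DictReader(io.StringIO(text)))
    for record in records:
        record["value"] = float(record["value"])
    return records


def read_json_records(text):
    """Parse JSON text; JSON numbers become Python numbers automatically."""
    return json.loads(text)


csv_records = read_csv_records(CSV_TEXT)
json_records = read_json_records(JSON_TEXT)
print(csv_records == json_records)  # True
```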
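For the Tidy data entry, one widely used formulation (Hadley Wickham's) is that each variable forms a column, each observation forms a row, and each value sits in its own cell. The sketch below reshapes a hypothetical "wide" table, with one column per sampling date, into that long, tidy layout. The column names and the `wide_to_long` helper are assumptions for illustration.

```python
# Hypothetical "wide" table: one row per site, one column per sampling date.
WIDE = [
    {"site_id": "S1", "2023-01-15": 7.2, "2023-02-15": 7.0},
    {"site_id": "S2", "2023-01-15": 6.8, "2023-02-15": 6.9},
]


def wide_to_long(rows, id_column, var_name, value_name):
    """Reshape wide rows so each (id, variable, value) observation gets its own row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row instead
            long_rows.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return long_rows


tidy = wide_to_long(WIDE, "site_id", "sample_date", "ph")
for record in tidy:
    print(record)
```

In pandas, `DataFrame.melt(id_vars="site_id", var_name="sample_date", value_name="ph")` performs the same kind of reshape; the plain-Python version is shown only to make the transformation explicit.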
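The Supervised learning and Unsupervised learning entries differ mainly in whether labels are available. The sketch below uses the same four hypothetical measurements both ways: with labels, a 1-nearest-neighbor rule predicts the label of a new point; with the labels removed, a two-cluster k-means groups the points on its own. These are just two simple algorithms chosen for illustration, not the methods used in the Machine Learning Handbook, and the data are made up.

```python
import math

# Hypothetical (turbidity, dissolved oxygen) measurements with quality labels.
LABELED = [
    ((1.0, 9.0), "good"),
    ((1.5, 8.5), "good"),
    ((8.0, 3.0), "poor"),
    ((9.0, 2.5), "poor"),
]


def nearest_neighbor(train, point):
    """Supervised: predict the label of the closest labeled example (1-NN)."""
    _, label = min(train, key=lambda item: math.dist(item[0], point))
    return label


def two_means(points, iterations=10):
    """Unsupervised: split unlabeled points into two groups (k-means with k=2)."""
    centers = [points[0], points[-1]]  # simple initialization for this sketch
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            closer = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[closer].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(coords) / len(g) for coords in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups


print(nearest_neighbor(LABELED, (7.5, 3.5)))  # poor
unlabeled = [features for features, _ in LABELED]  # same points, labels removed
print(two_means(unlabeled))
```

Note that k-means finds the two groups but cannot name them "good" or "poor"; interpreting clusters is left to the analyst, which is the "we don't know what the right answers should be" situation described above.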
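The Reinforcement learning entry describes an algorithm that is rewarded when it gets something correct. The sketch below shows a heavily simplified, stateless version of that idea (a "multi-armed bandit"): the agent repeatedly picks one of three actions, receives a reward of 1 only for a hypothetical "correct" action, and updates its running estimate of each action's value. The reward rule, the optimistic starting estimates, and the `learn_by_reward` helper are assumptions for illustration; practical reinforcement learning usually involves changing states, noisy rewards, and deliberate exploration.

```python
def learn_by_reward(reward_fn, n_actions, steps):
    """Greedy agent with optimistic starting estimates (a stateless bandit sketch)."""
    estimates = [1.0] * n_actions  # optimistic: every untried action looks promising
    counts = [0] * n_actions
    for _ in range(steps):
        # Pick the action currently believed to be best (ties go to the lowest index).
        action = max(range(n_actions), key=lambda a: estimates[a])
        reward = reward_fn(action)
        counts[action] += 1
        # Incremental average: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


CORRECT_ACTION = 2  # hypothetical: the environment rewards only action 2
estimates, counts = learn_by_reward(
    lambda a: 1.0 if a == CORRECT_ACTION else 0.0, n_actions=3, steps=10
)
print(counts)  # [1, 1, 8]
```

After trying each action once, the agent settles on the rewarded action for the remaining steps.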