
Data Tool Kit - People's Data Science Handbook

If you are new to the data scene, it can be overwhelming to know where to start. Here we explain some foundational data concepts. If you do encounter jargon or terminology with which you aren’t familiar, please see the Glossary for definitions.

What are Data?

Data are a set of values representing a specific measurement or observation and can be in a number of different formats or structures, such as lists or tables. By themselves, data have no meaning. It is only through analysis and interpretation that data obtain meaning and become information. Analysis and interpretation can take the form of numerical summaries (think statistical analyses) or graphical representations of the data (think graphs or charts). For the philosophers among us, this hierarchy continues on to knowledge and wisdom: accumulating and synthesizing multiple sources of information, combined with experience, eventually results in knowledge. Finally, when knowledge is given context, value or judgment, and shared understanding, it becomes wisdom.
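As a minimal sketch of the step from data to information, the example below takes a raw list of values and produces a numerical summary. The stream temperature readings and all variable names are invented for illustration; they do not come from a real dataset.

```python
import statistics

# Hypothetical daily stream temperature readings in degrees Celsius (invented values).
readings_c = [12.1, 12.4, 13.0, 15.2, 14.8, 13.9, 12.7]

# On their own the values say little; a numerical summary turns them into information.
summary = {
    "count": len(readings_c),
    "mean": round(statistics.mean(readings_c), 2),
    "median": statistics.median(readings_c),
    "min": min(readings_c),
    "max": max(readings_c),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A chart of the same readings over time would be the graphical counterpart of this summary.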

In addition to the data themselves, working with data or on a data project may require the development of a data workflow, data model, or data schema. Understanding the differences between these tools, how they are related, and how to use them will be tremendously helpful when trying to articulate general and detailed information about your data project.

Pyramid representing the hierarchy between data, information, knowledge, and wisdom, with data at the base of the pyramid and wisdom at its apex.

A data workflow is a conceptual roadmap of your data project. Similar to a traditional workflow diagram, your data workflow should explicitly represent how data flow through people and processes. Ideally, your workflow would also include a timeline component that shows when decisions are made and how those decisions impact data flow.

Workflow includes: Data Entry Options, to Interactions with People (e.g. QAQC), to Data Storage (e.g. databases, tables, etc.), to Interactions with People (e.g. data use), to Decision, to Action.
Example data workflow displaying how data enters the system, how data interacts with people (via QAQC or data use), where data are stored, when decisions and actions are made in the process.
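The stages in the example workflow can also be sketched in code. This is a rough illustration only: the records, the QAQC rule, the threshold, and every function name below are hypothetical, and a real workflow would involve people and decisions that a few functions cannot capture.

```python
# Data entry: hypothetical records entering the system (invented values).
raw_records = [
    {"site": "A", "flow_cfs": 12.5},
    {"site": "B", "flow_cfs": -3.0},  # fails the QAQC check below (negative flow)
    {"site": "C", "flow_cfs": 40.2},
]


def qaqc(records):
    """Interaction with people (QAQC): keep only records that pass a simple check."""
    return [r for r in records if r["flow_cfs"] >= 0]


def store(records, storage):
    """Data storage: append accepted records to an in-memory stand-in for a table."""
    storage.extend(records)
    return storage


def decide(storage, threshold_cfs=30.0):
    """Data use and decision: list sites whose flow exceeds an assumed threshold."""
    return [r["site"] for r in storage if r["flow_cfs"] > threshold_cfs]


table = []
store(qaqc(raw_records), table)

# Action: the sites returned here would be passed on for follow-up.
sites_needing_action = decide(table)
print(sites_needing_action)
```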

A data model (AKA Entity Relationship Diagram or ERD) is a graphical representation of the data system design. The data model should generally answer the questions:

  • Which data objects (e.g. databases, tables, etc.) will be used?
  • Which data objects will be managed?
  • Which data objects will be exported?
  • How are the data objects related?
  • What data products will be developed (e.g. visualization, website, application, etc.)?
Example data model, displaying which tables are involved in the data import process, which are involved in the data management process, how and where data will be exported, and how data will be used.

A data schema shows what the “guts” of your database will look like, including the identification of tables, columns, data types, constraints, and relationships.

Example data schema, displaying the tables and their respective columns, data types and relationships, which match with what was shown in the data model.
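To make the schema components concrete, here is a small sketch using Python's built-in sqlite3 module. The two tables, their columns, and the temperature range in the CHECK constraint are hypothetical and are not taken from the figure; they only illustrate tables, columns, data types, constraints, and a relationship.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE site (
    site_id   INTEGER PRIMARY KEY,       -- column, data type, constraint
    site_name TEXT NOT NULL UNIQUE
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    site_id        INTEGER NOT NULL REFERENCES site(site_id),  -- relationship
    measured_on    TEXT NOT NULL,
    temperature_c  REAL CHECK (temperature_c BETWEEN -5 AND 50)
);
""")

conn.execute("INSERT INTO site (site_id, site_name) VALUES (1, 'Example Creek')")
conn.execute(
    "INSERT INTO measurement (site_id, measured_on, temperature_c) VALUES (?, ?, ?)",
    (1, "2024-01-15", 12.4),
)

# A measurement that points at a site that does not exist violates the relationship.
try:
    conn.execute(
        "INSERT INTO measurement (site_id, measured_on, temperature_c) VALUES (?, ?, ?)",
        (99, "2024-01-16", 11.0),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# List the column names of the measurement table as stored in the schema.
columns = [row[1] for row in conn.execute("PRAGMA table_info(measurement)")]
print(columns, rejected)
```

The constraints do the work here: the database itself refuses a record that breaks the relationship, rather than relying on each user to check.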

The table below compares the levels of detail and components included in data workflows, data models, and data schemas.


Term          | Level of Detail | Key Components
--------------|-----------------|---------------------------------------------------------------------------------------
Data Workflow | Low – Medium    | Data objects; data flow; critical people & process; critical timelines (as appropriate)
Data Model    | Low – Medium    | Data objects; high-level relationships; products
Data Schema   | High            | Data objects; tables; columns; data types; constraints; relationships

Data Science 101

Venn diagram showing that data science is where subject matter expertise, mathematics and statistics, and computer science intersect.

Data science, in the simplest and most general sense, is the process of applying the scientific method to data in order to transform data into information and insights. Data science has evolved (and continues to evolve) into a multi-disciplinary field that combines subject matter expertise (e.g. hydrology, ecology, social science), computer science (e.g. software, programming, machine learning), and mathematics and statistics (e.g. models, algorithms) to identify patterns and make predictions from data. From an analytics perspective, data science can provide solutions via:

  • Descriptive Analytics: transforms data into useful information to understand what happened in the past (see the Business Intelligence Handbook for more information)
  • Diagnostic Analytics: utilizes data to understand why things in the past happened
  • Predictive Analytics: utilizes data and machine learning techniques to classify or predict what is most likely to happen in the future (see the Machine Learning Handbook for more information)
  • Prescriptive Analytics: utilizes data to recommend actions one can take to affect outcomes, which can then inform decision making
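The difference between some of these types of analytics can be sketched with a toy example. The yearly counts, the straight-line trend, and the action threshold below are all invented assumptions; real predictive work would use more data and the techniques covered in the handbooks mentioned above.

```python
import statistics

# Hypothetical yearly counts of some observed event (invented values).
years = [2019, 2020, 2021, 2022, 2023]
counts = [10, 12, 14, 16, 18]

# Descriptive: what happened in the past?
mean_count = statistics.mean(counts)

# Predictive: fit a straight line by ordinary least squares and extend it one year.
mean_year = statistics.mean(years)
slope = sum((x - mean_year) * (y - mean_count) for x, y in zip(years, counts)) / sum(
    (x - mean_year) ** 2 for x in years
)
intercept = mean_count - slope * mean_year
predicted_2024 = slope * 2024 + intercept

# Prescriptive: turn the prediction into a recommended action using an assumed threshold.
ACTION_THRESHOLD = 19  # hypothetical value chosen for illustration
recommendation = "plan additional monitoring" if predicted_2024 > ACTION_THRESHOLD else "no change"

print(mean_count, slope, predicted_2024, recommendation)
```

Diagnostic analytics is not shown; it would ask why the counts rose, which usually requires additional data beyond the counts themselves.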

The marriage of data and data science allows us to better transform data into information and to communicate that information. This communication can take the form of reports, statistical inference, predictions, or data visualizations. Using data science to develop such products can increase their efficiency, effectiveness, and transparency.