Data and Metadata

When you design a software system, you articulate the kinds of data that the system will store and manipulate. Later, when end users employ the system, they supply the data itself.

The kinds of data and the data itself are different things. This difference is often called the type/instance distinction: software designers articulate types (such as classes or properties), and software users articulate instances (such as objects or property values). It is also known as the distinction between data and metadata. Metadata is data about data.
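To make the type/instance distinction concrete, here is a minimal Python sketch. The `Customer` class and its fields are hypothetical examples. The class plays the designer's role (the type, or metadata), and the two objects play the users' role (the instances, or data).

```python
from dataclasses import dataclass, fields

# Designer's side (hypothetical example): a type whose definition is metadata.
@dataclass
class Customer:
    name: str
    city: str

# Users' side: instances whose field values are the data itself.
alice = Customer(name="Alice", city="Oslo")
bob = Customer(name="Bob", city="Lima")

# One type describes many instances; each instance supplies its own values.
print([f.name for f in fields(Customer)])  # ['name', 'city']
print(alice.city, bob.city)                # Oslo Lima
```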

The distinction between data and metadata (or between instance and type) is useful, but it is not as rigid as you might initially think. Metadata is a special kind of data, but it is still data: if you can write something down or store it on disk, it is data. A list of the classes your program uses, for instance, is metadata that you can write down like any other list.
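Python makes this point easy to see, because classes are themselves objects that a program can collect and manipulate. In this sketch the base class `Entity` and its subclasses are hypothetical, introduced only so the program has a simple way to ask for its own model classes. Once gathered, the list of classes is sorted and serialized exactly as any other data would be.

```python
import json

# Hypothetical base class, used only so the program can enumerate its model classes.
class Entity:
    pass

class Order(Entity):
    pass

class Customer(Entity):
    pass

class InvoiceLineItem(Entity):
    pass

# The classes (metadata) gathered into an ordinary list at run time.
classes = Entity.__subclasses__()

# Ordinary data operations now apply: sort the class names...
names = sorted(cls.__name__ for cls in classes)
print(names)  # ['Customer', 'InvoiceLineItem', 'Order']

# ...or serialize them, as you would any other data you want to store on disk.
serialized = json.dumps(names)
print(serialized)  # ["Customer", "InvoiceLineItem", "Order"]
```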

For example, in a typical relational DBMS, the system catalog describes the tables and columns that contain your data. You can think of the data in the system catalog as metadata because it is data about data: specifically, it is data about the data that your database manages for you. But you can also think of the data in the system catalog as ordinary data because, with the right software tool, you can manipulate it as you would manipulate any other data. For example, when you use a database design tool, you can list the tables in the catalog, sort the tables alphabetically, or delete every table whose name is longer than 18 characters.
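The catalog example can be tried with SQLite, used here only as a stand-in for "a typical relational DBMS"; other systems expose their catalogs differently, for example through `information_schema` views. The table names below are hypothetical. The sketch lists the tables by querying SQLite's `sqlite_master` catalog table, sorts them alphabetically, and drops every table whose name is longer than 18 characters.

```python
import sqlite3

# SQLite as a stand-in DBMS; its catalog is the sqlite_master table
# (newer SQLite releases also accept the name sqlite_schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE customer_addresses_archive (id INTEGER PRIMARY KEY);
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
""")

def table_names(conn):
    # Query the catalog exactly as you would query any other table,
    # sorting the table names alphabetically in SQL.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

print(table_names(conn))
# ['customer_addresses_archive', 'customers', 'orders']

# Drop every table whose name is longer than 18 characters.
for name in table_names(conn):
    if len(name) > 18:
        conn.execute(f'DROP TABLE "{name}"')

print(table_names(conn))
# ['customers', 'orders']
```

The loop does not delete catalog rows directly. SQLite normally treats `sqlite_master` as read-only, so the catalog data drives ordinary `DROP TABLE` statements, and the catalog updates itself as a result.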