This is part one of a series on databases and structured query language (SQL).
Databases are a fundamental part of many enterprise-level applications. Virtually all websites on the Internet today are essentially just an interface to a database. Content is created by the user, read by the user, and edited by the user.
There are many different database systems, the most popular of which include MySQL, SQLite, PostgreSQL, and MSSQL. These aforementioned languages all have one thing in common: they are relational databases (data that is organized formally according to a preexisting user-defined model) and they all use SQL.
In addition to SQL-based databases, there are more modern database systems that claim to supersede the traditional SQL model. Those will not be covered in this series, but I encourage you to read about MongoDB or CouchDB if you’re willing.
When I speak of databases in this series, I refer to relational databases using SQL.
Databases are organized into tables, also known as schema. Each table has a list of fields, each of which has a name, a type, and properties. For example:
- A database school for an elementary school has two tables (schema) named students and classes.
- The table students has fields: id, name, birthday, grade_level, class_id, is_citizen
- The table classes has fields: id, teacher, biography
It is customary for each table to have its first column known as an id. This identification number is typically unique, required, and automatically incremented as new rows are added to the table. In this case, the id is a primary key.
In the example, each student’s class_id matches a class’s id. Consider the case where a class’s teacher is changed from Mr. Jones to Mrs. Pilgrim. If the teacher’s name were stored with each student, then the name would have to be changed in several different places, once for each student in the class. Having the teacher’s name separate removes this redundant information from the database, and a name only needs to be changed in 1 place: in the teachers table.
Some more exploration of terminology:
- Tables are also called schema. Specifically, the schema is the set of fields of a table, not the data itself. Tables are also known as relations, ergo the relational database.
- Columns are also known as attributes or fields.
- Rows are also known as tuples (because of the suffix of sextuple, septuple, octuple, etc..). Each row has a value for each column of a table.
Each field (column) in a database needs to have a type. Different databases support different types. Here are a couple basic ones supported in almost all database systems:
- integer – A simple number. Some database systems limit integers to nonnegative integers (also known as unsigned integer because the lack a sign). Some database systems have upper limits to size of integers such as 4294967295 (2^32 – 1). Primary keys are usually integers, such as the fields id and class_id in the example. The grade_level is also an integer, though it is not a primary key.
- varchar – A sequence of characters with fixed maximum length. These characters can be a word, or words, such as Mr. Jones and they can contain numbers like Room 37A. They have a maximum length that must be set when the schema is created. Some database systems default to a maximum length of 255 when a length is not given. Fields like name and teacher can be varchars.
- boolean – A true or false value. (Pronounce it like Julian with a B.) This is the simplest of all data types because there are only 2 possible values. In the example, is_citizen is a boolean.
- datetime – A value that specifies a date and time, usually to a second or a millisecond precision. Some database systems support less precise types like date which lacks a time, or time which lacks a date. In the example, birthdate would be a datetime.
- text is like a varchar except that it has no fixed length. Fields like biography can be text fields. In general, the rule of thumb is to use text fields only when a value can be of any variable length. Some database systems also support binary versions of text fields known as blobs. Those can store images or any kind of binary data.
The next part (part 2) in this series will cover first of the 4 basic database operations: create, read, update, and delete (CRUD for short).