January 2012

What's a Programming Language?

Like most people in the software field, I sometimes have trouble explaining what I do to others. To people who don't work in this field, the idea of a "language" for programming computers can seem downright bizarre. For readers who might fall into this category too, here's the story of programming condensed to just eighteen paragraphs to provide a brief bit of context for true beginners. This story revolves around three key ideas: the stored-program model, the software hierarchy, and the mapping from language to hardware.

The Stored-Program Model

The first thing you need to know in order to understand programming languages is the notion of the stored-program model that underlies all computers today. Really, computers are fairly dumb at their core; their chips and circuits do nothing but load simple numerically coded instructions from the computer's memory and carry out the actions they imply. A set of these instructions stored in a computer memory is called a program. It's also sometimes called code for color; an algorithm by the more numeric among us; and software to distinguish it from the less transient hardware of the computer itself.

Whatever we call them, programs consist of fairly basic numeric actions from the computer's perspective: they add and subtract numbers, move data around in memory, and so on — a process which is blazingly fast, but essentially pointless by itself. By loading different sets of instructions into its memory, though, we can make a computer perform different tasks and tangible work.

This model, a changeable memory of stored instructions and a hardware device that sequences through them, is generally called the Von Neumann architecture after the person most strongly associated with it. In principle, this model isn't all that different from more familiar devices such as CD or DVD players; by loading different media, a player can be made to perform a variety of selections. The instructions loaded into and run by a computer, though, support much more varied goals. A computer can perform any task that can be encoded and expressed in its native instruction set.
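To make the stored-program idea concrete, here is a minimal sketch of such a machine simulated in Python. The four opcodes and their numeric codes are invented for illustration and do not correspond to any real processor; the point is only that instructions and data sit side by side in one memory, and a simple loop fetches and carries them out.

```python
# A toy stored-program machine. The opcodes below are hypothetical,
# chosen only for this illustration.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3

def run(memory):
    """Fetch and execute numeric instructions from memory until HALT."""
    pc = 0     # program counter: address of the next instruction
    acc = 0    # accumulator: the machine's single working register
    while True:
        opcode, operand = memory[pc], memory[pc + 1]   # fetch
        pc += 2                                        # step to the next instruction
        if opcode == LOAD:
            acc = memory[operand]
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode}")

# Program: memory[10] = memory[8] + memory[9]
memory = [
    LOAD, 8,      # addresses 0-1
    ADD, 9,       # addresses 2-3
    STORE, 10,    # addresses 4-5
    HALT, 0,      # addresses 6-7
    2, 3,         # addresses 8-9: data
    0,            # address 10: result goes here
]
result = run(memory)
print(result[10])   # prints 5
```

Loading a different list of numbers into memory makes the same run loop do a different job, which is the whole trick of the model.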

Though simple, the implications of this model are arguably mind-blowing: computers are machines for building other machines. They are a sort of super-generalized, universal tool. From this one same machine we can build websites, hydrology models, Mars rover navigation systems, and video games. We can even build digital music and movie players, which nudge those CDs and DVDs of the prior paragraph towards obsolescence daily. We don't need to build a custom machine from scratch for each of these goals, because the computer, the machine of machines, is general enough to take whatever form we can graft upon it by the programs we load into its memory.

In everyday terms, it's as if a single generic tool could be hammer, saw, and drill; or a single appliance could be toaster, refrigerator, and television. Depending on the programs you choose to load, the otherwise lifeless hardware of the computer can take on a very wide variety of personalities and roles. The stored-program model provides us with a sort of canvas for realizing nearly arbitrary machines.

A Hierarchy of Structure

So where do programming languages fit in this model? Programming languages simply define the instructions we use to specify the steps to be taken by a program. Viewed abstractly, each program we write in such a language represents a brand new software machine, to be run by the hardware machine of the underlying computer. Although the physical computer may be generic and bland by itself, the programs we run on top of its hardware have much more concrete and useful roles — they search libraries, process images, display emails, and so on. Programming languages allow us to encode the knowledge needed to perform these more realistic tasks, and effectively add it to the computer's capabilities when loaded into memory. They are the raw material of the software which animates and gives purpose to hardware.

To fully understand programming itself, though, it's also important to understand its reliance on a hierarchy of structure, also known as a software stack to those who care about such things enough to invent jargon for them. It's actually a simple idea. Like most engineering enterprises, programming languages and programming in general leverage multiple layers of increasingly abstract structure aimed at hiding the complexity of the lower layers.

At the bottom of the language hierarchy, a computer's hardware implements a simple set of instructions known as its machine language. These numerically coded instructions stored in memory deal in terms of numeric addresses and numeric data. Because machine languages are native to a computer's hardware, they differ from platform to platform. But all support the three pillars of computer programs: sequence, or stepping through instructions one after another; selection, or branching to instructions based on logical tests; and repetition, or repeating instructions over and over. Although they are simple, machine language instructions are sufficient to achieve all that computers do today.
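The three pillars can be sketched with a toy machine simulated in Python. The opcodes here are made up for the example and model no real hardware. Sequence is the default step to the next instruction, selection is a conditional jump, and repetition is a jump backwards.

```python
# Hypothetical numeric opcodes for a toy machine.
HALT, ADD, DEC, JUMP_IF_ZERO, JUMP = 0, 1, 2, 3, 4

def run(program, data):
    """Execute (opcode, a, b) instructions against a list of data cells."""
    pc = 0
    while True:
        op, a, b = program[pc]
        pc += 1                       # sequence: move on to the next instruction
        if op == ADD:
            data[a] += data[b]
        elif op == DEC:
            data[a] -= 1
        elif op == JUMP_IF_ZERO:      # selection: branch on a logical test
            if data[a] == 0:
                pc = b
        elif op == JUMP:              # repetition: go back and do it again
            pc = a
        elif op == HALT:
            return data
        else:
            raise ValueError(f"unknown opcode {op}")

# data[0] is a running total, data[1] is a counter.
# The program adds counter, counter-1, ..., 1 into the total.
program = [
    (JUMP_IF_ZERO, 1, 4),   # 0: if the counter is zero, jump to HALT
    (ADD, 0, 1),            # 1: total += counter
    (DEC, 1, 0),            # 2: counter -= 1
    (JUMP, 0, 0),           # 3: back to the test at instruction 0
    (HALT, 0, 0),           # 4: stop
]
data = run(program, [0, 4])
print(data[0])   # prints 10
```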

While machine language may be sufficient, it's also virtually impossible to use for realistic tasks. In fact, it's so tedious to use that a variety of approaches to simplifying it have emerged over the years. The earliest attempts were known as assembler languages, which were really just machine language with window dressing; they gave cryptic names to the numeric codes of machine language instructions, allowed memory addresses to be labeled, and usually provided a macro tool which expanded text into other text. For instance, programmers could now say things like "move x, y" to move data from one place in memory to another, instead of giving that instruction's numeric code and the numeric addresses of the source and target.
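What an assembler does can be sketched in a few lines of Python. The mnemonics and opcode numbers below are hypothetical, and real assemblers handle far more, but the core job is the same: swap names for numbers and labels for addresses.

```python
# Hypothetical opcode table; a real assembler uses its processor's actual codes.
OPCODES = {"halt": 0, "load": 1, "add": 2, "store": 3}

def assemble(source):
    """Translate lines such as 'add y' into a flat list of numbers."""
    lines = [line.split() for line in source.splitlines() if line.strip()]

    # Pass 1: note the address of each label (a leading word ending in ':').
    labels, address = {}, 0
    for words in lines:
        if words[0].endswith(":"):
            labels[words[0][:-1]] = address
            words.pop(0)
        address += len(words)

    # Pass 2: replace mnemonics and labels with their numbers.
    code = []
    for words in lines:
        for word in words:
            if word in OPCODES:
                code.append(OPCODES[word])
            elif word in labels:
                code.append(labels[word])
            else:
                code.append(int(word))
    return code

source = """
        load x
        add y
        store z
        halt
x:      2
y:      3
z:      0
"""
machine_code = assemble(source)
print(machine_code)   # prints [1, 7, 2, 8, 3, 9, 0, 2, 3, 0]
```

The output is nothing but numbers, which is what the hardware wants; the input is something a person can at least read.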

Though better than the numbers of raw machine language, assembler language still had most of the same downsides. Mapping real world problems to a program at this low level can still be a monumental task. To see how programmers achieve much of what they do with computers today, we need to climb one level higher.

The Programming Languages Level

At the next level of hierarchy, higher order programming languages go much further, adding structure and abstraction on top of the raw machine language model to make it easier to use. Though their approaches may vary, all aim to allow a task's solution to be described in ways that are closer to the way people think, instead of requiring it to be morphed to match the way that computer hardware operates. Most programming languages achieve this goal by allowing a procedure to be specified with sentences and grammars which are much closer to a natural language such as English, rather than numeric codes or labels for them.

The earliest of what most people today call true programming languages, such as FORTRAN and C, made it much easier to express complex ideas and calculations. Among other things, they provided syntax for describing tasks to be carried out, and decomposed larger arithmetic expressions into the set of machine language instructions required to implement them. For example, conditional and repeated actions were typically coded with "if" and "while" statements, and mathematical operations were written with familiar "+" and "-" expressions. Though still in wide use today, most of these early languages provided only a thin abstraction layer, and many find them to be still too closely tied to the hardware machine's underlying model, and to much of the drudgery and peril that comes with it. Programming is still largely about sequences of instructions at this level, and requires substantial work to map information to computer memory.
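The decomposition of an expression into simple steps can be sketched with Python's standard ast module. The step names below ("push", "add", and so on) are invented for this illustration and describe a simplified stack-style machine, not the output of any real FORTRAN or C compiler.

```python
import ast

# Hypothetical step names for a simplified stack-style machine.
OPERATIONS = {ast.Add: "add", ast.Sub: "subtract", ast.Mult: "multiply"}

def decompose(expression):
    """Break an arithmetic expression into a list of simple steps."""
    steps = []

    def visit(node):
        if isinstance(node, ast.BinOp):
            name = OPERATIONS.get(type(node.op))
            if name is None:
                raise ValueError("unsupported operator")
            visit(node.left)     # first compute the left operand
            visit(node.right)    # then the right operand
            steps.append(name)   # then combine the two results
        elif isinstance(node, ast.Name):
            steps.append(f"push {node.id}")
        elif isinstance(node, ast.Constant):
            steps.append(f"push {node.value}")
        else:
            raise ValueError("unsupported expression")

    visit(ast.parse(expression, mode="eval").body)
    return steps

for step in decompose("a + b * 2 - c"):
    print(step)
```

For this input the steps come out as push a, push b, push 2, multiply, add, push c, subtract. The programmer writes one familiar line; the language works out the order of the small operations.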

At the top of the hierarchy, the so-called higher-level languages go even further by providing abstractions and structures that move substantially beyond the Von Neumann model, and gain wider utility and accessibility in the process. This is Python's family; its object-oriented and functional programming models, for example, have very little to do with raw machine language, but can improve the task of programming profoundly. Further, Python's built-in datatypes such as lists and dictionaries are only remotely reminiscent of data stored in computer memory, and are much more flexible. Python's automatic memory management alone obviates much work required by earlier languages, and eliminates entire categories of program errors. Languages like Python are also sometimes called scripting languages to highlight their relative ease of use, especially compared to larger systems languages like C++ and Java; per this distinction, a script is a simple sort of program, though much of its simplicity derives from that of the underlying scripting language used.
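As a small taste of what those built-in datatypes look like, here is a short sketch. The variable names and sample values are chosen only for illustration.

```python
# A list grows on demand; no sizes or memory addresses are declared.
languages = ["FORTRAN", "C"]
languages.append("Python")
languages.sort(key=len)          # order by name length, shortest first
print(languages)                 # prints ['C', 'Python', 'FORTRAN']

# A dictionary maps keys to values; lookup is by key, not by position.
first_appeared = {"FORTRAN": 1957, "C": 1972, "Python": 1991}
newest = max(first_appeared, key=first_appeared.get)
print(newest)                    # prints Python

# Memory for all of the above is allocated and reclaimed automatically.
del languages[0]
print(languages)                 # prints ['Python', 'FORTRAN']
```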

Other higher-level languages have explored other paradigms such as aspect-oriented and logic programming. Some such paradigms in higher-level languages attempt to improve on the basic machine language model, and others aim to remove it altogether. For instance, object-oriented languages like Python provide a model in which programs are built by customizing existing code, and allow developers to represent objects from the real world directly in their programs. Logic programming languages such as Prolog go even further, defining computation to be deduction from a set of logical rules, which declare what it means to satisfy a goal, but do not specify how to do so, a more radical and complete departure from the underlying hardware.
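Here is a minimal sketch of what "customizing existing code" can mean in an object-oriented language. The Employee and Manager classes are hypothetical examples, not part of any library.

```python
class Employee:
    """Models a real-world thing directly: a person with a name and pay."""
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay

    def give_raise(self, percent):
        self.pay += self.pay * percent // 100

class Manager(Employee):
    """Customizes Employee by redefining only the part that differs."""
    def give_raise(self, percent, bonus=10):
        # Reuse the existing logic, with an extra bonus added on.
        super().give_raise(percent + bonus)

bob = Employee("Bob", 50000)
sue = Manager("Sue", 50000)
for person in (bob, sue):
    person.give_raise(10)        # same call, different behavior per class
    print(person.name, person.pay)
```

This prints "Bob 55000" and then "Sue 60000". Manager inherits everything it doesn't redefine, so the new code states only the difference.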

While programming language technology is still a fairly young field that evolves quickly, programmers tend to be pragmatic folk who gravitate towards tools that work well in practice. This mindset likely accounts at least in part for the prominence of object-oriented and scripting languages like Python today, and the lesser role of more exotic languages like Prolog, which still seem foreign or impractical to many present-day programmers. Regardless of their approach, higher-level languages allow us to represent tasks and information much more easily than their predecessors, simplifying the work of programming and enabling new applications.

Mapping Languages to Chips

The last key idea underlying programming is related to the prior, and has to do with the way programs are actually run. Because computers understand only their native machine language, programs coded in higher-level programming languages must ultimately be mapped to lower-level machine language in order to be run by the computer's hardware. This is arranged by another program loaded into and run on the computer — by a compiler, which translates language statements to machine language instructions directly; by an interpreter, which carries out the program's commands itself; or by some combination of these two approaches, such as compiling to a platform-neutral form known as bytecode which is then interpreted.

For example, most programs written in Python are run by an interpreter program coded in C, which is itself translated to machine language prior to execution. That interpreter compiles your Python code to bytecode and then runs it, yielding a hierarchy that spans three languages, with raw hardware at the bottom. Python variants such as Jython and IronPython may add additional software layers to the mix, but still run your programs on chips in the end.
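You can peek at the bytecode step from within Python itself, using the built-in compile function and the standard dis module. This sketch assumes the standard C-based Python; the exact instruction listing that dis prints varies between Python versions, so none is shown here.

```python
import dis

source = "total = 2 + 3 * x"

# Step 1: compile Python source text to a bytecode object.
code = compile(source, "<example>", "exec")
print(type(code.co_code))    # the raw bytecode is just a string of bytes

# Step 2: show the bytecode in readable form (listing differs by version).
dis.dis(code)

# Step 3: have the interpreter run the bytecode.
namespace = {"x": 4}
exec(code, namespace)
print(namespace["total"])    # prints 14
```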

But why go to such trouble? In short, programming languages exist to address the great disparity between the numeric nature of computer hardware, and the symbolic nature of the world we all live in. Although their statements and larger structures must be mapped to the simpler instructions of the underlying computer, the added expressiveness of higher-level programming languages makes the act of programming real-world tasks faster, less error prone, and much more productive. Many people have also found that Python in particular, with its straightforward syntax and higher-level tools and paradigms, makes the task of codifying a program even simpler and more natural than some of its contemporaries do.

For More Details

At least that's the abstract story. It's also just one part of the larger programming tale. The software hierarchy story continues from here to libraries, tools, and frameworks which provide an additional layer on top of languages, where much real-world programming action occurs. Internet programming, for instance, requires knowledge of both programming languages and Internet protocols and frameworks. That top layer awaits you in the world of applications development, but is outside this article's scope.

To see what Python programs actually look like, and to judge claims about its merits for yourself, you'll have to read on. Much like understanding programming languages in general, making sense of random Python code requires some background context, and providing it is largely what books such as Learning Python are about.

Latest revision: January 13, 2012 (first posted: October 2011).



©M.Lutz