Programming Languages for the Uninitiated

Plus, how computers work!

As part of my quest to explain what it is I do all day to non-techy people, I figured I’d write a bit about programming languages. This isn’t me attempting to teach you how to code, or telling you that everyone should learn to code (even the most hardcore advocates would have to admit it’s possible to get by in the world without knowing how). This is also not about how to fix your computer, although it might help you understand why building a website or program and fixing your glitchy laptop are fundamentally different things. My plan is to show you a few different programming languages doing the same thing and point out the similarities and differences, and give you some idea of the sorts of problems I deal with day-to-day.

For all these examples, let’s pretend we’re building yet another recipe website. We’re starting with the very basics, so all we want to do is make a webpage that lists some recipe names. In fact, for the first couple of examples we won’t even do that, because it turns out displaying some text on a webpage is really complicated. One of the things I want to demonstrate is how using other people’s premade code makes doing anything meaningful a huge amount easier.

The hard part

Let’s start with the hardest example (that’s logical, right?): Assembly language. This is what all other languages eventually become, because it’s effectively the only language that your computer’s CPU (Central Processing Unit - the bit of circuitry that tells everything else what to do) understands. Just scroll quickly past this …

    .section    __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 12
    .globl  _main
    .p2align    4, 0x90
_main:
    .cfi_startproc
    pushq   %rbp
Ltmp0:
    .cfi_def_cfa_offset 16
Ltmp1:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp2:
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    movl    $0, -4(%rbp)
    movl    $3, -8(%rbp)
    movl    $0, -12(%rbp)
LBB0_1:
    movl    -12(%rbp), %eax
    cmpl    -8(%rbp), %eax
    jge LBB0_4
    leaq    L_.str.3(%rip), %rdi
    leaq    _recipes(%rip), %rax
    movslq  -12(%rbp), %rcx
    movq    (%rax,%rcx,8), %rsi
    movb    $0, %al
    callq   _printf
    movl    %eax, -16(%rbp)
    movl    -12(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -12(%rbp)
    jmp LBB0_1
LBB0_4:
    movl    -4(%rbp), %eax
    addq    $16, %rsp
    popq    %rbp
    retq
    .cfi_endproc

    .section    __TEXT,__cstring,cstring_literals
L_.str:
    .asciz  "Stew with Dumplings"

L_.str.1:
    .asciz  "Spaghetti Bolognese"

L_.str.2:
    .asciz  "One-pot Ramen"

    .section    __DATA,__const
    .globl  _recipes
    .p2align    4
_recipes:
    .quad   L_.str
    .quad   L_.str.1
    .quad   L_.str.2

    .section    __TEXT,__cstring,cstring_literals
L_.str.3:
    .asciz  "%s"


.subsections_via_symbols

You can maybe get some idea of why most programmers prefer not to write this? I won’t even attempt to explain it (mainly because I don’t know much assembly); I’ll just point out that at the very bottom you can see "Stew with Dumplings", "Spaghetti Bolognese" and "One-pot Ramen". Those are the recipe names we’ll be working with in this article.

If you thought that was low-level, just you wait … because actually, your CPU can’t understand assembly language. It just understands numbers. Particular sequences of numbers correspond to instructions like subq and movl, and other numbers correspond to letters and (surprisingly enough) numbers. So here’s what a small part of that stuff actually turns into:

83 116 101 119 32 119 105 116 104 32 68 117 109 112 0 0 0 0 13 0 0 0 108 105 110 103 115 0 83 112 97 103 104 101 116 116 105 32 0 0 0 0 66 111 108 111 103 110 101 115 101 0 79 110 101 45 112 111 11 0 0 0 116 32 82 97 109 101 110 0 37 115

Recognise it? That’s actually those recipe names I noted above, plus a few numbers to separate them. 83 is the code for S, 116 is the code for t, and so on.
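If you’d like to check those codes yourself, here’s a minimal sketch in Ruby (the language I introduce further down). It assumes plain English text, where each character is stored as exactly one number; the variable names are just ones I picked for illustration.

```ruby
# Look up the number the computer stores for each character of a recipe name.
name = "Stew with Dumplings"
codes = name.bytes # one number per character (true for plain ASCII text like this)

puts codes.first(4).inspect # prints [83, 116, 101, 119], i.e. "S", "t", "e", "w"

# Going the other way: turn the numbers back into text.
puts codes.map(&:chr).join
```

The second half is the important bit: nothing is lost in translation, so the computer can always get from the numbers back to the letters.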

Except that’s not quite what the CPU sees either. All the CPU really sees is the presence or absence of electricity at a particular instant. So those numbers are converted into a sequence of 0s and 1s (just like in the movies), and that is finally sent to the CPU. (Disclaimer: I don’t know exactly what happens, but I think that’s an accurate enough simplification …)

So this is what your code eventually becomes:

010100110111010001100101011101110010000001110111011010010111010001101000001000000100010001110101011011010111000000000000000000000000000000000000000000000000110100001111100000000110110001101001011011100110011101110011000000000101001101110000011000010110011101101000011001010111010001110100011010010010000000000000000000000000111110010000010000100110111101101100011011110110011101101110011001010111001101100101000000000100111101101110011001010010110101110000011011110000000000001011000011111010000001110100001000000101001001100001011011010110010101101110000000000010010101110011
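The step from numbers to 0s and 1s can be sketched in Ruby too. This is only an illustration of the arithmetic, assuming each number is written out as eight binary digits; it is not how the real tools do the conversion.

```ruby
# Write each character code of "Stew" as eight binary digits.
codes = "Stew".bytes # [83, 116, 101, 119]

# to_s(2) writes a number in base 2; rjust pads it with leading zeros to 8 digits.
bits = codes.map { |code| code.to_s(2).rjust(8, "0") }

puts bits.join(" ") # prints 01010011 01110100 01100101 01110111
```

Compare that with the first 32 digits of the wall of binary above and you’ll find they match.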

Oh, did I mention that the code above is just telling the computer to show the text “Stew with Dumplings”, “Spaghetti Bolognese” and “One-pot Ramen” on the screen? It’s not even doing the showing; it assumes that someone at Microsoft or Apple or elsewhere has written a whole pile of code that finds a suitable font for those letters and figures out which pixels on your screen should change to which colour! Computers are complicated.

The less hard part

If you’ve made it this far, congratulations! You’ve just learned what most programming courses teach at the very end or not at all. I really only included it to make the point that whenever you do programming, the instructions you write are built on top of a vast amount of existing technical wizardry. Now for what I actually do all day …

My personal favourite programming language (and the one I use most at work) is called Ruby. You can write the same code as above like this:

recipes = ["Stew with Dumplings", "Spaghetti Bolognese", "One-pot Ramen"]

recipes.each { |recipe| print recipe }

If you ignore most of the punctuation (just like in English, its purpose is to tell you how the words fit together instead of representing concepts by itself), and if I tell you that “print” means “show on the screen” then you can basically understand what this is doing, right? Something like “recipes is (equals) these three names. Take each recipe out of the recipes and show it on the screen”.

When you run this through a special program called ruby, it gets converted to something a bit like the chunk of assembly language above so that your CPU can understand it. As you might imagine, converting and then running the code is a lot slower than running assembly instructions that were written directly, but it’s much easier to write these two lines than 60+ assembly language ones.
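If you’re curious what that converted form looks like, the standard ruby program can show you. This sketch assumes you’re using that standard version (often called CRuby or MRI); other versions of Ruby may not have RubyVM at all. What it prints is a list of instructions for Ruby’s own internal machinery rather than real CPU assembly, but it has a similar flavour.

```ruby
# CRuby-specific: RubyVM is not available in every Ruby implementation.
source = <<~RUBY
  recipes = ["Stew with Dumplings", "Spaghetti Bolognese", "One-pot Ramen"]
  recipes.each { |recipe| print recipe }
RUBY

# Convert the two lines into internal instructions without running them,
# then ask for a human-readable listing of those instructions.
listing = RubyVM::InstructionSequence.compile(source).disasm

puts listing
```

The exact listing varies between Ruby versions, so I won’t reproduce it here.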

That’s really all you need to know about programming - you write what you want the computer to do in a sort-of English-like language (usually with lots of extra punctuation and repeated words to make it less vague than regular English) and through a bunch of technical wizardry it gets turned into numbers that the computer understands as instructions. It’s a bit like growing a plant. You don’t need to know how exactly the soil and water and fertiliser cause the seed to grow; you just put all the pieces together and hope for the best.

The tangential part

So if Ruby is so great, why have people made up literally thousands of programming languages, which can all be used to do more or less the same thing?

To use another analogy, asking that question is like asking “Why is there no single model of chair that everyone uses?” Some programming languages have a distinctive look and feel which suits some people’s tastes. Some are comfortable but unwieldy (like an armchair) and some are less comfortable but more versatile (like a wooden stool).

To put it in more real-world terms, programming languages all have a different balance between a few fundamental ideas, like speed, ease of use, portability and style.

For example, Ruby is a language that prioritises ease of use over virtually everything else. It’s slow and has heaps of sometimes confusing features, but that allows you to write compact, readable code like my two-line example.

C, on the other hand, prioritises speed over virtually everything else. I generated my assembly language example from this piece of C code:

#include <stdio.h>

const char * const recipes[] = { "Stew with Dumplings", "Spaghetti Bolognese", "One-pot Ramen" };

int main(void) {
    int numberOfRecipes = sizeof(recipes) / sizeof(char *);

    for (int i = 0; i < numberOfRecipes; i++) {
        printf("%s", recipes[i]);
    }
}

You can see immediately that it’s not nearly as readable as Ruby. What on earth is a const char * const? Or an int main(void)? But people learn and write this anyway because it’s way faster than Ruby (so it’s suitable for complex programs like 3D games and for devices with limited processing power like washing machines), and because it more accurately models how a computer works internally. In fact, the ruby program that runs your nice Ruby code is actually written in C.

The mind-bending part

So we’ve written this simple program where we give the computer a list of recipe names and tell it to print them out. Let me remind you of it:

recipes = ["Stew with Dumplings", "Spaghetti Bolognese", "One-pot Ramen"]

recipes.each { |recipe| print recipe }

Sure, that’s one way to have a list of recipe names and print them out. There are other ways to write the same thing, just like in English you can say “Put oil and a diced onion into a hot frying pan.” or “Dice 1 onion. Heat up your frying pan and put in 2 tbsp oil. Tip the onion pieces into the pan.” or a bunch of other variations. We could write the above two lines as one line:

print "Stew with DumplingsSpaghetti BologneseOne-pot Ramen"

(In fact, this shows a problem with our original code - it doesn’t space out the recipe names at all. But we can worry about that some other time.)
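For the impatient, here are two possible ways to space the names out. This is a sketch rather than the one true fix, and the menu variable is a name I made up for the example.

```ruby
recipes = ["Stew with Dumplings", "Spaghetti Bolognese", "One-pot Ramen"]

# puts works like print, but ends each item with a line break.
recipes.each { |recipe| puts recipe }

# Or glue the names into one piece of text with a separator of our choosing.
menu = recipes.join(", ")
puts menu # prints Stew with Dumplings, Spaghetti Bolognese, One-pot Ramen
```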

Putting everything on one line is certainly simpler. Is it better? In this case, since all we can do is print out recipe names, it probably is. But what about when we want to show a particular recipe in full? Suddenly we have all the recipe names in two places, once in the list of recipes and once as a title for the full recipe. Then you decide you want a list of recipes by ingredient, and suddenly every time someone points out a typo in a recipe name you have to remember to fix it in at least three places. So the first example where the list of recipes is separate from the code that prints them out is better in the long run.
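Here’s a small sketch of why the separate list pays off. The two helpers, recipe_list and recipe_heading, are hypothetical names I invented for this example; the point is only that both read from the same list, so a correction made once shows up everywhere.

```ruby
recipes = ["Stew with Dumplings", "Spaghetti Bolognese", "One-pot Ramen"]

# Hypothetical helper: the text for the page that lists every recipe.
def recipe_list(recipes)
  recipes.join("\n")
end

# Hypothetical helper: the title shown at the top of one full recipe.
def recipe_heading(recipes, index)
  "Recipe: #{recipes[index]}"
end

# Someone points out a capitalisation mistake. We fix it in one place ...
recipes[2] = "One-Pot Ramen"

# ... and both the list and the heading pick up the fix.
puts recipe_list(recipes)
puts recipe_heading(recipes, 2) # prints Recipe: One-Pot Ramen
```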

Of course, either method works when all a recipe has is a title. But real recipes have other parts, like lists of ingredients, method steps and preparation time. We can express this in Ruby (just using one recipe this time, for simplicity):

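require "active_support/all" # needs the activesupport gem; plain Ruby doesn't understand 5.minutes

Recipe = Struct.new(:title, :ingredients, :method, :preparation_time)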

stew_recipe = Recipe.new("Stew with Dumplings", ["Ready-made stew", "Frozen dumplings"], ["1. Put the dumplings in the stew"], 5.minutes)

print stew_recipe.title

Just notice how the things I’ve listed above have all appeared at the start of the program (ingredients, method, preparation time). There’s more punctuation and unusual words surrounding them, but you can ignore those. I just want to point out that you could also write that code something like this:

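require "active_support/all" # needs the activesupport gem; plain Ruby doesn't understand 5.minutes

class Recipe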
  attr_accessor :title, :ingredients, :method, :preparation_time
end

stew_recipe = Recipe.new
stew_recipe.title = "Stew with Dumplings"
stew_recipe.ingredients = ["Ready-made stew", "Frozen dumplings"]
stew_recipe.method = ["1. Put the dumplings in the stew"]
stew_recipe.preparation_time = 5.minutes

print stew_recipe.title

That looks like it does the same thing as what I had before, but can you be sure? Is this code worse because it has more lines? Or better because each line is shorter and simpler? Or are you not following any of this?

The point I want to make is that even with a very simple program in a single language, you can find a bunch of different ways of writing the same thing, and it’s often hard to tell whether something means what you think it means. Imagine how many ways there are of writing a real recipe book program which actually provides buttons and things for adding/viewing/modifying/deleting recipes!
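To show that “can you be sure?” isn’t just a rhetorical question, here’s a sketch of one place where the two styles quietly differ. I’ve cut the recipe down to a title and a preparation time in plain minutes, and RecipeStruct and RecipeClass are names I’ve made up so the two versions can sit side by side.

```ruby
# Style 1: a Struct, as in the first version of the recipe code.
RecipeStruct = Struct.new(:title, :preparation_minutes)

# Style 2: a class with attr_accessor, as in the second version.
class RecipeClass
  attr_accessor :title, :preparation_minutes
end

struct_a = RecipeStruct.new("Stew with Dumplings", 5)
struct_b = RecipeStruct.new("Stew with Dumplings", 5)

class_a = RecipeClass.new
class_a.title = "Stew with Dumplings"
class_a.preparation_minutes = 5

class_b = RecipeClass.new
class_b.title = "Stew with Dumplings"
class_b.preparation_minutes = 5

# Both styles give back the same title ...
same_titles = struct_a.title == class_a.title

# ... but they disagree about whether two identical-looking recipes are equal.
structs_equal = struct_a == struct_b # a Struct compares its contents
classes_equal = class_a == class_b   # a plain class only equals itself

puts same_titles   # prints true
puts structs_equal # prints true
puts classes_equal # prints false
```

Neither behaviour is wrong, but if the rest of your program ever compares two recipes, the “same” code suddenly isn’t the same any more.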

In summary

This has been a bit of a whirlwind tour of many different concepts, but I guess there are two things to realise:

  1. Computers are complicated. A two-line program in Ruby gets converted into 60+ lines of hard-to-read assembly code, which is then converted into tens of thousands of binary digits, and only then can your computer even begin to do what you told it to do. Honestly, it’s a miracle computers only crash some of the time.

  2. There are heaps of different ways of solving the same problem. Actually writing code to solve problems is (usually) the easy part of my job. The hard parts are figuring out what someone else meant when they wrote something, deciding how to go about things, and thinking through the consequences of different decisions.

If you want to learn more about how programming works, or you like cartoon foxes, I can highly recommend the Poignant Guide to Ruby. If you actually want to learn to program (not for the faint-hearted, but immensely rewarding) then Learn Ruby the Hard Way is a deceptively-titled but more serious introduction. Godspeed!

Posted on 29 September 2017