‘Stupid’ Question 3: What is a compiler and an interpreter, and what is the difference?

[To celebrate my first year of programming I will ask a ‘stupid’ question daily on my blog for a year, to make sure I learn at least 365 new things during my second year as a developer]

Yes! I asked this! I thought I knew, but wanted to make sure I really knew, so I asked this question again :) OMG. Now let's move on to the answer.

So both are basically programs with the same purpose, they just go about it in different ways. They implement a language, and by this I mean that they take source code (the set of readable instructions that we write as programmers) and translate it to executable machine code.
The difference is how they do it. The compiler takes the whole thing, translates it, and saves the result as a file that can be executed later. The interpreter translates line by line as you go, without saving to a file. (This of course means there are pros and cons to both ways of translating.)
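
To make the difference concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the three-command toy language (`set`, `add`, `print`), the function names, and the choice of Python text as a stand-in for machine code. A real compiler or interpreter does far more than this.

```python
# Toy language (hypothetical): "set NAME NUMBER", "add NAME A B", "print NAME".
PROGRAM = "set x 3\nset y 4\nadd z x y\nprint z"

def interpret(source):
    """Interpreter style: read one line, carry it out at once, move on."""
    env, out = {}, []
    for line in source.splitlines():
        op, *args = line.split()
        if op == "set":
            env[args[0]] = int(args[1])
        elif op == "add":
            env[args[0]] = env[args[1]] + env[args[2]]
        elif op == "print":
            out.append(env[args[0]])
    return out

def compile_to_python(source):
    """Compiler style: translate the whole program first, run nothing yet.
    Python text stands in for machine code here (an assumption of this sketch)."""
    target = []
    for line in source.splitlines():
        op, *args = line.split()
        if op == "set":
            target.append(f"{args[0]} = {int(args[1])}")
        elif op == "add":
            target.append(f"{args[0]} = {args[1]} + {args[2]}")
        elif op == "print":
            target.append(f"out.append({args[0]})")
    return "\n".join(target)

def run_compiled(target):
    """Run the already translated program; the toy source is not read again."""
    scope = {"out": []}
    exec(target, scope)
    return scope["out"]

print(interpret(PROGRAM))              # [7]
target = compile_to_python(PROGRAM)    # could be saved to a file at this point
print(run_compiled(target))            # [7]
```

The translated text in `target` can be kept and run as often as you like, while `interpret` has to work through the original source every time.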

A simple diagram, but there is more magic than this happening ;)

Image source - and have a read here, interesting stuff

I won't get into the different steps of compiling (lexical analysis, syntax analysis, code generation, linking), but this is the very simple answer. Nonetheless, I don’t consider this question fully answered; there is much more to it once you get into the details. But I’ll ask separate questions for that, such as: How does a compiler work? And are all languages compiled the same way? Please add some comments and ideas! I read and appreciate all comments :)

Comments

Mark

7/20/2012 8:26:50 AM

shorter answer - 

Interpreter checks for many things, most importantly syntax errors
Compiler turns "english" :) into machine code = 1010101010101010101010101

Uli Kusterer

7/20/2012 8:28:09 AM

Hi,
if you're curious about how programming languages work down to the gory machine code level, you might be interested in a few blog posts I did:

http://orangejuiceliberationfront.com/i-am-not-recycling-i-collect-garbage/

http://zathras.de/blog-runtime-time.htm

http://orangejuiceliberationfront.com/headaches-further-revelations/

http://orangejuiceliberationfront.com/tag/assembly-language/

http://orangejuiceliberationfront.com/building-a-loader/

Sorry for the deluge of links, I'm still moving from one server to the next, so there's no central index of all of those yet.

Jason

7/20/2012 8:33:46 AM

An interpreter 'compiles' the code on the fly as it is being run. It is slower and only really finds errors at runtime. The upside is that if code is never used, no time is wasted compiling it.

A compiler parses the code, syntax checks it, then converts it to machine language that the processor can understand. It shows any syntax errors well before runtime and executes faster because the code is already in a form that the processor can use.

These days machines are so fast the interpreters aren't really that much slower but the compile time checking of a compiler still makes it king. 

It gets blurry with virtual machines such as the Java VM and the .NET CLR ... they compile to an intermediate language, which provides compile-time syntax checking. The intermediate language is then compiled on the fly during execution. This is done so the same code can be run on different platforms (Java) or so that different languages can be mixed (.NET).

Happy coding :)

Uli Kusterer

7/20/2012 8:36:49 AM

Mark, no, the thing that checks for syntax errors would at best be a "parser". An "interpreter" actually does something with the stuff it reads. She's right that the difference is mainly that one translates once ahead of time, and the other translates as needed.

Also, if you run a program twice, the compiler's output can just be run a second time, while the interpreter will have to re-translate the whole program again. Or when you run a loop, the interpreter will usually re-read each line as often as it is run, while a compiler just runs the already-read line a second ... hundredth time.

And if I may add some confusion: Virtual machines (like the .net CLR or Java) are usually a mix of compilation and interpretation: The Virtual Machine itself is a "simulated machine" that understands a different language than your CPU. The code gets compiled to the virtual machine's "machine code" language. Then, on your CPU, there is a little interpreter that translates each instruction of the virtual machine into an instruction of the actual CPU it runs on.
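
A rough Python sketch of that mix, under invented assumptions: the instruction names (`PUSH`, `ADD`, `MUL`), the nested-tuple input format, and both function names are hypothetical and do not come from the CLR or the JVM. The first function compiles an expression into code for a simulated stack machine, the second is the "little interpreter" that carries out each instruction.

```python
def compile_expr(expr):
    """Compile a nested tuple such as ("+", 1, ("*", 2, 3)) into stack-machine code."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    opcode = {"+": "ADD", "*": "MUL"}[op]
    # Code for both operands first, then the instruction that combines them.
    return compile_expr(left) + compile_expr(right) + [(opcode, None)]

def run_vm(code):
    """The interpreter for the simulated machine: one action per instruction."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()

code = compile_expr(("+", 1, ("*", 2, 3)))
print(code)          # the virtual machine's "machine code"
print(run_vm(code))  # 7
```

Only `run_vm` would need to be rewritten for another platform; the compiled `code` list stays the same.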

The advantage of this is that you can perform some optimizations ahead of time, but still run the code on different CPUs (e.g. on the iPhone's ARM CPU or your PC's Intel CPU). Also, you only need to port the little interpreter to another platform, but can compile the code on your PC or whatever machine is better suited to the job.

And of course there are VMs that perform "just in time" compilation, where an "interpreter" essentially writes down the CPU-language versions of the VM's instructions, and when it needs to run them a second time, they have actually been compiled. But this way it only compiles what actually had to be run this time.
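
The "write it down the first time, reuse it afterwards" idea can be sketched in a few lines of Python. This is only an analogy: Python closures stand in for native CPU instructions, and the names `translate` and `run_with_cache` are invented for this example. A real just-in-time compiler emits actual machine code and is far more selective about what it compiles.

```python
def translate(instr):
    """Stand-in for emitting native code: build a closure for one VM instruction."""
    opcode, arg = instr
    if opcode == "PUSH":
        return lambda stack: stack.append(arg)
    if opcode == "ADD":
        return lambda stack: stack.append(stack.pop() + stack.pop())
    raise ValueError(f"unknown opcode: {opcode}")

def run_with_cache(code, repeats):
    """Run `code` `repeats` times, translating each instruction only on first use."""
    cache = {}           # instruction index -> translated closure
    translations = 0
    stack = []
    for _ in range(repeats):
        for pc, instr in enumerate(code):
            if pc not in cache:
                cache[pc] = translate(instr)
                translations += 1
            cache[pc](stack)
    return stack, translations

code = [("PUSH", 1), ("PUSH", 2), ("ADD", None)]
results, translations = run_with_cache(code, 100)
print(len(results), translations)   # 100 results, but only 3 translations
```

An instruction that is never reached never gets an entry in `cache`, which matches the point above about compiling only what actually had to be run.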

There are some really interesting ways in which compilers and interpreters can be used, alone or together. If you're into that kind of stuff.

James Curran

7/24/2012 7:12:15 AM

Hmmm... Pretty much every description here of an interpreter is wrong.  Interpreters do not translate source code into machine code AT ALL.

Let's use an example of a shopping list written in English as a metaphor for source code.  Your personal assistant only speaks French, so he cannot understand that list.  So, you give it to a translator who translates the list from English to French.  However, since he is fluent in both languages, he can change the English idioms into French idioms, and since he knows the layout of the store, he can reorder the items on the list so everything is found by the assistant in order, as he goes row by row through the store.  This is a compiler.

Next, let's say we replace the assistant & translator with an English speaking robot.  He reads the first item off the list, searches the store for it, drops it off at the checkout counter, and then reads the second item on the list and does it all again.  There is no translation done, and there is also no real understanding of the list; it's merely "symbol leads to action".

Bringing this back to computers: given the expression "x * 2", an interpreter knows only that it is to perform the "*" action using the "x" and "2" objects, but has no knowledge of what that action is. (And the action is performed immediately.)  A compiler knows that this is the same as "x + x" and "x SHL 1" and generates machine code to perform it most efficiently.
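
The rewrite of "x * 2" into "x SHL 1" can be sketched in Python as below. The tuple representation of the expression and the names `optimize` and `evaluate` are assumptions made for this example, and the shift trick as written only holds for integers.

```python
def optimize(expr):
    """Compiler-style rewrite: turn ("*", operand, 2) into ("<<", operand, 1).
    Any other expression is returned unchanged."""
    op, left, right = expr
    if op == "*" and right == 2:
        return ("<<", left, 1)
    return expr

def evaluate(expr, env):
    """Interpreter-style "symbol leads to action": look at the operator, do it."""
    op, left, right = expr
    a = env[left] if isinstance(left, str) else left
    b = env[right] if isinstance(right, str) else right
    if op == "*":
        return a * b
    if op == "+":
        return a + b
    if op == "<<":
        return a << b
    raise ValueError(f"unknown operator: {op}")

original = ("*", "x", 2)
rewritten = optimize(original)
print(rewritten)                                                   # ('<<', 'x', 1)
print(evaluate(original, {"x": 21}), evaluate(rewritten, {"x": 21}))  # 42 42
```

`evaluate` never asks whether a cheaper equivalent exists; `optimize` is the step that uses knowledge of what the operation means.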

Chas. Owens

8/6/2012 3:11:37 AM

I believe there are at least two sets of definitions: the theoretical and the practical:

Theoretical

The way I have always understood it is compilers translate, assemblers transliterate, and interpreters do.  That is a compiler is a program that takes one language and turns it into another language (e.g. C into assembly), an assembler does the same thing, but the two languages are so close to each other that it can be thought of more as substitution (e.g. assembly into machine code), and interpreters actually carry out the instructions they are given (e.g. a robot turtle following the instructions from a Logo program).

At some point there must be an interpreter, otherwise, nothing happens.  For instance, the C language often gets compiled into an intermediate lower-level language (commonly assembler, but llvm is starting to make inroads).  Then the intermediate languages get compiled or assembled until you have object code.  The object code is then linked to create an executable (which is a set of machine instructions in a format that the operating system knows how to load).  Once the executable is loaded by the operating system, it is interpreted by the CPU.

Another example is Perl 5.  It is often thought of as an interpreted language; however, it first goes through a compilation stage to produce an optree.  The optree is then interpreted by perl.  The perl program is, itself, interpreted by the CPU.
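
Python (in its standard CPython implementation) follows the same two-stage pattern, and its standard library lets you see both stages. The sketch below uses Python rather than Perl, so it illustrates the pattern and not Perl's optree itself; the file name `<example>` and the variable names are arbitrary.

```python
import dis

source = "total = 2 + 3\n"

# Stage 1: compile the source text into a code object (bytecode).
code_object = compile(source, "<example>", "exec")

# Stage 2: the Python virtual machine interprets that bytecode.
namespace = {}
exec(code_object, namespace)
print(namespace["total"])   # 5

# Show the bytecode instructions; the exact listing differs between Python versions.
dis.dis(code_object)
```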

Practical

In common usage, a compiler (or more accurately a compiler suite or toolchain) produces a file that can be run directly by the operating system and an interpreter runs a program on request directly from the source.

These days, virtual machine languages like Java, C#, and Perl 6 and dynamic languages like the P languages: Perl 5, Python, and Ruby (R is just a P with a funny shape) have made the distinction between compiled and interpreted less useful.  Further muddying the waters are subsets of the dynamic languages that can be compiled down to machine code for certain virtual machines like Jython (JVM) or IronRuby (CLR).  The waters are even more muddied by just-in-time compilers that translate a virtual machine's opcodes into the native machine's opcodes on the fly.

Last modified on 2012-07-20