CPSC170
Fundamentals of Computer Science II

Lab 6

Compiling and Linking


Pre-lab

C++ is an object-oriented programming language that is an extension of the procedural, non-object-oriented programming language C.

A C++ program is a collection of program files (at least one) and header files (zero or more) containing functions and classes. This collection of functions and classes making up a program is called the source code for that program. Program files typically have the extension .cc and header files have the extension .h. Exactly one of the functions, among all the functions in all the program files making up a program, must have the name main. (C++ is case-sensitive, so the name main is different from the name Main.) Functions in a class are called member functions of that class. (We will learn about C++ classes later in the semester.) Not all functions have to be member functions. In particular, the function main cannot be a member function.

Names of files and directories can be of arbitrary length. Although Unix/Linux allows a variety of characters in file/directory names, when naming program files and header files it is useful to use only letters, digits, and the underscore character. The period character should be used only once, to separate the primary name from the extension.

C++ source code cannot be executed directly. It first has to be translated to the machine code for the specific hardware (computer processor) that will be used to execute the program. This translation is done by a program called a compiler. On our computers, the compiler is called g++.

A C++ compiler runs in three stages: 1. Pre-processing, 2. Compiling, and 3. Linking.

  1. Pre-processing: Action is taken on pre-processor directives in the source code. Pre-processor directives are lines of code that have a pound sign (#) as the first non-blank character on the line (for example, #include). We will come across several pre-processor directives this semester.
  2. Compiling: In this stage, a source code file is checked for syntax errors, i.e., the source code is checked to make sure that it satisfies the rules of writing C++ statements. If a program file, say file.cc, compiles successfully (i.e., without any syntax errors being flagged), the result is an object file, typically named file.o. (Typically, the name of the object file is the primary name of the source code file with the extension .o.) The object code is close to machine language, but still not executable. In fact, even if a program is made up of several program files, one can compile just one of the .cc files in isolation to check for syntax errors in that file. Thus, when a program consists of several program files, the object code generated from one program file is not the complete object code for the program.

    Note that program files are explicitly compiled using the g++ command. Header files are never compiled explicitly. Header files are included in program files using a pre-processor directive, and thus indirectly become a part of the compiling process. Note also that program files are never included in other program or header files.

  3. Linking: In this stage, the object code from all the program files that make up the program is linked together with the system libraries that are being used, to create the executable for the program. The executable for the program is a file in machine language, and is not stored as a text file. So, trying to read it on a terminal or in a text editor simply displays gibberish. The default name for the executable is a.out. Typically, though, we will use an appropriate option for the g++ program to generate an executable with whatever name we choose. Typically, file names for executable files do not have any extension.
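As a small illustration of the pre-processing stage, the sketch below uses three common directives. The macro LAB_GREETING and the function labGreeting are hypothetical names chosen for this example only; they are not part of the lab's files.

```cpp
// Directive: replace this line with the contents of the standard header <string>.
#include <string>

// Directives: define the macro LAB_GREETING only if it is not already defined.
// (LAB_GREETING is a hypothetical name used for illustration.)
#ifndef LAB_GREETING
#define LAB_GREETING "Hello from CPSC170"
#endif

// By the time the compiling stage sees this function, the pre-processor has
// already replaced LAB_GREETING with the string literal above.
// (labGreeting is a hypothetical, non-member function.)
std::string labGreeting() {
    return LAB_GREETING;
}
```

If this were saved in a file, say greeting.cc, then running g++ -E greeting.cc would stop after the pre-processing stage and print the pre-processed source code, which is one way to see the effect of the directives.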

Note that although the process of creating an executable from the collection of program files and header files consists of the above three steps, the entire process is typically referred to as compiling a program, and the program g++ that we use to create the executable is called a compiler.

Once an executable file, say myProg, has been created, it can be executed on the command line with the command

   ./myProg
 

Note that the dot and the slash before the name of the executable are necessary.

The files printNum.cc, numbers.h, and numbers.cc together make up a program that prompts the user to enter an integer, and then prints out a suitable message and the square of that number. Please read the files carefully: nuances of C++ syntax, as well as of documenting code, are explained as comments in the files.

Once you have carefully and completely read the program files and the header file for this program, you can copy them to your directory to create an executable following the steps below.

In this case, we need to create object files from the two program files and then link them to create an executable. The object files for the two program files can be created in any order. For example, the object file for printNum.cc can be created even if numbers.cc has not been written yet, as long as numbers.h has been written.

In fact, when developing this program, we would have created printNum.cc first, and created numbers.h as we developed the function main in printNum.cc and used function calls in it. The intended input/output behaviour of the functions we used, i.e., readNumber, square and printAnswer, would have been very clear as we used those functions, and thus we could write the pre and post conditions for each of these functions in the header file.
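To make the split between a header file and a program file concrete, here is a rough sketch of what numbers.h and numbers.cc might contain. The function names readNumber, square and printAnswer come from the lab's files, but the parameter lists, return types, messages and pre/post conditions shown here are assumptions made for illustration; the actual files may differ. The two files are shown in one listing so that the sketch compiles on its own.

```cpp
// ---- numbers.h (hypothetical sketch; the signatures are assumptions) ----
#ifndef NUMBERS_H
#define NUMBERS_H

// Pre:  none.
// Post: prompts the user on standard output and returns the integer
//       read from standard input.
int readNumber();

// Pre:  n * n fits in an int.
// Post: returns n * n.
int square(int n);

// Pre:  none.
// Post: prints n and sq, with a short message, on standard output.
void printAnswer(int n, int sq);

#endif

// ---- numbers.cc (hypothetical sketch) ----
// In a separate file, this part would begin with: #include "numbers.h"
#include <iostream>

int readNumber() {
    int n = 0;
    std::cout << "Enter an integer: ";
    std::cin >> n;
    return n;
}

int square(int n) {
    return n * n;
}

void printAnswer(int n, int sq) {
    std::cout << "The square of " << n << " is " << sq << "\n";
}
```

Note that the header part contains only declarations, which is all that printNum.cc needs in order to be compiled to an object file. The definitions are needed only at the linking stage.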

  1. Create the object file for printNum.cc:
    	g++ -c printNum.cc
          
    The -c option to g++ indicates to g++ that the program file printNum.cc should be compiled to create the object code file, but not be linked to create an executable. The name of the object code file will be printNum.o.
  2. Create the object file for numbers.cc:
    	g++ -c numbers.cc
          
  3. Link the two object code files to create an executable called numSquare:
    	g++ numbers.o printNum.o -o numSquare
          
    The -o option to g++ indicates that the name of the executable to be created follows the option. In this case, we are asking g++ to create an executable and name it numSquare. Note that names of executables typically do not have an extension. Note also that the name of the executable need not be the same as the primary name of the program file containing the function main (e.g., printNum, in this case).

The advantage of splitting up a program into multiple program files is that the program files can be compiled independently of each other. In our example, the program file printNum.cc includes the header file numbers.h, and so depends on this header file. So does the program file numbers.cc. So, if the file numbers.h were to be modified, both the program files would need to be recompiled and then linked to create the new executable. But, if only printNum.cc were to be modified, then we would need to re-compile only printNum.cc, and then link the old numbers.o and the new printNum.o to create the new executable.

When programs get large, the number of program files increases, and it becomes hard to keep track of which program files have changed in the process of developing the program. It thus becomes hard to keep track of which program files need to be re-compiled to create the new executable. Unix/Linux provides a utility called make to help with this.

For make to work, we need to create a file called Makefile (note the uppercase M) that informs make of the dependencies for each program file, and also the set of files needed to be linked to create an executable.

The Makefile consists of a set of targets. Each target is specified by a pair of lines in the following format:

     <target_name>:   <dependencies>
     <TAB><action>

where <target_name> is the name for the target, typically the name of the file to be created, <dependencies> is a list of files that this target depends on, and the <action> is the compilation command. <TAB> denotes the TAB character. For example, the target entry to create the object file from numbers.cc would be:

     numbers.o:     numbers.h numbers.cc
     <TAB>g++ -c numbers.cc

In this case, the target numbers.o depends on the program file numbers.cc and on the header file numbers.h, since the header file is included in the program file.

Note that the action line (second line) must start with the TAB keystroke. Note also that the colon (:) after the name of the target is a part of the syntax of Makefiles.

Similarly, the entry for creating the object file from printNum.cc would be

     printNum.o:     numbers.h printNum.cc
     <TAB>g++ -c printNum.cc

and the one to create the executable would be

     numSquare:     numbers.o printNum.o
     <TAB>g++ numbers.o printNum.o -o numSquare

Note that for the target to create the executable, the dependencies are the two object code files that are used to create the executable.

Copy the complete Makefile to your directory. Study the file. Note that the target entries can appear in the file in any order.

Now, to compile the program (i.e., create the executable), one can use the command

    make numSquare
  

The command make, when asked to make a target, checks whether the target is "stale", i.e., missing or older than at least one of its dependencies. It first makes sure that the dependencies themselves are up to date: in this case, the targets numbers.o and printNum.o would be checked, i.e., make will try to make them. For numbers.o, make will check the timestamps on the files numbers.h and numbers.cc and compare them to the timestamp on numbers.o, if one already exists. If the target does not exist, or if the timestamp on any of the dependencies is more recent than the timestamp on the target, then the action for that target is taken.
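The timestamp comparison can be sketched in C++ as follows. This is only an illustration of the idea, not how make is actually implemented; the function name isStale is hypothetical, and the sketch assumes a compiler that supports C++17 (for example, g++ -std=c++17).

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Returns true if `target` would need to be (re)built: either it does not
// exist, or at least one dependency was modified more recently than it.
// Assumption: every path in `deps` exists; fs::last_write_time throws
// std::filesystem::filesystem_error for a missing file.
// (isStale is a hypothetical name, not part of make.)
bool isStale(const fs::path& target, const std::vector<fs::path>& deps) {
    if (!fs::exists(target)) {
        return true;  // nothing built yet, so the action must be taken
    }
    const fs::file_time_type targetTime = fs::last_write_time(target);
    for (const fs::path& dep : deps) {
        if (fs::last_write_time(dep) > targetTime) {
            return true;  // a dependency is newer than the target
        }
    }
    return false;  // the target is at least as recent as every dependency
}
```

For example, the call isStale("numbers.o", {"numbers.h", "numbers.cc"}) would correspond to the check for the target numbers.o. Unlike make, this sketch does not first try to bring the dependencies themselves up to date.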

Suppose your directory contains only the above two program files, the one header file and the Makefile. Then, running the command

    make numSquare
  

will cause the target numbers.o to be checked. Since that file does not exist, the action for that target will be taken and the file numbers.o will be created. Then, the target printNum.o will be checked, and once again since that file does not exist, the action for that target will be taken and the file printNum.o will be created. Since at least one of the dependencies of numSquare caused an action to be taken, the action for the target numSquare will also be taken, and thus the executable numSquare will be created.

As a shortcut, if the command make is run without specifying any target to make, then the first target in the Makefile will be made. Thus, it is useful to put the target for creating the executable as the first target in the Makefile. In our case, the command

    make
  

will have the same effect as

    make numSquare
  


Lab

  1. Create a directory called CPSC170 in your home directory and then a directory called Lab1 as a sub-directory of the CPSC170 directory. Copy all the program files (.cc files) and the header file (the .h file) from the pre-lab into your Lab1 directory.

  2. Compile the program using the g++ command to create the two object files, and then use the g++ command again to link them to create an executable. Run the executable to see the input/output behaviour of the program.

  3. In the source file printNum.cc, comment out the pre-processor directive include, i.e., the line

          #include "numbers.h"
        
    Compile the program file to create the object file. You should get an error. What does the error indicate?

  4. Practice with the make utility by creating a library (.h and .cc file(s)) of functions you're likely to reuse.