Instructions for use
To whom is this tutorial directed?
This tutorial is for those people who want to learn programming in C++ and do not necessarily have any previous knowledge of other programming languages. Of course any knowledge of other programming languages or any general computer skill can be useful to better understand this tutorial, although it is not essential.
It is also suitable for those who need a little update on the new features the language has acquired from the latest standards.
If you are familiar with the C language, you can take the first three parts of this tutorial as a review of concepts, since they mainly explain the C part of C++. There are slight differences in the C++ syntax for some C features, so I still recommend you to read them.
The 4th part describes object-oriented programming.
The 5th part mostly describes the new features introduced by ANSI-C++ standard.
Structure of this tutorial
The tutorial is divided in six main parts, and each part is divided into several sections covering one specific topic each. You can access any section directly from the section index available on the left side bar, or begin the tutorial from any point and follow the links at the bottom of each section.
Many sections include examples that describe the use of the newly acquired knowledge in the chapter. It is recommended to read these examples and to be able to understand each of the code lines that constitute it before passing to the next chapter.
A good way to gain experience with a programming language is by modifying and adding new functionalities on your own to the example programs that you fully understand. Don't be scared to modify the examples provided with this tutorial, that's the way to learn!
Compatibility Notes
The ANSI-C++ standard acceptation as an international standard is relatively recent. It was first published in November 1997, and revised in 2003. Nevertheless, the C++ language exists from a long time before (1980s). Therefore there are many compilers which do not support all the new capabilities included in ANSI-C++, especially those released prior to the publication of the standard.
This tutorial is thought to be followed with modern compilers that support -at least on some degree- ANSI-C++ specifications. I encourage you to get one if yours is not adapted. There are many options, both commercial and free.
Compilers
The examples included in this tutorial are all console programs. That means they use text to communicate with the user and to show their results.
All C++ compilers support the compilation of console programs. Check the user's manual of your compiler for more info on how to compile them.
Structure of a program
| 1 2 3 4 5 6 7 8 9 10 | | Hello World! |
The first panel (in light blue) shows the source code for our first program. The second one (in light gray) shows the result of the program once compiled and executed. To the left, the grey numbers represent the line numbers - these are not part of the program, and are shown here merely for informational purposes.
The way to edit and compile a program depends on the compiler you are using. Depending on whether it has a Development Interface or not and on its version. Consult the compilers section and the manual or help included with your compiler if you have doubts on how to compile a C++ console program.
The previous program is the typical program that programmer apprentices write for the first time, and its result is the printing on screen of the "Hello World!" sentence. It is one of the simplest programs that can be written in C++, but it already contains the fundamental components that every C++ program has. We are going to look line by line at the code we have just written:
// my first program in C++- This is a comment line. All lines beginning with two slash signs (
//) are considered comments and do not have any effect on the behavior of the program. The programmer can use them to include short explanations or observations within the source code itself. In this case, the line is a brief description of what our program is. #include <iostream>- Lines beginning with a hash sign (
#) are directives for the preprocessor. They are not regular code lines with expressions but indications for the compiler's preprocessor. In this case the directive#include <iostream>tells the preprocessor to include the iostream standard file. This specific file (iostream) includes the declarations of the basic standard input-output library in C++, and it is included because its functionality is going to be used later in the program. using namespace std;- All the elements of the standard C++ library are declared within what is called a namespace, the namespace with the name std. So in order to access its functionality we declare with this expression that we will be using these entities. This line is very frequent in C++ programs that use the standard library, and in fact it will be included in most of the source codes included in these tutorials.
int main ()- This line corresponds to the beginning of the definition of the main function. The main function is the point by where all C++ programs start their execution, independently of its location within the source code. It does not matter whether there are other functions with other names defined before or after it - the instructions contained within this function's definition will always be the first ones to be executed in any C++ program. For that same reason, it is essential that all C++ programs have a main function.
The word
mainis followed in the code by a pair of parentheses (()). That is because it is a function declaration: In C++, what differentiates a function declaration from other types of expressions are these parentheses that follow its name. Optionally, these parentheses may enclose a list of parameters within them. Right after these parentheses we can find the body of the main function enclosed in braces ({}). What is contained within these braces is what the function does when it is executed. cout << "Hello World!";- This line is a C++ statement. A statement is a simple or compound expression that can actually produce some effect. In fact, this statement performs the only action that generates a visible effect in our first program.
coutis the name of the standard output stream in C++, and the meaning of the entire statement is to insert a sequence of characters (in this case the Hello World sequence of characters) into the standard output stream (cout, which usually corresponds to the screen). cout is declared in the iostream standard file within the std namespace, so that's why we needed to include that specific file and to declare that we were going to use this specific namespace earlier in our code. Notice that the statement ends with a semicolon character (;). This character is used to mark the end of the statement and in fact it must be included at the end of all expression statements in all C++ programs (one of the most common syntax errors is indeed to forget to include some semicolon after a statement). return 0;- The return statement causes the main function to finish. return may be followed by a return code (in our example is followed by the return code with a value of zero). A return code of 0 for the main function is generally interpreted as the program worked as expected without any errors during its execution. This is the most usual way to end a C++ console program.
You may have noticed that not all the lines of this program perform actions when the code is executed. There were lines containing only comments (those beginning by
//). There were lines with directives for the compiler's preprocessor (those beginning by #). Then there were lines that began the declaration of a function (in this case, the main function) and, finally lines with statements (like the insertion into cout), which were all included within the block delimited by the braces ({}) of the main function.The program has been structured in different lines in order to be more readable, but in C++, we do not have strict rules on how to separate instructions in different lines. For example, instead of
| 1 2 3 4 5 | |
We could have written:
|
All in just one line and this would have had exactly the same meaning as the previous code.
In C++, the separation between statements is specified with an ending semicolon (
;) at the end of each one, so the separation in different code lines does not matter at all for this purpose. We can write many statements per line or write a single statement that takes many code lines. The division of code in different lines serves only to make it more legible and schematic for the humans that may read it.Let us add an additional instruction to our first program:
| 1 2 3 4 5 6 7 8 9 10 11 12 | | Hello World! I'm a C++ program |
In this case, we performed two insertions into cout in two different statements. Once again, the separation in different lines of code has been done just to give greater readability to the program, since main could have been perfectly valid defined this way:
|
We were also free to divide the code into more lines if we considered it more convenient:
| 1 2 3 4 5 6 7 8 | |
And the result would again have been exactly the same as in the previous examples.
Preprocessor directives (those that begin by
#) are out of this general rule since they are not statements. They are lines read and processed by the preprocessor and do not produce any code by themselves. Preprocessor directives must be specified in their own line and do not have to end with a semicolon (;).Comments
Comments are parts of the source code disregarded by the compiler. They simply do nothing. Their purpose is only to allow the programmer to insert notes or descriptions embedded within the source code.
C++ supports two ways to insert comments:
| 1 2 | |
The first of them, known as line comment, discards everything from where the pair of slash signs (
//) is found up to the end of that same line. The second one, known as block comment, discards everything between the /* characters and the first appearance of the */ characters, with the possibility of including more than one line.We are going to add comments to our second program:
| 1 2 3 4 5 6 7 8 9 10 11 12 | | Hello World! I'm a C++ program |
If you include comments within the source code of your programs without using the comment characters combinations
//, /* or */, the compiler will take them as if they were C++ expressions, most likely causing one or several error messages when you compile it.Variables. Data Types.
Let us think that I ask you to retain the number 5 in your mental memory, and then I ask you to memorize also the number 2 at the same time. You have just stored two different values in your memory. Now, if I ask you to add 1 to the first number I said, you should be retaining the numbers 6 (that is 5+1) and 2 in your memory. Values that we could now for example subtract and obtain 4 as result.
The whole process that you have just done with your mental memory is a simile of what a computer can do with two variables. The same process can be expressed in C++ with the following instruction set:
| 1 2 3 4 | |
Obviously, this is a very simple example since we have only used two small integer values, but consider that your computer can store millions of numbers like these at the same time and conduct sophisticated mathematical operations with them.
Therefore, we can define a variable as a portion of memory to store a determined value.
Each variable needs an identifier that distinguishes it from the others. For example, in the previous code the variable identifiers were
a, b and result, but we could have called the variables any names we wanted to invent, as long as they were valid identifiers.Identifiers
A valid identifier is a sequence of one or more letters, digits or underscore characters (_). Neither spaces nor punctuation marks or symbols can be part of an identifier. Only letters, digits and single underscore characters are valid. In addition, variable identifiers always have to begin with a letter. They can also begin with an underline character (_ ), but in some cases these may be reserved for compiler specific keywords or external identifiers, as well as identifiers containing two successive underscore characters anywhere. In no case can they begin with a digit.Another rule that you have to consider when inventing your own identifiers is that they cannot match any keyword of the C++ language nor your compiler's specific ones, which are reserved keywords. The standard reserved keywords are:
asm, auto, bool, break, case, catch, char, class, const, const_cast, continue, default, delete, do, double, dynamic_cast, else, enum, explicit, export, extern, false, float, for, friend, goto, if, inline, int, long, mutable, namespace, new, operator, private, protected, public, register, reinterpret_cast, return, short, signed, sizeof, static, static_cast, struct, switch, template, this, throw, true, try, typedef, typeid, typename, union, unsigned, using, virtual, void, volatile, wchar_t, while
Additionally, alternative representations for some operators cannot be used as identifiers since they are reserved words under some circumstances:
and, and_eq, bitand, bitor, compl, not, not_eq, or, or_eq, xor, xor_eq
Your compiler may also include some additional specific reserved keywords.
Very important: The C++ language is a "case sensitive" language. That means that an identifier written in capital letters is not equivalent to another one with the same name but written in small letters. Thus, for example, the
RESULT variable is not the same as the result variable or the Result variable. These are three different variable identifiers.Fundamental data types
When programming, we store the variables in our computer's memory, but the computer has to know what kind of data we want to store in them, since it is not going to occupy the same amount of memory to store a simple number than to store a single letter or a large number, and they are not going to be interpreted the same way.The memory in our computers is organized in bytes. A byte is the minimum amount of memory that we can manage in C++. A byte can store a relatively small amount of data: one single character or a small integer (generally an integer between 0 and 255). In addition, the computer can manipulate more complex data types that come from grouping several bytes, such as long numbers or non-integer numbers.
Next you have a summary of the basic fundamental data types in C++, as well as the range of values that can be represented with each one:
| Name | Description | Size* | Range* |
|---|---|---|---|
char | Character or small integer. | 1byte | signed: -128 to 127 unsigned: 0 to 255 |
short int (short) | Short Integer. | 2bytes | signed: -32768 to 32767 unsigned: 0 to 65535 |
int | Integer. | 4bytes | signed: -2147483648 to 2147483647 unsigned: 0 to 4294967295 |
long int (long) | Long integer. | 4bytes | signed: -2147483648 to 2147483647 unsigned: 0 to 4294967295 |
bool | Boolean value. It can take one of two values: true or false. | 1byte | true or false |
float | Floating point number. | 4bytes | +/- 3.4e +/- 38 (~7 digits) |
double | Double precision floating point number. | 8bytes | +/- 1.7e +/- 308 (~15 digits) |
long double | Long double precision floating point number. | 8bytes | +/- 1.7e +/- 308 (~15 digits) |
wchar_t | Wide character. | 2 or 4 bytes | 1 wide character |
* The values of the columns Size and Range depend on the system the program is compiled for. The values shown above are those found on most 32-bit systems. But for other systems, the general specification is that
int has the natural size suggested by the system architecture (one "word") and the four integer types char, short, int and long must each one be at least as large as the one preceding it, with char being always one byte in size. The same applies to the floating point types float, double and long double, where each one must provide at least as much precision as the preceding one.Declaration of variables
In order to use a variable in C++, we must first declare it specifying which data type we want it to be. The syntax to declare a new variable is to write the specifier of the desired data type (like int, bool, float...) followed by a valid variable identifier. For example:| 1 2 | |
These are two valid declarations of variables. The first one declares a variable of type int with the identifier a. The second one declares a variable of type float with the identifier mynumber. Once declared, the variables a and mynumber can be used within the rest of their scope in the program.
If you are going to declare more than one variable of the same type, you can declare all of them in a single statement by separating their identifiers with commas. For example:
|
This declares three variables (a, b and c), all of them of type int, and has exactly the same meaning as:
| 1 2 3 | |
The integer data types char, short, long and int can be either signed or unsigned depending on the range of numbers needed to be represented. Signed types can represent both positive and negative values, whereas unsigned types can only represent positive values (and zero). This can be specified by using either the specifier signed or the specifier unsigned before the type name. For example:
| 1 2 | |
By default, if we do not specify either signed or unsigned most compiler settings will assume the type to be signed, therefore instead of the second declaration above we could have written:
|
with exactly the same meaning (with or without the keyword
signed)An exception to this general rule is the char type, which exists by itself and is considered a different fundamental data type from signed char and unsigned char, thought to store characters. You should use either
signed or unsigned if you intend to store numerical values in a char-sized variable.short and long can be used alone as type specifiers. In this case, they refer to their respective integer fundamental types: short is equivalent to short int and long is equivalent to long int. The following two variable declarations are equivalent:| 1 2 | |
Finally,
signed and unsigned may also be used as standalone type specifiers, meaning the same as signed int and unsigned int respectively. The following two declarations are equivalent: | 1 2 | |
To see what variable declarations look like in action within a program, we are going to see the C++ code of the example about your mental memory proposed at the beginning of this section:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | | 4 |
Do not worry if something else than the variable declarations themselves looks a bit strange to you. You will see the rest in detail in coming sections.
Scope of variables
All the variables that we intend to use in a program must have been declared with its type specifier in an earlier point in the code, like we did in the previous code at the beginning of the body of the function main when we declared that a, b, and result were of type int.A variable can be either of global or local scope. A global variable is a variable declared in the main body of the source code, outside all functions, while a local variable is one declared within the body of a function or a block.
Global variables can be referred from anywhere in the code, even inside functions, whenever it is after its declaration.
The scope of local variables is limited to the block enclosed in braces (
{}) where they are declared. For example, if they are declared at the beginning of the body of a function (like in function main) their scope is between its declaration point and the end of that function. In the example above, this means that if another function existed in addition to main, the local variables declared in main could not be accessed from the other function and vice versa.Initialization of variables
When declaring a regular local variable, its value is by default undetermined. But you may want a variable to store a concrete value at the same moment that it is declared. In order to do that, you can initialize the variable. There are two ways to do this in C++:The first one, known as c-like initialization, is done by appending an equal sign followed by the value to which the variable will be initialized:
type identifier = initial_value ; For example, if we want to declare an int variable called a initialized with a value of 0 at the moment in which it is declared, we could write:
|
The other way to initialize variables, known as constructor initialization, is done by enclosing the initial value between parentheses (
()): type identifier (initial_value) ; For example:
|
Both ways of initializing variables are valid and equivalent in C++.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | | 6 |
Introduction to strings
Variables that can store non-numerical values that are longer than one single character are known as strings.The C++ language library provides support for strings through the standard
string class. This is not a fundamental type, but it behaves in a similar way as fundamental types do in its most basic usage.A first difference with fundamental data types is that in order to declare and use objects (variables) of this type we need to include an additional header file in our source code:
<string> and have access to the std namespace (which we already had in all our previous programs thanks to the using namespace statement).| 1 2 3 4 5 6 7 8 9 10 11 | | This is a string |
As you may see in the previous example, strings can be initialized with any valid string literal just like numerical type variables can be initialized to any valid numerical literal. Both initialization formats are valid with strings:
| 1 2 | |
Strings can also perform all the other basic operations that fundamental data types can, like being declared without an initial value and being assigned values during execution:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | | This is the initial string content This is a different string content |
For more details on C++ strings, you can have a look at the string class reference.
Constants
Literals
Literals are the most obvious kind of constants. They are used to express particular values within the source code of a program. We have already used these previously to give concrete values to variables or to express messages we wanted our programs to print out, for example, when we wrote: |
the
5 in this piece of code was a literal constant.Literal constants can be divided in Integer Numerals, Floating-Point Numerals, Characters, Strings and Boolean Values.
Integer Numerals
| 1 2 3 | |
They are numerical constants that identify integer decimal values. Notice that to express a numerical constant we do not have to write quotes (
") nor any special character. There is no doubt that it is a constant: whenever we write 1776 in a program, we will be referring to the value 1776.In addition to decimal numbers (those that all of us are used to using every day), C++ allows the use of octal numbers (base 8) and hexadecimal numbers (base 16) as literal constants. If we want to express an octal number we have to precede it with a
0 (a zero character). And in order to express a hexadecimal number we have to precede it with the characters 0x (zero, x). For example, the following literal constants are all equivalent to each other: | 1 2 3 | |
All of these represent the same number: 75 (seventy-five) expressed as a base-10 numeral, octal numeral and hexadecimal numeral, respectively.
Literal constants, like variables, are considered to have a specific data type. By default, integer literals are of type int. However, we can force them to either be unsigned by appending the u character to it, or long by appending l:
| 1 2 3 4 | |
In both cases, the suffix can be specified using either upper or lowercase letters.
Floating Point Numbers
They express numbers with decimals and/or exponents. They can include either a decimal point, ane character (that expresses "by ten at the Xth height", where X is an integer value that follows the e character), or both a decimal point and an e character:| 1 2 3 4 | |
These are four valid numbers with decimals expressed in C++. The first number is PI, the second one is the number of Avogadro, the third is the electric charge of an electron (an extremely small number) -all of them approximated- and the last one is the number three expressed as a floating-point numeric literal.
The default type for floating point literals is double. If you explicitly want to express a float or a long double numerical literal, you can use the
f or l suffixes respectively:| 1 2 | |
Any of the letters that can be part of a floating-point numerical constant (
e, f, l) can be written using either lower or uppercase letters without any difference in their meanings.Character and string literals
There also exist non-numerical constants, like:| 1 2 3 4 | |
The first two expressions represent single character constants, and the following two represent string literals composed of several characters. Notice that to represent a single character we enclose it between single quotes (
') and to express a string (which generally consists of more than one character) we enclose it between double quotes ("). When writing both single character and string literals, it is necessary to put the quotation marks surrounding them to distinguish them from possible variable identifiers or reserved keywords. Notice the difference between these two expressions:
| 1 2 | |
x alone would refer to a variable whose identifier is x, whereas 'x' (enclosed within single quotation marks) would refer to the character constant 'x'.Character and string literals have certain peculiarities, like the escape codes. These are special characters that are difficult or impossible to express otherwise in the source code of a program, like newline (
\n) or tab (\t). All of them are preceded by a backslash (\). Here you have a list of some of such escape codes: \n | newline |
\r | carriage return |
\t | tab |
\v | vertical tab |
\b | backspace |
\f | form feed (page feed) |
\a | alert (beep) |
\' | single quote (') |
\" | double quote (") |
\? | question mark (?) |
\\ | backslash (\) |
For example:
| 1 2 3 4 | |
Additionally, you can express any character by its numerical ASCII code by writing a backslash character (
\) followed by the ASCII code expressed as an octal (base-8) or hexadecimal (base-16) number. In the first case (octal) the digits must immediately follow the backslash (for example \23 or \40), in the second case (hexadecimal), an x character must be written before the digits themselves (for example \x20 or \x4A).String literals can extend to more than a single line of code by putting a backslash sign (
\) at the end of each unfinished line.| 1 2 | |
You can also concatenate several string constants separating them by one or several blank spaces, tabulators, newline or any other valid blank character:
|
Finally, if we want the string literal to be explicitly made of wide characters (wchar_t type), instead of narrow characters (char type), we can precede the constant with the
L prefix: |
Wide characters are used mainly to represent non-English or exotic character sets.
Boolean literals
There are only two valid Boolean values: true and false. These can be expressed in C++ as values of type bool by using the Boolean literalstrue and false.Defined constants (#define)
You can define your own names for constants that you use very often without having to resort to memory-consuming variables, simply by using the#define preprocessor directive. Its format is:#define identifier value For example:
| 1 2 | |
This defines two new constants: PI and NEWLINE. Once they are defined, you can use them in the rest of the code as if they were any other regular constant, for example:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | | 31.4159 |
In fact the only thing that the compiler preprocessor does when it encounters
#define directives is to literally replace any occurrence of their identifier (in the previous example, these were PI and NEWLINE) by the code to which they have been defined (3.14159 and '\n' respectively).The
#define directive is not a C++ statement but a directive for the preprocessor; therefore it assumes the entire line as the directive and does not require a semicolon (;) at its end. If you append a semicolon character (;) at the end, it will also be appended in all occurrences of the identifier within the body of the program that the preprocessor replaces.Declared constants (const)
With theconst prefix you can declare constants with a specific type in the same way as you would do with a variable: | 1 2 | |
Here,
pathwidth and tabulator are two typed constants. They are treated just like regular variables except that their values cannot be modified after their definition.Operators
You do not have to memorize all the content of this page. Most details are only provided to serve as a later reference in case you need it.
Assignment (=)
The assignment operator assigns a value to a variable. |
This statement assigns the integer value 5 to the variable a. The part at the left of the assignment operator (=) is known as the lvalue (left value) and the right one as the rvalue (right value). The lvalue has to be a variable whereas the rvalue can be either a constant, a variable, the result of an operation or any combination of these.
The most important rule when assigning is the right-to-left rule: The assignment operation always takes place from right to left, and never the other way:
|
This statement assigns to variable a (the lvalue) the value contained in variable b (the rvalue). The value that was stored until this moment in a is not considered at all in this operation, and in fact that value is lost.
Consider also that we are only assigning the value of b to a at the moment of the assignment operation. Therefore a later change of b will not affect the new value of a.
For example, let us have a look at the following code - I have included the evolution of the content stored in the variables as comments:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | | a:4 b:7 |
This code will give us as result that the value contained in a is 4 and the one contained in b is 7. Notice how a was not affected by the final modification of b, even though we declared a = b earlier (that is because of the right-to-left rule).
A property that C++ has over other programming languages is that the assignment operation can be used as the rvalue (or part of an rvalue) for another assignment operation. For example:
|
is equivalent to:
| 1 2 | |
that means: first assign 5 to variable b and then assign to a the value 2 plus the result of the previous assignment of b (i.e. 5), leaving a with a final value of 7.
The following expression is also valid in C++:
|
It assigns 5 to the all three variables: a, b and c.
Arithmetic operators ( +, -, *, /, % )
The five arithmetical operations supported by the C++ language are:| + | addition |
| - | subtraction |
| * | multiplication |
| / | division |
| % | modulo |
Operations of addition, subtraction, multiplication and division literally correspond with their respective mathematical operators. The only one that you might not be so used to see is modulo; whose operator is the percentage sign (%). Modulo is the operation that gives the remainder of a division of two values. For example, if we write:
|
the variable a will contain the value 2, since 2 is the remainder from dividing 11 between 3.
Compound assignment (+=, -=, *=, /=, %=, >>=, <<=, &=, ^=, |=)
When we want to modify the value of a variable by performing an operation on the value currently stored in that variable we can use compound assignment operators:
| expression | is equivalent to |
|---|---|
| value += increase; | value = value + increase; |
| a -= 5; | a = a - 5; |
| a /= b; | a = a / b; |
| price *= units + 1; | price = price * (units + 1); |
and the same for all other operators. For example:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 | | 5 |
Increase and decrease (++, --)
Shortening even more some expressions, the increase operator (++) and the decrease operator (--) increase or reduce by one the value stored in a variable. They are equivalent to +=1 and to -=1, respectively. Thus:| 1 2 3 | |
are all equivalent in its functionality: the three of them increase by one the value of c.
In the early C compilers, the three previous expressions probably produced different executable code depending on which one was used. Nowadays, this type of code optimization is generally done automatically by the compiler, thus the three expressions should produce exactly the same executable code.
A characteristic of this operator is that it can be used both as a prefix and as a suffix. That means that it can be written either before the variable identifier (++a) or after it (a++). Although in simple expressions like a++ or ++a both have exactly the same meaning, in other expressions in which the result of the increase or decrease operation is evaluated as a value in an outer expression they may have an important difference in their meaning: In the case that the increase operator is used as a prefix (++a) the value is increased before the result of the expression is evaluated and therefore the increased value is considered in the outer expression; in case that it is used as a suffix (a++) the value stored in a is increased after being evaluated and therefore the value stored before the increase operation is evaluated in the outer expression. Notice the difference:
| Example 1 | Example 2 |
|---|---|
| B=3; A=++B; // A contains 4, B contains 4 | B=3; A=B++; // A contains 3, B contains 4 |
In Example 1, B is increased before its value is copied to A. While in Example 2, the value of B is copied to A and then B is increased.
Relational and equality operators ( ==, !=, >, <, >=, <= )
In order to evaluate a comparison between two expressions we can use the relational and equality operators. The result of a relational operation is a Boolean value that can only be true or false, according to its Boolean result.
We may want to compare two expressions, for example, to know if they are equal or if one is greater than the other is. Here is a list of the relational and equality operators that can be used in C++:
| == | Equal to |
| != | Not equal to |
| > | Greater than |
| < | Less than |
| >= | Greater than or equal to |
| <= | Less than or equal to |
Here there are some examples:
| 1 2 3 4 5 | |
Of course, instead of using only numeric constants, we can use any valid expression, including variables. Suppose that a=2, b=3 and c=6,
| 1 2 3 4 | |
Be careful! The operator = (one equal sign) is not the same as the operator == (two equal signs), the first one is an assignment operator (assigns the value at its right to the variable at its left) and the other one (==) is the equality operator that compares whether both expressions in the two sides of it are equal to each other. Thus, in the last expression ((b=2) == a), we first assigned the value 2 to b and then we compared it to a, that also stores the value 2, so the result of the operation is true.
Logical operators ( !, &&, || )
The Operator ! is the C++ operator to perform the Boolean operation NOT, it has only one operand, located at its right, and the only thing that it does is to inverse the value of it, producing false if its operand is true and true if its operand is false. Basically, it returns the opposite Boolean value of evaluating its operand. For example:
| 1 2 3 4 | |
The logical operators && and || are used when evaluating two expressions to obtain a single relational result. The operator && corresponds with Boolean logical operation AND. This operation results true if both its two operands are true, and false otherwise. The following panel shows the result of operator && evaluating the expression a && b:
&& OPERATOR
| a | b | a && b |
|---|---|---|
| true | true | true |
| true | false | false |
| false | true | false |
| false | false | false |
The operator || corresponds with Boolean logical operation OR. This operation results true if either one of its two operands is true, thus being false only when both operands are false themselves. Here are the possible results of a || b:
|| OPERATOR
| a | b | a || b |
|---|---|---|
| true | true | true |
| true | false | true |
| false | true | true |
| false | false | false |
For example:
| 1 2 | |
When using the logical operators, C++ only evaluates what is necessary from left to right to come up with the combined relational result, ignoring the rest. Therefore, in this last example ((5==5)||(3>6)), C++ would evaluate first whether 5==5 is true, and if so, it would never check whether 3>6 is true or not. This is known as short-circuit evaluation, and works like this for these operators:
| operator | short-circuit |
|---|---|
| && | if the left-hand side expression is false, the combined result is false (right-hand side expression not evaluated). |
| || | if the left-hand side expression is true, the combined result is true (right-hand side expression not evaluated). |
This is mostly important when the right-hand expression has side effects, such as altering values:
|
This combined conditional expression increases i by one, but only if the condition on the left of && is true, since otherwise the right-hand expression (++i<n) is never evaluated.
Conditional operator ( ? )
The conditional operator evaluates an expression returning a value if that expression is true and a different one if the expression is evaluated as false. Its format is:
condition ? result1 : result2
If condition is true the expression will return result1, if it is not it will return result2.
| 1 2 3 4 | |
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | | 7 |
In this example a was 2 and b was 7, so the expression being evaluated (a>b) was not true, thus the first value specified after the question mark was discarded in favor of the second value (the one after the colon) which was b, with a value of 7.
Comma operator ( , )
The comma operator (,) is used to separate two or more expressions that are included where only one expression is expected. When the set of expressions has to be evaluated for a value, only the rightmost expression is considered.For example, the following code:
|
Would first assign the value 3 to b, and then assign b+2 to variable a. So, at the end, variable a would contain the value 5 while variable b would contain value 3.
Bitwise Operators ( &, |, ^, ~, <<, >> )
Bitwise operators modify variables considering the bit patterns that represent the values they store.
| operator | asm equivalent | description |
|---|---|---|
| & | AND | Bitwise AND |
| | | OR | Bitwise Inclusive OR |
| ^ | XOR | Bitwise Exclusive OR |
| ~ | NOT | Unary complement (bit inversion) |
| << | SHL | Shift Left |
| >> | SHR | Shift Right |
Explicit type casting operator
Type casting operators allow you to convert a datum of a given type to another. There are several ways to do this in C++. The simplest one, which has been inherited from the C language, is to precede the expression to be converted by the new type enclosed between parentheses (()):| 1 2 3 | |
The previous code converts the float number 3.14 to an integer value (3), the remainder is lost. Here, the typecasting operator was (int). Another way to do the same thing in C++ is using the functional notation: preceding the expression to be converted by the type and enclosing the expression between parentheses:
|
Both ways of type casting are valid in C++.
sizeof()
This operator accepts one parameter, which can be either a type or a variable itself and returns the size in bytes of that type or object: |
This will assign the value 1 to a because char is a one-byte long type.
The value returned by sizeof is a constant, so it is always determined before program execution.
Other operators
Later in these tutorials, we will see a few more operators, like the ones referring to pointers or the specifics for object-oriented programming. Each one is treated in its respective section.Precedence of operators
When writing complex expressions with several operands, we may have some doubts about which operand is evaluated first and which later. For example, in this expression: |
we may doubt if it really means:
| 1 2 | |
The correct answer is the first of the two expressions, with a result of 6. There is an established order with the priority of each operator, and not only the arithmetic ones (those whose preference come from mathematics) but for all the operators which can appear in C++. From greatest to lowest priority, the priority order is as follows:
| Level | Operator | Description | Grouping |
|---|---|---|---|
| 1 | :: | scope | Left-to-right |
| 2 | () [] . -> ++ -- dynamic_cast static_cast reinterpret_cast const_cast typeid | postfix | Left-to-right |
| 3 | ++ -- ~ ! sizeof new delete | unary (prefix) | Right-to-left |
| * & | indirection and reference (pointers) | ||
| + - | unary sign operator | ||
| 4 | (type) | type casting | Right-to-left |
| 5 | .* ->* | pointer-to-member | Left-to-right |
| 6 | * / % | multiplicative | Left-to-right |
| 7 | + - | additive | Left-to-right |
| 8 | << >> | shift | Left-to-right |
| 9 | < > <= >= | relational | Left-to-right |
| 10 | == != | equality | Left-to-right |
| 11 | & | bitwise AND | Left-to-right |
| 12 | ^ | bitwise XOR | Left-to-right |
| 13 | | | bitwise OR | Left-to-right |
| 14 | && | logical AND | Left-to-right |
| 15 | || | logical OR | Left-to-right |
| 16 | ?: | conditional | Right-to-left |
| 17 | = *= /= %= += -= >>= <<= &= ^= |= | assignment | Right-to-left |
| 18 | , | comma | Left-to-right |
Grouping defines the precedence order in which operators are evaluated in the case that there are several operators of the same level in an expression.
All these precedence levels for operators can be manipulated or become more legible by removing possible ambiguities using parentheses signs ( and ), as in this example:
|
might be written either as:
|
or
|
depending on the operation that we want to perform.
So if you want to write complicated expressions and you are not completely sure of the precedence levels, always include parentheses. It will also make your code easier to read.
Basic Input/Output
C++ uses a convenient abstraction called streams to perform input and output operations in sequential media such as the screen or the keyboard. A stream is an object where a program can either insert or extract characters to/from it. We do not really need to care about many specifications about the physical media associated with the stream - we only need to know it will accept or provide characters sequentially.
The standard C++ library includes the header file iostream, where the standard input and output stream objects are declared.
Standard Output (cout)
By default, the standard output of a program is the screen, and the C++ stream object defined to access it is cout.cout is used in conjunction with the insertion operator, which is written as << (two "less than" signs).
| 1 2 3 | |
The << operator inserts the data that follows it into the stream preceding it. In the examples above it inserted the constant string Output sentence, the numerical constant 120 and variable x into the standard output stream cout. Notice that the sentence in the first instruction is enclosed between double quotes (") because it is a constant string of characters. Whenever we want to use constant strings of characters we must enclose them between double quotes (") so that they can be clearly distinguished from variable names. For example, these two sentences have very different results:
| 1 2 | |
The insertion operator (<<) may be used more than once in a single statement:
|
This last statement would print the message Hello, I am a C++ statement on the screen. The utility of repeating the insertion operator (<<) is demonstrated when we want to print out a combination of variables and constants or more than one variable:
|
If we assume the age variable to contain the value 24 and the zipcode variable to contain 90064 the output of the previous statement would be:
|
It is important to notice that cout does not add a line break after its output unless we explicitly indicate it, therefore, the following statements:
| 1 2 | |
will be shown on the screen one following the other without any line break between them:
This is a sentence.This is another sentence.
even though we had written them in two different insertions into cout. In order to perform a line break on the output we must explicitly insert a new-line character into cout. In C++ a new-line character can be specified as \n (backslash, n):
| 1 2 | |
This produces the following output:
First sentence.
Second sentence.
Third sentence.
Additionally, to add a new-line, you may also use the endl manipulator. For example:
| 1 2 | |
would print out:
First sentence.
Second sentence.
The endl manipulator produces a newline character, exactly as the insertion of '\n' does, but it also has an additional behavior when it is used with buffered streams: the buffer is flushed. Anyway, cout will be an unbuffered stream in most cases, so you can generally use both the \n escape character and the endl manipulator in order to specify a new line without any difference in its behavior.
Standard Input (cin).
The standard input device is usually the keyboard. Handling the standard input in C++ is done by applying the overloaded operator of extraction (>>) on the cin stream. The operator must be followed by the variable that will store the data that is going to be extracted from the stream. For example:| 1 2 | |
The first statement declares a variable of type int called age, and the second one waits for an input from cin (the keyboard) in order to store it in this integer variable.
cin can only process the input from the keyboard once the RETURN key has been pressed. Therefore, even if you request a single character, the extraction from cin will not process the input until the user presses RETURN after the character has been introduced.
You must always consider the type of the variable that you are using as a container with cin extractions. If you request an integer you will get an integer, if you request a character you will get a character and if you request a string of characters you will get a string of characters.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 | | Please enter an integer value: 702 The value you entered is 702 and its double is 1404. |
The user of a program may be one of the factors that generate errors even in the simplest programs that use cin (like the one we have just seen). Since if you request an integer value and the user introduces a name (which generally is a string of characters), the result may cause your program to misoperate since it is not what we were expecting from the user. So when you use the data input provided by cin extractions you will have to trust that the user of your program will be cooperative and that he/she will not introduce his/her name or something similar when an integer value is requested. A little ahead, when we see the stringstream class we will see a possible solution for the errors that can be caused by this type of user input.
You can also use cin to request more than one datum input from the user:
|
is equivalent to:
| 1 2 | |
In both cases the user must give two data, one for variable a and another one for variable b that may be separated by any valid blank separator: a space, a tab character or a newline.
cin and strings
We can use cin to get strings with the extraction operator (>>) as we do with fundamental data type variables: |
However, as it has been said, cin extraction stops reading as soon as if finds any blank space character, so in this case we will be able to get just one word for each extraction. This behavior may or may not be what we want; for example if we want to get a sentence from the user, this extraction operation would not be useful.
In order to get entire lines, we can use the function getline, which is the more recommendable way to get user input with cin:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | | What's your name? Juan Soulie Hello Juan Soulie. What is your favorite team? The Isotopes I like The Isotopes too! |
Notice how in both calls to getline we used the same string identifier (mystr). What the program does in the second call is simply to replace the previous content by the new one that is introduced.
stringstream
The standard header file <sstream> defines a class called stringstream that allows a string-based object to be treated as a stream. This way we can perform extraction or insertion operations from/to strings, which is especially useful to convert strings to numerical values and vice versa. For example, if we want to extract an integer from a string we can write:| 1 2 3 | |
This declares a string object with a value of "1204", and an int object. Then we use stringstream's constructor to construct an object of this type from the string object. Because we can use stringstream objects as if they were streams, we can extract an integer from it as we would have done on cin by applying the extractor operator (>>) on it followed by a variable of type int.
After this piece of code, the variable myint will contain the numerical value 1204.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | | Enter price: 22.25 Enter quantity: 7 Total price: 155.75 |
In this example, we acquire numeric values from the standard input indirectly. Instead of extracting numeric values directly from the standard input, we get lines from the standard input (cin) into a string object (mystr), and then we extract the integer values from this string into a variable of type int (quantity).
Using this method, instead of direct extractions of integer values, we have more control over what happens with the input of numeric values from the user, since we are separating the process of obtaining input from the user (we now simply ask for lines) with the interpretation of that input. Therefore, this method is usually preferred to get numerical values from the user in all programs that are intensive in user input.
Control Structures
With the introduction of control structures we are going to have to introduce a new concept: the compound-statement or block. A block is a group of statements which are separated by semicolons (;) like all C++ statements, but grouped together in a block enclosed in braces: { }:
{ statement1; statement2; statement3; }
Most of the control structures that we will see in this section require a generic statement as part of its syntax. A statement can be either a simple statement (a simple instruction ending with a semicolon) or a compound statement (several instructions grouped in a block), like the one just described. In the case that we want the statement to be a simple statement, we do not need to enclose it in braces ({}). But in the case that we want the statement to be a compound statement it must be enclosed between braces ({}), forming a block.
Conditional structure: if and else
The if keyword is used to execute a statement or block only if a condition is fulfilled. Its form is:if (condition) statement
Where condition is the expression that is being evaluated. If this condition is true, statement is executed. If it is false, statement is ignored (not executed) and the program continues right after this conditional structure.
For example, the following code fragment prints x is 100 only if the value stored in the x variable is indeed 100:
| 1 2 | |
If we want more than a single statement to be executed in case that the condition is true we can specify a block using braces { }:
| 1 2 3 4 5 | |
We can additionally specify what we want to happen if the condition is not fulfilled by using the keyword else. Its form used in conjunction with if is:
if (condition) statement1 else statement2
For example:
| 1 2 3 4 | |
prints on the screen x is 100 if indeed x has a value of 100, but if it has not -and only if not- it prints out x is not 100.
The if + else structures can be concatenated with the intention of verifying a range of values. The following example shows its use telling if the value currently stored in x is positive, negative or none of them (i.e. zero):
| 1 2 3 4 5 6 | |
Remember that in case that we want more than a single statement to be executed, we must group them in a block by enclosing them in braces { }.
Iteration structures (loops)
Loops have as purpose to repeat a statement a certain number of times or while a condition is fulfilled.
The while loop
Its format is:while (expression) statement
and its functionality is simply to repeat statement while the condition set in expression is true.
For example, we are going to make a program to countdown using a while-loop:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | | Enter the starting number > 8 8, 7, 6, 5, 4, 3, 2, 1, FIRE! |
When the program starts the user is prompted to insert a starting number for the countdown. Then the while loop begins, if the value entered by the user fulfills the condition n>0 (that n is greater than zero) the block that follows the condition will be executed and repeated while the condition (n>0) remains being true.
The whole process of the previous program can be interpreted according to the following script (beginning in main):
- User assigns a value to n
- The while condition is checked (n>0). At this point there are two possibilities:
* condition is true: statement is executed (to step 3)
* condition is false: ignore statement and continue after it (to step 5) - Execute statement:
cout << n << ", ";
--n;
(prints the value of n on the screen and decreases n by 1) - End of block. Return automatically to step 2
- Continue the program right after the block: print FIRE! and end program.
When creating a while-loop, we must always consider that it has to end at some point, therefore we must provide within the block some method to force the condition to become false at some point, otherwise the loop will continue looping forever. In this case we have included --n; that decreases the value of the variable that is being evaluated in the condition (n) by one - this will eventually make the condition (n>0) to become false after a certain number of loop iterations: to be more specific, when n becomes 0, that is where our while-loop and our countdown end.
Of course this is such a simple action for our computer that the whole countdown is performed instantly without any practical delay between numbers.
The do-while loop
Its format is:
do statement while (condition);
Its functionality is exactly the same as the while loop, except that condition in the do-while loop is evaluated after the execution of statement instead of before, granting at least one execution of statement even if condition is never fulfilled. For example, the following example program echoes any number you enter until you enter 0.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | | Enter number (0 to end): 12345 You entered: 12345 Enter number (0 to end): 160277 You entered: 160277 Enter number (0 to end): 0 You entered: 0 |
The do-while loop is usually used when the condition that has to determine the end of the loop is determined within the loop statement itself, like in the previous case, where the user input within the block is what is used to determine if the loop has to end. In fact if you never enter the value 0 in the previous example you can be prompted for more numbers forever.
The for loop
Its format is:
for (initialization; condition; increase) statement;
and its main function is to repeat statement while condition remains true, like the while loop. But in addition, the for loop provides specific locations to contain an initialization statement and an increase statement. So this loop is specially designed to perform a repetitive action with a counter which is initialized and increased on each iteration.
It works in the following way:
- initialization is executed. Generally it is an initial value setting for a counter variable. This is executed only once.
- condition is checked. If it is true the loop continues, otherwise the loop ends and statement is skipped (not executed).
- statement is executed. As usual, it can be either a single statement or a block enclosed in braces { }.
- finally, whatever is specified in the increase field is executed and the loop gets back to step 2.
Here is an example of countdown using a for loop:
| 1 2 3 4 5 6 7 8 9 10 11 | | 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, FIRE! |
The initialization and increase fields are optional. They can remain empty, but in all cases the semicolon signs between them must be written. For example we could write: for (;n<10;) if we wanted to specify no initialization and no increase; or for (;n<10;n++) if we wanted to include an increase field but no initialization (maybe because the variable was already initialized before).
Optionally, using the comma operator (,) we can specify more than one expression in any of the fields included in a for loop, like in initialization, for example. The comma operator (,) is an expression separator, it serves to separate more than one expression where only one is generally expected. For example, suppose that we wanted to initialize more than one variable in our loop:
| 1 2 3 4 | |
This loop will execute for 50 times if neither n or i are modified within the loop:
n starts with a value of 0, and i with 100, the condition is n!=i (that n is not equal to i). Because n is increased by one and i decreased by one, the loop's condition will become false after the 50th loop, when both n and i will be equal to 50.
Jump statements.
The break statement
Using break we can leave a loop even if the condition for its end is not fulfilled. It can be used to end an infinite loop, or to force it to end before its natural end. For example, we are going to stop the count down before its natural end (maybe because of an engine check failure?):
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | | 10, 9, 8, 7, 6, 5, 4, 3, countdown aborted! |
The continue statement
The continue statement causes the program to skip the rest of the loop in the current iteration as if the end of the statement block had been reached, causing it to jump to the start of the following iteration. For example, we are going to skip the number 5 in our countdown:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 | | 10, 9, 8, 7, 6, 4, 3, 2, 1, FIRE! |
The goto statement
goto allows to make an absolute jump to another point in the program. You should use this feature with caution since its execution causes an unconditional jump ignoring any type of nesting limitations.The destination point is identified by a label, which is then used as an argument for the goto statement. A label is made of a valid identifier followed by a colon (:).
Generally speaking, this instruction has no concrete use in structured or object oriented programming aside from those that low-level programming fans may find for it. For example, here is our countdown loop using goto:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | | 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, FIRE! |
The exit function
exit is a function defined in the cstdlib library.
The purpose of exit is to terminate the current program with a specific exit code. Its prototype is:
|
The exitcode is used by some operating systems and may be used by calling programs. By convention, an exit code of 0 means that the program finished normally and any other value means that some error or unexpected results happened.
The selective structure: switch.
The syntax of the switch statement is a bit peculiar. Its objective is to check several possible constant values for an expression. Something similar to what we did at the beginning of this section with the concatenation of several if and else if instructions. Its form is the following:switch (expression)
{
case constant1:
group of statements 1;
break;
case constant2:
group of statements 2;
break;
.
.
.
default:
default group of statements
}
It works in the following way: switch evaluates expression and checks if it is equivalent to constant1, if it is, it executes group of statements 1 until it finds the break statement. When it finds this break statement the program jumps to the end of the switch selective structure.
If expression was not equal to constant1 it will be checked against constant2. If it is equal to this, it will execute group of statements 2 until a break keyword is found, and then will jump to the end of the switch selective structure.
Finally, if the value of expression did not match any of the previously specified constants (you can include as many case labels as values you want to check), the program will execute the statements included after the default: label, if it exists (since it is optional).
Both of the following code fragments have the same behavior:
| switch example | if-else equivalent |
|---|---|
switch (x) {
case 1:
cout << "x is 1";
break;
case 2:
cout << "x is 2";
break;
default:
cout << "value of x unknown";
}
| if (x == 1) {
cout << "x is 1";
}
else if (x == 2) {
cout << "x is 2";
}
else {
cout << "value of x unknown";
}
|
The switch statement is a bit peculiar within the C++ language because it uses labels instead of blocks. This forces us to put break statements after the group of statements that we want to be executed for a specific condition. Otherwise the remainder statements -including those corresponding to other labels- will also be executed until the end of the switch selective block or a break statement is reached.
For example, if we did not include a break statement after the first group for case one, the program will not automatically jump to the end of the switch selective block and it would continue executing the rest of statements until it reaches either a break instruction or the end of the switch selective block. This makes it unnecessary to include braces { } surrounding the statements for each of the cases, and it can also be useful to execute the same block of instructions for different possible values for the expression being evaluated. For example:
| 1 2 3 4 5 6 7 8 9 | |
Notice that switch can only be used to compare an expression against constants. Therefore we cannot put variables as labels (for example case n: where n is a variable) or ranges (case (1..3):) because they are not valid C++ constants.
If you need to check ranges or values that are not constants, use a concatenation of if and else if statements.
Functions (I)
A function is a group of statements that is executed when it is called from some point of the program. The following is its format:
type name ( parameter1, parameter2, ...) { statements }
where:
- type is the data type specifier of the data returned by the function.
- name is the identifier by which it will be possible to call the function.
- parameters (as many as needed): Each parameter consists of a data type specifier followed by an identifier, like any regular variable declaration (for example: int x) and which acts within the function as a regular local variable. They allow to pass arguments to the function when it is called. The different parameters are separated by commas.
- statements is the function's body. It is a block of statements surrounded by braces { }.
Here you have the first function example:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | | The result is 8 |
In order to examine this code, first of all remember something said at the beginning of this tutorial: a C++ program always begins its execution by the main function. So we will begin there.
We can see how the main function begins by declaring the variable z of type int. Right after that, we see a call to a function called addition. Paying attention we will be able to see the similarity between the structure of the call to the function and the declaration of the function itself some code lines above:
The parameters and arguments have a clear correspondence. Within the main function we called to addition passing two values: 5 and 3, that correspond to the int a and int b parameters declared for function addition.
At the point at which the function is called from within main, the control is lost by main and passed to function addition. The value of both arguments passed in the call (5 and 3) are copied to the local variables int a and int b within the function.
Function addition declares another local variable (int r), and by means of the expression r=a+b, it assigns to r the result of a plus b. Because the actual parameters passed for a and b are 5 and 3 respectively, the result is 8.
The following line of code:
|
finalizes function addition, and returns the control back to the function that called it in the first place (in this case, main). At this moment the program follows its regular course from the same point at which it was interrupted by the call to addition. But additionally, because the return statement in function addition specified a value: the content of variable r (return (r);), which at that moment had a value of 8. This value becomes the value of evaluating the function call.
So being the value returned by a function the value given to the function call itself when it is evaluated, the variable z will be set to the value returned by addition (5, 3), that is 8. To explain it another way, you can imagine that the call to a function (addition (5,3)) is literally replaced by the value it returns (8).
The following line of code in main is:
|
That, as you may already expect, produces the printing of the result on the screen.
And here is another example about functions:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | | The first result is 5 The second result is 5 The third result is 2 The fourth result is 6 |
In this case we have created a function called subtraction. The only thing that this function does is to subtract both passed parameters and to return the result.
Nevertheless, if we examine function main we will see that we have made several calls to function subtraction. We have used some different calling methods so that you see other ways or moments when a function can be called.
In order to fully understand these examples you must consider once again that a call to a function could be replaced by the value that the function call itself is going to return. For example, the first case (that you should already know because it is the same pattern that we have used in previous examples):
| 1 2 | |
If we replace the function call by the value it returns (i.e., 5), we would have:
| 1 2 | |
As well as
|
has the same result as the previous call, but in this case we made the call to subtraction directly as an insertion parameter for cout. Simply consider that the result is the same as if we had written:
|
since 5 is the value returned by subtraction (7,2).
In the case of:
|
The only new thing that we introduced is that the parameters of subtraction are variables instead of constants. That is perfectly valid. In this case the values passed to function subtraction are the values of x and y, that are 5 and 3 respectively, giving 2 as result.
The fourth case is more of the same. Simply note that instead of:
|
we could have written:
|
with exactly the same result. I have switched places so you can see that the semicolon sign (;) goes at the end of the whole statement. It does not necessarily have to go right after the function call. The explanation might be once again that you imagine that a function can be replaced by its returned value:
| 1 2 | |
Functions with no type. The use of void.
If you remember the syntax of a function declaration:
type name ( argument1, argument2 ...) statement
you will see that the declaration begins with a type, that is the type of the function itself (i.e., the type of the datum that will be returned by the function with the return statement). But what if we want to return no value?
Imagine that we want to make a function just to show a message on the screen. We do not need it to return any value. In this case we should use the void type specifier for the function. This is a special specifier that indicates absence of type.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 | | I'm a function! |
void can also be used in the function's parameter list to explicitly specify that we want the function to take no actual parameters when it is called. For example, function printmessage could have been declared as:
| 1 2 3 4 | |
Although it is optional to specify void in the parameter list. In C++, a parameter list can simply be left blank if we want a function with no parameters.
What you must always remember is that the format for calling a function includes specifying its name and enclosing its parameters between parentheses. The non-existence of parameters does not exempt us from the obligation to write the parentheses. For that reason the call to printmessage is:
|
The parentheses clearly indicate that this is a call to a function and not the name of a variable or some other C++ statement. The following call would have been incorrect:
|
Functions (II)
Arguments passed by value and by reference.
Until now, in all the functions we have seen, the arguments passed to the functions have been passed by value. This means that when calling a function with parameters, what we have passed to the function were copies of their values but never the variables themselves. For example, suppose that we called our first function addition using the following code:
| 1 2 | |
What we did in this case was to call to function addition passing the values of x and y, i.e. 5 and 3 respectively, but not the variables x and y themselves.
This way, when the function addition is called, the value of its local variables a and b become 5 and 3 respectively, but any modification to either a or b within the function addition will not have any effect in the values of x and y outside it, because variables x and y were not themselves passed to the function, but only copies of their values at the moment the function was called.
But there might be some cases where you need to manipulate from inside a function the value of an external variable. For that purpose we can use arguments passed by reference, as in the function duplicate of the following example:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | | x=2, y=6, z=14 |
The first thing that should call your attention is that in the declaration of duplicate the type of each parameter was followed by an ampersand sign (&). This ampersand is what specifies that their corresponding arguments are to be passed by reference instead of by value.
When a variable is passed by reference we are not passing a copy of its value, but we are somehow passing the variable itself to the function and any modification that we do to the local variables will have an effect in their counterpart variables passed as arguments in the call to the function.
To explain it in another way, we associate a, b and c with the arguments passed on the function call (x, y and z) and any change that we do on a within the function will affect the value of x outside it. Any change that we do on b will affect y, and the same with c and z.
That is why our program's output, that shows the values stored in x, y and z after the call to duplicate, shows the values of all the three variables of main doubled.
If when declaring the following function:
|
we had declared it this way:
|
i.e., without the ampersand signs (&), we would have not passed the variables by reference, but a copy of their values instead, and therefore, the output on screen of our program would have been the values of x, y and z without having been modified.
Passing by reference is also an effective way to allow a function to return more than one value. For example, here is a function that returns the previous and next numbers of the first parameter passed.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | | Previous=99, Next=101 |
Default values in parameters.
When declaring a function we can specify a default value for each of the last parameters. This value will be used if the corresponding argument is left blank when calling to the function. To do that, we simply have to use the assignment operator and a value for the arguments in the function declaration. If a value for that parameter is not passed when the function is called, the default value is used, but if a value is specified this default value is ignored and the passed value is used instead. For example:| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | | 6 5 |
As we can see in the body of the program there are two calls to function divide. In the first one:
|
we have only specified one argument, but the function divide allows up to two. So the function divide has assumed that the second parameter is 2 since that is what we have specified to happen if this parameter was not passed (notice the function declaration, which finishes with int b=2, not just int b). Therefore the result of this function call is 6 (12/2).
In the second call:
|
there are two parameters, so the default value for b (int b=2) is ignored and b takes the value passed as argument, that is 4, making the result returned equal to 5 (20/4).
Overloaded functions.
In C++ two different functions can have the same name if their parameter types or number are different. That means that you can give the same name to more than one function if they have either a different number of parameters or different types in their parameters. For example:| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | | 10 2.5 |
In this case we have defined two functions with the same name, operate, but one of them accepts two parameters of type int and the other one accepts them of type float. The compiler knows which one to call in each case by examining the types passed as arguments when the function is called. If it is called with two ints as its arguments it calls to the function that has two int parameters in its prototype and if it is called with two floats it will call to the one which has two float parameters in its prototype.
In the first call to operate the two arguments passed are of type int, therefore, the function with the first prototype is called; This function returns the result of multiplying both parameters. While the second call passes two arguments of type float, so the function with the second prototype is called. This one has a different behavior: it divides one parameter by the other. So the behavior of a call to operate depends on the type of the arguments passed because the function has been overloaded.
Notice that a function cannot be overloaded only by its return type. At least one of its parameters must have a different type.
inline functions.
The inline specifier indicates the compiler that inline substitution is preferred to the usual function call mechanism for a specific function. This does not change the behavior of a function itself, but is used to suggest to the compiler that the code generated by the function body is inserted at each point the function is called, instead of being inserted only once and perform a regular call to it, which generally involves some additional overhead in running time.The format for its declaration is:
inline type name ( arguments ... ) { instructions ... }
and the call is just like the call to any other function. You do not have to include the inline keyword when calling the function, only in its declaration.
Most compilers already optimize code to generate inline functions when it is more convenient. This specifier only indicates the compiler that inline is preferred for this function.
Recursivity.
Recursivity is the property that functions have to be called by themselves. It is useful for many tasks, like sorting or calculate the factorial of numbers. For example, to obtain the factorial of a number (n!) the mathematical formula would be:n! = n * (n-1) * (n-2) * (n-3) ... * 1
more concretely, 5! (factorial of 5) would be:
5! = 5 * 4 * 3 * 2 * 1 = 120
and a recursive function to calculate this in C++ could be:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | | Please type a number: 9 9! = 362880 |
Notice how in function factorial we included a call to itself, but only if the argument passed was greater than 1, since otherwise the function would perform an infinite recursive loop in which once it arrived to 0 it would continue multiplying by all the negative numbers (probably provoking a stack overflow error on runtime).
This function has a limitation because of the data type we used in its design (long) for more simplicity. The results given will not be valid for values much greater than 10! or 15!, depending on the system you compile it.
Declaring functions.
Until now, we have defined all of the functions before the first appearance of calls to them in the source code. These calls were generally in function main which we have always left at the end of the source code. If you try to repeat some of the examples of functions described so far, but placing the function main before any of the other functions that were called from within it, you will most likely obtain compiling errors. The reason is that to be able to call a function it must have been declared in some earlier point of the code, like we have done in all our examples.But there is an alternative way to avoid writing the whole code of a function before it can be used in main or in some other function. This can be achieved by declaring just a prototype of the function before it is used, instead of the entire definition. This declaration is shorter than the entire definition, but significant enough for the compiler to determine its return type and the types of its parameters.
Its form is:
type name ( argument_type1, argument_type2, ...);
It is identical to a function definition, except that it does not include the body of the function itself (i.e., the function statements that in normal definitions are enclosed in braces { }) and instead of that we end the prototype declaration with a mandatory semicolon (;).
The parameter enumeration does not need to include the identifiers, but only the type specifiers. The inclusion of a name for each parameter as in the function definition is optional in the prototype declaration. For example, we can declare a function called protofunction with two int parameters with any of the following declarations:
| 1 2 | |
Anyway, including a name for each variable makes the prototype more legible.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | | Type a number (0 to exit): 9 Number is odd. Type a number (0 to exit): 6 Number is even. Type a number (0 to exit): 1030 Number is even. Type a number (0 to exit): 0 Number is even. |
This example is indeed not an example of efficiency. I am sure that at this point you can already make a program with the same result, but using only half of the code lines that have been used in this example. Anyway this example illustrates how prototyping works. Moreover, in this concrete example the prototyping of at least one of the two functions is necessary in order to compile the code without errors.
The first things that we see are the declaration of functions odd and even:
| 1 2 | |
This allows these functions to be used before they are defined, for example, in main, which now is located where some people find it to be a more logical place for the start of a program: the beginning of the source code.
Anyway, the reason why this program needs at least one of the functions to be declared before it is defined is because in odd there is a call to even and in even there is a call to odd. If none of the two functions had been previously declared, a compilation error would happen, since either odd would not be visible from even (because it has still not been declared), or even would not be visible from odd (for the same reason).
Having the prototype of all functions together in the same place within the source code is found practical by some programmers, and this can be easily achieved by declaring all functions prototypes at the beginning of a program.
Arrays
That means that, for example, we can store 5 values of type int in an array without having to declare 5 different variables, each one with a different identifier. Instead of that, using an array we can store 5 different values of the same type, int for example, with a unique identifier.
For example, an array to contain 5 integer values of type int called billy could be represented like this:
where each blank panel represents an element of the array, that in this case are integer values of type int. These elements are numbered from 0 to 4 since in arrays the first index is always 0, independently of its length.
Like a regular variable, an array must be declared before it is used. A typical declaration for an array in C++ is:
type name [elements];
where type is a valid type (like int, float...), name is a valid identifier and the elements field (which is always enclosed in square brackets []), specifies how many of these elements the array has to contain.
Therefore, in order to declare an array called billy as the one shown in the above diagram it is as simple as:
|
NOTE: The elements field within brackets [] which represents the number of elements the array is going to hold, must be a constant value, since arrays are blocks of non-dynamic memory whose size must be determined before execution. In order to create arrays with a variable length dynamic memory is needed, which is explained later in these tutorials.
Initializing arrays.
When declaring a regular array of local scope (within a function, for example), if we do not specify otherwise, its elements will not be initialized to any value by default, so their content will be undetermined until we store some value in them. The elements of global and static arrays, on the other hand, are automatically initialized with their default values, which for all fundamental types this means they are filled with zeros.In both cases, local and global, when we declare an array, we have the possibility to assign initial values to each one of its elements by enclosing the values in braces { }. For example:
|
This declaration would have created an array like this:
The amount of values between braces { } must not be larger than the number of elements that we declare for the array between square brackets [ ]. For example, in the example of array billy we have declared that it has 5 elements and in the list of initial values within braces { } we have specified 5 values, one for each element.
When an initialization of values is provided for an array, C++ allows the possibility of leaving the square brackets empty [ ]. In this case, the compiler will assume a size for the array that matches the number of values included between braces { }:
|
After this declaration, array billy would be 5 ints long, since we have provided 5 initialization values.
Accessing the values of an array.
In any point of a program in which an array is visible, we can access the value of any of its elements individually as if it was a normal variable, thus being able to both read and modify its value. The format is as simple as:
name[index]
Following the previous examples in which billy had 5 elements and each of those elements was of type int, the name which we can use to refer to each element is the following:
For example, to store the value 75 in the third element of billy, we could write the following statement:
|
and, for example, to pass the value of the third element of billy to a variable called a, we could write:
|
Therefore, the expression billy[2] is for all purposes like a variable of type int.
Notice that the third element of billy is specified billy[2], since the first one is billy[0], the second one is billy[1], and therefore, the third one is billy[2]. By this same reason, its last element is billy[4]. Therefore, if we write billy[5], we would be accessing the sixth element of billy and therefore exceeding the size of the array.
In C++ it is syntactically correct to exceed the valid range of indices for an array. This can create problems, since accessing out-of-range elements do not cause compilation errors but can cause runtime errors. The reason why this is allowed will be seen further ahead when we begin to use pointers.
At this point it is important to be able to clearly distinguish between the two uses that brackets [ ] have related to arrays. They perform two different tasks: one is to specify the size of arrays when they are declared; and the second one is to specify indices for concrete array elements. Do not confuse these two possible uses of brackets [ ] with arrays.
| 1 2 | |
If you read carefully, you will see that a type specifier always precedes a variable or array declaration, while it never precedes an access.
Some other valid operations with arrays:
| 1 2 3 4 | |
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | | 12206 |
Multidimensional arrays
Multidimensional arrays can be described as "arrays of arrays". For example, a bidimensional array can be imagined as a bidimensional table made of elements, all of them of a same uniform data type.
jimmy represents a bidimensional array of 3 per 5 elements of type int. The way to declare this array in C++ would be:
|
and, for example, the way to reference the second element vertically and fourth horizontally in an expression would be:
|
(remember that array indices always begin by zero).
Multidimensional arrays are not limited to two indices (i.e., two dimensions). They can contain as many indices as needed. But be careful! The amount of memory needed for an array rapidly increases with each dimension. For example:
|
declares an array with a char element for each second in a century, that is more than 3 billion chars. So this declaration would consume more than 3 gigabytes of memory!
Multidimensional arrays are just an abstraction for programmers, since we can obtain the same results with a simple array just by putting a factor between its indices:
| 1 2 | |
With the only difference that with multidimensional arrays the compiler remembers the depth of each imaginary dimension for us. Take as example these two pieces of code, with both exactly the same result. One uses a bidimensional array and the other one uses a simple array:
| multidimensional array | pseudo-multidimensional array |
|---|---|
#define WIDTH 5
#define HEIGHT 3
int jimmy [HEIGHT][WIDTH];
int n,m;
int main ()
{
for (n=0;n<HEIGHT;n++)
for (m=0;m<WIDTH;m++)
{
jimmy[n][m]=(n+1)*(m+1);
}
return 0;
}
| #define WIDTH 5
#define HEIGHT 3
int jimmy [HEIGHT * WIDTH];
int n,m;
int main ()
{
for (n=0;n<HEIGHT;n++)
for (m=0;m<WIDTH;m++)
{
jimmy[n*WIDTH+m]=(n+1)*(m+1);
}
return 0;
}
|
None of the two source codes above produce any output on the screen, but both assign values to the memory block called jimmy in the following way:
We have used "defined constants" (#define) to simplify possible future modifications of the program. For example, in case that we decided to enlarge the array to a height of 4 instead of 3 it could be done simply by changing the line:
|
to:
|
with no need to make any other modifications to the program.
Arrays as parameters
At some moment we may need to pass an array to a function as a parameter. In C++ it is not possible to pass a complete block of memory by value as a parameter to a function, but we are allowed to pass its address. In practice this has almost the same effect and it is a much faster and more efficient operation.In order to accept arrays as parameters the only thing that we have to do when declaring the function is to specify in its parameters the element type of the array, an identifier and a pair of void brackets []. For example, the following function:
|
accepts a parameter of type "array of int" called arg. In order to pass to this function an array declared as:
|
it would be enough to write a call like this:
|
Here you have a complete example:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | | 5 10 15 2 4 6 8 10 |
As you can see, the first parameter (int arg[]) accepts any array whose elements are of type int, whatever its length. For that reason we have included a second parameter that tells the function the length of each array that we pass to it as its first parameter. This allows the for loop that prints out the array to know the range to iterate in the passed array without going out of range.
In a function declaration it is also possible to include multidimensional arrays. The format for a tridimensional array parameter is:
|
for example, a function with a multidimensional array as argument could be:
|
Notice that the first brackets [] are left empty while the following ones specify sizes for their respective dimensions. This is necessary in order for the compiler to be able to determine the depth of each additional dimension.
Arrays, both simple or multidimensional, passed as function parameters are a quite common source of errors for novice programmers. I recommend the reading of the chapter about Pointers for a better understanding on how arrays operate.
Character Sequences
For example, the following array:
|
is an array that can store up to 20 elements of type char. It can be represented as:
Therefore, in this array, in theory, we can store sequences of characters up to 20 characters long. But we can also store shorter sequences. For example, jenny could store at some point in a program either the sequence "Hello" or the sequence "Merry christmas", since both are shorter than 20 characters.
Therefore, since the array of characters can store shorter sequences than its total length, a special character is used to signal the end of the valid sequence: the null character, whose literal constant can be written as '\0' (backslash, zero).
Our array of 20 elements of type char, called jenny, can be represented storing the characters sequences "Hello" and "Merry Christmas" as:
Notice how after the valid content a null character ('\0') has been included in order to indicate the end of the sequence. The panels in gray color represent char elements with undetermined values.
Initialization of null-terminated character sequences
Because arrays of characters are ordinary arrays they follow all their same rules. For example, if we want to initialize an array of characters with some predetermined sequence of characters we can do it just like any other array: |
In this case we would have declared an array of 6 elements of type char initialized with the characters that form the word "Hello" plus a null character '\0' at the end.
But arrays of char elements have an additional method to initialize their values: using string literals.
In the expressions we have used in some examples in previous chapters, constants that represent entire strings of characters have already showed up several times. These are specified enclosing the text to become a string literal between double quotes ("). For example:
|
is a constant string literal that we have probably used already.
Double quoted strings (") are literal constants whose type is in fact a null-terminated array of characters. So string literals enclosed between double quotes always have a null character ('\0') automatically appended at the end.
Therefore we can initialize the array of char elements called myword with a null-terminated sequence of characters by either one of these two methods:
| 1 2 | |
In both cases the array of characters myword is declared with a size of 6 elements of type char: the 5 characters that compose the word "Hello" plus a final null character ('\0') which specifies the end of the sequence and that, in the second case, when using double quotes (") it is appended automatically.
Please notice that we are talking about initializing an array of characters in the moment it is being declared, and not about assigning values to them once they have already been declared. In fact because this type of null-terminated arrays of characters are regular arrays we have the same restrictions that we have with any other array, so we are not able to copy blocks of data with an assignment operation.
Assuming mystext is a char[] variable, expressions within a source code like:
| 1 2 | |
would not be valid, like neither would be:
|
The reason for this may become more comprehensible once you know a bit more about pointers, since then it will be clarified that an array is in fact a constant pointer pointing to a block of memory.
Using null-terminated sequences of characters
Null-terminated sequences of characters are the natural way of treating strings in C++, so they can be used as such in many procedures. In fact, regular string literals have this type (char[]) and can also be used in most cases.
For example, cin and cout support null-terminated sequences as valid containers for sequences of characters, so they can be used directly to extract strings of characters from cin or to insert them into cout. For example:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 | | Please, enter your first name: John Hello, John! |
As you can see, we have declared three arrays of char elements. The first two were initialized with string literal constants, while the third one was left uninitialized. In any case, we have to specify the size of the array: in the first two (question and greeting) the size was implicitly defined by the length of the literal constant they were initialized to. While for yourname we have explicitly specified that it has a size of 80 chars.
Finally, sequences of characters stored in char arrays can easily be converted into string objects just by using the assignment operator:
| 1 2 3 | |
Pointers
The memory of your computer can be imagined as a succession of memory cells, each one of the minimal size that computers manage (one byte). These single-byte memory cells are numbered in a consecutive way, so as, within any block of memory, every cell has the same number as the previous one plus one.
This way, each cell can be easily located in the memory because it has a unique address and all the memory cells follow a successive pattern. For example, if we are looking for cell 1776 we know that it is going to be right between cells 1775 and 1777, exactly one thousand cells after 776 and exactly one thousand cells before cell 2776.
Reference operator (&)
As soon as we declare a variable, the amount of memory needed is assigned for it at a specific location in memory (its memory address). We generally do not actively decide the exact location of the variable within the panel of cells that we have imagined the memory to be - Fortunately, that is a task automatically performed by the operating system during runtime. However, in some cases we may be interested in knowing the address where our variable is being stored during runtime in order to operate with relative positions to it.The address that locates a variable within memory is what we call a reference to that variable. This reference to a variable can be obtained by preceding the identifier of a variable with an ampersand sign (&), known as reference operator, and which can be literally translated as "address of". For example:
|
This would assign to ted the address of variable andy, since when preceding the name of the variable andy with the reference operator (&) we are no longer talking about the content of the variable itself, but about its reference (i.e., its address in memory).
From now on we are going to assume that andy is placed during runtime in the memory address 1776. This number (1776) is just an arbitrary assumption we are inventing right now in order to help clarify some concepts in this tutorial, but in reality, we cannot know before runtime the real value the address of a variable will have in memory.
Consider the following code fragment:
| 1 2 3 | |
The values contained in each variable after the execution of this, are shown in the following diagram:
First, we have assigned the value 25 to andy (a variable whose address in memory we have assumed to be 1776).
The second statement copied to fred the content of variable andy (which is 25). This is a standard assignment operation, as we have done so many times before.
Finally, the third statement copies to ted not the value contained in andy but a reference to it (i.e., its address, which we have assumed to be 1776). The reason is that in this third assignment operation we have preceded the identifier andy with the reference operator (&), so we were no longer referring to the value of andy but to its reference (its address in memory).
The variable that stores the reference to another variable (like ted in the previous example) is what we call a pointer. Pointers are a very powerful feature of the C++ language that has many uses in advanced programming. Farther ahead, we will see how this type of variable is used and declared.
Dereference operator (*)
We have just seen that a variable which stores a reference to another variable is called a pointer. Pointers are said to "point to" the variable whose reference they store.
Using a pointer we can directly access the value stored in the variable which it points to. To do this, we simply have to precede the pointer's identifier with an asterisk (*), which acts as dereference operator and that can be literally translated to "value pointed by".
Therefore, following with the values of the previous example, if we write:
|
(that we could read as: "beth equal to value pointed by ted") beth would take the value 25, since ted is 1776, and the value pointed by 1776 is 25.
You must clearly differentiate that the expression ted refers to the value 1776, while *ted (with an asterisk * preceding the identifier) refers to the value stored at address 1776, which in this case is 25. Notice the difference of including or not including the dereference operator (I have included an explanatory commentary of how each of these two expressions could be read):
| 1 2 | |
Notice the difference between the reference and dereference operators:
- & is the reference operator and can be read as "address of"
- * is the dereference operator and can be read as "value pointed by"
Earlier we performed the following two assignment operations:
| 1 2 | |
Right after these two statements, all of the following expressions would give true as result:
| 1 2 3 4 | |
The first expression is quite clear considering that the assignment operation performed on andy was andy=25. The second one uses the reference operator (&), which returns the address of variable andy, which we assumed it to have a value of 1776. The third one is somewhat obvious since the second expression was true and the assignment operation performed on ted was ted=&andy. The fourth expression uses the dereference operator (*) that, as we have just seen, can be read as "value pointed by", and the value pointed by ted is indeed 25.
So, after all that, you may also infer that for as long as the address pointed by ted remains unchanged the following expression will also be true:
|
Declaring variables of pointer types
Due to the ability of a pointer to directly refer to the value that it points to, it becomes necessary to specify in its declaration which data type a pointer is going to point to. It is not the same thing to point to a char as to point to an int or a float.The declaration of pointers follows this format:
type * name;
where type is the data type of the value that the pointer is intended to point to. This type is not the type of the pointer itself! but the type of the data the pointer points to. For example:
| 1 2 3 | |
These are three declarations of pointers. Each one is intended to point to a different data type, but in fact all of them are pointers and all of them will occupy the same amount of space in memory (the size in memory of a pointer depends on the platform where the code is going to run). Nevertheless, the data to which they point to do not occupy the same amount of space nor are of the same type: the first one points to an int, the second one to a char and the last one to a float. Therefore, although these three example variables are all of them pointers which occupy the same size in memory, they are said to have different types: int*, char* and float* respectively, depending on the type they point to.
I want to emphasize that the asterisk sign (*) that we use when declaring a pointer only means that it is a pointer (it is part of its type compound specifier), and should not be confused with the dereference operator that we have seen a bit earlier, but which is also written with an asterisk (*). They are simply two different things represented with the same sign.
Now have a look at this code:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | | firstvalue is 10 secondvalue is 20 |
Notice that even though we have never directly set a value to either firstvalue or secondvalue, both end up with a value set indirectly through the use of mypointer. This is the procedure:
First, we have assigned as value of mypointer a reference to firstvalue using the reference operator (&). And then we have assigned the value 10 to the memory location pointed by mypointer, that because at this moment is pointing to the memory location of firstvalue, this in fact modifies the value of firstvalue.
In order to demonstrate that a pointer may take several different values during the same program I have repeated the process with secondvalue and that same pointer, mypointer.
Here is an example a little bit more elaborated:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | | firstvalue is 10 secondvalue is 20 |
I have included as a comment on each line how the code can be read: ampersand (&) as "address of" and asterisk (*) as "value pointed by".
Notice that there are expressions with pointers p1 and p2, both with and without dereference operator (*). The meaning of an expression using the dereference operator (*) is very different from one that does not: When this operator precedes the pointer name, the expression refers to the value being pointed, while when a pointer name appears without this operator, it refers to the value of the pointer itself (i.e. the address of what the pointer is pointing to).
Another thing that may call your attention is the line:
|
This declares the two pointers used in the previous example. But notice that there is an asterisk (*) for each pointer, in order for both to have type int* (pointer to int).
Otherwise, the type for the second variable declared in that line would have been int (and not int*) because of precedence relationships. If we had written:
|
p1 would indeed have int* type, but p2 would have type int (spaces do not matter at all for this purpose). This is due to operator precedence rules. But anyway, simply remembering that you have to put one asterisk per pointer is enough for most pointer users.
Pointers and arrays
The concept of array is very much bound to the one of pointer. In fact, the identifier of an array is equivalent to the address of its first element, as a pointer is equivalent to the address of the first element that it points to, so in fact they are the same concept. For example, supposing these two declarations:| 1 2 | |
The following assignment operation would be valid:
|
After that, p and numbers would be equivalent and would have the same properties. The only difference is that we could change the value of pointer p by another one, whereas numbers will always point to the first of the 20 elements of type int with which it was defined. Therefore, unlike p, which is an ordinary pointer, numbers is an array, and an array can be considered a constant pointer. Therefore, the following allocation would not be valid:
|
Because numbers is an array, so it operates as a constant pointer, and we cannot assign values to constants.
Due to the characteristics of variables, all expressions that include pointers in the following example are perfectly valid:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | | 10, 20, 30, 40, 50, |
In the chapter about arrays we used brackets ([]) several times in order to specify the index of an element of the array to which we wanted to refer. Well, these bracket sign operators [] are also a dereference operator known as offset operator. They dereference the variable they follow just as * does, but they also add the number between brackets to the address being dereferenced. For example:
| 1 2 | |
These two expressions are equivalent and valid both if a is a pointer or if a is an array.
Pointer initialization
When declaring pointers we may want to explicitly specify which variable we want them to point to:| 1 2 | |
The behavior of this code is equivalent to:
| 1 2 3 | |
When a pointer initialization takes place we are always assigning the reference value to where the pointer points (tommy), never the value being pointed (*tommy). You must consider that at the moment of declaring a pointer, the asterisk (*) indicates only that it is a pointer, it is not the dereference operator (although both use the same sign: *). Remember, they are two different functions of one sign. Thus, we must take care not to confuse the previous code with:
| 1 2 3 | |
that is incorrect, and anyway would not have much sense in this case if you think about it.
As in the case of arrays, the compiler allows the special case that we want to initialize the content at which the pointer points with constants at the same moment the pointer is declared:
|
In this case, memory space is reserved to contain "hello" and then a pointer to the first character of this memory block is assigned to terry. If we imagine that "hello" is stored at the memory locations that start at addresses 1702, we can represent the previous declaration as:
It is important to indicate that terry contains the value 1702, and not 'h' nor "hello", although 1702 indeed is the address of both of these.
The pointer terry points to a sequence of characters and can be read as if it was an array (remember that an array is just like a constant pointer). For example, we can access the fifth element of the array with any of these two expression:
| 1 2 | |
Both expressions have a value of 'o' (the fifth element of the array).
Pointer arithmetics
To conduct arithmetical operations on pointers is a little different than to conduct them on regular integer data types. To begin with, only addition and subtraction operations are allowed to be conducted with them, the others make no sense in the world of pointers. But both addition and subtraction have a different behavior with pointers according to the size of the data type to which they point.
When we saw the different fundamental data types, we saw that some occupy more or less space than others in the memory. For example, let's assume that in a given compiler for a specific machine, char takes 1 byte, short takes 2 bytes and long takes 4.
Suppose that we define three pointers in this compiler:
| 1 2 3 | |
and that we know that they point to memory locations 1000, 2000 and 3000 respectively.
So if we write:
| 1 2 3 | |
mychar, as you may expect, would contain the value 1001. But not so obviously, myshort would contain the value 2002, and mylong would contain 3004, even though they have each been increased only once. The reason is that when adding one to a pointer we are making it to point to the following element of the same type with which it has been defined, and therefore the size in bytes of the type pointed is added to the pointer.
This is applicable both when adding and subtracting any number to a pointer. It would happen exactly the same if we write:
| 1 2 3 | |
Both the increase (++) and decrease (--) operators have greater operator precedence than the dereference operator (*), but both have a special behavior when used as suffix (the expression is evaluated with the value it had before being increased). Therefore, the following expression may lead to confusion:
|
Because ++ has greater precedence than *, this expression is equivalent to *(p++). Therefore, what it does is to increase the value of p (so it now points to the next element), but because ++ is used as postfix the whole expression is evaluated as the value pointed by the original reference (the address the pointer pointed to before being increased).
Notice the difference with:
(*p)++
Here, the expression would have been evaluated as the value pointed by p increased by one. The value of p (the pointer itself) would not be modified (what is being modified is what it is being pointed to by this pointer).
If we write:
|
Because ++ has a higher precedence than *, both p and q are increased, but because both increase operators (++) are used as postfix and not prefix, the value assigned to *p is *q before both p and q are increased. And then both are increased. It would be roughly equivalent to:
| 1 2 3 | |
Like always, I recommend you to use parentheses () in order to avoid unexpected results and to give more legibility to the code.
Pointers to pointers
C++ allows the use of pointers that point to pointers, that these, in its turn, point to data (or even to other pointers). In order to do that, we only need to add an asterisk (*) for each level of reference in their declarations:| 1 2 3 4 5 6 | |
This, supposing the randomly chosen memory locations for each variable of 7230, 8092 and 10502, could be represented as:
The value of each variable is written inside each cell; under the cells are their respective addresses in memory.
The new thing in this example is variable c, which can be used in three different levels of indirection, each one of them would correspond to a different value:
- c has type char** and a value of 8092
- *c has type char* and a value of 7230
- **c has type char and a value of 'z'
void pointers
The void type of pointer is a special type of pointer. In C++, void represents the absence of type, so void pointers are pointers that point to a value that has no type (and thus also an undetermined length and undetermined dereference properties).This allows void pointers to point to any data type, from an integer value or a float to a string of characters. But in exchange they have a great limitation: the data pointed by them cannot be directly dereferenced (which is logical, since we have no type to dereference to), and for that reason we will always have to cast the address in the void pointer to some other pointer type that points to a concrete data type before dereferencing it.
One of its uses may be to pass generic parameters to a function:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | | y, 1603 |
sizeof is an operator integrated in the C++ language that returns the size in bytes of its parameter. For non-dynamic data types this value is a constant. Therefore, for example, sizeof(char) is 1, because char type is one byte long.
Null pointer
A null pointer is a regular pointer of any pointer type which has a special value that indicates that it is not pointing to any valid reference or memory address. This value is the result of type-casting the integer value zero to any pointer type.| 1 2 | |
Do not confuse null pointers with void pointers. A null pointer is a value that any pointer may take to represent that it is pointing to "nowhere", while a void pointer is a special type of pointer that can point to somewhere without a specific type. One refers to the value stored in the pointer itself and the other to the type of data it points to.
Pointers to functions
C++ allows operations with pointers to functions. The typical use of this is for passing a function as an argument to another function, since these cannot be passed dereferenced. In order to declare a pointer to a function we have to declare it like the prototype of the function except that the name of the function is enclosed between parentheses () and an asterisk (*) is inserted before the name:| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | | 8 |
In the example, minus is a pointer to a function that has two parameters of type int. It is immediately assigned to point to the function subtraction, all in a single line:
|
Dynamic Memory
The answer is dynamic memory, for which C++ integrates the operators new and delete.
Operators new and new[]
In order to request dynamic memory we use the operator new. new is followed by a data type specifier and -if a sequence of more than one element is required- the number of these within brackets []. It returns a pointer to the beginning of the new block of memory allocated. Its form is:pointer = new type
pointer = new type [number_of_elements]
The first expression is used to allocate memory to contain one single element of type type. The second one is used to assign a block (an array) of elements of type type, where number_of_elements is an integer value representing the amount of these. For example:
| 1 2 | |
In this case, the system dynamically assigns space for five elements of type int and returns a pointer to the first element of the sequence, which is assigned to bobby. Therefore, now, bobby points to a valid block of memory with space for five elements of type int.
The first element pointed by bobby can be accessed either with the expression bobby[0] or the expression *bobby. Both are equivalent as has been explained in the section about pointers. The second element can be accessed either with bobby[1] or *(bobby+1) and so on...
You could be wondering the difference between declaring a normal array and assigning dynamic memory to a pointer, as we have just done. The most important difference is that the size of an array has to be a constant value, which limits its size to what we decide at the moment of designing the program, before its execution, whereas the dynamic memory allocation allows us to assign memory during the execution of the program (runtime) using any variable or constant value as its size.
The dynamic memory requested by our program is allocated by the system from the memory heap. However, computer memory is a limited resource, and it can be exhausted. Therefore, it is important to have some mechanism to check if our request to allocate memory was successful or not.
C++ provides two standard methods to check if the allocation was successful:
One is by handling exceptions. Using this method an exception of type bad_alloc is thrown when the allocation fails. Exceptions are a powerful C++ feature explained later in these tutorials. But for now you should know that if this exception is thrown and it is not handled by a specific handler, the program execution is terminated.
This exception method is the default method used by new, and is the one used in a declaration like:
|
The other method is known as nothrow, and what happens when it is used is that when a memory allocation fails, instead of throwing a bad_alloc exception or terminating the program, the pointer returned by new is a null pointer, and the program continues its execution.
This method can be specified by using a special object called nothrow, declared in header <new>, as argument for new:
|
In this case, if the allocation of this block of memory failed, the failure could be detected by checking if bobby took a null pointer value:
| 1 2 3 4 5 | |
This nothrow method requires more work than the exception method, since the value returned has to be checked after each and every memory allocation, but I will use it in our examples due to its simplicity. Anyway this method can become tedious for larger projects, where the exception method is generally preferred. The exception method will be explained in detail later in this tutorial.
Operators delete and delete[]
Since the necessity of dynamic memory is usually limited to specific moments within a program, once it is no longer needed it should be freed so that the memory becomes available again for other requests of dynamic memory. This is the purpose of the operator delete, whose format is:| 1 2 | |
The first expression should be used to delete memory allocated for a single element, and the second one for memory allocated for arrays of elements.
The value passed as argument to delete must be either a pointer to a memory block previously allocated with new, or a null pointer (in the case of a null pointer, delete produces no effect).
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | | How many numbers would you like to type? 5 Enter number : 75 Enter number : 436 Enter number : 1067 Enter number : 8 Enter number : 32 You have entered: 75, 436, 1067, 8, 32, |
Notice how the value within brackets in the new statement is a variable value entered by the user (i), not a constant value:
|
But the user could have entered a value for i so big that our system could not handle it. For example, when I tried to give a value of 1 billion to the "How many numbers" question, my system could not allocate that much memory for the program and I got the text message we prepared for this case (Error: memory could not be allocated). Remember that in the case that we tried to allocate the memory without specifying the nothrow parameter in the new expression, an exception would be thrown, which if it's not handled terminates the program.
It is a good practice to always check if a dynamic memory block was successfully allocated. Therefore, if you use the nothrow method, you should always check the value of the pointer returned. Otherwise, use the exception method, even if you do not handle the exception. This way, the program will terminate at that point without causing the unexpected results of continuing executing a code that assumes a block of memory to have been allocated when in fact it has not.
Dynamic memory in ANSI-C
Operators new and delete are exclusive of C++. They are not available in the C language. But using pure C language and its library, dynamic memory can also be used through the functions malloc, calloc, realloc and free, which are also available in C++ including the <cstdlib> header file (see cstdlib for more info).
The memory blocks allocated by these functions are not necessarily compatible with those returned by new, so each one should be manipulated with its own set of functions or operators.
Data Structures
Data structures
A data structure is a group of data elements grouped together under one name. These data elements, known as members, can have different types and different lengths. Data structures are declared in C++ using the following syntax:struct structure_name {
member_type1 member_name1;
member_type2 member_name2;
member_type3 member_name3;
.
.
} object_names;
where structure_name is a name for the structure type, object_name can be a set of valid identifiers for objects that have the type of this structure. Within braces { } there is a list with the data members, each one is specified with a type and a valid identifier as its name.
The first thing we have to know is that a data structure creates a new type: Once a data structure is declared, a new type with the identifier specified as structure_name is created and can be used in the rest of the program as if it was any other type. For example:
| 1 2 3 4 5 6 7 | |
We have first declared a structure type called product with two members: weight and price, each of a different fundamental type. We have then used this name of the structure type (product) to declare three objects of that type: apple, banana and melon as we would have done with any fundamental data type.
Once declared, product has become a new valid type name like the fundamental ones int, char or short and from that point on we are able to declare objects (variables) of this compound new type, like we have done with apple, banana and melon.
Right at the end of the struct declaration, and before the ending semicolon, we can use the optional field object_name to directly declare objects of the structure type. For example, we can also declare the structure objects apple, banana and melon at the moment we define the data structure type this way:
| 1 2 3 4 | |
It is important to clearly differentiate between what is the structure type name, and what is an object (variable) that has this structure type. We can instantiate many objects (i.e. variables, like apple, banana and melon) from a single structure type (product).
Once we have declared our three objects of a determined structure type (apple, banana and melon) we can operate directly with their members. To do that we use a dot (.) inserted between the object name and the member name. For example, we could operate with any of these elements as if they were standard variables of their respective types:
| 1 2 3 4 5 6 | |
Each one of these has the data type corresponding to the member they refer to: apple.weight, banana.weight and melon.weight are of type int, while apple.price, banana.price and melon.price are of type float.
Let's see a real example where you can see how a structure type can be used in the same way as fundamental types:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | | Enter title: Alien Enter year: 1979 My favorite movie is: 2001 A Space Odyssey (1968) And yours is: Alien (1979) |
The example shows how we can use the members of an object as regular variables. For example, the member yours.year is a valid variable of type int, and mine.title is a valid variable of type string.
The objects mine and yours can also be treated as valid variables of type movies_t, for example we have passed them to the function printmovie as we would have done with regular variables. Therefore, one of the most important advantages of data structures is that we can either refer to their members individually or to the entire structure as a block with only one identifier.
Data structures are a feature that can be used to represent databases, especially if we consider the possibility of building arrays of them:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | | Enter title: Blade Runner Enter year: 1982 Enter title: Matrix Enter year: 1999 Enter title: Taxi Driver Enter year: 1976 You have entered these movies: Blade Runner (1982) Matrix (1999) Taxi Driver (1976) |
Pointers to structures
Like any other type, structures can be pointed by its own type of pointers:| 1 2 3 4 5 6 7 | |
Here amovie is an object of structure type movies_t, and pmovie is a pointer to point to objects of structure type movies_t. So, the following code would also be valid:
|
The value of the pointer pmovie would be assigned to a reference to the object amovie (its memory address).
We will now go with another example that includes pointers, which will serve to introduce a new operator: the arrow operator (->):
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | | Enter title: Invasion of the body snatchers Enter year: 1978 You have entered: Invasion of the body snatchers (1978) |
The previous code includes an important introduction: the arrow operator (->). This is a dereference operator that is used exclusively with pointers to objects with members. This operator serves to access a member of an object to which we have a reference. In the example we used:
|
Which is for all purposes equivalent to:
|
Both expressions pmovie->title and (*pmovie).title are valid and both mean that we are evaluating the member title of the data structure pointed by a pointer called pmovie. It must be clearly differentiated from:
|
which is equivalent to:
|
And that would access the value pointed by a hypothetical pointer member called title of the structure object pmovie (which in this case would not be a pointer). The following panel summarizes possible combinations of pointers and structure members:
| Expression | What is evaluated | Equivalent |
|---|---|---|
| a.b | Member b of object a | |
| a->b | Member b of object pointed by a | (*a).b |
| *a.b | Value pointed by member b of object a | *(a.b) |
Nesting structures
Structures can also be nested so that a valid element of a structure can also be in its turn another structure.
| 1 2 3 4 5 6 7 8 9 10 11 12 | |
After the previous declaration we could use any of the following expressions:
| 1 2 3 4 | |
(where, by the way, the last two expressions refer to the same member).
Other Data Types
Defined data types (typedef)
C++ allows the definition of our own types based on other existing data types. We can do this using the keyword typedef, whose format is:typedef existing_type new_type_name ;
where existing_type is a C++ fundamental or compound type and new_type_name is the name for the new type we are defining. For example:
| 1 2 3 4 | |
In this case we have defined four data types: C, WORD, pChar and field as char, unsigned int, char* and char[50] respectively, that we could perfectly use in declarations later as any other valid type:
| 1 2 3 4 | |
typedef does not create different types. It only creates synonyms of existing types. That means that the type of myword can be considered to be either WORD or unsigned int, since both are in fact the same type.
typedef can be useful to define an alias for a type that is frequently used within a program. It is also useful to define types when it is possible that we will need to change the type in later versions of our program, or if a type you want to use has a name that is too long or confusing.
Unions
Unions allow one same portion of memory to be accessed as different data types, since all of them are in fact the same location in memory. Its declaration and use is similar to the one of structures but its functionality is totally different:union union_name {
member_type1 member_name1;
member_type2 member_name2;
member_type3 member_name3;
.
.
} object_names;
All the elements of the union declaration occupy the same physical space in memory. Its size is the one of the greatest element of the declaration. For example:
| 1 2 3 4 5 | |
defines three elements:
| 1 2 3 | |
each one with a different data type. Since all of them are referring to the same location in memory, the modification of one of the elements will affect the value of all of them. We cannot store different values in them independent of each other.
One of the uses a union may have is to unite an elementary type with an array or structures of smaller elements. For example:
| 1 2 3 4 5 6 7 8 | |
defines three names that allow us to access the same group of 4 bytes: mix.l, mix.s and mix.c and which we can use according to how we want to access these bytes, as if they were a single long-type data, as if they were two short elements or as an array of char elements, respectively. I have mixed types, arrays and structures in the union so that you can see the different ways that we can access the data. For a little-endian system (most PC platforms), this union could be represented as:
The exact alignment and order of the members of a union in memory is platform dependant. Therefore be aware of possible portability issues with this type of use.
Anonymous unions
In C++ we have the option to declare anonymous unions. If we declare a union without any name, the union will be anonymous and we will be able to access its members directly by their member names. For example, look at the difference between these two structure declarations:| structure with regular union | structure with anonymous union |
|---|---|
struct {
char title[50];
char author[50];
union {
float dollars;
int yen;
} price;
} book;
| struct {
char title[50];
char author[50];
union {
float dollars;
int yen;
};
} book;
|
The only difference between the two pieces of code is that in the first one we have given a name to the union (price) and in the second one we have not. The difference is seen when we access the members dollars and yen of an object of this type. For an object of the first type, it would be:
| 1 2 | |
whereas for an object of the second type, it would be:
| 1 2 | |
Once again I remind you that because it is a union and not a struct, the members dollars and yen occupy the same physical space in the memory so they cannot be used to store two different values simultaneously. You can set a value for price in dollars or in yen, but not in both.
Enumerations (enum)
Enumerations create new data types to contain something different that is not limited to the values fundamental data types may take. Its form is the following:enum enumeration_name {
value1,
value2,
value3,
.
.
} object_names;
For example, we could create a new type of variable called colors_t to store colors with the following declaration:
|
Notice that we do not include any fundamental data type in the declaration. To say it somehow, we have created a whole new data type from scratch without basing it on any other existing type. The possible values that variables of this new type color_t may take are the new constant values included within braces. For example, once the colors_t enumeration is declared the following expressions will be valid:
| 1 2 3 4 | |
Enumerations are type compatible with numeric variables, so their constants are always assigned an integer numerical value internally. If it is not specified, the integer value equivalent to the first possible value is equivalent to 0 and the following ones follow a +1 progression. Thus, in our data type colors_t that we have defined above, black would be equivalent to 0, blue would be equivalent to 1, green to 2, and so on.
We can explicitly specify an integer value for any of the constant values that our enumerated type can take. If the constant value that follows it is not given an integer value, it is automatically assumed the same value as the previous one plus one. For example:
| 1 2 3 | |
In this case, variable y2k of enumerated type months_t can contain any of the 12 possible values that go from january to december and that are equivalent to values between 1 and 12 (not between 0 and 11, since we have made january equal to 1).