Programming, Algorithm and Technical Tips: November 2011

Tuesday, November 29, 2011

Assignment between Class Objects

Some short notes about assignment between class objects.

Suppose we are considering the assignment a = b, where a if of class A type, and b is of class B type.

The overloaded operator= function will be looked first. If there is any, the assignment will call this function.

If no overloaded operator= function, and a and b are of same class type, the assignment will be member-wise shallow copy.

If no overloaded operator= function, and class B is derived from class A (in other words, b is an a), the assignment will copy the base class members.

If no overloaded operator= function, and a and b are of unrelated class types, the assignment will try to call copy constructor if any.

If no overloaded operator= function, a and b are of unrelated class types, and no copy constructor available, the assignment will try to call overloaded type casting function if any.

If none of above applies, then the compiler will report an error about the assignment.

Related Posts:

Type Casting in C++

Type Casting on Classes

static_cast, dynamic_cast, and reinterpret_cast

Constructors and Destructors

This post looks into some details about class constructors and destructors. First look at a piece of code in the following.

What's going on really with above code? Which objects were constructed and destructed? What's the order and logic relationship? Below is the output by above program. (The comments were added for explanation.)

default constructor: 1st           // initialize f1 with value "1st"
before calling func ...
copy constructor: 1st              // construct local object f in func
within function func ...
copy constructor: 1st              // initialize f2, no temporary object returned
destructor: 1st                    // destroy local object f
after calling func ...
default constructor: 3rd           // initialize f3 with value "3rd"
default constructor: default       // initialize f4 with default value
default constructor: 4th           // temporary object with value "4th"
f4.buf: 4th                        // after shallow copy to f4
destroying temporary object ...
destructor: 4th                    // destroy temporary object
f4.buf:                            // f4.buf becomes invalid
destroying local objects ...
destructor:                        // destroy f4
destructor: 3rd                    // destroy f3
destructor: 1st                    // destroy f2
destructor: 1st                    // destroy f1

If no copy constructor was defined, the function call may bring some problems because of shallow copy. When exiting the function func, the destroy of local variable f will also make the buf member of returned object invalid.

Accessibility of Class Members

In this post, I briefly summarize the accessibility of class members in C++.

Access	public	protected	private
the same class	yes	yes	yes
derived class public inheritance	yes still public	yes still protected	no still private
derived class protected inheritance	yes change to protected	yes still protected	no still private
derived class private inheritance	yes change to private	yes change to private	no still private
friend	yes	yes	yes
other	yes	no	no

References:

C++ Tutorial

Related Posts:

Friend Functions and Classes

References in C++

Some notes about references in C++. Also, see the new standard introduced in C++11 about rvalue references and move semantics.

It is not possible to refer directly to a reference object after it is defined; any occurrence of its name refers directly to the object it references.
Once a reference is created, it cannot be later made to reference another object; it cannot be reseated. This is often done with pointers.
References cannot be null, whereas pointers can; every reference refers to some object, although it may or may not be valid.
References cannot be uninitialized. Because it is impossible to reinitialize a reference, they must be initialized as soon as they are created. In particular, local and global variables must be initialized where they are defined, and references which are data members of class instances must be initialized in the initializer list of the class's constructor.
Most compilers will support a null reference without much complaint, crashing only if you try to use the reference in some way.
References become invalid if they refer to an object with automatic allocation which goes out of scope.
References become invalid if they refer to an object inside a block of dynamic memory which has been freed.
Undefined behavior and invalid reference (initialized by dereferencing a null pointer):
```
int *ip = 0;
int &ir = *ip;
```
References exhibit polymorphic capabilities, which is similar to pointers.
References not qualified with const can only be bound to addressable values, not rvalues.
References not qualified with const cannot be bound to temporary objects.
Never return references to local or temporary objects.
It is unspecified whether or not a reference requires storage.
There shall be no references to references, no arrays of references, and no pointers to references.
The declaration of a reference shall contain an initializer except when the declaration contains an explicit extern specifier, is a class member declaration within a class declaration, or is the declaration of a parameter or a return type.
A reference cannot be bound directly to a bitfield.
Given typedef int & iRef;, the declaration const iRef ir=5; is incorrect. Here, the qualifier const will be ignored. However, the following code is correct.
```
typedef const int & ciRef;
ciRef ir = 5;
```
For pointers, there may be both const pointers and pointers to const variables; for references, there is no const references, at least not directly. By nature, reference is const and cannot be reassigned. In this sense, the term "const reference" always refers to "reference to const variable".

References:

Wikipedia: Reference

Rvalue References and Move Semantics

Monday, November 28, 2011

Function Template Specialization

A good study example for function template specialization.

Some notes about function template specialization:

Explicit template specialization should be defined in namespace scope, and hence not inline within class body.
Class templates can be partially specialized, but partial function template specialization is not allowed.
The template parameter list after function name may be omitted, or some trailing arguments omitted. However, it is recommended to keep the <> pair.

Template parameter list is part of signature for function templates and their specializations. Pay attention to the differences between function overloading and partial specialization.

// Suppose we have the function template
template< typename T1, typename T2 >
void func( T1 v1, T2 v2 ) { ... }

// The following is considered as function template overloading and allowed
template< typename T >
void func( T v1, T v2 ) { ... }

// The following is considered as partial specialization and not allowed
template< typename T >
void func< T, T >( T v1, T v2 ) { ... }

Related Posts:

Function Signatures

Class Templates

Inheritance and Order of Constructor Calls

For the code segment below, how many times of base class constructor will be called?

For single inheritance, the order is always the same: base class constructor first, then its own constructor.
For multiple inheritance, call the virtual base constructor first if any, then follow the order from left to right for non-virtual base constructors, and finally call its own constructor. Note that only a unique set of virtual base constructors will be called, which is exactly the purpose of virtual inheritance.

Example 1:
Output 1:

Base constructor
Foo1 constructor
Base constructor
Foo2 constructor
Multi constructor

Example 2:
Output 2:

Base constructor
Foo1 constructor
Base constructor
Foo2 constructor
Multi constructor

Example 3:
Output 3:

Base constructor
Base constructor
Foo1 constructor
Foo2 constructor
Multi constructor

Example 4:
Output 3:

Base constructor
Foo1 constructor
Foo2 constructor
Multi constructor

Sunday, November 27, 2011

static_cast, dynamic_cast and reinterpret_cast

See also the posts (1, 2) on type casting.

static_cast can be used to cast between fundamental types.
```
int i;
float f;
i = static_cast<int>(f);
```

static_cast cannot be used to cast between pointers or references to fundamental types.

int *ip, &ir;
float f;
ip = static_cast<int*>(&f);   // illegal
ir = static_cast<int&>(f);    // illegal

static_cast can be used to cast between class types if one of the class types has explicit conversion constructor or type casting operator function. For this case, static_cast is the same as explicit casting using parentheses or implicit casting using assignment.
static_cast can be used to cast derived class type to base class type (public inheritance only), but cannot be used to cast base class type to derived class type if no defined conversion constructor or casting operator. Think about function arguments and exception catch, which are implicit type casting.
static_cast can be used to cast between pointers or references to related classes (base and derived, public inheritance only). The compiler won't complain if casting base class pointer to derived class pointer. The overhead of type-safe check by dynamic_cast can be avoided. It's the programmers' responsibility to ensure safe conversions.
static_cast cannot be used to cast for other cases, including between fundamental types and class types, between pointers or references to fundamental types and class types, or between pointers or references to unrelated classes.
dynamic_cast can be used to cast pointers or references to related classes only (base and derived, public inheritance only). In other words, dynamic_cast won't work on non-pointer/reference types, or pointer/reference to fundamental types.
When casting between pointers or references to unrelated classes, dynamic_cast will pass compile, but there will be run-time error.
dynamic_cast can be always successful if casting derived class to base class (public inheritance only).
When casting from base class to derived class, dynamic_cast will pass compile only when the base class is polymorphic. But there will be run-time error.
dynamic_cast cannot be used to cast base class to derived class if the base class is not polymorphic. (compile error)
reinterpret_cast can be used to cast between any pointer or reference types, even between those to unrelated classes or fundamental types or mix.
```
int *ip, &ir;
float f;
ip = reinterpret_cast<int*>(&f);
ir = reinterpret_cast<int&>(f);
```
When using reinterpret_cast, neither the content pointed nor the pointer type itself is checked, and thus dereferencing it is unsafe.
reinterpret_cast even works on any kind of inheritance, either public, protected, or private.
reinterpret_cast can be used to cast between pointer types and integer types.

To demonstrate, I wrote a sample code.

Related Posts:

Type Casting in C++

Type Casting on Classes

Assignment between Class Objects

Thursday, November 24, 2011

Object Initialization for User-Defined Classes

First of all, to declare or initialize class objects, the class must have default constructors, either explicitly defined in source code or implicitly defined by the compiler. Note that a default constructor is a constructor that either has no parameters, or if it has parameters, all the parameters have default values.

When declaring objects (not references or pointers), we can either directly declare an object with no explicit initializer or parenthesis, or explicitly provide initializer with parenthesis. Note that although the default constructor was defined and we may call general default functions without providing any arguments, we cannot initialize objects by such function call scheme. In other words, we may not provide a pair of parenthesis while no explicit initializer arguments.

Suppose class SomeClass has default constructor, then

SomeClass object1;                 // legal
SomeClass object2(initializer);    // legal
SomeClass object3();        // illegal (compiler won't complain, 
                            // but it will consider object3 as a function)

When initialize object pointers using keyword new, there will be different behaviors for explicitly and implicitly defined default constructors.

For classes with explicitly defined default constructor, the following 2 schemes work exactly the same way.

SomeClass *pointer1 = new SomeClass;       // call default constructor
SomeClass *pointer2 = new SomeClass();     // call default constructor

However, for classes without explicitly defined default constructor, the 2 schemes differ in the initial value for object data members.

AnotherClass *pointer1 = new AnotherClass;       // undefined data members
AnotherClass *pointer2 = new AnotherClass();     // nullified data members

I wrote a sample code and tested with both g++ and MSVC, and I am attaching the output by MSVC in this post after the code.

Related Posts:

Array Declaration and Initialization

new and delete

Type Casting on Classes

In one previous post, I discussed about general type casting in C++. While in this post, I am going to talk about the casting on classes. More specifically, I won't discuss the keyword approach (static_cast, dynamic_cast, reinterpret_cast or const_cast) in this post.

Based on the is-a relationship between derived and base classes, implicit casting from derived class to base class is always possible for public inheritance (not possible for protected or private inheritance).

class Base {};
class Derived: public Base {};

Derived d;
Base b = d;

Without use of keywords, we may have 3 ways to convert unrelated class types.

Corresponding constructor
Overload assignment operator
Overload type casting operator

Suppose we want cast an object of class A to class B, then we may need define 1 or 2 in B's body, while define 3 in A's body.

Here is a sample code I wrote for testing. Note that the "=" sign in declaration is not assignment operator. Initialization, assignment, and explicit casting are all legal for Approaches 1 and 3, but only implicit casting by assignment is legal for Approach 2 with overloaded operator "=".

Related Posts:

Type Casting in C++

static_cast, dynamic_cast, and reinterpret_cast

Assignment between Class Objects

Wednesday, November 23, 2011

getline and istream::getline

In C++, these 2 functions both read a string of characters from an input stream. I didn't pay attention to their differences, and in this post I summarize about that.

First, let's look at the function signatures.

istream& getline( istream& is, string &str );
istream& getline( istream& is, string &str, char delim );

istream& istream::getline( char *s, streamsize n );
istream& istream::getline( char *s, streamsize n, char delim );

For both functions, the delimiter character is delim and '\n' (newline character) is the default value. The extraction also stops if the end of file is reached or if some other error occurs during the input operation. For the function istream::getline, at most \(n-1\) characters can be read and the ending null character will be automatically appended to the C-style string after data extraction.

For both functions, if the delimiter is found, it is extracted and discarded, i.e. it is not stored and the next input operation will begin after it.

Differences:

getline is a global function included in the header <string>, while istream::getline is a member function of class istream.
getline reads characters into a C++ string type string, while istream::getline reads characters into a C-style string.

References:

C++ Reference: getline

C++ Reference: istream::getline

Wednesday, November 16, 2011

Maximize Payoff by Greedy Reordering

Source: CLRS, Introduction to Algorithms, 2nd Edition, Page 384, Problem 16.2-7

Problem:
Suppose you are given two sets \(A\) and \(B\), each containing \(n\) positive integers. You can choose to reorder each set however you like. After reordering, let \(a_i\) be the \(i\)th element of set \(A\), and let \(b_i\) be the \(i\)th element of set \(B\). You then receive a payoff of \(\prod_{i=1}^{n}{a_i^{b_i}}\). Given an algorithm that will maximize your payoff. Prove that your algorithm maximizes the payoff, and state its running time.

Solution:
Greedy Algorithm: Let the set \(\{a_i\}\) be sorted so that \(a_1 \ge a_2 \ge \ldots \ge a_n\) and set \(\{b_i\}\) be sorted so that \(b_1 \ge b_2 \ge \ldots \ge b_n\). Pair \(a_i\) with \(b_i\). This is the required solution.

Complexity:
If the two sets \(A\) and \(B\) are already sorted, the time complexity is \(O(n)\).
If the sets are not sorted, then sort them first and the time complexity is \(O(n\log n)\).

Proof:
Suppose the optimal payoff is not produced from the above solution. Let \(S\) be the optimal solution, in which \(a_1\) is paired with \(b_p\) and \(a_q\) is paired with \(b_1\). Note that \(a_1 \gt a_q\) and \(b_1 \gt b_p\).

Consider another solution \(S'\) in which \(a_1\) is paired with \(b_1\), \(a_q\) is paired with \(b_p\), and all other pairs are the same as \(S\). Then \[\frac{\text{Payoff}(S)}{\text{Payoff}(S')}=\frac{\prod_{S}\hspace{2 mm} a_i^{b_i}}{\prod_{S'}\hspace{2 mm}a_i^{b_i}}=\frac{(a_1)^{b_p} (a_q)^{b_1}}{(a_1)^{b_1} (a_q)^{b_p}}=(\frac{a_1}{a_q})^{b_p - b_1}\] Since \(a_1 \gt a_q\) and \(b_1 \gt b_p\), then \(\text{Payoff}(S) / \text{Payoff}(S') \lt 1\). This contradicts the assumption that \(S\) is the optimal solution. Therefore \(a_1\) should be paired with \(b_1\). Repeating the argument for remaining elements, we can get the result.

Tuesday, November 15, 2011

0-1 Knapsack Problem

Problem:
A thief robbing a store finds \(n\) items; the \(i\)th item is worth \(v_i\) dollars and weights \(w_i\) pounds, where \(v_i\) and \(w_i\) are integers. He wants to take as valuable a load as possible, but he can carry at most \(W\) pounds in his knapsack for some integer \(W\). Which items should he take?

Analysis:
There are two versions of this problem: fractional knapsack problem and 0-1 knapsack problem.

Fractional knapsack problem: the items can be broken into smaller pieces, and the thief can take fractions of items.
0-1 knapsack problem: the items may not be broken into smaller pieces, and the thief must decide whether to take an item or leave it (binary choices).

For fractional knapsack problem, greedy algorithm exists and the problem can be solved in \(O(n\log n)\) time based on sorting. And a better approach may take \(O(n)\) time for the worst case based on linear-time SELECT algorithm.

For 0-1 knapsack problem, no greedy algorithm exists and we need turn to dynamic programming approach. The algorithm will take \(O(nW)\) time.

Solution:
Define \(c[i,w]\) to be the optimal value for items \(1,2,...,i\) and maximum weight \(w\). There are following recursive relations:
\[c[i,w]=\begin{cases}0, & i=0\text{ or }w=0\\c[i-1,w], & w_i\gt w\\ \max\{v_i+c[i-1,w-w_i], c[i-1,w]\}, & i>0 \text{ and }w_i \le w\end{cases}\] We can have the pseudocode:

DP_01_Knapsack(value, weight, W)
   assert value.length = weight.length
   n ← value.length
   for w ← 0 to W:
      c[0,w] ← 0
   for i ← 1 to n:
      c[i,0] ← 0
      for w ← 1 to W:
         c[i,w] ← c[i-1,w]
         if weight[i] ≤ w and value[i]+c[i-1,w-weight[i]] > c[i-1,w]:
            c[i-1,w] ← value[i]+c[i-1,w-weight[i]]

The set of items to take can be deduced from the table, starting at \(c[n,W]\) and tracing backwards where the optimal values came from. If \(c[i,w]=c[i-1,w]\), then item \(i\) is not part of the solution, and we are continue tracing with \(c[i-1,w]\). Otherwise item \(i\) is part of the solution, and we continue tracing with \(c[i-1,w-w_i]\).

References:

Dynamic Programming Solution to the 0-1 Knapsack Problem

Monday, November 14, 2011

A More Efficient Algorithm for Longest Increasing Subsequence

Problem:
Given an array of numbers or a string of length \(n\), find a longest increasing subsequence.
In my previous post, I discussed the dynamic programming approach with time complexity \(O(n^2)\). For this specific problem, we can have a more efficient algorithm which is also based on dynamic programming. Please see Wikipedia for reference.

The algorithm uses only arrays and binary searching. It processes the sequence elements in order, maintaining the longest increasing subsequence found so far. Denote the sequence values as \(X[1]\), \(X[2]\), etc. Then, after processing \(X[i]\), the algorithm will have stored values in two arrays:

\(M[j]\) — stores the position \(k\) of the smallest value \(X[k]\) such that there is an increasing subsequence of length \(j\) ending at \(X[k]\) on the range \(k ≤ i\) (note we have \(j ≤ k ≤ i\) here).

\(P[k]\) — stores the position of the predecessor of \(X[k]\) in the longest increasing subsequence ending at \(X[k]\).

In addition the algorithm stores a variable \(L\) representing the length of the longest increasing subsequence found so far.

Note that, at any point in the algorithm, the sequence \[X[M[1]], X[M[2]], ..., X[M[L]]\] is nondecreasing. For, if there is an increasing subsequence of length \(i\) ending at \(X[M[i]]\), then there is also a subsequence of length \(i-1\) ending at a smaller value: namely the one ending at \(X[P[M[i]]]\). Thus, we may do binary searches in this sequence in logarithmic time.

The algorithm, then, proceeds as follows.

L = 0
for i = 1, 2, ... n:
   binary search for the largest positive j ≤ L 
     such that X[M[j]] < X[i] (or set j = 0 if no such value exists)
   P[i] = M[j]
   if j == L or X[i] < X[M[j+1]]:
      M[j+1] = i
      L = max(L, j+1)

The result of this is the length of the longest sequence in \(L\). The actual longest sequence can be found by backtracking through the \(P\) array: the last item of the longest sequence is in \(X[M[L]]\), the second-to-last item is in \(X[P[M[L]]]\), etc. Thus, the sequence has the form \[..., X[P[P[M[L]]]], X[P[M[L]]], X[M[L]].\]
Because the algorithm performs a single binary search per sequence element, its total time can be expressed using Big \(O\) notation as \(O(n\log n)\).

References:

Wikipedia item: longest increasing subsequence

Related Posts:

\(O(n^2)\) Algorithm for Longest Increasing Subsequence

Sunday, November 13, 2011

Longest Increasing Subsequence

Problem:
Given an array of numbers or a string of length \(n\), find a longest increasing subsequence.

Brute-force approach:
Enumerate all possible subsequences and then determine which are increasing and store the longest one. This approach is obviously impractical for large \(n\) and its time complexity is exponential.

Dynamic programming approach:
The longest increasing subsequence of an array \(A\) of length \(n\) can be determined based on that of all the prefix arrays. We need use an auxiliary array \(T[n]\), in which \(T[i]\) stores the length of longest increasing (non-decreasing) subsequence ending at \(A[i]\). Notice that is ending at, i.e., the last element must be \(A[i]\). Meanwhile, the global maximum length is also stored in variable \(L\). The time complexity for this algorithm will be \(O(n^2)\).

Related Posts:

\(O(n\log n)\) Algorithm for Longest Increasing Subsequence

Wednesday, November 9, 2011

Implementation of prev_permutation

In my previous posts, I discussed about various approaches to find the next number in order. The main idea for that is the function of next_permutation in C++, or equivalent implementations in C or Java. Now, I am trying to implement the opposite function prev_permutation.

The function takes a string as input, update the string to the previous one in lexicographical order if there is, and return a bool value. If there is a smaller string, return true; if not, return false and update the string from the smallest to the greatest.

Idea:

Search back from the end, find the first greater character compared to the one immediately after it.
Search back from the end again, find the first smaller one compared to the great character just found.
Swap the 2 characters found from above 2 steps.
Reverse the rest string after the character location found in Step 1.

Tuesday, November 8, 2011

Variable Length Argument List

Sometimes we need pass along an unknown number of arguments before actually calling the function, such as searching for the minimum of maximum of a list of numbers, or calculating their average. Most programming language are able to first construct an array or list and then pass the constructed object as a single argument. But we can also pass an undefined length of arguments, and this post just summarizes this approach. Different languages provide different ways to deal with this case, and here I am discussing C/C++, Java and Python.

In C/C++, we need the type va_list and macros va_start, va_arg, and va_end. All these are included in the header file <stdarg.h> or <cstdarg>. Below is an example.

In Java, however, we directly append an ellipse after the type name, and then we can give a variable name (which is actually is an array and can be iterated via enhanced for loop in Java).

In Python, programmers have more power than passing an array of arguments: it allows a dictionary of arguments, by which we actually passing both a set of identifier names and their corresponding values. This offers more options to use the variable length arguments.

In summary,

The variable length parameters are always placed at the end of parameter list of the function.
In C/C++, at least one parameter must be passed and declared in the list. There is no such requirement in Java or Python. Also, in C/C++, the comma or space is not required after the last given parameter.
In C/C++, the variable arguments are able to be converted to different types via va_arg, and also this conversion is required for each argument. However, Java will first construct an array of the same type for all the passed arguments. Of course, you can cast the elements if necessary.
In Python, the arguments in calling function may become a little bit complex. The general order is that non-keyword arguments should be placed before keyword arguments. Also, there may be explicitly defined or predefined tuple or dict that will be passed as variable arguments, and in general these are always placed after "in-place" arguments. Here is an example.
```
def: foo(arg1, arg2, *nonkws, **kws):
    #### function body

pre_Tuple = (pre_nonkw1, pre_nonkw2)
pre_Dict = {'pre_kw1':pre_v1, 'pre_kw2':pre_v2}

# call function foo()
foo(val1, val2, nkw1, nkw2, kw1=v1, kw2=v2, *(expl_nkw1, expl_nkw2), **{'expl_kw1':expl_v1, 'expl_kw2':expl_v2})
```
or
```
foo(val1, val2, nkw1, nkw2, kw1=v1, kw2=v2, *pre_Tuple, **pre_Dict)
```
The only possibility to reorder the argument list is
```
foo(val1, val2, nkw1, nkw2, *(expl_nkw1, expl_nkw2), kw1=v1, kw2=v2, **{'expl_kw1':expl_v1, 'expl_kw2':expl_v2})
```
or
```
foo(val1, val2, nkw1, nkw2, *pre_Tuple, kw1=v1, kw2=v2, **pre_Dict)
```
Explicitly defined tuple or dict cannot co-exist with predefined tuple or dict. Only one for each is possible.

Monday, November 7, 2011

Class and Struct

In C, there is no such special thing called class, and class is just an ordinary identifier. For example,

int class=2012;

The struct in C contains data members only, and all of them are visible in the same scope as struct itself.

In Java, there is no such thing called struct, but we can use class to implement struct. The class may contain member functions besides data fields, and programmer is able to declare the visibility for each member.

Interesting while kind of confusing, C++ has them both: class and struct. In many ways, C++ compilers treat them the same. Below are some short notes about class and struct in C++.

Both class and struct can have member data and member functions, and are able to specify their visibilities.
Both class and struct can inherit from and be inherited by another class or struct. To state more clearly, class can inherit from either class or a struct, and similarly, struct can inherit from either class or struct.
Both class and struct may contain pointers to the same class/struct type, but can't contain objects or references of the same class/struct type.
Both keywords class and struct can be omitted when declaring objects. While in C, the keywords struct, as well as union and enum, can not be omitted when declaring unless a typedef was used or anonymous variable is being declared.
C++ compilers don't distinguish the keywords class and struct in terms of type name. In other words, you may not define a class with name A and then define a struct with name A. The compilers will report "redefinition error". This is also true for struct, union and enum in C, although the keywords must be included for declaration.
The members of a class are private by default, and the members of a struct are public by default.
The inheritance for a class is also private by default, and the inheritance for a struct is public by default.
In C and C++, all definitions for class, struct, union and enum must end with a semicolon ;, while in Java, no such ; outside the closing brace for class and enum definitions. (There is no struct or union in Java.)

Sunday, November 6, 2011

Dynamic Memory Allocation with new and delete

Some notes about the usage of keywords new and delete.

Syntax:

type *p = new type;     // undefined initial value for user-defined type
type *p = new type();   // default initial value, 0 for user-defined type
type *p = new type(initializer);
type *p = new type[size];          // default constructor
type *p = new type[size]();        // also default constructor
type *p = new (nothrow) type;      // don't throw bad_alloc exception
type *p = new (nothrow) type();
type *p = new (nothrow) type(initializer);
type *p = new (nothrow) type[size];
type *p = new (nothrow) type[size]();
delete p;
delete[] p;

Notes:

Initializers cannot be specified for arrays created with new. All elements of an array will be initialized with the default constructors. If the type doesn't have a default constructor, this will be a compile-error.
The behavior when operator new fails is compiler-specific. Most compilers will throw a std::bad_alloc exception. Using nothrow as above will return 0 or NULL and continue the program when new fails. Another option is to use function set_new_handler to handle new failures. The header file <new> is needed for the use of bad_alloc, nothrow, and set_new_hanlder.
The statement delete NULL; is allowed and no error will be reported.
In order to avoid double-free problem, assign the pointer to NULL after deletion.
```
delete p;
p = NULL;
```
or
```
delete[] p;
p = NULL;
```
It is not possible to directly reallocate memory allocated with new[ ]. To extend or reduce the size of a block, one must allocate a new block of adequate size, copy over the old memory, and delete the old block.
Arrays allocated with new[ ] must be deallocated with delete[ ], not delete. Since the layout of arrays allocated with new[ ] is implementation defined, and possible not compatible with new. Some implementations of new[ ] embed the number of allocated objects first into the beginning of the allocated memory chunk, and return pointer to the remaining part of the array.

Related Posts:

Array Declaration and Initialization

Object Initialization

Saturday, November 5, 2011

Delete Duplicate Nodes from Linked List

Problem:
Given an unsorted linked list, delete duplicate nodes from the linked list.

Thinking:
This problem is again similar to previous 2 posts (determine uniqueness and remove duplicates). There are several algorithms available.

Node definition:
Suppose the node in this linked list is defined as the class below.

Solution 1:
Without using extra buffer, delete every duplicates for each node encountered. Time complexity \(O(n^2)\).

Solution 2:
Without using extra buffer, each time encountering a node, detect if it's a duplicate. If yes, then delete it. Time complexity \(O(n^2)\).

Solution 3:
Using an extra hash table / hash set / hash map to keep track which nodes have already existed. Average time complexity \(O(n)\).

Notes:

Would there be any differences if the linked list is doubly linked?
==> Probably not.
How about sorting the list first?
==> As long as you are allowed to destroy the original order, this should work.

Friday, November 4, 2011

Remove Duplicate Characters

Problem:
Given a string, remove the duplicate characters in the string.

Assumption:
All the characters are from ASCII set and in the range 0~255. Also, the character null character does not appear in the string.

Requirement:
An extra copy of the array is not allowed.

Similar to last post about approaches to deciding unique characters or not, this can be done in place or with an array of fixed size.

Note:
Although C++ algorithm library has unique function, which may seem appropriate for this problem. However, that unique function just removes the consecutive duplicates.

We may also sort the string first if we can remove any duplicates, not necessarily keeping the first one in appearance order. In this way, we may get to time complexity \(O(n\log n)\). Also, this method would destroy the order of unique characters. This solution is not shown in this post.

Solution 1:
For each character in the string and unique, detect all duplicates in the string afterwards. Mark those duplicates and remove them in the end. Suppose the string is given as a char array, otherwise we may need convert a String to StringBuffer, StringBuilder, or CharArray.

Solution 2:
For each character in the string, first determine if it's duplicate. If yes, simply skip it; if not, keep it in the string.

Solution 3:
An extra array of fixed size can be used to record which characters have appeared. This can reduce the time complexity from \(O(n^2)\) to \(O(n)\). Also, we can have boolean array or bitset approaches. Each of them can be applied to the above 2 schemes and we can have 4 different algorithms. Here, I am showing boolean array with Scheme 1 in this solution, and bitset with Scheme 2 for next solution.

Solution 4:
Use bitset with the above Scheme 2 to remove the duplicates.

Thursday, November 3, 2011

String Contains All Unique Characters or Not

Problem:
Given a string, determine if this string contains all unique characters.

Assumption:
The characters in this string are from ASCII character set. In other words, the value range is 0~255.

Solution 1:
Directly compare each character in the string with every other character in the string. This approach will take \(O(n^2)\) time and no space.

Solution 2:
If allowed to destroy the string, we may first sort the string using quicksort, and then scan through the string to see if there are 2 adjacent characters are the same. Quicksort will take \(O(n\log n)\) time and \(O(1)\) space. If using mergesort, it will take \(O(n)\) space. In Java, to sort the string, an extra array or StringBuffer is needed.

Solution 3:
Since the characters are in a small range, and we can create an array to record if one certain character has appeared or not. This approach will take \(O(n)\) time and \(O(m)\) space, where \(m\) is the size of character set.

Solution 4:
Similar to Solution 3, but instead of using an array of boolean values, use a set of bits to record the binary values. In Java, \(256/32=8\) int size bits are needed, which can be implemented as an array of 8 integers, and each integer represents 32 characters. In C++, there is a very convenient structure called bitset, which can be beautifully used here.

Notes:
Since the characters are all from the assumed set, and the set size is known. In this case, the maximum length for a string of all unique characters will be the size of the character set. Therefore, for each of the above programs, we may add the following code.

if( s.length() > 256 ) return false;

Wednesday, November 2, 2011

TreeSet vs HashSet vs LinkedHashSet

The post here nicely summarized the collections TreeSet, HashSet, and LinkedHashSet in Java. I mostly repeat that.

TreeSet	HashSet	LinkedHashSet
public class TreeSet extends AbstractSet implements SortedSet, Cloneable, Serializable	public class HashSet extends AbstractSet implements Set, Cloneable, Serializable	public class LinkedHashSet extends HashSet implements Set, Cloneable, Serializable
unique values	unique values	unique values
red-black tree	hash table	hash table with double links
ascending order	undefined order	insertion order
\(O(\log n)\) for add, remove and contains	\(O(1)\)	\(O(1)\), a littler slower than HashSet except for the operation of iteration

For more details, please see the reference blog.

References:

Vidya's Blog

Tuesday, November 1, 2011

Size of Class Objects

Some notes about how to determine the size of class objects in C++.

The keyword sizeof can work on both class type names and class objects.
Static member data don't contribute to the size of class (they are classwide available)
Member functions don't really contribute to the size of class (except the vtable for virtual functions).
The this pointer doesn't contribute to the size of class objects.
Friends and pointers to members are not class members at all, and hence they won't contribute to the class size.
It is unspecified whether or not a reference requires storage. It may be compiler specific.
The actual size is most likely greater than the sum of sizes for each non-static member data and virtual pointers. This is because of byte alignment or padding, and depends on the compiler.
Virtual pointer will take some space if there are virtual functions or virtual inheritance in the class.
Empty classes won't be size 0 in order to distinguish the objects of that class type. By most compilers, their size will be 1.
Empty sub-objects will take no space in memory when an empty class is inherited by a non-empty class type.
Regular inheritance will make the size of derived class the sum of all non-static member sizes from its base classes.
The _vptr field is always an alias of the first available _vptr of its base classes, if there are some. Otherwise, the class itself will create a new _vptr if necessary.
The virtual inheritance is kind of complex. It's designed to solve the diamond problem in inheritance, and there will be only 1 copy for the common ancestor members. Simply speaking, the class will first create the non-virtual base class members, and its own members in class body, and finally a unique set of members from virtual base classes (including immediate virtual base and inherited virtual base).

References:

C Programming

C++ FAQ

Wikipedia: Reference

Wikipedia: Virtual Inheritance