Chapter 21a Other Library Issues

Nick Maclaren
http://www.ucs.cam.ac.uk/docs/course-notes/unix-courses/cplusplus
This was written by me, not Bjarne Stroustrup

Function Objects
These are not the only way to solve the problem
But they are the way that C++ does so
In some other languages, you can have nested scopes:
    void fred () {
        int z;
        void joe () { ... = z+3; }    // Not allowed
    }
When converting between languages, you often need to use different methodologies to solve the same problem
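
A minimal sketch of how a function object stands in for the nested function above. The names AddZ and the body of fred() are made up for illustration, and the lambda form assumes C++11 or later.

```cpp
// Sketch: a function object carries the enclosing variable with it,
// which is what a nested function would have picked up from its scope.
struct AddZ {                          // hypothetical name, illustration only
    int z;                             // copy of the "outer" variable
    explicit AddZ(int z_) : z(z_) {}
    int operator()(int x) const { return x + z + 3; }
};

// The same idea as a C++11 lambda, which generates such a class for you.
inline int fred(int x) {
    int z = 10;
    auto joe = [z](int y) { return y + z + 3; };   // captures z by value
    return joe(x);
}
```

AddZ(10)(1) and fred(1) compute the same value; the lambda is just shorthand for the hand-written class.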

Sequence Containers
Watch out for vector<bool> - it's anomalous
Use <array> when you need an array with a size fixed at compile time; otherwise don't bother
The STL also has <deque>, <forward_list>, <priority_queue>, <queue> and <stack>
Generally, they are not worth bothering with
    <deque> allows pushing/popping at start as well
    <forward_list> needs only one pointer, not two
    The others just add queue and stack restrictions
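
One way to see the vector<bool> anomaly, as a sketch assuming C++11: its operator[] hands back a proxy object and not a bool&, which the compile-time checks below confirm. The function sum3 is a made-up example of a size that is part of an <array> type.

```cpp
#include <array>
#include <type_traits>
#include <vector>

// vector<int>::operator[] yields a real reference to an element ...
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "ordinary vectors hand out true references");

// ... but vector<bool> hands out a proxy class instead, so code that
// expects bool& or bool* either fails to compile or misbehaves.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool> hands out a proxy, not bool&");

// std::array: the size is part of the type and fixed at compile time.
inline int sum3(const std::array<int, 3>& a) {
    return a[0] + a[1] + a[2];
}
```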

Use of Containers
I dislike using insert() and erase() on vector
This is not because it is unreliable (it isn't) or because it is slow, but because it is unexpected
It breaks some invariants of vectors that people will expect
False assumptions are one of the main causes of foul bugs
Code that is more 'natural' is easier to get right and maintain
I much prefer to use vector for vectors and list for lists
I recommend leaving optimisation until after you have got the program going and found where the time goes
And, even then, optimise only performance-critical code

The STL's Use of Iterators
Why don't I like the STL's use of iterators?
As it uses them, they are no safer than C pointers
    They aren't checked, and mistakes will cause chaos
    They aren't bound to specific objects, only types
You don't have to do the same - see exercises 18 and 19
Remember to trap * at position end()
Attempting maximum flexibility has its costs
    It makes debugging much harder
    It usually makes the code less reliable
    It adds gotchas, and humans make mistakes

The STL's Use of Iterators
A specific warning:
The rules for iterator invalidation are defined but fiendishly complicated
If you make ANY change to a container, assume that ALL iterators to it become undefined
This includes adding, removing or swapping elements
Except as otherwise explicitly specified
And even then you need to watch out for C++ version and compiler variation
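
A small sketch of one defensive habit that follows from the warning: carry a position across a change as an index, and take a fresh iterator afterwards. The function name append_then_read is hypothetical, and it assumes pos is a valid index before the call.

```cpp
#include <cstddef>
#include <vector>

// Appends a value, then returns the element that was at 'pos' beforehand.
// An iterator taken before push_back may dangle if the vector reallocates,
// so the position is carried across the change as an index instead.
inline int append_then_read(std::vector<int>& v, std::size_t pos, int extra) {
    v.push_back(extra);                       // assume ALL iterators are now dead
    std::vector<int>::iterator it =
        v.begin() + static_cast<std::ptrdiff_t>(pos);   // fresh iterator
    return *it;
}
```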

Iterators As Such
Why don't I like iterators, anyway?
They make semantic sense only on 1-D linear sequences
    There is no canonical ordering for n-D arrays, graph structures (trees, DAGs etc.) and so on
    Their use in unordered maps and sets is a horrible hack
The way they are used is very serial, as in while-loops
    Incompatible with vectorisation or parallelisation
    And those are the key to future performance
    Think SSE and multiple cores
There are much better approaches, as in Fortran
But iterators are what C++ people use

Example of Problem
Try to see what each of these does, but use #include <vector> and not std_lib_facilities.h:
    vector<int> p(10,1), q(10,2);
    vector<int>::iterator r = find(p.begin(),q.end(),2);
    vector<int>::iterator s = find(p.begin(),q.end(),2);
Typically arises when editing to change name
Can 'lurk' for years, appearing to work
If you check for container match, that error will be trapped
But stops you using a few of the more arcane STL facilities

Other Approaches (1)
I much prefer code like the following:
    // typename is needed because T::iterator is a dependent type;
    // obj is non-const so that a plain iterator can be returned
    template<class T>
    typename T::iterator find (T & obj, const typename T::value_type & val) {
        typename T::iterator res = obj.begin();
        while (res != obj.end() && *res != val) ++res;
        return res;
    }
It's more restrictive (i.e. it uses only the whole array)
But it's easier to use and safer:
    Myclass::iterator ptr = find(obj,val);

Other Approaches (2)
But I tend not to do even that for my own classes
I prefer to write a member function find()
It can then include class-dependent comparison and checking (especially the latter)
It's also even easier to use:
    Myclass::iterator ptr = obj.find(val);
Or even to use indices:
    int n = obj.find(val);
Even the STL does this for things like sorting lists
Both of these approaches can be vectorised and parallelised without changing the interface
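
A minimal sketch of the index-returning member function. The class NameList and its members are invented for illustration, and returning -1 for "not found" is an assumed convention.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container class: names held in a vector, with a member
// find() that owns the comparison rule and returns an index.
class NameList {
public:
    void add(const std::string& name) { names_.push_back(name); }

    // Returns the index of the first match, or -1 if there is none
    // (the -1 convention is an assumption of this sketch).
    int find(const std::string& name) const {
        for (std::size_t n = 0; n < names_.size(); ++n)
            if (names_[n] == name) return static_cast<int>(n);
        return -1;
    }

private:
    std::vector<std::string> names_;
};
```

Because callers see only obj.find(val), the loop inside could later be replaced by a vectorised or parallel search without touching them.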

The while/for Schism
Which is clearer of the following codes?
    Myvector::iterator ptr1 = vec1.begin(), ptr2 = vec2.begin();
    while (ptr1 != vec1.end()) {
        action(*ptr1,*ptr2);
        ++ptr1; ++ptr2;
    }
and
    for (int n = 0; n < vec1.size(); ++n)
        action(vec1[n],vec2[n]);
Sometimes I write one, and sometimes the other
Do whichever is more natural for your code
Also, only the second can be vectorised/parallelised

A <map> gotcha
As the book says, this will add "Not there"=0 to the map:
    map<string,int> a;
    int n = a["Not there"];
It means that the following won't work:
    int fred (const map<string,int> & a) {
        return a["Not there"];    // error: operator[] is not const
    }
I don't love that behaviour much!
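
A sketch of a lookup that works on a const map and never inserts. The function name lookup and the policy of returning a caller-supplied fallback are assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Looks up a key in a const map without inserting anything.
// Returns 'fallback' when the key is absent (an assumed policy).
inline int lookup(const std::map<std::string, int>& a,
                  const std::string& key, int fallback = 0) {
    std::map<std::string, int>::const_iterator it = a.find(key);
    return it == a.end() ? fallback : it->second;
}
```

From C++11, a.at(key) is another option on a const map; it throws std::out_of_range when the key is absent.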

Other Associative Containers
The book teaches <map>, <unordered_map> and <set>; there is also <unordered_set>
Also multi versions, which allow replicated keys
Generally, I advise you NOT to use those
They are very hard to use correctly
This is a generic point, and nothing to do with C++
Most people are far better off creating a simple class, with a value and version number
You order them by value and then version number
You then write a subclass, which returns an iterator to all entries with a certain value; doing that makes a good exercise
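
A sketch of the value-plus-version idea, covering only the ordering and insertion; the range lookup is left as the exercise. Entry and add_versioned are hypothetical names, and the code assumes far fewer copies of any one value than an unsigned can count.

```cpp
#include <limits>
#include <set>
#include <tuple>

struct Entry {              // hypothetical class: a value plus a version number
    int value;
    unsigned version;
};

// Order by value first, then by version number.
inline bool operator<(const Entry& a, const Entry& b) {
    return std::tie(a.value, a.version) < std::tie(b.value, b.version);
}

// Inserts 'value' with the next free version number for that value and
// returns the entry that was stored.
inline Entry add_versioned(std::set<Entry>& s, int value) {
    Entry probe = {value, std::numeric_limits<unsigned>::max()};
    std::set<Entry>::iterator it = s.lower_bound(probe);  // just past this value's entries
    unsigned version = 0;
    if (it != s.begin()) {
        --it;                                  // last entry that sorts before probe
        if (it->value == value) version = it->version + 1;
    }
    Entry e = {value, version};
    s.insert(e);
    return e;
}
```

Every key in the set is then unique, so an ordinary <set> is enough and the multi versions are not needed.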

Unordered Containers
These still have iterators
I regard this as a horrible aspect, though there are uses
It causes unnecessary performance and parallelism problems, and there are better approaches
But, with the STL, don't trust the order in any way
Two identical sets of keys may have different orders
And any add or removal may shuffle the order
The only guarantee is the order won't change while:
    Merely accessing elements
    Updating elements in place - don't even replace them
    Using query functions like size()

Other STL Issues
The STL containers have multiple ways to do the same thing, but sometimes omit a basic function
Just create a derived class and add it
If you critically need access to its private data:
First, stop programming, and think ...
Then do one of:
    Solve your problem another way
    Use another STL class
    Use another library
    Write your own container class
Don't import an open source STL class - they are foully over-complicated
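
A sketch of adding a missing function by derivation, assuming C++11. MyVector and contains() are invented names for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical example: std::vector has no contains(), so add one in a
// derived class. Only the public interface of the base is used.
template <class T>
class MyVector : public std::vector<T> {
public:
    using std::vector<T>::vector;          // inherit the constructors (C++11)

    bool contains(const T& val) const {
        return std::find(this->begin(), this->end(), val) != this->end();
    }
};
```

The STL containers have no virtual destructor, so objects like this should not be deleted through a pointer to the base class.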

The STL and Parallelism
It's underspecified and overly restrictive
I and others are hoping to get this improved
Don't get your hopes up too much
The FAIRLY SAFE rules for use in parallel are:
    You can use separate containers in parallel
    You can access and update separate elements in place
        DON'T extend, truncate, insert or erase
        DON'T even replace them, in general
        You can replace in <vector>, <array> etc.
    You can use query functions in parallel with that
And NOTHING else!
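
A sketch of the "separate elements in place" rule, assuming C++11 threads (some systems need a threading option such as -pthread when building). The function double_all is a made-up example; it would not be safe for vector<bool>, where neighbouring elements may share storage.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Doubles every element, using two threads that touch disjoint halves.
// The vector's size is fixed before the threads start and is not changed
// until both have been joined.
inline void double_all(std::vector<int>& v) {
    const std::size_t mid = v.size() / 2;
    auto work = [&v](std::size_t lo, std::size_t hi) {
        for (std::size_t n = lo; n < hi; ++n) v[n] *= 2;   // update in place
    };
    std::thread t1(work, std::size_t(0), mid);
    std::thread t2(work, mid, v.size());
    t1.join();
    t2.join();
}
```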

STL Algorithms
I don't find them useful, except for sort()
This ISN'T just because I can easily code even sort()
It is because they don't save time or clarify my code
    Most of them replace just one line of simple code
    They often need a lot of overhead (e.g. function objects)
Software reuse is an excellent principle, and a very bad dogma
Pretty well every good engineering principle goes bad when it is taken beyond the limit of its applicability
I will come back to the numeric algorithms later

Software Reuse
Always ask yourself "Do I really need to write this facility - can't I import it from somewhere?"
You can always replace it by your own later, if necessary
As a general rule, start by reusing if possible
But remember that reusing software creates a dependence on its specification and code
Write your own only if you have a solid reason to
But, IF you have a solid reason, then do so
Performance is very rarely a solid reason

When to Use the STL
Always attempt to use the STL, unless:
Using it makes your code messier or less clear
You need properties that it does not provide, or it is not explicitly specified to do what you want
    But at least try extending or deriving from it
    Just coding and hoping is NOT a good strategy
    Regrettably, this often includes numerics and parallelism
You have tested and tried your code and the performance is unacceptable
    And there is a clean method to do a LOT better
    Back to numerics and parallelism, are we? :-)

The Questions to Ask
The following are some of the questions to ask:
    Will it be simpler and cleaner, or less so?
    Will it be more reliable, or less so?
    Will it be more portable, or less so?
    Will it be more maintainable, or less so?
    Will it be more efficient, or less so?
        This is very much the least important question
Which ones matter depends mainly on your requirements
Your skill is a secondary consideration - seriously

When to Reuse
Almost always, but do NOT include the source
Build using the latest version available to you
    When there is a standard and stable interface: BLAS, LAPACK, MPI, etc.
    Or there is reliable, portable and stable software: NAG, FFTW, PCRE etc.
Usually, subject to maintenance, portability etc.
    When your system has a suitable library: MKL, ACML, often Boost etc.
    When there is suitable open source to include
    Watch out for copyright rules, maintenance etc.

When NOT to Reuse
Even here, debug your code by reusing, if possible
When it doesn't meet your requirements
    AND extending it is more work than rewriting
    Adding parallelism is one common example
When you need a high level of portability
    And the available software is too restrictive
    Regrettably, copyright often forces this
When it simply doesn't work on your data
    And you are absolutely certain it's not your bug
    This is FAR rarer than most people claim

Calling C Libraries
Write proper C++ wrappers and call from them
Do NOT plaster C-style calls all over your code
    Similarly for using external variables
    And C macros defining constants or functions
Headers with fancy macros are a nightmare
    They can completely break the STL headers
Best to keep all C interfaces in separate files
Export only proper C++ interfaces
This can be non-trivial, but is most needed when it is hardest to do - sorry, but that's C for you
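
A minimal sketch of such a wrapper, using the standard C function strtol() as the stand-in C interface. The name parse_long and the choice of exceptions are assumptions made for illustration.

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

// C++ wrapper around the C function strtol(): callers pass a std::string
// and get a long back, with failures reported as exceptions rather than
// through errno and an end pointer.
inline long parse_long(const std::string& text) {
    const char* begin = text.c_str();
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(begin, &end, 10);
    if (end == begin || *end != '\0')          // nothing parsed, or trailing junk
        throw std::invalid_argument("not an integer: " + text);
    if (errno == ERANGE)                       // value does not fit in a long
        throw std::out_of_range("integer out of range: " + text);
    return value;
}
```

In a real program the definition would live in its own source file, so that only the C++ declaration is visible to the rest of the code.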

Calling Fortran
You have to do it via C, but it can be done
BLAS, LAPACK etc. are usually in Fortran 77
I give more details on calling them in a later lecture
You can do it more cleanly in modern Fortran
C++ is not formally supported, and you need to take care
There is some information in my Fortran course

Boost
The STL is over-complicated for its function
Boost takes this to a whole new level
One chapter needs a whole BOOK of documentation
It's a collection of contributed libraries
They overlap, and vary in quality
It's monolithic, with a foul build mechanism
Even on an x86 Linux system, it often fails
It's usually fairly simple to configure round that
Porting it is the stuff of nightmares

Boost
But, a lot of people use it successfully
Watch out if you need to collaborate
    Yes, YOU can install Boost, but can THEY?
    And is it portable or reliable over time?
    The C++ standard itself isn't great on that
It's a real pain if you just want one facility
    Each class can drag in dozens of others
    Your build time can go through the roof
    And the size of your executables, of course

Domain-specific Libraries
Many areas collaborate through libraries
Often used to read and write file formats
Not much option, if you want to collaborate
Mixing two such libraries can be a right pain
Often best done by writing two separate programs
I cannot offer any meaningful advice here

Exercise
Exercise 16: Repeat exercise 7, but use a function that takes a container and a value as arguments
Return the position, or -1 if not found, and test all of: not found, below the lowest, and above the highest

Next lecture
Numerics