Programming Notes & Advice

Todd Littell 22SEP05

Basic Coding Practices

Maintain State Locally
//////////// Bad C int count = 0; void countTokens() { while (…) ++count; } //////////// Bad Java public class TokenCount { public int count; public String token; } //////////// Good C int countTokens() { int count = 0 while (…) ++count; return count; } //////////// Good Java public class TokenCount { private int count; private String token; public int getCount() { return count; } public void setCount(int cnt) { this.count = cnt; } }

Maintain State Locally (2)
//////////// Bad C typedef struct { int count; char *token; } TokenCount; TokenCount tcArray[1000]; int tcArraySize = 1000; int indexOfToken(char *token) { for (int i = 0; i < tcArraySize; i++) if (!strcmp(tcArray[i], token)) return i; return -1; } //////////// Good C typedef struct { int count; char *token; } TokenCount; static TokenCount tcArray[1000]; static int tcArraySize = 1000;

int indexOfToken(TokenCount *array, int arraySize, char *token) { for (int i = 0; i < arraySize; i++) if (!strcmp(array[i], token)) return i; return -1; } int main() { int k = indexOfToken(tcArray, tcArraySize, argv[1]); } Note: Always keep data as “local” and “private” as possible. In C++/Java, use “private” keyword; in C use static keyword for any globals. Note #2: Only have globals at the highest level of your program, if at all.

Be Careful of Side-Effects
//////////// Bad Java void dumpValidAddresses(Set addrSet) { Iterator it = addrSet.iterator(); while (it.hasNext()) { Address addr = (Address); if (addr.getCity() == null || addr.getState() == null) it.remove(); else System.out.println(“Address: “ + addr); } } //////////// Bad C++ void myProc(Address arr[], int &num) { for (; num >= 0; --num) ; } //////////// Good Java void dumpValidAddresses(Set addrSet) { Iterator it = addrSet.iterator(); while (it.hasNext()) { Address addr = (Address); if (addr.getCity() != null && addr.getState() != null) System.out.println(“Address: “ + addr); } }

//////////// Good C++ void myProc(Address arr[], int num) { for (; num >= 0; --num) ; }

Maintain Consistent Style
//////////// Bad C (inconsistent naming) void cutBreadHorizontally(); void cut_bread_vertically(); //////////// Bad C (inconsistent naming) void cut_bread_horizontally(); void vertically_cut_bread(); void cut_bread_with_diagonal(); //////////// Bad C (inconsistent typing) short countTokens(); void printTokenCount(int count, char *token); //////////// Good C (consistent naming) void cutBreadHorizontally(); void cutBreadVertically(); //////////// Good C (consistent naming) void cut_bread_horizontally(); void cut_bread_vertically(); void cut_bread_diagonally(); //////////// Good C (consistent typing) int countTokens(); void printTokenCount(int count, char *token); // Or typedef int count_t; count_t countTokens(); void printTokenCount(count_t count, …); //////////// Good C (consistent signatures) int indexOf(char c, int start); int indexOfchar *str, int start);

//////////// Bad C (inconsistent signatures) int indexOf(char c, int start); int indexOf(int start, char *str);

Comment when needed
//////////// Bad C int foo(int *a, int b) { char *c = (char*)malloc(b); int d, e, f=0; for (d = 0; d < b; d++) { if (c[d]) continue; c[d] = 1; for (e=a[d]; e-d; e=a[e], c[d]++) c[e] = 1; f = (c[d] > f? c[d] : f); } free(c); return f; } //////////// Good C int maxCycleLength(int array[], int sz) { // visited[] will store 1 if visited; else 0. char *visited = (char*)malloc(sz); int i, j, max=0; for (i = 0; i < sz; i++) // for each array member { if (visited[i]) // if visited then skip continue; visited[i] = 1; // for each member in cycle, // update visited count. for (j=array[i]; j != i; j = array[j], visited[i]++) visited[j] = 1; max = (visited[i] > max? visited[i] : max); } free(visited); return max; }

Be Defensive!
//////////// Bad C int foo(TokenCount *tcArray, int tcArraySize) { for (int i = 0; i < tcArraySize; i++) { printf(“ %s %d\n”, tcArray[i].token, tcArray[i].count); } return 0; } //////////// Good C #define NO_ERR 0 #define ERR_INVALID_ARGS -1 int foo(TokenCount *tcArray, int tcArraySize) { if (tcArray == NULL || tcArraySize < 0) return ERR_INVALID_ARGS; for (int i = 0; i < tcArraySize; i++) { if (tcArrray[i].name == NULL) continue; printf(“ %s %d\n”, tcArray[i].token, tcArray[i].count); } return NO_ERR; } //////////// Good Java void foo(TokenCount tcArray[]) { if (tcArray == null) throw new IllegalArgumentException(“foo()”); for (int i = 0; i < tcArray.length; i++) ; }

//////////// Bad Java void foo(TokenCount tcArray[]) { for (int i = 0; i < tcArray.length; i++) ; }

Maintain Orthogonality
//////////// Bad C void searchAndReplace(char *string, char oldChr, char newChr); //////////// Bad C++ class DataFileConverter { void convert(char *oldFilename, char *newFilename); }; //////////// Bad Java class TimerStack { void pushCurrentTime(); Time popTime(); } class Car { boolean startCarAndDriveToStore(); } //////////// Good C int find(char *string, int pos, char oldChr); void replace(char *string, int pos, char newChr); //////////// Better C++ class DataFileConverter { void convert(FILE *oldFile, FILE *newFile); void convert(int oldfd, int newfd); }; //////////// Good Java class TimerStack { void pushTime(Time currTime); Time popTime(); } class Car { boolean startEngine(); boolean driveToLocation(Location destination); }

Think: Reusability
• • • If a function (method) is more than 30 lines of code, then refactor it! If a file contains more than 300-400 lines of code, then refactor it! Be able to describe the purpose of your function in one sentence, and the purpose of your class (or file) in one paragraph. If it seems complex to you (the programmer) then it is too complex and needs refactoring. WHY?

Be Resource Conscious
•If an object (or function) opens (or allocates or reserves) a resource, then the same object (or function) should release it.
Example #1: void foo(char *mystring) { char *tmpbuf = (char*)malloc((strlen(mystring)+1)*sizeof(ch ar)); // processing happens. free(tmpbuf); } File *gl_infile; // Global input file ptr. char *gl_proc_buf; // Global proc’ing buf. void init(char *infilename) { gl_infile = fopen(infilename, “r”); // opens input file. gl_proc_buf = malloc(200); } void finish() { fclose(gl_infile); free(gl_proc_buf); } int main() { init(argv[1]); // Main processing loop finish(); }

Quick Review
• • Know your language! It is possible to write a bad (or good) program in any language. Some simple practices can go a long way. Keep variables (state) as local (private) as possible. Reduces dependencies. Maintain a consistent style; not just naming/indentation, but also in how you code. Stay away from side-effects, and be explicit about any mutators you may have (i.e. in function name & in comments). Keep your code modular; 30 lines/function; 300 lines/file. KISS. Be defensive, not offensive. Increase level of error checking. Use common idioms & practices. For example, for defining constants, macros, testing code, assertion code, etc.

• • •

• • •

Language Basics

Review of Address Space
Unix/Linux Address Space
Stack Stack Segment

Heap Global Data Executable Code

Data Segment

Text Segment

C Language
• • • • • • • A level above assembly. Still remains the underlying model behind many other languages. Procedural language with pass-by-value argument passing. Comes with powerful pre-processor (e.g. #include, #define, #if, #ifdef, etc.). Separate declarations in header files (.h) from definitions in (.c) source files. Pointer arithmetic is not only supported; it’s encouraged. Typing is weakly enforced; especially with primitives. Anything can be cast into anything else, basically. Functions can be passed as arguments. Powerful!

Review variable scope & duration. Review address space partitioning.

Common C Idioms
• • Pointers are used to simulate pass-by-reference:
– – – – void foo(Address *array, int arraySize); #define NO_ERROR 0 #define ERR_CONNECT -1 #define ERR_DATA -2 #define IS_ERR(err) ((err) < 0) #ifndef MYPROG_H #define MYPROG_H … #endif char *cptr = cptr2; if (cptr) … typedef struct { char *string; short bwidth; void *array; } SkipListNode; SkipListNode* make(int bwidth) { SkipListNode* s = (SkipListNode*)malloc(sizeof(SkipListNode) + (bwidth-1)*sizeof(void*)); return s; }

Using #define for constants (instead of enum{}):

• •

Using #define for macros:
– – – – –

Using #ifdef, #ifdef, #ifndef for header file inclusion:

• •

Testing null-ness the C way:
– – –

Pointer manipulation:

C++ Language
• • • An Object-Oriented extension of C. A better C. OO language with both pass-by-value and pass-by-reference argument passing. (e.g. print(Address addr); vs. print(Address& addr); ). Separate declarations in header files (.hpp) from definitions in (.cpp) source files. Typing (with objects) is strongly enforced. Run-Time Type Information and Exceptions are available, but not equally supported across compilers.

• • •

Review variable scope & duration.

Common C++ Idioms
• • • Use pass-by-ref instead of pass-by-value pointers. Use new/delete operators instead of malloc/free. Do NOT override operators (esp. new/delete), possibly with the exception of streaming operators << and >>. Use smart pointers (reference):
– { auto_ptr<Car> car_ptr(new Car()); }

• •

Use std::string for strings instead of char*. Follow the Law of the Big Three (reference): If you have a non-trivial dtor, then you must provide a suitable copy ctor & assignment op. Or, you must hide them. Avoiding nested templates (or templates altogether). Design-by-contract using Abstract Base Classes (ABCs) as interfaces.
– – class Car { virtual void startEngine() const = 0; }; class Hummer : public Car { virtual void startEngine() { fillUpTank(500); start(); };

• •

Java Language
• • • An interpreted, portable Object-Oriented language, with pass-by-value argument passing. C++ look-a-like, but Smalltalk semantics. Variable are either primitives or object references! All objects are allocated on heap. Only primitives & references are on stack. All methods are virtual (all the time)! Built-in garbage collector & memory manager. Declaration is not separated from definition. Both go in one source file (.java). Pointer arithmetic does not exist, which makes for safer code. Typing is strongly enforced. Built-in multi-threading support.

• • • • • •

Review variable scope & duration. Review address space partitioning.

Common Java Idioms
• Use interface to define constants:
– public interface ProgConstsIF { public final static String DFLT_TMP_PATH = “/tmp”; }

Throw RuntimeException for very common exceptions; else define and throw your own app exception (that is not derived from RuntimeException). When the construction & initialization of an object is complex, then a factory method is often used.
– Example: javax.xml.parsers.DocumentBuilder.parse(InputStream is) : org.w3c.dom.Document;

Introspection is used often:
– Example: void foo(Object o) { if (o instanceof String) … }

You must override hashCode() iff you override equals().

I/O Comparison
BufferedFormatted C fprintf() fscanf() BufferedUnformatted Unbuffered fread()/fgets() read()/getc() fwrite()/fputs() write()/putc() strm.get() strm.put() strm.write() strm.get() strm.put() strm.write()

C++ Overloaded << and >>

Java Classes:

Classes: Classes: StreamTokenizer BufferedReader Reader BufferedWriter Writer PrintWriter

Collections Comparison
List C++ stl:
vector<> list<> Java java.util: List ArrayList LinkedList

stl: set<> multiset<> java.util: Set HashSet TreeSet

stl: map<> multimap<> java.util: Map HashMap TreeMap

Bit Vector
stl: vector<bool> bitset<> java.util: BitSet

Note 1: The C++ Standard Template Library is iterator-based, while the Java Utils are collection-based. Note 2: that Apache Common Collections are commonly used as an extension. Note 3: the STL doesn’t have a hash table.

Design Basics

Common Design Principles
• • • Design top-down, build bottom-up. Design with layers. Layer N can only call functions/objects in layer N-1. Design-by-Contract (aka Interface). Think of objects first in terms of their interface – what they do – not how they are built. Use ABCs or interfaces often. Design with composition, not inheritance. Compose objects with aggregation, not inheritance. Rules for inheriting B from A:
– – – – An instance of B object will always be an instance of an A object. A B object is a special kind-of an A object. Never inherit for convenience (e.g. Stack inheriting from Vector is wrong!). Use roles idiom to handle multiple types, or types that can change over time.

Use design patterns such as Composite, Factory, Strategy, Template, Visitor, Adaptor, Singleton to simplify your code (reference). Also, Manager paradigm. Class hierarchies tend to be flat, not deep. Common practice in C++ is to add a Root class. (Not much method overriding actually happens).

Common Design Principles (2)
• • KISS Do not intertwine object (data) behavior with application behavior.
– Example: • class Car { boolean startEngine(); boolean driveToLocation(Location destination); • class DriveStategies { boolean driveCarToStore(); boolean driveCarToGasStation(); }

Weak coupling via Observer-Observable or a Semantic Event subsystem.

Design Practices
HOW TO ACTUALLY BUILD A DESIGN? • Data Models: A data model such as an Entity Relationship (ER) diagram or Extended Entity Relationship (EER) is still very common.
– – – Why? Because RDBMS still dominate as the persistent storage. How? Use pencil & paper, or whiteboard, or Visio, or software. Notations: Crows foot, Chen, homegrown.

Object Models: An object model encapsulates objects & class hierarchies. Most common notation: Unified Modeling Language (UML).
– – – – Why? Because OO Programming Languages (OOPL) dominate software field. How? Use pencil & paper, or whiteboard, or Visio, or software (Poseidon, Rational Rose, etc). Notations: UML has taken over. OMT was predecessor in many ways. UML is a Swiss Army knife – many views supported for different stakeholders.

Architectural Designs: This can be as simple as a napkin sketch, or as complete as a full-blown UML model. Architectural designs are evaluated by “design principles”.
– – – – Why? Any large software system needs a clear, consistent design. How? Whatever works! Notations: If not done in UML, then can be whatever conveys the essence of the design. Food for thought: A good Software Architect will make $200-300K per year, plus fringe, plus publicity, plus stocks, etc.

Superb Books & References
• Online e-Books/References:
– – – The Art of Unix Programming, by Eric Raymond. Thinking in Java/C++, by Bruce Eckel. C++ FAQ Lite Code Complete, by Steve McConnell. Writing Solid Code, Steve Maguire. C++ How to Program, Deitel & Deitel. (Great b/c it teaches very good many OO concepts). C++ FAQs, Cline, et al. Advanced Programming in the Unix Environment, Richard Stevens. (aka “The Bible”) Instant UML, Muller. Object-Oriented Methods, Graham. (General OO methodology/technology survey). Patterns in Java, Mark Grand. (esp. volumes 1-2). Advanced Object-Oriented Analysis and Design using UML, Odell.

General Programming Books:
– –

Introductory C++ Books:
– –

• •

Great Unix Book:
– – – – –

Software Engineering Books:

Master your semester with Scribd & The New York Times

Special offer for students: Only $4.99/month.

Master your semester with Scribd & The New York Times

Cancel anytime.