logical concept that you can represent in your program as a (linked) list or as a vector. The closest STL analog to our everyday concept of a list (e.g., a to-do list, a list of groceries, or a schedule) is a sequence, and most sequences are best represented as vectors.

20.6.1 Lines

How do we decide what's a "line" in our document? There are three obvious choices:

1. Rely on newline indicators (e.g., '\n') in user input.
2. Somehow parse the document and use some "natural" punctuation (e.g., .).
3. Split any line that grows beyond a given length (e.g., 50 characters) into two.

There are undoubtedly also some less obvious choices. For simplicity, we use alternative 1 here.

We will represent a document in our editor as an object of class Document. Stripped of all refinements, our document type looks like this:

    typedef vector<char> Line;      // a line is a vector of characters

    struct Document {
        list<Line> line;            // a document is a list of lines
        Document() { line.push_back(Line()); }
    };

Every Document starts out with a single empty line: Document's constructor makes an empty line and pushes it into the list of lines.

Reading and splitting into lines can be done like this:

    istream& operator>>(istream& is, Document& d)
    {
        char ch;
        while (is.get(ch)) {                // get() reads whitespace too; is>>ch would skip it
            d.line.back().push_back(ch);    // add the character
            if (ch=='\n')
                d.line.push_back(Line());   // add another line
        }
        return is;
    }

CHAPTER 20 • CONTAINERS AND ITERATORS

Both vector and list have a member function back() that returns a reference to the last element. To use it, you have to be sure that there really is a last element for back() to refer to - don't use it on an empty container. That's why we defined an empty Document to have one empty Line. Note that we store every character from input, even the newline characters ('\n').
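As a quick check of how this reader splits its input, here is a minimal sketch that repeats the definitions with explicit std:: qualification. It assumes characters are read with get() so that whitespace is not skipped, and it adds make_document(), a hypothetical helper that is not part of the text, to read a Document from a string.

```cpp
#include <istream>
#include <list>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<char> Line;             // a line is a vector of characters

struct Document {
    std::list<Line> line;                   // a document is a list of lines
    Document() { line.push_back(Line()); }  // start with one empty line
};

// Read everything from is into d, starting a new Line after each '\n'.
std::istream& operator>>(std::istream& is, Document& d)
{
    char ch;
    while (is.get(ch)) {                    // get() keeps spaces and newlines
        d.line.back().push_back(ch);        // add the character
        if (ch == '\n')
            d.line.push_back(Line());       // add another line
    }
    return is;
}

// Hypothetical helper (an assumption for illustration): build a Document
// from the characters of a string.
Document make_document(const std::string& text)
{
    std::istringstream is(text);
    Document d;
    is >> d;
    return d;
}
```

For the input "ab\ncd\n", make_document() yields three lines: "ab\n", "cd\n", and a final empty line, because each '\n' is stored at the end of its line and then opens a new one.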
Storing those newline characters greatly simplifies output, but you have to be careful how you define a character count (just counting characters will give a number that includes space and newline characters).

20.6.2 Iteration

If the document was just a vector<char> it would be simple to iterate over it. How do we iterate over a list of lines? Obviously, we can iterate over the list using list<Line>::iterator. However, what if we wanted to visit the characters one after another without any fuss about line breaks? We could provide an iterator specifically designed for our Document:

    class Text_iterator {    // keep track of line and character position within a line
        list<Line>::iterator ln;
        Line::iterator pos;
    public:
        // start the iterator at line ll's character position pp:
        Text_iterator(list<Line>::iterator ll, Line::iterator pp)
            :ln(ll), pos(pp) { }

        char& operator*() { return *pos; }
        Text_iterator& operator++();

        bool operator==(const Text_iterator& other) const
            { return ln==other.ln && pos==other.pos; }
        bool operator!=(const Text_iterator& other) const
            { return !(*this==other); }
    };

    Text_iterator& Text_iterator::operator++()
    {
        ++pos;                      // proceed to next character
        if (pos==(*ln).end()) {
            ++ln;                   // proceed to next line
            pos = (*ln).begin();    // bad if ln==line.end(), so make sure it isn't
        }
        return *this;
    }

20.6 AN EXAMPLE: A SIMPLE TEXT EDITOR

To make Text_iterator useful, we need to equip class Document with conventional begin() and end() functions:

    struct Document {
        list<Line> line;

        Text_iterator begin()       // first character of first line
            { return Text_iterator(line.begin(), (*line.begin()).begin()); }
        Text_iterator end()         // one beyond the last character of the last line
        {
            list<Line>::iterator last = line.end();
            --last;                 // we know that the document is not empty
            return Text_iterator(last, (*last).end());
        }
    };

We need the curious (*line.begin()).begin() notation because we want the beginning of what line.begin() points to; we could alternatively have used line.begin()->begin() because the standard library iterators support ->.
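To exercise the begin()/end() pair, the following sketch assembles the pieces into one translation unit with explicit std:: qualification. Two names are hypothetical and not part of the text: make_document(), which fills a Document from a string, and count_non_newline(), which counts characters other than '\n'. The sketch also assumes that a Document always ends with an empty Line, so that ++ never steps past the last line; make_document() adds that final empty line itself.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

typedef std::vector<char> Line;

class Text_iterator {
    std::list<Line>::iterator ln;   // current line
    Line::iterator pos;             // current character within *ln
public:
    Text_iterator(std::list<Line>::iterator ll, Line::iterator pp)
        : ln(ll), pos(pp) {}

    char& operator*() { return *pos; }

    Text_iterator& operator++()
    {
        ++pos;                      // proceed to next character
        if (pos == ln->end()) {     // ran off the end of this line
            ++ln;                   // assumption: a next line always exists here
            pos = ln->begin();
        }
        return *this;
    }

    bool operator==(const Text_iterator& other) const
        { return ln == other.ln && pos == other.pos; }
    bool operator!=(const Text_iterator& other) const
        { return !(*this == other); }
};

struct Document {
    std::list<Line> line;
    Document() { line.push_back(Line()); }

    Text_iterator begin()
        { return Text_iterator(line.begin(), line.begin()->begin()); }
    Text_iterator end()
    {
        std::list<Line>::iterator last = line.end();
        --last;                     // safe: there is always at least one line
        return Text_iterator(last, last->end());
    }
};

// Hypothetical helper: split text into lines, storing each '\n'.
Document make_document(const std::string& text)
{
    Document d;
    for (char ch : text) {
        d.line.back().push_back(ch);
        if (ch == '\n') d.line.push_back(Line());
    }
    // Assumption of this sketch: keep the last line empty so that
    // operator++ always has a next line to move to.
    if (!d.line.back().empty()) d.line.push_back(Line());
    return d;
}

// Hypothetical helper: count stored characters, ignoring '\n'.
std::size_t count_non_newline(Document& d)
{
    std::size_t n = 0;
    for (Text_iterator p = d.begin(); p != d.end(); ++p)
        if (*p != '\n') ++n;
    return n;
}
```

Because the line structure is hidden inside Text_iterator, count_non_newline() is a single loop; the same count written directly against list<Line> would need one loop over lines and another over the characters of each line.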
We can now iterate over the characters of a document like this:

    void print(Document& d)
    {
        for (Text_iterator p = d.begin(); p!=d.end(); ++p) cout << *p;
    }

    print(my_doc);

Presenting the document as a sequence of characters is useful for many things, but usually we traverse a document looking for something more specific than a character. For example, here is a piece of code to delete line n:

    void erase_line(Document& d, int n)
    {
        if (n<0 || d.line.size()<=n) return;    // ignore out-of-range lines
        list<Line>::iterator p = d.line.begin();
        advance(p,n);
        d.line.erase(p);
    }

The call advance(p,n) moves an iterator p n elements forward; advance() is a standard library function, but we could have implemented it ourselves like this:

    template<class Iter> void advance(Iter& p, int n)
    {
        while (0<n) { ++p; --n; }    // go forward
    }

Note that advance() can be used to simulate subscripting. In fact, for a vector called v, p=v.begin(); advance(p,n); *p is roughly equivalent to v[n]. Note that "roughly" means that advance() laboriously moves past the first n-1 elements one by one, whereas the subscript goes straight to the nth element. For a list, we have to use the laborious method. It's a price we have to pay for the more flexible layout of the elements of a list.

For an iterator that can move both forward and backward, such as the iterator for list, a negative argument to the standard library advance() will move the iterator backward. For an iterator that can handle subscripting, such as the iterator for a vector, the standard library advance() will go directly to the right element rather than slowly moving along using ++. Clearly, the standard library advance() is a bit smarter than ours. That's worth noticing: typically, the standard library facilities have had more care and time spent on them than we could afford, so prefer the standard facilities to "home brew."

TRY THIS

Rewrite advance() so that it will "go backward" when you give it a negative argument.
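The behavior of the standard library advance() described above can be observed with a small sketch. The three function names here (list_at, list_from_back, vector_at) are hypothetical and exist only for illustration; std::advance itself is declared in <iterator>, takes the iterator by reference, and returns nothing.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Element n of a list, counting from the front.
// Requires 0 <= n < lst.size(); the iterator is stepped n times with ++.
int list_at(std::list<int>& lst, int n)
{
    std::list<int>::iterator p = lst.begin();
    std::advance(p, n);
    return *p;
}

// Element n places before the end of a list (n==1 is the last element).
// Requires 1 <= n <= lst.size(); a negative distance is allowed because
// list iterators can move backward.
int list_from_back(std::list<int>& lst, int n)
{
    std::list<int>::iterator p = lst.end();
    std::advance(p, -n);
    return *p;
}

// Element n of a vector, reached through advance().
// Requires 0 <= n < v.size(); for a vector iterator, advance() can jump
// directly, so the result is the same element as v[n].
int vector_at(std::vector<int>& v, int n)
{
    std::vector<int>::iterator p = v.begin();
    std::advance(p, n);
    return *p;
}
```

For a list holding 10, 20, 30, 40, list_at(lst, 2) yields 30 and list_from_back(lst, 1) yields 40. None of these helpers checks its range, so the caller must keep n within the stated bounds, just as erase_line() does before calling advance().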
Probably, a search is the kind of iteration that is most obvious to a user. We search for individual words (such as milkshake or Gavin), for sequences of letters that can't easily be considered words (such as secret\nhomestead - i.e., a line ending with secret followed by a line starting with homestead), for regular expressions (e.g., [bB]\w*ne - i.e., an upper- or lowercase B followed by 0 or more letters followed by ne; see Chapter 23), etc. Let's show how to handle the second case, finding a string, using our Document layout. We use a simple, non-optimal algorithm:

• Find the first character of our search string in the document.
• See if that character and the following characters match our search string.
• If so, we are finished; if not, we look for the next occurrence of that first character.

For generality, we adopt the STL convention of defining the text in which to search as a sequence defined by a pair of iterators. That way we can use our search function for any part of a document as well as a complete document. If we find an occurrence of our string in the document, we return an iterator to its first character; if we don't find an occurrence, we return an iterator to the end of the sequence: