The code for this book is automatically extracted directly
from the ASCII text version of this book. The book is normally maintained in a
word processor capable of producing camera-ready copy, automatically creating
the table of contents and index, etc. To generate the code files, the book is
saved into a plain ASCII text file, and the program in this section
automatically extracts all the code files, places them in appropriate
subdirectories, and generates all the makefiles. The entire contents of the book
can then be built, for each compiler, by invoking a single make command.
This way, the code listings in the book can be regularly tested and verified,
and various compilers can also be tested for some degree of compliance
with Standard C++ (to the extent that the examples in the book exercise
a particular compiler, which is a reasonable amount).
The code in this book is designed to be as generic as
possible, but it is only tested under two operating systems: 32-bit Windows and
Linux (using the GNU C++ compiler g++, which means it should compile
under other versions of Unix without too much trouble). You can easily get the
latest sources for the book onto your machine by going to the web site
www.BruceEckel.com and downloading the zipped archive containing all the
code files and makefiles. If you unzip this you’ll have the book’s
directory tree available. However, it may not be configured for your particular
compiler or operating system. In this case, you can generate your own using the
ASCII text file for the book (available at www.BruceEckel.com) and the
ExtractCode.cpp program in this section. Using a text editor, find
the CompileDB.txt file inside the ASCII text file for the book, edit it
(leaving it inside the book’s text file) to adapt it to your compiler and
operating system, and then hand the book’s text file to the ExtractCode program
to generate your own source tree and makefiles.
You’ve seen that each file to be extracted contains a
starting marker (which includes the file name and path) and an ending marker.
Files can be of any type, and if the colon after the comment is directly
followed by a ‘!’ then the starting and ending marker lines are not
reproduced in the generated file. In addition, you’ve seen the other
markers {O}, {L}, and {T} that have been placed inside
comments; these are used to generate the makefile for each
subdirectory.
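The marker rules just described can be sketched as two small predicates. This is a hypothetical illustration modeled on the checks in main( ) and the CodeFile constructor, not code from the program; the names isStartMarker( ) and keepMarkers( ) are invented here, and keepMarkers( ) looks at the character after the first colon rather than at a fixed position.

```cpp
#include <string>

// Sketch modeled on the tests in main( ) and the CodeFile
// constructor. The literals are split ("//" ":") so that this
// listing can never be mistaken for a real marker.
bool isStartMarker(const std::string& line) {
  // A marker must begin in the very first column.
  return line.compare(0, 3, "//" ":") == 0 ||
         line.compare(0, 3, "/*" ":") == 0 ||
         line.compare(0, 2, "#" ":") == 0;
}

// Hypothetical helper: true unless a '!' directly follows the
// first colon, in which case the marker lines are not copied
// into the generated file.
bool keepMarkers(const std::string& line) {
  std::string::size_type colon = line.find(':');
  if (colon == std::string::npos || colon + 1 >= line.size())
    return true;
  return line[colon + 1] != '!';
}
```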
If there’s a mistake in the input file, the program
must report the error; this is the job of the error( ) function at the
beginning of the program. In addition, directory manipulation is not supported
by the standard libraries, so this is hidden away in the class
OSDirControl. If you discover that this class will not compile on your
system, you must replace the non-portable function calls in OSDirControl
with equivalent calls from your library.
Although this program is very useful for distributing the code
in the book, you’ll see that it’s also a useful example in its own
right, since it partitions everything into sensible objects and also makes heavy
use of the STL and the standard string class. You may notice that one or
two pieces of code are duplicated from other parts of the book, and
that some of the tools created within the program could have been
broken out into their own reusable header files and cpp files. However,
for easy unpacking of the book’s source code it made more sense to keep
everything lumped together in a single file.
//: C11:ExtractCode.cpp
// Automatically extracts code files from
// ASCII text of this book.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <map>
#include <set>
#include <algorithm>
#include <iterator> // ostream_iterator
#include <cstring>  // strlen()
#include <cstdlib>  // exit()
using namespace std;

string copyright =
  "// From Thinking in C++, 2nd Edition\n"
  "// Available at http://www.BruceEckel.com\n"
  "// (c) Bruce Eckel 1999\n"
  "// Copyright notice in Copyright.txt\n";

string usage =
  " Usage:ExtractCode source\n"
  "where source is the ASCII file containing \n"
  "the embedded tagged sourcefiles. The ASCII \n"
  "file must also contain an embedded compiler\n"
  "configuration file called CompileDB.txt \n"
  "See Thinking in C++, 2nd ed. for details\n";

// Tool to remove the white space from both ends:
string trim(const string& s) {
  if(s.length() == 0)
    return s;
  string::size_type b = s.find_first_not_of(" \t");
  string::size_type e = s.find_last_not_of(" \t");
  if(b == string::npos) // No non-spaces
    return "";
  return string(s, b, e - b + 1);
}

// Manage all the error messaging:
void error(string problem, string message) {
  static const string border(
  "-----------------------------------------\n");
  class ErrReport {
    int count;
    string fname;
  public:
    ofstream errs;
    ErrReport(const char* fn = "ExtractCodeErrors.txt")
    : count(0),fname(fn),errs(fname.c_str()) {}
    void operator++(int) { count++; }
    ~ErrReport() {
      errs.close();
      // Dump error messages to console
      ifstream in(fname.c_str());
      cerr << in.rdbuf() << endl;
      cerr << count << " Errors found" << endl;
      cerr << "Messages in " << fname << endl;
    }
  };
  // Created on first call to this function;
  // Destructor reports total errors:
  static ErrReport report;
  report++;
  report.errs << border << message << endl
    << "Problem spot: " << problem << endl;
}

///// OS-specific code, hidden inside a class:
#ifdef __GNUC__ // For gcc under Linux/Unix
#include <unistd.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <limits.h> // PATH_MAX
class OSDirControl {
public:
  static string getCurrentDir() {
    char path[PATH_MAX];
    getcwd(path, PATH_MAX);
    return string(path);
  }
  static void makeDir(string dir) {
    mkdir(dir.c_str(), 0777);
  }
  static void changeDir(string dir) {
    chdir(dir.c_str());
  }
};
#else // For Dos/Windows:
#include <direct.h>
class OSDirControl {
public:
  static string getCurrentDir() {
    char path[_MAX_PATH];
    getcwd(path, _MAX_PATH);
    return string(path);
  }
  static void makeDir(string dir) {
    mkdir(dir.c_str());
  }
  static void changeDir(string dir) {
    chdir(dir.c_str());
  }
};
#endif ///// End of OS-specific code

class PushDirectory {
  string oldpath;
public:
  PushDirectory(string path);
  ~PushDirectory() {
    OSDirControl::changeDir(oldpath);
  }
  void pushOneDir(string dir) {
    OSDirControl::makeDir(dir);
    OSDirControl::changeDir(dir);
  }
};

PushDirectory::PushDirectory(string path) {
  oldpath = OSDirControl::getCurrentDir();
  while(path.length() != 0) {
    string::size_type colon = path.find(':');
    if(colon != string::npos) {
      pushOneDir(path.substr(0, colon));
      path = path.substr(colon + 1);
    } else {
      pushOneDir(path);
      return;
    }
  }
}

//--------------- Manage code files -------------
// A CodeFile object knows everything about a
// particular code file, including contents, path
// information, how to compile, link, and test
// it, and which compilers it won't compile with.
enum TType {header, object, executable, none};

class CodeFile {
  TType _targetType;
  string _rawName, // Original name from input
    _path, // Where the source file lives
    _file, // Name of the source file
    _base, // Name without extension
    _tname, // Target name
    _testArgs; // Command-line arguments
  vector<string>
    lines, // Contains the file
    _compile, // Compile dependencies
    _link; // How to link the executable
  set<string>
    _noBuild; // Compilers it won't compile with
  bool writeTags; // Whether to write the markers
  // Initial makefile processing for the file:
  void target(const string& s);
  // For quoted #include headers:
  void headerLine(const string& s);
  // For special dependency tag marks:
  void dependLine(const string& s);
public:
  CodeFile(istream& in, string& s);
  const string& rawName() { return _rawName; }
  const string& path() { return _path; }
  const string& file() { return _file; }
  const string& base() { return _base; }
  const string& targetName() { return _tname; }
  TType targetType() { return _targetType; }
  const vector<string>& compile() {
    return _compile;
  }
  const vector<string>& link() {
    return _link;
  }
  const set<string>& noBuild() {
    return _noBuild;
  }
  const string& testArgs() { return _testArgs; }
  // Add a compiler it won't compile with:
  void addFailure(const string& failure) {
    _noBuild.insert(failure);
  }
  bool compilesOK(string compiler) {
    return _noBuild.count(compiler) == 0;
  }
  friend ostream&
  operator<<(ostream& os, const CodeFile& cf) {
    copy(cf.lines.begin(), cf.lines.end(),
      ostream_iterator<string>(os, ""));
    return os;
  }
  void write() {
    PushDirectory pd(_path);
    ofstream listing(_file.c_str());
    listing << *this; // Write the file
  }
  void dumpInfo(ostream& os);
};

void CodeFile::target(const string& s) {
  // Find the base name of the file (without
  // the extension):
  string::size_type lastDot = _file.find_last_of('.');
  if(lastDot == string::npos) {
    error(s, "Missing extension");
    exit(1);
  }
  _base = _file.substr(0, lastDot);
  // Determine the type of file and target:
  if(s.find(".h") != string::npos ||
     s.find(".H") != string::npos) {
    _targetType = header;
    _tname = _file;
    return;
  }
  if(s.find(".txt") != string::npos ||
     s.find(".TXT") != string::npos ||
     s.find(".dat") != string::npos ||
     s.find(".DAT") != string::npos) {
    // Text file, not involved in make
    _targetType = none;
    _tname = _file;
    return;
  }
  // C++ objs/exes depend on their own source:
  _compile.push_back(_file);
  if(s.find("{O}") != string::npos) {
    // Don't build an executable from this file
    _targetType = object;
    _tname = _base;
  } else {
    _targetType = executable;
    _tname = _base;
    // The exe depends on its own object file:
    _link.push_back(_base);
  }
}

void CodeFile::headerLine(const string& s) {
  string::size_type start = s.find('\"');
  string::size_type end = s.find('\"', start + 1);
  string::size_type len = end - start - 1;
  _compile.push_back(s.substr(start + 1, len));
}

void CodeFile::dependLine(const string& s) {
  const string linktag("//{L} ");
  string deps = trim(s.substr(linktag.length()));
  while(true) {
    string::size_type end = deps.find(' ');
    string dep = deps.substr(0, end);
    _link.push_back(dep);
    if(end == string::npos) // Last one
      break;
    else
      deps = trim(deps.substr(end));
  }
}

CodeFile::CodeFile(istream& in, string& s) {
  // If false, don't write begin & end tags:
  writeTags = (s[3] != '!');
  // Assume a space after the starting tag:
  _file = s.substr(s.find(' ') + 1);
  // There will always be at least one colon:
  string::size_type lastColon = _file.find_last_of(':');
  if(lastColon == string::npos) {
    error(s, "Missing path");
    lastColon = 0; // Recover from error
  }
  _rawName = trim(_file);
  _path = _file.substr(0, lastColon);
  _file = _file.substr(lastColon + 1);
  _file =_file.substr(0,_file.find_last_of(' '));
  cout << "path = [" << _path << "] "
    << "file = [" << _file << "]" << endl;
  target(s); // Determine target type
  if(writeTags){
    lines.push_back(s + '\n');
    lines.push_back(copyright);
  }
  string s2;
  while(getline(in, s2)) {
    // Look for specified link dependencies:
    if(s2.find("//{L}") == 0) // 0: Start of line
      dependLine(s2);
    // Look for command-line arguments for test:
    if(s2.find("//{T}") == 0) // 0: Start of line
      _testArgs = s2.substr(strlen("//{T}") + 1);
    // Look for quoted includes:
    if(s2.find("#include \"") != string::npos) {
      headerLine(s2); // Grab makefile info
    }
    // Look for end marker:
    if(s2.find("//" "/:~") != string::npos) {
      if(writeTags)
        lines.push_back(s2 + '\n');
      return; // Found the end
    }
    // Make sure you don't see another start:
    if(s2.find("//" ":") != string::npos ||
       s2.find("/*" ":") != string::npos) {
      error(s, "Error: new file started before"
        " previous file concluded");
      return;
    }
    // Write ordinary line:
    lines.push_back(s2 + '\n');
  }
}

void CodeFile::dumpInfo(ostream& os) {
  os << _path << ':' << _file << endl;
  os << "target: " << _tname << endl;
  os << "compile: " << endl;
  for(size_t i = 0; i < _compile.size(); i++)
    os << '\t' << _compile[i] << endl;
  os << "link: " << endl;
  for(size_t i = 0; i < _link.size(); i++)
    os << '\t' << _link[i] << endl;
  if(_noBuild.size() != 0) {
    os << "Won't build with: " << endl;
    copy(_noBuild.begin(), _noBuild.end(),
      ostream_iterator<string>(os, "\n"));
  }
}

//--------- Manage compiler information ---------
class CompilerData {
  // Information about each compiler:
  vector<string> rules; // Makefile rules
  set<string> fails; // Non-compiling files
  string objExtension; // File name extensions
  string exeExtension;
  // For OS-specific activities:
  bool _dos, _unix;
  // Store the information for all the compilers:
  static map<string, CompilerData> compilerInfo;
  static set<string> _compilerNames;
public:
  CompilerData() : _dos(false), _unix(false) {}
  // Read database of various compilers'
  // information and failure listings for
  // compiling the book files:
  static void readDB(istream& in);
  // For enumerating all the compiler names:
  static set<string>& compilerNames() {
    return _compilerNames;
  }
  // Tell this CodeFile which compilers
  // don't work with it:
  static void addFailures(CodeFile& cf);
  // Produce the proper object file name
  // extension for this compiler:
  static string obj(string compiler);
  // Produce the proper executable file name
  // extension for this compiler:
  static string exe(string compiler);
  // For inserting a particular compiler's
  // rules into a makefile:
  static void
  writeRules(string compiler, ostream& os);
  // Change forward slashes to backward
  // slashes if necessary:
  static string
  adjustPath(string compiler, string path);
  // So you can ask if it's a Unix compiler:
  static bool isUnix(string compiler) {
    return compilerInfo[compiler]._unix;
  }
  // So you can ask if it's a dos compiler:
  static bool isDos(string compiler) {
    return compilerInfo[compiler]._dos;
  }
  // Display information (for debugging):
  static void dump(ostream& os = cout);
};

// Static initialization:
map<string,CompilerData>
  CompilerData::compilerInfo;
set<string> CompilerData::_compilerNames;

void CompilerData::readDB(istream& in) {
  string compiler; // Name of current compiler
  string s;
  while(getline(in, s)) {
    if(s.find("#//" "/:~") == 0)
      return; // Found end tag
    s = trim(s);
    if(s.length() == 0) continue; // Blank line
    if(s[0] == '#') continue; // Comment
    if(s[0] == '{') { // Different compiler
      compiler = s.substr(0, s.find('}'));
      compiler = trim(compiler.substr(1));
      if(compiler.length() != 0)
        _compilerNames.insert(compiler);
      continue; // Changed compiler name
    }
    if(s[0] == '(') { // Object file extension
      string obj = s.substr(1);
      obj = trim(obj.substr(0, obj.find(')')));
      compilerInfo[compiler].objExtension =obj;
      continue;
    }
    if(s[0] == '[') { // Executable extension
      string exe = s.substr(1);
      exe = trim(exe.substr(0, exe.find(']')));
      compilerInfo[compiler].exeExtension =exe;
      continue;
    }
    if(s[0] == '&') { // Special directive
      if(s.find("dos") != string::npos)
        compilerInfo[compiler]._dos = true;
      else if(s.find("unix") != string::npos)
        compilerInfo[compiler]._unix = true;
      else
        error("Compiler Information Database",
          "unknown special directive: " + s);
      continue;
    }
    if(s[0] == '@') { // Makefile rule
      string rule(s.substr(1)); // Remove the @
      if(rule[0] == ' ') // Space means tab
        rule = '\t' + trim(rule);
      compilerInfo[compiler].rules
        .push_back(rule);
      continue;
    }
    // Otherwise, it's a failure line:
    compilerInfo[compiler].fails.insert(s);
  }
  error("CompileDB.txt","Missing end tag");
}

void CompilerData::addFailures(CodeFile& cf) {
  set<string>::iterator it =
    _compilerNames.begin();
  while(it != _compilerNames.end()) {
    if(compilerInfo[*it]
      .fails.count(cf.rawName()) != 0)
        cf.addFailure(*it);
    it++;
  }
}

string CompilerData::obj(string compiler) {
  if(compilerInfo.count(compiler) != 0) {
    string ext(
      compilerInfo[compiler].objExtension);
    if(ext.length() != 0)
      ext = '.' + ext; // Use '.' if it exists
    return ext;
  } else
    return "No such compiler information";
}

string CompilerData::exe(string compiler) {
  if(compilerInfo.count(compiler) != 0) {
    string ext(
      compilerInfo[compiler].exeExtension);
    if(ext.length() != 0)
      ext = '.' + ext; // Use '.' if it exists
    return ext;
  } else
    return "No such compiler information";
}

void CompilerData::writeRules(
  string compiler, ostream& os) {
  if(_compilerNames.count(compiler) == 0) {
    os << "No info on this compiler" << endl;
    return;
  }
  vector<string>& r =
    compilerInfo[compiler].rules;
  copy(r.begin(), r.end(),
    ostream_iterator<string>(os, "\n"));
}

string CompilerData::adjustPath(
  string compiler, string path) {
  // Use STL replace() algorithm:
  if(compilerInfo[compiler]._dos)
    replace(path.begin(), path.end(), '/', '\\');
  return path;
}

void CompilerData::dump(ostream& os) {
  ostream_iterator<string> out(os, "\n");
  *out++ = "Compiler Names:";
  copy(_compilerNames.begin(),
    _compilerNames.end(), out);
  map<string, CompilerData>::iterator compIt;
  for(compIt = compilerInfo.begin();
    compIt != compilerInfo.end(); compIt++) {
    os << "******************************\n";
    os << "Compiler: [" << (*compIt).first <<
      "]" << endl;
    CompilerData& cd = (*compIt).second;
    os << "objExtension: " << cd.objExtension
      << "\nexeExtension: " << cd.exeExtension
      << endl;
    *out++ = "Rules:";
    copy(cd.rules.begin(), cd.rules.end(), out);
    os << "Won't compile with: " << endl;
    copy(cd.fails.begin(), cd.fails.end(), out);
  }
}

// ---------- Manage makefile creation ----------
// Create the makefile for this directory, based
// on each of the CodeFile entries:
class Makefile {
  vector<CodeFile> codeFiles;
  // All the different paths
  // (for creating the Master makefile):
  static set<string> paths;
  void
  createMakefile(string compiler, string path);
public:
  Makefile() {}
  void addEntry(CodeFile& cf) {
    paths.insert(cf.path()); // Record all paths
    // Tell it what compilers don't work with it:
    CompilerData::addFailures(cf);
    codeFiles.push_back(cf);
  }
  // Write the makefile for each compiler:
  void writeMakefiles(string path);
  // Create the master makefile:
  static void writeMaster(string flag = "");
};

// Static initialization:
set<string> Makefile::paths;

void Makefile::writeMakefiles(string path) {
  if(trim(path).length() == 0)
    return; // No makefiles in root directory
  PushDirectory pd(path);
  set<string>& compilers =
    CompilerData::compilerNames();
  set<string>::iterator it = compilers.begin();
  while(it != compilers.end())
    createMakefile(*it++, path);
}

void Makefile::createMakefile(
  string compiler, string path) {
  string // File name extensions:
    exe(CompilerData::exe(compiler)),
    obj(CompilerData::obj(compiler));
  string filename(compiler + ".makefile");
  ofstream makefile(filename.c_str());
  makefile <<
    "# From Thinking in C++, 2nd Edition\n"
    "# At http://www.BruceEckel.com\n"
    "# (c) Bruce Eckel 1999\n"
    "# Copyright notice in Copyright.txt\n"
    "# Automatically-generated MAKEFILE \n"
    "# For examples in directory "+ path + "\n"
    "# using the " + compiler + " compiler\n"
    "# Note: does not make files that will \n"
    "# not compile with this compiler\n"
    "# Invoke with: make -f "
    + compiler + ".makefile\n"
    << endl;
  CompilerData::writeRules(compiler, makefile);
  vector<string> makeAll, makeTest,
    makeBugs, makeDeps, linkCmd;
  // Write the "all" dependencies:
  makeAll.push_back("all: ");
  makeTest.push_back("test: all ");
  makeBugs.push_back("bugs: ");
  string line;
  vector<CodeFile>::iterator it;
  for(it = codeFiles.begin();
    it != codeFiles.end(); it++) {
    CodeFile& cf = *it;
    if(cf.targetType() == executable) {
      line = "\\\n\t"+cf.targetName()+ exe + ' ';
      if(cf.compilesOK(compiler) == false) {
        makeBugs.push_back(
          CompilerData::adjustPath(
            compiler,line));
      } else {
        makeAll.push_back(
          CompilerData::adjustPath(
            compiler,line));
        line = "\\\n\t" + cf.targetName() + exe +
          ' ' + cf.testArgs() + ' ';
        makeTest.push_back(
          CompilerData::adjustPath(
            compiler,line));
      }
      // Create the link command:
      int linkdeps = cf.link().size();
      string linklist;
      for(int i = 0; i < linkdeps; i++)
        linklist += cf.link()[i] + obj + " ";
      line = cf.targetName() + exe + ": "
        + linklist + "\n\t$(CPP) $(OFLAG)"
        + cf.targetName() + exe
        + ' ' + linklist + "\n\n";
      linkCmd.push_back(
        CompilerData::adjustPath(compiler,line));
    }
    // Create dependencies
    if(cf.targetType() == executable ||
       cf.targetType() == object) {
      int compiledeps = cf.compile().size();
      string objlist(cf.base() + obj + ": ");
      for(int i = 0; i < compiledeps; i++)
        objlist += cf.compile()[i] + " ";
      makeDeps.push_back(
        CompilerData::adjustPath(
          compiler, objlist) +"\n");
    }
  }
  ostream_iterator<string> mkos(makefile, "");
  *mkos++ = "\n";
  // The "all" target:
  copy(makeAll.begin(), makeAll.end(), mkos);
  *mkos++ = "\n\n";
  // Remove continuation marks from makeTest:
  vector<string>::iterator si = makeTest.begin();
  string::size_type bsl;
  for(; si != makeTest.end(); si++)
    if((bsl= (*si).find("\\\n")) != string::npos)
      (*si).erase(bsl, strlen("\\"));
  // Now print the "test" target:
  copy(makeTest.begin(), makeTest.end(), mkos);
  *mkos++ = "\n\n";
  // The "bugs" target:
  copy(makeBugs.begin(), makeBugs.end(), mkos);
  if(makeBugs.size() == 1)
    *mkos++ = "\n\t@echo No compiler bugs in "
      "this directory!";
  *mkos++ = "\n\n";
  // Link commands:
  copy(linkCmd.begin(), linkCmd.end(), mkos);
  *mkos++ = "\n";
  // Dependencies:
  copy(makeDeps.begin(), makeDeps.end(), mkos);
  *mkos++ = "\n";
}

void Makefile::writeMaster(string flag) {
  string filename = "makefile";
  if(flag.length() != 0)
    filename += '.' + flag;
  ofstream makefile(filename.c_str());
  makefile << "# Master makefile for "
    "Thinking in C++, 2nd Ed. by Bruce Eckel\n"
    "# at http://www.BruceEckel.com\n"
    "# Compiles all the code in the book\n"
    "# Copyright notice in Copyright.txt\n\n"
    "help: \n"
    "\t@echo To compile all programs from \n"
    "\t@echo Thinking in C++, 2nd Ed., type\n"
    "\t@echo one of the following commands,\n"
    "\t@echo according to your compiler:\n";
  set<string>& n =
    CompilerData::compilerNames();
  set<string>::iterator nit;
  for(nit = n.begin(); nit != n.end(); nit++)
    makefile <<
      string("\t@echo make " + *nit + "\n");
  makefile << endl;
  // Make for each compiler:
  for(nit = n.begin(); nit != n.end(); nit++) {
    makefile << *nit << ":\n";
    for(set<string>::iterator it =
      paths.begin(); it != paths.end(); it++) {
      // Ignore the root directory:
      if((*it).length() == 0) continue;
      makefile << "\tcd " << *it;
      // Different commands for unix vs. dos:
      if(CompilerData::isUnix(*nit))
        makefile << "; ";
      else
        makefile << "\n\t";
      makefile << "make -f " << *nit
        << ".makefile";
      if(flag.length() != 0) {
        makefile << ' ';
        if(flag == "bugs")
          makefile << "-i ";
        makefile << flag;
      }
      makefile << "\n";
      if(CompilerData::isUnix(*nit) == false)
        makefile << "\tcd ..\n";
    }
    makefile << endl;
  }
}

int main(int argc, char* argv[]) {
  if(argc < 2) {
    error("Command line error", usage);
    exit(1);
  }
  // For development & testing, leave off notice:
  if(argc == 3)
    if(string(argv[2]) == "-nocopyright")
      copyright = "";
  // Open the input file to read the compiler
  // information database:
  ifstream in(argv[1]);
  if(!in) {
    error(string("can't open ") + argv[1],usage);
    exit(1);
  }
  string s;
  while(getline(in, s)) {
    // Break up the strings to prevent a match when
    // this code is seen by this program:
    if(s.find("#:" " :CompileDB.txt")
      != string::npos) {
      // Parse the compiler information database:
      CompilerData::readDB(in);
      break; // Out of while loop
    }
  }
  if(in.eof())
    error("CompileDB.txt", "Can't find data");
  in.clear(); // Reset eof/fail flags so seekg() works
  in.seekg(0, ios::beg); // Back to beginning
  map<string, Makefile> makeFiles;
  while(getline(in, s)) {
    // Look for tag at beginning of line:
    if(s.find("//" ":") == 0
      || s.find("/*" ":") == 0
      || s.find("#" ":") == 0) {
      CodeFile cf(in, s);
      cf.write(); // Tell it to write itself
      makeFiles[cf.path()].addEntry(cf);
    }
  }
  // Write all the makefiles, telling each
  // the path where it belongs:
  map<string, Makefile>::iterator mfi;
  for(mfi = makeFiles.begin();
    mfi != makeFiles.end(); mfi++)
    (*mfi).second.writeMakefiles((*mfi).first);
  // Create the master makefile:
  Makefile::writeMaster();
  // Write the makefile that tries the bug files:
  Makefile::writeMaster("bugs");
}
///:~
The first tool you see is trim( ), which was
lifted from the strings chapter earlier in the book. It removes the
whitespace from both ends of a string object. It is preceded by the
copyright string, which is placed at the top of each extracted file, and the
usage string, which is printed when the program is invoked without an
argument or cannot open its input file.
The error( ) function is global, and it uses a
trick with a static object defined inside a function. error( ) is designed so that
if it is never called, no error reporting occurs, but if it is called one or
more times then an error file is created and the total number of errors is
reported at the end of the program execution. This is accomplished by creating a
local class ErrReport and making a static ErrReport object
inside error( ). A static object inside a function is created only the first
time control passes through its definition, so an ErrReport object exists
only if error( ) is called at least once, and if
error( ) is never called no error reporting will occur. Its destructor runs
when the program terminates normally, including termination through exit( ).
ErrReport creates an ofstream to write the errors to, and the
ErrReport destructor closes the ofstream, then re-opens the file as an
ifstream and dumps its contents to cerr. This way, if the error report is too
long and scrolls off the screen, you can use an editor to look at it. The count
of the number of errors is held in ErrReport, and this is also reported upon
program termination.
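The idiom can be isolated in a few lines. This is a minimal sketch with hypothetical names (ErrorTally, reportError( )) rather than the program's own error( ); it only counts and prints, leaving out the error file.

```cpp
#include <iostream>
#include <string>

// Hypothetical names; a sketch of the same idiom error( ) relies on.
class ErrorTally {
  int count = 0;
public:
  void add() { ++count; }
  int total() const { return count; }
  // Runs at normal program termination, and only if the object
  // was ever constructed.
  ~ErrorTally() {
    std::cerr << count << " Errors found" << std::endl;
  }
};

// Records one more error and returns the running count.
int reportError(const std::string& message) {
  static ErrorTally tally; // Constructed on the first call only
  tally.add();
  std::cerr << "Problem: " << message << std::endl;
  return tally.total();
}
```

A program that never calls reportError( ) never constructs tally, so nothing is printed at exit.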
The job of a PushDirectory object is to capture the
current directory, then create and move down into each directory in the path (the
path can be arbitrarily long). Each subdirectory in the file’s path
description is separated by a ‘:’, and
mkdir( ) and chdir( ) (or the equivalent on your system)
are used to move into only one directory at a time, so the program never needs
to know which character your operating system uses to separate directories
in a path. The destructor changes back to the directory that was captured before
all the creating and moving took place.
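The path-splitting part of the constructor can be tried in isolation. The helper below is hypothetical (splitBookPath( ) does not exist in the program); it runs the same substr( ) loop but collects the directory names instead of creating and entering them.

```cpp
#include <string>
#include <vector>

// Hypothetical helper mirroring the loop in PushDirectory's
// constructor: split a book-style path such as "C11:sub" on ':'
// into the directories that would be created and entered in turn.
std::vector<std::string> splitBookPath(std::string path) {
  std::vector<std::string> dirs;
  while (!path.empty()) {
    std::string::size_type colon = path.find(':');
    if (colon == std::string::npos) {
      dirs.push_back(path); // Last component
      break;
    }
    dirs.push_back(path.substr(0, colon));
    path = path.substr(colon + 1);
  }
  return dirs;
}
```

For "C11:sub:deeper" this yields C11, sub and deeper, in the order PushDirectory would visit them.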
Unfortunately, Standard C and the 1998 C++ standard provide no functions
to control directory creation and movement, so this is captured in
the class OSDirControl. After reading the design patterns chapter, your
first impulse might be to use the full “Bridge” pattern. However,
there’s a lot more going on here. Bridge generally works with things that
are already classes, and here we are actually creating the class to
encapsulate operating system directory control. In addition, this requires
#ifdefs and #includes for each different operating system and
compiler. However, the basic idea is that of a Bridge, since the rest of the
code (PushDirectory is actually the only thing that uses this, and thus
it acts as the Bridge abstraction) treats OSDirControl as a
standard interface.
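Later revisions of the standard changed this: C++17 added the <filesystem> library, which covers all three operations. The class below is a hypothetical sketch (PortableDirControl is not part of the program) showing how OSDirControl could be written without any #ifdefs, assuming a C++17 compiler.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical C++17 alternative to OSDirControl. The error_code
// overloads are used so that failures do not throw: makeDir( )
// ignores them as OSDirControl does, and changeDir( ) reports
// whether the move succeeded.
class PortableDirControl {
public:
  static std::string getCurrentDir() {
    std::error_code ec;
    return std::filesystem::current_path(ec).string();
  }
  static void makeDir(const std::string& dir) {
    std::error_code ec;
    // Returns false if the directory already exists; ignored here.
    std::filesystem::create_directory(dir, ec);
  }
  static bool changeDir(const std::string& dir) {
    std::error_code ec;
    std::filesystem::current_path(dir, ec);
    return !ec;
  }
};
```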
All the information about a particular source code file is
encapsulated in a CodeFile object. This includes the type of target the
file should produce, variations on the name of the file including the name of
the target file it’s supposed to produce. The entire contents of the file
is contained in the vector<string> lines. In addition, the
file’s dependencies (the files which, if they change, should cause a
recompilation of the current file) and the files on the linker command line are
also vector<string> objects. The CodeFile object keeps all
the compilers it won’t work with in _noBuild, which is a
set<string> because it’s easier to look up an element in a
set. The writeTags flag indicates whether the beginning and ending
markers from the book listing should actually be output to the generated
file.
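The _noBuild bookkeeping reduces to a set<string> plus a count( ) query. This is a hypothetical reduction (BuildRecord is not a class in the program) that keeps only that part of CodeFile.

```cpp
#include <set>
#include <string>

// Hypothetical reduction of CodeFile's _noBuild bookkeeping:
// failures go into a set<string>, and count( ) answers the
// "does it build with this compiler?" question.
class BuildRecord {
  std::set<std::string> noBuild; // Compilers that fail
public:
  void addFailure(const std::string& compiler) {
    noBuild.insert(compiler); // A set ignores duplicates
  }
  bool compilesOK(const std::string& compiler) const {
    return noBuild.count(compiler) == 0;
  }
};
```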
The three private helper functions target( ),
headerLine( ) and dependLine( ) are used by the
CodeFile constructor while it is parsing the input stream. In fact, the
CodeFile constructor does much of the work and most of the rest of the
member functions simply return values that are stored in the CodeFile
object. The exceptions are addFailure( ), which records a compiler
that won’t work, and compilesOK( ), which, when given a
compiler, tells whether this file will compile successfully with that compiler.
The ostream operator<< uses the STL copy( ) algorithm
and write( ) uses operator<< to write the file into a
particular directory and file name.
Looking at the implementation, you’ll see that the
helper functions target( ), headerLine( ) and
dependLine( ) are just using string functions in order to
search and manipulate the lines. The constructor is what initiates everything.
The idea is that the main program opens the file and reads it until it sees the
starting marker for a code file. At that point it makes a CodeFile object
and hands the constructor the istream (so the constructor can read the
rest of the code file) and the first line that was already read, since it
contains valuable information. This first line is dissected for the file name
information and the target type. The starting marker and copyright notice are
stored first (unless a ‘!’ suppresses the markers), and the rest of the file is
read until the ending tag. As each line is read, it is checked for link
dependencies and command-line arguments for testing (these normally appear near
the top of the file), and for files that are #included using
quotes rather than angle brackets. Quotes indicate they are from local
directories and should be added to the makefile dependency list.
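The string handling in headerLine( ) and dependLine( ) can be tried on single lines. Both helpers below are hypothetical sketches modeled on those functions; unlike the originals, quotedInclude( ) returns an empty string when there are no quotes, and linkDeps( ) splits on any whitespace with an istringstream.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Modeled on CodeFile::headerLine( ): the name between the first
// pair of double quotes, or "" if there is none.
std::string quotedInclude(const std::string& line) {
  std::string::size_type start = line.find('"');
  if (start == std::string::npos) return "";
  std::string::size_type end = line.find('"', start + 1);
  if (end == std::string::npos) return "";
  return line.substr(start + 1, end - start - 1);
}

// Modeled on CodeFile::dependLine( ): the whitespace-separated
// names that follow a link tag at the start of the line.
std::vector<std::string> linkDeps(const std::string& line) {
  const std::string tag = "//{L}";
  std::vector<std::string> deps;
  if (line.compare(0, tag.size(), tag) != 0) return deps;
  std::istringstream in(line.substr(tag.size()));
  std::string name;
  while (in >> name) deps.push_back(name);
  return deps;
}
```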
You’ll notice that a number of the marker strings in
this program are broken up into two adjacent string literals, relying on the
compiler to concatenate them. This prevents ExtractCode from mistaking
the strings embedded in the program for real markers when it is
extracting its own source code.
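A two-line demonstration of the trick, with hypothetical names: the value of endMarker is the complete end marker, but the source line never contains those five characters in a row.

```cpp
#include <string>

// The two adjacent literals are joined during compilation, so the
// value is the five-character end marker even though this source
// line never contains those five characters in a row.
const std::string endMarker = "//" "/:~";

bool isEndMarker(const std::string& line) {
  return line.find(endMarker) != std::string::npos;
}
```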
The goal of CompilerData is to capture and make
available all the information about particular compiler idiosyncrasies. At first
glance, the CompilerData class appears to be a container of static
member functions, a library of functions wrapped in a class. Actually, the class
also contains two static data members; the simpler one is a
set<string> that holds all the compiler names, but
compilerInfo is a map that maps string objects (the
compiler names) to CompilerData objects. Each individual
CompilerData object in compilerInfo contains a
vector<string> holding the “rules” that are placed in
the makefile (these rules are different for different compilers) and a
set<string> which indicates the files that won’t compile with
this particular compiler. Also, each compiler creates different extensions for
object files and executable files, and these are also stored. Two
flags indicate whether this is a “dos” or “Unix” style
environment (this causes differences in path information and command styles for
the resulting makefiles).
The member function readDB( ) is responsible for
taking an istream and parsing it into a series of CompilerData
objects which are stored in compilerInfo. Because the format is relatively
simple (you can see it in Appendix D), parsing this configuration file
is easy: the first character on a line determines what information the
line contains. A ‘#’ sign is a comment, a
‘{’ indicates that the next compiler configuration is
beginning and gives the new compiler name, a ‘(’ establishes
the object file extension, a ‘[’ establishes the executable file extension,
a ‘&’ indicates the “dos” or “Unix” directive, and ‘@’ introduces a
makefile rule which is placed verbatim at the beginning of the makefile (a space
right after the ‘@’ is turned into the tab that make requires). If
there is no special character at the beginning of the line, then it must name a
file that fails to compile.
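The dispatch on the first character can be sketched as a switch. This is a hypothetical, cut-down model of readDB( ) (CompilerInfo, trimmed( ), between( ) and parseDB( ) are invented names); it returns a plain map instead of filling static members, and it leaves out the end-tag check and the error report for unknown ‘&’ directives.

```cpp
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of one compiler's entry; the field names are
// illustrative, not the book's.
struct CompilerInfo {
  std::string objExt, exeExt;
  bool isDos = false, isUnix = false;
  std::vector<std::string> rules;
  std::set<std::string> fails;
};

std::string trimmed(const std::string& s) {
  std::string::size_type b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return "";
  std::string::size_type e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Text after the opening character, up to the closing one.
std::string between(const std::string& s, char close) {
  std::string inner = s.substr(1);
  return trimmed(inner.substr(0, inner.find(close)));
}

std::map<std::string, CompilerInfo> parseDB(std::istream& in) {
  std::map<std::string, CompilerInfo> db;
  std::string compiler, line;
  while (std::getline(in, line)) {
    std::string s = trimmed(line);
    if (s.empty() || s[0] == '#') continue; // Blank or comment
    switch (s[0]) {
    case '{': // Start of a new compiler's section
      compiler = between(s, '}');
      db[compiler]; // Make sure the entry exists
      break;
    case '(': db[compiler].objExt = between(s, ')'); break;
    case '[': db[compiler].exeExt = between(s, ']'); break;
    case '&': // "dos" or "unix" directive
      if (s.find("dos") != std::string::npos)
        db[compiler].isDos = true;
      else if (s.find("unix") != std::string::npos)
        db[compiler].isUnix = true;
      break;
    case '@': { // Makefile rule; a leading space becomes a tab
      std::string rule = s.substr(1);
      if (!rule.empty() && rule[0] == ' ')
        rule = '\t' + trimmed(rule);
      db[compiler].rules.push_back(rule);
      break;
    }
    default: // Anything else names a file that fails to compile
      db[compiler].fails.insert(s);
    }
  }
  return db;
}
```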
The addFailures( ) member function takes
its CodeFile argument (by reference, so it can modify the outside
object) and checks each compiler to see if it works with that particular code
file; if not, it adds that compiler to the CodeFile object’s
failure list.
Both obj( ) and exe( ) return the
appropriate file extension for a particular compiler. Note that some environments
don’t use an extension (Unix executables, for example), so the ‘.’ is added only
if there is an extension.
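The rule shared by obj( ) and exe( ) fits in one expression. withDot( ) is a hypothetical stand-alone helper, not a function in the program.

```cpp
#include <string>

// Sketch of the rule used by obj( ) and exe( ): prepend '.' only
// when the configured extension is non-empty.
std::string withDot(const std::string& ext) {
  return ext.empty() ? ext : "." + ext;
}
```

With an empty executable extension, "Hello" + withDot("") stays "Hello", while "Hello" + withDot("exe") becomes "Hello.exe".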
When the makefile is being created, one of the first things to
do is add the various make rules, such as the prefixes and target rules
(see Appendix D for examples). This is accomplished with
writeRules( ). Note the use of the STL copy( )
algorithm.
Although dos compilers have no trouble with forward slashes as
part of the paths of #include files, most dos make programs expect
backslashes as part of paths in dependency lists. To adjust for this, the
adjustPath( ) function checks to see if this is a dos compiler, and
if so it uses the STL replace( ) algorithm, treating the path
string object as a container, to replace forward-slash characters with backward
slashes.
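The replace( ) call can be tried on its own. toDosPath( ) is a hypothetical stand-alone version of adjustPath( ) that takes a flag instead of looking up the compiler.

```cpp
#include <algorithm>
#include <string>

// Stand-alone sketch of adjustPath( ): the string is treated as a
// container of char, so std::replace can swap every '/' for '\'.
std::string toDosPath(std::string path, bool dos) {
  if (dos)
    std::replace(path.begin(), path.end(), '/', '\\');
  return path; // Returned by value; the caller's string is untouched
}
```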
The last class, Makefile, is used to create all the
makefiles, including the master makefile that moves into each subdirectory and
calls the other makefiles. Each Makefile contains a group of
CodeFile objects, stored in a vector. You call
addEntry( ) to put a new CodeFile into the Makefile;
this also adds the failure list to the CodeFile. In addition, there is a
static set<string> which contains all the different paths where all
the different makefiles will be written; this is used to build the master
makefile so it can call all the makefiles in all the subdirectories. The
addEntry( ) function also updates this set of paths.
To write the makefile for a particular path (once the entire
book file has been read), you call writeMakefiles( ) and hand it the
path you want it to write the makefile for. This function simply iterates
through all the compilers in compilers and calls
createMakefile( ) for each one, passing it the compiler name and the
path. The latter function is where the real work gets done. First the file name
extensions are captured into local string objects, then the file name is
created from the name of the compiler with “.makefile” concatenated
(you can use a file with a name other than “makefile” by using the
make -f flag). After writing the header comments and the rules for that
particular compiler/operating-system combination (remember, these rules come
from the compiler configuration file), a vector<string> is created
to hold all the different regions of the makefile: the master target list
makeAll, the testing commands makeTest, the dependencies
makeDeps, and the commands for linking into executables
linkCmd. The reason it’s necessary to have lists for these four
regions is that each CodeFile object adds entries to every region, so
the regions are built as the list of CodeFiles is traversed, and then
finally each region is written in its proper order. This is the function which
decides whether a file is going to be included, and also calls
adjustPath( ) to conditionally change forward slashes to backward
slashes.
To write the master makefile in writeMaster( ),
the initial comments are written. The default target is called
“help,” and it is used if you simply type make. This provides
very simple help to the first-time user, including the options for make
that this makefile supports (that is, all the different compilers the makefile
is set up for). Then it creates the list of commands for each compiler, which
basically consists of descending into a subdirectory, calling make
(recursively) on the appropriate makefile in that subdirectory, and then rising
back up to the book’s root directory. Makefiles in Unix and dos work
very differently from each other in this situation. In Unix, you cd to
the directory, followed by a semicolon and then the command you want to execute;
returning to the root directory happens automatically, because make runs each
command line in its own shell. In dos, you
must cd both down and then back up again, all on separate lines. So the
writeMaster( ) function must check whether a compiler is
running under Unix and write different commands accordingly.
Because of the work done in designing the classes (and this
was an iterative process; it didn’t just pop out this way),
main( ) is quite straightforward to read. After opening the input
file, the getline( ) function is used to read each input line until
the line containing CompileDB.txt is found; this indicates the beginning
of the compiler database listing. Once that has been parsed,
seekg( ) is used to move the file pointer back to the beginning so
all the code files can be extracted.
Each line is read and if one of the start markers is found in
the line, a CodeFile object is created using that line (which has
essential information) and the input stream. The constructor returns when it
finishes reading its file, and at that point you can turn around and call
write( ) for the code file, and it is automatically written to the
correct spot. (An earlier version of this program collected all the
CodeFile objects first and put them in a container, then wrote one
directory at a time, but the approach shown above has code that’s easier
to understand, and the performance impact is not really significant for a tool
like this.)
For makefile management, a map<string, Makefile>
is created, where the string is the path where the makefile exists. The
nice thing about this approach is that the Makefile objects will be
automatically created whenever you access a new path, as you can see in the
line
makeFiles[cf.path()].addEntry(cf);
The Standard C library assert( ) macro is brief,
to the point and portable. In addition, when you’re finished debugging you
can remove all the assertion code by defining NDEBUG, either on the command line or
in the code before <cassert> (or <assert.h>) is included.
Also, assert( ) can be used while roughing out the
code. Later, the calls to assert( ) that are actually providing
information to the end user can be replaced with more civilized
messages.
Sometimes it’s very helpful to print the code of each
statement before it is executed, either to cout or to a trace file.
Here’s a preprocessor macro to accomplish this:
#define TRACE(ARG) cout << #ARG << endl; ARG
Now you can go through and surround the statements you trace
with this macro. Of course, it can introduce problems. For example, if you take
the statement:
for(int i = 0; i < 100; i++)
cout << i << endl;
and put both lines inside TRACE( ) macros, you get
this:
TRACE(for(int i = 0; i < 100; i++))
TRACE(cout << i << endl;)
Which expands to this:
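cout << "for(int i = 0; i < 100; i++)" << endl;
for(int i = 0; i < 100; i++)
  cout << "cout << i << endl;" << endl;
cout << i << endl;
The for loop now controls only the statement that prints the text of the
second line, and the original output statement ends up outside the loop, where
i is no longer in scope. That isn’t what you want, so this technique must be
used carefully.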
A variation on the TRACE( ) macro is
this:
#define D(a) cout << #a "=[" << (a) << "]" << endl;
If there’s an expression you want to display, you simply
put it inside a call to D( ) and the expression will be printed,
followed by its value (assuming there’s an overloaded operator
<< for the result type). For example, you can say D(a + b).
Thus you can use it anytime you want to test an intermediate value to make sure
things are OK.
Of course, the above two macros are actually just the two most
fundamental things you do with a debugger: trace through the code execution and
print values. A good debugger is an excellent productivity tool, but sometimes
debuggers are not available, or it’s not convenient to use them. The above
techniques always work, regardless of the
situation.
This code allows you to easily create a trace file and send
all the output that would normally go to cout into the file. All you have
to do is #define TRACEON and include the header file (of course,
it’s fairly easy just to write the two key lines right into your
file):
//: C11:Trace.h
// Creating a trace file
#ifndef TRACE_H
#define TRACE_H
#include <fstream>
#ifdef TRACEON
std::ofstream TRACEFILE__("TRACE.OUT");
#define cout TRACEFILE__
#endif
#endif // TRACE_H
///:~
Here’s a simple test of the above file:
//: C11:Tracetst.cpp
// Test of Trace.h
#include "../require.h"
#include <iostream>
#include <fstream>
using namespace std;
#define TRACEON
#include "Trace.h"
int main() {
  ifstream f("Tracetst.cpp");
  assure(f, "Tracetst.cpp");
  cout << f.rdbuf();
}
///:~
In the Smalltalk tradition, you can create your own
object-based hierarchy, and install pure virtual functions to perform debugging.
Then everyone on the team must inherit from this class and redefine the
debugging functions. All objects in the system will then have debugging
functions available.
Common problems with memory allocation include calling
delete for things you have malloced, calling free for
things you allocated with new, forgetting to release objects from the
free store, and releasing them more than once. This section provides a system to
help you track these kinds of problems down.
To use the memory checking system, you simply link the
obj file in and all the calls to malloc( ),
realloc( ), calloc( ), free( ), new
and delete are intercepted. However, if you also include the following
file (which is optional), all the calls to new will store information
about the file and line where they were called. This is accomplished with a use
of the placement syntax for operator new (this trick was suggested
by Reg Charney of the C++ Standards Committee). The placement syntax is intended
for situations where you need to place objects at a specific point in memory.
However, it allows you to create an operator new with any number of
arguments. This is used to advantage here to store the results of the
__FILE__ and __LINE__ macros whenever new is
called:
//: C11:MemCheck.h
// Memory testing system
// This file is only included if you want to
// use the special placement syntax to find
// out the line number where "new" was called.
#ifndef MEMCHECK_H
#define MEMCHECK_H
#include <cstdlib> // size_t
// Use placement syntax to pass extra arguments.
// From an idea by Reg Charney:
void* operator new(
  std::size_t sz, const char* file, int line);
#define new new(__FILE__, __LINE__)
#endif // MEMCHECK_H
///:~
In the following file containing the function definitions, you
will note that everything is done with Standard C I/O (<cstdio>) rather than
iostreams. This is because, for example, the cout constructor may allocate
memory, which would call back into the very functions being tested; Standard C
I/O avoids cyclical conditions that can lock up the system.
//: C11:MemCheck.cpp {O}
// Memory allocation tester
#include <cstdlib>
#include <cstring>
#include <cstdio>
using namespace std;
// MemCheck.h must not be included here
// Output file object using cstdio
// (cout constructor calls malloc())
class OFile {
  FILE* f;
public:
  OFile(const char* name) : f(fopen(name, "w")) {}
  ~OFile() { if(f) fclose(f); }
  operator FILE*() { return f; }
};
extern OFile memtrace;
// Comment out the following to send all the
// information to the trace file:
#define memtrace stdout
const unsigned long _pool_sz = 50000L;
static unsigned char _memory_pool[_pool_sz];
static unsigned char* _pool_ptr = _memory_pool;
void* getmem(size_t sz) {
  if(size_t(_memory_pool + _pool_sz - _pool_ptr) < sz) {
    fprintf(stderr,
      "Out of memory. Use bigger model\n");
    exit(1);
  }
  void* p = _pool_ptr;
  _pool_ptr += sz;
  return p;
}
// Holds information about allocated pointers:
class MemBag {
public:
  enum type { Malloc, New };
private:
  const char* typestr(type t) {
    switch(t) {
      case Malloc: return "malloc";
      case New: return "new";
      default: return "?unknown?";
    }
  }
  struct M {
    void* mp;         // Memory pointer
    type t;           // Allocation type
    const char* file; // File name where allocated
    int line;         // Line number where allocated
    M(void* v, type tt, const char* f, int l)
      : mp(v), t(tt), file(f), line(l) {}
  }* v;
  int sz, next;
  static const int increment = 50;
public:
  MemBag() : v(0), sz(0), next(0) {}
  void* add(void* p, type tt = Malloc,
    const char* s = "library", int l = 0) {
    if(next >= sz) {
      sz += increment;
      // This memory is never freed, so it
      // doesn't "get involved" in the test:
      const int memsize = sz * sizeof(M);
      // Equivalent of realloc, no registration:
      void* mem = getmem(memsize);
      // Copy only the old contents:
      if(v) memmove(mem, v, (sz - increment) * sizeof(M));
      v = (M*)mem;
      memset(&v[next], 0, increment * sizeof(M));
    }
    v[next++] = M(p, tt, s, l);
    return p;
  }
  // Print information about allocation:
  void allocation(int i) {
    fprintf(memtrace, "pointer %p"
      " allocated with %s",
      v[i].mp, typestr(v[i].t));
    if(v[i].t == New)
      fprintf(memtrace, " at %s: %d",
        v[i].file, v[i].line);
    fprintf(memtrace, "\n");
  }
  void validate(void* p, type T = Malloc) {
    if(p == 0) return; // free(0) and delete 0 do nothing
    for(int i = 0; i < next; i++)
      if(v[i].mp == p) {
        if(v[i].t != T) {
          allocation(i);
          fprintf(memtrace,
            "\t was released as if it were "
            "allocated with %s \n", typestr(T));
        }
        v[i].mp = 0; // Erase it
        return;
      }
    fprintf(memtrace,
      "pointer not in memory list: %p\n", p);
  }
  ~MemBag() {
    for(int i = 0; i < next; i++)
      if(v[i].mp != 0) {
        fprintf(memtrace,
          "pointer not released: ");
        allocation(i);
      }
  }
};
extern MemBag MEMBAG_;
void* malloc(size_t sz) {
  void* p = getmem(sz);
  return MEMBAG_.add(p, MemBag::Malloc);
}
void* calloc(size_t num_elems, size_t elem_sz) {
  void* p = getmem(num_elems * elem_sz);
  memset(p, 0, num_elems * elem_sz);
  return MEMBAG_.add(p, MemBag::Malloc);
}
void* realloc(void* block, size_t sz) {
  void* p = getmem(sz);
  if(block) {
    // May copy more bytes than the old block held; both
    // blocks lie inside _memory_pool, so this stays in bounds:
    memmove(p, block, sz);
    MEMBAG_.validate(block, MemBag::Malloc); // Old block released
  }
  return MEMBAG_.add(p, MemBag::Malloc);
}
void free(void* v) {
  MEMBAG_.validate(v, MemBag::Malloc);
}
void* operator new(size_t sz) {
  void* p = getmem(sz);
  return MEMBAG_.add(p, MemBag::New);
}
void*
operator new(size_t sz, const char* file, int line) {
  void* p = getmem(sz);
  return MEMBAG_.add(p, MemBag::New, file, line);
}
void operator delete(void* v) {
  MEMBAG_.validate(v, MemBag::New);
}
MemBag MEMBAG_;
// Placed here so the constructor is called
// AFTER that of MEMBAG_ :
#ifdef memtrace
#undef memtrace
#endif
OFile memtrace("memtrace.out");
// Causes 1 "pointer not in memory list" message
///:~
OFile is a simple wrapper around a FILE*; the
constructor opens the file and the destructor closes it. The operator
FILE*( ) allows you to simply use the OFile object anyplace you
would ordinarily use a FILE* (in the fprintf( ) statements in
this example). The #define that follows sends everything to
standard output; if you want the information to go to a trace file instead,
comment out that line.
Memory is allocated from an array called _memory_pool.
The _pool_ptr is moved forward every time storage is allocated. For
simplicity, the storage is never reclaimed, and realloc( )
doesn’t try to resize the storage in the same place.
All the storage allocation functions call
getmem( ) which ensures there is enough space left and moves the
_pool_ptr to allocate your storage. Then they store the pointer in a
special container of class MemBag called MEMBAG_, along with
pertinent information (notice the two versions of operator new: one that
stores only the pointer and another that also stores the file and line number).
The MemBag class is the heart of the system.
You will see many similarities to xbag in
MemBag. One distinct difference is that realloc( ) is replaced by a
call to getmem( ) and memmove( ), so that storage
allocated for the MemBag is not registered. In addition, the type
enum allows you to store the way the memory was allocated; the
typestr( ) function takes a type and produces a string for use with
printing.
The nested struct M holds the pointer, the type, a
pointer to the file name (which is assumed to be statically allocated) and the
line where the allocation occurred. v is a pointer to an array of
M objects – this is the array which is dynamically sized.
The allocation( ) function prints out a different
message depending on whether the storage was allocated with new (where it
has line and file information) or malloc( ) (where it
doesn’t). This function is used inside validate( ), which is
called by free( ) and operator delete( ) to ensure everything is
OK, and in the destructor, to report any pointer that was never released (note
that in validate( ) the pointer value v[i].mp is set to zero, to
indicate it has been cleaned up).
The following is a simple test using the memcheck facility.
The MemCheck.obj file must be linked in for it to work:
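//: C11:MemTest.cpp
//{L} MemCheck
// Test of MemCheck system
#include "MemCheck.h"
int main() {
  void* v = std::malloc(100);
  delete v;       // Error: malloc'ed memory released with delete
  int* x = new int;
  std::free(x);   // Error: new'ed memory released with free
  new double;     // Error: never released
}
///:~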
The trace file created in MemCheck.cpp causes the
generation of one "pointer not in memory list" message, apparently from the
creation of the file pointer on the heap.
The World-Wide Web has become the common tongue of
connectivity on planet earth. It began as simply a way to publish
primitively-formatted documents in a way that everyone could read them
regardless of the machine they were using. The documents are created in
hypertext markup language (HTML) and placed on a central server machine
where they are handed to anyone who asks. The documents are requested and read
using a web browser that has been written or ported to each particular
platform.
Very quickly, just reading a document was not enough and
people wanted to be able to collect information from the clients, for example to
take orders or allow database lookups from the server. Many different approaches
to client-side programming have been tried such as Java applets,
JavaScript, and other scripting or programming languages. Unfortunately,
whenever you publish something on the Internet you face the problem of a whole
history of browsers, some of which may support the particular flavor of your
client-side programming tool and some of which won’t. The only reliable and
well-established solution[27] to this problem
is to use straight HTML (which has a very limited way to collect and submit
information from the client) and common gateway interface (CGI) programs
that are run on the server. The Web server takes an encoded request submitted
via an HTML page and responds by invoking a CGI program and handing it the
encoded data from the request.
This request is classified as
either a “GET” or a
“POST” (the meaning of
which will be explained later) and if you look at the URL window in your Web
browser when you push a “submit” button on a page you’ll often
be able to see the encoded request and information.
CGI can seem a bit intimidating at first, but it turns out
that it’s just messy, and not all that difficult to write. (An innocent
statement that’s true of many things – after you understand
them.) A CGI program is quite straightforward since it takes its input from
environment variables and standard input, and sends its output to standard
output. However, there is some decoding that must be done in order to extract
the data that’s been sent to you from the client’s web page. In this
section you’ll get a crash
course in CGI programming, and we’ll develop tools that will perform the
decoding for the two different types of CGI submissions (GET and POST). These
tools will allow you to easily write a CGI program to solve any problem. Since
C++ exists on virtually all machines that have Web servers (and you can get GNU
C++ free for virtually any platform), the solution presented here is quite
portable.
To submit data to a CGI program, the HTML “form”
tag is used. The following very simple HTML page contains a form that has one
user-input field along with a “submit” button:
//:! C11:SimpleForm.html
<HTML><HEAD>
<TITLE>A simple HTML form</TITLE></HEAD>
Test, uses standard html GET
<Form method="GET" ACTION="/cgi-bin/CGI_GET.exe">
<P>Field1: <INPUT TYPE = "text" NAME = "Field1"
VALUE = "This is a test" size = "40"></p>
<p><input type = "submit" name = "submit" > </p>
</Form></HTML>
///:~
Everything between the <Form and the
</Form> is part of this form. (You can have multiple forms on a
single page, but each one is controlled by its own method and submit button.)
The “method” can be either “get” or “post,”
and the “action” is what the server does when it receives the form
data: it calls a program. Each form has a method, an action, and a submit
button, and the rest of the form consists of input fields. The most
commonly-used input field is shown here: a text field. However, you can also
have things like check boxes, drop-down selection lists and radio
buttons.
CGI_GET.exe is the name of the executable program that
resides in the directory that’s typically called “cgi-bin” on
your Web server.[28] (If the named program is
not in the cgi-bin directory, you won’t see any results.) Many Web servers
are Unix machines (mine runs Linux) that don’t traditionally use the
.exe extension for their executable programs, but you can call the
program anything you want under Unix. By using the .exe extension the
program can be tested without change under most operating systems.
If you fill out this form and press the “submit”
button, in the URL address window of your browser you will see something
like:
http://www.pooh.com/cgi-bin/CGI_GET.exe?Field1=
This+is+a+test&submit=Submit+Query
(Without the line break, of course.) Here you see a little bit
of the way that data is encoded to send to CGI. For one thing, spaces are not
allowed (since spaces typically separate command-line arguments). Spaces are
replaced by ‘+’ signs. In addition, each field contains the
field name (which is determined by the form on the HTML page) followed by an
‘=’ and the field data, and consecutive fields are separated by an
‘&’.
At this point, you might wonder about the
‘+’, ‘=’, and ‘&’.
What if those are used in the field, as in “John & Marsha
Smith”? This is encoded to:
John+%26+Marsha+Smith
That is, the special character is turned into a
‘%’ followed by its ASCII value as two hexadecimal digits (‘&’ is 26 hex). Fortunately, the web
browser automatically performs all encoding for
you.
There are many examples of CGI programs written using Standard
C. One argument for doing this is that Standard C can be found virtually
everywhere. However, C++ has become quite ubiquitous, especially in the form of
the GNU C++
Compiler[29] (g++) that can be
downloaded free from the Internet for virtually any platform (and often comes
pre-installed with operating systems such as Linux). As you will see, this means
that you can get the benefit of object-oriented programming in a CGI
program.
Since what we’re concerned with when parsing the CGI
information is the field name-value pairs, one class (CGIpair)
will be used to represent a single name-value pair and a second class
(CGImap) will use CGIpair to parse each name-value pair that is
submitted from the HTML form into keys and values that it will hold in a
vector of CGIpair objects, with a map-style lookup so you can easily fetch the
value for each field at your leisure.
One of the reasons for using C++ here is the convenience of
the STL. A map offers operator[ ], which gives you a nice syntax for
extracting the data for each field; CGImap borrows that syntax while being
built on the vector template, and you’ll see that it is a fairly short
definition considering how powerful it is.
The project will start with a reusable portion, which consists
of CGIpair and CGImap in a header file. Normally you should avoid
cramming this much code into a header file, but for these examples it’s
convenient and it doesn’t hurt anything:
//: C11:CGImap.h
// Tools for extracting and decoding data
// from CGI GETs and POSTs.
#include <string>
#include <vector>
#include <utility>
#include <iostream>
#include <cstdlib>
using namespace std;
class CGIpair : public pair<string, string> {
public:
  CGIpair() {}
  CGIpair(string name, string value) {
    first = decodeURLString(name);
    second = decodeURLString(value);
  }
  // Automatic type conversion for boolean test:
  operator bool() const {
    return (first.length() != 0);
  }
private:
  static string decodeURLString(string URLstr) {
    const int len = URLstr.length();
    string result;
    for(int i = 0; i < len; i++) {
      if(URLstr[i] == '+')
        result += ' ';
      else if(URLstr[i] == '%' && i + 2 < len) {
        result += char(translateHex(URLstr[i + 1]) * 16
          + translateHex(URLstr[i + 2]));
        i += 2; // Move past hex code
      } else // An ordinary character
        result += URLstr[i];
    }
    return result;
  }
  // Translate a single hex character; used by
  // decodeURLString():
  static char translateHex(char hex) {
    if(hex >= 'A')
      return (hex & 0xdf) - 'A' + 10;
    else
      return hex - '0';
  }
};
// Parses any CGI query and turns it into an
// STL vector of CGIpair which has an associative
// lookup operator[] like a map. A vector is used
// instead of a map because it keeps the original
// ordering of the fields in the Web page form.
class CGImap : public vector<CGIpair> {
  string gq;
  int index;
  // Prevent assignment and copy-construction:
  void operator=(CGImap&);
  CGImap(CGImap&);
public:
  CGImap(string query): gq(query), index(0) {
    CGIpair p;
    while((p = nextPair())) // Stops at an empty pair
      push_back(p);
  }
  // Look something up, as if it were a map:
  string operator[](const string& key) {
    iterator i = begin();
    while(i != end()) {
      if((*i).first == key)
        return (*i).second;
      i++;
    }
    return string(); // Empty string == not found
  }
  void dump(ostream& o, string nl = "<br>") {
    for(iterator i = begin(); i != end(); i++) {
      o << (*i).first << " = "
        << (*i).second << nl;
    }
  }
private:
  // Produces name-value pairs from the query
  // string. Returns an empty Pair when there's
  // no more query string left:
  CGIpair nextPair() {
    if(gq.length() == 0)
      return CGIpair(); // Error, return empty
    string::size_type eq = gq.find('=');
    if(eq == string::npos)
      return CGIpair(); // Error, return empty
    string name = gq.substr(0, eq);
    gq = gq.substr(eq + 1);
    string::size_type amp = gq.find('&');
    string value = gq.substr(0, amp);
    // The last field has no trailing '&':
    gq = (amp == string::npos) ? string() : gq.substr(amp + 1);
    return CGIpair(name, value);
  }
};
// Helper class for getting POST data:
class Post : public string {
public:
  Post() {
    // For a CGI "POST," the server puts the
    // length of the content string in the
    // environment variable CONTENT_LENGTH:
    char* clen = getenv("CONTENT_LENGTH");
    if(clen == 0) {
      cout << "Zero CONTENT_LENGTH, Make sure "
        "this is a POST and not a GET" << endl;
      return;
    }
    int len = atoi(clen);
    if(len <= 0) return; // Nothing to read
    char* s = new char[len];
    cin.read(s, len); // Get the data
    append(s, cin.gcount()); // Add what was actually read
    delete []s;
  }
};
///:~
The CGIpair class starts out quite simply: it inherits
from the standard library pair template to create a pair of
strings, one for the name and one for the value. The second constructor
calls the member function decodeURLString( ) which produces a
string after stripping away all the extra characters added by the browser
as it submitted the CGI request. There is no need to provide functions to select
each individual element – because pair is inherited publicly, you
can just select the first and second elements of the
CGIpair.
The operator bool provides automatic type conversion to
bool. If you have a CGIpair object called p and you use it
in an expression where a Boolean result is expected, such as
if(p) {
//...
then the compiler will recognize that it has a CGIpair
and it needs a Boolean, so it will automatically call operator bool to
perform the necessary conversion.
Because the string objects take care of themselves, you
don’t need to explicitly define the copy-constructor, operator= or
destructor – the default versions synthesized by the compiler do the right
thing.
The remainder of the CGIpair class consists of two member
functions: decodeURLString( ) and a helper,
translateHex( ), which is used by decodeURLString( ).
(Note that translateHex( ) does not guard against bad input such as
“%1H.”) decodeURLString( ) moves through the string and replaces
each ‘+’ with a space, and each hex code (beginning with a
‘%’) with the appropriate character. It’s worth noting
here and in CGImap the power of the string class: you can
index into a string object using operator[ ], and you can use
member functions like find( ) and substr( ).
CGImap parses and holds all the name-value pairs
submitted from the form as part of a CGI request. You might think that anything
that has the word “map” in its name should be inherited from
the STL map, but map has its own way of ordering the
elements it stores whereas here it’s useful to keep the elements in the
order that they appear on the Web page. So CGImap is inherited from
vector<CGIpair>, and operator[ ] is overloaded so you get
the associative-array lookup of a map.
You can also see that CGImap has a copy-constructor and
an operator=, but they’re both declared as private. This is
to prevent the compiler from synthesizing the two functions (which it will do if
you don’t declare them yourself), but it also prevents the client
programmer from passing a CGImap by value or from using
assignment.
CGImap’s job is to take the input data and parse
it into name-value pairs, which it will do with the aid of CGIpair
(effectively, CGIpair is only a helper class, but it also seems to make
it easier to understand the code). After copying the query string (you’ll
see where the query string comes from later) into the string data member
gq, the nextPair( ) member function is used to parse the
string into raw name-value pairs, delimited by ‘=’ and
‘&’ signs. Each resulting CGIpair object is
added to the vector using the standard vector::push_back( ).
When nextPair( ) runs out of input from the query string, it returns
an empty CGIpair, which converts to false and ends the loop.
The CGImap::operator[ ] takes the brute-force approach
of a linear search through the elements. Since a CGImap is
intentionally unsorted and typically holds only a few fields, this is not too costly.
The dump( ) function is used for testing, typically by sending
information to the resulting Web page, as you might guess from the default value
of nl, which is an HTML “break line” token.
Using
GET can be fine for many applications. However, GET passes its data to the CGI
program through an environment variable (called QUERY_STRING), and
operating systems typically run out of environment space with long GET strings
(you should start worrying at about 200 characters). CGI provides a solution for
this: POST. With POST, the data is encoded and concatenated the same way as with
GET, but POST uses standard input to pass the encoded query string to the CGI
program and has no length limitation on the input. All you have to do in your
CGI program is determine the length of the query string. This length is stored
in the environment variable CONTENT_LENGTH. Once you know the length, you
can allocate storage and read the precise number of bytes from standard input.
Because POST is the less-fragile solution, you should probably prefer it over
GET, unless you know for sure that your input will be short. In fact, one might
surmise that the only reason for GET is that it is slightly easier to code a CGI
program in C using GET. However, the last class in CGImap.h is a tool
that makes handling a POST just as easy as handling a GET, which means you can
always use POST.
The class Post inherits from string and has only a
constructor. The job of the constructor is to get the query data from the POST
into itself (a string). It does this by reading the CONTENT_LENGTH
environment variable using the Standard C library function
getenv( ). This comes back as a pointer to a C character string. If
this pointer is zero, the CONTENT_LENGTH environment variable has not been set,
so something is wrong. Otherwise, the character string must be converted to an
integer using the Standard C library function atoi( ). The resulting
length is used with new to allocate enough storage to hold the query
string (no null terminator is needed, since the length is passed along explicitly), and then read( ) is called for
cin. The read( ) function takes a pointer to the destination
buffer and the number of bytes to read. The resulting buffer is inserted into
the current string using string::append( ). At this point,
the POST data is just a string object and can be easily used without
further concern about where it came from.
Now that the basic tools are defined, they can easily be used
in a CGI program like the following which simply dumps the name-value pairs that
are parsed from a GET query. Remember that an iterator for a CGImap
returns a CGIpair object when it is dereferenced, so you must select the
first and second parts of that CGIpair:
//: C11:CGI_GET.cpp
// Tests CGImap by extracting the information
// from a CGI GET submitted by an HTML Web page.
#include "CGImap.h"
int main() {
  // You MUST print this out, otherwise the
  // server will not send the response:
  cout << "Content-type: text/plain\n" << endl;
  // For a CGI "GET," the server puts the data
  // in the environment variable QUERY_STRING:
  const char* qs = getenv("QUERY_STRING");
  CGImap query(qs ? qs : ""); // Guard against a missing variable
  // Test: dump all names and values
  for(CGImap::iterator it = query.begin();
      it != query.end(); it++) {
    cout << (*it).first << " = "
      << (*it).second << endl;
  }
}
///:~
When you use the GET approach (which is controlled by the METHOD
attribute of the HTML FORM tag), the Web server grabs everything
after the ‘?’ and puts it into the operating-system environment
variable QUERY_STRING. So to read that information all you have to do is
get the QUERY_STRING. You do this with the standard C library function
getenv( ), passing it the identifier of the environment variable you
wish to fetch. In main( ), notice how simple the act of
parsing the QUERY_STRING is: you just hand it to the constructor for the
CGImap object called query and all the work is done for you.
Although an iterator is used here, you can also pull out the names and values
from query using CGImap::operator[ ].
Now it’s important to understand something about CGI. A
CGI program is handed its input in one of two ways: through QUERY_STRING during
a GET (as in the above case) or through standard input during a POST. But a CGI
program only returns its results through standard output, via cout. Where
does this output go? Back to the Web server, which decides what to do with it.
The server makes this decision based on the content-type header, which
means that if the content-type header isn’t the first thing it
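sees, it won’t know what to do with the data. Thus it’s essential
that you start the output of all CGI programs with the content-type
header, followed by an empty line that marks the end of the header block; that
is why the listings here write "\n" before endl.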
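The header rule can be captured in a tiny helper. This is only a sketch (writeCgiHeader( ) is a hypothetical name, not part of CGImap.h); it writes the Content-type line and then the empty line that terminates the header. Because it writes to any ostream, its output can be checked with an ostringstream instead of cout.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Write a CGI response header: the Content-type line,
// then the empty line that ends the header block.
void writeCgiHeader(std::ostream& out, const std::string& type) {
  out << "Content-type: " << type << "\n\n";
  out.flush();  // flush, as endl does in the listings
}
```

In a CGI program you would call writeCgiHeader(std::cout, "text/plain") before producing any other output.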
In this case, we want the server to feed all the information
directly back to the client program. The information should be unchanged, so the
content-type is text/plain. Once the server sees this, it will
echo all strings right back to the client as a simple text Web page.
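To test this program, you must compile it and place the executable
in the cgi-bin directory of your host Web server (or whichever directory the
server is configured to run CGI programs from). Then you can perform a simple
test by writing an HTML page like this: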
//:! C11:GETtest.html
<HTML><HEAD>
<TITLE>A test of standard HTML GET</TITLE>
</HEAD> Test, uses standard html GET
<Form method="GET" ACTION="/cgi-bin/CGI_GET.exe">
<P>Field1: <INPUT TYPE = "text" NAME = "Field1"
VALUE = "This is a test" size = "40"></p>
<P>Field2: <INPUT TYPE = "text" NAME = "Field2"
VALUE = "of the emergency" size = "40"></p>
<P>Field3: <INPUT TYPE = "text" NAME = "Field3"
VALUE = "broadcast system" size = "40"></p>
<P>Field4: <INPUT TYPE = "text" NAME = "Field4"
VALUE = "this is only a test" size = "40"></p>
<P>Field5: <INPUT TYPE = "text" NAME = "Field5"
VALUE = "In a real emergency" size = "40"></p>
<P>Field6: <INPUT TYPE = "text" NAME = "Field6"
VALUE = "you will be instructed" size = "40"></p>
<p><input type = "submit" name = "submit" > </p>
</Form></HTML>
///:~
Of course, the CGI_GET.exe program must be compiled on
some kind of Web server and placed in the correct subdirectory (typically called
“cgi-bin”) in order for this Web page to work. The dominant Web
server is the freely available Apache (see http://www.Apache.org), which runs on
virtually all platforms. Some word-processing/spreadsheet packages even come
with Web servers. It’s also quite cheap and easy to get an old PC and
install Linux along with an inexpensive network card. Linux distributions
typically include Apache and can set it up for you, so you can test everything
on your local network as if it were live on the Internet. One way or another,
it’s possible to install a Web server for local tests, so you don’t need a
remote Web server and permission to install CGI programs on that server.
One of the advantages of this design is that, now that
CGIpair and CGImap are defined, most of the work is done for you
so you can easily create your own CGI program simply by modifying
main( ).
The CGIpair and CGImap from CGImap.h can
be used as is for a CGI program that handles POSTs. The only thing you need to
do is get the data from a Post object instead of from the
QUERY_STRING environment variable. The following listing shows how simple
it is to write such a CGI program:
//: C11:CGI_POST.cpp
// CGImap works as easily with POST as it
// does with GET.
#include "CGImap.h"
#include <iostream>
using namespace std;

int main() {
  cout << "Content-type: text/plain\n" << endl;
  Post p; // Get the query string
  CGImap query(p);
  // Test: dump all names and values
  for(CGImap::iterator it = query.begin();
      it != query.end(); it++) {
    cout << (*it).first << " = "
      << (*it).second << endl;
  }
}
///:~
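Once the Post object has been created, its contents are no
different from a GET query string, so it is handed to the constructor for
CGImap. The different fields in the vector are then available just as in
the previous example. If you wanted to be even more terse, you could
define the Post as a temporary directly inside the constructor call for the
CGImap object:
CGImap query((Post()));
The extra parentheses matter: written as CGImap query(Post());, the line
would be parsed as the declaration of a function named query rather than as the
definition of a CGImap object.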
To test this program, you can use the following Web
page:
//:! C11:POSTtest.html
<HTML><HEAD>
<TITLE>A test of standard HTML POST</TITLE>
</HEAD>Test, uses standard html POST
<Form method="POST" ACTION="/cgi-bin/CGI_POST.exe">
<P>Field1: <INPUT TYPE = "text" NAME = "Field1"
VALUE = "This is a test" size = "40"></p>
<P>Field2: <INPUT TYPE = "text" NAME = "Field2"
VALUE = "of the emergency" size = "40"></p>
<P>Field3: <INPUT TYPE = "text" NAME = "Field3"
VALUE = "broadcast system" size = "40"></p>
<P>Field4: <INPUT TYPE = "text" NAME = "Field4"
VALUE = "this is only a test" size = "40"></p>
<P>Field5: <INPUT TYPE = "text" NAME = "Field5"
VALUE = "In a real emergency" size = "40"></p>
<P>Field6: <INPUT TYPE = "text" NAME = "Field6"
VALUE = "you will be instructed" size = "40"></p>
<p><input type = "submit" name = "submit" > </p>
</Form></HTML>
///:~
When you press the “submit” button, the server feeds the form
data to the CGI program via standard input, and you’ll get back a simple text
page containing the parsed results, so you can see that the CGI program works
correctly.
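The Post class described earlier collects this data for you. For reference, a CGI server also sets the environment variable CONTENT_LENGTH to the number of bytes waiting on standard input, and a POST handler should read exactly that many. The following minimal sketch is not the book’s Post class (readPostBody( ) is a hypothetical helper); it reads from any istream so it can be tried without a server. In a real CGI program you would call it as readPostBody(std::cin, std::getenv("CONTENT_LENGTH")).

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Read exactly the number of bytes given by 'contentLength'
// (the text of CONTENT_LENGTH, or null if it wasn't set).
std::string readPostBody(std::istream& in, const char* contentLength) {
  if (contentLength == nullptr) return "";
  long n = std::strtol(contentLength, nullptr, 10);
  if (n <= 0) return "";
  std::string body(static_cast<std::string::size_type>(n), '\0');
  in.read(&body[0], n);
  // Keep only what was actually read, in case the input was short:
  body.resize(static_cast<std::string::size_type>(in.gcount()));
  return body;
}
```

Reading a fixed count, rather than reading until end of file, means the program never waits for input that the server isn’t going to send.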
Managing an email list is the kind of problem many people need
to solve for their Web site. As is turning out to be true for everything
on the Internet, the simplest approach is always the best. I learned this the
hard way, first trying a variety of Java applets (which some firewalls do not
allow) and even JavaScript (which isn’t supported uniformly on all
browsers). The result of each experiment was a steady stream of email from the
folks who couldn’t get it to work. When you set up a Web site, your goal
should be to never get email from anyone complaining that it doesn’t work,
and the best way to produce this result is to use plain HTML (which, with a
little work, can be made to look quite decent).
The second problem was on the server side. Ideally,
you’d like all your email addresses to be added to and removed from a single
master file, but this presents a problem. Most operating systems allow more than
one program to open a file. When a client makes a CGI request, the Web server
starts up a new invocation of the CGI program, and since a Web server can handle
many requests at a time, this means that you can have many instances of your CGI
program running at once. If the CGI program opens a specific file, then you can
have many programs running at once that open that file. This is a problem if
they are each reading and writing to that file.
Your operating system may provide a way to
“lock” a file, so that other invocations of your program do not
access the file at the same time. However, I took a different approach, which
was to make a unique file for each client. Making a file unique was quite easy,
since the email name itself is a unique character string. The filename for each
request is then just the email name, followed by the string “.add”
or “.remove”. The contents of the file are also just the email address of
the client. Then, to produce a list of all the names to add, you simply say
something like (in Unix):
cat *.add > addlist
(or the equivalent for your system). For removals, you
say:
cat *.remove > removelist
Once the names have been combined into a list you can archive
or remove the files.
The HTML code to place on your Web page becomes fairly
straightforward. This particular example takes an email address to be added or
removed from my C++ mailing list:
<h1 align="center"><font color="#000000">
The C++ Mailing List</font></h1>
<div align="center"><center>
<table border="1" cellpadding="4" cellspacing="1"
  width="550" bgcolor="#FFFFFF">
<tr>
<td width="30" bgcolor="#FF0000"> </td>
<td align="center" width="422" bgcolor="#0">
<form action="/cgi-bin/mlm.exe" method="GET">
<input type="hidden" name="subject-field"
  value="cplusplus-email-list">
<input type="hidden" name="command-field"
  value="add"><p>
<input type="text" size="40" name="email-address">
<input type="submit" name="submit"
  value="Add Address to C++ Mailing List">
</p></form></td>
<td width="30" bgcolor="#FF0000"> </td>
</tr>
<tr>
<td width="30" bgcolor="#000000"> </td>
<td align="center" width="422" bgcolor="#FF0000">
<form action="/cgi-bin/mlm.exe" method="GET">
<input type="hidden" name="subject-field"
  value="cplusplus-email-list">
<input type="hidden" name="command-field"
  value="remove"><p>
<input type="text" size="40" name="email-address">
<input type="submit" name="submit"
  value="Remove Address From C++ Mailing List">
</p></form></td>
<td width="30" bgcolor="#000000"> </td>
</tr>
</table>
</center></div>
Each form contains one data-entry field called
email-address, as well as a couple of hidden fields which don’t
provide for user input but carry information back to the server nonetheless. The
subject-field tells the CGI program the subdirectory where the resulting
file should be placed. The command-field tells the CGI program whether
the user is asking to be added to or removed from the list. From the form’s
action and method attributes, you can see that a GET is used with a program
called mlm.exe (for “mailing list manager”). Here it is:
//: C11:mlm.cpp
// A CGI program to maintain a mailing list
#include "CGImap.h"
#include <cstdlib>  // getenv()
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

const string contact("Bruce@EckelObjects.com");
// Paths in this program are for Linux/Unix. You
// must use backslashes (two for each single
// slash) on Win32 servers:
const string rootpath("/home/eckel/");

int main() {
  cout << "Content-type: text/html\n"<< endl;
  const char* qs = getenv("QUERY_STRING");
  CGImap query(qs ? qs : "");
  if(query["test-field"] == "on") {
    cout << "map size: " << query.size() << "<br>";
    query.dump(cout, "<br>");
  }
  if(query["subject-field"].size() == 0) {
    cout << "<h2>Incorrect form. Contact " <<
      contact << endl;
    return 0;
  }
  string email = query["email-address"];
  if(email.size() == 0) {
    cout << "<h2>Please enter your email address"
      << endl;
    return 0;
  }
  if(email.find_first_of(" \t") != string::npos){
    cout << "<h2>You cannot use white space "
      "in your email address" << endl;
    return 0;
  }
  if(email.find('@') == string::npos) {
    cout << "<h2>You must use a proper email"
      " address including an '@' sign" << endl;
    return 0;
  }
  if(email.find('.') == string::npos) {
    cout << "<h2>You must use a proper email"
      " address including a '.'" << endl;
    return 0;
  }
  // Hidden fields can be edited by the user, so
  // reject anything that could escape rootpath:
  if(email.find_first_of("/\\") != string::npos ||
     query["subject-field"].find_first_of("/\\")
       != string::npos ||
     query["subject-field"].find("..") != string::npos) {
    cout << "<h2>Invalid characters in form data" << endl;
    return 0;
  }
  string fname = email;
  if(query["command-field"] == "add")
    fname += ".add";
  else if(query["command-field"] == "remove")
    fname += ".remove";
  else {
    cout << "error: command-field not found. Contact "
      << contact << endl;
    return 0;
  }
  string path(rootpath + query["subject-field"]
    + "/" + fname);
  ofstream out(path.c_str());
  if(!out) {
    cout << "cannot open " << path << "; Contact "
      << contact << endl;
    return 0;
  }
  out << email << endl;
  cout << "<br><H2>" << email << " has been ";
  if(query["command-field"] == "add")
    cout << "added";
  else if(query["command-field"] == "remove")
    cout << "removed";
  cout << "<br>Thank you</H2>" << endl;
}
///:~
Again, all the CGI work is done by the CGImap. From
then on it’s a matter of pulling the fields out and looking at them, then
deciding what to do about it, which is easy because of the way you can index
into a map and also because of the tools available for standard
strings. Here, most of the programming has to do with checking for a
valid email address. Then a file name is created with the email address as the
name and “.add” or “.remove” as the extension, and the
email address is placed in the file.
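The checks in mlm.cpp are deliberately basic. Because the address becomes part of a file name, a stricter test may be worth having. The following sketch is an assumption, not part of the original program (looksLikeSafeEmail( ) is a hypothetical name): besides requiring a single ‘@’ with a ‘.’ somewhere after it, it accepts only letters, digits and the characters @ . _ - +, which rules out slashes, spaces and shell punctuation, and it refuses a leading ‘-’.

```cpp
#include <cassert>
#include <string>

// Stricter variant of mlm.cpp's checks: a plausible address made
// only of characters that are harmless in a file name or a command.
bool looksLikeSafeEmail(const std::string& email) {
  static const std::string allowed =
      "abcdefghijklmnopqrstuvwxyz"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "0123456789@._-+";
  // Reject empty input, anything outside the allowed set, and a
  // leading '-' that a command-line program could take as an option.
  if (email.empty() || email[0] == '-' ||
      email.find_first_not_of(allowed) != std::string::npos)
    return false;
  std::string::size_type at = email.find('@');
  // Exactly one '@', not at the start, with a '.' somewhere after it.
  return at != std::string::npos && at != 0 &&
         email.find('@', at + 1) == std::string::npos &&
         email.find('.', at + 1) != std::string::npos;
}
```

A whitelist like this is easier to get right than a list of forbidden characters, since anything not explicitly allowed is rejected.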
Once you have a list of names to add, you can just paste them
to the end of your list. However, you might get some duplicates, so you need a
program to remove those. Because duplicates may differ only in upper- and
lowercase, it’s useful to create a tool that will read a list of names
from a file and place them into a container of strings, forcing all the names to
lowercase as it does:
//: C11:readLower.h
// Read a file into a container of string,
// forcing each line to lower case.
#ifndef READLOWER_H
#define READLOWER_H
#include "../require.h"
#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <string>

inline char downcase(char c) {
  // Go through unsigned char: passing a negative
  // char value to tolower() is undefined:
  return static_cast<char>(
    std::tolower(static_cast<unsigned char>(c)));
}

// inline, since this header may be included in
// more than one translation unit:
inline std::string lcase(std::string s) {
  std::transform(s.begin(), s.end(),
    s.begin(), downcase);
  return s;
}

template<class SContainer>
void readLower(const char* filename, SContainer& c) {
  std::ifstream in(filename);
  assure(in, filename);
  const int sz = 1024;
  char buf[sz];
  while(in.getline(buf, sz))
    // Force to lowercase:
    c.push_back(lcase(buf));
}
#endif // READLOWER_H
///:~
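Since it’s a template, it will work with any
container of string that supports push_back( ). Again,
you may want to change the above to read each line into a string with
getline(in, s) instead of using a fixed-size buffer, which is more fragile: a
line longer than the buffer puts the stream into a failed state, which ends the
loop early.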
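Here is a sketch of the getline( ) form, under the assumption that it is acceptable to read from any istream rather than from a file name (readLowerFrom( ) and toLowerCopy( ) are hypothetical names). Reading into a std::string removes the 1024-character limit, and converting each character through unsigned char before calling tolower( ) avoids undefined behavior for characters with negative values.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying it out
#include <string>
#include <vector>

// Lowercase a whole string, one character at a time.
inline std::string toLowerCopy(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Read every line of 'in', whatever its length, into container 'c',
// forcing each line to lowercase.
template <class SContainer>
void readLowerFrom(std::istream& in, SContainer& c) {
  std::string line;
  while (std::getline(in, line))
    c.push_back(toLowerCopy(line));
}
```

A file-name version can simply open an std::ifstream and pass it to readLowerFrom( ).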
Once the names are read into the list and forced to lowercase,
removing duplicates is trivial:
//: C11:RemoveDuplicates.cpp
// Remove duplicate names from a mailing list
#include "readLower.h"
#include "../require.h"
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 2);
  vector<string> names;
  readLower(argv[1], names);
  long before = names.size();
  // You must sort first for unique() to work:
  sort(names.begin(), names.end());
  // unique() packs the distinct names at the front
  // and returns the new logical end; erase the rest:
  names.erase(unique(names.begin(), names.end()),
    names.end());
  long removed = before - names.size();
  ofstream out(argv[2]);
  assure(out, argv[2]);
  copy(names.begin(), names.end(),
    ostream_iterator<string>(out,"\n"));
  cout << removed << " names removed" << endl;
}
///:~
A vector is used here instead of a list because
the sort( ) algorithm requires random-access iterators, which a vector
provides and a list does not. (This is why list has its own member
function sort( ); the generic sort( ) algorithm shown above cannot be
applied to a list at all.)
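The sort must be performed so that all duplicates are adjacent
to each other. Then unique( ) collapses each run of adjacent duplicates to
a single element, packing the distinct names at the front of the vector and
returning an iterator to the new logical end. unique( ) doesn’t change the
size of the container by itself, so the leftover elements past that iterator
must be removed with erase( ). The program also keeps track of how many
duplicate names were removed.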
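The sort-then-unique sequence, together with the erase( ) call that actually shrinks the vector, can be packaged as a small function. This is a sketch with a hypothetical name, removeDuplicates( ); like RemoveDuplicates.cpp, it reports how many names were removed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sort so duplicates become adjacent, let unique() pack the distinct
// names at the front, then erase the leftovers past its result.
// Returns the number of names removed.
std::size_t removeDuplicates(std::vector<std::string>& names) {
  std::size_t before = names.size();
  std::sort(names.begin(), names.end());
  names.erase(std::unique(names.begin(), names.end()), names.end());
  return before - names.size();
}
```

Forgetting the erase( ) is a common mistake: the call to unique( ) alone leaves the vector the same size, with unspecified values at the end.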
When you have a file of names to remove from your list,
readLower( ) comes in handy again:
//: C11:RemoveGroup.cpp
// Remove a group of names from a list
#include "readLower.h"
#include "../require.h"
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <list>
#include <string>
using namespace std;

typedef list<string> Container;

int main(int argc, char* argv[]) {
  requireArgs(argc, 3);
  Container names, removals;
  readLower(argv[1], names);
  readLower(argv[2], removals);
  long original = names.size();
  Container::iterator rmit = removals.begin();
  while(rmit != removals.end())
    names.remove(*rmit++); // Removes all matches
  ofstream out(argv[3]);
  assure(out, argv[3]);
  copy(names.begin(), names.end(),
    ostream_iterator<string>(out,"\n"));
  long removed = original - names.size();
  cout << "On removal list: " << removals.size()
    << "\n Removed: " << removed << endl;
}
///:~
Here, a list is used instead of a vector (since
readLower( ) is a template, it adapts). Although there is a generic
remove( ) algorithm, it only shifts the elements to be kept toward the
front and returns the new logical end; the container’s size doesn’t change until
you call erase( ). The member function list::remove( ) actually
unlinks and destroys the matching elements, so it is the simpler choice here.
The second command-line argument is the file containing the list of names to be
removed. An iterator is used to step through that list, and
list::remove( ) removes every instance of each name from the master
list. The list doesn’t need to be sorted first.
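The difference between the two kinds of remove( ) is easy to see side by side. In this sketch (removeAll( ) is a hypothetical overloaded name), the vector version needs the erase-remove idiom, while the list version just calls the member function.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

// vector: std::remove only shifts the kept elements forward and
// returns the new logical end; erase() then shrinks the container.
void removeAll(std::vector<std::string>& names, const std::string& who) {
  names.erase(std::remove(names.begin(), names.end(), who), names.end());
}

// list: the member function unlinks every matching node directly.
void removeAll(std::list<std::string>& names, const std::string& who) {
  names.remove(who);
}
```

Both calls leave the relative order of the remaining names unchanged.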
Unfortunately, that’s not all there is to it. The
messiest part about maintaining a mailing list is the bounced messages.
Presumably, you’ll just want to remove the addresses that produce bounces.
If you can combine all the bounced messages into a single file, the following
program has a pretty good chance of extracting the email addresses; then you can
use RemoveGroup to delete them from your list.
//: C11:ExtractUndeliverable.cpp
// Find undeliverable names to remove from
// mailing list from within a mail file
// containing many messages
#include "../require.h"
#include <cstdio>
#include <cstring>  // strstr(), strtok()
#include <string>
#include <set>
using namespace std;

// String literals are const in C++, so these
// arrays hold const char*:
const char* start_str[] = {
  "following address",
  "following recipient",
  "following destination",
  "undeliverable to the following",
  "following invalid",
};

const char* continue_str[] = {
  "Message-ID",
  "Please reply to",
};

// The in() function allows you to check whether
// a string in this set is part of your argument.
class StringSet {
  const char** ss;
  int sz;
public:
  StringSet(const char** sa, int sza):ss(sa),sz(sza) {}
  bool in(const char* s) {
    for(int i = 0; i < sz; i++)
      if (strstr(s, ss[i]) != 0)
        return true;
    return false;
  }
};

// Calculate array length:
#define ALEN(A) ((sizeof A)/(sizeof *A))

StringSet
  starts(start_str, ALEN(start_str)),
  continues(continue_str, ALEN(continue_str));

int main(int argc, char* argv[]) {
  requireArgs(argc, 2,
    "Usage:ExtractUndeliverable infile outfile");
  FILE* infile = fopen(argv[1], "rb");
  FILE* outfile = fopen(argv[2], "w");
  require(infile != 0);
  require(outfile != 0);
  set<string> names;
  const int sz = 1024;
  char buf[sz];
  while(fgets(buf, sz, infile) != 0) {
    if(starts.in(buf)) {
      puts(buf);
      while(fgets(buf, sz, infile) != 0) {
        if(continues.in(buf)) continue;
        if(strstr(buf, "---") != 0) break;
        const char* delimiters= " \t<>():;,\n\"";
        char* name = strtok(buf, delimiters);
        while(name != 0) {
          if(strstr(name, "@") != 0)
            names.insert(string(name));
          name = strtok(0, delimiters);
        }
      }
    }
  }
  set<string>::iterator i = names.begin();
  while(i != names.end())
    fprintf(outfile, "%s\n", (*i++).c_str());
  fclose(infile);
  fclose(outfile);
}
///:~
The first thing you’ll notice about this program is that it
contains some C functions, including C I/O. This is not because of any
particular design insight. It just seemed to work when I used the C elements,
and it started behaving strangely with C++ I/O. So the C is just because it
works, and you may be able to rewrite the program in more “pure C++”
using your C++ compiler and produce correct results.
A lot of what this program does is read lines looking for
string matches. To make this convenient, I created a StringSet class with
a member function in( ) that tells you whether any of the strings in
the set are in the argument. The StringSet is initialized with a constant
array of strings and the size of that array. Besides making the code easier
to read, StringSet makes it easy to add new strings to the arrays.
Both the input file and the output file in main( )
are manipulated with standard I/O, since it’s not a good idea to mix I/O
types in a program. Each line is read using fgets( ), and if a line
matches the starts StringSet, then what follows will
contain email addresses, until you see some dashes (I figured this out
empirically, by hunting through a file full of bounced email). The
continues StringSet contains strings whose lines should be
ignored. For each line that potentially contains addresses, each
address is extracted using the Standard C Library function strtok( )
and then added to the set<string> called names. Using
a set eliminates duplicates (you may still have duplicates that differ only
in case, but those are dealt with by RemoveGroup.cpp, since readLower( )
forces every name to lowercase). The resulting set of names is then printed to
the output file.
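If you later move this program toward “pure C++,” the strtok( ) loop is a good place to start, since strtok( ) modifies its buffer and keeps hidden state between calls. The sketch below is one possible replacement, assuming the same delimiter set as the listing; addAddresses( ) is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>

// Split 'line' on the delimiters ExtractUndeliverable.cpp passes to
// strtok(), inserting every token that contains '@' into 'names'.
// The input is left untouched and no state is kept between calls.
void addAddresses(const std::string& line, std::set<std::string>& names) {
  const std::string delimiters = " \t<>():;,\n\"";
  std::string::size_type start = line.find_first_not_of(delimiters);
  while (start != std::string::npos) {
    // 'end' is npos for the last token; substr() then takes the rest.
    std::string::size_type end = line.find_first_of(delimiters, start);
    std::string token = line.substr(start, end - start);
    if (token.find('@') != std::string::npos)
      names.insert(token);
    start = line.find_first_not_of(delimiters, end);
  }
}
```

Because names is a std::set, an address that appears in several bounced messages is stored only once, just as in the original.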
There are a number of ways to connect to your system’s
mailer, but the following program just takes the simple approach of calling an
external command (“fastmail,” a command-line mailer available on many Unix systems) using the
Standard C library function system( ). The program spends all its
time building the external command.
When people don’t want to be on a list anymore they will
often ignore instructions and just reply to the message. This can be a problem
if the email address they’re replying with is different from the one
that’s on your list (sometimes it has been routed to a new or aliased
address). To solve the problem, this program prepends to the text file a
message that informs them that they can remove themselves from the list by
visiting a URL. Since many email programs will present a URL in a form that
allows you to just click on it, this can produce a very simple removal process.
If you look at the URL, you can see it’s a call to the mlm.exe CGI
program, including removal information that incorporates the same email address
the message was sent to. That way, even if the user just replies to the message,
all you have to do is click on the URL that comes back with their reply
(assuming the message is automatically copied back to you).
//: C11:Batchmail.cpp
// Sends mail to a list using Unix fastmail
#include "../require.h"
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <cstdlib> // system() function
using namespace std;

string subject("New Intensive Workshops");
string from("Bruce@EckelObjects.com");
string replyto("Bruce@EckelObjects.com");
ofstream logfile("BatchMail.log");

int main(int argc, char* argv[]) {
  requireArgs(argc, 2,
    "Usage: Batchmail namelist mailfile");
  ifstream names(argv[1]);
  assure(names, argv[1]);
  string name;
  int mailcounter = 0;
  while(getline(names, name)) {
    ofstream msg("m.txt");
    assure(msg, "m.txt");
    msg << "To be removed from this list, "
      "DO NOT REPLY TO THIS MESSAGE. Instead, \n"
      "click on the following URL, or visit it "
      "using your Web browser. This \n"
      "way, the proper email address will be "
      "removed. Here's the URL:\n"
      << "http://www.mindview.net/cgi-bin/"
      "mlm.exe?subject-field=workshop-email-list"
      "&command-field=remove&email-address="
      << name << "&submit=submit\n\n"
      "------------------------------------\n\n";
    ifstream text(argv[2]);
    assure(text, argv[2]);
    msg << text.rdbuf() << endl;
    msg.close();
    string command("fastmail -F " + from + " -r "
      + replyto + " -s \"" + subject + "\" m.txt "
      + name);
    system(command.c_str());
    logfile << command << endl;
    // Every 500 messages, mail a progress report
    // (the counter is incremented once per message):
    if((++mailcounter % 500) == 0) {
      ostringstream mcounter;
      mcounter << mailcounter;
      string command2("fastmail -F " + from + " -r "
        + replyto + " -s \"Sent " + mcounter.str() +
        " messages \" m.txt eckel@aol.com");
      system(command2.c_str());
    }
  }
}
///:~
The first command-line argument is the list of email
addresses, one per line. The names are read one at a time into the string
called name using getline( ). Then a temporary file
called m.txt is created to build the customized message for that
individual; the customization is the note about how to remove themselves, along
with the URL. Then the message body, which is in the file specified by the
second command-line argument, is appended to m.txt. Finally, the command
is built inside a string: the “-F” argument to
fastmail is who it’s from, the “-r” argument is who to
reply to. The “-s” is the subject line, the next argument is the
file containing the mail and the last argument is the email address to send it
to.
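Batchmail.cpp passes its arguments to the shell unquoted, which works as long as every name on the list is a plain address. As an added precaution that is not in the original program, each argument can be wrapped in single quotes, assuming a POSIX shell (which is what system( ) runs on Unix). The helper names shellQuote( ) and fastmailCommand( ) are hypothetical; the -F, -r and -s options are the ones described above.

```cpp
#include <cassert>
#include <string>

// Wrap 'arg' in single quotes for a POSIX shell, turning each
// embedded ' into '\'' so the shell sees it literally.
std::string shellQuote(const std::string& arg) {
  std::string out = "'";
  for (char c : arg) {
    if (c == '\'') out += "'\\''";
    else out += c;
  }
  out += "'";
  return out;
}

// Assemble the same fastmail command line Batchmail.cpp builds,
// but with every argument quoted.
std::string fastmailCommand(const std::string& from,
                            const std::string& replyto,
                            const std::string& subject,
                            const std::string& file,
                            const std::string& to) {
  return "fastmail -F " + shellQuote(from) + " -r " + shellQuote(replyto) +
         " -s " + shellQuote(subject) + " " + shellQuote(file) + " " +
         shellQuote(to);
}
```

Inside single quotes a POSIX shell treats every character literally, so a stray ‘;’ or ‘$’ in a name can no longer start another command.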
You can start this program in the background and tell Unix not
to stop the program when you log off the server (the nohup command does this).
However, it takes a while
to run for a long list (this isn’t because of the program itself, but the
mailing process). I like to keep track of the progress of the program by sending
a status message to another email account, which is accomplished in the last few
lines of the program.
One of the problems with CGI is that you must write and
compile a new program every time you want to add a new facility to your Web
site. However, much of the time all that your CGI program does is capture
information from the user and store it on the server. If you could use hidden
fields to specify what to do with the information, then it would be possible to
write a single CGI program that would extract the information from any CGI
request. This information could be stored in a uniform format, in a subdirectory
specified by a hidden field in the HTML form, and in a file that included the
user’s email address – of course, in the general case the email
address doesn’t guarantee uniqueness (the user may post more than one
submission) so the date and time of the submission can be mangled in with the
file name to make it unique. If you can do this, then you can create a new
data-collection page just by defining the HTML and creating a new subdirectory
on your server. For example, every time I come up with a new class or workshop,
all I have to do is create the HTML form for signups – no CGI programming
is required.
The following HTML page shows the format for this scheme.
Since a CGI POST is more general and doesn’t have any limit on the amount
of information it can send, it will always be used instead of a GET for the
ExtractInfo.cpp program that will implement this system. Although this
form is simple, yours can be as complicated as you need.
//:! C11:INFOtest.html
<html><head><title>
Extracting information from an HTML POST</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF"
  vlink="#800080"> <hr>
<p>Extracting information from an HTML POST</p>
<form action="/cgi-bin/ExtractInfo.exe"
  method="POST">
<input type="hidden" name="subject-field"
  value="test-extract-info">
<input type="hidden" name="reminder"
  value="Remember your lunch!">
<input type="hidden" name="test-field"
  value="on">
<input type="hidden" name="mail-copy"
  value="Bruce@EckelObjects.com;eckel@aol.com">
<input type="hidden" name="confirmation"
  value="confirmation1">
<p>Email address (Required): <input
  type="text" size="45" name="email-address" >
</p>Comment:<br>
<textarea name="Comment" rows="6" cols="55">
</textarea>
<p><input type="submit" name="submit">
<input type="reset" name="reset"></p>
</form><hr></body></html>
///:~
Right after the opening form tag with its action attribute, you
see a series of fields that begin with
<input type="hidden"
Such a field will not appear on the form that the user sees, but its
information will still be submitted as part of the data for the CGI program.
The value of this field named “subject-field” is
used by ExtractInfo.cpp to determine the subdirectory in which to place
the resulting file (in this case, the subdirectory will be
“test-extract-info”). Because of this technique and the generality
of the program, the only thing you’ll usually need to do to start a new
database of data is to create the subdirectory on the server and then create an
HTML page like the one above. The ExtractInfo.cpp program will do the
rest for you by creating a unique file for each submission. Of course, you can
always change the program if you want it to do something more unusual, but the
system as shown will work most of the time.
The contents of the “reminder” field will be
displayed on the form that is sent back to the user when their data is accepted.
The “test-field” indicates whether to dump test information to the
resulting Web page. If “mail-copy” exists and contains anything
other than “no,” the value string will be parsed for mailing
addresses separated by ‘;’ and each of these addresses will get a
mail message with the data in it. The “email-address” field is
required in each case and the email address will be checked to ensure that it
conforms to some basic standards.
The “confirmation” field causes a second program
to be executed when the form is posted. This program parses the information that
was stored from the form into a file, turns it into human-readable form and
sends an email message back to the client to confirm that their information was
received (this is useful because the user may not have entered their email
address correctly; if they don’t get a confirmation message they’ll
know something is wrong). The design of the “confirmation” field
allows the person creating the HTML page to select more than one type of
confirmation. Your first thought may be to have the form name the program to
run directly, rather than indirectly as was done here, but then anyone could
modify their downloaded copy of the Web page and choose which programs run on
your machine.
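One way to keep that decision on the server is a fixed table that maps each confirmation type the form may name to a command the server trusts. The following is a sketch under that assumption (confirmationCommand( ) is a hypothetical name, and ProcessApplication.exe is the program to be created in the next section); any value that isn’t in the table simply produces no command.

```cpp
#include <cassert>
#include <map>
#include <string>

// The form may only name a confirmation *type*; the server maps each
// known type to a program it trusts. Unknown types yield "".
std::string confirmationCommand(const std::string& conftype,
                                const std::string& dataFile) {
  static const std::map<std::string, std::string> programs = {
      {"confirmation1", "./ProcessApplication.exe"},
  };
  auto it = programs.find(conftype);
  if (it == programs.end()) return "";
  // The trailing '&' runs the program as a separate process.
  return it->second + " " + dataFile + " &";
}
```

Adding a new kind of confirmation then means adding one entry to the table on the server, which a visitor cannot change.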
Here is the program that will extract the information from the
CGI request:
//: C11:ExtractInfo.cpp
// Extracts all the information from a CGI POST
// submission, generates a file and stores the
// information on the server. By generating a
// unique file name, there are no clashes like
// you get when storing to a single file.
#include "CGImap.h"
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdio>
#include <cstdlib> // system()
#include <ctime>
using namespace std;

const string contact("Bruce@EckelObjects.com");
// Paths in this program are for Linux/Unix. You
// must use backslashes (two for each single
// slash) on Win32 servers:
const string rootpath("/home/eckel/");
// Form data ends up in file names and in shell
// commands, so only these characters are accepted
// in the fields used that way:
const string safeChars(
  "abcdefghijklmnopqrstuvwxyz"
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  "0123456789@._-+");

void show(CGImap& m, ostream& o);

// The definition for the following is the only
// thing you must change to customize the program
void store(CGImap& m, ostream& o, string nl = "\n");

int main() {
  cout << "Content-type: text/html\n"<< endl;
  Post p; // Collect the POST data
  CGImap query(p);
  // "test-field" set to "on" will dump contents
  if(query["test-field"] == "on") {
    cout << "map size: " << query.size() << "<br>";
    query.dump(cout);
  }
  if(query["subject-field"].size() == 0) {
    cout << "<h2>Incorrect form. Contact " <<
      contact << endl;
    return 0;
  }
  string email = query["email-address"];
  if(email.size() == 0) {
    cout << "<h2>Please enter your email address"
      << endl;
    return 0;
  }
  if(email.find_first_of(" \t") != string::npos){
    cout << "<h2>You cannot include white space "
      "in your email address" << endl;
    return 0;
  }
  if(email.find('@') == string::npos) {
    cout << "<h2>You must include a proper email"
      " address including an '@' sign" << endl;
    return 0;
  }
  if(email.find('.') == string::npos) {
    cout << "<h2>You must include a proper email"
      " address including a '.'" << endl;
    return 0;
  }
  // Hidden fields can be edited by the user, so
  // check everything that reaches a path or system():
  string subject = query["subject-field"];
  if(email.find_first_not_of(safeChars) != string::npos ||
     subject.find_first_not_of(safeChars) != string::npos ||
     subject.find("..") != string::npos) {
    cout << "<h2>Invalid characters in form data" << endl;
    return 0;
  }
  // Create a unique file name with the user's
  // email address and the current time in hex
  const int bsz = 1024;
  char fname[bsz];
  time_t now;
  time(&now); // Encoded date & time
  snprintf(fname, bsz, "%s%lX.txt", email.c_str(),
    static_cast<unsigned long>(now));
  string path(rootpath + query["subject-field"] +
    "/" + fname);
  ofstream out(path.c_str());
  if(!out) {
    cout << "cannot open " << path << "; Contact "
      << contact << endl;
    return 0;
  }
  // Store the file and path information:
  out << "///{" << path << endl;
  // Display optional reminder:
  if(query["reminder"].size() != 0)
    cout <<"<H1>" << query["reminder"] <<"</H1>";
  show(query, cout); // For results page
  store(query, out); // Stash data in file
  cout << "<br><H2>Your submission has been "
    "posted as<br>" << fname << endl
    << "<br>Thank you</H2>" << endl;
  out.close();
  // Optionally send generated file as email
  // to recipients specified in the field:
  if(query["mail-copy"].length() != 0 &&
     query["mail-copy"] != "no") {
    string to = query["mail-copy"];
    // Parse out the recipient names, separated
    // by ';', into a vector.
    vector<string> recipients;
    string::size_type ii = to.find(';');
    while(ii != string::npos) {
      recipients.push_back(to.substr(0, ii));
      to = to.substr(ii + 1);
      ii = to.find(';');
    }
    recipients.push_back(to); // Last one
    // "fastmail" only available on Linux/Unix:
    for(size_t i = 0; i < recipients.size(); i++) {
      // Skip empty names and any name that isn't
      // safe to pass to the shell:
      if(recipients[i].empty() ||
         recipients[i].find_first_not_of(safeChars)
           != string::npos)
        continue;
      string cmd("fastmail -s \"" +
        query["subject-field"] + "\" " +
        path + " " + recipients[i]);
      system(cmd.c_str());
    }
  }
  // Execute a confirmation program on the file.
  // Typically, this is so you can email a
  // processed data file to the client along with
  // a confirmation message:
  if(query["confirmation"].length() != 0) {
    string conftype = query["confirmation"];
    if(conftype == "confirmation1") {
      string command("./ProcessApplication.exe "+
        path + " &");
      // The data file is the argument, and the
      // ampersand runs it as a separate process:
      system(command.c_str());
      // Record the command that was run:
      ofstream log("Extract.log", ios::app);
      log << command << endl;
    }
  }
}

// For displaying the information on the html
// results page:
void show(CGImap& m, ostream& o) {
  string nl("<br>");
  o << "<h2>The data you entered was:"
    << "</h2><br>"
    << "From[" << m["email-address"] << ']' <<nl;
  for(CGImap::iterator it = m.begin();
      it != m.end(); it++) {
    string name = (*it).first,
      value = (*it).second;
    if(name != "email-address" &&
       name != "confirmation" &&
       name != "submit" &&
       name != "mail-copy" &&
       name != "test-field" &&
       name != "reminder")
      o << "<h3>" << name << ": </h3>"
        << "<pre>" << value << "</pre>";
  }
}

// Change this to customize the program:
void store(CGImap& m, ostream& o, string nl) {
  o << "From[" << m["email-address"] << ']' <<nl;
  for(CGImap::iterator it = m.begin();
      it != m.end(); it++) {
    string name = (*it).first,
      value = (*it).second;
    if(name != "email-address" &&
       name != "confirmation" &&
       name != "submit" &&
       name != "mail-copy" &&
       name != "test-field" &&
       name != "reminder")
      o << nl << "[{[" << name << "]}]" << nl
        << "[([" << nl << value << nl << "])]"
        << nl;
      // Delimiters were added to aid parsing of
      // the resulting text file.
  }
}
///:~
The program is designed to be as generic as possible, but if
you want to change something, it will most likely be the way that the data is stored
in a file (for example, you may want to store it in a comma-separated ASCII
format so that you can easily read it into a spreadsheet). You can make changes
to the storage format by modifying store( ), and to the way the data
is displayed by modifying show( ).
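As a hedged sketch of the comma-separated alternative mentioned above, here is a store( )-style function that writes one record per line. It assumes the form data is available as a std::map<std::string, std::string>, a stand-in for CGImap whose definition is not shown here. The names storeCSV and csvField are illustrative only. Fields follow the usual spreadsheet quoting convention: a value containing a comma, a double quote, or a line break is wrapped in quotes, and any embedded quote is doubled.

```cpp
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Quote one CSV field if it contains a comma, a double quote, or a
// line break; any embedded double quote is doubled.
std::string csvField(const std::string& value) {
  if(value.find_first_of(",\"\r\n") == std::string::npos)
    return value;
  std::string quoted("\"");
  for(std::string::size_type i = 0; i < value.size(); i++) {
    if(value[i] == '"')
      quoted += '"'; // A quote inside a quoted field becomes ""
    quoted += value[i];
  }
  return quoted + "\"";
}

// Hypothetical CSV variant of store(): writes the email address, then
// every field that show() and store() would not skip, as one line.
void storeCSV(const std::map<std::string, std::string>& m,
              std::ostream& o) {
  static const char* const skipNames[] = { "email-address",
    "confirmation", "submit", "mail-copy", "test-field", "reminder" };
  const std::set<std::string> skip(skipNames, skipNames + 6);
  std::map<std::string, std::string>::const_iterator it =
    m.find("email-address");
  o << csvField(it == m.end() ? std::string() : it->second);
  for(it = m.begin(); it != m.end(); ++it)
    if(skip.count(it->first) == 0)
      o << ',' << csvField(it->second);
  o << '\n';
}
```

For example, if a map holds email-address "me@example.com", name "Bruce Eckel", and city "Mill Valley, CA", writing it to a std::ostringstream yields the line me@example.com,"Mill Valley, CA",Bruce Eckel. The map visits the remaining fields in alphabetical order of their names.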
main( ) begins with the same three lines
you’ll start with for any POST program. The rest of the program is similar
to mlm.cpp because it looks at the “test-field” and
“email-address” (checking it for correctness). The file name
combines the user’s email address and the current date and time in hex.
Notice that sprintf( ) is used because its %X format is a convenient
way to convert a value to a hex representation. The entire file and path
information is stored in the file, along with all the data from the form, which
is tagged as it is stored so that it’s easy to parse (you’ll see a
program to parse the files a bit later). All the information is also sent back
to the user as a simply formatted HTML page, along with the reminder, if there
is one. If “mail-copy” exists and is not “no,” then the
semicolon-separated names in the “mail-copy” value are parsed and an email containing the tagged data is sent to
each one. Finally, if there is a
“confirmation” field, the value selects the type of confirmation
(there’s only one type implemented here, but you can easily add others)
and the command is built that passes the generated data file to the program
(called ProcessApplication.exe). That program will be created in the next
section.
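Both string manipulations described above can also be written without sprintf( ) and a fixed-size buffer. In the following sketch, the function names are hypothetical and are not part of ExtractInfo.cpp. The file name is built with an ostringstream and the std::hex and std::uppercase manipulators, which produce the same digits as sprintf( )'s %X format. The ';'-separated recipient list is split the same way the loop in main( ) does it.

```cpp
#include <ctime>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Build "<email><time in uppercase hex>.txt", like the sprintf() call.
// The time_t is converted to unsigned long before formatting.
std::string uniqueFileName(const std::string& email, std::time_t now) {
  std::ostringstream name;
  name << email << std::hex << std::uppercase
       << static_cast<unsigned long>(now) << ".txt";
  return name.str();
}

// Split "a@x.com;b@y.com" into its parts, keeping the last one too.
std::vector<std::string> splitRecipients(std::string to) {
  std::vector<std::string> recipients;
  std::string::size_type pos = to.find(';');
  while(pos != std::string::npos) {
    recipients.push_back(to.substr(0, pos));
    to = to.substr(pos + 1);
    pos = to.find(';');
  }
  recipients.push_back(to); // Last one
  return recipients;
}
```

With the time value 0x35B589A0, uniqueFileName("Bruce@EckelObjects.com", ...) produces the same name as the sample data file shown below.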
You now have a lot of data files accumulating on your Web
site, as people sign up for whatever you’re offering. Here’s what
one of them might look like:
//:! C07:TestData.txt
///{/home/eckel/super-cplusplus-workshop-registration/Bruce@EckelObjects.com35B589A0.txt
From[Bruce@EckelObjects.com]
[{[subject-field]}]
[([
super-cplusplus-workshop-registration
])]
[{[Date-of-event]}]
[([
Sept 2-4
])]
[{[name]}]
[([
Bruce Eckel
])]
[{[street]}]
[([
20 Sunnyside Ave, Suite A129
])]
[{[city]}]
[([
Mill Valley
])]
[{[state]}]
[([
CA
])]
[{[country]}]
[([
USA
])]
[{[zip]}]
[([
94941
])]
[{[busphone]}]
[([
415-555-1212
])]
///:~
This is a brief example, but there are as many fields as you
have on your HTML form. Now, if your event is compelling, you’ll have a
whole lot of these files and what you’d like to do is automatically
extract the information from them and put that data in any format you’d
like. For example, the ProcessApplication.exe program mentioned above
will use the data in an email confirmation message. You’ll also probably
want to put the data in a form that can be easily brought into a spreadsheet. So
it makes sense to start by creating a general-purpose tool that will
automatically parse any file that is created by
ExtractInfo.cpp:
//: C11:FormData.h
#include <string>
#include <iostream>
#include <fstream>
#include <vector>
using namespace std;
class DataPair : public pair<string, string> {
public:
  DataPair() {}
  DataPair(istream& in) { get(in); }
  DataPair& get(istream& in);
  // An empty key means no pair was found:
  operator bool() const {
    return first.length() != 0;
  }
};
class FormData : public vector<DataPair> {
public:
  string filePath, email;
  // Parse the data from a file:
  FormData(char* fileName);
  void dump(ostream& os = cout);
  string operator[](const string& key);
};
///:~
The DataPair class looks a bit like the CGIpair
class, but it’s simpler. When you create a DataPair from an input stream, the
constructor calls get( ) to extract the next pair from that
stream. The operator bool returns false for an empty DataPair (one with no key), which
usually signals the end of an input stream.
FormData contains the path where the original file was
placed (this path information is stored within the file) and the email address of
the user. Because it inherits from vector<DataPair>, it also holds the
DataPairs themselves. The operator[ ] allows you to perform a map-like lookup, just as in
CGImap, although here it is a linear search that returns an empty string when the key is not found.
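The lookup in operator[ ] walks the vector by hand. As a sketch, the same search can be expressed with the standard std::find_if algorithm. This assumes C++11 for the lambda, uses a std::vector of std::pair<std::string, std::string> as a stand-in for FormData, and uses lookup as a hypothetical name. A linear search is reasonable for the few dozen fields a form produces, and unlike a map it keeps the fields in the order they appeared in the file.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Stand-in for FormData, which is itself a vector of key/value pairs.
using Fields = std::vector<std::pair<std::string, std::string>>;

// Return the value stored under key, or an empty string if no field
// has that name (the same convention FormData::operator[] uses).
std::string lookup(const Fields& fields, const std::string& key) {
  Fields::const_iterator it = std::find_if(fields.begin(), fields.end(),
    [&key](const Fields::value_type& field) {
      return field.first == key;
    });
  return it == fields.end() ? std::string() : it->second;
}
```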
Here are the definitions:
//: C11:FormData.cpp {O}
#include "FormData.h"
#include "../require.h"
#include <cstring> // strlen()
DataPair& DataPair::get(istream& in) {
  first.erase(); second.erase();
  string ln;
  getline(in,ln);
  while(ln.find("[{[") == string::npos)
    if(!getline(in, ln)) return *this; // End
  first = ln.substr(3, ln.find("]}]") - 3);
  getline(in, ln); // Throw away [([
  while(getline(in, ln))
    if(ln.find("])]") == string::npos)
      second += ln + string(" ");
    else
      return *this;
  return *this; // Input ended inside a value
}
FormData::FormData(char* fileName) {
  ifstream in(fileName);
  assure(in, fileName);
  require(static_cast<bool>(getline(in, filePath)));
  // Should be start of first line:
  require(filePath.find("///{") == 0);
  filePath = filePath.substr(strlen("///{"));
  require(static_cast<bool>(getline(in, email)));
  // Should be start of 2nd line:
  require(email.find("From[") == 0);
  string::size_type begin = strlen("From[");
  string::size_type end = email.find("]");
  string::size_type length = end - begin;
  email = email.substr(begin, length);
  // Get the rest of the data:
  DataPair dp(in);
  while(dp) {
    push_back(dp);
    dp.get(in);
  }
}
string FormData::operator[](const string& key) {
  iterator i = begin();
  while(i != end()) {
    if((*i).first == key)
      return (*i).second;
    i++;
  }
  return string(); // Empty string == not found
}
void FormData::dump(ostream& os) {
  os << "filePath = " << filePath << endl;
  os << "email = " << email << endl;
  for(iterator i = begin(); i != end(); i++)
    os << (*i).first << " = "
       << (*i).second << endl;
}
///:~
The DataPair::get( ) function assumes you are
using the same DataPair over and over (which is the case in
FormData::FormData( )), so it first calls erase( ) for
its first and second strings. Then it parses the
lines for the key, which is on a single line and is delimited by
“[{[” and “]}]”, and for the value, which may
span multiple lines and is delimited by a begin-marker of
“[([” and an end-marker of “])]”. It
places these in the first and second members, respectively. The lines of a multi-line value are joined with spaces.
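To make the marker format concrete, here is a standalone sketch of the same parsing loop that reads from any istream instead of going through DataPair. The function name readPair and its bool return value are assumptions for illustration. Like DataPair::get( ), it skips lines until it finds a “[{[” key line, discards the “[([” line, and joins value lines with a space until it reaches “])]”.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Read the next key/value pair in the format written by store().
// Returns false when no further key line is found.
bool readPair(std::istream& in, std::string& key, std::string& value) {
  key.erase();
  value.erase();
  std::string ln;
  while(std::getline(in, ln)) {
    if(ln.compare(0, 3, "[{[") != 0)
      continue; // Skip lines until a "[{[key]}]" line
    std::string::size_type close = ln.find("]}]");
    key = ln.substr(3, close == std::string::npos ?
                       std::string::npos : close - 3);
    std::getline(in, ln); // Throw away the "[([" line
    while(std::getline(in, ln) && ln.find("])]") == std::string::npos)
      value += ln + " "; // Value lines are joined with spaces
    return true;
  }
  return false;
}
```

Wrapping a string in a std::istringstream is a convenient way to try it on a few lines of data without creating a file.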
The FormData constructor is given a file name to open
and read. The FormData object always expects there to be a file path and
an email address, so it reads those itself before getting the rest of the data
as DataPairs.
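The two header lines can also be checked without aborting the program. This sketch uses the hypothetical name parseHeader and returns false instead of calling require( ). That may be preferable if you process many files and would rather skip a malformed one than stop.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Read the "///{path" and "From[email]" lines that begin every file
// written by ExtractInfo.cpp. Returns false if either is malformed.
bool parseHeader(std::istream& in, std::string& path, std::string& email) {
  const std::string pathTag("///{"), fromTag("From[");
  std::string ln;
  if(!std::getline(in, ln) || ln.compare(0, pathTag.size(), pathTag) != 0)
    return false;
  path = ln.substr(pathTag.size());
  if(!std::getline(in, ln) || ln.compare(0, fromTag.size(), fromTag) != 0)
    return false;
  std::string::size_type end = ln.find(']', fromTag.size());
  if(end == std::string::npos)
    return false;
  email = ln.substr(fromTag.size(), end - fromTag.size());
  return true;
}
```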
With these tools in hand, extracting the data becomes quite
easy:
//: C11:FormDump.cpp
//{L} FormData
#include "FormData.h"
#include "../require.h"
int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  FormData fd(argv[1]);
  fd.dump();
}
///:~
The only reason that ProcessApplication.cpp is busier
is that it is building the email reply. Other than that, it just relies on
FormData:
//: C11:ProcessApplication.cpp
//{L} FormData
#include "FormData.h"
#include "../require.h"
#include <cstdio>  // tmpnam(), L_tmpnam, remove()
#include <cstdlib> // system()
using namespace std;
const string from("Bruce@EckelObjects.com");
const string replyto("Bruce@EckelObjects.com");
const string basepath("/home/eckel");
int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  FormData fd(argv[1]);
  char tfname[L_tmpnam];
  tmpnam(tfname); // Create a temporary file name
  string tempfile(basepath + tfname + fd.email);
  ofstream reply(tempfile.c_str());
  assure(reply, tempfile.c_str());
  reply << "This message is to verify that you "
    "have been added to the list for the "
    << fd["subject-field"] << ". Your signup "
    "form included the following data; please "
    "ensure it is correct. You will receive "
    "further updates via email. Thanks for your "
    "interest in the class!" << endl;
  FormData::iterator i;
  for(i = fd.begin(); i != fd.end(); i++)
    reply << (*i).first << " = "
      << (*i).second << endl;
  reply.close();
  // "fastmail" only available on Linux/Unix:
  string command("fastmail -F " + from +
    " -r " + replyto + " -s \"" +
    fd["subject-field"] + "\" " +
    tempfile + " " + fd.email);
  system(command.c_str()); // Wait to finish
  remove(tempfile.c_str()); // Erase the file
}
///:~
This program first creates a temporary file to build the email
message in. Although it uses the Standard C library function
tmpnam( ) to create a temporary file name, this program takes the
paranoid step of assuming that, since there can be many instances of this
program running at once, it’s possible that a temporary name in one
instance of the program could collide with the temporary name in another
instance. So to be extra careful, the email address is appended onto the end of
the temporary file name.
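On Linux/Unix, the collision problem can be avoided altogether with the POSIX function mkstemp( ). It is declared in <stdlib.h> on POSIX systems and is not part of Standard C++. mkstemp( ) replaces the trailing XXXXXX in a template with a unique string, then creates and opens the file in one step, so no other process can claim the same name in between. The following is a minimal sketch assuming a POSIX system. The /tmp location and the name makeReplyFile are illustrative choices.

```cpp
#include <stdlib.h>   // mkstemp() (POSIX)
#include <unistd.h>   // write(), close(), unlink() (POSIX)
#include <string>
#include <vector>

// Create a new, uniquely named file containing text. Returns its path,
// or an empty string on failure. The caller removes the file later.
std::string makeReplyFile(const std::string& text) {
  std::string templ("/tmp/replyXXXXXX");
  std::vector<char> name(templ.begin(), templ.end());
  name.push_back('\0');        // mkstemp() needs a writable C string
  int fd = mkstemp(&name[0]);  // Picks the name, creates and opens it
  if(fd == -1)
    return std::string();
  ssize_t written = write(fd, text.data(), text.size());
  close(fd);
  if(written != static_cast<ssize_t>(text.size())) {
    unlink(&name[0]);          // Don't leave a partial file behind
    return std::string();
  }
  return std::string(&name[0]);
}
```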
The message is built, the DataPairs are added to the
end of the message, and once again the Linux/Unix fastmail command is
built to send the information. An interesting note: if, in Linux/Unix, you add
an ampersand (&) to the end of the command before giving it to
system( ), then this command will be spawned as a background process
and system( ) will immediately return (the same effect can be
achieved in Win32 with start). Here, no ampersand is used, so
system( ) does not return until the command is finished,
which is a good thing, since the next operation is to delete the temporary file
that the command uses.
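A small experiment shows the ordering that ProcessApplication.cpp depends on. Because system( ) waits for a command that has no trailing &, the file the shell writes is complete by the time system( ) returns. The sketch below assumes a POSIX shell, and the function name runAndRead is hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>

// Have the shell write a file, then read it back. Without a trailing
// '&', system() returns only after the command exits, so the file is
// complete here. path must not contain spaces or shell metacharacters.
std::string runAndRead(const std::string& path) {
  std::string cmd = "printf done > " + path;
  if(std::system(cmd.c_str()) != 0)
    return std::string();
  std::ifstream in(path.c_str());
  std::string contents;
  std::getline(in, contents);
  in.close();
  std::remove(path.c_str()); // Safe: the command has already finished
  return contents;
}
```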
The final operation in this project is to extract the data
into an easily-usable form. A spreadsheet is a useful way to handle this kind of
information, so this program will put the data into a form that’s easily
readable by a spreadsheet program:
//: C11:DataToSpreadsheet.cpp
//{L} FormData
#include "FormData.h"
#include "../require.h"
#include <string>
using namespace std;
string delimiter("\t");
int main(int argc, char* argv[]) {
  for(int i = 1; i < argc; i++) {
    FormData fd(argv[i]);
    cout << fd.email << delimiter;
    // Named 'it' because redeclaring the loop
    // counter 'i' here would be illegal:
    FormData::iterator it;
    for(it = fd.begin(); it != fd.end(); it++)
      if((*it).first != "workshop-suggestions")
        cout << (*it).second << delimiter;
    cout << endl;
  }
}
///:~
Common data interchange formats use various delimiters to
separate fields of information. Here, a tab is used, but you can easily change it
to something else. Also note that I have checked for the
“workshop-suggestions” field and specifically excluded that, because
it tends to be too long for the information I want in a spreadsheet. You can
make another version of this program that only extracts the
“workshop-suggestions” field.
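One caution applies when choosing a delimiter. If a field value can contain that character, the spreadsheet will split the row in the wrong place. The following is a minimal sketch of a row builder that guards against this by replacing any embedded delimiter with a space. joinRow is a hypothetical helper; quoting the field, as CSV does, is the alternative. Unlike DataToSpreadsheet.cpp, it does not emit a trailing delimiter.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Join fields with delim. Any copy of delim inside a field is first
// replaced with a space, so each field stays in its own column.
std::string joinRow(const std::vector<std::string>& fields, char delim) {
  std::string row;
  for(std::vector<std::string>::size_type i = 0; i < fields.size(); i++) {
    std::string field = fields[i];
    std::replace(field.begin(), field.end(), delim, ' ');
    if(i != 0)
      row += delim;
    row += field;
  }
  return row;
}
```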
This program assumes that all the file names are expanded on
the command line. Using it under Linux/Unix is easy, since the shell expands
wildcard file names (“globbing”) for you. So you say:
DataToSpreadsheet *.txt >> spread.out
In Win32 (at a DOS prompt) it’s a bit more involved,
since you must do the “globbing” yourself:
For %f in (*.txt) do DataToSpreadsheet %f >> spread.out
(Inside a batch file, write %%f in place of %f.)
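A third option lets the program find the files itself, so it behaves the same on every platform. This assumes a compiler with C++17 support and its <filesystem> library, both of which came well after this book was written. The helper name txtFilesIn is hypothetical. It collects the regular files with a .txt extension in one directory and sorts them, because directory iteration order is unspecified.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Return the paths of all regular *.txt files in dir, sorted.
// The extension match is case-sensitive, unlike the Win32 shell.
std::vector<std::string> txtFilesIn(const fs::path& dir) {
  std::vector<std::string> files;
  for(const fs::directory_entry& entry : fs::directory_iterator(dir))
    if(entry.is_regular_file() && entry.path().extension().string() == ".txt")
      files.push_back(entry.path().string());
  std::sort(files.begin(), files.end());
  return files;
}
```

Each returned name could then be handed to FormData in place of argv[i].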
[27] Actually,
Java Servlets look like a much better solution than CGI, but (at least at
this writing) Servlets are still an up-and-coming solution and
you’re unlikely to find them provided by your typical
ISP.
[28] Free Web
servers are relatively common and can be found by browsing the Internet; Apache,
for example, is the most popular Web server on the Internet.
[29] GNU stands
for “GNU’s Not Unix.” The project, created by the Free
Software Foundation, was originally intended to replace the Unix operating
system with a free version of that OS. Linux appears to have replaced this
initiative, but the GNU tools have played an integral part in the development of
Linux, which comes packaged with many GNU components.