Programming Python supplements

This page contains additional descriptions and examples which don't appear in the first edition of the book. Some are clarifications to existing material, and others reflect answers to common questions from readers.

Related pages:


General updates and additions

Last updated: Aug 9, 1998

This section lists updates that should eventually make their way into the book. Some of these are a result of recent changes in Python, and some are a result of me being my own worst critic.

Also see the new supplemental integration examples page for additional supplements not listed below (the 6th item in the list above).


Chapter 15 supplement: registered code can be objects or strings

Chapter 15 tries to stick to general concepts, but in retrospect, the section on "object registration" may have been a bit misleading. It seems to imply that registration is tied to callable objects. It's not: the registered code can actually take the form of objects, code strings, or files. In fact, "registration" is really just a technique for telling C which code to run, by letting Python call through a C extension module.

In general, Python embedded code can take a variety of forms:

- Code strings:      expressions, statements
- Callable objects:  functions, classes, methods
- Code files:        modules, scripts, etc.
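
To make the three forms concrete, here is a minimal sketch at the Python level, using modern Python 3 syntax rather than the release the book covers. The namespace contents, the function and class names, and the script file name are all made up for illustration; an embedding C program would reach the same forms through the API calls covered in chapter 15.

```python
import os
import runpy
import tempfile

namespace = {"x": 4}

# 1) Code strings: an expression yields a value, a statement has an effect
expr_result = eval("x * 10", namespace)      # expression
exec("y = x + 1", namespace)                 # statement: sets namespace["y"]

# 2) Callable objects: functions, classes, and bound methods are called alike
def double(n):
    return n * 2

class Doubler:
    def __init__(self, n):
        self.value = n * 2

    def get(self):
        return self.value

call_results = [double(4), Doubler(4).value, Doubler(4).get()]

# 3) Code files: run a script file and collect its top-level names
with tempfile.TemporaryDirectory() as tmpdir:
    script_path = os.path.join(tmpdir, "embedded_script.py")  # hypothetical name
    with open(script_path, "w") as script:
        script.write("answer = 6 * 7\n")
    file_globals = runpy.run_path(script_path)
```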

And embedded code may be located via:

- Modules:       fetching code by importing modules (on PYTHONPATH)
- Files:         fetching code from simple text files
- Registration:  letting Python pass code to a C extension module
- HTML tags:     extracting code from web pages
- Databases:     fetching code from a database table
- Processes:     receiving code over sockets
- Construction:  code may also be constructed at runtime
- And so on:     system registries, etc.

These categories aren't completely orthogonal. In common practice, code forms suggest location techniques:

- Code string sources:      files, modules, registration, databases, HTML
- Callable object sources:  modules, registration
- Code file sources:        modules, files

And like code sources, communication techniques depend on the code form as well:

- Arguments (input, output), return values (output)
     Objects
- Global variables: copy-in-copy-out (input, output)
     Strings, Objects
- Expression results (output)
     Strings
- Exported C extension module functions (input, output)
     Strings, Objects
- And others (files, sockets, stdin/out streams, etc.)
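
As a rough illustration of the global-variable and expression-result techniques, the sketch below runs code strings against a namespace dictionary in Python 3. The helper name run_code_string and the variable names are assumptions made up for this example; they are not part of the book's API.

```python
def run_code_string(code, inputs, output_names):
    """Run a statement string, passing data through its global namespace."""
    namespace = dict(inputs)       # copy in: inputs become global variables
    exec(code, namespace)          # the embedded code reads and sets globals
    return {name: namespace[name] for name in output_names}   # copy out

outputs = run_code_string(
    "total = price * quantity\nlabel = 'order of %d' % quantity",
    inputs={"price": 3, "quantity": 5},
    output_names=["total", "label"],
)

# Expression results: the value of the expression is the only output
expr_value = eval("price * quantity", {"price": 3, "quantity": 5})
```

Strings can only communicate this way (or through exported functions), since there is no argument list to pass objects through; callable objects get arguments and return values as well.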

The registration technique usually suffers from the fact that it implies an extra coding step and an application structure, and does not readily support dynamic code reloading (there's no associated module). However, it's sometimes better when finer granularity is needed (e.g., Tkinter lets users associate Python objects with a large number of widget objects). The reloading problem may be solved by also registering a module name along with the Python code to be run, and (for objects) applying the indirection techniques discussed in chapter 11 (a raw Python object held by C won't be directly updated if its module is reloaded).
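
The idea of registering a module name and going through an extra level of indirection can be sketched in Python 3 as follows. This is only a sketch under stated assumptions: the handlers table, register, dispatch, and the usercode_demo module are all hypothetical names, and the "reload" is simulated by rerunning code in the module's namespace, which is roughly what a real reload does.

```python
import importlib
import sys
import types

# Hypothetical registry: event name -> (module name, attribute name)
handlers = {}

def register(event, module_name, attr_name):
    handlers[event] = (module_name, attr_name)

def dispatch(event, *args):
    module_name, attr_name = handlers[event]
    module = importlib.import_module(module_name)   # the current module object
    return getattr(module, attr_name)(*args)        # look up at call time

# Stand-in for a user module on the import path (built in memory for the demo)
usercode = types.ModuleType("usercode_demo")
exec("def on_event(x):\n    return x + 1\n", usercode.__dict__)
sys.modules["usercode_demo"] = usercode

register("click", "usercode_demo", "on_event")
held_directly = usercode.on_event        # a raw object reference, as C might hold
first = dispatch("click", 10)

# Simulate a reload: the module's code is rerun in place, rebinding on_event
exec("def on_event(x):\n    return x * 100\n", usercode.__dict__)
after_reload = dispatch("click", 10)     # sees the new version
stale = held_directly(10)                # still runs the old version
```

Because dispatch fetches the attribute by name on each call, it picks up the new function; the directly held object does not.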

Again, chapter 15 was intended as a general look at API tools, which you can apply according to your application's structure. Nobody's complained about the issues above, but to avoid confusion, the categories above should probably be more clearly defined at the start of chapter 15.


Chapter 11 supplement: more on implicit widget object trees

Chapter 11 correctly states that we don't need to keep a reference to a widget object if we won't be using it after creation. For example, the new button object:

Button(parent, ...options...)

doesn't need to be saved away in an instance member, if we won't access the button in the future. The chapter also suggests that this is because Tkinter implicitly builds a widget tree internally, based on the parent widgets we pass to constructors. But this is a subtle point, and I'm not sure it was made as clear as it could have been.

More specifically, when we make a call such as:

Button(parent,...)

Tkinter internally cross-links the new widget with its parent: the new child refers to the parent object in its 'master' member, and the parent refers to the child in its 'children' member (a table). Because of these internal references, the new widget won't "go away" after you create it: it's part of an instance object tree built up by Tkinter. And because of that, you can also do things like:

Button(parent,...).pack()    # returns None!

to save a line--the instance is both inserted into the object tree by Tkinter and returned to you. But note that the "pack" method returns 'None': don't assign the result of a pack call to a variable. Again, if you really need to process the button yourself, use this form instead:

widget = Button(parent,...)
widget.pack()
...use widget...

Either form is fine, but be sure to pick the one that makes sense for the way you'll be processing the widget.
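
To show why the unsaved widget sticks around, here is a toy model of the cross-linking in Python 3. ToyWidget is a made-up stand-in, not Tkinter's real implementation; it only mimics the 'master' and 'children' members described above, and it runs without a display.

```python
class ToyWidget:
    """Toy stand-in for a widget class; not Tkinter's real code."""
    count = 0

    def __init__(self, master=None):
        ToyWidget.count += 1
        self.name = "widget%d" % ToyWidget.count
        self.master = master                   # child -> parent link
        self.children = {}                     # parent -> children table
        self.packed = False
        if master is not None:
            master.children[self.name] = self  # parent now refers to the child

    def pack(self):
        self.packed = True                     # no return statement: returns None

root = ToyWidget()
result = ToyWidget(root).pack()                # widget not saved by us...
kept = list(root.children.values())            # ...but the parent still holds it

widget = ToyWidget(root)                       # second form: keep our own reference
widget.pack()
```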


Chapter 15 supplement: retrieving Python error info in C

Chapter 15 states that PyErr_Fetch can be used to fetch the latest exception type and data, and gives its signature, but doesn't show how to actually use it. Here's a function which illustrates typical usage.

When an API call returns an error indicator (NULL, or integer status code), Python has already set exception information. To retrieve it, call the PyerrorHandler function below, and get the info as C strings from the 2 'save*' vars. Notes:

file: pyerrors.c
 
#include <Python.h>
#include <stdio.h>
#include <string.h>
char save_error_type[1024], save_error_info[1024];

void PyerrorHandler(const char *msgFromC)
{
   /* process Python-related errors */
   /* call after Python API raises an exception */

   PyObject *errobj, *errdata, *errtraceback, *pystring;
   printf("%s\n", msgFromC);

   /* get latest python exception info (this also clears the error indicator) */
   PyErr_Fetch(&errobj, &errdata, &errtraceback);

   pystring = NULL;
   if (errobj != NULL &&
      (pystring = PyObject_Str(errobj)) != NULL &&     /* str(object) */
      (PyString_Check(pystring))
      )
       strncpy(save_error_type, PyString_AsString(pystring),
               sizeof(save_error_type) - 1);            /* bounded copy */
   else
       strcpy(save_error_type, "<unknown exception type>");
   save_error_type[sizeof(save_error_type) - 1] = '\0'; /* always terminated */
   Py_XDECREF(pystring);

   pystring = NULL;
   if (errdata != NULL &&
      (pystring = PyObject_Str(errdata)) != NULL &&
      (PyString_Check(pystring))
      )
       strncpy(save_error_info, PyString_AsString(pystring),
               sizeof(save_error_info) - 1);
   else
       strcpy(save_error_info, "<unknown exception data>");
   save_error_info[sizeof(save_error_info) - 1] = '\0';
   Py_XDECREF(pystring);

   printf("%s\n%s\n", save_error_type, save_error_info);
   Py_XDECREF(errobj);          /* we own all 3 references after the fetch */
   Py_XDECREF(errdata);         /* XDECREF because any of them may be NULL */
   Py_XDECREF(errtraceback);
}
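
For comparison only, the same idea can be sketched at the Python level, where sys.exc_info() plays a role similar to PyErr_Fetch. This is Python 3 code, and the helper name fetch_error_strings is made up for illustration; it is not a substitute for the C function above.

```python
import sys

def fetch_error_strings():
    """Return (type, data) strings for the exception being handled."""
    errtype, errdata, _traceback = sys.exc_info()
    if errtype is not None:
        type_text = errtype.__name__
    else:
        type_text = "<unknown exception type>"
    if errdata is not None:
        data_text = str(errdata)
    else:
        data_text = "<unknown exception data>"
    return type_text, data_text

try:
    {}["missing"]                      # raises KeyError
except KeyError:
    saved_type, saved_info = fetch_error_strings()
```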

Chapter 14/15: string conversions don't copy bytes

Chapter 14 discusses Python-to-C conversions (via the API functions "PyArg_Parse" and "PyArg_ParseTuple"), and chapter 15 makes use of such conversions to extract embedded code results. The discussion and usage aren't in error, but neither chapter underscores a subtle point as strongly as it should: when converting from a Python string object to a C "char*" with the "s" conversion format code, Python assigns the C "char*" the address of a char array embedded in the string object. It doesn't copy out the char array's contents itself.

Why should you care? Because the C "char*" winds up pointing into a Python object, which will be reclaimed (garbage collected) when no longer referenced. If the Python string object is reclaimed while your "char*" is pointing into it, your "char*" might wind up referencing garbage if the Python object's space is reallocated from the heap.

This is a subtle point, and usually isn't an issue as long as you use the "char*" immediately after the conversion, and before calling another Python API function (or mallocing from the heap yourself, in C). In fact, this is why you get a char* from "s" conversions instead of a char array--since the most common practice is to use the char* right away after converting, the extra string copy isn't usually warranted.
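
The difference between pointing into an object and copying its bytes can be illustrated by analogy in Python 3. This is only an analogy: a memoryview never dangles the way a C "char*" can, so the sketch shows the sharing, not the crash. All names here are made up for the example.

```python
buffer = bytearray(b"spam")

view = memoryview(buffer)    # like the "s" result: refers to the object's own bytes
copy = bytes(buffer)         # like copying the chars out to your own buffer

buffer[0] = ord("S")         # the underlying storage changes later on

seen_through_view = view.tobytes()   # reflects the change
seen_through_copy = copy             # unaffected by the change
```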

But if you need to retain the "char*", or want to avoid problems altogether, you can take one of these approaches with the built-in API:

When using "s" output conversion codes with the extended API in chapter 15 (e.g., Run_Function), you don't have control over the original string object (Convert_Result decref's it automatically), so the last solution above (holding the object) won't work. But these will:

In principle, the extended API could copy out the string to a malloc'd buffer automatically, but that implies a speed hit (malloc + copy), and means clients would need to free the result (or provide a big enough buffer--very error-prone). To minimize risks, the next release of the API will probably hold onto the prior string object, and free it on the next conversion call, rather than immediately (it will save at most 1 string object):

Convert_Result(...)
{
    static PyObject *last_string = NULL;    /* string held from the prior call */
    if (last_string != NULL) {
        Py_DECREF(last_string);
        last_string = NULL;
    }
    ...
    if (strcmp(resFormat, "O") != 0) {
        if (strcmp(resFormat, "s") == 0)
            last_string = presult;
        else
            Py_DECREF(presult);
    }
    return 0;
}

Keep in mind that this is only an issue in rare cases (using the "s" code to convert a temporary string output/result), and that nothing happens to the heap between the conversion and the return to your C call. So long as you use the char* immediately, this isn't a problem on any platform I've tested, and nobody has reported an error with the API in a year, so it's not classified as a bug (though the "s" behavior clearly should have been better documented in general).

But to be on the safe side (and much more user-friendly!), the above patch will appear in the second edition, along with other changes (e.g., separate chapters on the built-in and extended embedding APIs; the extended API was originally developed as a compact way to teach embedding, but it's actually being used in production contexts now).


Chapter 2/12: close shelves manually under bsddb

When using shelves (or simple dbm-style files) with the bsddb interface (a.k.a., dbhash), you need to manually close your shelves when you finish writing them (with a shelve.close() call).

Apparently, the deallocation procedure in the bsddb interface doesn't close the file automatically (all other dbm and file objects do), and/or you need some special creation mode flags (I've heard both explanations). The upshot is that you can't rely on auto-close at garbage collection time. Without the manual close() call, your files may be corrupted.

The dbm, gdbm, and dumbdbm interfaces close automatically, and the shelve examples in the book assume this works. But if your code might end up using the current (1.4) bsd file interface, it's better to manually close your shelves, just to be on the safe side.

But what does bsddb have to do with shelves (he asks, rhetorically ;-)? Recall that a shelve is a dbm-like file plus pickling; bsddb will be used for your dbm-like file if it's the only keyed-file interface you have installed (see the "anydbm.py" library module for more details; shelve imports anydbm to get whatever keyed-file interface is available).
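
A minimal sketch of the manual-close habit, written in Python 3 (where shelve still sits on top of whatever dbm-style interface is available). The helper names and the file name are assumptions made up for this example; the point is just that close() runs explicitly, even if a write fails.

```python
import os
import shelve
import tempfile

def save_records(path, records):
    """Write records to a shelve file and always close it explicitly."""
    db = shelve.open(path)
    try:
        for key, value in records.items():
            db[key] = value
    finally:
        db.close()               # don't rely on close at garbage collection

def load_record(path, key):
    db = shelve.open(path)
    try:
        return db[key]
    finally:
        db.close()

with tempfile.TemporaryDirectory() as tmpdir:
    shelf_path = os.path.join(tmpdir, "demo_shelf")    # hypothetical file name
    save_records(shelf_path, {"bob": {"age": 40}, "sue": {"age": 35}})
    loaded = load_record(shelf_path, "sue")
```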


Chapter 15: more on precompiling code strings

There's an (all too brief, and unindexed) sidebar on page 623 which discusses how to precompile Python code strings from C, and how to run the resulting code object. People seem to ask about this often, so here are a few more details.

At least as of Python 1.4, you say something like this:

1) Compile a string to a byte code object

   PyCodeObject* code =  
   Py_CompileString(char *string,      /* code to compile */
                    char *filename,    /* for error messages */ 
                    int parsemode);    /* Py_eval_input or Py_file_input */

2) Run the byte-code object

   PyObject* result =  
   PyEval_EvalCode(PyCodeObject *code, 
                   PyObject *globalnamesdict, 
                   PyObject *localnamesdict);

where the last 2 arguments to PyEval_EvalCode are dictionaries that serve as the namespaces for the code you're running (just as in PyRun_String).
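
The same two steps are available from Python itself through the built-in compile, eval, and exec functions, which may help when experimenting before writing the C version. The sketch below uses Python 3 syntax; the "eval" and "exec" modes correspond to the Py_eval_input and Py_file_input parse modes, and the code strings and dictionary names are made up for illustration.

```python
# 1) Compile strings to byte-code objects (the filename is for error messages)
expr_code = compile("price * quantity", "<embedded expression>", "eval")
stmt_code = compile("total = price * quantity", "<embedded statements>", "exec")

# 2) Run the byte-code objects in explicit namespace dictionaries
global_names = {"price": 3}
local_names = {"quantity": 5}

expr_result = eval(expr_code, global_names, local_names)
exec(stmt_code, global_names, local_names)    # assignment lands in local_names

# A compiled code object can be rerun without recompiling
local_names["quantity"] = 7
rerun_result = eval(expr_code, global_names, local_names)
```

Precompiling pays off mainly when the same string is run many times, since the parse and compile steps are skipped on each later run.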

Hint: When in doubt, also try using the newsgroup 'search' utility at python.org (http://www.python.org/locator) and search on Py_CompileString or PyEval_EvalCode; I'm sure I've answered this one before.


Appendix A: CGI scripts versus Tkinter GUIs

A Python CGI script shows up in Appendix A. In a later edition there may or may not be more on CGI scripting; either way, I think a bit of context about the tradeoffs between CGI scripts and HTML versus traditional GUI APIs like Tkinter may be in order, especially for beginners.

Despite some of the wild comments that seem to pop up in magazines these days, CGI scripts are not a replacement for traditional GUI APIs, except for fairly simple user interactions, and even then only in certain environments. Here's why:

Beyond such distinctions, things become less straightforward. For instance, both CGI scripts and Tkinter programs can be used to implement platform-independent GUIs.

But if pressed, I'd say that CGI scripts and HTML are a less direct way to implement GUIs, and there's a limit to how much they can do (consider implementing an image-processing system in HTML). Because of that, traditional GUI APIs won't go away any time soon; even in Web applications, they are useful for implementing non-trivial GUIs with client-side applets. As usual, the choice between the two depends on your requirements:


Proposed updates for the second edition [future]

Note: there are no plans for a second edition yet (and I don't imagine there will be any until 1999). There will be other printings until then, but they'll just fix minor typos, and not include any new material. The items listed below are a sampling of things I'd like to include in a second edition when it finally happens. Comments are always welcome.

