I've come up with a few rules, and a general component outline. There's still some code in my head, but I'm saving that for later.
1) The map is a connected graph
2) Each node on the map represents a single page
3) Each link between nodes represents an anchor link between pages. Links are directional
4) Each group is a collection of pages
5) There are several kinds of groups: domain group, site group, link group
6) A domain group is a set of nodes that are under the same root domain
7) A site group is a set of pages that are under the same folder on a root domain
8) A link group is a subgraph where all nodes are bi-directionally connected to all other nodes
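Rules 3 and 6 through 8 could be sketched in Java (one of the two candidate languages) roughly as follows. Every name here (WebMapSketch, domainKey, siteKey, isLinkGroup) is a placeholder rather than part of the design, and the domain key is simplified to the full host name, so www.example.com and example.com would land in different domain groups until real root-domain handling is added.

```java
import java.net.URI;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of rules 3 and 6-8; not the final object model. */
final class WebMapSketch {
    // Directed links: source page URL -> set of target page URLs (rule 3).
    private final Map<String, Set<String>> outLinks = new HashMap<>();

    void addLink(String from, String to) {
        outLinks.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        outLinks.computeIfAbsent(to, k -> new HashSet<>()); // the target is a node too
    }

    boolean hasLink(String from, String to) {
        return outLinks.getOrDefault(from, Collections.emptySet()).contains(to);
    }

    /** Rule 8: every page in the set links to every other page, in both directions. */
    boolean isLinkGroup(Collection<String> pages) {
        for (String a : pages) {
            for (String b : pages) {
                // Visiting both (a, b) and (b, a) covers both link directions.
                if (!a.equals(b) && !hasLink(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Rule 6, simplified (assumption): the lower-cased host name stands in for the root domain. */
    static String domainKey(String url) {
        return URI.create(url).getHost().toLowerCase(Locale.ROOT);
    }

    /** Rule 7: host plus the folder part of the path, e.g. "example.com/docs/". */
    static String siteKey(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String folder = path.substring(0, path.lastIndexOf('/') + 1);
        return domainKey(url) + folder;
    }
}
```

Pages that share a domainKey would fall into one domain group, and pages that share a siteKey into one site group. URLs without a host (relative links, for example) would need to be resolved before these helpers are called.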
Components:
Perl spider to collect data, building an XML representation containing:
NODE
    URL
    Title
    Links
        Link Text
        Link URL
    Blurb
MapGenerator, language TBD (Java or C++), loads and parses the XML data file, creating an abstract representation in memory using NODE, LINK, and GROUP objects. It will then expose the map to multiple interfaces, including XML-RPC.
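As a rough idea of the loading step, here is a minimal Java sketch that reads the spider's XML into in-memory objects using the JDK's built-in DOM parser. The element names (map, node, url, title, links, link, text, blurb), their nesting, and the class names (MapLoaderSketch, PageNode, Link) are all assumptions made for illustration; the real file format is not settled yet.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical loader; element and class names are assumed, not taken from the design. */
final class MapLoaderSketch {
    static final class Link {
        final String text;
        final String url;
        Link(String text, String url) { this.text = text; this.url = url; }
    }

    static final class PageNode {
        final String url;
        final String title;
        final String blurb;
        final List<Link> links = new ArrayList<>();
        PageNode(String url, String title, String blurb) {
            this.url = url;
            this.title = title;
            this.blurb = blurb;
        }
    }

    /** Text of the first direct child element called name, or "" if there is none. */
    static String childText(Element parent, String name) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && c.getNodeName().equals(name)) {
                return c.getTextContent().trim();
            }
        }
        return "";
    }

    /** Parses the XML text into a list of pages, each with its outgoing links. */
    static List<PageNode> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<PageNode> pages = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagName("node");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element n = (Element) nodes.item(i);
                PageNode page = new PageNode(
                        childText(n, "url"), childText(n, "title"), childText(n, "blurb"));
                NodeList links = n.getElementsByTagName("link");
                for (int j = 0; j < links.getLength(); j++) {
                    Element l = (Element) links.item(j);
                    page.links.add(new Link(childText(l, "text"), childText(l, "url")));
                }
                pages.add(page);
            }
            return pages;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse map XML", e);
        }
    }
}
```

childText only looks at direct children, so a node's own url is not confused with the url elements nested inside its links. GROUP objects are left out here, since they can be derived from the nodes and links after loading.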
MapAssembler can take info sets from several MapGenerators and combine them. This is intended for use in a distributed mapping effort.
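One simple way the combining step might work, assuming each MapGenerator can hand over its partial map as "page URL -> set of outgoing link URLs", is a plain union keyed on URL. The class name MapAssemblerSketch and the map-of-sets representation are assumptions for this sketch only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical assembler: unions partial maps keyed by page URL. */
final class MapAssemblerSketch {
    /**
     * Merges partial maps. A page reported by several generators ends up
     * as one node whose link set is the union of everything reported for it.
     */
    static Map<String, Set<String>> merge(List<Map<String, Set<String>>> parts) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map<String, Set<String>> part : parts) {
            for (Map.Entry<String, Set<String>> e : part.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                      .addAll(e.getValue());
            }
        }
        return merged;
    }
}
```

This treats two URLs as the same page only when the strings match exactly, so any URL normalization (trailing slashes, case, and so on) would have to happen before merging.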
MapRenderer will then take the map data and present it, probably using Java 3D or OpenGL. (I'm leaning towards Java 3D because it'd be an awesome applet to have on a web site: here "I" am, and here's what's closest to me.)
All four levels will have filtering capability, based on group type, group membership, link counts, etc.
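Filtering could be expressed as a predicate over page URLs that any of the components applies to its view of the map. The sketch below is only one possible shape: MapFilterSketch, filter, and minOutLinks are made-up names, the map is again assumed to be "page URL -> set of outgoing link URLs", and only a link-count filter is shown.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical filter helper shared by all components. */
final class MapFilterSketch {
    /** Keeps only the pages accepted by keep, and drops links that point at removed pages. */
    static Map<String, Set<String>> filter(Map<String, Set<String>> map, Predicate<String> keep) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : map.entrySet()) {
            if (!keep.test(e.getKey())) {
                continue;
            }
            Set<String> targets = new TreeSet<>();
            for (String t : e.getValue()) {
                if (keep.test(t)) {
                    targets.add(t);
                }
            }
            out.put(e.getKey(), targets);
        }
        return out;
    }

    /** Link-count filter: accepts pages with at least min outgoing links in the original map. */
    static Predicate<String> minOutLinks(Map<String, Set<String>> map, int min) {
        return url -> map.getOrDefault(url, Collections.emptySet()).size() >= min;
    }
}
```

Because the filters are ordinary Predicate values, filters on group type or group membership could be combined with the link-count one through Predicate.and and Predicate.or.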