Mon Jan 17 07:32:40 PST 2005
Updated for new site Fri Feb  2 12:57:49 PST 2007

Swish-e site.

This site is maintained in CVS.  Pages should not be modified directly.
Check out the site from cvs, and modify the pages locally.  Rebuild
locally to review and then check in.

Prerequisites
-------------

The one thing I hate about Perl:

    HTML-Parser-3.56
    HTML-Tree-3.23
    HTML-FillInForm-1.06
    hypermail-2.1.8
    Pod-POM-0.17
    swish-e-2.4.5
    Template-Toolkit-2.15
    TimeDate-1.16  (for Date::Parse)
    DateTime-0.36  (let's see how much code can be brought in for simple usage)
    Params-Validate-0.87
    DateTime-Locale-0.33
    DateTime-TimeZone-0.59
    Module-Build-0.2806
    Class-Singleton-1.03

There are likely other dependencies, depending on your system's base
configuration.  Note, the versions listed were the current CPAN versions
and don't necessarily reflect the minimum versions required.

Overview
--------

The website generation has a few dependencies.  There are scripts to
generate the various parts.  Here's an overview of what's needed before
the site can be generated.

- swish-e current release source

    Used to build the swish-e docs.

- swish-e cvs checkout

    Used to build the nightly build, and the docs from that.

- archive (hypermail)

    Not actually required to build the site, as the site just points to
    the hypermail archive.  But a single script might be used in the
    future to display the archive, and then it would be a true
    dependency.

    Currently, procmail feeds new posts to hypermail to add them to the
    archive.  The posts are also archived in the "mbox" directory by
    month in mbox format.

- indexes

    When the archive is updated a flag is set and then cron can re-index
    the hypermail archive.

    Also, the index of the site is needed, but the site has to exist
    first...

- swish-daily

    This is where daily tarballs are placed.  The website creates an
    index page from the list of files.

- swish-release

    Similar to the swish-daily directory, but for releases.

Directory Structure
-------------------

A "top level dir" contains a number of directories.
This top level directory will be called $ROOT below.

    $ROOT/
        swish_daily_build/      - where the daily builds are created
        swish-daily/            - daily tarballs
        swish_release_build/    - where the release is built (for the docs)
        distribution/           - releases (tarballs, .exe, old stuff)
            old/                - this is needed.
        mbox/                   - mailbox archives by month
        archive/                - hypermail archive
        indexes/                - indexes for the archive and website
        public_html             - bulk of generated website.

Of those directories, public_html is the site's Document Root, and
"archive", "distribution", and "swish-daily" are aliased in the Apache
configuration.

The names of the directories do not have to be as shown above and don't
even need to all be under the same directory.  But, if the names above
are used then the site build script can be invoked with simpler
arguments.

Notes on building the HTML docs
-------------------------------

There's somewhat of a circular dependency.  Before the website can be
built the swish-daily needs to be generated.  This makes the pod docs
available for generating the documentation.

But, to build a daily tarball that includes the HTML docs the website
code needs to be installed.  So, the swish_website needs to be checked
out from subversion first.

The swish-e configure script looks for the program "build-swish-docs" in
the path.  This is typically a symlink to the swish_website/bin/build
program.  This program is used to build the HTML documentation that is
placed in the tarball generated by the swish-daily.pl program.

Creating the above directories
------------------------------

The following examples use $ROOT as the top-level directory above.  This
might be /opt/swish, for example.

swish_website
-------------

The website isn't generated yet, but the code is needed before building
the daily tarballs.

This example places swish_website below $ROOT.  This is not a
requirement, but is nice if you want to keep everything in one place.

    svn co http://svn.swish-e.org/swish_website/ $ROOT/swish_website
    ln -s $ROOT/swish_website/bin/build /usr/local/bin/build-swish-docs

The actual site will be built later.

swish_daily_build / swish-daily
-------------------------------

Use the swish-daily.pl script to create the daily builds.  If "topdir"
is not specified the script uses the current working directory, and by
default it checks the source out from trunk.  But, it's probably more
explicit to pass in the build directory by specifying "topdir".

    $ROOT/swish_website/bin/swish-daily.pl \
        --topdir=$ROOT/swish_daily_build \
        --tardir=$ROOT/swish-daily \
        || echo "Problem building swish-daily"

The script will exit non-zero if there's a problem and any errors should
be in the log files.  This should be run nightly.

swish_daily_build will be created if it doesn't exist, but the tardir
directory must exist.

    mkdir $ROOT/swish-daily

This does a fresh checkout.  Obviously, doing a svn update would be much
faster, but it is better to test a fresh checkout.

After all that, there's also a script Make_Daily.sh that does the above.

swish_release_build / distribution
----------------------------------

This is only run when a new release is created.  But the current release
source needs to be available so the documentation can be generated on
the website.

This is similar to the above, but the source is not fetched via svn,
rather from the current release tarball.  Also, assuming the tarball is
already in the releases directory there's no need to write it to the
tardir.

    $ROOT/swish_website/bin/swish-daily.pl \
        --fetchtarurl=http://swish-e.org/distribution/latest.tar.gz \
        --topdir=$ROOT/swish_release_build \
        --notimestamp \
        --noremove \
        --tardir=$ROOT/distribution \
        || echo "Problem building release"

There's also a Make_Release.sh script.  Review it before use.  Running
--help will show available options.

Note that this can be used to generate an actual release (by fetching
from svn instead), but it's better to generate the release separately
and test it.  In that case REMOVE the --tardir option above.

mbox / archive
--------------

Copy the existing mbox directory from wherever it is currently located.

Generate the hypermail archive.  This needs to be done from the
hypermail directory since that's where the templates are located.  This
step is only needed to create the initial archive.  The archive is added
to as new messages arrive (e.g. via procmail).

    cd $ROOT/swish_website/hypermail

    gzip -dc $ROOT/mbox/*.gz \
        | hypermail \
            -i \
            -c hypermailrc \
            -d $ROOT/archive.new \
        && mv $ROOT/archive $ROOT/archive.old \
        && mv $ROOT/archive.new $ROOT/archive

Note, it would really be good to dynamically display the archive so that
the headers and footers can be shared.

public_html
-----------

Now the website can be created.

The bin/build script is used to generate the website.  The script will
detect when files need to be generated, or passing --all will tell it to
generate the complete site.

There are a lot of options, and running build --help will list them.
The same build script is used when building from cvs to generate the
html docs for the tarball.

If the directories above are all in $ROOT and use the same names as
above, then the script can be called like this:

    $ROOT/swish_website/bin/build --root $ROOT --all

(The --all is optional.)

This is the same as calling the script and explicitly setting all the
directories:

    $ROOT/swish_website/bin/build \
        --dest      $ROOT/public_html \
        --indexes   $ROOT/indexes \
        --swishsrc  $ROOT/swish_release_build/latest_swish_build/source \
        --develsrc  $ROOT/swish_daily_build/latest_swish_build/source \
        --download  $ROOT/distribution \
        --daily     $ROOT/swish-daily \
        --all

httpd.conf
----------

Setting up Apache is just a matter of pointing Apache at public_html,
and creating aliases for the distribution, swish-daily, and the archive
directories.
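
For orientation, a hand-written version of that setup might look roughly
like the sketch below.  It is not what bin/build emits.  It assumes $ROOT
defaults to /opt/swish (the example location mentioned earlier) and that
the aliases are mounted at /archive, /distribution and /swish-daily; only
/distribution is suggested by the release URL used earlier, the other two
URL paths are guesses.  The function name apache_fragment is made up for
this example, and access control (<Directory> blocks) is left out.

```shell
#!/bin/sh
# Sketch only: print an illustrative Apache fragment for the layout
# described in "Directory Structure".  Not the output of bin/build.

# Assumption: fall back to the example top-level directory.
ROOT="${ROOT:-/opt/swish}"

# Hypothetical helper: writes the fragment to stdout.
apache_fragment() {
    cat <<EOF
# public_html is the site's Document Root
DocumentRoot "$ROOT/public_html"

# Directories living outside public_html (URL paths are assumptions)
Alias /archive      "$ROOT/archive"
Alias /distribution "$ROOT/distribution"
Alias /swish-daily  "$ROOT/swish-daily"
EOF
}

apache_fragment
```

Redirecting the output to a file gives a fragment to compare against the
generated configuration.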

The same bin/build script can be used to generate an httpd.conf file.

    $ROOT/swish_website/bin/build --root $ROOT --apache > httpd.conf

You will likely need to specify the full path to the modules
(-module_dir) unless you first cd to the ServerRoot directory and
"modules" is a subdirectory.

You may notice a few warnings:

    Warning: module '/usr/lib/apache2/modules/mod_access.so' was not found
    Warning: module '/usr/lib/apache2/modules/mod_auth.so' was not found
    Warning: module '/usr/lib/apache2/modules/mod_log_config.so' was not found

This is normal on Apache 2.2 since those modules were renamed.  The
script will try the new names for those modules.

For debugging locally, say just the web site:

    mkdir $ROOT/indexes

    $ROOT/swish_website/bin/build \
        --root $ROOT \
        --apache \
        --port 8080 \
        --ipaddr `hostname` \
        --domain `hostname` \
        --user `whoami` \
        --group `whoami` \
        --nodev_site \
        --nolists_site \
        --nosvn_site \
        --module_dir /usr/lib/apache2/modules \
        > httpd.test

The only reason to disable sites might be if you don't have the modules
installed locally.

Another example, for the p3 machine:

    $ROOT/swish_website/bin/build \
        --root $ROOT \
        --apache \
        --port 8080 \
        --ipaddr 70.42.42.162 \
        --domain p3.swish-e.org \
        --user `whoami` \
        --group `whoami` \
        --module_dir /usr/lib/httpd/modules \
        > httpd.test

    mkdir logs run

    /usr/sbin/httpd -d $(pwd) -f httpd.test

indexes
-------

This directory contains two indexes: one for the archive and another for
the website.

This creates the index of the archive:

    mkdir $ROOT/indexes

    $ROOT/swish_website/bin/index_hypermail.pl $ROOT/archive \
        | swish-e \
            -S prog \
            -i stdin \
            -c $ROOT/swish_website/etc/archive.conf \
            -f $ROOT/indexes/archive.swish-e \
            -v0

This should be a cron task.  Normally a flag is set when the archive is
updated (a new message arrives).

This creates the index of the website.  The site is spidered, so Apache
must be running.
This uses the spider installed with swish-e -- and runs the spider via
the swish config (instead of piping the spider's output to swish).
Therefore, you must cd to the website directory:

    # Note the SWISH_SITE if testing on a different host
    cd $ROOT/swish_website && \
        SWISH_SITE=http://p3.swish-e.org \
        SPIDER_QUIET=1 \
        swish-e \
            -c etc/swish.config \
            -f $ROOT/indexes/index.swish-e \
            -S prog \
            -v0

Crontab
-------

TODO

Cron is used for a number of tasks.  The user running cron needs to have
write access to the $ROOT directory.

Cron is used for:

    1) create the daily builds

    2) cvs update the website and rebuild if necessary (including
       reindexing)

    3) cvs update and rebuild the website nightly using --all, just for
       good measure

    4) reindex when new archive messages have been added

See etc/crontab

Procmail
--------

Procmail is used to look for email messages from the swish-e list.  A
message is saved to the mbox directory and piped to hypermail to be
included in the html archive.  A flag is set to let cron know that the
archive needs to be re-indexed.

See etc/procmailrc

    ln -s $ROOT/etc/procmailrc .procmailrc

============================= NOTE ====================================================

Note: Sat Feb  3 20:36:10 PST 2007

Some of the content below has not been reviewed and may be inaccurate.

swish_website layout
--------------------

Here's an overview of the files that are used to build the website.

    ./
        -> src              - web source docs.
        -> bin              - bin/build script and other utilities
        -> lib/config       - site config templates.
        -> lib/config/site  - contains swish-e current version (see TODO)
        -> lib/config/map   - defines layout of site and site's menu
        -> lib/page         - templates that define the look of each page
        -> lib/pod_toc      - top-level index for pod docs

Note: if speedy (speedyCGI) is installed and in the path at build time
the search script will run that instead of perl.

Download and swish-daily directories
------------------------------------

These directory listings are *static* pages.
If a new file is added to one of these directories you need to touch the
source file or rebuild all pages:

    $ touch src/download/index.html && bin/build -v

or

    $ bin/build -all -v

Building the html docs for distribution
---------------------------------------

Specify the pod destination directory:

    bin/build -all -swishsrc=$HOME/swish-e -poddest=$HOME/swish-e/html

This adjusts the links so that every non-pod link points to the
swish-e.org site; pod links are local.

Content Creation
----------------

Links:

All internal pages can (and probably should) be defined in the
config/map file.  This is mostly for defining the menu, but entries can
be marked "hidden".  Even off-site links can be entered (look for the
link to CVS).

Entries in the config/map file can then be linked to using a macro:

    Check out the [% link_to_page('readme#Key features', 'full list of features') %].

That will find the (first) menu item that has "readme" for an id and
create a complete .. tag.  Notice that it's possible to use fragments.

The macro has at least three parameters:

    1) the menu id to search for
    2) optional text to use for the link -- otherwise the menu text is
       used
    3) flag to say just return the href part of the link

TODO: add MACROS for page sections and the ability to create a TOC at
the top of the page.

Indexing the site for searches
------------------------------

To index the site run from the top level directory:

    $ SWISH_SITE=http://swish-e.org swish-e -c etc/swish.config -S prog

SWISH_SITE is the top level URL.  START_FILE is the starting file.
Default is index.html

    (SWISH_SITE=http://swish-e.org/docs START_FILE=readme.html to index
    just one file)

That writes the index to public_html/search/

Or to run it quiet:

    $ SPIDER_QUIET=1 SWISH_SITE=http://localhost/apache/public_html \
        swish-e -c etc/swish.config -S prog -v0

Capture output (for debugging) like this:

    $ SWISH_SITE=http://localhost/apache/public_html \
        /usr/local/lib/swish-e/spider.pl etc/spider.config > out

todo: add metaname selection and create build script that is built from

TODO
----

- add 404 and 500 pages (should they be cgi scripts to show the problem
  path, or can people figure that out on their own?)

- Have title be part of the layout (instead of added on each page)?

- [done] Fetch version from -swishsrc/configure instead of hard-coded
  in config/site

- Split out some of the lib/page templates into smaller components that
  can be overridden separately.

- add to META on each page [author = '$Author$'] and maybe also $Date$,
  although lastmod is probably fine.  Waiting until I can think of a
  better way to do the same for the pod files.

- [done] test speedycgi and install on sunsite

- [done - not really needed under speedy] enable template caching for
  the search script

- [done] Probably should filter all hrefs= and name= through |uri|html.
  Plus some are very large.  Maybe shorten or md5 them?  If so, then
  need to adjust link_to_page().
  [Well, partially done.  Pod::POM prevents access to the #fragment so
  there are spaces in hrefs, which is ok, I suppose, but other chars
  could mess things up.]

- [done - just rebuild, doesn't take any time] Try (again) to find a way
  to detect when a cvs update doesn't really update any files.

- [done - see above] Add disk cache for table of contents for each POD
  page so index.html can be regenerated without having to parse all
  pods.  Not that parsing a small handful of pods is a big deal....

- [done] add timezone to dates and make UTC.

- Set fixed pitch font for timestamp on file listings.  Currently ugly.