embedded in application snntpbatch :

Features and Options of nntpclient

This software connects to news servers via NNTP and fetches messages, which
it stores in the filesystem. Image files encoded as uuencode, MIME multipart,
or yEnc (see -allow_yenc) are extracted, decoded, and stored separately.
There are also provisions for automated handling of split messages, which
are often referred to as "multi part".

The program can also upload simple messages with at most one binary file
attached to each of them. More than one file can be given, though, to
produce a corresponding number of messages. The name of each binary file is
appended to the subject line of its message.

All downloaded data and nearly all configuration are stored in a directory
tree below a common base directory.
(The only exception: $HOME/.snntpbatchrc )

Some simple HTML features are inserted into lists and messages. Use your
browser to bookmark the file main_index.html in the base directory. This
index of servers and groups is maintained automatically by -subscribe.

Messages can be selected directly via their message id or by filtering with
regular expressions (see man regex, man grep, man ex ...).
If a message filename already exists in the target directory, then the
message including all its images is skipped unless "-overwrite on" is set.

---------------------------------------------------------------------------

                             Getting Started

---------------------------------------------------------------------------

--- Setup of Servers and Groups ---

Create the base directory, the directory for your server, and the
directories which hold the message id list :

  $ basedir=$HOME/snntpbatch_download
  $ newsserver=INSERT_NEWSERVER_ADDRESS_HERE
  $ mkdir $basedir
  $ cd $basedir
  $ snntpbatch -subscribe $newsserver 119 -

A typical news server address looks like "news.nowhere-net.com".

You may now start your web browser and bookmark the main index page.
It is still quite empty :

  $ netscape $HOME/snntpbatch_download/main_index.html &

  -----
  no groups subscribed
  -----

If your news service requires a user id and a password, you should edit the
main startup file and insert a -login line :

  $ touch $HOME/.snntpbatchrc
  $ chmod og-rw $HOME/.snntpbatchrc
  $ vi $HOME/.snntpbatchrc

A typical login line looks like

  -login JOHNDOE 8HM1RX

If you use more than one news server, then you may put this line into a
local startup file (see below). But for a first try, $HOME/.snntpbatchrc
is ok.

The following action may take some time, so you may skip it until you
really need it. A list of available newsgroups can be obtained by this
command :

  $ snntpbatch -server $newsserver -list_groups

Read the result file in the server's directory

  $ view ${newsserver}.port119/grouplist

It is presented in raw format as sent by the server. Four fields per line :

  groupname last_msg_number first_msg_number posting_yes_or_no

Create the directories and files needed for the first news group :

  $ newsgroup=alt.binaries.pictures.fantasy-sci-fi
  $ snntpbatch -subscribe $newsserver 119 $newsgroup

If you reload the bookmarked main index, you will now see that group listed.

  -----
  Serverdirectory YOUR_NEWSSERVER.port119
  -----
  Group alt.binaries.pictures.fantasy-sci-fi
    first message |group index |most recent index (start message) |last message
  -----

The links stay dead until a first -list_interval and -fetch_interval have
been performed.

--- Daily usage ---

Create a message index. It will contain some header information about those
messages which have been added to $newsgroup on server $newsserver since
the last -list_interval :

  $ snntpbatch -server $newsserver \
               -group $newsgroup \
               -list_interval new last

This creates an HTML file list_????????_????????.html in the group
directory, which is also accessible as a copy named
list_00000000_00000000.html .
On the main index page, the links "most recent index" and "group index" of
the group should be alive now.
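
When choosing a first group to subscribe to, it can help to summarize the
raw grouplist file mentioned above. The following is only a sketch: it
assumes the four blank-separated fields described for the grouplist
(groupname, last_msg_number, first_msg_number, posting_yes_or_no), and
group_estimate is a hypothetical helper name, not a part of snntpbatch. The
count it prints is a rough upper bound, because servers may have gaps in
their message numbering.

```shell
#!/bin/sh
# Sketch: summarize grouplist lines of the form
#   groupname last_msg_number first_msg_number posting_yes_or_no
# group_estimate is a hypothetical helper, not an snntpbatch feature.
# Output per line: groupname estimated_article_count posting_flag
group_estimate() {
  awk 'NF >= 4 {
         last  = $2 + 0          # "+ 0" forces numeric conversion
         first = $3 + 0
         n = (last >= first) ? last - first + 1 : 0
         printf "%s %d %s\n", $1, n, $4
       }' "$@"
}
```

Possible usage, assuming the directory layout from the setup above:

  $ group_estimate $basedir/${newsserver}.port119/grouplist | sort -k2,2nr | head
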

Fetch all recently listed articles whose subject mentions "Royo" but not
"Repost". After downloading, the HTML hyperlinks in the messages and index
files are adjusted :

  $ snntpbatch -server $newsserver \
               -group $newsgroup \
               -filter subject: '[Rr][Oo][Yy][Oo]' \
                       -and -not subject: '[Rr][Ee][Pp][Oo][Ss][Tt]' \
               -fetch_interval list_start list_end \
               -adjust_links - - -

The texts [Rr][Oo][Yy][Oo] and [Rr][Ee][Pp][Oo][Ss][Tt] are regular
expressions as used with programs like grep, vi, sed.
See also: man 7 regex
If there is no '-filter', then all messages in the given range get fetched.

As soon as the first message has been downloaded, it is accessible via the
link "(start message)" on the main index page. The "next" link within a
message page gets adjusted when the download of the next message begins.
So wait until that happens before you enter a message page, or reload the
page if the "next" link is still dead.
The links "first message" and "last message" come alive only after the
-adjust_links command.

If there are many Royo images, or if you do not like them, you may cancel
the run by creating an empty file nntpclient_stop in the group directory

  $ touch $basedir/${newsserver}.port119/$newsgroup/nntpclient_stop

This stops the whole program run after the current article has been
downloaded.
A less rigid way is to set the control index to a ridiculously high value.

  $ echo 2000000000 \
      > $basedir/${newsserver}.port119/$newsgroup/nntpclient_index

This ends only the running -fetch_interval command and not the commands
which follow it.
You may resume the download by running the same snntpbatch command again.

Usually the indexing, fetching, and relinking are done within shell scripts.
So updating all groups can be a matter of a single command without any
arguments.

If you see interesting messages in the list which are not fetched by your
automated jobs, then you can get them by simple calls like these two.
Fetch the single message 110233 :

  $ snntpbatch -server $newsserver \
               -group $newsgroup \
               -fetch 110233

Fetch messages 110200 to 110399 . Get only those with subject about "Space"
or "space" :

  $ snntpbatch -server $newsserver \
               -group $newsgroup \
               -filter subject: '[Ss]pace' \
               -fetch_interval 110200 110399

The following filter prevents double downloading of files, provided the
names can be guessed from the subject line.
It accepts subjects with '[Ss]paceship' or '[Qq][Mm][Aa][Nn]'. If the
subject text of a 'Qman' message contains a filename with more than 4
characters before the first dot, then this name is checked. The message is
only accepted if the filename (compared case-insensitively) does not
already exist in the hashed directory $filerecord . After the file is
accepted, an empty file with this name is created within the directory.
The hashing structure is created automatically, but at least the parent
directory of $filerecord must already exist.
For details, see the explanation of option -filter below and keep in mind
that a test after -and is only performed if the test before this -and was
true. So only those 'Qman' messages which are not already known get
recorded by "subjectfilerec:".

  -filter \
     subject: '[Ss]paceship' \
     -or -sub \
       subject: '[Qq][Mm][Aa][Nn]' \
       -and -not -sub \
         subjectfile: '......*\..*' \
         -and subjectfilefound: "$filerecord" \
       -subend \
       -and subjectfilerec: "$filerecord -return_true" \
     -subend \

Complicated filters should be written into a separate file and employed by
-read_from . In that case one does not need \ at line ends and does not
have to watch out for most characters which are special to the shell.
Quotation marks '' and "" are interpreted as by the shell.
No variable substitution is performed, so you have to insert the value of
$filerecord as a constant text :

  -filter
     subject: '[Ss]paceship'
     -or (
       subject: '[Qq][Mm][Aa][Nn]'
       -and -not (
         subjectfile: '......*\..*'
         -and subjectfilefound: "INSERT_$FILERECORD"
       )
       -and subjectfilerec: "INSERT_$FILERECORD -return_true"
     )

Another approach, which may be more familiar to programmers, is case
distinction by a chain of "-if" expressions and "return:" decisions :

  -filter
     -if
       subject: '[Ss]paceship'
     -then
       return: true
     -elseif
       subject: '[Qq][Mm][Aa][Nn]'
       -and -not (
         subjectfile: '......*\..*'
         -and subjectfilefound: "INSERT_$FILERECORD"
       )
     -then
       subjectfilerec: "INSERT_$FILERECORD -return_true"
       -and return: true
     -else
       return: false
     -endif

Note that there is no support for statement sequences. So the test
"subjectfilerec:", which records the filename, and the test "return:" need
to be joined by "-and" or "-or". Because of "-return_true", the operator
"-and" is the right one to cause "return:" to be performed.
C programmers: do not use "-else -if" but "-elseif".

If you expect a lot of downloaded data and your $HOME filesystem is small,
you may consider copying $HOME/snntpbatch_download to a larger filesystem.
Either set the new base directory by
  -basedir {new_name_of_snntpbatch_download}
at the very start of $HOME/.snntpbatchrc , or establish a link from
$HOME/snntpbatch_download to the new location.
One may also move some subtrees of the base directory to other filesystems
if needed. In that case, links are the only means to keep it working.

A happy snntpbatch installation can easily download hundreds of megabytes
per day. So keep an eye on your disk usage.

--- Disk Cleaning ---

Remove from the newsgroup directory all messages and binary files which are
older than 14 days. Remove message index files which only contain expired
messages. This option also adjusts the links of messages and index lists.

  $ snntpbatch -remove_old_files $newsserver 119 $newsgroup -14d

Remove message ids which are older than 30 days from the directories below
msg_idlist . These files form a database which prevents multiple downloading
of identical messages. They also contain some header information.

  $ snntpbatch -remove_old_ids -30d

When creating new directories, -subscribe imposes a limit of at least 3 days
for groups and message ids (via file sfile_remove_old_files_allowed ).

--- Posting ---

For posting messages, a server and a group have to be set. Usually some id
information is set before posting is performed.

Post three images from stic's sample collection to a test newsgroup :

  $ cd $(cat $HOME/.stic_main_dir)
  $ newsgroup=alt.test.binary.posting
  $ snntpbatch -server $newsserver \
               -group $newsgroup \
               -post_id my_fantasy_host my_fantasy_name my_organization \
               -post "please ignore - just a test" "some text" \
                     images/birds/*

This will result in three messages with the subject lines :

  please ignore - just a test - 00000010.jpg (10K)
  please ignore - just a test - 00000055.jpg (15K)
  please ignore - just a test - 00000058.jpg (17K)

The body texts will just consist of the words "some text".

After each posted message the program waits for confirmation by the server.

  message sent. waiting for confirmation ...
  ... accepted.

In case of failure, you will get a reaction like this

  message sent. waiting for confirmation ...
  ... rejected. status = 441 Invalid domain in from-header

This server does not like my_fantasy_host. Give it a more realistic one.

Since some servers send error messages without CR LF at the end, the
program might get stuck in this phase. The received part of the error
message should then be visible at the terminal's bottom line. Sorry to say,
there is no remedy against this effect yet.

You will have to -subscribe $newsserver 119 $newsgroup to see the result.
It may take some time until the messages appear on your news server.

Post a reply to the message with id <38B1C855.3D17F85A@home> :

  $ snntpbatch -server $newsserver \
               -group $newsgroup \
               -post_id my_fantasy_host my_fantasy_name my_organization \
               -post_ref '<38B1C855.3D17F85A@home>' \
               -post "please ignore - just a test" \
                     "testing follow-ups" \
                     images/misc/00000059.jpg

It is also ok to give the message id without brackets

  38B1C855.3D17F85A@home

Cancel the message with id <09400808BE5838D17FCF8088@my_fantasy_host> :

  $ snntpbatch -server $newsserver \
               -group $newsgroup \
               -post_id my_fantasy_host my_fantasy_name my_organization \
               -cancel '<09400808BE5838D17FCF8088@my_fantasy_host>'

Cancelling does not work with all news servers, especially if
-post_own_msgid is set to "on".

---------------------------------------------------------------------------

                             Advanced Issues

---------------------------------------------------------------------------

--- Downloaded Data ---

The directory structure for storing downloaded data is given by a base
directory, the server name, the port number, and the group name:

  $HOME/snntpbatch_download/${servername}.port${portnumber}/${groupname}

The base directory (default: $HOME/snntpbatch_download) can be set with the
option -basedir .
This directory structure can be created by option -subscribe if the base
directory already exists.

Message texts are stored as "msg${messagenumber}.html" in their group's
directory. ${messagenumber} is padded with zeros to 8 digits.

Binary files (images) are stored in a subdirectory called "bin".
The image filenames are read from the message content. If a name collides
with a filename which is already in the target directory, then it is
extended by "_" and a 7 digit number. A suffix after the last dot in the
filename is preserved.

The binary content of split messages may also be stored in "bin", provided
the message subject gives hints which are clear enough. See below for
combining and decoding of such parts.
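
The naming rules above can be sketched in shell, for example to locate a
downloaded message from a script. This is only an illustration: group_dir,
msg_file, and alt_bin_name are hypothetical helper names, not parts of
snntpbatch. In particular, placing the 7 digit number in front of the
preserved suffix is an assumption drawn from the description above, and the
number which the program actually chooses is not specified here.

```shell
#!/bin/sh
# Hypothetical helpers which mirror the naming rules described above.

# Group directory: basedir/servername.portNNN/groupname
# usage: group_dir basedir servername portnumber groupname
group_dir() {
  printf '%s/%s.port%s/%s\n' "$1" "$2" "$3" "$4"
}

# Message file name: message number padded with zeros to 8 digits.
# Pass the number without leading zeros (printf would read them as octal).
msg_file() {
  printf 'msg%08d.html\n' "$1"
}

# Alternate name for a colliding binary: "_" plus a 7 digit number,
# keeping the suffix after the last dot (assumed placement, see above).
# usage: alt_bin_name filename number
alt_bin_name() {
  name=$1
  num=$2
  case "$name" in
    *.*) printf '%s_%07d.%s\n' "${name%.*}" "$num" "${name##*.}" ;;
    *)   printf '%s_%07d\n' "$name" "$num" ;;
  esac
}
```

Possible usage:

  $ view "$(group_dir $basedir $newsserver 119 $newsgroup)/$(msg_file 110233)"
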

Content summaries (generated by -list_interval) are stored in the group's
directory as files with the name list_${first_number}_${last_number}.html.
The file "list_start" contains the first number. The file "list_newest"
contains the number of the last message header actually read. In case of
connection problems this number can be smaller than the original
${last_number}. The list file is then renamed accordingly. A copy of the
most recent successfully created list file is available as
list_00000000_00000000.html .

Each link of the main index leads to a group index list_index.html which
lists links to content summaries. A content summary consists of links which
lead to single message files, and is therefore called a message index.
Usually the message files get downloaded later, so these links do not work
immediately after creating the message index.

The final link structure is generated by option -adjust_links after the
desired messages have been loaded. This operation also generates the group
index (all available message indices in the group). It should be performed
whenever a bunch of new messages has been downloaded from a group.

After downloading, the messages fetched by a single -fetch_interval are
connected by a preliminary link chain :
"previous" points to the message fetched before.
"next" points to the message which was fetched afterwards. This link gets
its final content only after the download of that following message has
started.
Messages downloaded by -fetch simply point to the neighbouring message
numbers, whether they exist or not.

If a message contains references to other messages, the last one is checked
for whether it points to a downloaded message in the same group. If so, then
a link to that message is inserted into the header of the freshly
downloaded one. A link back is also inserted into the referenced message.

--- Multitasking ---

Each download process operates a single connection to the news server.
The processes are prepared to coexist even if they are downloading from the
same server and group simultaneously. The standard precautions ensure that
the processes do not get confused by possible peers.

Nevertheless, for such simultaneous downloads, there are two issues where
more coordination is desirable :

-fetch_interval should wait until one of the peers has finished its run of
-list_interval. This can be achieved by the option -wait_for_file which
waits until a certain file gets modified, but may also accept past but
recent modifications. If this option watches the file list_updated in the
group directory, then it will only go on with the other options when the
-list_interval of a peer has ended.
Example. Go on if the last -list_interval ended less than half an hour
(1800 seconds) ago, or else wait for a new -list_interval to be finished :

  snntpbatch -server ... -group ... -wait_for_file list_updated 1800 \
             -filter ... -fetch_interval ...

During -fetch_interval the peers should avoid applying the filter tests to
those messages which have already been tested by others using the same
filter. The option -coop_idx allows skipping longer runs of already tested
messages and makes the other peers sleepy while one of them is heavily
filtering and rejecting messages. This can reduce the CPU's workload and
speed up the single active filter tester.
Example. Use the file coop_for_fnsf_filter to share the number of the
current message, and be sleepy for up to 2 seconds while checking 10 times
whether the other process is still busy with heavy filtering :

  snntpbatch -server ... -group ... -filter ... \
             -coop_idx coop_for_fnsf_filter 2000000 10 -fetch_interval ...

--- Split Messages (aka Multi Part) ---

Some posting software (not this one) may split up a single message into
several parts and post each of them as a separate article. Usually the
parts get the same subject line except for an increasing counter like in
"(2/3)".
Regrettably this protocol is very prone to misunderstandings and problems,
so it only gets handled if the file nntpclient_multi_bin_tee exists in the
group directory. It is created by -subscribe but may be removed manually if
not desired.

Articles which look like the first part of a split message are checked for
whether they contain a single uuencoded (or yEncoded) binary. From the
beginning of this binary, all encoded data are written to a file in the
group's "bin" directory. The file name is derived from the subject text and
ends with .tee
The bodies of all following parts are copied entirely to *.tee files.
Articles which are in MIME format may get copied including all headers to a
file *.msg.tee .

After downloading is done, the *.tee files of a group may be combined and
decoded by -merge_tee . This option examines the existing *.tee files which
are already finished and not yet extracted, and checks whether all parts of
a message seem to be present. In that case it combines and decodes them as
a single binary (uuencode and yEnc) or as a complete article (MIME). The
same name rules apply as with normal downloading of binaries.

  snntpbatch -merge_tee $newsserver - $newsgroup

Names of already extracted *.tee files begin with "X_", names of growing
.tee files begin with "N_", and names of ready .tee files begin with "_".

It is advised to set -background_tee to "omit" if one regularly uses
-merge_tee . This ensures that no incomplete binary gets decoded and
occupies the name of the binary file. Otherwise -merge_tee would have to
use an alternate filename for the complete binary.

--- Extracting Mail Folders ---

Mail and newsgroups have nearly the same message formats. It is quite easy
to fetch mail automatically via POP3 by some mail transfer agent, but this
does not unpack the binaries.
If the mail got stored in a plain UNIX style mail folder (mail spool) file,
then it may be extracted into HTML messages and binaries as it is done with
messages from a news server.
For that purpose one has to subscribe to a pseudo group (like "my_mail") on
a pseudo server (like "pseudohost") at a pseudo port (like "1"). Just take
care not to produce a server-port-group combination which you might want to
use with real newsgroups.
Within the download directory of that group, install a symbolic link to
your mail folder file. The link's name has to be "mailfolder". The option
-group will refuse to work with a directory where this file name exists.
Example :

  snntpbatch -subscribe pseudohost 1 my_mail
  ln -s $HOME/mail/from_lycos \
        $HOME/snntpbatch_download/pseudohost.port1/my_mail/mailfolder
  snntpbatch -extract_mailfolder pseudohost 1 my_mail new new

Use a web browser to access the messages via main index and group index.

Each time after you have fetched new mail into your mail folder, execute
again :

  snntpbatch -extract_mailfolder pseudohost 1 my_mail new new

The semantics of a mail folder are different from those of a news server,
though. There are no persistent message numbers in a mail folder file. So
the message number is incremented whenever a message is extracted. If the
same message is extracted again, it will get a new message number.
Therefore the message index file is created during extraction, and there is
no option like -list_interval for mail folder extraction.

---------------------------------------------------------------------------

This software is copyright 2001-2003, Thomas Schmitt stic-source@gmx.net
and is provided to you without any warranty under an open source BSD
license. (see file COPYING)

---------------------------------------------------------------------------

                                 Options

---------------------------------------------------------------------------

If the first option is # or ! then the whole list of options is ignored.
(Useful for remarks in job files)

Options are processed strictly sequentially. Presets like -server or
-filter only apply to the options which follow them.
When in doubt, use the sequence of options presented here.
(e.g. -timeout before -server)

Intervals of messages are given by the lowest and highest message number.
The special codes "first" (or "start") and "last" (or "end") are replaced
by the first and last message number as obtained when opening the group.
The codes "list_start" and "list_end" give the first and last number of the
most recent -list_interval run in the group. The code "new" gives the first
unlisted message number (i.e. list_end+1).

Some options require time definitions.
Absolute points in time are defined by the input format
MMDDhhmm[[CC]YY][.ss] of the UNIX command 'date'. For example:
  021619302000 = 16 Feb 2000 19:30:00 local time
Relative times are preceded by '-' or '+'. They are added to the current
time. '-' therefore points to the past, '+' points to the future.
The basic unit is 1 second. It can be changed by a letter at the end of the
number:
  seconds 1s=1 , hours 1h=3600s , days 1d=24h , weeks 1w=7d ,
  months 1m=31d , years y : Xy=X*366d
Example:  -3w = 3 weeks in the past = -1814400s
Please note that 'm' and 'y' may be slightly longer than the actual
calendar time range.

-basedir directoryname
          set the basedir for downloading messages
          (default: $HOME/snntpbatch_download)

-login username password
          set the login parameters for news servers which require
          authentication. This option should be stored in the file
          .snntpbatchrc in the server directory
          (e.g. $basedir/news.isp.port119/.snntpbatchrc ).
          This file should have read permission only for its owner.
          If you do not trust your superuser, store the file on a floppy
          and make .snntpbatchrc a symbolic link to the floppy file's name.
          A missing link target has the same effect as a missing
          .snntpbatchrc , i.e. none.

-html on|php|off
          enable|disable insertion of HTML tags in lists and messages.
          Mode "php" is an enhanced form of "on" which prepares the
          program for interaction with a web server.
          If used, it should be set before -subscribe in order to install
          the PHP scripts and to perpetuate itself in the group's startup
          file. The scripts should only be executed while the current
          working directory is set to the group directory (as Apache does).
          When downloading data with -html "php", additional links to PHP3
          scripts are inserted.
          Default mode is "on"

-all_headers on|off
          enable|disable display of all message header lines.
          If set to "on", all header lines will be reported at the end of
          the message. Also all MIME headers and delimiters are shown
          within the message body.

-pacifier off|on[:interval[:blinking]]
          enable|disable the display of i/o status on stderr
          (mostly with CarriageReturn but without NewLine).
          Mode "on" may be extended by a refresh interval in seconds and by
          blinking mode on|off .
          The refresh interval sets a minimum time between two status
          messages. If no input arrives, the time span between two
          messages may be longer.
          A blinking command display signals continuously incoming data.
          Stalled input does not blink.
          Example: -pacifier on:0.25:on

-list_on_stdout on|off
          enable|disable printing of list result lines to stdout.
          Applies to -list_interval and -list_groups .

-subscribe servername portnumber groupname
          create the directories necessary as targets for downloading.
          If portnumber is 0 or negative, 119 is used.
          If the group directory did not already exist, then also create
          some control files with initial values.
          If the group name is '-' , then no group directories are created.
          If the servername is '-' , then no server directory is created
          (just the basedir and the msg_idlist directories).
          Directories are created only if they did not already exist.
          No files are removed.

-timeout seconds
          set the number of seconds to wait for a reply.
          After this limit is exceeded, the program closes the connection
          to the server and tries to re-establish it.
          Timeout numbers should be larger than the time needed to fetch a
          large article, since an interrupted article has to be re-read
          from its start.
          Default is 120

-timeout_bell on|off
          enable|disable the acoustic signal on timeout.
          Default is "on"

-baudlimit baudrate
          restricts the average read bandwidth to the given rate.
          This is done by waiting after too many bytes have been read.
          Since a few kB are buffered, the bandwidth at the communications
          port may vary.

-port portnumber
          use the given port at the server. If a server has already been
          specified, then the connection is closed and re-established.
          Usually this option is not necessary since port 119 is the
          default and should work fine.

-server servername
          connect to the given server at the preset port. This should be a
          very early option since nothing can be fetched and no group can
          be selected without -server

-list_groups
          write a list of groups into the file grouplist in the
          server+port's directory

-group_quick on|off
          if set to "on", no data are requested from the server by a -group
          command. Since some servers expect an immediate request, there
          should be no delay until the fetching of the first message
          starts. Default is "off".

-group groupname
          choose the group on the connected server. This should be an early
          option since nothing can be fetched without -group

-filter -not | -or | -and | '(' | -sub | ')' | -subend |
        -if | -then | -elseif | -else | -endif |
        from: | date: | subject: | msgno: | msgid: | references: |
        bytes: | lines: | subjectfile: | larger: | smaller: |
        return: | print: | subjectfilefound: | subjectfilerec:
          set a filter for -fetch_interval or -list_interval .
          The default filter does not reject anything.
          Basically a filter consists of tests which use regular
          expressions (like grep and sed). An expression can test one of
          the message header properties
            "from:" "date:" "subject:" "msgno:" "msgid:" "references:"
            "bytes:" "lines:"
          so a test actually consists of two arguments.
          Special tests with pseudo properties are:
            "larger:" number
               matches messages which consist of more than the given
               number of bytes.
            "smaller:" number
               matches messages which consist of fewer than the given
               number of bytes.
            "subjectfile:" regular_expression
               try to read a filename from "subject:" and test whether it
               matches the expression
            "subjectfilefound:" option_list
               try to read a filename from "subject:" .
               option_list is a list of directories separated by blanks.
               Nevertheless it has to be a single shell argument. Use
               quotation marks to protect the blanks.
               The filename is searched for in these directories. The test
               is not case sensitive unless "-case" is given.
               "/*" at the end of a directory path works as with the shell
               parser and may result in a list of subdirectories.
               (e.g. /home/fl/* )
               This test can also handle hashed directories.
            "subjectfilerec:" option_list
               guess a filename from the subject line and create an empty
               file in each of the hashed directories listed in
               option_list. This is done at the end of downloading if a
               file with the guessed name was actually extracted and if
               -subjectfilerec is not set to "off".
               Case sensitivity is controlled by -case and -nocase.
               The result of the test is 'true' if a subject filename
               could be guessed, else it is 'false'. It can also be forced
               by -return_true or -return_false within the list of
               directories.
               If a directory does not exist or is not hashed, then it is
               made a hashed directory. Usually the hashed directories are
               tested by "subjectfilefound:" to avoid duplicates.
            "return:" decision
               A final decision on the filter result. If this test is
               performed, the filter ends immediately. There are two
               possible decisions:
                 "true"   the filter accepts the message
                 "false"  the filter rejects the message
               This test is useful in -if-expressions.
            "print:" text
               The text is printed to standard output. Text has to be one
               word, so in most cases it needs to be enclosed in quotation
               marks. This test always yields true.
               There may be embedded property names in the text which get
               replaced by the property's content. They begin with $ and
               end with their colon, like $subjectfile: . A literal $ has
               to be written as $$ .
          Tests can be combined by the logical operations "-or", "-and",
          "-not" and can be grouped by brackets or by pairs of "-sub" and
          "-subend". Precedence: -not, -and, -or .
          Regular expressions, option texts, and brackets should always be
          surrounded by quotes to prevent the shell parser from
          interpreting them. With -read_from , quotes are only needed to
          include blank characters.
          The tests are performed in strict left-to-right order.
          An important fact concerning speed as well as the effects of
          "subjectfilerec:" and "return:" :
            The right side of -and is only performed if the left side is
            true.
            The right side of -or is only performed if the left side is
            false.
          There is also a case distinction expression which uses
            -if ... -then ... [-else ...] -endif
          It evaluates the value of the expression between -if and -then.
          If it is true, the expression between -then and -else [or
          -endif] is evaluated and becomes the value of the whole
          -if-expression. All other expressions up to -endif are then
          omitted.
          In case the expression between -if and -then evaluates to false,
          the expression between -else and -endif is evaluated and used as
          the total value of the -if. If there is no -else before -endif,
          the result is false.
          To prevent masses of -endif, the operator -elseif is equivalent
          to :  -endif -or -if
          It works like "else if" in C or "elif" in the Bourne shell.
          There may be an arbitrary number of -elseif ... -then ... cases
          between the first -then of an -if-expression and its -else
          (or -endif).

-subjectfilerec on|off
          enable or disable the recording of filenames by filter test
          "subjectfilerec:". Default is "on".

-overwrite on|off
          enable|disable overwriting of already downloaded messages.
          Image files don't get overwritten, but the newly loaded images
          get new names.
          Default is "off"

-background_tee on|off|omit
          enable|disable renaming of binary files which result from the
          first part of a split message. This will reserve the original
          name for a subsequent run of -merge_tee. Renaming is done as if
          the original file name already existed.
          If set to "omit", no binary file is created at all.
          Default is "on".

-allow_yenc on|multi|off
          enable|disable decoding of the 8-bit format yEnc.
          If enabled by "on", a command syenc_decode must be available
          which reads yEnc (including first line "=ybegin ..." and last
          line "=yend ...") from stdin and writes the decoded bytes to a
          file given by option -o. E.g.:
            syenc_decode -o bin/joystick.jpg
          An implementation of syenc_decode resides in stic's ./bin with a
          symbolic link in ./scripts . If it is not reachable there, copy
          it to a suitable place.
          If enabled by "multi", not only the first message but all
          messages of a split post will be decoded. This is convenient but
          raises the danger of overwriting files with identical names.
          Default is "on".
          Note: This program cannot use yEnc for posting.

-group_stat_curbs max_articles
          sets curbs for the options -group_statistics and
          -grouplist_statistics . No more than the given number of article
          headers may be considered for a single group. With groups which
          are larger, the oldest headers are not read from the server.
          (default 10000)

-group_stat_limits min_articles min_total_size min_max_article_size
          sets limits for the options -group_statistics and
          -grouplist_statistics . A group's overview result only gets
          reported if the group contains at least as many articles as
          given by parameter min_articles. The total sum of article sizes
          has to be at least min_total_size. The size of the largest
          article has to be at least min_max_article_size.

-group_statistics target_file
          print a summarized overview of the current group to standard
          output. If target_file is not '-' or '.' then the output is also
          appended to this file.
      The result has the following form
        Group : ${groupname}
        Age   : ${oldest} to ${youngest} days
        Size  : ${x} bytes in ${y} articles with ${z} lines
        Avg.  : ${average} bytes per article
        Max.  : ${max_size} bytes in article #{msg_no}
        Server: ${server} Port: ${portnumber} GMT: ${time}
      If a group contains no articles, then the lines 'Age' 'Avg.' and
      'Max.' are omitted.
      If a filter is set, then messages get counted only if they pass
      that filter.

-grouplist_statistics start_expression filter_expression target_file
      print a -group_statistics report for each group in the grouplist
      (to be generated by -list_groups).
      Reading and reporting does start with the first group which
      matches the regular expression start_expression.
      To be read, a group has to match the regular expression
      filter_expression and (if set) the limits given by option
      -group_stat_limits.
      Both expressions may be inverted by the prefix '-not:'

-list_interval start_number end_number
      create a new message index. This file will contain all headers
      of messages with numbers between start_number and end_number
      (both included). Test the headers with the current filter (which
      matches everything if not restricted by option -filter ).

-wait_for_file filename maximum_age_in_seconds
      wait for a data file to appear or to be modified.
      maximum_age_in_seconds > 0 also accepts files that have already
      been modified recently. 0 or negative numbers only accept future
      file events.
      Example (wait for a daily -list_interval to happen):
        -wait_for_file list_updated 43200

-fetch message_id
      download the message with the given id from the preset server
      and group. The id may be the message number or the unique
      message id (looks like a mail address).

-coop_idx filename microseconds granularity
      to coordinate several simultaneous -fetch_interval with the same
      filter into the same group directory, there can be a cooperation
      file where the processes publish their current message number.
      If a peer with a more advanced number is detected and if it is
      not downloading that message, then a sleep period starts. It
      lasts at most the given microseconds and at least the given
      microseconds/granularity . In any case, messages are skipped up
      to the published one. This avoids multiple filtering of the same
      message and thus reduces CPU load. Example:
        -coop_idx for_fantasy_filter 2000000 10
      Filename "-" disables this cooperation mode.

-fetch_interval start_number end_number
      fetch all messages with numbers between start_number and
      end_number (both included). Test the messages with the current
      filter (which matches everything if not restricted by option
      -filter ).

-extract_mailfolder pseudoserver pseudoport pseudogroup min_date max_date
      Read messages from a UNIX style mailfolder file.
      pseudoserver, pseudoport, and pseudogroup designate a group
      directory where a file with name "mailfolder" contains plain
      mail messages separated in the usual way, i.e. by an empty line
      followed by a line which begins with the text "From " followed
      by a word and a valid time stamp like "Sun Aug 3 19:09:49 2003".
      Lines have to be NL separated (ASCII 10), not CRNL (ASCII 13 10).
      min_date and max_date define the date range to be extracted.
      They get compared against the time stamps of separator lines,
      not against the time stamps from the "Date:" mail headers.
      Besides the usual time formats the words "new" or "last_run" as
      min_date designate the time of the last extraction. The words
      "new" or "future" as max_date designate a day in January 2038.
      Additionally all filter tests apply (but some headers like
      "References:" or "Bytes:" may be missing in a mail folder).
      If at least one message has been extracted then a message index
      file is created like with -list_interval.

-merge_tee servername portnumber groupname
      check the group's _*.tee files and, where a file set is
      complete, merge the binaries.
      .tee files are created when a message subject indicates a split
      message and the file nntpclient_multi_bin_tee exists.
      Processed .tee files are renamed to X_*.tee .

-adjust_links servername portnumber groupname
      adjust the links in the group's message files to avoid all
      skipped messages and to connect messages from different download
      runs. Also generate the group index list_index.html in the
      group's directory.
      If servername, portnumber or groupname are '-' then the settings
      of the most recent -server, -port and -group options are used.

-remove_old_files servername portnumber groupname timedefinition
      check whether the files related to the group are older than the
      defined time and remove them if so. This option finally runs
      -adjust_links .
      If servername, portnumber or groupname are '-' then the settings
      of the most recent -server, -port and -group options are used.
      The group's directory has to contain a file with name
      sfile_remove_old_files_allowed . This file may contain a
      timedefinition which prevents remove runs with more recent
      timepoints. If empty, no limit is imposed.

-remove_old_ids timedefinition
      check whether the files in the idlist directories are older than
      the given time and remove them if so.
      The directory msg_idlist has to contain a file with name
      sfile_remove_old_files_allowed . This file may contain a
      timedefinition which prevents remove runs with more recent
      timepoints. No limit is imposed if empty.

-release_all_locks
      the above operations may run only one at a time. If the program
      aborts during such an operation, a lock file 'maintenance_lock'
      may remain and prevent all further actions. This option removes
      all lock files. Run it only if no other snntpbatch is active.

-post_id hostname username organization
      set the components of the sender information in posted messages.
      These will be used with the header lines From , Organization and
      possibly Message-ID .
      The default is "somewhere", "anonymous", "none".
      Personal data from system files or mail settings will not be
      published by this program.

-post_own_msgid on|off
      if -post_own_msgid is set to "on" then posted messages will get
      a message id generated from local properties and using the
      hostname given with -post_id .
      If set to "off" posted messages will rely on the id generator of
      your news server. Default is "off".

-post_ref message_id
      set the message id for the "References:" header line. The
      following messages will be posted as follow-ups to the message
      with this message id.

-post_style flat|indent|stair
      set the posting style for multiple binary files.
      In style "flat" all messages refer to the one defined by
      -post_ref (or to none without -post_ref).
      Style "indent" lets only the first message refer like in flat
      style. All other messages refer to this first one.
      Style "stair" lets each message refer to the previous message.
      The first one refers like in flat style.

-post subject message_text [ binary_file [ ... binary_file ]] -post_end
      post messages to the preset server and group.
      At least the two arguments subject and message_text have to be
      given. All further arguments up to "-post_end" are considered to
      be names of binary files. -post_end is not necessary if no other
      options follow the binary list.
      If there is more than one binary file given, copies of the
      message subject and message_text are posted together with these
      additional files.
      The format complies with RFC850 (pre-MIME). Binary files are
      attached as uuencoded lines after message_text.

-cancel message_id
      remove the specified message from the news servers. -post_id has
      to be the same as used when posting the article.

-post_control control_text
      posts a message with control_text as content of the header lines
      Subject: and Control: and also as message body.
      Use this option only after reading RFC850 and its successors to
      learn about the meanings of various control texts.

-hashfilerec option_list fileaddress [...
             fileaddress] [-hashfilerec_end]
      records the given files in the hashed directories given in the
      option_list like it is done with -filter test "subjectfilerec:".
      If there is more than one directory or options like -case in the
      list, care must be taken to keep the whole list as one argument.
      E.g.:
        -hashfilerec '-case /home/news/fl/scifi' \
                     /home/dragons/* -hashfilerec_end
      The end marker -hashfilerec_end is needed only if there are
      other commands following the filenames.

-hashfileadr option_list fileaddress [... fileaddress] [-hashfileadr_end]
      calculates and prints the addresses of the given files within
      the first directory in option_list.
      For the meaning of option_list see -filter "subjectfilerec:".

-hashfilefound option_list fileaddr [... fileaddr] [-hashfilefound_end]
      looks for the file addresses in the directories of the
      option_list and prints for each the first one found.
      For the meaning of option_list see -filter "subjectfilerec:".

-make_hashdir directory
      creates the files and directories necessary for a hashed
      directory with 251 slots. If the directory does not exist it
      gets created. Its parent directory must already exist, however.

-deepen_hashdir directory maximum_split minimum_fill
      converts the subdirectories of a hashed directory into hashed
      directories and hashes their files into the new structure.
      (dir/00 may become dir/00/00 ... dir/00/F0)
      There will be created at most maximum_split new subsubdirectories
      in each subdirectory. If this value is 0 then the largest
      possible number will be used.
      The split is done only if the new subsubdirectories will contain
      on average at least minimum_fill files. Example:
        -deepen_hashdir /home/news/fl/scifi 20 10
      If minimum_fill is negative then the split is done if the old
      subdirectory has at least as many files as the absolute value of
      minimum_fill. Example:
        -deepen_hashdir /home/news/fl/scifi 0 -5000

-read_from filename
      read options from file.
      Empty lines and lines which start with '#' or '!' are ignored.
      All others have to contain at least one option and all its
      additional arguments.

-help
      Print this text.

---------------------------------------------------------------------------
Control Files
---------------------------------------------------------------------------

The run can be controlled by files in the base directory or in the
group's directory :

nntpclient_stop ..... if this file exists, the run ends before the next
                      message is fetched or the next option is executed.
                      The file gets removed by nntpclient.
nntpclient_stop_all . similar to nntpclient_stop but this file does not
                      get removed by nntpclient. Do not forget to remove
                      this file when not needed any more.
nntpclient_baudlimit  contains a number which limits the bandwidth used
                      by the client. See option -baudlimit above.
                      If a baudlimit is set, this file is checked after
                      each transmission at the communication port.
                      If no limit is set then this file is checked only
                      before fetching messages or executing options,
                      just like the other files.
nntpclient_no_beep    if this file exists, beeping is disabled. Beeps
                      are issued when serious problems with the server
                      are encountered.
nntpclient_skip_server_failure
                      omits the usual handling of faulty server replies
                      (e.g. premature end of message).
                      Usually the command -fetch_interval waits some
                      random seconds, reconnects to the server and tries
                      again to obtain the message. Only after the fifth
                      failed retry it will skip the message.
                      With file nntpclient_skip_server_failure existing
                      there is no waiting, no reconnect and no retry.
                      The message is skipped after the first faulty
                      reply.

The following files are only working in the group's directory :

nntpclient_pause .... a decimal number in the file tells how many
                      seconds to pause. The number is re-read once a
                      second. If the file is removed externally then the
                      run is resumed. If the time elapsed is larger than
                      the current number in the file, then the file gets
                      removed and the run is resumed.
nntpclient_index ....
                      a decimal number in the file sets the current
                      message number in the current -fetch_interval or
                      -list_interval. Useful to skip nasty messages.
                      But use this option with great care.
nntpclient_multi_bin_tee
                      if this file exists and the subject line matches
                      the regular expression
                        .*[[(]1/[2-9][])].*
                      then the binary data are also stored undecoded in
                      a file in the group's bin directory. The file's
                      name is the subject line with some characters
                      replaced by a '+' and their ASCII code in hex.
                      This is done for those characters which are
                      illegal or inconvenient for file names. This
                      filename gets the additional ending .tee .
                      If the subject matches
                        .*[[(][2-9]/[2-9][])].*
                      then the whole article body gets stored in such a
                      .tee file. The intention is to gather all parts
                      of a split message's binary and put them together
                      with -merge_tee or some external decoder.
nntpclient_multi_bin_tee_2dg
                      extends the search range to split messages which
                      have up to 99 parts. The additional search
                      expressions are
                        .*[[(]0[01]/[1-9][0-9][])].*
                      for start messages and
                        .*[[(][0-9][0-9]/[1-9][0-9][])].*
                      for follow-ups. Nevertheless, this setting is
                      prone to be misled by subject lines which show
                      (image/total_of_images) rather than
                      (part/total_of_parts) .

This file works in the server+port directory :

nntpclient_no_xover   disables the use of the NNTP extension XOVER.
                      XOVER saves a lot of time with -list_interval and
                      filtered -fetch_interval , but it might not be
                      available on all servers and the xover_*_* cache
                      files occupy some disk space.

---------------------------------------------------------------------------
Directory Structure
---------------------------------------------------------------------------

Root of the tree is the base directory. Usually $HOME/snntpbatch_download .

main_index.html           HTML index of all subscribed newsgroups
msg_idlist                Directory which memorizes unique message ids
                          to prevent duplicate downloading.
  00 .. FA                subdirectories which are used as fields of a
                          hashtable
  sfile_remove_old_files_allowed
                          Gives permission to run -remove_old_ids and
                          may contain a time limit which protects young
                          files.
${servername}.port${portnumber}
                          Directory with all server specific data
  grouplist               Text file with list of groups (result of NNTP
                          command LIST)
  ${groupname}            Directory with all data specific to a single
                          group
    bin                   Directory with extracted binary files
    list_index.html       Group index. Links to message index files.
    list_00000000_00000000.html
                          Copy of newest message index.
    list_${startnumber}_${endnumber}.html
                          Message index to messages within number range
                          shown by name
    list_newest           File containing the number of the last indexed
                          message
    list_start            File containing the number of the first
                          message in the most recent message index
    list_updated          File containing a timestamp. Written when
                          -list_interval finishes.
    maintenance_lock      indicates that a maintenance process
                          (-list_interval, -adjust_links,
                          -remove_old_files) is in progress and no other
                          may be started. The file's content tells about
                          hostname, process-id, client-id and the
                          operation in progress.
    msg_${messagenumber}.html
                          File containing the message text and HTML
                          links to binary files which have been
                          extracted.
    msg_first.html        Copy of the first message in the directory
    msg_new.html          Copy of the first message of newest message
                          index
    msg_last.html         Copy of the last message in the directory
    sfile_remove_old_files_allowed
                          Gives permission to run -remove_old_files and
                          may contain a time limit which protects young
                          files.
    mailfolder            An optional UNIX style mailfolder file (or a
                          link to it) which shall be extracted by option
                          -extract_mailfolder . The option -group will
                          refuse to work with a directory where this
                          file name exists.
    mail_last_fetchdate   Date of last run of -extract_mailfolder .
                          Used for the special timestamp min_date="new" .
    mail_last_msgno       Number of the last message extracted from
                          mailfolder.
                          This number gets incremented with every
                          extracted message.

    These PHP scripts are created by -subscribe if -html php is set.
    If there are files with the same name in the base directory, these
    files are copied to the group directory. Otherwise the program
    produces the standard scripts on its own.
    See also below: "Collaboration with a web server"

    fetch.php3            Checks whether a given message is downloaded
                          and fetches it from the news server if
                          necessary.
    list_interval.php3    Gets a list of new messages from the news
                          server.
    remove_old.php3       Removes old files from the group and also old
                          message ids.
    snntp_doorman.php3    Checks whether the requirements for a
                          snntpbatch run are fulfilled.

---------------------------------------------------------------------------
Appendix
---------------------------------------------------------------------------

--- Hashed directories ---

The list of downloaded message ids and, where used, the lists of
downloaded filenames (see -filter, test subjectfilerec:) are stored in
filesystem directories which contain a set of specially named
subdirectories (like 4E).
Such a hashed directory is marked by the existence of a file named
sfile_hashed_directory which contains the number of subdirectories.
This number is a prime between 251 and 2 and it must not be altered
later.
A subdirectory may be a hashed directory itself with a prime number
lower than that of its parent directory. Such deeper hash structures
may become necessary to handle hundreds of thousands or even millions
of entries.

The file sfile_hashed_directory and the subdirectories are created by
"subjectfilerec:" when the first entry is made in a normal directory.
There also is a command -make_hashdir to create a hashed directory.
Deeper hash structures can be created by applying -deepen_hashdir to
an existing hash directory.
Beware: the first two levels of such a hash tree contain
251*241 = 60491 directories if not restricted by parameter
maximum_split of -deepen_hashdir.
Example :
  -deepen_hashdir /home/news/fl/scifi 25 0
splits a 1-level hash directory with 251 subdirectories into a 2-level
hash tree with 251*23 = 5773 subsubdirectories.
Future deepenings will be restricted to a factor of
19*17*13*11*7*5*3*2 = 9699690 though ;)

--- Collaboration with a web server ---

With the default setting the HTML pages are static and only represent
the results of previous downloads. A link remains dead until the
message is downloaded by an external snntpbatch run.
While this will always be the main intention of snntpbatch, it is
inconvenient for exploring unknown newsgroups.

If you have a web server you may set up a snntpbatch base directory and
enhance the results of snntpbatch by links to several PHP scripts.
These scripts are able to start snntpbatch for
  subscribing to a group
  fetching a message from the news server
  updating the group by performing -list_interval
  removing outdated files (older than 3 days)

CAUTION: The following setup should only be done if you can block it
against unauthorized access. Either by a qualified configuration of
your webserver or by hiding the webserver safely behind a firewall.
DO NOT BLAME ME if you get hacked.

Find out what user is running the web server ("wwwrun" on my box) and
choose a directory in reach of that server. (/usr/local/httpd/htdocs is
the root of http://localhost on my box. I choose that.)

As yourself (your normal user), prepare the connector script. The web
server user will not get its own stic installation. Mine does not even
have a decent $HOME.
  $ cd $(cat $HOME/.stic_main_dir)
  $ cp scripts/snntpbatch_connector_generic scripts/snntpbatch_connector
  $ echo "$(pwd)"/bin/snntpbatch '"$@"' >>scripts/snntpbatch_connector

If you need a -login for some of your news servers, prepare a directory
where to hide these secrets outside the URL space of the web server.
  $ mkdir httpd_snntpbatch_server_dir
  $ chmod a+w httpd_snntpbatch_server_dir

Print the stic directory for copying.
Then become superuser, copy it and create the base directory. Hand it
over to the web server user.
  $ echo "STIC_DIRECTORY: $(cat $HOME/.stic_main_dir)"
  $ su
  # stic_dir=INSERT_STIC_DIRECTORY_HERE
  # basedir=INSERT_THE_CHOSEN_PATH_HERE/snntpbatch
  # user=INSERT_WWW_USER_HERE
  # mkdir $basedir
  # echo "$stic_dir" >$basedir/.stic_main_dir
  # chown $user $basedir $basedir/.stic_main_dir

Now set up the download area
  # su $user
  w$ basedir=INSERT_THE_CHOSEN_PATH_HERE/snntpbatch
  w$ cd $basedir
  w$ stic_dir=$(cat .stic_main_dir)
  w$ cp $stic_dir/scripts/auth_template.php3 authenticate.php3
  w$ cp $stic_dir/scripts/snntpbatch_connector snntpbatch
  w$ echo "-basedir $basedir" > .snntpbatchrc
  w$ echo '-html php' >> .snntpbatchrc
  w$ ./snntpbatch -basedir $basedir -html php -subscribe - - -
  w$ exit
  # exit

Start a web browser and load the main_index.html . With my choice of a
base directory, it would be
  http://localhost/snntpbatch/main_index.html
There should be a link "subscribe anywhere" which leads to a PHP script.

The PHP scripts are password protected by script
  $basedir/authenticate.php3
The preset user id is "me" with password "mypwd".
If you use Netscape and it crashes here, try not to pop any window
during the input of user id and password.

Edit $basedir/authenticate.php3 and change the comparison values of
$PHP_AUTH_USER and $PHP_AUTH_PW to your own desired values. Also the
content of realm="..." might be set to some other text.
The quality of protection depends much on the web server configuration.

If there is a script authenticate.php3 in the server directory, then it
is used instead of the one in the base directory. authenticate.php3 in
the group directory overrides both others.
If all three authenticate.php3 scripts are missing, no user id is
required.

Click on "subscribe anywhere" to get a form page with entries :
  "server" , "group" , "user_id" , "password" , "force_rc"
Fill in server and newsgroup. user_id and password are needed if the
newsserver demands authentication.
force_rc should be set if you want to change existing user_id and
password. Click the "subscribe" button.

You will get a page with a link "to group index (list_index.html)".
Click it.

You are now at a page with a link "update group" . Click and be patient.
Netscape says it waits for localhost but actually localhost is waiting
for the newsserver to reply. Do not interrupt the -list_interval going
on.

As soon as the update is done, you will get a page with a link
"to group index (list_index.html)". Click it.

Click one of the list links to get to an index of messages.
Beneath any message link, there is a link "(fetch)" now. Like :
  "86095 " "(fetch)"
If you click the "(fetch)" link, a PHP script will check whether the
message is already downloaded and will start snntpbatch to fetch it
from the news server if necessary. In both cases it returns the message
content to the webserver and browser.

Within the messages, there are the usual links "previous" , "up" and
"next" as well as links to fetch.php3 which show the message numbers of
the neighbor messages. These outer links are not changed by
-adjust_links .

---------------------------------------------------------------------------

----------------------------------------------------------------------------
Startupfiles with name '.snntpbatchrc' :
----------------------------------------------------------------------------

On start the program looks in the user's $HOME directory for the above
file. If it exists then it is executed like with option -read_from .
After this the same is done within the current working directory's
parent and afterwards with the current working directory itself.

The option -subscribe creates such a file in the group's directory if
it does not already exist. So if the program starts in a group's
directory it automatically connects to the appropriate server and group
and has its base directory set in a suitable way.

Login name and password for a news server may be defined in the
server's directory.
Deny read permission to anyone other than yourself (see above: option
-login).
If a global option is needed (like a different -timeout) then $HOME is
a good place to define it.
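
The advice about keeping a login in the server's directory can be
sketched as follows. This is only an illustration under assumptions:
the server name, user id and password are placeholders, and a temporary
directory stands in for the real base directory. The form of the -login
line follows the example given for $HOME/.snntpbatchrc.

```shell
# Sketch: keep the login for one news server in that server's directory.
# All names below (server, user id, password) are placeholders.
basedir=$(mktemp -d)          # stands in for $HOME/snntpbatch_download
serverdir="$basedir/news.nowhere-net.com.port119"
mkdir -p "$serverdir"

rcfile="$serverdir/.snntpbatchrc"
touch "$rcfile"
chmod og-rw "$rcfile"         # deny access to group and others first
echo '-login JOHNDOE 8HM1RX' >> "$rcfile"   # then store the secret

ls -l "$rcfile"               # should list the file as rw for the owner only
```

A run started in a group directory below this server directory would
then pick the file up as the startup file of the working directory's
parent.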