# stic-0.7

http://stic.webframe.org
stic-source@gmx.net
Thomas Schmitt
http://stic.sourceforge.net

Some Tools for Image Collectors

stic

Content

Overview
Compilation and Installation
Testing and Practical Usage of similar
Testing and Practical Usage of simv
A PHP simv+similar frontend for webservers
Getting started with snntpbatch
Cooperation with MySQL
Upgrading from earlier versions
Portability Issues
Additional programs included
Where to get supporting software

Overview

stic bundles a few Linux tools which are intended to support the task of collecting an unreasonable amount of pictures (preferably in JPEG format).

similar

A program for detecting duplicate or very similar images. It maintains a database of characteristic color samples which it compares with submitted pictures. similar depends on libjpeg and ImageMagick's convert (on a modern Linux desktop system these components should already be present). The storage medium may be a usual filesystem or a MySQL database. There is also a MySQL UDF extension to compare image samples within SQL queries. similar contains the communications module described at sagent.

simv

A core program to perform file management tasks on an image collection. Its main purpose is to coordinate file movements with the content of similar's database. This applies to importing new files, which get tested against the existing collection, as well as to informing similar about moving and deleting files within the collection. simv depends on an external image viewer like ImageMagick's display (should already be present on a modern Linux desktop system) or John Bradley's xv (quite a fast one). simv contains the communications module described at sagent.

sagent

A standalone version of the communications module used in simv and similar. This software receives input from its start terminal and multiple clients, distributes several types of output back to them, and is also able to act itself as such a client.
Since communications mainly use TCP/IP there is an encryption layer (Blowfish with 128-bit keys) which provides user authentication. Any single activity of such a user may be individually permitted or denied. Secure connections should be possible that way as long as one can defend the keyfiles and programs on client and server host against foreign access. Front-end connection software is available in C, Tcl/Tk and PHP3 to build custom clients. In the most primitive case even telnet can act as a client. The standalone program sagent may be used as a communications node in a tree of clients. Another purpose is to be a shell frontend which sends commands to a server and receives its replies.

snntpbatch

A command line based NNTP (newsgroups) client. It is mainly intended for automatic download of images by use of a filter language. Nevertheless it also downloads the message texts and converts them to HTML code which includes the downloaded images. It is also capable of automatically posting sets of images to the newsgroups.

The tools are designed to be very independent of the system flavor. On an average Linux desktop there should be no need to update existing system components. Actually one could use stic without having display equipment for graphics. Any program activity which is possible in dialog may also be performed in batch runs. Therefore the tools are quite suitable for users who like to get boring tasks automated and manual tasks simplified.

All tools' code is open source and distributed under BSD license.

Example images
Credit: U. S. Fish and Wildlife Service (see images/CREDITS)

Getting Started

Compilation and Installation

This is only tested on Linux with Intel processors. I confess that I need to learn about portable distribution of software. For expected portability problems read the section "Portability Issues" below.

The commands shown in this demonstration will create some files in your home directory (e.g. $HOME/.stic_main_dir , $HOME/imagelist).
Those files' names will be obvious in the command lines or pointed out in their explanations. Read the commands carefully and be sure to understand their general effects regarding the shell. If in doubt do not hesitate to ask stic-source@gmx.net .

If you already installed a previous version of stic, better read the paragraph "Upgrading from earlier versions" first.

To unpack the tarball go to a directory which is suitable for creating a subdirectory stic-0.7 with finally about 10 MB (it will grow). For simplicity it is assumed that the tarball is stored there too.

$ tar xvzf stic-0.7.tar.gz

To ease future upgrades, you should establish an "official" link to the current stic directory and use this link in all settings. (If you got a system which can handle symbolic links, that is. If not, you may decide to move stic-0.7 to that "official" name.) If this is an upgrade, you may have to remove the old link stic_dir first.

$ test -L stic_dir && rm stic_dir
$ ln -s $(pwd)/stic-0.7 stic_dir

Enter the newly created directory and compile the C sources

$ cd stic_dir
$ ( cd src/stic_build ; make )

You should now have some programs in subdirectory bin . Like that :

$ ls -l bin
-rw-r--r-- 1 thomas thomas      0 Feb 23  2001 just_a_placeholder
-rwxr-xr-x 1 thomas thomas 621151 Feb 11 12:04 sagent
-rwxr-xr-x 1 thomas thomas 324088 Feb 11 12:04 sfrontend
-rwxr-xr-x 1 thomas thomas 940247 Feb 11 12:04 similar
-rwxr-xr-x 1 thomas thomas 764017 Feb 11 12:04 simv
-rwxr-xr-x 1 thomas thomas 629197 Feb 11 12:04 snntpbatch
-rwxr-xr-x 1 thomas thomas  28676 Feb 11 12:04 syenc_decode

For the moment, you may add the subdirectory scripts and maybe bin to your shell's PATH variable. Also you should publish the stic-directory in a small file in your $HOME directory.

$ pwd >$HOME/.stic_main_dir
$ PATH="$PATH:$(cat $HOME/.stic_main_dir)/scripts"

If your system does not support symbolic links or if it appends .exe to program names, do also

$ PATH="$PATH:$(cat $HOME/.stic_main_dir)/bin"

Many of the scripts expect the binary programs in subdirectory bin and their helper scripts in subdirectory scripts. If you decide to move the programs to other places, it is best to create a link or a copy, rather than removing the program file from bin.

If you are willing to leave the stic stuff where it is and to add this location to the PATH of every newly started shell, then edit the startup script of the shell and add at the end :

$ vi $HOME/.bashrc
...
PATH="$PATH:$(cat $HOME/.stic_main_dir)/scripts"

For other ways to make stic accessible, see below: "File Locations".

We will now test similar, adjust it to daily usage, test simv, adjust it to your collection, and have a look at snntpbatch. If you are mainly interested in the news client, skip to "Getting started with snntpbatch".

Testing similar

For all runs of similar, set a database name by writing it into the startup file. Choose a suitable one of your own. (Of course the directory given by the path has to exist already.) In this example, as a suitable path to insert, I will use : home/thomas/test

$ vi $HOME/.similar_rc
# Set standard database
-mapname:/INSERT_SUITABLE_PATH_HERE/similar_map

Note that no white space is allowed at the beginning of the file lines.

Now you may register the pictures of the sample collection that came with stic.

$ cd $(cat $HOME/.stic_main_dir)
$ similar -append_nondash:on images/*/*
append to imgmap : /home/thomas/test/stic-0.7/images/birds/00000055.jpg
...
append to imgmap : /home/thomas/test/stic-0.7/images/images/sea/00000019.jpg
$ wc -c /INSERT_SUITABLE_PATH_HERE/similar_map*
440 /home/thomas/test/similar_map_nam
1008 /home/thomas/test/similar_map_smp16

Let's try whether 00000010.jpg recognizes itself :

$ display images/birds/00000010.jpg &
$ similar -search_self_too -search:images/birds/00000010.jpg
/home/thomas/test/stic-0.7/images/birds/00000010.jpg

Now make a copy of that image and scale it with the help of convert.

$ convert -geometry 640x480 images/birds/00000010.jpg \
  images/import/eagle_X.jpg
$ display images/import/eagle_X.jpg &

Look whether it is still recognizable :

$ similar -search:images/import/eagle_X.jpg
/home/thomas/test/stic-0.7/images/birds/00000010.jpg

You may also get a numerical distance value. The smaller this value, the more similar the image samples are. It is quite an abstract number, though :

$ similar -search_distance:images/import/eagle_X.jpg
1.307371 /home/thomas/test/stic-0.7/images/birds/00000010.jpg

Shoot at the bird (I'm nice to animals and bad at aiming):

$ fcircle=fillCircle
$ convert -geometry 640x480 \
  -pen red \
  -draw "$fcircle 250,350 250,360" \
  -draw "$fcircle 550,230 550,240" \
  images/birds/00000010.jpg images/import/eagle_X.jpg

If you get the error message "Non-conforming drawing primitive definition (fillCircle)." then your convert expects :

$ fcircle=circle

Please retry the above convert command.

$ display images/import/eagle_X.jpg &
$ similar -search_distance:images/import/eagle_X.jpg
1.593397 /home/thomas/test/stic-0.7/images/birds/00000010.jpg

similar has its limits :

$ convert -geometry 640x480 \
  -pen red \
  -draw "$fcircle 250,350 250,360" \
  -draw "$fcircle 550,230 550,240" \
  -draw "$fcircle 100,200 100,210" \
  -draw "$fcircle 150,400 150,380" \
  -draw "$fcircle 300,80 300,40" \
  -draw "$fcircle 600,300 600,260" \
  images/birds/00000010.jpg images/import/eagle_X.jpg
$ display images/import/eagle_X.jpg &
$ similar -search_distance:images/import/eagle_X.jpg

But one may increase the tolerance :

$ similar -match_par:8:1:6:2:color_diff \
  -search_distance:images/import/eagle_X.jpg
2.604442 /home/thomas/test/stic-0.7/images/birds/00000010.jpg

Now probably it's time for you to read doc/similar_helptext or execute :

$ similar -help | less

Try to get an overview of the commands specific to similar. You may stop reading at "Command reference of agent commands" and use the rest of the text for reference purposes on demand.

Practical Usage of similar

First prepare for a dedicated server process which guards the database files and avoids concurrency problems. To do so, create the directory for the keyfile which is used for encrypted TCP/IP communications (so you do not have to rely on your firewall). Also write at least 64 non-obvious characters into the file me.tnl :

$ mkdir $HOME/tnl_keydir
$ vi $HOME/tnl_keydir/me.tnl
Ahjoe9h 3tugo dfjhieruhyoperui pherioauyjod zhuiperhl kjdfzkl hjidr hjdlf sgkhgier hldkhbjdfojlnm,gnklfdjbnkdfnd;fkl ndmn; d;lndjf;hjdfjbblkdkdx

Convert that file into a usable keyfile with a content quite hard to guess :

$ similar -no_rc -make_userkey:all:me
userkey file created : /home/thomas/tnl_keydir/me.tnl

Edit the startup file $HOME/.similar_rc so it finally contains the following lines (example for INSERT_SUITABLE_PATH_HERE : home/thomas/image_collection ):

$ rm $HOME/.similar_rc
$ vi $HOME/.similar_rc
# Identify at servers as user "me".
# All communication will be encrypted.
# There has to be a keyfile $HOME/tnl_keydir/me.tnl (see -make_userkey)
-security:tcp:user:me
# Set standard database (use a suitable directory path of your own)
-mapname:/INSERT_SUITABLE_PATH_HERE/similar_map
# Allow connection to guarded map servers with individual -match_par settings
-auto_client:match_par
# Only put out the 10 best matches
-sortbuffer:truncate:10

Note again that no white space is allowed at the beginning of the file lines.

If you intend to use MySQL as storage medium, please read below "Cooperation with MySQL" before you begin to register images.

In order to register your whole image collection, you should write the image file addresses into a text file and then use similar -append_list . For example let find make a list of files within your collection directories:

$ find INSERT_DIRECTORY_LIST_HERE \
  -type f -and -not -empty -print >$HOME/imagelist
$ similar -append_list:$HOME/imagelist

Beware: depending on the size of the file list, this may take a long time. With large collections it may be better to append several smaller lists than a single big one.

Also you should mark the directories containing registered files by the file .imv_guarded_directory . It is best if either all files in a directory are registered images or none of them.

$ for i in INSERT_DIRECTORY_LIST_HERE
> do
> touch $i/.imv_guarded_directory
> done

Start the dedicated server process, which will guard the database. This should be done quite early each time the computer is started. The process is not intended to run as a daemon. It should have a terminal window for output and administration. (Don't forget to append PATH if this is not done automatically yet.)

$ similar_server

If you are short of virtual memory and your database files became quite large, then you may have to disable the memory cache. That will slow down the server substantially, especially if RAM is short. The cache's memory consumption is about 150% of the total file size.

$ similar_server -similar_server_no_cache

The server started by this script demands user authentication by encryption and allows all operations but shell commands or ending the service. We only defined one single user and its identity is ensured by the fact that server and client use the same keyfile. If they are not run by the same user on the same host, then one has to install a pair of matching keyfiles at both of their key directories. (see "Appendix agent C : Authentication and Encryption" in similar -help)

Caution: the sagent module in similar is able to start shell commands if this is not denied (e.g. -permission:all:shell:deny ). The shell commands are run with the system user id of the server's start user. similar_server permits shell commands only for the internal_user (the reserved user name of the start terminal). But be careful when you change the example server scripts and then permit server access to other system users. For real world security demands, one should test whether the denials actually work before one relies on them. (Hey, also read the "I'M NOT TO BLAME" part of stic's BSD license)

To shut down the server process, go to its terminal, hit the @ key and enter (without any leading blanks) the command :

-end

If the server process crashes or gets killed harshly, then it may be necessary to remove the file {mapname}_pubno which publishes the server's TCP/IP address. With catchable signals it should clean up neatly, though. In any case, such an abort of the program might lead to an inconsistent state of the database if it occurs during a write operation. (The time window for such an accident is quite small. Only the garbage collection spends a substantial percentage of its time on writing.)

If you want to know the names of potentially duplicate files within your collection, you may start a cross check. Set the parameters for a more tolerant comparison. This may lead to some false duplicates but will yield better results with color manipulated images.

$ > $HOME/list_of_possible_duplicates
$ similar -match_par:8:1:4:2:color_double_diff -list_doubles \
  >>$HOME/list_of_possible_duplicates
0 : 0 -----------#--------------##--------------#-------
50 : 4 -----#------------------------------#-------------
100 : 6 ...

Each "-" represents a unique file, while a "#" shows that a suspected duplicate has been found. This may take quite a long time and blocks the server for other clients. Also, due to bash's i/o behavior, the resulting file names do not show up in the result file line by line but only in larger blocks.

If you get curious or need to interrupt the operation for some other reason, then touch the file {mapname}_break :

$ touch /INSERT_SUITABLE_PATH_HERE/similar_map_break

The current search position gets written to a file {mapname}_list_doubles_pos . You may resume that cross check by simply starting it again:

$ similar -match_par:8:1:4:2:color_double_diff -list_doubles \
  >>$HOME/list_of_possible_duplicates

But be aware that starting it after it has completed restarts it from the beginning.

When done, set the match parameters back to their default values :

$ similar -match_par:4:1:4:2:color_diff

Each of the resulting filenames may be used as argument to the script similar_display which shows the possible duplicates together with the given image file (see below). It is much more efficient, though, to use the list as input of the program simv . In any case, set -match_par: as it was set during the -list_doubles run. Duplicates which shall not be reported again in future cross checks may be marked by the command -distinguish of similar.

After you cleaned up, you should test any new image file for duplicates before you move it into your collection and register it with similar. This is best done by use of the program simv. The default setting of -match_par is recommended to decrease the number of false doubles in that dialog situation.
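
The progress lines printed by -list_doubles can be tallied with standard tools. The following is only a sketch and not part of stic: the function name count_suspects is made up for illustration, and it assumes that only the progress lines (not the reported file names) are fed to it. In the sample output above, the second number of each line appears to be the running total of "#" marks so far (4 after the first 50 files, 6 after 100), which the count reproduces.

```shell
# Count the "#" marks (suspected duplicates) in -list_doubles progress
# lines read from standard input.
# Hypothetical helper, not shipped with stic.
count_suspects() {
    # Delete every character except '#', then count what is left.
    # The final tr strips the padding that some wc implementations add.
    tr -cd '#' | wc -c | tr -d ' '
}

# The two complete progress lines from the example above.
printf '%s\n' \
    '0 : 0 -----------#--------------##--------------#-------' \
    '50 : 4 -----#------------------------------#-------------' \
    | count_suspects
```

For the two sample lines this prints 6, matching the "100 : 6" that starts the third progress line.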

From time to time, one should check the registered collection with a -list_doubles run that employs the laxer -match_par settings. One should also do a -list_doubles with -match_par:8:1:6:2:color_diff to find things like the heavily molested eagle_X.jpg . Generally, experimental parameters are encouraged :)

There are a few scripts which ease several tasks around similar. They all assume that directories with registered files are marked by the existence of the file .imv_guarded_directory and that all files therein are registered.

similar_server
to avoid any concurrency problems, start a dedicated server process which guards the database files. All other instances of similar which try to access that database will automatically connect to the server process, provided the above preparations have been made. The server process will load all database records into its virtual memory to increase the speed of searching processes.

similar_append file [file ...]
add one or more files to the database.

similar_del file [file ...]
delete one or more files from the database. The given files still have to exist and to match their registered sample. Also, the registered address of the sample has to point to a file with the same byte content as the given file. By these rules, the deletion is not hampered by alternative file addresses caused by symbolic links or changing mount points. (There is a similar command -del_file_adr which relies on the exact file address and does not need to have access to the image file itself.)

similar_refresh file
recompute the sample value for a file address which is already registered in the database. Here the exact file address needs to be given. (There is a similar command -lookup_adr which may be used to determine the registered address of an existing file, before its content is altered.)

similar_rm file [file ..]
remove one or more image files from disk and database.

similar_mv
move a single image file on the disk and make the necessary changes in the database. Whether the image file's sample is removed from or added to the database depends on the existence of the file .imv_guarded_directory within the source and target directories.

similar_display
search duplicates of a single image file within the database. If duplicates are found then they are displayed by ImageMagick display. The original file is appended as the last one to display's file list. (Use space bar and backspace key to flip through the list, key "^Q" quits)

similar_xv
same as similar_display but using xv rather than ImageMagick display. (Use space bar and backspace key to flip through the list, key "q" quits)

Testing simv

There are three tasks around similar which can become quite time consuming if one has a substantial collection with a high input of new pictures. These are :

- cleaning up the collection according to the results of a -list_doubles run.
- checking newly downloaded files for duplicates and fitting them into the collection.
- moving registered files within the collection or deleting them from the collection.

simv shows you what you are doing, single keystrokes are enough to initiate actions, and there is the possibility to attach GUI components to your liking (my taste is quite frugal when it comes to GUIs). The price for that convenient and quick user input is some effort for the configuration of simv's user interfaces.

A sample configuration is prepared for the mini collection of images that come with the stic tarball. To test it, first edit the file $HOME/.simv_rc and set the address of the mini collection's database.
(I replaced INSERT_SUITABLE_PATH_HERE by home/thomas/test in the above example)

$ vi $HOME/.simv_rc
# the database to use with simv (same address as in first similar examples)
-mapper_cmd:similar -no_rc -mapname:/INSERT_SUITABLE_PATH_HERE/similar_map::

This may be necessary if you already have changed the database address in $HOME/.similar_rc . The lengthy command redefines the way similar is called (especially the :: at the end is essential). Thus it avoids bothering the guarding server of your real image database.

Now for something very important: avoid a serious problem with the command display. If display uses shared memory and simv kills it to remove it from the screen, then display leaves two shared memory segments undestroyed. This feature may choke your system after a while (found with ImageMagick 5.1.0 and 5.3.0). So disable shared memory usage by

$ echo display.sharedMemory: False | xrdb -merge -

Shared memory on Linux may be monitored by command ipcs and released by ipcrm .

You may want to add the resource definition to one of the various X resource configuration files mentioned in your X-user's startup file .xinitrc . An old-fashioned place would be $HOME/.Xdefaults of that user.

$ vi $HOME/.Xdefaults
...
display.sharedMemory: False

Alternatively you may want to use a patched version of xv as described in file doc/xv_changes . In file doc/asxv.txt you find instructions to create a modification of xv which can act as a permanently connected display frontend.

Either way, let us go on with the exploration of simv :

$ cd $(cat $HOME/.stic_main_dir)/images
$ convert -geometry 640x480 birds/00000010.jpg \
  import/eagle_X.jpg

This produces a duplicate but non-identical image in the directory import.
Now start the simv example :

$ simv_start -end_on:list_end:clear */*
-- 1 : 10 ---------------------------------------------------------
-rw-r--r-- stacker 10374 A10316.230731 birds/00000010.jpg

There should be an image window at the upper left corner of your screen now. Keep your cursor in the terminal window where you started simv, not in the graphics window. Hit the space bar and you will get shown the next image. Hit Backspace or the "\" key and you will get the previous image. Usually this script ends when you try to hop beyond the last item. The lengthy -end_on command disabled that feature for now.

Hit space until import/eagle_X.jpg (which we created above) appears.

-- 7 : 10 ---------------------------------------------------------
-rw-r--r-- 1 thomas thomas 32575 Feb 24 22:33 import/eagle_X.jpg
=?=?=?=?=?=?=?=?=?=?=?=?=?=?=?=
1.307371 | 10374 A10316.230731 birds/00000010.jpg

There should also be another image window at the upper right corner of your screen now. It shows the registered image 00000010.jpg which is considered to be a duplicate of eagle_X.jpg . The information given below the =?=?= line consists of :

Distance | Bytecount Date Fileaddress

Date format is YYMMDD.hhmmss (YearMonthnumberDaynumber.HoursMinutesSeconds)
YY : 99 = 1999 , A0 = 2000 , A1 = 2001 , A2 = 2002 , B0 = 2010 , C5 = 2025 ...

In a real life import situation, you would now have to decide whether to keep eagle_X.jpg or 00000010.jpg or even both of them. Keys you may hit now:

-   Moves eagle_X.jpg to trashdir and makes the next image file current item
~   Moves 00000010.jpg to trashdir and deletes it from similar's database
b   Moves eagle_X.jpg to birds and registers it in similar's database. If you do this, you will get both files in the result list of the next cross check.
?   Prints a sparse list of which key is bound to what move target or other command. "sUmo" means that "u" will move a file to directory sumo.

Hit one of these keys and watch what happens.
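
The date stamp format described above can be decoded mechanically. The following shell function is a minimal sketch and not part of stic: the name decode_stic_date is made up for illustration, and the rule for the first character (a digit counts decades from 1900, the letters A, B, C ... count decades from 2000) is an extrapolation from the examples 99 = 1999, A0 = 2000, B0 = 2010 and C5 = 2025.

```shell
# Decode a stamp like A10316.230731 into "YYYY-MM-DD hh:mm:ss".
# Hypothetical helper; the decade rule is inferred from the examples
# in the text, not taken from the stic sources.
decode_stic_date() {
    stamp=$1
    decade_char=$(printf '%s' "$stamp" | cut -c1)
    year_digit=$(printf '%s' "$stamp" | cut -c2)
    case $decade_char in
        [0-9])
            # Plain digits: assumed to be decades of the 1900s (99 = 1999).
            year=$((1900 + $decade_char * 10 + $year_digit)) ;;
        [A-Z])
            # Letters: A = 2000s, B = 2010s, C = 2020s, and so on.
            letters=ABCDEFGHIJKLMNOPQRSTUVWXYZ
            before=${letters%%"$decade_char"*}
            year=$((2000 + ${#before} * 10 + $year_digit)) ;;
        *)
            echo "unrecognized date stamp: $stamp" >&2
            return 1 ;;
    esac
    month=$(printf '%s' "$stamp" | cut -c3-4)
    day=$(printf '%s' "$stamp" | cut -c5-6)
    hour=$(printf '%s' "$stamp" | cut -c8-9)
    minute=$(printf '%s' "$stamp" | cut -c10-11)
    second=$(printf '%s' "$stamp" | cut -c12-13)
    printf '%s-%s-%s %s:%s:%s\n' "$year" "$month" "$day" "$hour" "$minute" "$second"
}

decode_stic_date A10316.230731
```

For the stamp shown in the simv output this prints 2001-03-16 23:07:31.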
Hit the ":" key to undo your actions one by one. Hit ESC if you leaned on the Space bar and auto repeat filled the queue.

Quick. But also as hard as your first ride with vi.

Now, if you got the Tcl/Tk shell wish, you may add a GUI frontend. Stop the running simv by hitting @ and entering -end . Start the script again with an additional option.

$ simv_start -tcltk -end_on:list_end:clear */*

Note that -tcltk is interpreted by the script and is not a simv command. The script creates a named pipe, causes simv to use that pipe as input and to start a wish process, which creates another named pipe and causes simv to use that pipe as an output channel.

There should be a Tk window with some buttons and a file list at the upper right corner of your screen. It is quite obstinate in popping up if it gets covered (and poorly coordinated with my window manager, I fear). Use the "auto popup" check button (near lower right) to toggle this feature. Use the button "dip" in the same row to lower the window for some seconds.

The buttons in the upper part mainly trigger move commands to the several directories of the sample collection.

"del" works like key "-" and "del_double" works like key "~".

"convert" converts the current item into a JPEG of quality 80 . A backup copy of the original is made in the trash directory, so this conversion can be undone although it is not reversible by itself.

"known" moves the current item to the doubles directory. This is a kind of second trashdir which may help you to distinguish trash from duplicates before you delete it.

"replace" copies the data content of the current item into the first listed duplicate and then deletes the current item. This preserves the duplicate's name and its access permissions, and applies to any existing hardlinks simultaneously.

As long as the focus is on the input field labeled "current:", key strokes of printable keys will be forwarded to simv and work according to the key mapping described above.
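
The named pipe arrangement that simv_start -tcltk sets up can be tried in isolation. The following is only a generic sketch of the mechanism, not an excerpt from simv_start: the function name demo_fifo_roundtrip and the file names are made up, and a simple read loop stands in for the program that consumes the commands.

```shell
# Send command lines through a named pipe to a background reader and
# show what the reader received.
# Hypothetical demonstration, not taken from the stic scripts.
demo_fifo_roundtrip() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/cmd_pipe" || return 1

    # The reader stands in for the program that takes its input from
    # the pipe. It runs until the writer closes the pipe.
    (
        while IFS= read -r line
        do
            printf 'received: %s\n' "$line"
        done <"$dir/cmd_pipe" >"$dir/out"
    ) &
    reader=$!

    # The writer stands in for the frontend. Each argument becomes one line.
    printf '%s\n' "$@" >"$dir/cmd_pipe"

    wait "$reader"
    cat "$dir/out"
    rm -rf "$dir"
}

demo_fifo_roundtrip "-end"
```

Opening the pipe blocks on each side until the other side has opened it too, so the order in which reader and writer start does not matter here.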

In the file list, Left-Double-Click a filename to make it the current and selected item. Left-Click selects an item. Right-Click adjusts the selection. Any move command is applied to all files of the selection. If there is no selection, then the command is applied to simv's current item.

As soon as the Tcl script receives notifications about a change of the current item, it automatically selects its line in the list if there isn't already a selection of more than one line. This may occasionally interfere with your asynchronous selection activities in the file list. Toggle this feature by use of the button "auto select".

"xv beep" is useful if you employ a patched version of xv as image viewer program (see text doc/xv_changes). It creates or deletes the file $HOME/.xv.gefummelt.beep .

"combine parts" starts a script that tries to reunite multipart files downloaded by snntpbatch.

Button "auto convert" causes convert to be run before any move to a regular target directory. This does not apply to "del", "del_double" or "known".

One may adjust the delays for "dip" and "auto popup". Beware of two frontends with "auto popup delay" set to 0 and fighting over visibility. An X server might get stuck over that high-speed popping.

"undo" works like key ":" and revokes reversible commands. "delete" is like the "del" button.

As for the last row of buttons, "stop" works like the ESC key. "help" prints the complete help text flatly to the terminal (one should pop up a text widget). "close" ends the Tcl script but not simv. "end simv" ends both.

Now you may use both the Tk window and the start terminal as input device. If you use the terminal, then the Tk filelist is kept up to date anyway. Play a while, undo all actions and end the program. Use the "end simv" button or @ -end

Now see a targeting method which is easier to modify than the buttons.

$ cd $(cat $HOME/.stic_main_dir)
$ ls -d $(pwd)/images/[a-z]* > scripts/image_targets
$ echo $(pwd)/images/trashdir > scripts/image_trash

The reason for this is explained below at "Practical Usage of simv". Start a new run

$ cd images
$ simv_start -tcltk -end_on:list_end:clear */*

There are no target buttons at the top of the window but three identical lists of directory addresses. Click on any of the addresses with any of the mouse buttons to issue a move command. Try it, undo and end the program

@ -end

simv does not finally remove files, it just copies them to other directories. Deleted files are copied to the trash directories trashdir and doubles. You have to remove the files in those directories by yourself if you want to get rid of them. At least remove any convert backups from the trashdir.

$ rm trashdir/_imvjpgb_*

Before you begin with your own target list, delete the lines from $HOME/.simv_rc which connected simv to the test database. Let it work with the default database as defined in $HOME/.similar_rc which you already used to register your collection. As long as you use ImageMagick display as your image viewer program there is not much need to have any commands in $HOME/.simv_rc .

$ echo '# write global start commands for simv here' > $HOME/.simv_rc

Practical Usage of simv

The simplest way to adapt simv to your own directory structure is to deposit move targets in prepared file locations and to copy the minimal keyset definition file to the expected location. DO NOT FORGET the above echo '# write ...' > $HOME/.simv_rc .

The expected file locations are all in the scripts directory :

simv_keyset ...... the bindings for single keystrokes
image_targets .... a list of directories which may be move targets
image_trash ......
                   a single line with the address of the trash directory

So activate the minimal keyset which contains no specific mover keys

$ cd $(cat $HOME/.stic_main_dir)/scripts
$ cp simv_keyset_minimal simv_keyset

Decide where to have your trash directory. It should be cleaned out from time to time.

$ echo INSERT_YOUR_TRASHDIR_HERE > image_trash

Make a list of all desired move targets (absolute directory addresses)

$ vi image_targets

and test the result

$ cd $(cat $HOME/.stic_main_dir)/images
$ simv_start -tcltk */*

You may provide up to four different target lists within the files

image_targets_1 image_targets_2 image_targets_3 image_targets_4

of which _1 to _3, if present, override the lists defined by file image_targets . See in script simv_start, variable target_list , for a complete description. It is a good idea to split a long list in several parts or to provide rotated versions of the big list as _1 _2 _3.

If you want specific mover keys and a visible button menu, see below "Adapting key bindings and button menu of simv" .

Handling the result of similar -list_doubles

After cross checking the whole collection by similar, one may want to review the reported files and decide what to do with them. Certain commands may be helpful in this special situation.

Since a listed file may reside in an .imv_guarded_directory , force the check for double files by -double_check:on .

Since the list may be very long, better use -addfile_list: rather than giving the list as arguments by $(cat ) .

Also be sure that similar's -match_par: are set to the same values as with the -list_doubles run.

Since you may want to mark similar images as distinguished and also want to experience the effect of that distinction, set -use_exemptlist:on .

So let a similar instance tell the server :

$ similar -match_par:8:1:4:2:color_double_diff -use_exemptlist:on

and start simv:

$ simv_start -tcltk -double_check:on \
  -addfile_list:$HOME/list_of_possible_duplicates

Now simv should show you the first file to the left and its alleged doubles to the right. If necessary, use Spacebar and Backspace inside the right image window to view all doubles if more than one is reported. Keys you may hit now:

=   Moves left image to directory doubles, deletes it from similar's database and makes the next image file current item. Button "known"
+   Moves first duplicate (at the right) to doubles, deletes it from similar's database and makes the next item current. When the deleted duplicate is supposed to be current item, it will not be found and you get the idle window of ImageMagick or xv.
&   Tells similar that *all* alleged duplicates are not identical to the left image. Be careful to see all alleged duplicates and wait for a chance to remove any real duplicate before you -distinguish the rest. NO UNDO POSSIBLE HERE. If needed, edit the file similar_map_exempt.

One may also use key "-" (button "del") to delete the left image or key "~" (button "del double") to delete the first duplicate.

When done, set the match parameters back to the default values.

$ similar -match_par:4:1:4:2:color_diff

Handling newly downloaded files

With a small number of new files, one just starts simv and gives it all the files' addresses :

$ simv_start -tcltk *

With a larger bunch of new images it may be desirable to have them categorized before one uses simv to check them in. stic_importer is a script that will show you four categories of images in separate runs of simv -tcltk :

unique ........ images with no duplicates found
interesting ... images which are larger than the duplicates found
equivalent .... images that are equally sized or slightly smaller
problem .......
files which are supposed to be images but unreadable (one may try different graphics software like netscape)

Three other categories GET REMOVED WITHOUT USER INTERACTION :

inferior ...... duplicates significantly smaller than the registered one
trash ......... very small images
unreadable .... non-image stuff (text, MPEG, Word, PDF ...)

The categorization is done by the script similar_splitter, which not only detects duplicates of files in the registered collection but also uses a temporary similar map to detect similarities among the newly imported files. See "adjustable parameters" in similar_splitter for the criteria used.

The importer will not process files that end with .tee . Those may have been created by snntpbatch and may be needed to produce complete images.

The usage of stic_importer is simple. Just give it some file addresses, but be sure that these aren't your valuable private files:

$ stic_importer *
++-=++--+-+-=------++++-+======+++++++++++++++++++ 50
+++++++++++++++++-++=++=++++++++++++++++++++++++-+ 100
++++++++++++++++++++++-+++++++++++++++++++++??+-++ 150
++++++++++++++++-++++++++++++++---++++++++=++++-++ 200
+++++++++++++++++++++++++++++=++----+++++++++++
 12 similar_split_equivalent
  1 similar_split_inferior
  2 similar_split_interesting
  2 similar_split_problem
  3 similar_split_trash
206 similar_split_unique
 21 similar_split_unreadable
247 total
start processing of result ?

Enter a single "y" and press Return to start the first simv run. The other runs follow, where applicable, each as soon as the previous one has ended.

The simv run which presents the unique images will not check again for duplicates. This speeds up simv, but be aware of other users who might register files in the time between stic_importer's start and the actual run of simv.

Other Types of Frontend Clients

A PHP simv+similar frontend for webservers

There is also a PHP3 script simv_frontend.php3 for use with a webserver (like Apache) and a web browser (like Netscape).
It is not intended for moving files but only for viewing and checking for duplicates. It is just a programming demonstration and an example to build on.

The webserver needs to be running already and it must support PHP3 or PHP4. In the following examples, I assume that its documents are located under /usr/local/httpd/htdocs .

In order to activate the PHP3 script, copy it within reach of your webserver and also copy scripts/slate.gif and scripts/penguin.gif to the same directory as the PHP3 script. For the correct setting of permissions ask your system administrator or the webmaster. I.e. in most cases: ask yourself and try it out.

Special precautions have to be taken if PHP is running in safe mode on your server. See scripts/simv_frontend.php3 and search for "safe_mode On". This is the procedure for "safe_mode Off":

$ su
# mkdir /usr/local/httpd/htdocs/stic
# chown INSERT_YOUR_SYSTEM_USER_ID_HERE /usr/local/httpd/htdocs/stic
# chgrp INSERT_YOUR_SYSTEM_GROUP_ID_HERE /usr/local/httpd/htdocs/stic
# exit
$ chmod a+rx,o-w /usr/local/httpd/htdocs/stic
$ mkdir /usr/local/httpd/htdocs/stic/tmp
$ chmod a+rx,o-w /usr/local/httpd/htdocs/stic/tmp
$ cd $(cat $HOME/.stic_main_dir)
$ cp scripts/simv_frontend.php3 \
     /usr/local/httpd/htdocs/stic/my_simv_frontend.php
$ cp scripts/slate.gif scripts/penguin.gif /usr/local/httpd/htdocs/stic

Copy scripts/simv_info_server to scripts/my_simv_info_server

$ cp scripts/simv_info_server scripts/my_simv_info_server

Edit the copied PHP3 script and set the variable $stic_dir to your stic_dir directory. If port number 4000 is not suitable for the connection to the simv server, then set the variable $simv_port to a different one. In that case you will have to change the variable simv_port in my_simv_info_server accordingly.

$ echo "Your path to insert : " $(dirname $(cat $HOME/.stic_main_dir))
Your path to insert : /home/thomas/test/stic_dir
$ vi /usr/local/httpd/htdocs/stic/my_simv_frontend.php
...
# The installation directory of stic
$stic_dir= "/INSERT_YOUR_PATH_HERE/stic_dir";
# The address of the simv server to connect with
$simv_host= "localhost";
$simv_port= 4000;

You have to edit the shell script my_simv_info_server only if you do not install the PHP3 script in /usr/local/httpd/htdocs/stic or do not use port 4000. In that case set the variables workdir and simv_port accordingly:

$ vi scripts/my_simv_info_server
# The directory in which the PHP3 frontend script resides
workdir=INSERT_PHP3_INSTALL_DIRECTORY_HERE
# the portnumber where to provide service. This must be the same as in the
# variable $simv_port in the PHP3 frontend script
simv_port=INSERT_YOUR_PORTNUMBER_HERE

Start the simv server in a terminal window of its own and give it some image files to display. For this example, force the duplicate check even for files in guarded directories.

$ cd SOME_DIRECTORY_WITH_IMAGES
$ my_simv_info_server -double_check:on *

Now enter the URL of the PHP3 script in your web browser.

http://localhost/stic/my_simv_frontend.php

A login request will appear. User ID = "me" , Password = "mypwd" .

To navigate through simv_info_server's list of items, use the links "prev", "next" or any of the numbered links which form a small logarithmic scale underneath. The link "refresh" reloads the current item.

The webserver does not access the images directly but gets copies which are requested by a sagent process started by the PHP3 script. This sagent process contacts the simv server, which has permission to access the images and makes copies within reach of the web server. A copy is guaranteed to exist for at least 60 seconds even if the simv server immediately hops to another item in its file list (in script my_simv_info_server: keeptime=60).

There is a file input form "Upload file for comparison:" where you may send image files to the webserver, which then get checked for duplicates in the collection guarded by similar. The file size limit is 1e6 bytes.
The scripts my_simv_frontend.php and my_simv_info_server expect the names of the uploaded file copies to begin with "/tmp/php" and not to contain "/" in the remainder. If your webserver puts the files somewhere else, the scripts will complain about that false address. If necessary, adjust the variable userfilestart in both scripts.

Although simv_info_server is configured to permit only harmless commands and also refuses connections which do not seem to come from localhost, one may be even more cautious and also demand encryption. Obviously the user id for the web server should not be the same as the user id for the trustworthy clients of similar. So make a keyfile for user id webserver:

$ cd $(cat $HOME/.stic_main_dir)
$ vi $HOME/tnl_keydir/webserver.tnl
fnhndfont mkgfcno;dfn;odjaiug ewjijxzknsiuh9ldfm ji gdk;j xj joj bxjojojo 8q765v0p80-bi09y450u8ex907du7v09 998908nr8908un 8du08760 9sd;ig98987dld
$ simv -no_rc -make_userkey:all:webserver -end
userkey file created : /home/thomas/tnl_keydir/webserver.tnl

Copy it to the stic_dir directory and make it readable for the Apache user (look who is running your httpd, mine is run by wwwrun):

$ cp $HOME/tnl_keydir/webserver.tnl .
$ su
# chown INSERT_WEBSERVER_USER_ID_HERE webserver.tnl
# chmod ug+r,o-r,g+w,uo-w webserver.tnl
# exit

Now you have to edit my_simv_frontend.php and set the variable $simv_encryption to "on" . The other variables should then match the above preparations and the web browser should behave with that URL as before.

$ vi /usr/local/httpd/htdocs/stic/my_simv_frontend.php
# Encryption is used if this variable is not set to "off"
$simv_encryption= "on";

Finally stop the simv server and restart it with an additional argument that demands encryption (and does not merely allow it).

@ -end
$ my_simv_info_server -security:tcp:tunnel:require -double_check:on *

You may do the reverse test and set $simv_encryption= "off" in the PHP3 script.
Afterwards the simv server should complain and the frontend should send pages to the browser which are quite empty.

Other Programming Languages

Since the protocols for input and output are fully documented in the help texts of similar, simv and sagent, a programmer should be able to build a frontend client in any desired language. See src/as/asfrontend.c , scripts/simv_frontend.tcltk and scripts/simv_frontend.php3 for implementation examples.

Getting started with snntpbatch

snntpbatch is intended to be controlled by shell scripts. Nevertheless, one may perform all operations in shell dialog and there is also a way to use it interactively via a webserver.

The basic idea is to have a directory for each group where the messages get stored and therein a directory bin, where attachments get stored. You will need some free disk space. A dozen busy groups may easily occupy 500 MB. A full GB of working storage is advised. You will also need some i-nodes (expect hundreds of thousands of files).

But first have a look at the brief introduction to snntpbatch: see "Getting Started" in the text doc/snntpbatch_helptext or execute:

$ snntpbatch -help | less

One may view the messages with a web browser starting at a main index page. For an image collector, direct operation of simv and similar on the files in directory ./bin may be more interesting. One should avoid the possible multi-part files *.tee (as well as the usual virus.exe). Example:

$ cd $HOME/snntpbatch_download/YOUR_NEWS_SERVER.port119
$ cd alt.binaries.pictures.fantasy-sci-fi/bin
$ simv_start -tcltk *[!e]

If you already have favorite newsgroups then you should think about how to recognize the wanted messages by means of regular expressions applied to the header lines subject: , from: , date: , combined by the logical operators -and , -or , -not and by brackets (see -filter in the helptext).

snntpbatch's filter may look into your collection directories and into its own hash directories to avoid double downloading of the same binary files.
This feature depends on sufficiently unique filenames and their unambiguous announcement in the subject: . Good for use in serious collectors groups with not too many Mac file names (shaking my head silently).

Generally, care is taken that no dangerous or confusing filenames emerge. Space characters are converted to underscores. Any unusual character is replaced by its hex code. The file names should be shell-safe then.

The download directory also contains a global list of message ids. If not explicitly disabled by -overwrite on , a message is not downloaded if it is already known to that list. Like all downloaded data, those list entries may be cleaned off the disk after a certain time (usually one week).

The routes are bumpy, especially if one uses remote commercial servers. The usual timeout is set to 4 minutes and the program tries to reconnect and resume its tasks after a connection breaks down. Nevertheless there are situations where it has to abort. When confronted with the special server behavior of delivering far fewer bytes than announced, it tries to circumvent the problem by aborting the connection and waiting a random time before reconnecting. After five such glitches, a message is discarded and the next one is processed. This happens quite rarely, though.

Multi-part messages (i.e. split attachments, not the MIME multipart type) are handled in a rather coarse way, if at all. I would like to know whether there is an unofficial protocol among the automated Windows news clients for how this is to be announced in the subject line. "(5/23)" may mean anything from 5th part of a 23 part message to "hooray it's 23rd of May !".

To enable multi-part handling, the group directory has to contain a file

nntpclient_multi_bin_tee

The first part of an attachment is stored as decoded binary file.
If the group directory contains the marker file nntpclient_multi_bin_tee then possible first and further parts are downloaded undecoded as *.tee files with names derived from the subject line. It is left to an external tool like scripts/combine_tee to find matching .tee files and decode them by uudecode. (Up to now, I never saw split MIME-encoded attachments. Maybe I didn't look sharply enough.)

There is also the possibility to control snntpbatch interactively via a PHP-enabled web server and a browser. See snntpbatch -help , "Collaboration with a web server" .

----------------------------------------------------------------------------

Cooperation with MySQL

similar normally uses several disk files to store data like image samples, file addresses, exempts. Alternatively all these data may be stored within MySQL database tables. There may also be an extension for the MySQL server which implements User Defined Functions to compare image samples and to perform conversions on image samples.

To enable MySQL access, a special version of similar has to be built. It needs the header files and the client library of MySQL. Try this (and have a look at "Portability Issues" if it fails):

$ cd $(cat $HOME/.stic_main_dir)
$ ( cd src/stic_build ; make mysql )

The result should be a new, slightly larger file bin/similar which is executable:

$ ls -l bin/similar
-rwxr-xr-x 1 thomas thomas 909097 May 8 13:59 bin/similar

You have to provide two MySQL tables with certain properties and to describe them in a configuration file. similar will then automatically use the MySQL server to access these tables for reading and writing. For details on database layout and configuration see: similar -help , "Appendix Similar M : Cooperation with MySQL". For performance considerations see below: "Why and how to use MySQL with similar".
If you want to convert an existing file based map into a SQL based one, then prepare the SQL database and tables and write the configuration into the _conf file of a new temporary map. List all valid samples and exempts to files and record them with the temporary map. Finally rename the _conf file to the normal map name.

$ tmp_map=tmp_${mapname}_tmp
$ vi ${tmp_map}_conf
  ...see similar -help , "Appendix M" for an example configuration ...
$ similar -mapname:$mapname -list_samples:. >samples.txt
$ similar -mapname:$mapname -use_exemptlist:on \
          -list_exempts:. >exempts.txt
$ similar -mapname:$tmp_map -append_list:samples.txt
$ similar -mapname:$tmp_map -record_exempt_list:exempts.txt
$ mv ${tmp_map}_conf ${mapname}_conf

The files samples.txt and exempts.txt should be compressed and kept as backups for a while. ${mapname}_smp , ${mapname}_nam, ${mapname}_exempt may be deleted after a few days of testing the new setup.

Enhancing the MySQL server via UDF

To enable the User Defined Functions, compile the shared object and copy it to a suitable directory (the MySQL Reference Manual advises /usr/lib which works fine with my system).

$ cd $(cat $HOME/.stic_main_dir)
$ ( cd src/stic_build ; make mysql_udf )
$ pwd
/home/thomas/stic_dir
$ su
Password:
# ls /usr/lib/similar_udf.so
ls: /usr/lib/similar_udf.so: No such file or directory
# cp /home/thomas/stic_dir/bin/similar_udf.so /usr/lib

Caution : Before you load the shared object into a production database server where availability matters, please test it thoroughly with a server that you can kill and restart without much pain.
With a mysql client and the necessary privileges at the server, register the functions and load the shared executable:

CREATE FUNCTION similar_img RETURNS INTEGER SONAME "similar_udf.so";
CREATE FUNCTION similar_dist RETURNS REAL SONAME "similar_udf.so";
CREATE FUNCTION similar_cpar RETURNS REAL SONAME "similar_udf.so";
CREATE FUNCTION similar_fmt RETURNS STRING SONAME "similar_udf.so";

This registration is maintained permanently until some day you revoke it:

DROP FUNCTION similar_img;
DROP FUNCTION similar_dist;
DROP FUNCTION similar_cpar;
DROP FUNCTION similar_fmt;

(or until similar_udf.so cannot be found at server start).

The function SIMILAR_IMG compares two image samples, which may be either in ascii serialized format (see similar -sample) or in the binary formats used by similar to store samples in a MySQL database (in the column named by the variable sample_dataname in the _conf file). After the two samples, more arguments may be given as with similar -match_par . The difference is that in SQL the numbers are separated by commas and the text needs to be in quotes. Also SIMILAR_IMG accepts number codes to choose the comparison method ( 0='color_diff' , 1='color_double_diff' , 2='color_scaled_diff' ).

SIMILAR_IMG(sample1,sample2,color_tolerance,bw_tolerance,
            color_ignore,bw_ignore,method,max_distance_factor,max_result)

Parameter max_result is accepted but ignored. All optional arguments have to be constants (i.e. no column names allowed). The return value is 1 if the two samples match and 0 if they do not.

Note that the first sample encountered defines the actual settings of _tolerance and _ignore. Therefore, when comparing two non-constant samples (like t1.smp17,t2.smp17 ), the parameters for color_ and bw_ should be identical to avoid random settings.

Examples (the constant '-smp16...'
is longer than 240 characters):

SIMILAR_IMG('-smp16-00000007-...-26F834F443FD+',smp17)
SIMILAR_IMG(t1.smp17,t2.smp17,8,8,4,4,'color_double_diff')

The function SIMILAR_DIST returns the distance between two samples. The smaller this number, the more similar the samples. See similar's command -distance for more details. SIMILAR_DIST takes the same input as SIMILAR_IMG .

SIMILAR_DIST(sample1,sample2,color_tolerance,bw_tolerance,
             color_ignore,bw_ignore,method,max_distance_factor,max_result)

max_distance_factor and max_result are accepted but ignored.

The function SIMILAR_CPAR returns information about the actual match parameters which would be applied if a particular sample were the first to be examined by SIMILAR_IMG .

SIMILAR_CPAR(sample, mode, color_tolerance,bw_tolerance,
             color_ignore,bw_ignore,method,max_distance_factor,max_result)

The mode text chooses one of three values:

is_bw ...... a number between 1.0 and 0.0 which says whether a sample is monotone (1.0), colorful (0.0) or somewhere between. (see also similar's command -is_bw)
tolerance .. the actual tolerance that is chosen somewhere between bw_tolerance and color_tolerance.
ignore ..... the number of pixels to ignore (either bw_ignore or color_ignore)

The match parameters are as with SIMILAR_IMG and SIMILAR_DIST.

Examples (the constant '-smp16...' is longer than 240 characters):

SIMILAR_CPAR('-smp16-00000007-...-26F834F443FD+','tolerance')
SIMILAR_CPAR(smp17,'is_bw',8,2,4,1,'color_double_diff')

The function SIMILAR_FMT converts a sample into one of several formats or extracts information from samples. The input sample may have any of the formats acceptable to SIMILAR_IMG. The output format is determined by the second argument.

SIMILAR_FMT(input_sample,format_name);

format_name may be one of:

'ascii' or 'smp16' .... ascii serialized format (see similar -sample) with empty fileaddress.
'native' .............. binary format optimized for maximum speed on the server system.
This format is slower if accessed by other systems. It depends on the inner representation of 16bit arrays in C programs (beware of exotic ports).
'swapped' ............. 'native' with byte pairs swapped. The format that a system of opposite bytesex would use as 'native'.
'binary' or 'smp17' ... binary format with full content of 'smp16'. It is quite independent of the inner representation of color values. Regrettably it is slow to read.
'name' ................ the fileaddress if the input is a serialized sample, or an empty text otherwise.
'red' ................. average red color value of the sample.
'green' ............... average green color value of the sample.
'blue' ................ average blue color value of the sample.

This function may be used to import and export serialized samples without the help of a similar executable online. The samples themselves still have to be computed by similar in the first place.

Examples:

Register a serialized sample in the default table layout expected by similar:

SET @sampletext='-smp16-00000007-...-26F834F443FD+/home/pic/pic1.jpg';
INSERT INTO similar_samples (id,name,smp17,smp17_red,smp17_green,smp17_blue)
VALUES( 0,
        SIMILAR_FMT(@sampletext,'name'),
        SIMILAR_FMT(@sampletext,'native'),
        SIMILAR_FMT(@sampletext,'red'),
        SIMILAR_FMT(@sampletext,'green'),
        SIMILAR_FMT(@sampletext,'blue')
      );

The trick with SET @sampletext is described in the MySQL Reference for 3.23.33 and does not work with an old 3.22.32 server.

Retrieve the serialized format (like with similar -lookup_sample):

SELECT CONCAT(SIMILAR_FMT(smp17,'ascii'),name) FROM similar_samples
WHERE name = '/home/pic/pic1.jpg';

Why and how to use MySQL with similar

Using MySQL for storage makes it possible to enrich the image samples with arbitrary database information. The custom SQL functions like SIMILAR_IMG and SIMILAR_FMT provide means to manage that database without much involvement of similar.
On the other hand, a query comparable to -list_doubles on a normal desktop system is much slower than with a caching similar server. This may be different on a fat, well tuned server system. Also, the concept of exempts (-distinguish) is not known to the SIMILAR_IMG function. It has to be implemented by means of the SQL WHERE clause, if at all.

One should not run a similar server with -caching:on while other MySQL clients possibly alter sample records. The similar server will not learn about those changes.

Usage models for similar with MySQL :

- A similar server runs with -caching:on and write operations on the similar tables are denied to all other MySQL clients. This can be ensured by creating a dedicated MySQL user id for similar and setting access restrictions on similar's tables.
  Same search performance as with a conventional similar server. Nearly all benefits of SQL available.

- similar clients operate directly on a MySQL server. With larger databases this only shows acceptable performance if the function SIMILAR_IMG is available on the MySQL server and if it is enabled by has_similar_udf true in the {dbaddress}_conf file.
  Lower search performance than with a conventional similar server. Full access to sample data for other SQL clients.

- similar solely provides the serialized image samples which then get inserted into a MySQL database in some SQL way. All usual jobs of similar are done by the UDF enhancements like SIMILAR_IMG and SIMILAR_FMT.
  No online connection is needed between similar and the MySQL server. No image management with simv is possible.

----------------------------------------------------------------------------

Upgrading from earlier versions

Install and compile stic-0.7 as described above in "Compilation and Installation". If you used MySQL (available since 0.4) also compile and install as described above in "Cooperation with MySQL" and "Enhancing the MySQL server via UDF".

The external data formats did not change up to now.
There is no need to re-record similar's database or to re-subscribe to a newsgroup. The startup files .*rc of the stic programs may be left unchanged.

With 0.7 a new authentication protocol has been introduced and is used by default. Migration is intended to be smooth. Compatibility issues mainly affect newly created keyfiles. (See below)

If your previous version is already accessed via a symbolic link stic_dir (as advised since 0.3) you only had to adjust that link during "Compilation and Installation". If not, you might want to establish a symbolic link from the old stic directory to the new address. This would eliminate the need to adjust the variable PATH in all running shells. (A reboot would be overkill.)

The following overview is valid for the installation described in previous versions of README. If you made your own changes, you will possibly have to adjust them too.

--- Files to copy from an existing stic directory :

scripts/image_targets           your target list for simv
scripts/image_trash             your trash directory address for simv
scripts/simv_keyset             your keyset for simv_start
scripts/my_simv_info_server     your simv server for the PHP frontend (if it was installed at all)

Only if you did "Adapting key bindings and button menu of simv":

scripts/my_simv_start           your custom start script for simv
scripts/my_simv_keyset          your custom key bindings for simv
scripts/my_simv_frontend.tcltk  your custom GUI frontend for simv

--- Installation parts to be re-done as described in README text :

--- Only necessary if you upgrade from stic-0.2 or older:

"A PHP simv+similar frontend for webservers"

Also have a look at the new appendix of snntpbatch -help "Collaboration with a web server"

--- Authentication protocol upgrade from stic-0.6 or older :

The new protocol provides increased security and should be used to its full extent where possible. Nevertheless, smoother migration is supported.
For full 256 bit user keys, all user keyfiles ( *.tnl ) will have to be re-created on the server host and all clients. See paragraph "Practical Usage of similar" up to this command:

$ similar -no_rc -make_userkey:all:me

New keyfiles have 32 bytes rather than 16 bytes of content.

Alternatively one may decide to stay with the old 128 bit user keys. Then it is not necessary to change any existing keyfile. But one has to take care to create any new keyfiles in this old format by appending "-128" to the type component of -make_userkey. Like:

$ similar -no_rc -make_userkey:all-128:me

Keyfiles of differing sizes are not considered to describe the same key in the new protocol 0.2.

All servers should be upgraded to the new version. Old clients will still be served by the new servers unless the servers get explicitly instructed not to do so by:

-security:all:serverprotocol:0.2

before the command that starts their service. (Nevertheless, one should upgrade all clients if possible.)

With the old protocol, 16 byte keyfiles and 32 byte keyfiles describe the same key if they have been generated from the same 64 original bytes. So old clients can stay with their old key files or they can use newly generated ones without noticing any change.

If a new client needs to connect to an old server then the client has to choose the old protocol (0.0) by setting:

-security:all:clientprotocol:0.0

before the command that connects to the server.

----------------------------------------------------------------------------

Adapting key bindings and button menu of simv

To adapt simv fully to your own directory structure, you will have to copy and modify some files.
simv_start            coordinates the components of a simv run
simv_keyset_example   defines the key bindings and target directories of the above example
simv_frontend.tcltk   the GUI component of the above example

Copy the three files:

$ cd $(cat $HOME/.stic_main_dir)/scripts
$ cp simv_start my_simv_start
$ cp simv_keyset_example my_simv_keyset
$ cp simv_frontend.tcltk my_simv_frontend.tcltk

Edit my_simv_start and set the variables keyset_file and frontend_file to the names of your copies:

$ vi my_simv_start
# the name of the file within the stic-scripts directory
keyset_file=my_simv_keyset
...
# the name of the Tck/Tk script within stic-scripts to serve as frontend
frontend_file=my_simv_frontend.tcltk

If you prepare for separated target lists, you may also want to change the variables target_list and trash_target_file. See their remark texts.

Edit my_simv_keyset . Better read simv -help | less beforehand, especially the description of the command keyset. The file will be read by -keyset:readfrom:

$ vi my_simv_keyset

If your collection directories reside below a common main directory, set that address behind the statement main: e.g. /home/thomas/stic-0.7/images

main:INSERT_COMMON_MAIN_DIRECTORY_HERE

You should also create the directories "trashdir" and "doubles" in that main directory. Then you can use the definitions of trashdir: and the keys below "Some auxiliary keys" without any changes.

$ mkdir INSERT_COMMON_MAIN_DIRECTORY_HERE/trashdir
$ mkdir INSERT_COMMON_MAIN_DIRECTORY_HERE/doubles

To translate your directories from the dummy names of the example, better make a translation table since you will probably want to change the Tcl/Tk script accordingly. Choose the translation with the layout of the Tk window in mind. Translate from "flowers" the target which shall be on the upper left Tk button, translate from "crowds" what shall be bound to the upper right button. And so on.

The letters after "map:" should be chosen as abbreviations of the target names.
Not an easy task. I restrict myself to alphabetic letters and number keys (i.e. [a-zA-Z0-9]) for the collection's move targets. Choose the letters in the order of importance (i.e. traffic frequency) of the directories. Usually after a dozen you will have to make some strange choices ... that's life.

Delete those example move targets which you did not translate into targets of your own. Add new "map:" lines if needed.

Finally write the helplines describing your key bindings. You may add or delete "helpmore:" lines but there must be one "helpstart:" at the beginning.

Edit my_simv_frontend.tcltk . simv_frontend.tcltk is intended as a programming example of a frontend interface as well as a GUI component for practical use. It surely is helpful if you know some Tcl/Tk, but it is not absolutely necessary.

$ vi my_simv_frontend.tcltk

Search for "set what_to_show {auto}" and replace that line by

set what_to_show {targetbox payload}

This makes the button menu visible even if there are target lists. The targetbox with these lists is shown as well, if there are any.

The buttons and their containers are defined in proc init_payload . Their names are chosen not to interfere with other names within the script. There are four main button containers which each consist of two row containers.

plants , animals , humans , themes

They should be easy to identify in the visible layout of the example GUI.

The buttons, their label texts and the target directories have the same names. For example:

.flowers    is the button's name
"flowers"   is the label text
/flowers    is the target directory's name below the main directory of the keyset definition

If possible, keep this identity of names. If you need more freedom: The button's name has to start with a letter and may consist of letters, numbers and underscores. The label text is quite arbitrary. The target directory should be a path of short and shell friendly names.
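The naming rule for buttons stated above can be checked mechanically before you edit the Tcl/Tk script. The following is a minimal sketch in POSIX shell and not part of stic. The function name valid_button_name is made up for this illustration, and the name is given without the leading dot of the Tk widget path.

```shell
#!/bin/sh
# Hypothetical helper, not part of stic: succeeds if its argument starts
# with a letter and consists only of letters, digits and underscores.
valid_button_name() {
    (
        # Force the C locale so that a-z and A-Z mean the ASCII letters only.
        LC_ALL=C
        case "$1" in
            "")               exit 1 ;;   # empty name
            [!a-zA-Z]*)       exit 1 ;;   # does not start with a letter
            *[!a-zA-Z0-9_]*)  exit 1 ;;   # contains a forbidden character
            *)                exit 0 ;;
        esac
    )
}

# Try the rule on a few candidate names.
for name in flowers big_cats2 2cats my-button; do
    if valid_button_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: not usable as button name"
    fi
done
```

Running such a check over the second column of your translation table may catch names that Tcl/Tk would reject before you start the search and replace.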
Take your prepared translation table from the keyset translation and use your editor's text change facilities. For unused buttons, leave the names as they are, make the text " " and remove -command "..." . See button .empty for an example.

If you want to add buttons or change the layout, you need to know some Tcl/Tk.

Make a backup copy of these files outside the stic directory tree.

Test whether it looks and works as intended:

$ ( cd ../images ; my_simv_start -tcltk -end_on:list_end:clear */* )

----------------------------------------------------------------------------

File Locations

Most of the scripts depend on the binary executables or helper scripts. This part describes the configurations that will allow them to find each other.

It is a good idea not to remove any file from scripts or bin. Only make copies or install links.

If the system supports symbolic links, it is sufficient to append the subdirectory stic_dir/scripts to the environment variable PATH. The binaries are represented in scripts by symbolic links to the sibling directory bin. Modification of PATH is best done in the shell's startup file (e.g. $HOME/.bashrc).

If there is no support for symbolic links, then one would additionally have to append the subdirectory stic_dir/bin to the variable PATH.

If you do not want to change your PATH then you may put copies or links of the desired commands into one of the directories already listed in PATH.

The scripts make quite an effort to find the script stic_std_variables, which then tries to locate the bin and scripts directories of stic. One may easily make that search unambiguous by writing the complete path of stic_dir into the file $HOME/.stic_main_dir . This file may be overridden by the environment variable STIC_MAIN_DIR.

If $HOME/.stic_main_dir is missing and $STIC_MAIN_DIR is empty, then the scripts try to find out the filename that has been used to start them.
If the parent of their directory contains a file named this_is_the_stic_main_directory then they assume that they are at their original position within stic_dir . If no other clue is found, the scripts try to find their helpers in the same directory as they were started in.

----------------------------------------------------------------------------

Portability Issues

Generally, the source should compile with any 32-bit UNIX C compiler that does not refuse to process old-fashioned K&R code. Set the compiler commands for your system at the start of src/stic_build/makefile . Also decide what byte sex (byte order) to use with blowfish.

Trouble spots may be similar.c with its signal handlers Cleanup_handle_xyz(), sblowfish.c which contains code that I found on the Internet, and imgdbs_mysql.[ch] which contains the MySQL client code.

The handler functions are void and don't touch any argument. That might cause compilers to complain but should be compatible with any type of function that is expected by your system's personal flavor of signal() . Nevertheless, there might be signals mentioned which do not exist on other operating systems. Vice versa, there might be the need to catch signals not mentioned yet in similar.c .

sblowfish.c contains some constructs which might cause problems on older compilers. I'll try to change it to primitive K&R soon. Also I still have to validate that implementation of blowfish against an artless implementation of my own. I checked it with B. Schneier's description of December 1993 and so far it seems to be ok. A remote possibility of a well disguised fake still remains (security considerations make me temporarily paranoid).

Also it depends on the byte sex of your system. See in sblowfish.h the macros ORDER_ABCD , ORDER_DCBA , ORDER_BADC . I will try to remove this dependency from the code (let blowfish work on a byte array rather than two 32-bit words ?) but verification will not be that easy without an ABCD workstation.
The MySQL client code expects two MySQL client headers which include several other headers. Usually they are under /usr/include/mysql. The client code has to be linked with libmysqlclient . There obviously have been changes of the API in the past. It might be necessary to upgrade the client library and headers. Nevertheless even quite an old server should be sufficient.

The source code in imgdbs_mysql.c is divided into a part very specific to MySQL and a general SQL part. Nevertheless the semantics of the general part may still depend on MySQL specifics. So the task of attaching a different SQL database system might not be trivial.

SQL regrettably provides no standard way to produce a unique id. Therefore an abstract three-step procedure is performed which offers the opportunity to implement id generation quite freely.
- Obtain a template value by Imgdbs_mysql_unique_seed(). It is 0 with MySQL but may be the final id value with other database systems.
- Insert that template value into the table. With MySQL the AUTO_INCREMENT attribute converts 0 to a unique number.
- Obtain the final id number by Imgdbs_mysql_get_unique_id(). If the final id was already generated, one may return member unique_seed of struct ImgdbS.

----------------------------------------------------------------------------

Additional programs included

There are some more programs which are not directly related to image collecting. They are not created during the normal make run. Perform these commands while being in directory src/stic_build :

  make ../../bin/sproxy
will produce a primitive HTTP proxy which I use to watch the communications between a web browser and web servers. Run :
  ../../bin/sproxy -help | less

  make ../../bin/smtps
will produce a primitive e-mail sender which uses SMTP . Run :
  ../../bin/smtps -help

  make ../../bin/notify_ip_adr
will produce a program that watches the output of ifconfig for changes of the IP address and, when a change occurs, notifies a list of recipients via e-mail.
Run :
  ../../bin/notify_ip_adr -help

----------------------------------------------------------------------------

Where to get supporting software

  libjpeg ......... http://www.ijg.org/
  ImageMagick ..... http://www.simplesystems.org/ImageMagick/
  xv .............. http://www.trilon.com/xv/
  Tcl/Tk .......... http://www.tcl.tk
                    http://sourceforge.net/projects/tcl
  PHP ............. http://www.php.net/
  MySQL ........... http://www.mysql.com

A backup tool for CD recorders:
  scdbackup ....... http://scdbackup.webframe.org

----------------------------------------------------------------------------

Legal Stuff

This software and related documents are copyright 2001 - 2003, Thomas Schmitt, stic-source@gmx.net, and provided to you without any warranty under an open source BSD license (see file COPYING).