Features and Options of stic-0.7/bin/similar

This software maintains a database of image file names and micro thumbnails.
Its main purpose is to find images which look very similar to a given one.
This database is also called "map" here.

The program does not alter the image files but may create temporary copies
of them in the /tmp directory. In most cases the temporary files are unlinked
immediately after creation, so it is quite unlikely that files like
/tmp/imgrgb*.jpg remain.

To handle file formats other than JPEG, the program "convert" has to be
accessible.

At start the program tries to open the file $HOME/.similar_rc and to execute
its text lines as options. This may be prevented by option -no_rc . If given,
it must be the first argument.

The database is designated by a dbaddress, which in fact is a common leading
text for the names of several files. The usual ones are:
  ${dbaddress}_nam      the filenames of the registered images
  ${dbaddress}_smp16    the fingerprints (samples) of the images
  ${dbaddress}_exempt   list of explicitly distinguished images (if any)
  ${dbaddress}_break    used to abort long running operations
  ${dbaddress}_pubno    address of the active guarding server (if any)
  ${dbaddress}_conf     configuration file (for MySQL cooperation)
  ${dbaddress}_sql_pwd  separate password for MySQL access
The default for dbaddress is "similar_db".

A single instance of this program can act as guarding server of a map. It
then serves the other instances via a TCP/IP based client-server protocol.
If guarded, a map may not be accessed directly by any other instance of this
program. Access to the database may be gained by -auto_client:on . This
avoids problems of concurrency while altering a database and allows effective
use of memory caching.

A guarding server publishes its TCP/IP address in file ${dbaddress}_pubno .
If this file is found by any other instance of this program which tries to
access dbaddress directly, then that instance refuses to open the files.
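The file naming scheme above can be illustrated with a small sketch. It is
not part of the program; the function names map_files and is_guarded are
hypothetical, and the only rule taken from the text is that the existence of
${dbaddress}_pubno marks a map as guarded. The content of that file is not
interpreted here.

```python
import os

# Suffixes listed above for the files that make up one map.
MAP_SUFFIXES = ("_nam", "_smp16", "_exempt", "_break",
                "_pubno", "_conf", "_sql_pwd")


def map_files(dbaddress="similar_db"):
    """Return a dict mapping each suffix to the file name derived from dbaddress."""
    return {suffix: dbaddress + suffix for suffix in MAP_SUFFIXES}


def is_guarded(dbaddress="similar_db"):
    """Report whether the map looks guarded, i.e. its _pubno file exists."""
    return os.path.exists(dbaddress + "_pubno")
```

A frontend script could call is_guarded() before deciding whether to rely on
-auto_client:on instead of direct access.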
If auto client mode is set to "on", then the guarding server is contacted
automatically whenever a guarded map is selected by command -mapname .
Several non-map commands are performed locally by the client program, though.

Some of the long lasting operations can be stopped by the existence of the
file ${HOME}/.imgcmp_break or ${dbaddress}_break :
  -append_list , -list_doubles
The program tries to remove this file and begins to execute the next option.

This software uses a communications module called agent. Its features and
commands are also available in a standalone program called sagent, which may
serve as frontend client or communications node in a net of servers.

Commands usually are not typed in but triggered by shortcut keys or sent by
external frontend programs. To get a command line prompt, press the '@' key.
If commands come faster than they can be processed, they get buffered and
executed as soon as possible. The ESC key discards all pending commands and
all those keystrokes and commands which follow the ESC immediately. Please
note that a lot of special keys produce sequences with ESC and therefore
cannot be used for key binding up to now.

Commands are bound to single keys by option -keyset , which takes a single
keyset statement as parameter. See below command -keyset and the sample
keyset definition.

External frontend programs can connect via TCP/IP or named pipes. They may
send commands and apply for notification of any output event. See below
-pipe_service , -tcp_service , -add_view , -show_copy .

This software itself is able to act as such a frontend program via TCP/IP.
The connection may be encrypted to be secure against malicious clients or
third party interference. See below -security and -tcp_client .

Unencrypted, the input protocol simply consists of command lines separated
by CR+LF, LF or NUL bytes. Several classes of output messages may be
subscribed to by -add_view .

Command reference

"fileaddress" is path and name of an accessible file.
"fileaddress_or_sample" is either a fileaddress or a serialized sample as it is generated by command -sample or -lookup_sample. "sourceaddress" is a fileaddress or "-" for standard input. -append:fileaddress_or_sample Append the image to the database. -append_nondash:on|off If set to "on" the beginning of an -append command may be ommitted and only the fileaddress needs to be given. This works only with fileaddresses which do not begin with a dash "-". Since agent's @ command input at the start terminal possibly automatically prepends dashes, one should only rely on this mode for arguments and commands sent from frontend clients. -append_list:sourceaddress Read a list of fileaddress_or_sample from source and append them to the database. Each line of input is used as one fileaddress_or_sample including all special characters besides the line's end marks. -auto_client:on|match_par|off Sets the automatic client mode which controls the behavior when encountering a guarded map (see below, command -guarded_map). If this mode is "on" the program tries to use the server of the guarded map. If this mode is "off" then access to the guarded map is disabled. Mode is "match_par" is like "on" but also forwards the local setting of -match_par to the server. If the program is already connected to a server, this connection is closed when -auto_client gets executed. A new connection gets established if appropriate. One should consider to set -auto_client:on in file $HOME/.similar_rc after all local settings (like -security:all:user) are done. -caching:on|off With -caching:on the programm uses a memory copy to speed up search operations in long program runs like -list_doubles or in guarding mode. Since reading the cache lasts some time, short runs become slower ! To get an uncompromised cache for a multi-client server, use -guarded_map: before -caching: . -cd:directoryaddress Set the working directory for use with fileaddresses. 
-check_ratio:mode:tolerance:maxnumber
    If mode is set to "on" during searches, an expensive second check for
    geometrical format is attempted with each match. If both files of a match
    are readable without conversion, their X/Y ratios get determined and
    compared. "off" is the default and disables this test.
    tolerance may be in the range of 0.0 to 1.0. Ratios match if:
      (1-tolerance) <= (X1/Y1)/(X2/Y2) <= 1/(1-tolerance)
      (with 1/0 := 1.0e99)
    maxnumber limits the number of X/Y reads per search (default 100).
    If one of the files is not a readable JPEG, this check matches.
    This feature is experimental. X/Y ratios are not part of map samples.
-compare:{delimiter}:fileaddress_or_sample{delimiter}fileaddress_or_sample
    Directly compare the two given images without using the image database.
    {delimiter} may be any text that does not contain ":" and does not occur
    within the first fileaddress_or_sample.
    If both images match, a line containing "1" is printed to stdout. If they
    do not match, then no text at all is printed.
    The command is performed locally by the client. To avoid an unnecessary
    connection to a server, -no_rc might be helpful.
-del_back:fileaddress
    Delete fileaddress from the database if it is the newest entry. The freed
    space is immediately reusable.
-del_dir:fileaddress
    Delete all files within the directory given by fileaddress from the
    database. Do not delete subdirectories.
-del_file:fileaddress_or_sample
    Delete the entry of a file with the same byte content as the one with
    the given address from the database. Does nothing if more than one
    matching entry is found.
-del_file_adr:fileaddress
    Delete all entries of the given fileaddress from the database.
-del_files_expr:expression
    Delete all entries where the fileaddress matches the given regular
    expression. Empty expressions are not allowed. Be very careful with this
    command anyway.
-del_tree:fileaddress
    Delete all files within the directory given by fileaddress and within
    all its subdirectories from the database.
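The ratio test given under -check_ratio above can be written out as a short
sketch. This is an illustration of the stated inequality only, not the
program's own code; the function name ratios_match and its parameters are
hypothetical. Division by zero is replaced by 1.0e99 as the text prescribes.

```python
def ratios_match(x1, y1, x2, y2, tolerance):
    """Check (1-tolerance) <= (X1/Y1)/(X2/Y2) <= 1/(1-tolerance)."""

    def div(a, b):
        # The help text defines 1/0 := 1.0e99; applied here to any zero divisor.
        return a / b if b != 0 else 1.0e99

    quotient = div(div(x1, y1), div(x2, y2))
    return (1.0 - tolerance) <= quotient <= div(1.0, 1.0 - tolerance)
```

With tolerance 0.05, an 800x600 image matches a 400x300 one, while an
800x600 image does not match its 600x800 rotation.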
-distance:{delimiter}:fileaddress_or_sample{delimiter}fileaddress_or_sample
    Calculates the measure of non-similarity of the given files. The value
    depends on the current setting of -match_par's ignore-values and method.
    The lower the distance, the more similar are the file samples.
    This is not a stable measure yet. Do not compare distances calculated by
    different program versions. (see -theory for distance definition)
-distinguish:fileaddress_or_sample|serialized_exempt
    Write an exempt record to the list of explicitly distinguished files.
    If a serialized sample (-sample) or a fileaddress is given, then a
    -search is performed to obtain the list of lookalikes. If a serialized
    exempt (-record_exempt) is given, then it is split into fileaddress and
    lookalikes without any search.
    In -list_doubles these lookalikes will not show up again as possible
    duplicates of the fileaddress. Vice versa, fileaddress will not show up
    as duplicate of any of these lookalike addresses.
    Usually this list will only be checked with -list_doubles . If
    -use_exemptlist is set to "on", it will affect all search operations.
    Beware of the possible race condition that a real duplicate might be
    appended to the database in the time between checking the old false
    duplicates and sending the -distinguish command. Therefore the permission
    class of -distinguish is "dbadmin" rather than "modify". Better disable
    it in heavy multiuser situations and restrict yourself to -record_exempt.
-garbage_collection
    Copy all valid entries to a new database, rename the old one to a backup
    name and rename the new one to the original name.
-guarded_map:connection_type
    Write own server address into file ${dbaddress}_pubno , provided a
    service of the given connection_type is active. This file marks the
    database as a guarded map. Guarded maps may not be accessed directly by
    instances of this program. The guarding state ends when another map is
    selected by -mapname or when the service ends properly (i.e.
    no total program crash occurs).
    Currently, only connection_type "tcp" is recognized. This type writes
    hostname and portnumber of the current -tcp_service:2 to the file.
-imagesize:mode:fileaddress
    Print the geometrical size of an image file. Valid modes are:
      x    the width of the image measured in pixels
      y    the height of the image measured in pixels
      z    the depth, i.e. number of color components per pixel
      xy   width height
      xyz  width height depth
    This information might be useful to estimate the quality of an image.
    The command is performed locally by the client. To avoid an unnecessary
    connection to a server, -no_rc might be helpful.
-imagoid:fileaddress
    Make a guess whether the given fileaddress is supposed to be a picture
    file. This depends mostly on the file name extension, and a positive
    reply does not imply that the file is readable at all.
    If the fileaddress looks like one of an image file, then the reply on
    stdout is 1. If not, 0 is printed.
    The command is performed locally by the client. To avoid an unnecessary
    connection to a server, -no_rc might be helpful.
-is_bw:fileaddress_or_sample
    Determine whether a search for the given image would use the parameters
    bw_tolerance,bw_ignore rather than color_tolerance,color_ignore (see
    -match_par). If so, print "1" on stdout. If not, determine whether the
    color_tolerance will be reduced and print the actual weight factor W
    between bw_tolerance and color_tolerance:
      actual_tolerance = W * bw_tolerance + (1-W) * color_tolerance
    W near 1 indicates a monotone sample, W=0 indicates a colorful one.
    The command is performed locally by the client. To avoid an unnecessary
    connection to a server, -no_rc might be helpful.
-list_all_files
    Prints all stored fileaddresses to standard output.
-list_doubles
    Prints those stored files to standard output which find other stored
    files by -search.
-list_exempts:expression
    Prints the serialized exempts where the file addresses match the given
    regular expression.
    Exempts are records which distinguish similar but non-identical files.
    (see -distinguish)
    Currently this command will not print anything unless -use_exemptlist is
    set to "on".
-list_samples:expression
    Prints the serialized samples of the stored fileaddresses which match the
    given regular expression. (e.g. ^/cd12/ matches any address that begins
    with /cd12/ ).
-lookup:fileaddress
    Show the stored sample values of the fileaddress. The fileaddress has to
    match the recorded address exactly. If necessary, use -lookup_adr to find
    out this address.
-lookup_adr:fileaddress
    Print the address of a registered file which has the same byte content
    as the file with the given fileaddress. This option retrieves the
    "official" name of a file.
-lookup_adr_all:fileaddress
    Like -lookup_adr but printing all known addresses if there is more than
    one.
-lookup_adr_count:fileaddress
    Print the number of addresses which can be found with -lookup_adr_all .
-lookup_id:fileaddress
    Print the database id registered for the fileaddress. If the map is
    stored in a MySQL database, this is the content of the column defined by
    {sample_idname} in the {dbaddress}_conf file. If the map is only stored
    in the standard files, this id may be invalid and therefore 0.
    The fileaddress has to be exactly the registered one.
-lookup_sample:fileaddress
    Print the sample registered for the fileaddress in serialized format
    (see -sample). The fileaddress has to be exactly the registered one. Use
    -lookup_adr to obtain this registered address from an arbitrary one.
-mapname:dbaddress
    Set name of database. Actual fileaddresses are generated from this name
    by appending suffixes like "_nam", "_smp16", or "_conf".
-match_par:color_tolerance:bw_tolerance:color_ignore:bw_ignore[:method
           [:max_distance_factor[:max_result]]]
    Set the parameters for comparison of samples.
    "color_tolerance" sets the maximum color difference for two pixels to be
    considered matching. The color values of a pixel are restricted to a
    range of 0.0 to 255.0.
"color_ignore" sets the number of tolerable non-matching pixels for matching samples. There are 16 pixels in a sample. "bw_tolerance" and "bw_ignore" do the same for images with higher redundancy such as black-and-white images. These values should usually be about half of the color values. The actual tolerance may be a mix of color and bw settings. (See -is_bw) In this case the actual ignore number is set to "color_ignore". The optional parameter "method" chooses one of the following comparison algorithms: color_diff subtracts the colors of corresponding pixels and tests wether the difference is tolerable. color_double_diff computes the color average of both images and shifts colors to make both averages equal. Then it performs color_diff comparison. color_scaled_diff computes the color average of both images and scales color relative to 64.0 to make both averages equal before color_diff comparison. Both average correcting algorithms are suitable to find heavily color corrected images. They are a lot slower though. The optional parameter "max_distance_factor" restricts matches to those which are within a distance of : tolerance*max_distance_factor Useful values are between 0.000001 and 1.74 (= sqrt(3)). A value of 0 disables distance checking. Use color_tolerance resp. bw_tolerance to enforce 0 distance. Optional "max_result" restricts the number of reported matches. A value of 0 releases this restriction but -sort_buffer:truncate might impose an additional restriction on the number of results. -no_rc (this is not really a command) This special option works only if given as first argument at program start. In all other cases it will not be recognized. If given, the program does not read and execute $HOME/.similar_rc which usually happens at the very start. -record_exempt:serialized_exempt Write an exempt record to the list of explicitely distinguished files. 
    A serialized exempt has this format:
      -exempt-{limit}-{fileaddress}{limit}{lookalike}...{limit}{lookalike}
    where {limit} is a text which does not contain '-' and does not occur
    within fileaddress or any of the {lookalike}s. Each {lookalike} is the
    address of a file which shall be distinguished from {fileaddress}. At
    least one {lookalike} has to be given.
    In -list_doubles these lookalikes will not show up again as possible
    duplicates of {fileaddress}. Vice versa, {fileaddress} will not show up
    as duplicate of any of these {lookalike} addresses.
    Usually this list will only be checked with -list_doubles . If
    -use_exemptlist is set to "on", it will affect all search operations.
-record_exempt_list:sourceaddress
    Read a list of lines from source and perform -record_exempt with each of
    them.
-refresh:fileaddress_or_sample
    Recompute and update the sample values of a fileaddress already
    registered in the database. Recomputation is omitted if a serialized
    sample text and not a fileaddress is given.
-rename:{delimiter}:old_fileaddress{delimiter}new_fileaddress
    Change the fileaddress in all records which refer to old_fileaddress
    into new_fileaddress. old_fileaddress therefore must be registered and
    may have to be obtained by -lookup_adr from an unregistered one.
    {delimiter} may be any text that does not contain ":" and does not occur
    in the old_fileaddress. Usually one or more "+" will be ok.
    Despite its restrictions and the effort to find a suitable delimiter,
    -rename is preferable to -del_file -append because it preserves the
    database id of the record. This is valuable especially with SQL.
-sample:fileaddress
    Generate a text that describes the database entry which the given
    fileaddress would produce if recorded with -append. This serialized
    sample text may be used as a substitute for a fileaddress with all
    commands that expect input of type fileaddress_or_sample .
    The sample serialization format is:
      -smp16-{flags=7}-{id}-{crc=0}-{pixel1}-...-{pixel16}+{fileaddress}
    flags, id, crc are 8 digit hex numbers. Pixels are 12 digit hex numbers
    with 4 digits each for red, green, blue in fixed point representation:
    two digits integer, two digits fraction. The first '+' character in the
    text indicates the start of fileaddress.
    The command is performed locally by the client. To avoid an unnecessary
    connection to a server, -no_rc might be helpful.
-search:fileaddress_or_sample
    Prints all addresses of registered image files similar to the given one.
-search_distance:fileaddress_or_sample
    Like -search. Each address is preceded by a floating point number which
    indicates the -distance (see above) to the given fileaddress_or_sample.
-search_dsample:fileaddress_or_sample
    Like -search but output consists of distances and serialized samples.
-search_id:fileaddress_or_sample
    Prints all database ids of registered image files similar to the given
    one. See -lookup_id for a description of database ids.
-search_others_only
    Given before option -search : do not print the address of the given
    file. (default)
-search_sample:fileaddress_or_sample
    Like -search but output consists of serialized samples (see -sample).
-search_self_too
    Given before option -search : print the address of the given file too,
    if it is registered in the database and the image is still similar.
-sortbuffer:mode:buffersize
    Enables or disables the sort buffer for search results.
    Mode "on" causes the matches to be buffered and sorted according to
    their distance to the search sample. If there are more matches than
    buffersize, then the buffer is put out and all further matches are put
    out without sorting.
    If mode is "truncate" and more than buffersize matches are found, then
    the matches with the largest distance are omitted and only the best ones
    are put out.
    If mode is "off", then no buffering and sorting takes place.
    buffersize must be larger than 1 and smaller than 1e6 .
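The sample serialization format described under -sample above can be taken
apart as in the following sketch. It is an illustration, not the program's
parser; the function name parse_sample is hypothetical. It assumes that the
two fraction digits of each color component count 1/256 steps, which the
text does not state explicitly.

```python
def parse_sample(text):
    """Split a serialized smp16 sample into its fields.

    Returns a dict with flags, id, crc (integers), pixels (16 tuples of
    red, green, blue as floats) and fileaddress (text after the first '+').
    """
    # The first '+' marks the start of the fileaddress; later ones belong to it.
    head, plus, fileaddress = text.partition("+")
    fields = head.split("-")
    # Expected: '', 'smp16', flags, id, crc and 16 pixel fields.
    if not plus or len(fields) != 21 or fields[0] != "" or fields[1] != "smp16":
        raise ValueError("not a serialized smp16 sample")
    flags, ident, crc = (int(f, 16) for f in fields[2:5])
    pixels = []
    for p in fields[5:]:
        if len(p) != 12:
            raise ValueError("pixel field must have 12 hex digits")
        # Assumption: 4 hex digits = 2 integer digits + 2 fraction digits (1/256).
        pixels.append(tuple(int(p[i:i + 4], 16) / 256.0 for i in (0, 4, 8)))
    return {"flags": flags, "id": ident, "crc": crc,
            "pixels": pixels, "fileaddress": fileaddress}
```

Under the stated assumption, the pixel field 800040002000 decodes to red
128.0, green 64.0, blue 32.0.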
-statistics
    Determines the overall color histograms of red, green and blue. Prints a
    shell script to stdout. If executed, this script prints a text table
    describing the histograms and the three maximum intensities. It also
    creates and displays 3 files in the working directory: red.gif ,
    green.gif and blue.gif , which show the histograms in a self adjusting
    scale. A thin horizontal line marks the average value and a vertical
    line marks the color value with the highest density.
    Try:
      similar -statistics | sh
    But be aware of the three image files which are created.
    With ImageMagick 5.3.1 you need to insert a sed command before sh :
      ... | sed -e 's/-pen /-fill /' -e 's/fillRectangle/rectangle/' | sh
-theory
    Prints a text which speculates about the reasons why similar possibly
    works.
-use_exemptlist:mode
    If mode is "on", then all search operations will respect the list of
    explicitly distinguished files (see -record_exempt). This may slow down
    searching and especially the operation of command -distinguish and
    -record_exempt.
    If mode is "off", then only -list_doubles will use the list.
    A change of database (-mapname) sets -use_exemptlist to "off".
-visualize:size:resultfileaddress:fileaddress_or_sample
    Prints a shell script to stdout which can generate an image file with 16
    color patches describing the 16 pixels of the sample resulting from
    fileaddress_or_sample. If you look at the image, you see what the
    comparison process sees.
    size controls the size of a single color patch (1 to 256).
    resultfileaddress may not contain ':' characters and has to end with a
    type indicator known to program convert (like .gif or .jpg).
    The command is performed locally by the client. To avoid an unnecessary
    connection to a server, -no_rc might be helpful.
    Example: Create file /tmp/x.gif from file house.jpg
      similar -no_rc -visualize:64:/tmp/x.gif:house.jpg | sh
    Like with -statistics you might have to insert a sed pipe : ...
| sed -e 's/-pen /-fill /' -e 's/fillRectangle/rectangle/' | sh
    View the resulting image with:
      display /tmp/x.gif

Command reference of agent commands

#text or !text
    Remarks. They are ignored like empty commands.
:character
    Same as -synthetic:-1:1:character (see below)
-add_view:connection:subscription:representation[:encapsulation]
    Subscribe to a class of output events. This is how frontend programs get
    information. There are several kinds of subscriptions:
      prompt             get notified when the server is idle and when it
                         becomes busy.
      stdout             get texts which are directed to standard output
      stderr             get texts which are directed to standard error
      current_file       get the current file (usually its name)
      double_files       get the list of possibly double files
      current_file_copy  get the copy of the current file (see -show_copy)
      double_files_copy  get the list of copies of possibly double files
      ask                get questions
      mark               get start and end of command -mark.
      reply              get start and end of command -request.
      all                subscribe to all output events
    Any time this program (the server) changes one of the subscribed states,
    it sends a notification to the client using the chosen representation.
    The connection parameter may be a filename or a "-" character. A "-"
    tries to use an already established back channel to the frontend
    program. This will fail if the input connection is a pipe and no
    filename was given with a previous -add_view: from that client. It is ok
    to use a connection filename several times from within the same frontend
    program.
    The representation should be "default" up to now.
    If encapsulation is given as "encapsulated", then the view will only
    receive notifications caused by commands from the same input connection
    as this -add_view command. If encapsulation is set to "open", any
    command's output may be received. "open_and_show" does the same, but
    with subscription current_file or double_files, initial notifications
    will be sent to inform about the current image state.
    The encapsulation mode applies to all subscriptions of the same view id
    (see below Appendix B, notification "init").
    There are two subscriptions which are served automatically together with
    any other subscription:
      init  notifies about the first subscription which is assigned to a
            backchannel.
      end   notifies that the backchannel will be closed by the server now
            (e.g. because it ends service).
    Also the subscriptions mark and reply are served as soon as any
    subscription is made. These notifications emerge from commands -mark and
    -request.
    The format of notification messages is described in appendix B.
-arguments_to_queue
    Only in effect for commands given as program arguments.
    Usually the arguments are interpreted before the input for dialog is
    started. This prevents unwanted outside interaction during setup.
    By -arguments_to_queue all following arguments are placed in the input
    buffer. So they get executed as user commands from internal input as
    soon as dialog service starts. (Beware of -permission restrictions)
    For automated start of -tcp_client it is desirable that all following
    arguments are sent to the server rather than waiting for the end of
    client mode. Use:
      -arguments_to_queue -tcp_client:... ...
-ask_end
    Ask for Y or N . End program if answer is Y.
-bundle:mode
    After the command -bundle:start all following commands from the same
    input channel are buffered and their execution is delayed until the
    command -bundle:end or again -bundle:start is received. The execution of
    the buffered commands will not be mixed with commands from other
    channels. So one can rely on a combination of -jump and -moveto really
    moving the desired file, even if a lot of other clients are using the
    server without any coordination.
    Be aware that the total size of buffered commands per channel is limited
    (but at least 64 KB ).
    -bundle acts along the whole client-server chain whereas enclosed
    commands might get executed locally by a client.
    No other command (e.g. -mark) may be prepended to -bundle.
-double_check:mode
    In applications which show a current image item, a picture recognition
    check may be performed to find very similar images. This command
    controls whether such a test shall be performed by appropriate
    applications. Mode "off" disables the check, "on" performs it with any
    current image item. Mode "auto" allows the application to check only
    those images which are not already under the management of the
    recognition system.
-encapsulate_views:switch
    If switch is set to "on", this command excludes other views from being
    notified about the effects of commands. Only the internal output
    channels (see -internal) and the views of the command's sender will get
    these notifications. This mode is useful if the clients of this program
    are not aware of each other and would interfere with their replies.
    "off" is the normal mode where all subscribers get all subscribed
    notifications.
-end
    End the run of the program.
-end_client
    Ends client mode. This mode possibly was initiated by -tcp_client .
-end_on:trigger[:option]
    Set or delete trigger for automatic end of program. The trigger is
    deleted if option is 'clear' . Triggers defined up to now:
      list_end  is true if the current item cannot be set because the end of
                the file list has been reached.
      prompt    (only effective in client mode) is true if the server
                indicates to be idle by a 'prompt' notification.
      timeout   is true after the number of seconds given by option has
                elapsed without a new -end_on:timeout.
      idle      is true after the number of seconds given by option has
                elapsed without receiving an input event.
    A general trigger is defined which combines all triggers.
      all       is true if any trigger is true.
    Therefore -end_on:all:clear deletes all triggers.
-end_server
    Sends an -end command to the server and ends client mode. This command
    is ignored if the program is not in client mode.
-external:channelname:switch
    Enable or disable external i/o channels.
    Available channel names are:
      current_file  the view subscriptions for current files
      double_files  the view subscriptions for possibly double files
      all           all of the above channels
    Switch may be either "on" or "off". Default is "all:on".
-help[_short]
    Print this text. -help_short only prints the keyset help text. The
    program ends without any dialog if -help is its last argument.
-hide
    Close internal viewers for current_file and double_files and notify all
    subscribers of these events that no files should be displayed. If the
    views are already hidden, then nothing is done.
-input_ready_fd:code
    In client mode, this command notifies that data is available from the
    server. In normal (server) mode, this command is ignored.
-input_interrupt:mode
    If an application attached interrupting input channels to the agent,
    they may be enabled (mode "on") or disabled (mode "off"). If they are
    disabled, then the agent only listens for commands and notifications and
    the application temporarily will not see data pending at the
    interrupting input channels. As soon as the interrupt is enabled again,
    the application will notice those data. One should not disable them for
    a long time because a sender may become impatient and close the
    connection.
-internal:channelname:switch
    Enable or disable internal i/o channels. Available channel names are:
      current_file  the internal viewer for current files
      double_files  the internal viewer for possibly double files
      stdout        texts directed to standard output
      stderr        texts directed to standard error
      termios       listen for single character input at start terminal
      all           all of the above channels
    Switch may be either "on" or "off". Default is "all:on".
-keyset:statement
    Manipulate the keyset mapping from single keystrokes to commands. A
    statement consists of a code word and possibly additional text. See also
    the example keyset definition below.
    Keyset statements:
      clearall           Clears the whole keyset.
      clear:keys@        Clears the mappings for one or more keys.
      clear:@#code@      Clears the mapping for a key by its decimal code.
      helpstart:text     Clears the short helptext and adds first line.
      helpmore:text      Adds another line to short help text.
      listall            Lists the whole keyset.
      list:keys@         Lists the mappings for one or more keys.
      list:@#code@       Lists the mapping for a key by its decimal code.
      maindir:address    Sets a common parent directory for file addresses
                         with 'map' and 'trashdir' which do not begin with
                         '/'. Performs parameter substitution.
      map:keys@command   Maps one or more single keys to a single command.
                         Use the same commands as for the command line but
                         no remarks.
      map:@#code@command Maps a key by its decimal ASCII code to command.
      readfrom:filename  Reads keyset statements from a file, one statement
                         per line. See example below. Performs parameter
                         substitution.
      trashdir:address   Sets the directory which keeps old file versions
                         for the -undo command. Performs parameter
                         substitution.
      #text or !text     Remarks. They are ignored like empty statements.
    '@' (ASCII 64) and ESC (ASCII 27) cannot be mapped at all.
    If a key gets more than one mapping, then all these commands get
    executed in the same sequence as they have been mapped to the key. So if
    a key is to be newly defined, "clear:" should be applied first.
    Mapping for multi character keys (like F1) is not possible yet.
    Leading blanks before the code word (e.g. 'map') are ignored. Blanks
    between @ and 'command' are ignored. All other characters up to the line
    end are significant.
-local_cmd:command
    Execute command locally even if being client and the command usually is
    performed remotely.
    Note: the command -bundle does not need -local_cmd prepended.
-loop_wait:hide_show_pause:minimum_execution_time:queue_limit
    Set the dampening parameters for the input loop. This is mainly to calm
    down the appearance of the image viewers.
    The hide_show_pause is the minimum number of microseconds between
    cancelling the previous current_file and showing the next one.
    minimum_execution_time (in microseconds) is to reduce the number of
    current_file notifications under heavy input load with fast commands. In
    this case the queue may be empty for a short time, causing a prompt and
    display of the current file. Nevertheless, if new commands arrive
    frequently, only flickering windows appear and the CPU load goes up.
    If the input queue contains less than queue_limit commands, then the
    next command will be delayed until minimum_execution_time has elapsed. A
    queue_limit of 0 will apply the delay with any length of the queue.
-make_userkey:connectiontype:username
    Convert a file into an encrypted keyfile suitable for secure
    connections.
    The connectiontype may be one of: "tcp","file","internal","all" . It is
    used to choose the keydirectory which may be set by -security. "-128" or
    "-256" may be appended to explicitly set the key size.
    Within the keydirectory there has to be a file {username}.tnl with at
    least 64 bytes of content. The first 64 bytes are condensed and
    encrypted into a key which replaces the original file content.
    Permission to read or write the file is granted only to the owner.
-mark:id:command
    Notify the sender about start of command, execute command and notify
    about the end directly before the program checks for the next command.
    Other clients will not see the mark notifications.
    Note: the command -bundle will not work as argument of -mark.
-permission:username:class[-action]:["permit"|"deny"]
    Set a rule to permit or deny execution of particular actions by
    particular users. As soon as the dialog mode starts, the rules are
    checked with the effective username whenever a command requests a
    program action. Be careful with this command in dialog mode since you
    might exclude yourself from entering the next rule.
    The effective username is obtained from the authentication of the
    command's input connection.
     If no authentication by encryption has been enabled at that
     connection, the username is "external_anonymous" for external programs
     and "internal_owner" for application *rc files, program arguments and
     input at the start terminal (termios,stdin).
     Username "all" in a rule applies to any effective user.
     Any action belongs to a class (like "shell" or "move") but may also be
     addressed individually. The youngest matching rule applies.
     Example (deny shell commands for any user other than "internal_owner"
     but allow the use of a particular script with current item):
       -permission:all:shell:deny
       -permission:internal_owner:shell:permit
       -permission:all:shell-/home/scripts/infoscript $*:permit
     For a complete list of classes, see appendix agent P below.

-pipe_service:type:filename
     Create filename as named pipe and listen for input. The parameter
     type determines how the input shall be used :
       1= handle single bytes as keystrokes.
       2= handle text lines as commands.
       3= send text lines to stdout
       4= send text lines to stderr
     For frontend programs only type 2 is useful. Type 1 is discouraged.
     If filename already exists and is a regular file, it gets replaced by
     the named pipe. The named pipe is removed when it is closed.
     Usually the first action of a frontend program is to send a -add_view
     command with the name of another pipe for output.
     Performs parameter substitution if the application supports it.

-preserve_orphan_events:mode
     When a client connection ends, the server usually removes from the
     input queue all pending events which originate from that client.
     This is the default mode "off".
     If the server shall execute commands from clients which do not wait
     for completion, the mode may be set to "on". Due to the complex
     handling of such orphan events, "on" is discouraged in high security
     environments.

-queue_canceled
     In client mode this causes an ESC line to be sent to the server.
     In normal (server) mode, this command is ignored.

-readfrom:fileaddress
     Read lines from file and execute each one as command.
     Performs parameter substitution.

-refresh[:option]
     Redisplay the current item and possibly its lookalikes. Also print
     the file information lines again.
     Option "if_idle" prevents refresh if there is at least one input
     event pending (i.e. the program is busy).

-remote_cmd:command
     Try to execute command remotely even if it would normally be
     performed locally in client mode. Fails if not in client mode.
     Note: the command -bundle does not need -remote_cmd prepended.

-remote_root_cmd:command
     Try to execute command remotely even if it would normally be
     performed locally in client mode. If the server is a client itself,
     propagate the command to its server and so on.
     Works even if not in client mode.
     Note: the command -bundle does not need -remote_root_cmd prepended.

-repair_icv:statement
     Set or delete the trigger for automatic conversion of image files
     which cause the internal current view process to exit with nonzero
     value. This happens mostly if the format of the image file is not
     known to the viewer program.
     There are two possible statements:
       on  .... if the viewer exits badly, try to repair the file via -jpgb
       off ... do not try to repair the file

-request:id:command
     Notify the sender about the start of command, execute command and
     notify about the end directly after the command itself ended.
     Text messages to standard output will only be delivered to the sender
     of -request. Other clients will not see these notifications.
     This is similar to -mark but the end notification is sent earlier.
     Note: the command -bundle will not work as argument of -request.

-shell:shellcommand
     Execute a shell command.
     Performs parameter substitution if the application supports it.

-security:connectiontype:option:value
     Add an access rule to the security list for remote connections.
     The connectiontype may be one of:
       "tcp","file","internal","all"
     Options may be:
       tunnel          controls encryption on server side.
                       Possible values are "require", "allow" or "refuse"
                       encryption.
       user            controls encryption on client side. Its value sets
                       the username for encryption. An empty value
                       disables encryption.
       keydirectory    its value sets the directory with the user key
                       files.
       controller      controls access to the -security command. Initially
                       only "internal" connections (i.e. arguments and
                       start terminal input) are allowed to use this
                       command. By value "on" one may enable other
                       connectiontypes which then should have set
                       "tunnel:require".
       clientprotocol  explicitly selects an authentication protocol for
                       option "user". Valid values (protocols) are :
                         0.0  128 bit key, CRC-32 authentication (deprecated)
                         0.2  256 bit key, SHA-1 authentication (recommended)
       serverprotocol  selects the authentication protocol which will be
                       accepted by a server. Valid values are :
                         0.0 , 0.2 , all  (default is "all")
                       where "all" permits any protocol known to the
                       server.
     If more than one rule matches, the youngest one is used.

-show_copy:connection:subscription:options:target
     Subscribe to a copy-and-notification service for image files.
     Unlike -add_view, this subscription makes a copy of the files to be
     shown. Then it notifies its subscriber about the addresses of the
     copies.
     Any other subscribers of "${subscription}_copy" will get notified
     only if connection is "*". In this case -show_copy has no back
     channel of its own but relies on channels which have to be subscribed
     by -add_view. Other connection names are handled like in -add_view.
     Subscription may be one of :
       current_file   copy the current file
       double_files   copy the possibly double files
       all            all of the above
     Options may be a list of the following keywords and their parameters,
     if any:
       max_number=number  limits the number of copied files. This may be
                          necessary for subscription "double_files".
       encapsulate        if this keyword is present, subscriptions are
                          encapsulated (see -add_view).
                          Also, the display of copyfiles does not end
                          before the next command from the same input
                          connection is received. Without this keyword,
                          copyfiles end display as soon as the next file
                          is to be displayed.
       keep=seconds       copyfiles normally get removed when the
                          subscribers are notified about the end of their
                          display. They may be kept available on the disk
                          for the given number of seconds. This is useful
                          when serving a web server whose clients may need
                          some time to load the files.
     Target is a file address template. For a particular file it will be
     extended by a unique counting number and the file name's dot
     extension.
     Example:
       target=/tmp/simv_current , filename=/home/me/my_cat.jpg
       Name of copy: /tmp/simv_current_1168105077.jpg
     This name is sent with a "current_file_copy" notification.
     Although copyfiles get removed if all goes well, one should use a
     separate directory which is easy to clean from old remnant file
     copies, just in case something goes wrong.

-sidestep:command
     Execute command and afterwards restore the cursor position in the
     application's item list (if there is one and it got changed by
     command).

-stderr:[newlinemode:]text
     Send text to stderr and appropriate channels.
     If newlinemode is "nonl" then the output text will not be followed by
     a newline character. If newlinemode is "nl" or if it is omitted, then
     a newline is appended.

-stdout:[newlinemode:]text
     Send text to stdout and appropriate channels.
     newlinemode: see -stderr

-synthetic:connection:type:text
     Create a synthetic input event. Useful to emulate a keystroke via a
     line oriented input connection.
     The value of "connection" has to be -1 up to now.
     "type" tells how to handle "text" :
       1=keystroke , 2=command line , 3 = to stdout, 4 = to stderr

-tcp_client:hostname:portnumber[:representation:[encapsulation]]
     Enter client mode. This means to connect to a peer program and to
     subscribe for all output events.
     Most of the input events will be forwarded to the peer, and the
     incoming notifications will be printed and sent to the client's own
     frontend clients. The internal image viewers for current and double
     files are disabled though.
     The peer has to provide TCP/IP service by -tcp_service:hostname:...
     Commands that are handled locally:
       -add_view , -end , -end_client , -input_ready_fd , -internal
       -local_cmd , -security , -show_copy , -tcp_service
     If a username has been set by -security:tcp:user:... then this name
     is used for authentication and encryption. (see appendix C)
     If representation is "data" then the subscriptions for image files
     will cause the server to transmit the whole file contents together
     with their addresses. All subscriptions made by -show_copy at this
     client will make their file copy from the transmitted content rather
     than trying to read the file itself. If internal file viewers are
     enabled (by -internal after -tcp_client), then they show these file
     copies rather than trying to read the file itself.
     This notification method is helpful if direct file access is not
     possible due to access restrictions or due to incompatible file
     addressing
     (try: -show_copy:... -tcp_client:... -internal:all:on ).
     If encapsulation is given as "encapsulated" then the client will only
     receive notifications which are caused by its own commands.
     If encapsulation is "open" or missing, any command's results may be
     received.

-tcp_service:type:hostname:portnumber:access_mask[:option]
     Offer service for frontend programs via TCP/IP.
     A client like telnet or a peer of this program itself (see
     -tcp_client) may connect to the address given by hostname and
     portnumber. Depending on the given type, input will be handled
     differently:
       1= handle single bytes as keystrokes.
       2= handle text lines as commands.
       3= send text lines to stdout
       4= send text lines to stderr
     For frontend programs only type 2 is useful. Type 1 is discouraged.
     If portnumber is 0, the operating system will assign one
     automatically.
     The access_mask describes (in hex) which hosts other than localhost
     are allowed to connect. Connection is refused if :
       ( access_mask & foreign_address ) != ( access_mask & host_address )
     where host_address is the address of hostname given with
     -tcp_service.
     Please note that this might be circumvented by IP-spoofing and that
     there is no check of user identity. Nevertheless the mask test
     provides a first defense against denial-of-service attacks.
     In hostile IT environments, use -security:tcp:tunnel:require to
     demand user authentication and encrypted communication.
     See also appendix C .
     Examples:
     Serve at port 50000, accept connections from localhost only
       -tcp_service:2:localhost:50000:FFFFFFFF
     Accept connections from own C-class net (assuming own hostname = ts4)
       -tcp_service:2:ts4:50000:FFFFFF00
     Accept from anywhere
       -tcp_service:2:localhost:50000:00000000
     If 'option' is given as "obstinate" the program will ignore errors
     with call bind() and try to obtain a server socket until it finally
     succeeds (or the program gets stopped).

-user:username:command
     Require permission to execute command not only for the authenticated
     user but also for the one given by username.

-viewer_cmd:shellcommand
     Set the shell command which starts an image viewing program.
     The program will be started to display one or more image files:
       ${shellcommand} -geometry [+|-]0+0 ${filename_1} ... ${filename_N}
     A geometry of +0+0 should create the image window at the upper left
     of the screen. -0+0 should do the same adjusted to the upper right.
     Tested shellcommands on Linux : xv , display


Appendix agent A : Keyset Definitions

Usually a keyset definition is written to a file and read by
-keyset:readfrom
There are only a few sagent commands which make sense with keys, so better
look at Appendix application A for a more realistic example.
If the content of file /home/me/sagentkeys looks like this:

  # Clear old keyset
  clearall
  # Some helptext
  help: -disable_text +enable_text ?help Quit
  # disable output of text
  map:-@ -internal:stdout:off
  map:-@ -internal:stderr:off
  # enable output of text
  map:+@ -internal:stdout:on
  map:+@ -internal:stderr:on
  # The help key
  map:?@ -help_short
  # Ask whether the program shall end
  map:Qq@ -ask_end

then the command
  -keyset:readfrom:/home/me/sagentkeys
activates this definition and overwrites the old one.


Appendix agent B : Format of Notification Messages

A notification is a message consisting of a header and some fields.
It is terminated by a linefeed although it may contain 8-bit data.

  subscription number_of_fields [field[...field]]\n

A field consists of its (printable) header and an 8-bit data part :

  :name:representation:number_of_bytes:bytes

Example:

  current_file 2 :idx:data:1:5:file1:address:20:/usr/pictures/me.jpg\n
                 |first field||---------- second field ------------|

Message format of particular subscription notifications.
{#} is the number_of_bytes in decimal.
For better readability, each field is shown in a separate line.

When the server encounters the first subscription of a client :
  init 1 :id:data:{#}:{id-number}\n
This message is also generated by every single -show_copy subscription.

When the server ends service, for every "init" it issues an "end" :
  end 1 :id:data:{#}:{id-number}\n

Message to standard output or
standard error :
  stdout 1 :text:data:{#}:{text}\n
  stderr 1 :text:data:{#}:{text}\n

When a file is added to the list :
  list_change 3 :idx:data:{#}:{index}
                :action:data:6:append
                :item:data:{#}:{filename}\n

When the filename of a list item changes, "action" is set to "change" :
  list_change 3 :idx:data:{#}:{index}
                :action:data:6:change
                :item:data:{#}:{filename}\n

When the current file gets displayed by the internal viewer :
  current_file 2 :idx:data:{#}:{index}
                 :file1:address:{#}:{filename}\n
or if subscribed with representation "data" :
  current_file 3 :idx:data:{#}:{index}
                 :file1:address:{#}:{filename}
                 :content1:data:{#}:{filecontent}\n

When the current file stops being displayed :
  current_file 0 \n

If two possibly similar images were found by the recognition system :
  double_files 2 :file1:address:{#}:{name1}
                 :file2:address:{#}:{name2}\n
or (if subscribed by representation "data")
  double_files 4 :file1:address:{#}:{name1}
                 :content1:data:{#}:{content1}
                 :file2:address:{#}:{name2}
                 :content2:data:{#}:{content2}\n

When the display of these similar images ends :
  double_files 0 \n

The same formats are used for subscriptions "current_file_copy" and
"double_files_copy" (see -show_copy).

When the server becomes idle :
  prompt 2 :idx:data:{#}:{index}
           :count:data:{#}:{total_files}\n
When it gets busy :
  prompt 0 \n

When a question is asked :
  ask 4 :id:data:{#}:{question_id}
        :type:data:{#}:{type}
        :text:data:{#}:{question_text}
        :constraint:data:{#}:{rules}\n
When the question has been answered :
  ask 4 :id:data:{#}:{question_id}
        :type:data:6:cancel
        :text:data:0:
        :constraint:data:0:\n

Output is done by the particular commands as well as by the command loop,
which informs about the command before execution and possibly about the
current item after execution.
The command -request surrounds exactly the command's output and does not
include any loop output.
The command -mark surrounds everything after "prompt 0" up to and including
"prompt 2 ...".
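
Because a field's data part may contain arbitrary 8-bit bytes (including
colons and linefeeds), a reader has to follow the announced
number_of_bytes rather than search for separators. The following Python
sketch illustrates the field layout described in this appendix. It is not
part of the software; the function name parse_notification is hypothetical,
and the sketch assumes that exactly one blank follows number_of_fields, as
in the messages shown above.

```python
def parse_notification(message: bytes):
    """Split one notification into (subscription, [(name, representation, data)]).

    Illustrative sketch only; assumes "subscription count " with single blanks.
    """
    # Header: subscription and number_of_fields, each followed by one blank.
    subscription, count, rest = message.split(b" ", 2)
    fields = []
    pos = 0
    for _ in range(int(count)):
        if rest[pos:pos + 1] != b":":
            raise ValueError("field does not start with ':'")
        # Printable field header: :name:representation:number_of_bytes:
        name_end = rest.index(b":", pos + 1)
        repr_end = rest.index(b":", name_end + 1)
        size_end = rest.index(b":", repr_end + 1)
        size = int(rest[repr_end + 1:size_end])
        # The data part is taken by length, never by searching for a separator.
        data = rest[size_end + 1:size_end + 1 + size]
        if len(data) != size:
            raise ValueError("message is shorter than announced")
        fields.append((rest[pos + 1:name_end].decode("ascii"),
                       rest[name_end + 1:repr_end].decode("ascii"),
                       data))
        pos = size_end + 1 + size
    if rest[pos:pos + 1] != b"\n":
        raise ValueError("missing terminating linefeed")
    return subscription.decode("ascii"), fields


example = b"current_file 2 :idx:data:1:5:file1:address:20:/usr/pictures/me.jpg\n"
print(parse_notification(example))
```

The data parts are returned as bytes because file contents (representation
"data") are not necessarily text.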

Before a command given with -mark is executed:
  mark 2 :state:data:5:start
         :id:data:{#}:{id given with -mark}\n
Before the program checks for the next input event after -mark :
  mark 2 :state:data:3:end
         :id:data:{#}:{id given with -mark}\n

Before a command given with -request is executed:
  reply 2 :state:data:5:start
          :id:data:{#}:{id given with -request}\n
After the command has been executed:
  reply 2 :state:data:3:end
          :id:data:{#}:{id given with -request}\n

One can explore the notifications by telnet. Connect to the server
(enabled by -tcp_service) and send:
  -add_view:-:all:default
Then operate the server via its usual interface and watch the messages
received by telnet. One also may send command lines from telnet.
(telnet will end quickly if encryption is required by the server)


Appendix agent C : Authentication and Encryption

Connections between the program and its frontends may be made secure by a
user-related encryption key. A user's key is stored in a file within a
keydirectory which is known to the program. The file's name is the user's
name extended by .tnl .
Anybody who is able to read that file is authorized to connect under that
user identity. Therefore, restrictive access permissions should be set for
that file.

By help of command -security one can set the keydirectory, the security
requirements of a server and the username for use with -tcp_client.
The particular connection types may have separate settings.

The content of a user's keyfile is encrypted by the internal key of the
program. Since this key may vary from installation to installation, it is
better to use -make_userkey rather than creating the keyfile by other
means. Create an original 64-byte file of random bytes, transport it and
convert it locally.
If you have to use old 16-byte keyfiles together with newly created ones,
make sure to create 128-bit keys and not the new 256-bit size.
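
The original 64-byte file mentioned above can be produced with any source
of good random bytes. As a minimal sketch (not part of the software), the
following Python function writes such a file with owner-only permissions.
The function name write_raw_keyfile and the user name in the usage comment
are hypothetical; the directory shown there is the default keydirectory
listed in this appendix. The resulting file still has to be converted on
each host by -make_userkey.

```python
import os
import secrets
from pathlib import Path


def write_raw_keyfile(keydir, username):
    """Create {keydir}/{username}.tnl holding 64 random bytes (raw, unconverted)."""
    keydir = Path(keydir)
    # Keep the directory private as well; the mode only applies when it is created.
    keydir.mkdir(mode=0o700, parents=True, exist_ok=True)
    path = keydir / f"{username}.tnl"
    # O_EXCL refuses to overwrite an existing key; 0o600 = owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as keyfile:
        keyfile.write(secrets.token_bytes(64))
    return path


# Hypothetical usage:
#   write_raw_keyfile(Path.home() / "tnl_keydir", "alice")
# followed on each host by the program option  -make_userkey:tcp:alice
```

Creating the file with the restrictive mode from the start avoids a moment
in which the raw key would be readable by other users.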

If a user's keyfile has been installed at both hosts, it is sufficient to
allow encryption at the server side and to set a user for type "tcp" on
the client side. When client mode starts (by -tcp_client) the necessary
handshake is performed automatically between server and client.

Default security settings are :
  -security:all:tunnel:refuse
  -security:all:controller:off
  -security:tcp:tunnel:allow
  -security:internal:controller:on
  -security:all:keydirectory:${HOME}/tnl_keydir
Remember: the youngest matching rule wins. So encryption is only allowed
for TCP/IP connections and security controlling is only allowed for
internal input (*rc files, program arguments and termios,stdin at the
start terminal).

So usually one only has to set a username before connecting to the server.
  -security:all:user:{username}
  -tcp_client:{hostname}:{portnumber}


Appendix agent L : Legal stuff

This software is copyright 2003, Thomas Schmitt stic-source@gmx.net
and provided to you without any warranty under an open source BSD license.
(see file COPYING)


Appendix agent P : Permission Classes

After the initial arguments of the program have been processed, any
further action caused by a command is checked against rules which may have
been defined by the command -permission . A rule may address a whole class
or a single action.
Depending on the command, more than one action may take place.
The application command
  -moveto:$(/home/scripts/find_target $*)
first tests for
  class="move" , action="moveto"
and then for
  class="shell" , action="/home/scripts/find_target $*"
Nearly all commands test for their class and name, but -shell execution
tests for class "shell" and the shell command before parameter
substitution takes place. The command -user tests for class "user" and the
given username.

Class        Command (resp. Action)
------------------------------------------------------------------------
security     -security , -make_userkey , -permission, -viewer_cmd
shell        -{shell command before parameter substitution}
end          -ask_end , -end , -end_client , -end_on , -end_server
peer         -local_cmd , -remote_cmd , -remote_root_cmd
service      -double_check , -encapsulate_views , -external , -internal
             -input_interrupt , -keyset , -pipe_service
             -preserve_orphan_events , -show_copy , -tcp_client
             -tcp_service
cancel       -queue_canceled
user         -{username given with command -user}
client       -add_view , -bundle , -mark , -request , -synthetic
script       -readfrom
navigation   -sidestep
info         -hide , -refresh
message      -stderr , -stdout
help         -help

For application specific commands, see the appropriate appendix P.

Special care should be taken for class "shell". If permitted, the commands
are executed under the user id which started this program.
Class "security" should be even more restricted to avoid unauthorized
changes of permissions.
Note that
  -permission:all:security:deny
is quite final in dialog. You will get no chance to enter further
-permission commands.
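
The reason is the "youngest matching rule applies" evaluation described
with -permission. The following Python sketch models that evaluation. It is
an illustration, not the program's implementation: the function names are
hypothetical, treating class "all" as a wildcard is an assumption, and the
sketch returns None when no rule matches because this text does not state
the program's default in that case.

```python
def parse_rule(option: str):
    """Split '-permission:user:class[-action]:verdict' into a 4-tuple."""
    body = option[len("-permission:"):]
    username, rest = body.split(":", 1)
    # The verdict is the last component; the action itself may contain ':'.
    target, verdict = rest.rsplit(":", 1)
    cls, _, action = target.partition("-")
    return username, cls, action or None, verdict


def decide(rules, user, cls, action):
    """Return the verdict of the youngest matching rule, or None if none matches."""
    for r_user, r_cls, r_action, verdict in reversed(rules):
        if r_user not in ("all", user):
            continue
        if r_cls not in ("all", cls):  # wildcard class "all" is an assumption
            continue
        if r_action is not None and r_action != action:
            continue
        return verdict
    return None


rules = [parse_rule(text) for text in (
    "-permission:all:shell:deny",
    "-permission:internal_owner:shell:permit",
    "-permission:all:shell-/home/scripts/infoscript $*:permit",
)]
print(decide(rules, "external_anonymous", "shell", "ls -l"))
print(decide(rules, "internal_owner", "shell", "ls -l"))

# Once this rule is the youngest one, even the owner cannot add new rules:
rules.append(parse_rule("-permission:all:security:deny"))
print(decide(rules, "internal_owner", "security", "permission"))
```

In this model the last call yields "deny" for every user, which is why such
a rule has to be followed by more specific permit rules before the dialog
starts.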
So, only if given in the program's start arguments, a set of rules may
begin like this :
  -permission:all:security:deny \
  -permission:all:shell:deny \
The start terminal (via termios,stdin) gets administrator permissions :
  -permission:internal_owner:security:permit \
  -permission:internal_owner:shell:permit \
Others are allowed to perform some selected shell commands :
  -permission:all:shell-/home/scripts/infoscript $*:permit
  -permission:all:shell-/home/scripts/automatic_move_target $*:permit
For a quite mistrusted user "webserver" one may additionally define :
  -permission:webserver:all:deny
  -permission:webserver:client:permit
  -permission:webserver:navigation:permit
  -permission:webserver:help:permit
  -permission:webserver:info:permit
From class "modify" we only allow action -repair_icv:
  -permission:webserver:modify-repair_icv:permit


Appendix Similar A : Keyset Definitions

similar is not really intended for dialog use. Usually servers guard
databases and clients get their commands from arguments.


Appendix Similar C : Configuration File

With each map {dbaddress} there may be associated a file {dbaddress}_conf
which contains configuration settings. Currently it is only needed to
describe the usage and layout of a MySQL database server.

A line of this file may either be a remark or a variable setting. Both
types may have white space prepended, which is ignored. Any other lines
are ignored.
A remark begins with a "#" and causes the whole line to be ignored.
A variable setting begins with a name which may be followed by white space
or one of the delimiters "=,;:". The setting's content begins directly
after the delimiter or, respectively, at the next non white space
character, and ends at the line's end.
Examples:
  # remark with comments about some settings
  name content and more content ... even here is content
  name = content with two leading blanks

Currently these general variable names are defined:
  dbtype      content may be either "files" or "mysql".
              If it is "mysql" the names shown in Appendix Similar M are
              expected to describe the prepared database layout.
  mirroring   if set to "on" it causes the read operations to use dbtype
              "files" and the write operations to use both "files" and
              the configured {dbtype}.


Appendix Similar P : Permission Classes

This is a list of application permission classes and their commands.
See "Appendix agent P" for more information about permission classes.

Class        Command (resp. Action)
------------------------------------------------------------------------
service      -check_ratio , -guarded_map , -mapname , -match_par
             -sortbuffer
dbadmin      -caching , -distinguish , -garbage_collection
modify       -append , -append_list , -del_back , -del_dir , -del_file
             -del_file_adr , -del_files_expr , -del_tree
             -record_exempt , -record_exempt_list , -refresh , -rename
env          -append_nondash , -auto_client , -cd , -search_others_only
             -search_self_too
info         -compare , -distance , -imagesize , -imagoid , -is_bw
             -list_all_files , -list_doubles , -list_exempts
             -list_samples , -lookup , -lookup_adr , -lookup_adr_all
             -lookup_adr_count , -lookup_id , -lookup_sample
             -sample , -search , -search_distance , -search_dsample
             -search_id , -search_sample , -statistics , -visualize
------------------------------------------------------------------------


Appendix Similar M : Cooperation with MySQL

The sample data, fileaddresses and exempts usually are stored within some
disk files. With a suitable version of similar it is also possible to
store this information within a MySQL database.
See text file README, "Cooperation with MySQL" on how to create a version
suitable for MySQL.

Two tables have to exist in the same database. The names of database,
tables and columns are adjustable but it does not hurt if you use those in
the given examples.

  CREATE DATABASE similar_db;

The sample table minimally needs to look like this (one may add columns):

  CREATE TABLE similar_samples (
    id          BIGINT AUTO_INCREMENT NOT NULL,
    name        TEXT,
    smp17       TEXT,
    smp17_red   MEDIUMINT,
    smp17_green MEDIUMINT,
    smp17_blue  MEDIUMINT,
    INDEX (id)
  );

The exempt table should look quite exactly like this :

  CREATE TABLE similar_exempts (
    eid   BIGINT AUTO_INCREMENT NOT NULL,
    eboss TINYINT,
    ename TEXT,
    INDEX (eid)
  );

This layout is published to similar by the {dbaddress}_conf file:

  dbtype            mysql
  hostname          localhost
  username          {your_username_on_MySQL}
  password          {your_password_on_MySQL}
  dbname            similar_db
  sample_tablename  similar_samples
  sample_idname     id
  sample_adrname    name
  sample_dataname   smp17
  exempt_tablename  similar_exempts
  exempt_idname     eid
  exempt_bossname   eboss
  exempt_adrname    ename

{sample_dataname} will also be extended by _red, _green, _blue if this is
not disabled (see below {has_color_avg}). So this setting actually may
control up to four column names.
If password is empty or missing, similar also looks for
{dbaddress}_sql_pwd where it may be stored plainly as the only content.
Do not make this file readable for anybody but yourself.

Whenever similar gets directed to a database with such a _conf file, it
tries to start MySQL storage mode. This may fail for various reasons.
If similar is not linked with the mysql-client, then the error message
reads like :
  dbtype mismatch. requested: mysql , available: files

Another way to combine similar and MySQL is to load User Defined Functions
into the MySQL server. These functions are provided in file
similar_udf.so . See text file README on how to compile this file and how
to load the functions.
The functions may be used within SQL queries.
  SIMILAR_IMG(sample1,sample2[,options like with -match_par]);
       compares two sample values and returns 1 if they match (0 else).
  SIMILAR_DIST(sample1,sample2[,options like with -match_par]);
       returns the distance between two samples.
  SIMILAR_CPAR(sample,mode[,options like with -match_par]);
       returns information about the actual match parameters of a sample.
  SIMILAR_FMT(input_sample,format_name);
       converts a sample into one of several formats or extracts
       information from the sample (see README for details)

More configuration parameters :
  has_similar_udf  if set to "true", "yes" or 1 it causes similar to use
                   the function SIMILAR_IMG in SQL search queries. So any
                   search query will fail if SIMILAR_IMG is not loaded in
                   the MySQL server. The use of SIMILAR_IMG reduces the
                   data traffic between similar and MySQL to a minimum
                   and should increase speed under most circumstances.
  has_color_avg    if set to "false", "no" or 0 it causes similar not to
                   read or write the average color columns. They are
                   intended to help if SIMILAR_IMG is not available but
                   are much less effective. So one may omit them in the
                   database layout.
  write_native     if set to "false", "no" or 0 it causes similar to
                   write image sample data in a generalized binary format
                   which is provided in case of exotic program ports.
                   If set to "swapped" the usual native format gets
                   byteswapped to be more suitable for a server with a
                   different byte order than the client.


Appendix Similar L : Legal stuff

By using libjpeg this software is based in part on the work of the
Independent JPEG Group (http://www.ijg.org).
This software itself is copyright 2003, Thomas Schmitt stic-source@gmx.net
and provided to you without any warranty under an open source BSD license.
(see file COPYING)