MANUAL PAGE 2
Author: IanB - ian(at)i-asm.com
Previous versions of yDecode relied on a NewsPlex preprocessor feature that automatically ran an executable (yCheck.exe) on every raw .OK article file NewsPlex downloaded. This happened before the file was saved in the NewsPlex \async directory, so that files could be renamed for processing. This is no longer required.
yDecode builds its own queue of files to process from the input folder whenever the current queue is empty. It checks each file for a yEnc or UUE signature and, if it finds one, renames the file with a .YNC or .UUE extension to flag it for further processing. For safety, however, it will NOT look for data in the following types of files (some of these are NewsPlex/yDecode working file types):
.YNC .YBAD .UUE .UBAD .DONE .LOG .DEL .BAD .DYC .DUE .SPLT .TMP
.ZIP .RAR .MP3 .RAM .RM .RA .MPG .MPEG .AVI .WMV .JPG .JPEG .BMP .GIF .PNG
.HTM .HTML .DOC .CRC .SFV .CSV .PAR .PAR2 .BAT .EXE .COM .DLL .INI .REG
In this detection phase yDecode only renames files; it does not move or decode them. Only the first 4096 bytes of a file are tested, which is enough to reach past the Usenet headers of most raw saved Usenet articles into the data.
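The exclusion and probing steps described above could be sketched in Python roughly as follows. This is an illustration, not yDecode's code. The function names are hypothetical, and comparing extensions case-insensitively is an assumption.

```python
from pathlib import Path
from typing import List

# Extensions never inspected for encoded data (taken from the list above).
EXCLUDED_EXTENSIONS = {
    ".ync", ".ybad", ".uue", ".ubad", ".done", ".log", ".del", ".bad",
    ".dyc", ".due", ".splt", ".tmp",
    ".zip", ".rar", ".mp3", ".ram", ".rm", ".ra", ".mpg", ".mpeg", ".avi",
    ".wmv", ".jpg", ".jpeg", ".bmp", ".gif", ".png",
    ".htm", ".html", ".doc", ".crc", ".sfv", ".csv", ".par", ".par2",
    ".bat", ".exe", ".com", ".dll", ".ini", ".reg",
}

PROBE_SIZE = 4096  # only the head of each file is examined


def is_candidate(path: Path) -> bool:
    """True if the extension is not on the exclusion list (assumed case-insensitive)."""
    return path.suffix.lower() not in EXCLUDED_EXTENSIONS


def read_probe(path: Path) -> bytes:
    """Return at most the first PROBE_SIZE bytes of the file."""
    with path.open("rb") as f:
        return f.read(PROBE_SIZE)


def build_queue(input_dir: Path) -> List[Path]:
    """Collect candidate files from the input folder, sorted by name."""
    return sorted(p for p in input_dir.iterdir() if p.is_file() and is_candidate(p))
```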
Strictly speaking, the Usenet headers are not needed for raw decoding. However, many of yDecode's features rely on at least two header lines being present: the "Subject: " and "Newsgroups: " lines. UUE decoding in particular will fail without the multipart information carried in the subject line. Without the newsgroup source information, neither file filtering nor automatic repair requesting is possible.
The yEnc check looks for the text "=ybegin" immediately after a carriage return/linefeed (CRLF) pair, as per the yEnc specification. This allows both yEnc version 1 and putative yEnc version 2 files to be flagged. The UUE check looks for one of three things: the text "begin " immediately after a CRLF pair, the text "end" immediately after a CRLF pair, or a valid UUE line 300 bytes (approximately 4 or 5 valid lines) from the end of the test segment. A valid UUE line is one whose length matches the byte count encoded in its first character.
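Below is a minimal Python sketch of these signature checks. It assumes the probe segment is the first 4096 bytes read from the file, and the names are hypothetical. The handling of the 300-byte tail window is one possible reading of the description, not necessarily yDecode's exact algorithm: it skips to the next line start, then tests one complete line. The UUE length test is the strict form. The first character encodes the byte count as (char - 32) & 63, and every 3 bytes take 4 encoded characters.

```python
from typing import Optional

YENC_MARKER = b"\r\n=ybegin"               # "=ybegin" at the start of a line
UUE_MARKERS = (b"\r\nbegin ", b"\r\nend")  # "begin " or "end" at the start of a line
TAIL_WINDOW = 300                          # look for a UUE body line this far from the end


def uue_line_is_valid(line: bytes) -> bool:
    """Strict check: line length must match the byte count encoded in its first character."""
    if not line:
        return False
    count = (line[0] - 32) & 0x3F          # decoded bytes carried by this line
    expected = 1 + (count + 2) // 3 * 4    # length char + 4 chars per 3-byte group
    return len(line) == expected


def classify(probe: bytes) -> Optional[str]:
    """Return the flag extension (.YNC or .UUE) for a probe segment, or None."""
    if YENC_MARKER in probe:
        return ".YNC"
    if any(marker in probe for marker in UUE_MARKERS):
        return ".UUE"
    tail = probe[-TAIL_WINDOW:]
    start = tail.find(b"\r\n")             # skip the partial line the window starts in
    if start != -1:
        rest = tail[start + 2:]
        end = rest.find(b"\r\n")
        if end != -1 and uue_line_is_valid(rest[:end]):
            return ".UUE"
    return None
```

For example, a full 45-byte UUE line starts with "M" and is 61 characters long, so `uue_line_is_valid(b"M" + b"A" * 60)` holds.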
The separate yCheck program is no longer needed to detect and queue files. If you have previously been using it, you must remove all references to it from the NewsPlex startup environment. If you are using NewsPlex 3.9 or earlier, make sure this text is REMOVED from the NewsPlex startup command line:
-X yCheck30.exe
For NewsPlex 4.0 or later, make sure the following line is REMOVED from the [async] section of the etc\newsplex.ini file:
executeCmd=yCheck30.exe
Also make sure that, if no other preprocessor is being used, the following option is set:
executeYes=0
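If you are unsure whether old yCheck settings remain, a small script along these lines could flag them. It assumes newsplex.ini is a conventional INI file that Python's configparser can read. It also assumes the NewsPlex 3.9 startup command line is available as a string. The function names are hypothetical.

```python
import configparser
from typing import List


def ini_leftovers(ini_text: str) -> List[str]:
    """List yCheck-related settings in newsplex.ini that should be removed or changed."""
    parser = configparser.ConfigParser(strict=False, allow_no_value=True, interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("async"):
        return []
    section = parser["async"]
    cmd = (section.get("executeCmd", fallback="") or "").strip()
    problems = []
    if "ycheck" in cmd.lower():
        problems.append("remove executeCmd=" + cmd)
        cmd = ""  # once removed, no preprocessor remains
    if not cmd and (section.get("executeYes", fallback="0") or "").strip() != "0":
        problems.append("set executeYes=0")  # only when no other preprocessor is configured
    return problems


def cmdline_has_ycheck(cmdline: str) -> bool:
    """True if a startup command line still contains '-X yCheck...'."""
    tokens = cmdline.split()
    return any(tok == "-X" and i + 1 < len(tokens) and "ycheck" in tokens[i + 1].lower()
               for i, tok in enumerate(tokens))
```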
Filters are listed in the INI file in an easily editable format, or they can be edited from within the program. They are checked in strict order, so the MOST specific should be listed first, eg. "alt.binaries.multimedia" before "alt.binaries" or just "multimedia".
For simplicity, all text matches are treated as wildcarded, ie. "alt" is equivalent to "*alt*". It is therefore advisable to use dot delimiters to get the desired results (eg. if you want to match "alt." but not "waltons"!). To avoid spurious matches, each filter must be at least 3 characters long. No actual wildcard characters are supported, however.
The text match string is immediately followed (no space) by the file extensions to check for, in brackets and separated by spaces. The extensions are NOT implicitly wildcarded and must match exactly to give the intended results. Each extension must be 2 to 4 characters long, eg.
alt.binaries(mpg mpeg avi rm)=<path>
To route all files from the matched group rather than particular types, the extension string should be just (*), ie.
multimedia(*)=<path>
To route different types of files from the same source groups, list each set explicitly. Alternatively, route the other file types from the match group first, then use the (*) all files option, eg.
alt.binaries(jpg jpeg gif bmp)=<image path>
alt.binaries(ra mp3)=<music path>
alt.binaries(*)=<video and everything else path>
Any files not matching filter sources or file extensions go to the default output directory, given in the main [yDecode Options] section of the INI file. All verification files (CRC, SFV, CSV, PAR, Pnn and PAR2) are deliberately routed to a separate \verification folder for processing before user rules are checked, so these file extensions are ignored/disallowed.
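The routing rules above might be modelled as follows. This is a hypothetical Python sketch, not yDecode's parser. It assumes case-insensitive matching against the article's newsgroups header text, and the first matching rule wins. Verification extensions (including Pnn) go to the verification folder before any user rule is consulted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

RULE_RE = re.compile(r"^([^()=]+)\(([^()]*)\)=(.+)$")  # text(ext ext ...)=path
VERIFY_EXTS = {"crc", "sfv", "csv", "par", "par2"}


@dataclass
class Rule:
    match: str                 # substring matched against the newsgroups text
    exts: Optional[set]        # None means (*), ie. every extension
    path: str


def parse_rule(line: str) -> Rule:
    m = RULE_RE.match(line.strip())
    if not m:
        raise ValueError(f"malformed rule: {line!r}")
    text, ext_field, path = m.groups()
    if len(text) < 3:
        raise ValueError("match text must be at least 3 characters")
    if ext_field.strip() == "*":
        exts = None
    else:
        exts = {e.lower() for e in ext_field.split()}
        if any(not 2 <= len(e) <= 4 for e in exts):
            raise ValueError("extensions must be 2 to 4 characters")
    return Rule(text.lower(), exts, path)


def is_verification(ext: str) -> bool:
    ext = ext.lower()
    return ext in VERIFY_EXTS or re.fullmatch(r"p\d\d", ext) is not None


def route(newsgroups: str, filename: str, rules: List[Rule], default: str, verify_dir: str) -> str:
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if is_verification(ext):
        return verify_dir      # checked before any user rule
    groups = newsgroups.lower()
    for rule in rules:         # first match wins, so list the most specific rules first
        if rule.match in groups and (rule.exts is None or ext in rule.exts):
            return rule.path
    return default
```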
It is important to understand that when you have selected the use of filters, not only will incoming files be routed to the desired folders but also any verification sets will expect to find the files they match there as well!
Suppose a file you expect a set to recognise is not being found and matched when the verification set is parsed. The most likely reason is that it is not in the correct folder according to the filter rules, perhaps because the rule settings are particularly complex. Importing the file manually through the User Actions dialog should relocate it if necessary.
TO AVOID UNEXPECTED RESULTS, IT IS ADVISABLE TO CHANGE THE FILTER RULES FROM WITHIN THE PROGRAM ONLY WHEN THERE ARE NO ACTIVE VERIFICATION SETS, AFTER INITIAL STARTUP HAS COMPLETED AND WHEN THE INPUT QUEUE IS EMPTY!
Multipart yEnc data is inserted at its correct position in a .TMP file the size of the expected final file. Part progress is recorded in a small .DYC file created for each multipart binary, and the .DYC file is deleted when all parts are finished. (This is an extended version of Jürgen's original .DEC file format for yDec.) THESE FILES SHOULD NOT BE ALTERED OR THE MULTIPARTS WILL NOT BE CORRECTLY JOINED.
Like a .DEC file, the first part of a .DYC file lists the byte ranges still missing that need to be inserted. In this extended version, there then follows a list of ranges that were not fully added, ie. contained errors. This means that yDecode can decide when all the parts of a multipart have been decoded, even if not all the ranges could be marked off successfully. There is also a message subject confirmation line to assist with verification.
A sample .DYC file might appear like this:
Subject=<message subject title minus bracketed part numbers>
100001,200000
Bad=
100001,200000
Completed
In this example, all the data except the byte range listed above the "Bad=" line has been added correctly. That range may actually have been partially or even fully inserted into the file, depending on how much was decoded before the data error was detected in the segment. There is no "slack" in the output file, however. The file size, the segment size and the segment's location in the final file are all known exactly from the yEnc wrapper information. Any missing data will be "junk", but whatever is good can still be parsed by PAR2 and used in repair.
The missing range is also listed below the "Bad=" line because it was detected as bad or incomplete when decoded, and the "Completed" line is appended if the byte ranges of all bad segments added together would complete the file. Bad ranges listed after the "Bad=" line, unlike those above it, are NOT joined together - each bad range line represents the data from a single incomplete decode.
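Here is a Python sketch that reads the .DYC format shown above and tests whether the bad ranges account for every missing byte, which is the condition for the "Completed" line. It assumes ranges are inclusive start,end byte offsets (the text does not say) and that each range sits on its own line. The names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Range = Tuple[int, int]  # assumed inclusive byte offsets


@dataclass
class DycState:
    subject: str = ""
    missing: List[Range] = field(default_factory=list)  # ranges above "Bad="
    bad: List[Range] = field(default_factory=list)      # one line per bad decode
    completed: bool = False


def parse_dyc(text: str) -> DycState:
    state = DycState()
    in_bad = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Subject="):
            state.subject = line[len("Subject="):]
        elif line == "Bad=":
            in_bad = True
        elif line == "Completed":
            state.completed = True
        else:
            start, end = (int(x) for x in line.split(","))
            (state.bad if in_bad else state.missing).append((start, end))
    return state


def bad_ranges_cover_missing(state: DycState) -> bool:
    """True if every missing byte falls inside at least one bad range."""
    merged: List[Range] = []
    for start, end in sorted(state.bad):  # merge overlapping or adjacent bad ranges
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return all(any(m_start <= start and end <= m_end for m_start, m_end in merged)
               for start, end in state.missing)
```

Bad ranges are merged here only for the coverage test. The file itself keeps them unjoined, as described above.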
The "Completed" marker doesn't prevent good data from replacement segments coming in subsequently and overwriting any bad data, but it does allow PAR2 verification sets to guess that no more data is likely to be added. The file can then be closed and parsed for good data blocks without worrying about the results changing later.
If a file is "Completed", a new PAR2 set will find it when the set is built. If there is a name match, the set checks the data, removes the .DYC file and closes the multipart. Conversely, if the yEnc file is aware that there is a PAR2 set waiting to verify it, it closes itself when all parts have been decoded, bad or not, and offers what is available for verification. Obviously, there may be a period between a file being "Completed" and the relevant PAR2 set being fully decoded. During that period the multipart will be marked "Completed" in the .DYC but is not claimed by any set.
If yDecode detects a bad yEnc file, it renames the source file with a .YBAD extension and adds the error type to the filename, as Jürgen advises in the yEnc specification. These bad files are left in the input folder for user reference, and the main window shows a count of all bad files found. If the file contained multiple encodes, the number of the bad encode within the file is also added to the filename.
Depending on the error the file may be partially or even fully decoded:
In particular, a UUE message carries no information about the correct size of its datastream, so it is impossible to verify whether the stream in any message is complete or has been truncated. If it is a multipart, the byte position of a datastream within the final file is also completely unknown.
Because there can be no confidence in the validity of any UUE datastream, only an external verification check can give any assurance about the final joined file. A filesize check confirms that it is the right length. A CRC or, preferably, an MD5 calculation via PAR/PAR2 confirms that it was correctly uploaded and downloaded.
Any data up to the first decoding error will be written. Because of the lack of error-checking data in the format, errors can only be raised by:
As UUE files are rare in the groups I have been testing with, I would be grateful if any encodes that yDecode does not recognise properly were sent to me, so that I can improve the detection and parsing algorithm. If you know that you will be decoding UUEs regularly, you are strongly advised to turn OFF the option to delete UUE source files until you are sure that yDecode reliably handles the files you normally receive.
The source file deletion option for UUEs works slightly differently from yEnc. This difference helps preserve UUE data and makes it easier to store and find. If the files are NOT set to be deleted, they are renamed with a .DONE extension and moved from the NewsPlex \async folder into yDecode's \UUE subfolder. If yDecode fails to process them, these source files remain intact for other UUE processors to work on.
Multipart UUE messages are recognised by yDecode and decoded into separate split files in that separate \UUE subfolder. These use the standard .001 .002 etc. format and can be joined by any common Windows split-joining utility. Until all parts have been downloaded and decoded successfully, each split set is given a root filename derived from an MD5 hash of the subject string it belongs to (minus bracketed part numbers). This ensures that each set is uniquely named.
This is necessary because, with no position information embedded in the data, the parts cannot be concatenated into the final file until all of them are available in sequence. The output binary name is only given in the first part of a multipart message. Filter rules and output file naming therefore cannot be applied until that part has been queued and decoded.
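The set naming can be sketched as below. The exact subject text that yDecode hashes is not specified here. The pattern used to strip bracketed part numbers such as (01/55) and the UTF-8 encoding are therefore assumptions, and names produced this way are not guaranteed to match yDecode's own.

```python
import hashlib
import re

# Hypothetical pattern for bracketed part numbers such as "(01/55)" or "[3/12]".
PART_RE = re.compile(r"\s*[\(\[]\s*\d+\s*/\s*\d+\s*[\)\]]")


def strip_part_numbers(subject: str) -> str:
    """Remove bracketed part numbers so every part of a set yields the same text."""
    return PART_RE.sub("", subject).strip()


def set_root_name(subject: str) -> str:
    """MD5 hex digest of the part-number-free subject, used as the split root name."""
    return hashlib.md5(strip_part_numbers(subject).encode("utf-8")).hexdigest()


def split_name(subject: str, part: int) -> str:
    """Name of one split file, eg. <md5 root>.003 for part 3."""
    return f"{set_root_name(subject)}.{part:03d}"
```

Because the part numbers are stripped before hashing, "My file.rar (01/55)" and "My file.rar (55/55)" map to the same root name.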
yDecode tracks completion of UUE multiparts by means of a .DUE file for each set in the \UUE subfolder. This uses much the same format as the .SPLT file for split sets described later. Although superficially similar, it is fundamentally different from the .DYC file for yEnc multiparts described earlier: it lists completed part-number ranges rather than missing byte ranges. In addition, the following lines at the top aid processing:
Subject=<message subject title minus bracketed part numbers>
BinaryName=<final binary name>
Source=<message source groups>
The "Subject=" line allows incoming multiparts to be matched to the set. The "BinaryName=" and "Source=" lines are filled in when the first part (which includes the "begin ###" header) has been processed. The completed part number ranges come after these three lines, followed by the "Last=" line giving the total number of parts. Any bad parts detected for the set are listed at the end (not necessarily in order), eg.:
1,12
14,44
46,53
55,
Last=55
Bad=45,13,54
Completed
Note that single-part entries (rather than ranges) MUST be followed by a comma. Note also that the bad parts are listed on the same line as the "Bad=" marker, separated by commas; they are NOT joined into ranges. Unlike the .SPLT file described later, the "Last=" line is NOT optional, and MUST show the total number of parts in the set. (This is extracted from the bracketed part numbers in the message subject line when decoding.)
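Here is a hypothetical Python reader for the .DUE layout above. It distinguishes two states: "all parts decoded", meaning the set is ready to join, and "good plus bad parts account for every part", which is the "Completed" condition. Parts are treated as sets of numbers rather than simply adding counts. That is a choice made here for robustness; it gives the same result whenever good and bad entries do not overlap.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DueState:
    headers: Dict[str, str] = field(default_factory=dict)  # Subject, BinaryName, Source
    good: Set[int] = field(default_factory=set)
    bad: Set[int] = field(default_factory=set)
    last: int = 0
    completed: bool = False


def parse_due(text: str) -> DueState:
    state = DueState()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "Completed":
            state.completed = True
        elif line.startswith("Last="):
            state.last = int(line[len("Last="):])
        elif line.startswith("Bad="):
            state.bad = {int(x) for x in line[len("Bad="):].split(",") if x.strip()}
        elif line.split("=", 1)[0] in ("Subject", "BinaryName", "Source"):
            key, value = line.split("=", 1)
            state.headers[key] = value
        else:
            first, _, second = line.partition(",")
            start = int(first)
            end = int(second) if second.strip() else start  # "55," is a single part
            state.good.update(range(start, end + 1))
    return state


def all_parts_decoded(state: DueState) -> bool:
    """Every part from 1 to Last decoded successfully: ready to join."""
    return state.good >= set(range(1, state.last + 1))


def all_parts_accounted_for(state: DueState) -> bool:
    """Good plus bad parts cover every part: the "Completed" condition."""
    return (state.good | state.bad) >= set(range(1, state.last + 1))
```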
If a good replacement part is downloaded and decoded by yDecode, it is added to the good ranges above the "Bad=" line and the bad entry for that part is removed.
Like yEnc multiparts, multipart UUEs will also show as "Completed" if the number of detected bad parts plus the number of good decoded parts equals the total. New PAR2 files will detect matches with "Completed" UUE splits in exactly the same way, and will automatically join whatever parts are available to extract any good data for repair, deleting the source splits and the .DUE file. UUEs also track whether they will be PAR2-verified, and will close themselves for verification if they are "Completed" and there is a PAR2 set to match with.
The only difference in this behaviour is that if the first part in a multipart UUE has not been decoded, there is no output binary name to name the set with! In this case, the file is concatenated from the available splits in the \UUE folder using a filename of the MD5 hash root filename plus a .UUED (UUE Data) suffix. If good data is found it will be renamed and deleted on repair in any case.
When the .DUE file shows all parts have been successfully decoded, yDecode will join the multipart set and save it in the correct output folder automatically for verification (if available), but in the event of a problem the decoded parts can still be easily joined manually by a standard joiner utility if they are all present.
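If you need to join a complete set by hand and no joiner utility is available, the operation is plain concatenation in numeric order. This Python sketch assumes three-digit .001-style suffixes and refuses to join if any part is missing. The function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def split_parts(directory: Path, root: str) -> List[Path]:
    """Return root.001, root.002, ... in numeric order, stopping at the first gap."""
    parts = []
    n = 1
    while True:
        candidate = directory / f"{root}.{n:03d}"
        if not candidate.exists():
            break
        parts.append(candidate)
        n += 1
    return parts


def join_splits(directory: Path, root: str, output: Path, expected_parts: int) -> None:
    """Concatenate all splits into output, but only if every part is present."""
    parts = split_parts(directory, root)
    if len(parts) != expected_parts:
        raise FileNotFoundError(f"found {len(parts)} of {expected_parts} parts")
    with output.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)  # byte-for-byte append, no decoding
```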
All parts (including the .DUE file) will be deleted when the final file is fully joined. So that incomplete multipart UUEs can be tracked between yDecode sessions, any remaining .DUE files are parsed on startup.
IT IS CRITICAL THAT THE .DUE FILES NOT BE ALTERED OR THE UUE WILL NOT BE JOINED CORRECTLY. The only possible useful alteration might be to add a known good multipart segment that has been separately downloaded and manually decoded then renamed (with the MD5 hash root name plus a split number suffix) to match the correct position in the set.