yDecode for Windows v4.0

MANUAL PAGE 3


Homepage - www.i-asm.com/yDecode/.
The latest version of yDecode can be downloaded as a zip file
from this link: yDecode.zip. Please note the terms of use.

   

Author: IanB - ian(at)i-asm.com



Verification processing

It is in your own interest to always download (and demand) the appropriate verification files for your Usenet binaries, PAR2 by preference for maximum confidence in your downloads. Good posters should always provide them. If you are unfamiliar with PAR/PAR2 files or their correct use, you should find relevant web documentation easily with a little searching.

yDecode writes a number of small text files to track completion status of the files and sets, to help with verification and to store some state information between sessions - these are described in this manual page and the previous one. They can (and should) be ignored as they are always deleted on successful completion. User alteration of these in most cases is extremely unwise.

All verification files (CRC, SFV, CSV, PAR/Pnn and PAR2) are detected and routed to a \verification subfolder. They are parsed and the verification data is then used to verify all binary files as they are decoded by yDecode.

This means that if the only verification is a CRC/SFV/CSV file containing just filename and CRC value information, the CRC value (and filename) will be checked against every processed file to verify a match. This is however far less preferable than verification by PAR/PAR2, where filesize and far more reliable MD5 hash checks are carried out, which can also safely detect misnamed and (in the case of PAR2) partially good files.
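As a rough illustration of the CRC-only check described above, here is a minimal Python sketch. It assumes the common SFV layout of one "filename CRC32" pair per line, with comment lines starting with a semicolon. The function names parse_sfv and crc_matches are hypothetical and this is not yDecode's own code.

```python
import zlib

def parse_sfv(text):
    """Return {filename: crc32} from SFV text (assumed 'name HEXCRC' per line)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):   # skip blank and comment lines
            continue
        name, _, crc = line.rpartition(" ")    # the CRC is the last field
        entries[name.strip()] = int(crc, 16)
    return entries

def crc_matches(data, expected_crc):
    """True if the CRC-32 of the decoded bytes equals the listed value."""
    return (zlib.crc32(data) & 0xFFFFFFFF) == expected_crc
```

Note that a match here only shows that the name and the 32-bit checksum agree. It says nothing about file size or partially good data, which is why PAR/PAR2 verification is preferred.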

Because of this, to make the search for valid data as wide as possible while not actually checking everything in every possible output folder, yDecode will also recognise limited variations of file renames, such as additional suffixes or prefixes (such as adding split numbers: .001 .002 etc.) or names with spaces replaced with underscores, which can sometimes be found when downloading fill files from web Usenet libraries (eg. Easynews) which can then be imported into yDecode for testing. Any files matching these variations containing the set root name (anything before the dot suffix) will be tested.
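The kind of name matching described above might look something like the following Python sketch. The exact variations yDecode accepts are not listed here, so the rules below (root taken as everything before the last dot, case ignored, spaces and underscores treated as equal) are assumptions made for illustration only, and the function names are hypothetical.

```python
def root_name(filename):
    """Set root: everything before the final dot suffix (an assumption)."""
    return filename.rsplit(".", 1)[0]

def is_candidate(found_name, set_filename):
    """True if found_name contains the set root, ignoring case and
    treating spaces and underscores as the same character."""
    def norm(s):
        return s.lower().replace(" ", "_")
    return norm(root_name(set_filename)) in norm(found_name)
```

With these assumptions, "my_file.rar.001" would be tested as a candidate for the set file "my file.rar", while "other.rar" would not.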

Each verification set has its own unique 32-character hex string set ID - for PAR/PAR2 this is the actual embedded recovery set ID, for CRC/SFV/CSV files it is an MD5 hash of the file itself. For every set yDecode writes a small text file in the \verification folder named as the set ID plus a .VER extension, containing only two lines giving the original filename and newsgroup source of the file.
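A minimal Python sketch of the set ID and .VER naming for a CRC/SFV/CSV file follows. The function names are hypothetical, and the letter case of the ID and the order of the two lines (filename first, then newsgroup) are assumptions. For PAR/PAR2 the ID would instead be read from the file itself, which is not shown.

```python
import hashlib

def crc_set_id(verification_bytes):
    """Set ID for a CRC/SFV/CSV file: MD5 of the whole file as 32 hex characters."""
    return hashlib.md5(verification_bytes).hexdigest()

def ver_file_name(set_id):
    """Name of the tracking file written to the \\verification folder."""
    return set_id + ".VER"

def ver_file_contents(original_name, newsgroup):
    """Two lines only: original filename, then source newsgroup (order assumed)."""
    return original_name + "\n" + newsgroup + "\n"
```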

On startup all open (ie. not yet completed) verification files still in the \verification folder, along with their .VER files, are reparsed completely from scratch. This allows state information to be carried between sessions.

While running, yDecode keeps track of all so-far unverified files decoded, and compares the list against any new verification files that are parsed. On shutdown, this list is written to disk as a text file called "unverified.lst" in the \verification folder, and this file is also read in on startup.

FOR CORRECT VERIFICATION OPERATION THE unverified.lst FILE AND ANY .VER FILES MUST NOT BE ALTERED IN ANY WAY! There is an option available in the User Actions dialog to view and purge the unverified.lst file in the event that it becomes cluttered with listings for files downloaded without any form of verification. Otherwise it can be deleted when yDecode is not running.

yDecode lists all open verification sets in the main program window. They are grouped by type. For each PAR/CRC-type set there is a count of the number of files in the set, the number verified, the number detected bad and (for PAR) the number of repair files available if downloaded.

Both PAR and CRC sets are notified of bad parts in a multipart when detected on decoding, and will update their totals if it is a set match immediately, even if the multipart has not reached the stage of being "Completed". This allows PAR sets to keep an accurate count of the number of repairs needed.

For CRC/CSV/SFV sets, where there are no repair options, yDecode alerts the user to the names of the remaining bad or missing files that will need replacing. This may mean manually checking Usenet for follow-up posts containing good files from the original poster, although yDecode will attempt to find any automatically if there is an active NewsPlex link. Perhaps users will encourage such posters to learn to use PAR or preferably PAR2 instead...

For PAR2 the statistic numbers given represent PAR2 blocks, not files. The Bad blocks count is necessarily an estimate based on the amount of unmatched data contained in files where good blocks were found, as unlike a named file, arbitrary binary data 1 block in length cannot be absolutely matched "bad", only good (fully matched with a source data block) or not.

While parsing blocks in potential files, a PAR2 set may discover valid blocks split across two files if the source file was split AFTER being PAR2 verified and not on block boundaries. In this case, the found data is saved as a single blocklength file named as the MD5 of the data with a .P2BK suffix in the \verification folder. All such single data blocks are automatically removed after a successful repair.

Although yDecode keeps track of the completion state of verification sets while running, it does NOT write all this information to disk on shutdown. For safety, instead on startup it rechecks the verification of all listed files that it finds in the correct folders, rather than relying on saved information, in case files have been replaced, altered or removed between sessions.

This guarantees high confidence in the verification, but it will of course mean a much slower startup for yDecode if there are many open verification sets, especially if they are PAR or PAR2 which use MD5 checks rather than faster CRC checksums, so it is advisable to shut down yDecode sessions with as few open sets left as possible. A warning is given if there are any.

When a verification file is fully completed, ie. ALL its referenced files are downloaded, fully verified by CRC/size/MD5 and repaired if necessary, it is moved (along with any repair files associated with the set) to the \verification\done subfolder. The .VER file for the set is then deleted.

Any further repair files that come in for a completed set are ignored. The set ID is checked and they are saved directly into the \verification\done folder after decoding. They are not otherwise processed. Source files for a set are also ignored if they turn up after a set has been completed, but they are not currently deleted when detected as not needed.

yDecode can be set to delete all the finished files in the \verification\done folder on exit, or they can be kept for reference. This deletion option is not available if verification processing or repair options have been turned off, so that files may still be verified with an external utility.


PAR/PAR2 repair

Full PAR and PAR2 repair functionality is built into yDecode. No external utilities are needed to join or repair downloaded files, although these can be used to check results.

Repair files (.Pnn for PAR or .volnn+mm.PAR2 for PAR2) are recognised by all having the same embedded recovery set ID. As the sourcefile verification data is repeated redundantly in every file, only the first PAR/PAR2 file decoded for every set is actually parsed. The rest are just verified, then their repair potential is noted and added to the data structure for the set.

If a PAR file is found to be corrupt, it must be discarded. PAR2 files, however, are constructed in discrete packets which means that a damaged file may contain some recoverable information from any undamaged packets. For an initial PAR2 set build, certain key data packets MUST be available, though. If a damaged repair file happens to be the first one parsed, and any key packets are not readable, it is held "pending" until an undamaged PAR2 from the same recovery set is decoded, and it is then reparsed so that all possible repair packets are recovered where they can be verified. Nothing is wasted!

yDecode can calculate how much repair data is needed from the amount of bad or unverified files (or PAR2 blocks) in the set, once it knows all available files have been queued and decoded. It checks the NewsPlex database for the availability of enough repair files in the source newsgroup and uses the results to request these files through NewsPlex. When there are enough repair files or blocks for repair to proceed successfully it makes the repair, closing the verification set and leaving only complete, verified files behind.
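The arithmetic behind this can be sketched in a few lines of Python. It relies on the general PAR/PAR2 property that each PAR repair file can rebuild one bad or missing file, and each PAR2 recovery block one bad or missing block. The function names are hypothetical, and how yDecode counts partially good files is not covered here.

```python
def blocks_in_file(size_bytes, block_size):
    """Number of PAR2 blocks a file occupies (the last block may be partial)."""
    return -(-size_bytes // block_size)   # ceiling division

def repair_shortfall(needed, available):
    """Extra repair units (files for PAR, blocks for PAR2) still to request.
    A result of 0 means there is enough repair data for the repair to proceed."""
    return max(0, needed - available)
```

For example, a completely missing 1000-byte file in a set with 384-byte blocks needs 3 recovery blocks. If only 1 is on disk, 2 more must be requested.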

To allow good data in nearly complete UUE or yEnc multiparts to be verified, yDecode tries to discover whether these files will be verified by PAR2. If the number of good and bad parts decoded so far adds up to the total, and the set will be PAR2 verified (or is a PAR2 file itself that can be partially parsed because of the subpacket structure) then the file is concatenated (UUE) and renamed for parsing and verification, even if some is missing. This allows maximum usage of downloaded data without waste.

Incomplete multiparts will NOT be PAR2 parsed unless the user explicitly requests this by selecting verification sets to close in the User Actions dialog (commands are only available if both yDecode's own input queue and the NewsPlex async queue are empty), or if there is an active NewsPlex link that can confirm an empty async queue automatically, and this check has been requested in the NewsPlex Settings dialog.

This is effectively the same clean-up operation, but as previously noted the automatic setting may not be appropriate and will NOT operate correctly if all messages in a verification set have NOT been queued by the user. The likeliest scenario for this is where a poster has split a large file upload over several days - obviously, after the first day, only what was downloaded that day would be available as source files. In this case, automatic closure should be turned off and manual set closing should be used until the download is complete.

When set closure happens, all relevant multiparts (both yEnc and UUE) outstanding are closed and verified, whatever state of completion they are in, so that the correct final amount of repair data for the selected verification sets can be calculated and requested. If there is no active NewsPlex link, the user is prompted to download sufficient repair files or replacements when the correct number can be reliably calculated.

A useful feature of yDecode's repair implementation is that all the files need not be in the same directory to be repaired successfully. They are found and can be accessed in whatever folders they may have been routed to via the existing filter rules, and will be repaired in and to the correct filter location. Any incomplete source files are then removed automatically when repair is finished successfully.

When a PAR or PAR2 repair is completed, the reconstructed files are checked against the full MD5 checksum listed for them. While this makes for absolute confidence in the repair, it also means that the files have to be MD5 checked from start to finish a second time, which can be a slow process with large source files. The check should be a formality, as the PAR2 parse and repair process verifies every block within the file at its specified location.
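The second full-file pass is slow simply because every byte has to be read and hashed again. A minimal Python sketch of such a check, reading in chunks so that a large file never has to fit in memory, is shown below. The function names and the 1 MB chunk size are arbitrary choices for illustration, not yDecode settings.

```python
import hashlib

def md5_of_stream(stream, chunk_size=1 << 20):
    """MD5 of a binary stream, read start to finish in fixed-size chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def matches_listed_md5(stream, listed_hex):
    """True if the repaired data hashes to the MD5 listed in the set."""
    return md5_of_stream(stream) == listed_hex.lower()
```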

However, there is always the possibility of a bad memory error or I/O glitch. If the file is flagged as bad on a secondary test, assuming the original PAR/PAR2 data was constructed correctly the entire dataset should be internally reset automatically and all output files should be retested for valid data. Whatever repair files have already been downloaded will be reused and any new ones needed to complete a second repair will be requested normally.

If processor speed or lack of machine memory is an issue then the speed of repairs and this extra check may be problematic. It may be advisable to turn off the repair functionality and use an external utility in this case. If verification is allowed, however, then yDecode will still be able to assess the state of repair of sets and request whatever repair files are required.


Splitfile processing

If a verification set is completed, all the files that are referenced are checked to see if the sequence constitutes a set (or sets) of split files. These are produced by utilities such as MasterSplitter, JAS or HJSplit and can be recognised by a numbered suffix sequence: .001 .002 .003 etc.

In the standard format produced by all these utilities, all the files are produced by slicing a base file (any file of any type) into equal data segments of a specified size, so that they are all the same size except for the very last, which is smaller and contains any remainder data. There is also often a small .000 file which contains information about the split set but no actual split data. The original file is reconstructed by simply concatenating (joining) all numbered files from .001 to the last in order.

yDecode works out when a split set is complete and will automatically join the set when it has all elements from the first to the last. It decides if it has the last by comparing the size of the split files and checking if the last-numbered in the set is smaller than the rest. If not, it will NOT join the set. It will also attempt to read the last file info from a .000 file. Remember, it will ONLY join a set when all files are verified!
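The size test for the last file can be sketched as follows in Python. This is only an illustration of the rule as stated: the function name is hypothetical, and it assumes the sizes are supplied in order from .001 with no gaps in the sequence.

```python
def last_split_confirmed(sizes):
    """sizes: byte sizes of .001, .002, ... in order, with no gaps.
    True when all parts but the last share one size and the last is smaller."""
    if len(sizes) < 2:
        return False
    body, last = sizes[:-1], sizes[-1]
    return len(set(body)) == 1 and last < body[0]
```

A set whose final part is exactly the same size as the others cannot be confirmed this way, so last_split_confirmed([100, 100, 100]) is False even if the set really is complete.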

This allows different verification files to reference different sections of a complete split set, for instance one file might verify parts 1 to 10 and others might provide verification for parts 11 to 55. As one file will be completed before the others, the incomplete split information is saved until the others are finished and the complete split set is fully verified.

yDecode does this by means of a .SPLT file almost identical to the .DUE used for UUE sets. It is also similar in format to the .DYC file used to track yEnc multipart completion, but rather than showing data ranges still missing it shows completed ranges, as number pairs separated by a comma on separate lines. Single number entries rather than a range MUST be followed by a comma (see UUE example above).

It is saved in the same folder as the corresponding split files so that as long as the filter rules are not changed it is updated automatically when any new split files are saved and verified there. A sample .SPLT file might appear like this:

1,10
21,30
51,55
Last=55

In this example, splits 11 to 20 and 31 to 50 are still missing (they may be already saved in the folder, but the sets that verify them have not been completed yet so the files are not considered fully verified). yDecode has also worked out that split 55 is the last in the set. When it has a complete run saved and verified from .001 to .055, it will automatically join this set and then delete the .SPLT file.
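To make the format concrete, here is a small Python sketch that reads the sample above and works out which splits are still outstanding. The function names are hypothetical and the parsing rules are inferred only from the description on this page (number pairs, a trailing comma for single entries, an optional "Last=nn" line).

```python
def parse_splt(text):
    """Return (verified_ranges, last) from the text of a .SPLT file."""
    ranges, last = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.lower().startswith("last="):
            last = int(line.split("=", 1)[1])
            continue
        start, _, end = line.partition(",")
        # A single entry is written "n," so an empty end means start == end.
        ranges.append((int(start), int(end) if end.strip() else int(start)))
    return ranges, last

def missing_ranges(ranges, last):
    """Ranges of split numbers from 1 to last that are not yet verified."""
    missing, expected = [], 1
    for start, end in sorted(ranges):
        if start > expected:
            missing.append((expected, start - 1))
        expected = max(expected, end + 1)
    if last is not None and expected <= last:
        missing.append((expected, last))
    return missing
```

For the sample file, missing_ranges returns [(11, 20), (31, 50)], matching the description above. An empty result with a known last number would mean the set is ready to join.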

As splits are only processed for fully verified files, there is no bad file info here. Only completely verified files are listed in these .SPLT files.

ALTERING THESE .SPLT FILES WILL CAUSE AUTO-CONCATENATION OF SPLITS TO FAIL! However, it might be useful to add the "Last=nn" information to the end in case yDecode is unable to confirm it, perhaps because the source filesize was an exact multiple of the split size, to allow completion of the process.

yDecode can be made to optionally delete the split files too when a split set is fully joined. This means that after downloading and verifying, all that would remain on your hard drive would be the original file that the newsgroup poster uploaded (we hope!) without any further intervention by you. As yDecode will ONLY join splits when it is totally satisfied with the verification of every single constituent file, using the best verification data provided (PAR/PAR2 by preference), this is a fairly safe option.

As noted in the section on known issues, this version of yDecode is limited by the Visual Basic file I/O subsystem, which only allows access to files up to 2 GB in size. yDecode therefore calculates the final joined size of a split set and refuses to concatenate when it would exceed this limit. It should still be possible to join the files in an external utility.
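The size check and the join itself can be sketched in Python as below. The exact byte limit is not stated on this page: 2^31 - 1 is assumed here because that is where a signed 32-bit file offset tops out. The names MAX_JOINED_SIZE, can_join and join_parts are hypothetical.

```python
import shutil

MAX_JOINED_SIZE = 2**31 - 1   # assumed limit: largest signed 32-bit offset

def can_join(part_sizes, limit=MAX_JOINED_SIZE):
    """True if the joined file would stay within the size limit."""
    return sum(part_sizes) <= limit

def join_parts(part_streams, out_stream):
    """Concatenate the parts, in the order given, into out_stream."""
    for part in part_streams:
        shutil.copyfileobj(part, out_stream)
```

The parts must be passed in numeric order (.001, .002, ...), with any .000 information file left out, since it carries no split data.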

Finally, remember that yDecode will only examine and process splits that it finds in the filelists of verification files that it decodes, and that it can therefore verify. It will take no action at all on binary split files that it decodes that are not referenced in any verification file, and will merely route them according to the filter rules applying to the base file.

You are therefore encouraged to always download (and demand) appropriate verification for your Usenet binaries, PAR2 by preference for maximum confidence in your downloads. Good posters should always provide this.


Overwrite and deletion policy

yDecode always has to check for the existence of a same-named file in the output folder when it is about to save a decoded file. If it finds one, it decides whether to rename the existing file by comparing it against all open verification sets, to see if it has been previously verified by any.

If the file is already verified as good, then it will not be renamed but the new file will be. If it was detected as bad (correct filename but incompletely decoded, a bad CRC or bad MD5) then yDecode will delete this bad file to allow the new file to overwrite it, as it is most likely a requested replacement.

Files containing some good verified PAR2 block data will not be overwritten as they are renamed for safety by the PAR2 set. They have the (hopefully) unique prefix "-P2-" added to them and are automatically deleted when they have been used to complete repair. Completely verified PAR2 source files needing no repair will be left intact.

Otherwise, the existing file will be renamed in favour of the new one. The format of the new filename is <old filename>(copy##)<old extension>, where each # is a digit. Previous copies will not be overwritten; the copy number is simply the next one available. It will be up to the user to eventually decide which version might be correct!
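The decision described in this section can be summarised in a short Python sketch. The status labels and function names are hypothetical, and the zero-padded two-digit copy number starting at 01 is an assumption based on the "copy##" pattern.

```python
import os

def existing_file_action(status):
    """status of the existing same-named file, as judged against open sets:
    'good', 'bad', 'partial_par2' or anything else for not verified."""
    if status == "good":
        return "keep existing file, rename the new one"
    if status == "bad":
        return "delete existing file, save the new one in its place"
    if status == "partial_par2":
        return "existing file was already renamed with the -P2- prefix"
    return "rename existing file, save the new one"

def next_copy_name(filename, existing_names):
    """Return '<name>(copyNN)<ext>' using the first unused copy number."""
    stem, ext = os.path.splitext(filename)
    n = 1
    while True:
        candidate = "%s(copy%02d)%s" % (stem, n, ext)
        if candidate not in existing_names:
            return candidate
        n += 1
```

With these assumptions, a second clash on "photo.jpg" when "photo(copy01).jpg" already exists gives "photo(copy02).jpg", so earlier copies are never overwritten.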

When verifying files with PAR2, yDecode considers any file matching a limited number of patterns of the correct filename to be a potential match. Any file containing the "root" part of a set name (before any suffix) will be matched and checked for valid block data. This means that .BAT and .000 or other small files, if inadvertently downloaded, will therefore be checked, as their "root" will match the set "root".

It is yDecode's policy that any such name match that does not yield a single good block of data must be completely corrupt, as a name match that close should be valid, so it deletes the file. This means that these small redundant files, frequently found with binaries split by certain splitter utilities, will be automatically removed if they have not been made part of the recovery set, as they will contain no valid set blocks.

They are not required anyway, as yDecode manages the split rejoining process perfectly without them. Downloading executable .BAT files from Usenet, however reasonable their provenance, is in any case extremely unsafe, so this is definitely in the best interest of the user! Because of reported issues possibly due to this feature, however, I am considering adding control over this deletion in future releases.

Finally, on exit yDecode deletes all decoded files that have been detected as corrupt during that session. (They are kept for reference while running along with all so-far unmatched files.) There is no value to keeping files definitely verified as bad, so this is a valid clean-up procedure. If there was any good PAR2 data in them, or they were recovered by PARs, they will have been replaced or renamed in any case.


16/08/06 - ian(at)i-asm.com