nettime's_dusty_archivist on Mon, 20 Mar 2000 07:38:59 +0100 (CET) |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
<nettime> The Breaking of Cyber Patrol® 4 [part 1 of 2] |
[orig from <http://hem.passagen.se/eddy1/reveng/cp4/cp4break.html>] [part 1 of 2] [2000-03-11] The Breaking of Cyber Patrol® 4 Written by Eddy L O Jansson and Matthew Skala "Cyber Patrol has sophisticated anti-hacker security." -- Cyber Patrol readme.txt Abstract Several attacks are presented on the "sophisticated anti-hacker security" features of Cyber Patrol® 4, a "censorware" product intended to prevent users from accessing Internet content considered harmful. Motivations, tools, and methods are discussed for reverse engineering in general and reverse engineering of censorware in particular. The encryption of the configuration and data files is reversed, as are the password hash functions. File formats are documented, with commentary. Excerpts from the list of blocked sites are presented and commented upon. A package of source code and binaries implementing the attacks is included. Table of contents 1. Introduction 2. Metodus operandi 3. Overview 4. Analysis of cp4crypt() 5. Cryptanalysis of cp4hash() 6. Files, formats and the URL database 1. Structure tables Observations 1. Rogue deinstallation Source and binaries 1. CPHack documentation Conclusion 1. Thanks References 1 Introduction The market is full of "parental control" applications, and as has been shown in earlier essays [DFR98, DFR99], they are mostly of low technical quality. When we now look at Cyber Patrol (v4.03.005), we find that this trend continues. Some things are a little better than what we're used to from products like CyberSitter and NetNanny, but mostly, things are the same. We will begin by presenting our goals for the evening, with a small digression on the "how" and "why" in a section we call Metodus operandi, followed by a quick overview of our target for the day. We start getting technical as we do a quick presentation of a cipher found in the target. After that little snack we move on to the main course, something a little bit more complicated, the cryptanalysis of a hash function. At this point we have completed the first goal, and so we move on to take a closer look at how CP manages its URL database. Putting the technicalities behind us we make some observations about the product as a whole. As we take our last bite out of the target, we treat ourselves to a dessert; some source & binaries to drive our point home. All good things must end, so also this pleasant evening. We draw our conclusions, say our goodbyes and part with only a handful of references to follow up on. Let the festivities begin! 2 Metodus operandi People often ask "How can I learn to do these things?". The answer is the same as for all other things you want to learn: you must study, and you must practice. This, of course, was not the answer the questioner was hoping for. There is a certain amount of "magic" in learning this crude form of reverse engineering. There are no books to learn from, and most of the material on the web is quite bad. There are some excellent essays from some excellent practitioners out there, but while the knowledge possessed by the authors is often great, their presentation is often sub-par, which can make the material inaccessible to the beginner. There's really no substitute for practice, but what you must have is a firm ground to stand on. This means good knowledge of binary arithmetic and of the platform you are to work with, both the processor and the operating system. While these things can actually be learned by means of books and classes, most people are probably self-taught in these matters. It's been said that to be good at creating ciphers, you must be good at breaking them. Similarly, we'd like to say that to be good at reverse engineering, you must be a good programmer. So, if you haven't already, learn a language or two. Now, let's say you have all the background knowledge you need. Then you need some tools. You will need a good debugger. There are a couple out there, but for the Win32 platform most people are using the one included in their development environment, such as the one in Microsoft's Visual Studio, or one of NuMega's offerings, of which the crown jewel is the systems debugger SoftIce. SoftIce has been the tool of choice of crackers and device-driver developers for over half a decade now, and it is a very competent product. In addition to a couple of programming languages and a good debugger, you will want a good hex-editor. Here the choices are plenty, at least on the "WinDos" platform. One good hex-editor which is often mentioned is HIEW. It's a favourite primarily because it supports some disassembly functions and because many of us feel comfortable in the Norton Commander-esque interface. Speaking of disassembly, you will probably want a good disassembler, and here there is really only one product worth mentioning by name, and that is IDA. There're quite a few other disassemblers out there, but IDA - which disassembles ELF executables too by the way - is the most advanced and most competent of them. In addition to the big three, there are some other utilities that you might want to add to your toolbox. We're thinking mainly of such tools as regmon and filemon by SysInternals. Add in a little paper, a good pencil and possibly a calculator, and you are all set. Now, this is not meant as an essay for teaching real beginners about reverse engineering, cryptanalysis or any such matters, but feedback from the essay on NetNanny indicates that the pieces on how the reversal was done were really appreciated by you readers, and so we will try to incorporate some of that into this essay too. Hopefully, those of you not interested in these matters will find it easy skipping the sections in question. Let's start from the beginning. Before we even install a product we must have some set of goals we want to achieve. For Cyber Patrol the goal was to break the authentication scheme and to extract the URL database, documenting the structures in the progress, thus facilitating interoperability. These constitute practical goals. You will also find less pragmatic goals for the launching of an attack, such as the inquisitive desire to learn the internals of someone else's product, the thrill of doing something you are not supposed to be able to do, and the recognition you might gain for being the first one to explore unchartered territory. We can call these goals of personal gratification. More interesting for the majority of people are probably the political goals, to expose any hidden agenda that might be lurking behind the product and to fuel the discussion around it, in this case the discussion around censorware. For us, the primary motivation has been the possible political implications. With the goals firmly set in mind, we begin our work to achieve them. 3 Overview Installation is straightforward. You will note, however, that you are not asked to supply an installation path. This is a typical example of producers taking the easy way out. Rather than going through with the little extra bit of effort, they chose to take the easy route - by forcing all their customers to install the software into C:\PATROL no matter what. Now, before we speak some more on how we can achieve our goals, let's go on a short tour of the program. For reference, here's a screenshot of the main interface. As can be seen, a large part of the main interface is devoted to time management. For each day in the week you can - with a 30 minute granularity - control the hours in which a user is allowed to use the Internet. You can set the maximum amount of time "online" allowed per day and calendar week. To the upper right, you'll find a panel for controlling the filters in Cyber Patrol. It's fairly straightforward, but let's run through the alternatives anyway. IRC Chat Filters on keywords that are not allowed to be part of the channel name. ChatGard Lets you specify things that are never to be allowed to be transmitted over the Internet, such as your address, phone number and the like. The clipboard will be monitored too. The "Carlin-7" mentioned are shit, piss, fuck, cunt, cocksucker, mother-fucker, and tits. See also [ACLU96] WWW, FTP & Other This is where you add any additional URLs you want to filter, or allow, as the case may be. News This screen is virtually identical to the "WWW, FTP & Other" one, but here you can define any newsgroups you want to filter. You can also choose to apply the IRC keyword filters to newsgroup names. Games & Applications Here you can specify up to sixteen 16-bit windows applications that should not be allowed to be run. Not very useful if you're running a 32-bit operating system though. Also in the main interface, under "categories" the following categories are available for filtering: 1. Violence / Profanity 2. Partial Nudity 3. Full Nudity 4. Sexual Acts / Text 5. Gross Depictions / Text 6. Intolerance 7. Satanic or Cult 8. Drugs / Drug Culture 9. Militant / Extremist 10. Sex Education 11. Questionable / Illegal & Gambling 12. Alcohol & Tobacco 13. Reserved 4 14. Reserved 3 15. Reserved 2 16. Reserved 1 It is, as we will show in the section on file formats and the URL database, not a coincidence that there are sixteen categories in this list. The last four entries, named "reserved", are greyed out and cannot be toggled on or off by the administrator. In addition, "Reserved 4" and "Reserved 3" are selected, and thus cannot be unselected. We'll comment more on this later, but rest assured that an opportunity for foul play lurks right there. Now, let's review our goals. First, we want to break the authentication, so let's talk about that. There are three levels of access in Cyber Patrol. You could be the administrator, which gives you full access to both the Internet, naturally, and to administering CP. Below the administrator is a "deputy". A deputy is someone who's given full access to the Internet, bypassing CP, but is not allowed to do administration. Everyone else is simply a "user" and must abide by whatever filtering and rules CP enforces. After installing CP it will install its hooks and remain loaded in the background. To disable it you must authenticate against it. If you want more than one person to have administrative privileges, then you must give them the administrator password; similarly for deputies. As the administrator you can add users, which is simply a matter of assigning a password to go with the username and then setting up the restrictions you want to apply to him or her. A user can then authenticate against CP by right-clicking on the CP icon in the taskbar and select his or her name from a list of all users configured. After you've chosen your username you'll be prompted for the password. It's generally acknowledged that giving out account names like this is bad security since it makes it easier to attack them, for example by guessing likely passwords. Now, to break the authentication we must simply follow the flow from where we enter our password, to where the valid passwords are retrieved and compared against ours. Or, so goes the theory... 4 Analysis of cp4crypt() Real life isn't nearly as neat and perfect as our theory would have it. In order to break the authentication scheme, i.e. extract the passwords, we must first locate them. The obvious way, and so our theory says, is to trace the path from entry down to comparison. In the case where retrieval was done earlier and thus is not in the path, we can actually backtrack once we've found out what our password is compared to. We would do that by setting a breakpoint on the memory occupied by the data our password was compared against, then - by restarting the application - gaining access to the point where that data is loaded. This, however, was not the path we took in this reversal. We were a little bit less methodical, but maybe a little bit more intuitive. We quite simply disassembled the target and browsed it for suspicious code. Using the debugger to locate something like this can be very effective in terms of time spent, but in the grander scheme of things you will want to spend some time browsing the disassembly just to get a feel for the program. The debugger is great for following a specific path and to learn exactly what gets loaded into every register, but it's not a very effective way of learning to know the program more generally, because it's hard to track different code paths with it. So there we were, browsing our disassembly of CP.EXE for suspicious code. So what constitutes suspicious code, anyway? This of course depends on the context and the type of application, but as a general rule you will want to slow down when you encounter xors used in any other way than with the same register as both source and destination, as in xor eax, eax, or against an immediate value not a power of two. We're looking for clusters of bit-manipulation instructions; shifts, ands, ors, xors, rotations and compact loops. These are all signs that you might be dealing with some sort of encryption routine. So, lazily browsing the code was how we ended up in a routine we call cp4crypt(), which we will now examine in detail. In reality we're speaking of two different routines, one doing the encryption and one doing decryption. We'll use the term cp4crypt to mean these two collectively, or the cipher algorithm they express. The first thing one wonders when one finds a suspicious routine is "For what is this routine used?". An important question, and to answer it we use our next most powerful tool, the debugger (the most powerful tool of course being your brain, and don't ever let anyone tell you otherwise). By inserting the special opcode 0xCC, or INT3 as it is also known, into the suspicious code, we can make the debugger active when the routine is run. Typically you will want to enter this into the entry-code of the routine. To give you an idea of how "suspicious" code can look, here is the piece of code we found: 4D6D:0025 loc_4D6D_25: 4D6D:0025 66 8B CA mov ecx, edx 4D6D:0028 4D6D:0028 loc_4D6D_28: 4D6D:0028 D0 C8 ror al, 1 4D6D:002A 67 32 03 xor al, [ebx] 4D6D:002D 67 88 03 mov [ebx], al 4D6D:0030 66 43 inc ebx 4D6D:0032 67 E2 F3 loopd loc_4D6D_28 4D6D:0035 66 2B DA sub ebx, edx 4D6D:0038 FE CC dec ah 4D6D:003A 75 E9 jnz loc_4D6D_25 The sequence of instructions that start with the ror (rotate right), followed by xor reg,mem (which is really a memory fetch) and then a memory store in mov [ebx],al triggers our interest. We also notice the two tight loops. This is a school book example of Intel x86 encryption/decryption code, similar in complexity to the things you might find in old DOS viruses. For those of you unfamiliar with the architecture it should be noted that this particular Intel instruction set is read with the destination to the left and the source to the right, so the xor operation fetches the byte addressed by ("pointed to by") the register ebx and xors that byte with the value in the register al, and then stores the result back into al. The section of code above is a decryption routine. When the whole thing is translated into a higher level pseudo-code it reads something like this: key = ctsize & 0x000000FF do twice { for each ctbyte in file { rotate key right ptbyte = ctbyte ^ key key = ptbyte } } Algorithm in English: * First the key is initialized to be the low eight bits of the ciphertexts size. * Then we make two passes in which we: * Rotate (not shift) the key one bit to the right. * The key is exclusively-or'ed to the current ciphertext byte to give the corresponding plaintext byte * The key is set to be the now decrypted byte There are numerous issues with this algorithm which make it very weak. First of all we have the key size, which - weighing in at only eight bits - is, shall we say, very short of being impressive. Of course, since we have the code for deriving the key (that ctsize & 0x000000FF operation) a longer key wouldn't be much use anyway. Obviously this was never intended to be strong, but even so, someone, some where, spent time developing this scheme. There are certainly simpler ways of doing weak encryption than this, but this is actually a perfect example of the kind of homebrew encryption you are likely to find in commercial applications. To continue, the key update schedule is weak in that the key only depends on the previous byte. This means that plaintext repetitions longer than the number of rounds used will show through in the ciphertext. The number of rounds used by CP, as described above, is two. A competent cryptographer should be able to break this in mere hours (and by breaking, we mean recovering the plaintext and with it the algorithm), having access only to ciphertext. In real life we also have some assumed plaintext in that the file 'cyberp.ini' is encrypted using this scheme. Such a file probably starts with "[" or ";" and ends with 0x0d,0x0a, to mention but a few hooks. The use of two rounds strengthens the cipher somewhat from a ciphertext only attack, but... this is academia, not really relevant to our real life situation, in which we don't have to recover the algorithm by means of cryptanalysis of ciphertext. We can simply locate and translate the algorithm as it is implemented in the target. Much easier, much faster, though maybe a little bit less fun and rewarding. We introduced you to the decryption code because it's somewhat simpler to explain compared to the encryption code. The reason for this is that we must, in effect, encrypt the buffer backwards, and we must also treat it as a circular buffer, and even then we need to do a fixup at the end to key the encrypted data. Because of these complications we'll present some code that is a little closer to an actual implementation than pure pseudo-code: for i = BufferSize-1 downto 1 do { key = Buf[i-1] rotate key right Buf[i] = Buf[i] ^ key } Now, the buffer must be seen as circular due to the way the key (which is set to the previously decrypted byte when decrypting) carries over from one round to the next. It's possible to write this into the loop above, but in our case that would only make the explanation more difficult to follow. Anyway, after the loop we connect the two ends of the buffer: key = Buf[BufferSize-1] rotate key right Buf[0] = Buf[0] ^ key After that little tie-in we do another round of the backward encrypting loop above, after which we must do one final fixup for the first byte by keying the whole buffer with the lower eight bits of the buffer's size. key = BufferSize & 0x000000FF; rotate key right Buf[0] = Buf[0] ^ key There, now Buf[] will be encrypted. See the section on sources and binaries if you want to look at an actual implementation of these algorithms. Now, for the question we posed, for what is this encryption used? cp4crypt() is used to encrypt some of Cyber Patrol's data files, such as the file cyberp.ini which contains configuration and user information, and also the administrator and deputy passwords. This cipher also protects the raw URL database, i.e. the files cyber.not and cyber.yes. It is also used on the file user.lst which contains some configuration information, most notably any additional URLs the local administrator(s) may have added. Let's talk about the passwords. The cyberp.ini contains a main section, "Cyber Patrol", under which the two passwords are stored in the keys "HQ PWD" and "DEPUTY PWD". The data of both these keys is encoded as a hexadecimal string representing eight bytes, or 64 bits if you prefer. It can look something like this: HQ PWD=3AD6AF0CB33D8A87 DEPUTY PWD=9A3740C7019A5AA1 The deputy password is in fact the password encrypted using cp4crypt(), so it is a simple matter of decoding the hex-string into binary and then decrypting the string using the key 0x08, the maximum length of a password, and voilá; instant unrestricted access to the Internet. Are you impressed? We're impressed. This file also contains any additional users that have been configured. First the main section contain keys of the form "UserNameNN=name" where NN is a number. Associated with these keys are sections of the form "[UserNN]", which contains not only that users configuration but also his password (in the key "password"), again, encrypted using cp4crypt(). But here things become interesting, because the administrator password is not encrypted in this way, the data for the administrator password is in fact a 64-bit hash value. Analysis of the hash function follows. 5 Cryptanalysis of cp4hash() Throughout this section we'll use vaguely C-like notation, with some of the extensions common in cryptographic circles. In particular: * = is assignment * == is test for equality * ^ is bitwise XOR * | is bitwise OR * & is bitwise AND * ^=, |=, and &= are assignment variations of the above, just like in C * << and >> are unsigned shift left and right * <<< and >>> are bit rotate * ** means exponentiation (2**4 is two to the fourth power, which is sixteen) All numbers are unsigned, and bits are numbered from 0 (least significant). When we get to the mathematical part, we'd ideally like to use some notation that doesn't map into ASCII. There we'll use reasonable approximations: * a[1], a[2], etc. for subscripts on variables... not the same as array indexing, but you can think of it as being like array indexing. * a = b (mod n) for "a is congruent to b modulo n"; it'd be nicer to use the proper math "congruent" symbol, which is like an equals sign with three bars. Cyber Patrol uses a technique called "hashing" for its HQ password. Instead of simply storing the password in its configuration files, it processes it through a "hash function" to produce a code called a "hash", and it stores the result. The hash function is supposed to have these properties: * The same password will always produce the same hash. * Two different passwords will not produce the same hash. * Given the hash, it's impossible to determine the password that produced it. These properties guarantee that the program can recognize the correct password when it's typed, but people like ourselves can't determine the password when we reverse engineer the program, even if we obtain the hash. In practice, even a very strong hash function cannot guarantee these properties perfectly - for instance, we could always try all possible passwords until we found the one that gives the correct hash. If the hash is shorter than the input password, then it's absolutely guaranteed that some two inputs will have the same hash. But strong hash functions do exist, which come close enough to the theoretical ideal for most purposes. Cyber Patrol's hash functions are not like that. The deputy password isn't even truly hashed; it is encrypted with the cp4crypt() routine already discussed. The hash function for the Cyber Patrol HQ password looks like this in pseudocode: (1) hash1 = 0 (2) hash2 = 0 (3) for every character in the input: (4) input character |= 0x20 (5) hash1 = (hash1>>8) ^ table[(hash1 & 0xFF) ^ (input character)] (6) hash2 = (hash2<<<5) + (input character) - hash1 (7) end for (8) result is hash2 concatenated with hash1 (64-bits) The first thing we notice here is that every input character gets bit 5 set in line (4). That seems to be case conversion - it forces all alphabetic input characters to lowercase. It has some other effects as well (making certain punctuation characters equivalent to each other) but if the passwords are expected to always be alphanumeric, that shouldn't make much difference. One significant issue (which we'll revisit later) is that it gives attackers a bit of known plaintext: we know that bit 5 of every input character is 1. We can also guess that bit 7 of every input character is 0, since it's tricky to type characters with that bit set to 1 on an ordinary keyboard. Next let's look at line (5). This one may seem a little confusing: we shift 8 bits out of hash1, look them up in a table (using the input to screw with the index of the table), and XOR the result back into hash1. The table, if you look at it, seems at first glance to be random garbage. Here we just have to depend on our experience to enable us to recognize the code: it happens that this code is a standard idiom for hash calculation. To be precise, it's a 32-bit linear feedback shift register, also known as a CRC (cyclic redundancy check). This kind of algorithm is a traditional way of checking data integrity in things like file transfers and compressed archive files. Most serious programmers have written it, or something very much like it, before. The definitive explanation of CRCs is in [RNW93]. Most people who have to write CRC code consult that file for instructions; it's a reasonable bet that Microsystems based their code on it. Line (5) of our pseudocode is pretty much verbatim from one of the examples in that document. The code in Cyber Patrol isn't even just any CRC - examination of the table shows that except for a nonstandard initialization, it is the de facto standard CRC32 algorithm, used in ZModem, PKZip, Ethernet, and FDDI (among other places). CRC32 has proven to be a very successful error detection measure in these systems. Because it provides pretty good bit diffusion (i.e. changing one bit in the input changes lots of bits in the output) CRC32 is also a reasonable hash function just as a hash function. If you were building a hash table data structure, CRC32 wouldn't be a bad way to generate your indices. It is not, however, a cryptographically strong hash function suitable for password hashing. CRC32 has fatal cryptographic holes - it not only provides no security, but also gives us a lot of hints for breaking the rest of the hash. We'll show how to exploit the holes below. Before we do, let's look at hash2, the other half of the 64-bit hash. This seems to be something homemade, created by throwing together several different techniques in an effort to provide cryptographic strength. That isn't a good way to build cryptographic systems, but in this particular case, it seems to have resulted in something that is at least better than the CRC half of the hash. There are three basic parts to the hash2 calculation: rotate the previous value, mix in the input character, and mix in the current value of hash1. The rotation gives us a little bit of what cryptographers call avalanche - bit changes get shifted around to eventually affect most of the word. Using addition and subtraction instead of XOR is important because the carry bits allow some information to flow from one bit position to the next. The designers are treating hash1 as a random number, and hoping to confuse things by mixing it in at every stage. It's not really very random at all, but combined with the carry bits and the rotation, there is a little bit of security here. In fact, in our attacks we haven't bothered to do much to hash2, cryptographically speaking; the results on hash1 are strong enough that we can get away with ignoring most of the holes that may exist in hash2. We might well speculate about the reasons for designing the hash function this way. It looks like Microsystems chose the CRC32 algorithm because they knew it was a standard algorithm, and then they knew that 32 bits wasn't enough for a cryptographically strong algorithm, so they added some stuff. They knew that bit rotations are popular in cryptographic algorithms, and they knew that addition and XOR are popular, so they put in some of each. If they knew more than a little about how CRC32 worked, they'd have used a 64-bit CRC instead of extending the 32-bit version with homebrew stuff. If they understood the security implications of CRC32, they wouldn't have used it at all. So the general pattern here is that the pieces are there, but they aren't understood nor put together systematically. If you have a copy of the Jargon File[JRG00], the entry for "cargo cult programming" might be educational. Now, let's talk about cryptographic strength. Strength of cryptographic algorithms is measured by the order of magnitude of the fastest or most space-consuming attack. An algorithm is strong if the best attack takes too much time or space to be practical; it's also preferable that the best attack be a straightforward brute-force attack, because any other attack indicates a security hole which might easily become worse as the theory of cryptanalysis improves. An attack's difficulty is the amount of time or space it requires, whichever is worse, typically measured as a power of two (like "2**64"). This table is designed to give a very rough idea of the scale of the numbers involved. The exact time to attack a system will depend very much on the system, the optimizations, the hardware, etc. (what computer scientists call "constant factors"); as a result, the stuff in this table could well be off by a factor of a thousand or more in either direction. Where we're going, that isn't going to matter. Difficulty Resources required 2**0 Pencil and paper 2**8 Pencil and paper and a lot of patience 2**16 Warez d00d with a Commodore 64 2**32 Amateur with an average PC 2**40 Smart amateur with a good PC 2**56 Network of workstations, custom-built hardware 2**64 Thousands of PCs working together for several years 2**80 NSA, Microsoft, the Illuminati 2**128 Cubic kilometers of sci-fi nanotech 2**160 Dyson spheres Some real-life examples: the EFF's "Deep Crack" machine uses custom VLSI to solve a problem of difficulty 2**56 in a few days. The distributed.net RC5 effort is attacking a problem of difficulty 2**64 and expects to finish within a few years. The maximum keyspace of symmetric cryptography exportable without a license from the United States, before they made the rules more complicated recently, was 2**40. Standard practice in most crypto circles is to use symmetric encryption keyspaces of 2**112 or 2**128. Those aren't directly comparable with the cryptographic hash sizes we're talking about, but the point here is the general size of the numbers involved. There are two ways in which you could break a cryptographic hash function: you could find an input that will produce a specified output, or you could find two inputs that produce the same output. For a hash with n bits of output, assuming it's cryptographically perfect, the difficulty of finding an input for a specified output is 2**n time, negligible space, and to find two different inputs with the same output, 2**(n/2) time and space. So, the two basic attacks we could apply to the HQ password (assuming it was perfect) are: 1. Brute-force search: try inputs until one gives the right output, finds an input for a chosen output in 2**64 time and 2**0 space. 2. Birthday Paradox attack: try inputs, remembering their hashes, until we find two with the same output. Finds such a pair in 2**32 time and space. The prompting code in Cyber Patrol limits password inputs to 8 characters, and we bash them to lower case and can assume the high bits are zero, so there are in fact only 48 bits of unknown input to the hash. That reduces the time for a brute-force search: 3. Brute-force search with assumptions about input: finds password for a given hash, in 2**48 time, 2**0 space. That's starting to get into the range where it's feasible to attack. In fact, this attack lends itself to a time/space tradeoff because the hash is unsalted. Let's talk about Unix and salt for a minute. The Unix operating system uses a similar concept of password hashing. It doesn't store the user's password in the password file - only a hash (based on the DES algorithm) of the password. If it were a pure hash of the password, then a determined cracker could compile a codebook of all possible passwords and their hashes, and just look up each hash in the list. Also, the cracker could go down a list of users and recognize who had the same password (because they'd have the same hash) even without knowing what the password was. To prevent such attacks, Unix has a concept of "salt". Before hashing the password, the system chooses a random number called the salt. It then hashes the salt along with the password, and stores the salt and the resulting hash in the password file. Now a given password can hash to a lot of different values, depending on the salt. The attacker can't compile a codebook, or at least the codebook would have to be a lot bigger, because its size is multiplied by the number of possible salt values. The attacker can't identify pairs of users with the same password, because they'll probably have different salts and so different hashes. Cyber Patrol doesn't use any salt. As a result, it's vulnerable to a codebook attack, which as you can see is really just a time/space tradeoff of the brute force search above: 4. Codebook attack: 2**48 time and space to compile the codebook once and for all, 2**0 time to look up each password. All those attacks apply to any hash function with this amount of input and output, no matter how cryptographically strong it may be; to avoid them, Microsystems would have to use a longer, stronger hash function. We like SHA1; it's got a 160-bit output and no known attacks better than brute force, so it takes 2**160 time to attack by brute force search, and 2**80 time and space for the much weaker birthday paradox attack. Of course, the prompting routines in Cyber Patrol would have to be modified to accept longer keys; with 48-bit passwords, about all they could do would be add salt, and pray. The first cryptographic hole that's apparent in the Cyber Patrol hash is that the per-character processing is bijective, for a given input character. What that means is that if we know the input character and the output, we can be absolutely sure of what the state was before that character. All we have to do is reverse the steps; we won't dwell on how to do that here because we have better things to do, but it's not hard to figure out. The bijective nature of the hash function makes it possible to do a meet-in-the-middle attack. To find an input for a given output, we hash about 2**32 guesses for the first half and store them. Then we hash backwards from the known output, about 2**32 guesses for the second half of the input. After doing that amount of work we expect to find about one pair of a first half that goes to an intermediate value with a second half that comes from the same intermediate value, and then we've broken the hash. This is about the same amount of work as the classic birthday paradox attack, but it allows us to find an input for a chosen output instead of just two inputs for the same output. It's enough to indicate that the Cyber Patrol hash is a lot weaker than a theoretical perfect cryptographic hash of the same size. But as we'll soon see, we have much stronger attacks on it. 5. Meet-in-the-middle attack: input for a chosen output in 2**32 time and space. Before we can talk about the difference between CRC32 and cryptographic hash functions, we have to learn some math. Let's talk about modulo arithmetic. Suppose you have an equation involving integers, addition, subtraction, and multiplication, like this: 12 + (34 * 56) = 1916 In modulo arithmetic, we have a number called the modulus, and we say that any two numbers are considered equal (or "congruent") if they have the same remainder when you divide them by the modulus, or in other words, if they differ by exactly a multiple of the modulus. It's an amazing fact that if you take any equation like the one above and replace all the numbers with their remainders when you divide by some modulus, you get a congruence that works. For instance, let's take all the numbers modulo 9; we can do that by adding up all the digits, and then if the result is two digits or more, adding up those, etc. This is the old accountant's trick of checking computations by "casting out nines": 12 + (34 * 56) = 1916 <-- equation 3 + (7 * 2) = 17 = 8 (mod 9) <-- congruence Note that 17 (or 8, which is the same thing) is what you get if you take the digits in 1916 and add them up. All the numbers in the mod 9 congruence come from adding up the digits in the equation above. In modulo n arithmetic, there are really only n numbers: 0 up to n-1. Anything else is congruent to one of those. To understand CRC32 we're going to use the second most boring kind of modulo arithmetic: modulo 2. In our mod 2 universe there are only two numbers, 0 and 1. Are you starting to see how this could be relevant? Incidentally, the very most boring kind of modulo arithmetic is mod 1; everything equals zero. Anyway, let's look at the addition and multiplication tables for modulo 2: + 0 1 * 0 1 0 0 1 0 0 0 1 1 0 1 0 1 Notice anything? The addition table is the same thing as what we like to call ^ (XOR) and the multiplication table is the same thing as what we like to call & (AND). This connection is important because it means we can take programs which do bit twiddling, and apply the mathematics of addition and multiplication to figure out what's going on. The CRC32 algorithm simulates a linear feedback shift register. We won't try to draw the diagram here, but the concept it simple: you shift bits out of the register one at a time, and you have a mask that's as long as the register, and you XOR the stream of bits you shift out with the stream of input bits, and every time the result is 1, you XOR the mask into the register. Why this is useful is best explained in [RNW93]; we'll just take it for granted that in communications error checking, this is a sensible thing to do. The 8-bit shift, and the magic table, are just a clever way of doing a whole byte at once. The result is the same as if you did the bits one at a time. What's important cryptographically is that after you've performed one cycle, every bit is either equal to the bit next to it (if its mask position was 0), or it's equal to the XOR of the bit next to it, the bit that just got shifted out, and the input bit (if its mask position was 1). Let's say we have a short register, four bits long, with the mask value 1011. Let a[0] to a[3] be the previous state, b[0] through b[3] be the next state, and x be the input bit. Then we have: b[3] = x + a[0] (mod 2) b[2] = a[3] (mod 2) b[1] = x + a[0] + a[2] (mod 2) b[0] = x + a[0] + a[1] (mod 2) Notice how we're using the + symbol to denote bit XOR; in modulo 2, it's the same thing. The most important thing to notice here is that every output bit is a XOR of some combination of the input bits. If you take two output bits and XOR them together, you find that the result is still simply a XOR of some of the input bits: b[1] = x + a[0] + a[2] (mod 2) b[0] = x + a[0] + a[1] (mod 2) b[1] + b[0] = x + x + a[0] + a[0] + a[1] + a[2] (mod 2) = a[1] + a[2] (mod 2) Note that we can get rid of all the terms that are repeated twice or an even number of times, because those are like multiplied by an even number, and all even numbers are equal to zero in modulo 2 arithmetic. OK, it takes some getting used to, but it really is easy once you see it. With the CRC32 algorithm, there are lots more variables than in that example, but this same general property holds: the CRC32 bits are each the modulo 2 sum of some combination of the input bits. Actually, in strictly correct CRC32 it's a little more complicated because the register doesn't start at 0, so you have to XOR in some extra bits that depend on the length of the input. But fortunately for our discussion, Cyber Patrol uses a nonstandard variant of CRC32 which initializes the register to 0. We can figure out a congruence for each output bit by feeding inputs into CRC32 which consist of only one bit set to 1 and all other bits set to 0. If an output bit is set for that input, then the input bit that was 1 is included in the congruence for that output bit. Once we've tested all the input bits this way, we have the entire congruences. To save space on the page, we will write out our congruences with all the variables included, multiplied by 0 or 1, like this: 0*b[0]+0*b[1]+0*b[2]+1*b[3] = 1*x+1*a[0]+0*a[1]+0*a[2]+0*a[3] (mod 2) 0*b[0]+0*b[1]+1*b[2]+0*b[3] = 0*x+0*a[0]+0*a[1]+0*a[2]+1*a[3] (mod 2) 0*b[0]+1*b[1]+0*b[2]+0*b[3] = 1*x+1*a[0]+0*a[1]+1*a[2]+0*a[3] (mod 2) 1*b[0]+0*b[1]+0*b[2]+0*b[3] = 1*x+1*a[0]+1*a[1]+0*a[2]+0*a[3] (mod 2) and then leave out everything except the coefficients (the 0 or 1 we multiply by): 000111000 001000001 010011010 100011100 That block of bits has all the information from the complete set of congruences. It's what mathematicians call a "matrix", but there's no need to think too hard about what that means; just call it an abbreviated set of congruences. Each row is equivalent to a congruence, and we can add congruences to each other (the same way we did earlier) by XORing one row into another. If we shuffle the rows around and XOR them enough, we can solve for whatever variables we want - it's just like the simultaneous equations we learned in high school. For example, if we desperately want to know the value of a[1] in the example, we can say: b[3] = x + a[0] (mod 2) b[0] = x + a[0] + a[1] (mod 2) b[0] + b[3] = a[1] (mod 2) or in our abbreviated notation: 000111000 + 100011100 = 100100100 We wrote code to figure out the set of congruences for the CRC32 algorithm as implemented in Cyber Patrol, using all 32 output bits and 96 input bits (12 characters, which is overkill, but bits are cheap). Here's the resulting matrix. Input bits come first, first bits 0 to 7 of the last byte, then bits 0 to 7 of the second-to-last byte, and so on; after that are the 32 outputs, again in order from 0 to 31. (This ordering of bits was easiest to program; it doesn't really matter.) 0010000010001011100000001111101100010010110110000111001001011010100011110100000 1010111111110000010000000000000000000000000000000 1001000001000101110000000111110100001001011011000011100110101101110001111010000 0001011111111000001000000000000000000000000000000 1100100000100010111000001011111000000100101101101001110011010110011000111101000 0000101111111100000100000000000000000000000000000 0110010000010001011100000101111100000010010110110100111011101011001100011110100 0000010111111110000010000000000000000000000000000 1011001000001000101110000010111110000001001011011010011111110101000110001111010 0000001010111111000001000000000000000000000000000 0101100100000100110111001001011111000000100101101101001101111010000011001111101 0000000101011111100000100000000000000000000000000 0000110010001001011011101011000001110010000100110001101101100111100010010011110 0110111101011111100000010000000000000000000000000 1000011001000100001101110101100010111001100010011000110110110011010001000001111 0111011110101111100000001000000000000000000000000 0100001110100010000110111010110011011100110001001100011001011001001000101000111 1111101110010111100000000100000000000000000000000 0000000101011010100011011010110101111100101110101001000101110110000111101000011 0101001000111011100000000010000000000000000000000 0010000000100110010001101010110100101100000001010011101001100001100000000000001 0100011011101101100000000001000000000000000000000 0001000000010011101000110101011010010110000000101001110100110000010000001000000 1110001101110110100000000000100000000000000000000 1000100010001001010100010010101101001011100000010100111000011000101000000100000 0111000110111011000000000000010000000000000000000 1100010011000100101010001001010110100101010000000010011100001100010100001010000 0011100010011101100000000000001000000000000000000 0110001001100010110101001100101001010010101000000001001100000110001010001101000 0101110001001110100000000000000100000000000000000 0011000100110001011010100110010100101001110100000000100100000011000101000110100 0110111000100111000000000000000010000000000000000 1011100010010011001101010100100100000110001100001111011001011011100001010111010 1001100011100011100000000000000001000000000000000 1101110011001001100110100010010000000011000110001111101110101101110000101011101 0100110000110001100000000000000000100000000000000 1110111001100100010011011001001000000001100011001111110101010110011000010101110 1110011001011000100000000000000000010000000000000 0111011110110010001001101100100100000000110001100111111010101011101100000010111 0111001100101100000000000000000000001000000000000 0001101111010010000100111001111100010010101110111100110100001111110101110101011 0001011000100110000000000000000000000100000000000 0010110101100010000010011011010010011011000001011001010011011101111001000110101 0010010011100011000000000000000000000010000000000 0011011000111010100001000010000111011111110110101011100000110100111111011111010 0011110110000001100000000000000000000001000000000 0001101100011101110000101001000001101111011011010101110010011010011111101111101 0101111011000000100000000000000000000000100000000 1010110110000101111000010011001110100101111011100101110000010111101100001011110 0100000011010000000000000000000000000000010000000 1111011001001001011100000110001001000000101011111101110001010001110101111001111 1000111111011000000000000000000000000000001000000 1111101100100100001110000011000110100000010101111110111010101000111010111100111 1000011110101100000000000000000000000000000100000 0101110110011001000111001110001111000010111100110000010110001110011110101010011 0010110001100110000000000000000000000000000010000 1000111011000111000011101000101011110011001000010111000000011101101100100001001 0011100111000011000000000000000000000000000001000 1100011101100011000001111100010111111001000100001011100000001110010110011000100 1001110010100001100000000000000000000000000000100 1100001100111010000000110001100101101110110100000010111011011101001000111000010 1110000110100000100000000000000000000000000000010 0100000100010110000000011111011100100101101100001110010110110100000111101000001 1101111101100000000000000000000000000000000000001 Notice the diagonal stripe at the right-hand side of the matrix. That stripe reflects the fact that we have one congruence for each output bit, in order. Now, by flipping and XORing rows (the exact technique we used is called Gauss-Jordan elimination, which is just a formalization of the "system of equations" technique you do in high school) we shuffled that stripe over to the left, to get a set of congruences that tells us the last bits of the input based on the output and the other input bits: 1000010000000100000001000000000100000101000001000111000010001111110110110011001 0110110100111101101100110001011010100100100001101 0100010000000000000000010000000000000000000110110001110111000010100100000110111 1011011010101110011111110010010010000110001000100 0010000000000100000000000000010100001001000001101101101001001010010000100100110 1001001001011110010000011100000011111001100111000 0001000000000001000001000000000100000100000001000100000110111110101100010101011 1111100111010101111000111010000000101010010010110 0000100000000001000001000000010000001000000110101011100000011000111110011001101 0110110001111000111111110001010111001010001010001 0000011100000100000000010000010000000000000110001011101001101101110000000011001 1001111011100001110111010000000100100100011001111 0000010110000100000001000000010100001000000000000111100110010101100111011100101 0010001101111100111101010100000111011101000010011 0000000101000000000001000000010000000000000111111110111111010001010000110111111 0111010100110111111010001100000000111101001001000 0000000000100000000000010000010100000000000101011100110000111000010100010100001 1101010101001110001011101110111000011100111000111 0000000000010000000001010000000100000100000000111101001101100001001101011101111 1110111010001011101100010001100101001101100100110 0000000100001100000001010000000100000100000101111110111001110100100110010001011 1000111111101101010110011000110000110111100101010 0000000100000011000000010000010000000100000001001111000110110100011111000100000 1110110110110001101100100100011011111110101010111 0000010100000000100001000000000000000001000100111011010100001011101001010010010 0101111001110111111011100011101100111100110000111 0000000100000100010000010000010000000101000110110011001100011100001010110100100 1001110100110011011011111010111100101100100110101 0000000100000000001000010000010000001001000001011111000110101000111000001111001 1000001011010110000011011000110110000101111100000 0000000100000000000100000000010000001000000000010111111000110110011100110010010 0101010111111111011110101010100011001000111111111 0000000000000100000011010000010000001001000110100100011011111000100111001000001 0001001111011000010000000100000011000110010101110 0000000000000101000000100000010000001101000110011100001101111001011001100101111 0000010000011101111110000001101011010100011001001 0000010000000001000001011000000000001001000000001000011101011110000010000011010 1101110001100000111010111011101100000001000110110 0000000000000001000000000100000100001101000111000011010110110101110111111000100 0111010001000010001000101010101001111100001001110 0000010000000101000000010010000000001000000011110100011000111010001000000001111 0111001011001110010011000011000100110100011000010 0000010100000000000000010001010100001100000101011111011111000111100011100011010 0010001001001101110001101001101000001110101011010 0000010000000100000000000000110000000001000011010110101011001001001000111010111 0010110000101110011100111101000011110011111111110 0000000000000001000001000000011100001100000100001110111101111010110000010010000 1001111100011100011101001000011010010101111101100 0000000000000101000000010000000110000001000010101001011000011001011001000011000 1010011001111011110010011001001000110111110111101 0000000100000000000000000000000001000001000101100000000111110111001001011011000 0111001011011010010101010101100010100001000100110 0000010000000000000000010000010100101001000100101000011001101111101010011101111 1111011111100010000111101001011100100101101011110 0000000100000000000000000000010000011000000100101101110101100000111001010010011 0001101101100111010101100000110111000011100101110 0000000100000001000000000000010000000011000011110001111111110000100010100100101 1011010100101010011101000111001011100111000100010 0000000000000001000001000000010100000000100110011000000111001010101100110011001 0001001010101111100111111000000001110100001111011 0000000100000100000000010000010000000100010000011011111010110001010101111111001 1101010110001000000010000110001110100000001010100 0000010100000000000000000000000100000000001100010011000101101010011001010010100 1110100000000100110010100101000010110110100011101 The diagonal isn't perfect here for a couple of reasons. One, we didn't solve for bits 5 and 7 of any input byte, because we don't have a choice for the values of those bits. We can't allow the congruences to force impossible values on them. Two, we found that some pairs of input bits are actually equivalent to each other. There's no way to solve for both of such a pair - you just have to choose a value for one or the other of them. But the point is that from this "solved" or "inverted" matrix, we have a set of congruences where we can choose all but 32 bits (including all the output bits of the CRC) and then just by evaluating the congruences, we get the values for the other input bits to make the whole thing work. This is why CRC32 is not a cryptographic algorithm: anyone who knows a little matrix algebra can generate inputs for a chosen output in minimal time. We extracted the columns from our solved matrix and put them into tables in cph1_rev.c. The reverse_crc() function takes parameters specifying the input length, some choices for the "free" input bits, and the desired output value. Then it fills in the "forced" bits for the chosen input length, and goes through its tables XORing together all the columns that correspond to 1 bits in the partial input. Then the result contains all the values for the rest of the input bits. It unpacks them and returns the completed input. The bit packing order is also in tables. Some checking of the result may be necessary because if you ask for a very short CRC input, there may well be no such input for the output you requested. Now we can attack the Cyber Patrol hash just as if it were a strong hash with only 32 bits of output: 6. CRC-breaking Birthday Paradox attack: choose an output for the CRC32, generate inputs with that CRC32 value, save their hash values, until you have two the same. Finds such a pair in 2**16 time and space. 7. CRC-breaking brute force: choose an output for the CRC32, generate inputs with that CRC32 value until you find one with the hash you want. Finds an input for a chosen output in 2**32 time and 2**0 space. But because we know that the input password is only 8 ASCII characters, bashed into lowercase by the OR with 0x20, we can improve the attack still further. With 8 input characters there are 48 bits unknown, and of those, we can derive 32 from the set of congruences. That leaves us only 16 bits unknown. We can simply test all possible values for those 16 bits, by brute force. This is the attack implemented by cph1_rev.c. 8. CRC-breaking brute force, with assumptions about the input: finds input for a chosen output (assuming it really came from an 8-character ASCII password) in 2**16 time, 2**0 space. Shorter passwords take much less time, so we can test all shorter passwords first, without increasing the cost noticeably. Our code for this attack isn't particularly optimized, and we haven't attacked the rotate-and-add part of the hash at all, but even so we can now reverse the function in a fraction of a second. At this point it's appropriate to categorize the Cyber Patrol HQ password hash function as blown wide open, thus fulfilling our first goal. 6 File formats and the URL database The primary function of Cyber Patrol is to do filtering based on a database of URLs that are not allowed. CP can also be configured to work with the inverse policy, denying everything but a small set of allowed URLs, a very strict policy. The source of the URLs that are denied is a file called cyber.not. Conversely, the file cyber.yes is used for the "allowed" URLs. Currently the not file is about 600Kb, with the yes file only about 50Kb. Roughly, this means there are less than one tenth as many allowed URLs as there are denied ones. These files are actually processed into another format and file called cyber.bin for use by the different Cyber Patrol modules. We will not discuss the format of this file since the raw files are much more suited to our needs. Before we broke the encryption on the cyber.not file, we did some short-range correlation measurements on it, to check for polyalphabetic encryption (like the classic "simple XOR" algorithm). The idea here was to measure, for various values of k, how often the byte at offset x was identical to the byte at offset x+k. We used a simple Perl script to compile these statistics. We found that very often, after some byte value would occur, the same byte was repeated again six bytes (or, somewhat less often, seven bytes) later. That would point towards some kind of modified polyalphabetic cipher with a key length that switched between six and seven bytes, or some very weak cipher that didn't conceal repeats in the plaintext, and record sizes of six and seven bytes in the plaintext. It turned out to be the latter, but we didn't find that out by cryptanalysis. Reverse engineering yielded the answers more easily, as described earlier. However, those six- and seven-byte repeats seemed interesting, especially once we had decrypted cyber.not, reversed its header format, and split it out into its component tables. The first table we figured out was the last one in the file, the table of blocked newsgroups. When reversing a file format it makes sense to start at the easy end, and with the cyber.not file that means the table used for newsgroup masks. What made this the obvious starting location is that it's very obvious - from a fast ocular inspection - what it is for. A section might look something like this (some unprintable characters replaced by question marks): 00065A90: 00 41 00 02-00 00 00 07-02 00 68 38-0B F7 07 11 A ? ?? h8?½?? 00065AA0: 00 FA BD 29-B8 00 5F 0A-28 07 1E 00-45 44 53 44 ¢)© _ (?? EDSD 00065AB0: 22 00 04 43-6F 70 79 72-69 67 68 74-2E 4D 69 63 " ?Copyright.Mic 00065AC0: 72 6F 73 79-73 74 65 6D-73 2E 53 6F-66 74 77 61 rosystems.Softwa 00065AD0: 72 65 0D 60-00 61 2E 62-73 75 2E 72-65 6C 2A 0B re ` a.bsu.rel*? 00065AE0: 06 00 61 62-74 2E 6D 61-6D 2A 0B 0E-00 61 62 67 ? abt.mam*?? abg 00065AF0: 2E 74 65 73-2A 0E 0E 00-61 63 61 64-69 61 2E 62 .tes*?? acadia.b 00065B00: 75 6C 2A 07-00 08 61 6C-63 2A 0C 08-04 61 6C 74 ul*? ?alc*???alt 00065B10: 2E 32 33 69-73 2A 0A 09-04 61 6C 74-2E 32 36 2A .23is* ?alt.26* Some quick looking around would show that "ED" marks "End-of-Data" and consequently "SD" ought to mark "Start-of-Data". The first thing to do is try and see what constrains a record. Either there's a length count, a special marker, or the records are fixed length. They are not fixed length, that much is apparent, but they all end in an asterisk, so maybe that is a record terminator? Could be, but the first string, the "Copyright..." ends with 0x0d. To make sure, we must see if we can find a length marker, a common way to handle variable sized records, and also the preferred method in many ways. Now, because there are three "non-character" bytes between SD and the first string, we try this pattern on the other strings, and it checks out. Thus, we can guess that a record starts with three bytes, followed by a variable number of characters. A way to verify this result is to look at the end of the table, to see what kind of data ends the last record. Now, for the length, that would have to be one of the three bytes, so it shouldn't be too hard to locate, if indeed it exists. The first record starts '22 00 04' and then the string. The length of the string is more than zero or four, and definitely less than 1024 (0x400). By counting we see that the string is 31 characters, or 0x1F in hex. Very close to 0x22, and where could the difference come from? But of course, the three prefix bytes. This leaves one or two bytes per record unaccounted for. How many depend on how large the length specifier is. Looking around we find examples of records in which the second byte of the records is something besides zero, which would make the length way too long, and thus we can conclude that we are working with a length byte and not a word. Now them, what about the remaining two unaccounted bytes? As you may remember from the overview we mentioned that there are sixteen different categories available for filtering. The two remaining bytes are actually a blocking mask - a word whose bits maps to those sixteen categories. We call these records TNotNewsEntries. Having found the start and end, and thereby the size, of one table mean we have some usable information for reversing the next-easiest part of the file, the header. Our header looks like this in binary: 00000000: FC 00 2A 00-43 48 00 00-00 00 03 01-03 00 49 00 * CH ??? I 00000010: F8 34 00 00-B6 25 06 00-41 00 2C 00-00 00 CC 34 4 ¬%? A , ¶4 00000020: 00 00 4E 00-AE 5A 06 00-B8 25 00 00-53 44 80 53 N ´Z? ©% SD«S 00000030: 28 0F 02 80-53 28 10 80-53 28 11 D1-01 1C DE 01 (§?«S(?«S(???Ã? Seeing how "SD" marks the start of data, we can calculate the size of the header as 0x2C bytes, meaning the first table (of which there are at least two, one being the table for newsgroup masks) ought to start at 0x2C, the location of the "SD" marker. Experience tells us that headers often contains pointers (offsets), lengths, and counts (of records). Let's see if we can match any of those types into the header. We begin to look for pointers into the different tables. We know that the first one start at 0x2C (the "SD" marker, remember?) and it just so happens that we have two viable candidates, first one at offset 0x02 and one at 0x1A. The first one is off by two, so it's a little less likely to be the one we want. Too little to go on, but we also know where the newsgroup table starts: 0x65AAE. Wouldn't you know, there it is at offset 0x24 of the header. Now, let's see if we can find any lengths in there. We begin by measuring the size of the newsgroup table. It runs from 0x65AAE to the end of the file at 0x68065. Subtract and you get 0x25B8, and there it is in the header, following the offset. We can then verify our findings by checking them against the record of the table starting at 0x2C. Now we have a little more to work with, so we try and determine the size and the location of the first of the headers table-records. We know that one of the table entries end at 0x23 at the latest, because that is where we have our offset to the newsgroups table. We can also be sure that offsets 0x1E through 0x21 belong to the other table, because those bytes contain the length of the first table (whose contents are so far unknown). That leaves only two bytes between them. Implicit in this is a limit to the size of a record, because we know that the newsgroup entry ends at 0x2B at the latest, so it can run from 0x22 through 0x2B ( ten bytes) or 0x24 though 0x2B (eight bytes). And therein lies the answer to the question of the record's length, because we can see that the newsgroup record ends with the length field, and for that to hold the other record should too, which would orphan the two bytes between them. In this way we can come to know that a record is in fact ten bytes, consisting of two bytes (meaning unknown), a longword (table offset) and a longword (table length). Working backwards we can fill out the entries as this: 0x004E, 0x0065AAE, 0x000025B8 0x0041, 0x000002C, 0x000034CC 0x0049, 0x00034F8, 0x000625B6 [cont.] # distributed via <nettime>: no commercial use without permission # <nettime> is a moderated mailing list for net criticism, # collaborative text filtering and cultural politics of the nets # more info: [email protected] and "info nettime-l" in the msg body # archive: http://www.nettime.org contact: [email protected]