nettime's_dusty_archivist on Mon, 20 Mar 2000 07:32:44 +0100 (CET)
<nettime> The Breaking of Cyber Patrol® 4 [part 2 of 2]
[orig from <http://hem.passagen.se/eddy1/reveng/cp4/cp4break.html>]

If we were to go another step back we would get a record like this:

0x4348, 0x00000000, 0x00030103

This clashes with the structure as we know it, so we assume that there are only three records, the data before them having some other structure. Looking backwards again, we notice that the word immediately before the first table entry is 0x0003, which could mean that it's a count of the number of tables, right? Checking against another file with the same structure, hotlist.not, showed that this assumption was correct. The little bit left of the header is not as important as locating the table entries and their count, but it seems that the 0x2A at offset 0x02 is the header size, assuming the header starts at 0x02 and the two bytes in front of it are not related to it. The "CH" seems to be a marker; hotlist.not contains "HH" instead. Without more files to compare to, or time-consuming debugging of the executable, the few bytes left unaccounted for will remain a "mystery".

We learned several important things from the newsgroups list. First, Microsystems likes putting length bytes on things. Second, the blocking mask 0x000E (corresponding to "Partial Nudity", "Full Nudity", and "Sexual Acts / Text") is the most popular one. It appears to be the generic "porn" label which they slap on everything that looks like it might be porn, whether it technically applies or not. Both these facts were useful in attacking the other two tables in cyber.not.

The first table mentioned in the header is the biggest one. At over half a megabyte, it makes up most of the bulk of the cyber.not file. As our previous measurements indicated, this table includes a lot of repeats at a distance of six or seven bytes. Character frequency counts revealed that the top three characters in table 1 are:

1. 0x00 (106280 times)
2. 0x0E (65483 times)
3. 0x07 (25212 times)

We know that they like using blocking mask 0x000E, and the bytes making up that number are the two most frequent bytes in the table. We know they like length bytes, we know there's some kind of structure in here with a size of seven bytes, and 0x07 is the third most frequent byte value. This looks promising. Let's look at a hex dump. This dump was generated with the Linux od -Ax -txC command; offsets are from the start of table 1 as specified in the cyber.not header.

000000 53 44 0a 00 03 c7 00 00 07 0e 00 99 37 55 67 00
000010 0a 0a 0a 0a 0e 0c 0b 67 73 76 00 00 07 0e 00 51
000020 b1 f1 6d 00 0c 0a 79 c8 0e 00 0c 0a 9e 09 00 00
000030 0b 01 00 89 84 e0 4e 55 9e 53 d8 00 0c 0a bd 05
000040 00 00 07 0e 00 71 aa 8a 2a 00 0c 0b b8 18 00 00
000050 0b 08 00 ea 1e da d8 d4 fc d4 20 00 0c 0b b8 1a
000060 00 00 07 00 04 e0 3d c1 be 07 08 00 7b 75 fd b7
000070 07 00 04 87 0b 1e ef 00 0c 0b b8 1f 0e 00 0c 0b
000080 b8 2b 08 00 0c 0b b8 2c 0e 00 0c 0b b8 36 08 00
000090 0c 0d 78 02 00 00 07 0e 00 13 53 03 e2 00 0c 0d
0000a0 79 06 00 04 0c 0d ab 97 00 00 07 06 00 31 75 fc
0000b0 80 00 0c 0d 13 5a 0e 00 0c 0e c7 33 0e 00 0c 0e
0000c0 c8 02 00 00 07 0e 00 22 39 82 eb 00 0c 0e e1 0d
0000d0 00 00 07 01 00 0d b0 59 21 00 0c 0e e8 32 00 00
0000e0 07 20 00 7c d3 df f8 00 0c 0f 87 cd 00 00 07 0e
0000f0 00 88 35 ae 33 00 0c 0f c1 72 0e 00 0c 10 a0 d8

This may appear quite formidable to someone unaccustomed to reading hex dumps, but careful examination reveals some interesting things. First of all, the sequence "0e 00" occurs quite frequently. It's reasonable to suppose that it might be the blocking mask for a page or site. Another common one is "07 0e 00". When that one occurs, there are often four more bytes and then those three again. These patterns are easier to see when one examines more of the dump than the short sample here. It's reasonable to guess that the 07 is a length byte, just like in the newsgroup list.
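The two measurements used so far, byte-value frequencies and counts of repeated bytes at a fixed distance, are easy to reproduce. What follows is a minimal Python sketch, not the authors' own correlation-counting program (which is not shown here); `byte_frequencies` and `repeats_at_distance` are hypothetical names, and the input is assumed to be an already-decrypted table.

```python
from collections import Counter

def byte_frequencies(data: bytes, top: int = 3) -> list[tuple[int, int]]:
    """The `top` most common byte values in `data`, as (value, count) pairs."""
    return Counter(data).most_common(top)

def repeats_at_distance(data: bytes, max_distance: int = 16) -> dict[int, int]:
    """For each distance d, count positions i where data[i] == data[i + d].

    Fixed-size records make these counts peak at the record size (and its
    multiples), because fields such as a shared blocking mask line up.
    """
    return {
        d: sum(1 for i in range(len(data) - d) if data[i] == data[i + d])
        for d in range(1, max_distance + 1)
    }
```

On a buffer made of repeated six-byte records, the count for distance 6 dominates; on table 1 the text reports peaks at both six and seven.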
But that doesn't explain why we get so many repeats at distance six. The byte value 0x06 is only the 39th most common value in table 1, even though there are far more repeats at distance six than seven. So not everything can be tagged with a length byte, or there's something else we don't understand. Further skimming of the hex dump revealed inspirational passages like this one:

037b50 5c b7 08 6f 00 cf cc ae 13 0e 00 cf cc ae c8 0e
037b60 00 cf cc ae c9 0e 00 cf cc ae ca 0e 00 cf cc ae
037b70 cc 0e 00 cf cc ae cd 0e 00 cf cc ae d0 0e 00 cf
037b80 cc ae 15 0e 00 cf cc ae d8 0e 00 cf cc ae 16 0e
037b90 00 cf cc ae 18 0e 00 cf cc ae 1b 0e 00 cf cc ae
037ba0 1d 0e 00 cf cc ae 1e 0e 00 cf cc ae 1f 0e 00 cf
037bb0 cc ae 21 1e 00 cf cc ae 23 0e 00 cf cc ae 24 0e
037bc0 00 cf cc ae 27 0e 00 cf cc ae 28 0e 00 cf cc ae
037bd0 30 0e 00 cf cc 13 ea 0e 00 cf cc c4 c4 0e 00 cf
037be0 cc d0 a0 0e 00 cf cc d0 f8 0e 00 cf cc d2 64 1f
037bf0 00 cf cc d2 a0 00 00 07 0e 00 e5 b0 e3 10 00 cf
037c00 cc d2 18 0e 00 cf cc d2 19 0e 00 cf cc d2 1e 0e

The pattern may be clearer if we look at the bytes six at a time:

037b54 00 cf cc ae 13 0e
037b5a 00 cf cc ae c8 0e
037b60 00 cf cc ae c9 0e
037b66 00 cf cc ae ca 0e
037b6c 00 cf cc ae cc 0e
037b72 00 cf cc ae cd 0e
037b78 00 cf cc ae d0 0e
037b7e 00 cf cc ae 15 0e
037b84 00 cf cc ae d8 0e
037b8a 00 cf cc ae 16 0e
037b90 00 cf cc ae 18 0e
037b96 00 cf cc ae 1b 0e
037b9c 00 cf cc ae 1d 0e
037ba2 00 cf cc ae 1e 0e
037ba8 00 cf cc ae 1f 0e
037bae 00 cf cc ae 21 1e
037bb4 00 cf cc ae 23 0e
037bba 00 cf cc ae 24 0e
037bc0 00 cf cc ae 27 0e
037bc6 00 cf cc ae 28 0e
037bcc 00 cf cc ae 30 0e

Here we've obviously got our generic porn mask of 0x000E, alternating with four unknown bytes, the last of which often seems to be incrementing - but not always. Scanning across the table, we saw that when this kind of six-byte structure occurred, the four mystery bytes seemed to more or less increment smoothly from the start of the table to the end.
But it was always the last byte that incremented first, and then the second-to-last, and so on. In other words, the field is being stored in "big endian" byte order, the exact opposite of the "little endian" byte order conventional on PCs. Why would a PC software package bother doing something in big endian when it's running on a CPU designed for little endian? At this point we had to depend on intuition. There is one thing that's 32 bits long and big endian everywhere, even on a PC: that is an IP address. Some computers like big endian and some like little endian, but it is standard for all Internet protocols to use big endian regardless of what kind of system they're running on - so that they'll all be able to talk to each other. An added bit of evidence is that the actual values of this four-byte field seem to be distributed the way one would expect IP addresses to be distributed. Lots of them start with bytes like 0xCF, which puts them right in the popular part of the Class C IP address space. 
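The byte-order point can be checked directly on two consecutive fields from the dump above (cf cc ae c8 and cf cc ae c9). This is a minimal Python sketch; `field_as_ip` and `field_as_int` are hypothetical helper names.

```python
import ipaddress
import struct

def field_as_ip(field: bytes) -> str:
    """Read a 4-byte field in network (big-endian) order as a dotted quad."""
    return str(ipaddress.IPv4Address(field))

def field_as_int(field: bytes, order: str) -> int:
    """Read the same 4 bytes as an unsigned integer: '>' big-endian, '<' little-endian."""
    return struct.unpack(order + "I", field)[0]

a = bytes.fromhex("cfccaec8")  # consecutive fields from the dump
b = bytes.fromhex("cfccaec9")
```

Read big-endian, the two fields differ by exactly one; read little-endian, as a PC would by default, the same step jumps by 2**24, which is why the incrementing pattern only makes sense in network byte order.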
So, let's write the decimal equivalents of the supposed IP addresses next to the hex dump:

037b54 00 cf cc ae 13 0e 207.204.174.19
037b5a 00 cf cc ae c8 0e 207.204.174.200
037b60 00 cf cc ae c9 0e 207.204.174.201
037b66 00 cf cc ae ca 0e 207.204.174.202
037b6c 00 cf cc ae cc 0e 207.204.174.204
037b72 00 cf cc ae cd 0e 207.204.174.205
037b78 00 cf cc ae d0 0e 207.204.174.208
037b7e 00 cf cc ae 15 0e 207.204.174.21
037b84 00 cf cc ae d8 0e 207.204.174.216
037b8a 00 cf cc ae 16 0e 207.204.174.22
037b90 00 cf cc ae 18 0e 207.204.174.24
037b96 00 cf cc ae 1b 0e 207.204.174.27
037b9c 00 cf cc ae 1d 0e 207.204.174.29
037ba2 00 cf cc ae 1e 0e 207.204.174.30
037ba8 00 cf cc ae 1f 0e 207.204.174.31
037bae 00 cf cc ae 21 1e 207.204.174.33
037bb4 00 cf cc ae 23 0e 207.204.174.35
037bba 00 cf cc ae 24 0e 207.204.174.36
037bc0 00 cf cc ae 27 0e 207.204.174.39
037bc6 00 cf cc ae 28 0e 207.204.174.40
037bcc 00 cf cc ae 30 0e 207.204.174.48

Notice that these are not in numerical order; 216 is not normally considered to come between 21 and 22. However, considered as decimal representations, these addresses are in strict alphabetical order. This list is the kind of thing you might get if you took a text list of URLs and passed it through a sort utility designed for text. A little examination reveals that these six-byte structures in table 1 are strictly in this "text IP" order across the entire table.

As a final confirmation that these numbers are intended to represent IP addresses, just point a Web browser to a few. Almost all are porn sites.

At this point we had figured out that there were a lot of blocking masks interspersed with IP addresses in the table, and also a lot of seven-byte structures starting with a length byte and a blocking mask. But the remaining four bytes of those seven-byte structures were apparently not sorted, nor IP addresses, and there were still some bytes that didn't fit into either kind of structure.
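The "text IP" ordering claim can be tested mechanically: compare the addresses as strings and as numbers. A minimal Python sketch follows; the function names are hypothetical, and the sample is four consecutive entries from the listing above.

```python
import ipaddress

def is_text_sorted(ips: list[str]) -> bool:
    """True if dotted-quad strings are in plain string order, as a text sort leaves them."""
    return all(a <= b for a, b in zip(ips, ips[1:]))

def is_numerically_sorted(ips: list[str]) -> bool:
    """True if the addresses are in ascending numeric order."""
    nums = [int(ipaddress.IPv4Address(ip)) for ip in ips]
    return all(a <= b for a, b in zip(nums, nums[1:]))

sample = ["207.204.174.208", "207.204.174.21", "207.204.174.216", "207.204.174.22"]
print(is_text_sorted(sample), is_numerically_sorted(sample))  # True False
```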
So we wrote a Perl program to dump out the known structures and label the unknown parts. The next step was simply to stare at the output and look for patterns. We saw that the six-byte and seven-byte records often occurred in blocks of lots of the same kind all together. The unknown part often seemed to consist of the byte 0x0B followed by a blocking mask and eight bytes of garbage. We guessed that that might be a third record type, so we added it to the dumper program, and noticed that the remaining unknown sequences often seemed to consist of 0x0F, a blocking mask, and then twelve bytes of garbage. From this we inferred a general pattern: a length byte (always 3 plus a multiple of 4), a blocking mask, and then some amount of garbage, always a multiple of four bytes. Between this and the six-byte IP/mask pattern, almost all the contents of table 1 fit some kind of structure. But there were still a bunch of zero bytes hanging around. A reasonable guess was that these signalled some kind of "end of structure" condition. It only took a little more intuition to realise that of the "length byte" records and the "IP address" records, one logically went inside the other. Unfortunately, we guessed that the "IP address" records went inside the "length byte" records, and that confused us for quite a while. 
Here's part of the output from our dumping program at this stage:

07 0E 00 0F 25 6B BF
07 0E 00 C8 87 B1 C1
(0501)(0800)(0800)(0000)
0B 02 00 B9 53 9A 71 6A BE 88 54
0B 00 08 B9 53 9A 71 3D 5F E2 F4
0B 00 08 B9 53 9A 71 38 16 1A 41
0B 08 00 B9 53 9A 71 07 B3 CA 02
(000E)(0000)
07 08 00 2F 31 2A 45
(000E)(000E)(0000)
07 0E 00 37 71 0F 71
(000E)(000E)(0008)(0000)
0B 01 00 88 B4 92 0E A6 53 2E 7F
(000E)(0000)
07 98 04 08 B0 DD FB
07 08 00 0F E8 F5 82
(0000)
07 09 00 4F DE 86 ED
(0000)
07 0E 00 79 1F 36 41
07 0E 00 63 C8 51 C4
(0000)
07 02 00 0A E2 34 93
(000E)(0000)
07 08 00 31 2D E5 BA
(000A)(000E)(0800)(0020)(0000)

In this dump, the four-digit numbers in parentheses are abbreviations for "IP address" records, showing only the blocking mask part. We had already figured out, although it's a break with the tradition set elsewhere in the file, that in the six-byte IP address records, the blocking mask comes at the end instead of the start. Not shown in this dump is the enormous variability in the number of IP addresses apparently associated with each "length byte record"; some had dozens, many had none at all. Also, although it looks okay in this fragment, there's a critical problem of how to recognize which records are which. The dumping program would guess what looked like a plausible IP address, but it sometimes guessed wrong and produced junk until it happened to randomly re-synchronize. It appeared that IP records with a blocking mask of 0x0000 helped signal "OK, length byte records coming now", and a length byte of 0x00 (not shown here) signalled the start of a list of IP address records, but these things raised problems because it appeared that in a list of IP addresses, there would always be one more address than there were blocking masks. Where would the blocking mask for the last IP address come from? Late one night, under the influence of a couple of bowls of MSG-saturated Korean instant noodles ("kimchee" flavour), we realised what we should have seen all along.
The "IP address" records are actually the major records, and the other records go inside them, as children of a parent IP address. This makes more logical sense, given the purpose of the file; the package blocks either an entire IP address, or one or more subsections of an IP address. Then the rest of the structure fell out easily. The basic record contains an IP address and a blocking mask. If the blocking mask is nonzero, it applies to that entire IP address. If the blocking mask is zero, then there are a number of subrecords, each consisting of a length byte, a blocking mask, and one or more four-byte unknown fields. A length byte of 0x00 terminates the list of subrecords and signals a new IP address.

Now, what about those subrecords? Well, they obviously represent some kind of subdivision of an IP address - like, for instance, a directory full of Web pages. Here's an entry from table 1, decoded by a more sophisticated Perl program that also incorporated reverse lookups of the IP addresses:

207.34.139.253 (pii300.bc1.com):
000E D2A152F4 23AC865E
0002 D2A152F4 9ECA24AB
000E D2A152F4 4337DDA1
001E D2A152F4 F1909EA3
000E D2A152F4 8532C8E2

This particular entry stood out partly because bc1.com is an ISP local to one of us. We have friends with pages on that system (although not, as far as we could tell, at the particular URLs blocked by Cyber Patrol). It also stood out because all the subrecords start with the same four-byte sequence. That's a pattern that appears in lots of other entries, too; there will often be a site where several subrecords start with the same four-byte sequence. Here's a good example (it's long, so we've left out part):

158.43.192.14 (twister.dial.pipex.net):
[...]
000E 86AC9240
000E 4603712B
0002 D7E769CA
001E 0B01848F
000E 8A1266F1
000E 6DA218B8 957FF449 607AB5ED
000E 6DA218B8 957FF449 E90B0308
000E 6DA218B8 957FF449 D5D0798C
0002 6DA218B8 6A96D698 5F78E699
000E 6DA218B8 6A96D698 CCA4ED77
000E 6DA218B8 118AA2D3 5B69B41C
000E 6DA218B8 3CEC7FA9 48E41B10
000E 6DA218B8 3CEC7FA9 09ED716A
001E 6DA218B8 9B826D61 9BEC198D
000E 6DA218B8 9B826D61 8EF51A8C
000E 6DA218B8 1A7E65EE 8E16AE15

Notice how the four-byte values seem to be grouped together in a hierarchical structure. Just like directories... It seemed a reasonable guess that in fact, that's what they were. If they wanted to block a URL like http://www.foo.com/bar/baz/, maybe they'd do it by creating a record with the IP address of www.foo.com, and a subrecord with some representation of the strings "bar" and "baz".

We said "some representation of the strings". What, exactly, does that mean? Well, it would be quite reasonable to suppose that these four-byte fields are hashes, similar in nature to the password hashes. They could feed each URL component into a hash function, store only the hashes, and then have enhanced security as well as various efficiency advantages. We figured out the exact nature of the hash function with the aid of the bc1.com entry. As you can see above, every subrecord for that server starts with the hash value 0xD2A152F4. If you look on the corresponding Web site, you find that it's an ISP's server for user home pages, all of which are stored in a "users" subdirectory. And it just so happens that in the nonstandard CRC32 variant that was used as half of the HQ password hash, the hash of the string "users" is 0xD2A152F4. Problem solved. We've designated this structure TNotURLEntry. Earlier in the essay (part 1) we explain the cryptanalysis of CRC32 in considerable detail, and we show how to construct, in negligible time, an input that will generate any output of our choice.
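Putting the pieces together, the TNotURLEntry layout (IP address, mask, and, when the mask is zero, length-prefixed subrecords ending in a 0x00 byte) can be walked with a short parser. This is a minimal Python sketch, not the authors' Perl dumper or cndecode.c. It assumes decrypted table data with the "SD"/"ED" delimiters already stripped, and it keeps each hash as raw hex because the on-disk byte order of the hash values is not established here; `parse_url_table`, `URLEntry`, and `SubRecord` are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass, field

@dataclass
class SubRecord:
    mask: int           # category blocking mask, stored little-endian
    hashes: list[str]   # one 4-byte URL-component hash per path level, as hex

@dataclass
class URLEntry:
    ip: str             # stored big-endian, like any IP address
    mask: int           # nonzero: whole address blocked; zero: subrecords follow
    subrecords: list[SubRecord] = field(default_factory=list)

def parse_url_table(data: bytes) -> list[URLEntry]:
    """Walk a decrypted URL table (SD/ED delimiters already stripped)."""
    entries, pos = [], 0
    while pos + 6 <= len(data):
        entry = URLEntry(
            ip=str(ipaddress.IPv4Address(data[pos:pos + 4])),
            mask=int.from_bytes(data[pos + 4:pos + 6], "little"),
        )
        pos += 6
        if entry.mask == 0:
            while (size := data[pos]) != 0:        # a 0x00 size byte ends the list
                if size < 7 or (size - 3) % 4:     # expect 3 + 4k bytes, k >= 1
                    raise ValueError(f"bad subrecord size {size:#04x} at {pos:#x}")
                body = data[pos + 3:pos + size]
                entry.subrecords.append(SubRecord(
                    mask=int.from_bytes(data[pos + 1:pos + 3], "little"),
                    hashes=[body[i:i + 4].hex() for i in range(0, len(body), 4)],
                ))
                pos += size
            pos += 1                               # skip the terminator
        entries.append(entry)
    return entries
```

Fed the table-1 dump shown earlier from offset 0x02 up to 0xFC (skipping the leading "SD" and the truncated final record), this sketch yields 22 entries whose addresses come out in the same "text IP" order described above.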
As with the passwords, Cyber Patrol doesn't use any salt for its URL hashes, so we can recognize where there are duplicate directory names even without reversing the hashes, and get extra value for each hash we reverse because the same reversal will be valid for all other occurrences of that hash. Unfortunately, there is what might be called an "information theoretic" problem with reversing these hashes. There are many possible directory names that could generate the same CRC. We can never be absolutely sure which of several equivalent (same CRC) URLs was actually meant to be blocked. In the case of the HQ password, we could use the other half of the hash output to recognize which one was correct, but here, that doesn't work. In a perverse way, shortening the hash has actually increased its security. But one good thing for us as attackers is that of the many possible strings, only a few will be meaningful. Given the choice between "sex" and "dkbgl~3.a7df", few would argue with our choice of "sex". For the small number of hashes which are hashes of very short strings, we can guess that the short strings are really correct - there are so few possible strings of five or fewer characters that they're almost certainly right. But for most hash values, the CRC32 reversal isn't really very helpful. For any given hash it generates a long list of possibilities, most of which are garbage. Instead of sorting through them, we fell back on the old reliable dictionary attack. We took a list of words and hashed them all, and then started modifying them by tacking tildes onto the start (to make them look like user home directories), adding letters to the start and end, adding ".htm" and ".html" to the end, and so on. The source file "cndecode.c" implements this attack on the cyber.not file, as well as incorporating decryption code, some prettier output formatting, and (for systems where this works) reverse DNS lookups.
It uses a hash table, and remembers the reversal of each hash for use on future occurrences of that hash, in an effort to be as efficient as is reasonable, although the prime emphasis was on expediency in programming over squeezing out the last CPU cycles. As a last resort, if it can't find a hash in the dictionary, the cndecode program goes through all the possible reverse-CRC values up to a configurable limit, assigning scores to them based on how plausible they seem, and then chooses the best. That takes a relatively long time (a significant fraction of a second) per hash, and it doesn't really work very well, but it does catch a few that aren't caught by the dictionary attack. Here's a sample of the output:

************************************************************************
www1.iastate.edu = 129.186.1.22
0006 http://129.186.1.21/.wmdnl/
000E http://129.186.1.21/~blak/
0008 http://129.186.1.21/~cwhipple/
0820 http://129.186.1.21/~ejackson/
0010 http://129.186.1.21/~ipdpfid/
0001 http://129.186.1.21/2kihan/
000E http://129.186.1.21/~omega/
0008 http://129.186.1.21/~roymeo/
0800 http://129.186.1.21/s(ettk/
0001 http://129.186.1.21/~thinker/

0001: Violence / Profanity
0006: Partial Nudity, Full Nudity
0008: Sexual Acts / Text
000E: Partial Nudity, Full Nudity, Sexual Acts / Text
0010: Gross Depictions / Text
0800: Alcohol & Tobacco
0820: Intolerance, Alcohol & Tobacco
************************************************************************

As this shows, URLs tend to be sorted within a given IP address. The ones that aren't in sorted order are probably ones for which the reverse-CRC didn't guess the right reversal. A more sophisticated version might attempt to detect the sorted order, and force the reverse-CRC to choose a reversal which would fit into the sorted order, but the amount of work involved would probably be more than it's worth.
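The core of any reverse-CRC search is that four appended bytes can force any 32-bit output. The sketch below shows the well-known table-driven reversal for the standard CRC-32 used by zlib. That is an assumption: Cyber Patrol's nonstandard variant (described in part 1) is not reproduced here, so this illustrates the technique rather than cndecode's code, and `forge_suffix` is a hypothetical name.

```python
import zlib

def _crc_table() -> list[int]:
    """Standard reflected CRC-32 table (polynomial 0xEDB88320), as used by zlib."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = _crc_table()
# The top byte of each table entry is unique, so it identifies the table index.
INDEX_BY_TOP_BYTE = {entry >> 24: i for i, entry in enumerate(TABLE)}

def forge_suffix(prefix: bytes, target: int) -> bytes:
    """Return 4 bytes that make zlib.crc32(prefix + suffix) == target."""
    # Walk backwards from the desired final register to find the four table
    # indices that the last four update steps must use.
    reg = target ^ 0xFFFFFFFF
    indices = []
    for _ in range(4):
        i = INDEX_BY_TOP_BYTE[reg >> 24]
        indices.append(i)
        reg = ((reg ^ TABLE[i]) << 8) & 0xFFFFFFFF
    indices.reverse()
    # Walk forwards from the real register after `prefix`, choosing each byte
    # so that the update step lands on the required index.
    reg = zlib.crc32(prefix) ^ 0xFFFFFFFF
    suffix = bytearray()
    for i in indices:
        suffix.append((reg ^ i) & 0xFF)
        reg = (reg >> 8) ^ TABLE[i]
    return bytes(suffix)
```

Because this works for every prefix, each hash value has a preimage starting with any string you like; that is why a raw reversal produces a long list of mostly garbage candidates, and why the candidates have to be scored for plausibility.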
This entry also shows something else we haven't talked about yet - "alias" IP addresses, which are the apparent purpose of the one remaining table in cyber.not. The structure is given as TNotIPEntry in the structure tables (section 6.1). These aliases are just that. Each entry consists of a root IP and one or more aliases to that one. The root IP corresponds to entries in the URL table, and any resource banned under the root IP will also be banned under its aliases. These aliases may or may not resolve to the same machine; the assumption here is that these IPs are serving the same pages.

Let's talk briefly about hash collisions. The chance that any two randomly chosen URL components will happen to have the same hash is one in 2**32, which is not very likely. This is true even with the uneven distribution of URLs, because CRC32 is a reasonably good hash just as a hash, for all its cryptographic weakness. So at first glance, it doesn't seem like there'll be a big problem of different URLs having the same hash. But the birthday paradox comes into play, too. With 2**32 possible hash values, there starts to be a serious chance of collisions as soon as the number of hashes gets past 2**16, which is 65536. It's certainly easy to imagine that a large ISP could have more than that many user home pages at the same location in their URL tree. Then two or more different sites would have the same URL as far as Cyber Patrol is concerned, and any block on one such page would hit the others. Given the current size of the Net and the size of cyber.not, there probably aren't any real examples of this kind of problem in the cyber.not file. But there is very little safety margin. A 64-bit hash would remove any suggestion of collision risks, at the cost of a considerable increase in filesize. Of course, using a 64-bit hash would improve our ability to attack the cyber.not file too, by reducing the number of possible URLs for each hash value.
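The birthday-paradox figure can be made concrete with the standard approximation p ≈ 1 - exp(-n(n-1)/2N), where N = 2**bits. This sketch assumes the hashes behave like uniformly random values (the text argues CRC32 is close enough for this purpose); n stands for the number of names hashed at the same place in the URL tree, for example home directories under one ISP server. `collision_probability` is a hypothetical name.

```python
import math

def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday-bound estimate of P(at least one collision) among n random hashes.

    p ~= 1 - exp(-n(n-1) / (2N)) with N = 2**bits; expm1 keeps tiny p accurate.
    """
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))

print(round(collision_probability(2 ** 16), 2))        # 0.39
print(collision_probability(2 ** 16, bits=64) < 1e-9)  # True
```

So at 65536 names the chance of at least one collision is already about 39% with a 32-bit hash, while a 64-bit hash pushes it down to roughly one in ten billion.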
Remember how having the second half of the HQ password hash made it so much easier to unambiguously reverse the hash? Information theory makes this tradeoff unavoidable: the fewer possible collisions, the easier and more unambiguous dictionary attacks will necessarily become. Given that bytes in cyber.not are somewhat expensive (because the file has to be transferred to all the users in updates all the time), the choice of a 32-bit hash is probably reasonable, even though it has some small risk of creating false blocks. A more practical security measure would be to salt the URL hash. In the section on the HQ password we described how salting that hash would make dictionary attacks on the password much harder. With the URL hashes that becomes all the more significant, because with the URL hashes we aren't attacking just one hash value. We're attacking a few tens of thousands of hash values all at once. So anywhere we can recognize that two hashes are the same, that's a win, and any time we hash a dictionary word, we can easily check it against all the hash values in cyber.not all at once. If every URL in cyber.not had been hashed with a different salt value, then we would have to hash an entire dictionary for every URL instead of just hashing one dictionary for the entire file. That would raise our time for a dictionary attack from a few CPU minutes to a few CPU months - we could still do it, possibly by recruiting a network of volunteers to compute cooperatively, but not as easily as the present attacks. They wouldn't even need to make cyber.not any bigger to get the benefit of salted hashing - they could just use the offset of each URL in the cyber.not file as its salt value. Salt doesn't have to be random or secret, it just has to be different for each hash. 
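The batch advantage of unsalted hashes looks like this in code: each candidate string is hashed once and checked against every target in the file with a single set lookup. This is a minimal Python sketch under stated assumptions: `zlib.crc32` stands in for Cyber Patrol's nonstandard CRC32 variant, and the mangling rules in `candidates` are illustrative, not cndecode's actual list. Both function names are hypothetical.

```python
import string
import zlib
from typing import Callable, Iterable, Iterator

def candidates(word: str) -> Iterator[str]:
    """Illustrative mangling rules: the bare word, a '~' home-directory prefix,
    one extra letter at either end, and .htm/.html suffixes."""
    yield word
    yield "~" + word
    for c in string.ascii_lowercase:
        yield c + word
        yield word + c
    for ext in (".htm", ".html"):
        yield word + ext

def dictionary_attack(targets: set[int], words: Iterable[str],
                      hash_fn: Callable[[bytes], int] = zlib.crc32) -> dict[int, str]:
    """Hash every candidate once and look it up against all targets at once.

    With unsalted hashes this single pass covers every entry in the file; with a
    per-entry salt the same loop would have to be repeated for each entry.
    """
    found: dict[int, str] = {}
    for word in words:
        for cand in candidates(word):
            h = hash_fn(cand.encode("utf-8"))
            if h in targets:
                found.setdefault(h, cand)   # keep the first match found
    return found
```

The cost is one hash per candidate regardless of how many target hashes there are, which is exactly the property that per-URL salting would take away.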
They would also have to upgrade the hash function to one that isn't linear like CRC32; with CRC32, we could simply figure out the hash of the salt, XOR it out, and then have an unsalted hash to attack normally. A much more secure approach, which wouldn't make cyber.not any bigger, would be to take the offset and the URL, hash them together with SHA1, and then take the bottom 32 bits of the result. But even that wouldn't raise the difficulty of attack above the level of competent amateurs, and indeed, there is no way to make this kind of hashing scheme any more secure. There just aren't enough possible URLs on the Web; it's too easy for attackers to guess all possible URLs and test them to see which ones would be blocked. Unix sysadmins accept the fact that attackers can test passwords offline, and attempt to educate their users to choose hard-to-guess passwords, but censorware companies cannot ask all objectionable Web sites to choose hard-to-guess URLs. So they ultimately cannot defend themselves against this form of attack. With salt in the hashes, though, they could make it a lot harder for us. Next, the cyber.yes file contains "positive option" URLs; when the software is configured to its strictest setting, only these URLs will be permitted. There is also a list of newsgroups at the end that seems to be in identical format to the one in cyber.not. A quick scan of the decrypted file with a text lister showed that it's full of fragments of ASCII text, like this (dump generated, amusingly enough, by Richard E. 
Morris's good old DOS-based HEXEDIT program):

000880: 0B 01 00 7E 63 68 69 6E 6F 6F 6B 00 81 80 3D 11 |...~chinook...=.|
000890: 00 00 06 08 00 77 73 69 00 81 80 44 0A 00 00 15 |.....wsi...D....|
0008A0: 10 00 7E 77 61 6E 69 67 61 72 2F 73 70 61 63 65 |..~wanigar/space|
0008B0: 6C 69 6E 6B 00 81 0D 0A 64 00 00 10 09 00 7E 74 |link....d.....~t|
0008C0: 68 67 72 69 65 73 2F 64 69 73 63 00 81 89 C2 89 |hgries/disc.....|
0008D0: 00 02 81 89 21 25 02 40 81 0F 02 5A 00 00 07 40 |....!%.@...Z...@|
0008E0: 00 6F 75 70 64 19 48 00 7E 6E 77 73 2F 73 70 6F |.oupd.H.~nws/spo|
0008F0: 74 74 65 72 67 75 69 64 65 2E 68 74 6D 6C 00 81 |tterguide.html..|
000900: A4 28 6C 10 02 81 A4 28 DF 80 00 81 A4 28 E1 10 |.(l....(.....(..|
000910: 82 81 B1 0C 0C 00 00 0F 40 02 70 65 6F 70 6C 65 |........@.people|

These look like URL fragments, but they also look sort of haphazard. In fact we theorized at one point that they might be stray garbage from memory allocation calls. However, they do have a purpose, and once we had the format of the cyber.not file, the cyber.yes file became easy to figure out. The same correlation-counting program that we ran on cyber.not showed similar results on cyber.yes, with strong correlation at a distance of six characters, but unlike cyber.not, no sharp peak at seven characters. This suggested that the format for the main table in cyber.yes would be very similar to that of cyber.not. Examination of the hex dump showed similar stretches of six-byte repeats with a field incrementing in big endian. A little trial and error revealed that the format is essentially identical: records with IP addresses and two-byte "mask-like" fields. We say mask-like because it's not clear that they serve the same function as the mask fields in cyber.not. When the mask-like field is zero, there follows some number of variable-length URL records, terminated by a zero byte. There are two significant differences in the subrecord format.
First, the URL is in plain text instead of being hashed. As a result, the length byte is no longer restricted to 3 plus a multiple of 4. Second, the "mask" field appears to have a different significance. Here is a sample record from cyber.yes:

202.231.128.32:
0802 "home/dbec1"
5A8A "home/kazoo"
5A8A "home/kiboc"
5A8A "home/kimin"
5A8A "home/sanyohs"
7ACA "home/terada"
7AEA "home/tomoy"
7AEA "home/tomoyuki"
7BFA "home/ueno"
7BFA "home/warp"

The hexadecimal column is the field that in cyber.not would be the blocking mask. Here, it's not clear what it is. It could be some kind of anti-blocking mask, of categories NOT to block, but then it's surprising that it would be in sorted order (a pattern that persists in other records too), especially when the URLs are also in alphabetical sorted order. Other possibilities for this field include some kind of time stamp, a serial number, an index pointer, an authentication token or hash, or random memory garbage. The "mask-like" fields on IP addresses similarly show little apparent design, except that (just as in cyber.not) a zero value indicates the presence of URL subrecords. The newsgroup list has mask-like fields too, and there's no immediately obvious meaning to the data in them.

At this point we should note the overall file structure of cyber.yes. Unlike cyber.not, which had an elaborate header, the header on cyber.yes consists of just three bytes: one version number (or possibly encryption key fixup), and two bytes giving the length of the URL table. We discovered this by working backwards from the URL table until we found that all the bytes in the file except the first three made sense as part of the URL table. The newsgroup list follows immediately after the URL table and continues until the end of the file, in the same format as the cyber.not newsgroup list except with unknown data where the blocking mask would go.
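Since the URL table of cyber.yes follows the same record walk as cyber.not with plaintext subrecords, a parser needs only small changes. This is a minimal Python sketch over the URL-table bytes alone: it does not handle the 3-byte header or the trailing newsgroup list, it keeps the 2-byte fields as raw integers because their meaning is unknown, and it decodes the text as Latin-1, which is an assumption since the encoding is not documented. `parse_yes_table` is a hypothetical name.

```python
import ipaddress

def parse_yes_table(data: bytes) -> list[tuple[str, int, list[tuple[int, str]]]]:
    """Walk decrypted TYesURLEntry records.

    Returns (ip, field, subrecords) tuples, where field is the 2-byte value of
    unknown meaning and subrecords is a list of (field, url_text) pairs.
    """
    entries, pos = [], 0
    while pos + 6 <= len(data):
        ip = str(ipaddress.IPv4Address(data[pos:pos + 4]))
        value = int.from_bytes(data[pos + 4:pos + 6], "little")
        pos += 6
        subs = []
        if value == 0:                       # zero: plaintext URL subrecords follow
            while (size := data[pos]) != 0:  # a zero size byte ends the list
                sub_value = int.from_bytes(data[pos + 1:pos + 3], "little")
                text = data[pos + 3:pos + size].decode("latin-1")
                subs.append((sub_value, text))
                pos += size
            pos += 1                         # skip the terminator
        entries.append((ip, value, subs))
    return entries
```

Run on the HEXEDIT excerpt from offset 0x88C through 0x910, this sketch yields nine entries, among them 129.15.2.90 with the subrecords "oupd" and "~nws/spotterguide.html", again in text-sorted address order.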
Unlike the tables in cyber.not, both tables in cyber.yes are just bare data, with no "SD" and "ED" delimiters. This file structure is interesting because it seems stripped down or simplified from the structure of cyber.not. It would be reasonable to guess that the cyber.yes format was a quick hack retrofitted onto the product after the more carefully designed cyber.not table. It's also possible that the cyber.not format proved too complicated and cyber.yes is an example of a "leaner and meaner" file format, still keeping to the same design principles as cyber.not and likely re-using a lot of code originally written for cyber.not. Following are the relevant structure tables. This concludes the section on reversing the file formats.

6.1 Structure tables

TNotHeader
Offset  Size  Description
0x0000  2     Filetype? (0x00FC)
0x0002  2     Header size (0x002A)
0x0004  2     Header id ('CH' or 'HH')
0x0006  2     unknown ( 00 00 )
0x0008  2     unknown ( 00 00 )
0x000A  2     unknown ( 03 01 )
0x000C  2     Count of TNotHeaderEntries (0x0003)

Immediately followed by one or more of these:

TNotHeaderEntry
Offset  Size  Description
0x0000  2     Table type ( 4x 00 )
0x0002  4     Absolute offset
0x0006  4     Size (in bytes)

The problem here is the Table Type field, which we have too little data to fill in with any certainty. We can build the following table from the files we have analysed so far, based on the types that have occurred and the type of data they pointed to.

TNotTableType
Value   Binary     Description
0x0041  0100 0001  Points to TNotIPEntries in cyber.not
0x0047  0100 0111  Points to TNotNewsEntries in hotlist.not
0x0049  0100 1001  Points to TNotURLEntries in cyber.not and hotlist.not
0x004E  0100 1110  Points to TNotNewsEntries in cyber.not and hotlist.not
0x004F  0100 1111  Points to TNotURLEntries in hotlist.not

We cannot draw detailed conclusions from so little data.
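The TNotHeader and TNotHeaderEntry tables translate directly into struct format strings. This is a minimal Python sketch that assumes all multi-byte header fields are little-endian, as the PC-native values quoted above suggest; `parse_not_header` and `TableEntry` are hypothetical names, and the offsets and sizes used in the check below are invented for illustration.

```python
import struct
from typing import NamedTuple

class TableEntry(NamedTuple):
    table_type: int   # e.g. 0x0049 for the URL table in cyber.not
    offset: int       # absolute file offset of the table
    size: int         # table size in bytes

def parse_not_header(data: bytes) -> tuple[bytes, int, list[TableEntry]]:
    """Parse a decrypted TNotHeader, assuming little-endian fields throughout.

    Returns (header id such as b'CH', header size field, table entries).
    """
    header_size, header_id = struct.unpack_from("<H2s", data, 0x02)
    (count,) = struct.unpack_from("<H", data, 0x0C)
    entries = [TableEntry(*struct.unpack_from("<HII", data, 0x0E + 10 * k))
               for k in range(count)]
    return header_id, header_size, entries
```

The numbers in the tables are self-consistent: a 14-byte fixed part plus three 10-byte entries is 44 bytes, which equals 2 + 0x2A. That is consistent with reading 0x2A as the size of everything from offset 0x02 onward.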
TNotIPEntry
Offset  Size  Description
0x0000  4     IP
0x0004  1     Count of additional IP addresses (typically 1-23)
0x0005  *     IP x count

TNotURLEntry
Offset  Size  Description
0x0000  4     IP Address
0x0004  2     Category blocking mask, or 0x0000 to indicate that subrecords follow
Subrecord:
0x0000  1     Subrecord size
0x0001  2     Category blocking mask
0x0003  *     URL hashes (four bytes per path component)

In the case where there are one or more subrecords, the list is terminated by a zero byte.

TNotNewsEntry
Offset  Size  Description
0x0000  1     Record size
0x0001  2     Category blocking mask
0x0003  *     Newsgroup string

Now, for the cyber.yes:

TYesHeader
Offset  Size  Description
0x0000  1     Filetype? (0xFB)
0x0001  2     Count of TYesURLEntries

This is the record type of the cyber.yes URL table:

TYesURLEntry
Offset  Size  Description
0x0000  4     IP Address
0x0004  2     Unknown, or 0x0000 to indicate that subrecords follow
Subrecord:
0x0000  1     Subrecord size
0x0001  2     Unknown
0x0003  *     URL as plaintext

As with TNotURLEntry, when there are one or more subrecords, the list is terminated by a zero byte.

7 Observations

With all these technical things resolved, let's look at the data itself.
First, a table of statistics pulled from two different CyberNOT files:

Cyber Patrol URL Database Statistics

Bit  Category                            1999-04-29  2000-02-20  Change
0    Violence / Profanity                      1201        1407    +206 (17%)
1    Partial Nudity                           46538       72236  +25698 (55%)
2    Full Nudity                              45013       70248  +25235 (56%)
3    Sexual Acts / Text                       47769       74009  +26240 (55%)
4    Gross Depictions / Text                   1414        2273    +859 (61%)
5    Intolerance                                259         337     +78 (30%)
6    Satanic or Cult                            129         197     +68 (53%)
7    Drugs / Drug Culture                       197         306    +109 (55%)
8    Militant / Extremist                       187         204     +17 (9%)
9    Sex Education                              201         270     +69 (34%)
A    Questionable / Illegal & Gambling         1347        1928    +581 (43%)
B    Alcohol & Tobacco                          783        1155    +372 (48%)
C    Reserved 4                                  48           3     -45 (-94%)
D    Reserved 3                                   0           0       0 (0%)
E    Reserved 2                                   0           0       0 (0%)
F    Reserved 1                                   0           0       0 (0%)
     Total URL masks                          52315       79899  +27584 (53%)

We can see that of the roughly 80000 entries, about 90% fall into one or more of the pornography categories. The Learning Company have a page on their site describing their criteria for categorizing entries. At the end it states: "Note: Web sites which post "Adult Only" warning banners advising that minors are not allowed to access material on the site are automatically added to the CyberNOT list in their appropriate category." This may give the impression that sites are automagically added as soon as they appear on the web, which certainly isn't the case. They are most probably using a web spider to pick these up. These spidered sites probably make up the bulk of the URLs flagged in all of categories 1, 2 and 3, which is the dominant set of flags by far. By monitoring these statistics for a longer period of time one could deduce how effective the spider is in finding new sites. The oldest cyber.not we have available is dated 1999-04-29. By comparison it contains only 52315 entries, but the ratio of "porn" rated sites is the same, about 89%, with 46538, 45013 and 47769 entries flagged for categories one, two and three respectively.
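The Bit column above gives a direct mapping from mask bits to category names, so per-category counts like these can be produced by decoding each mask and tallying. This is a minimal Python sketch of one way to do that, not necessarily how the authors built the table; `CATEGORIES`, `decode_mask`, and `tally` are hypothetical names.

```python
from collections import Counter

# Bit order taken from the statistics table: bit 0 is the least significant.
CATEGORIES = [
    "Violence / Profanity", "Partial Nudity", "Full Nudity", "Sexual Acts / Text",
    "Gross Depictions / Text", "Intolerance", "Satanic or Cult", "Drugs / Drug Culture",
    "Militant / Extremist", "Sex Education", "Questionable / Illegal & Gambling",
    "Alcohol & Tobacco", "Reserved 4", "Reserved 3", "Reserved 2", "Reserved 1",
]

def decode_mask(mask: int) -> list[str]:
    """Names of the categories whose bits are set in a 16-bit blocking mask."""
    return [name for bit, name in enumerate(CATEGORIES) if mask >> bit & 1]

def tally(masks: list[int]) -> Counter:
    """Per-category counts over a list of masks (one mask may hit several categories)."""
    counts: Counter = Counter()
    for mask in masks:
        counts.update(decode_mask(mask))
    return counts
```

For example, decode_mask(0x0820) gives Intolerance and Alcohol & Tobacco, matching the legend in the cndecode sample output, and 0x0FFF sets all twelve non-reserved categories.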
Most of the other categories are up by anywhere from a handful to several hundred entries, but the porn categories, suspected to consist mostly of spidered sites, are up by about 25000 entries each over the period (about 42 weeks). There is a function in CP where a user can fill in a form to report new URLs for consideration for inclusion in the CyberNOT. It would be interesting to know how many of the added URLs come in this way. It would be possible for users to team up and exchange URLs on their own, bypassing The Learning Company, which charges for these CyberNOT updates. By patching the CP executable, the report form could be made to post to another server, which could also host updated CyberNOT lists. It would take a little work to set up, but not too much. The most difficult aspect would probably be reaching active Cyber Patrol users and convincing them that this would be worthwhile, especially since it would need a certain amount of momentum to be worthwhile at all. Given this threat, it's logical to assume that The Learning Company and other censorware vendors will use even more security-through-obscurity in future products, to avoid having one of their sources of income bypassed.

Near the start of this essay we mentioned the "reserved" blocking categories. Cyber Patrol, in addition to the twelve documented blocking categories, has an additional four (labelled "Reserved 1" through "Reserved 4") which are greyed out. Reserved 3 and Reserved 4 are selected by default, and because they are greyed out they cannot be disabled, even by the administrator. Any sites placed in one of those two categories will be blocked no matter what. We found three examples on the current CyberNOT list. All three are in Japanese. Each was blocked in Reserved 4 and no other category; we could not find any examples of blocks in the other reserved categories.

* http://133.205.62.133/~coga/, which appears to say something like "This domain has moved".
* http://202.26.1.170/~mcqueen/, which is mostly in Japanese but includes the English text "The page you requested was not found".
* Tsutomu Notani's home page, which based on the pictures appears to include some content about horse racing, and thus (presumably) gambling. No other blockable content is immediately apparent.

There are a few entries in the CyberNOT list that are blocked under all non-reserved categories. For instance, the anti-censorware site Peacefire is listed as containing "Violence / Profanity, Partial Nudity, Full Nudity, Sexual Acts / Text, Gross Depictions / Text, Intolerance, Satanic or Cult, Drugs / Drug Culture, Militant / Extremist, Sex Education, Questionable / Illegal & Gambling, Alcohol & Tobacco". That's not such a surprise; blocking Peacefire has become a tradition among censorware manufacturers. The other sites blocked under all categories seem to be translation and anonymizer services: any site where you can type in a URL and it will present you with a copy of that page. That's probably no big surprise either, because such sites can be used to circumvent censorware. So it may be reasonable that sites like anonymizer.com should be blocked under all categories; potentially, they do make available the entire range of human thought. Not all these blocks are carefully applied, however; the "STOP KITTY PORN" page (which features a picture of a very bored-looking house cat) is blocked under all categories, apparently just for containing a link to anonymizer.com. Here, as elsewhere, the blocking list doesn't seem to be updated very frequently. The server at 207.55.200.2 (whose reverse DNS resolves to "www.live4u.com", although that name doesn't resolve in the forward direction) seems to be an ordinary portal site with no obvious translation service, but it's blocked for everything except sex education.

Of course, the most interesting things we could find on the blocking list would be sites about political or social issues.
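The masks mentioned above can be spelled out with the bit numbers from the statistics table: "all non-reserved categories" is 0x0FFF, "everything except sex education" is 0x0DFF, and Reserved 4 and Reserved 3 are bits C and D (0x3000 together). The sketch below assumes, without our having confirmed it in the executable, that an entry is blocked when its mask shares at least one bit with the administrator's selection, with Reserved 3 and 4 always forced on. The is_blocked function and the macro names are hypothetical.

```c
#include <stdint.h>

/* Bit positions from the statistics table. */
#define MASK_SEX_EDUCATION   (1u << 0x9)
#define MASK_ALL_DOCUMENTED  0x0FFFu                       /* bits 0-B */
#define MASK_RESERVED_FORCED ((1u << 0xC) | (1u << 0xD))   /* Reserved 4, 3 */

/* Assumed rule, not verified against Cyber Patrol: a site is blocked
   when its mask shares at least one bit with the selected categories,
   and Reserved 3/4 are always part of that selection. */
int is_blocked(uint16_t site_mask, uint16_t admin_selection)
{
    uint16_t effective = (uint16_t)(admin_selection | MASK_RESERVED_FORCED);

    return (site_mask & effective) != 0;
}
```

Under this assumption, the three Japanese pages (mask 0x1000) are blocked even when the administrator has deselected every documented category, while Peacefire's 0x0FFF is not.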
Other censorware packages have gotten into a lot of trouble, for instance, by blocking sites like the National Organization for Women, and a great many gay and lesbian sites. The CyberNOT list seems relatively free of that kind of political agenda, which could be a good or a bad thing depending on your point of view. If the software is to be installed in public libraries, it's good that it won't block these politically important sites. Of course, it would be better if it didn't block any sites at all. On the other hand, if you were a parent who considered feminism or homosexuality to be unimaginably horrid subjects, then you might feel ripped off by Cyber Patrol's not blocking the high-profile sites.

Let's take a closer look at the "Intolerance" category. While they do block smaller sites, such as this one on atheism, which we feel is relatively benign, they also block such high-profile sites as www.godhatesfags.com and part of the American Family Association's site, whose views on homosexuality cannot be described as anything but intolerant. The AFA is one organization pushing for the installation of censorware in US libraries. One can only assume they'd prefer one of Cyber Patrol's competitors. Some other sites in this category:

* Matthew R. Galloway's homepage. Contains the word "Voodoo" in a reference to voodoo-cycles.com, and a pretty famous joke file entitled Top 10 Reasons Why Beer Is Better Than Jesus. Number 1 being "If you've devoted your life to Beer, there are groups to help you stop.", BTW.
* Misha Verbitsky's old homepage. Seems perfectly ordinary: some papers, a couple of Usenet archives. Note that this page was frozen several years back, so whatever it was censored for is still there.
* Church of the SubGenius. Banned in every category except sex education. The Church is a spoof of fundamentalist Christianity, consumer culture, and other things.
* joc.mit.edu/cornell/.
  This link is for the archive containing files relevant to:

      The Justice on Campus Project's mission is to preserve free expression and due process rights at universities. Our online archive includes reports on disciplinary charges, speech codes, and censorship on college campuses around the country. The Project was one of 20 plaintiffs in the ACLU's successful challenge of the Communications Decency Act.

  How very intolerant of them to be working for free speech, huh?

How about some examples from the category "Satanic or Cult"?

* Mega's Metal Asylum. Miika "Mega" Kuusinen's page of metal music. Articles, links. Perfectly ordinary. Tagged as militant, too. Well, we all know metal music is the devil's work.
* This site contains nothing but the text "Welcome!". If that's enough to be branded "Satanist", we can expect a rapid growth in bans. If nothing else, this is another example of how the bans grow outdated as time goes by, while The Learning Company doesn't seem to care much.
* webdevils.com, "Experiments with sound", a site which has nothing to do with religion, or lack of it. Guess the hostname was enough in this case.

There is one political issue the CyberNOT list doesn't shy away from: that of nuclear disarmament. All sites relating in any way to war, bombs, explosives, or fireworks, both for and against, seem to be eligible for blocking as "Militant / Extremist". Most are also classed as "Violence / Profanity" and "Questionable / Illegal & Gambling", whether those categories seem to apply or not. For instance:

* The Nuclear Control Institute. From the blocked page:

      Founded in 1981, the Nuclear Control Institute (NCI) is an independent research and advocacy center specializing in problems of nuclear proliferation. Non-partisan and non-profit, we monitor nuclear activities worldwide and pursue strategies to halt the spread and reverse the growth of nuclear arms.
      In particular, we focus on the urgency of eliminating atom-bomb materials---plutonium and highly enriched uranium---from civilian nuclear power and research programs.

  Is that an extremist position?
* A personal site including a lot of different material, apparently blocked for something called "The Nazism Exposed Project". From the blocked page:

      Nazism, fascism and extreme nationalism are today at its highest peak since the destruction of Hitler's dictatorship in 1945. Today, all over the world, fascists and extreme nationalists win millions of votes on their simple racist solutions to very complex problems of the society. In the streets, Nazi boneheads are spreading fear by using murderous violence and terror. These fascist groups blame the cultural and ethnic minorities for the problems in our society. These individuals, and their political leaders, are a threat to our democracy, and to everything that is decent.

  Blocked as "Violence / Profanity, Militant / Extremist, Questionable / Illegal & Gambling".
* Anti-nuclear-bomb articles from the Tri-City Herald newspaper, blocked as "Violence / Profanity, Militant / Extremist, Questionable / Illegal & Gambling".
* One page in this directory (URL hash not fully reversed) on the City of Hiroshima Web site, blocked as "Violence / Profanity, Militant / Extremist, Questionable / Illegal & Gambling".
* Jim Lippard's home page, which contains some anti-Scientology material and a link (not text) to this Salon article about the Littleton shootings, which everyone ought to read.
* Cheesehead Central, a personal home page, which contains a few links relating to fireworks displays and therefore, apparently, qualifies as "Violence / Profanity, Militant / Extremist, Questionable / Illegal & Gambling".
* The former location of the American Airpower Heritage Museum, an apparently legitimate museum of US combat aircraft. Blocked as "Violence / Profanity, Militant / Extremist, Questionable / Illegal & Gambling".
Some sites that may be blockable under a few categories are also blocked under a great many other categories. For instance:

* Teen Babe of the Month; it's a porn site, but it appears to be a perfectly ordinary porn site. Blocked under all categories except sex education.
* http://www.xs4all.net/~stones/, a link (not the actual site itself) pointing at a warez search engine. That would presumably qualify as "Questionable / Illegal", but it's flagged for everything except sex education.
* http://www.danland.engelholm.se/, a personal home page. Some content relating to warez, but nothing else blockworthy is immediately apparent. Blocked for everything except sex education.
* The Marston Family Home Page, with the usual round of pictures of Mom, Dad, the kids, the dog, etc. The entire directory is blocked for "Militant / Extremist, Questionable / Illegal & Gambling", apparently just because of this paragraph in young Prescott's section:

      In school they teach me about this thing called the Constitution but I guess the teachers must have been lying because this new law the Communications Decency Act totally defys [sic] all that the Constitution was. Fight the system, take the power back, WAKE UP!!!!!

  You go, boy.

It is obvious on examining the list that many entries haven't been updated or checked in a long time. Many sites that are blocked now give 404 Not Found errors, or redirect to new locations that are not blocked. Changes to Web sites may also account for some of the inappropriate category labelling. Here are some samples of sites that seem inadequately reviewed:

* An empty page blocked in all categories except sex education, and a 404 Not Found page blocked in all categories including sex education. There are many others like these.
* A student home page at utexas.edu, blocked for "Violence / Profanity, Partial Nudity, Full Nudity, Sexual Acts / Text, Militant / Extremist, Questionable / Illegal & Gambling" content.
  It consists mostly of (clothed) photos of the author's baby son, with no blockable content immediately apparent.
* Another student home page at imsa.edu, blocked as "Violence / Profanity, Militant / Extremist, Questionable / Illegal & Gambling". Consists solely of a link to the author's resume, which is perfectly ordinary.
* A personal home page at world.std.com. The part about his wife is nauseatingly sweet, but doesn't really fit most people's definitions of "Gross Depictions / Text, Militant / Extremist, Questionable / Illegal & Gambling", which is what it's blocked for.
* A sheet-music publisher, blocked as "Violence / Profanity, Militant / Extremist, Questionable / Illegal & Gambling" for no apparent reason.

These are just a few examples of sites that Cyber Patrol is banning, or was. It is not unthinkable that they might lift a few bans after this is published. We've only scratched the surface as far as checking the banned sites goes. Going through even a few hundred takes a lot of time, and with almost 80,000 bans in effect, the work required to check them all would be enormous. We don't have time to do it, but since The Learning Company is making money from the supposed correctness of the list, they ought to be able to find the resources to check it from time to time.

We know they are banning 80,000 or so URLs, but most censorware packages also have a database of words that are not allowed to appear in incoming pages, because that is the only way to come anywhere near effectively blocking new pages on the ever-evolving, ever-growing Internet. Cyber Patrol doesn't do that, and so its IP and URL bans are its only real line of defence. If you can find a site that The Learning Company have not, then there's very little stopping you from browsing it. There is a function that can filter sites based on substrings in the URL itself, but that is it. Cyber Patrol is actually fairly efficient at blocking sites if you don't know how to search effectively.
If you simply search one of the major search engines, you will probably draw a blank, because it's very likely that that is exactly the kind of search The Learning Company uses to seed their web spiders. However, finding a few pages with obscene banners and thumbnail pictures is no big problem. We could locate this one and this one in short order. One somewhat effective method is to search for non-English-language pages, which the spider might not be effective at locating and parsing for automatic inclusion in the CyberNOT. You could, for instance, look for a Swedish site and find www.smygis.com, which is not, as this is written, blocked in any way. If you really want porn, Cyber Patrol might slow you down a little, but it won't cut you off entirely.

7.1 Rogue deinstallation

Apart from checking for "unauthorized" modifications to cyberp.ini, CP's "advanced anti-hacker security" consists of a new %windir%\system\system.drv that checks for the existence of the modules PROGIC, PROGICS and TS. These are represented by the files IC.EXE, ICFIRE.EXE and TS.DLL, all in %windir%. The original system.drv is cleverly hidden away as %windir%\system.386. The modules are loaded in two ways: first, there is a load entry in the win.ini file, and second, there's an entry in the registry at HKCU\Software\Microsoft\Windows\CurrentVersion\Run called "FltProcess", which loads %windir%\system\msinet.exe, which in turn loads the Cyber Patrol modules. After putting the original system.drv back in place of CP's version (which halts the loading of Windows if it doesn't find its modules, and asks you to call their support number), you can safely do away with the registry entry, the load key in win.ini and any of the numerous binaries. Because of the many files CP installs on your system, though, we suggest you use the normal uninstaller instead. Not that it does a very good job of removing its system files, but there you go.
Optionally, if you come across an installation running unregistered, you can use the backdoor password "omed" to uninstall, or simply to gain administrator access.

8 Source and binaries

We have developed a set of software for getting around Cyber Patrol. People oppressed by Cyber Patrol will want to take a look at CPHack, a Win32 binary which will decode the userlist for you, and also let you browse the different banlists. Also available is C source for two command-line programs illustrating the cryptographic attacks on cyber.not (cndecode.c) and the HQ password hash (cph1_rev.c). These programs were written under Linux and are not guaranteed to work anywhere else. A complete package with this essay, the binaries, and various sources and related files is available as cp4break.zip (~360KB).

8.1 CPHack documentation

This tool is not particularly hard to use, but a few comments on its use may be in order. First of all, the author would like to state that this is a hack(1), which is reflected both in the state of the source and in the user interface. The basic functionality is to let you load and browse the information in a Cyber Patrol .not file and/or the user information contained in a cyberp.ini file. Simply select which you want to load using the File menu. Also in the File menu are functions for importing and exporting hosts. Importing hosts reads a text file containing lines of IPs and their corresponding hostnames into the treeviews. Exporting, of course, does the opposite. Next come the functions "Export dictionary", which traverses the treeviews and writes out all words that have been assigned to URL hashes, and "Export unresolved IPs", which does just that; it could be used to distribute the work of doing reverse lookups. The final export function is "Export URL hashes", which exports every hash that has not been assigned a word, the logical inverse of the "Export dictionary" function.
Perhaps the most useful functions are the last two: "Generate report", which outputs an HTML document reflecting the data you have loaded (be sure to check out the "Configuration" tab before doing that), and the somewhat mysterious "Cull dictionary by hash". The latter takes the main dictionary (as defined in the Configuration tab) and creates a new dictionary containing only the words whose hashes appear in a .not file you have loaded. Some explanation is in order: the author originally thought that a lazy dictionary attack would be enough. This lazy approach is what you get if you select one of the attacks available by right-clicking a node. However, it proved quite slow with large dictionaries (15MB or so), as it only looks at one URL at a time. The problem is that CPHack will try, for each node, lots of dictionary words whose hashes don't exist in the database at all. As a quick hack on the hack, this function was implemented, which takes all the hashes in the database and attacks them all at once. The downside is that no record is kept of which nodes the found hashes belong to, so you only get a new optimized dictionary to use in the lazy attack, not an instant update to the treeview. While desirable, that would take too much time and effort to implement correctly at this point. A good implementation would traverse the nodes you have selected, creating an ordered list of unique hashes, each with a list of all its associated nodes attached. When the hash of a word is found in this ordered list, the correct chain of tree nodes could be traversed quickly and the nodes updated to reflect the hit. Until this is fixed, you should cull the dictionary first, then use the output with the lazy attack to "assign" words to all the matching hashes in the database.

The main interface contains the five sections "Users", "Newsgroups", "URL database", "IP Aliasing" and "Configuration".
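Before the rundown of those sections, the "good implementation" of culling described above can be sketched in C roughly as follows. This is not CPHack's code. The hash function is passed in as a placeholder (Cyber Patrol's URL hash is not reproduced here), hashes are assumed to fit in 32 bits, and integer node ids stand in for treeview items; hash_fn, struct hash_entry and attack_all are hypothetical names. The index is sorted once with qsort, and each dictionary word then costs one hash plus one bsearch, instead of being tried against every node.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder for the URL hash; Cyber Patrol's real hash is not
   reproduced here, and 32 bits is an assumed width. */
typedef uint32_t (*hash_fn)(const char *word);

/* One unique target hash and every tree node that carries it.
   Integer node ids stand in for treeview items. */
struct hash_entry {
    uint32_t hash;
    const int *nodes;
    size_t nnodes;
};

static int cmp_entry(const void *a, const void *b)
{
    uint32_t x = ((const struct hash_entry *)a)->hash;
    uint32_t y = ((const struct hash_entry *)b)->hash;

    return (x > y) - (x < y);
}

/* Sort the index once (entries must be unique by hash; this reorders
   the caller's array), then hash each dictionary word once and
   binary-search for it. Every node sharing a matched hash gets the word
   recorded in found_for_node[node id]. Returns the number of dictionary
   words that matched some hash. */
size_t attack_all(struct hash_entry *idx, size_t n,
                  const char *const *dict, size_t ndict,
                  hash_fn h, const char **found_for_node)
{
    size_t hits = 0;

    qsort(idx, n, sizeof *idx, cmp_entry);
    for (size_t i = 0; i < ndict; i++) {
        struct hash_entry key = { 0, NULL, 0 };
        const struct hash_entry *e;

        key.hash = h(dict[i]);
        e = bsearch(&key, idx, n, sizeof *idx, cmp_entry);
        if (e == NULL)
            continue;             /* hash not in the database */
        hits++;
        for (size_t j = 0; j < e->nnodes; j++)
            found_for_node[e->nodes[j]] = dict[i];
    }
    return hits;
}
```

As written, a later matching word overwrites an earlier one for the same node; honouring the "Lock found URLs" option would simply mean skipping nodes whose slot is already filled.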
A quick rundown follows. If you load a cyberp.ini, the "Users" tab will display the names and passwords of the users therein, including the passwords of the built-in administrator and deputy accounts. After you load a CyberNOT file, the "Newsgroups" tab will display all the filters defined therein. To the right is a panel of checkboxes which you cannot operate, but which reflect the masks applied to the newsgroup entry you select in the listview. Next we have the "URL database" tab, which contains a treeview where you can browse the database. It should be noted that the relatively long loading time of a CyberNOT file is due to the way the treeview works, with insertion into a branch apparently being O(n) rather than O(1) in the number of siblings of the new node, so filling a branch with n nodes costs roughly O(n^2). Anyway, you can browse the view in the normal manner. There are three different types of nodes, the first being called internally a "net node". This is simply a root node containing all entries for IPs in one "A net" (IPs sharing the same first octet). Below these are "IP nodes", which are the IPs banned by the database. Some of these have children of their own, "URL nodes", which contain the hashes of specific paths and resources being banned. You can right-click on any of these three types of nodes for additional context-sensitive functionality, such as "Open", "Lookup" and "Dictionary attack". As with the Newsgroups tab, there is a panel of checkboxes which reflects the masking status of the IP or URL you select. At the bottom is a quick search bar where you can do case-sensitive string searches. There's not much to say about the "IP Aliasing" tab, but here too you can right-click for additional functionality. Finally we have the Configuration tab, where you define the different dictionaries you want to use, along with a number of other things which are self-explanatory, except maybe for "Lock found URLs".
This function, if enabled, makes sure that once a word has been found to match a hash and been attached to it in the treeview, it will never be replaced, even if another possible candidate is found. This program is entirely self-contained. It will not write to the registry, and it will not create files anywhere but in its own path, unless you say it can. The source is included, and you can do whatever you want with it.

9 Conclusion

On the good side, we note that Cyber Patrol is, technically, somewhat better than NetNanny and CyberSitter, the two other censorware packages we have intimate knowledge of, but there is still far too much 16-bit code in it for it to be really stable and earn a good grade. We see no evidence of a clear political or religious agenda behind Cyber Patrol, though as citizens of highly secularized countries we might feel that many of the bans in the "Satanic or Cult" category are unreasonable. Their criteria document says "Satanic material is defined as: Pictures or text advocating devil worship, an affinity for evil, or wickedness." and "A cult is defined as: A closed society [...] Common elements may include: [...] influences that tend to compromise the personal exercise of free will and critical thinking." LaVey Satanism, for instance, isn't about any of the things in the full definition, and atheism certainly isn't, but such sites are included in the CyberNOT. The evidence points to the CyberNOT list not being properly maintained to remove old and outdated entries. As many as 50% of the IPs in the list don't even resolve! When evaluating a product with a ban list, you should look not at the number of entries, but at the number of current entries. Simply collecting new entries, and using the ever-growing (but outdated) list of bans as an argument in the sales game, is much easier than actually putting in the work to ensure the list is up to date and accurate.
The old classic tactic of entering critics into the banlist continues, with Peacefire banned in almost every category available. When the producers knowingly ban a site in clearly the wrong categories, what kind of trust can you put in them and their products? None. We must continue to reverse-engineer these products so that consumer rights can be protected. Will we ever find a censorware company that is not lying to us with these false bans?

The absence of filtering based on content keywords is surprising, but welcome. The technology does not exist to make content-based filtering really functional. The problem of recognizing content and making choices based on context is a hard one, suitable for research by AI labs. But it is a two-edged sword: the price of leaving this error-prone functionality out is that Cyber Patrol is less effective at blocking pages not previously processed by The Learning Company.

After all this, the feeling is that CP is just another censorware package. It tries hard to come across as effective, the magical technical solution to a non-technical problem, but when push comes to shove, it yields to the power of the human mind. If you thought putting this between your children and the Internet would protect them from "dangerous" ideas, then you'd better think again.

9.1 Thanks

We would like to thank all the fine men and women working for civil liberties all over the world.

Matthew would like to thank: the goddess Pele for favours received, and the Canadian government for supporting my cryptographic interests in several ways. Greetings to all the people I hang out with in sci.crypt, alt.kids-talk, talk.bizarre, and the VLUG and Voynich mailing lists.

Eddy would like to thank: Robert Risberg, Kristoffer Andergrim, Mattias Aspman, Gunnar Rettne, and all of my friends around the world. Special regards to all the intelligent, knowledgeable and humorous folks of R20 of the Fidonet; you know who you are.
All cryptanalysis done by Matthew Skala. Reverse engineering done by Eddy L O Jansson and Matthew Skala. Feel free to contact the authors with your comments and/or questions. This essay was first published on Eddy's homepage on 2000-03-11. You'll find Matthew's homepage here. You are allowed to mirror this document and the related files anywhere you see fit.

10 References

[DFR98] Saruman and Bobban, "The Penetration of CyberSitter'97", Apr 1998.
[DFR99] Saruman, "The Reversal of NetNanny", Aug 1999.
[ACLU96] American Civil Liberties Union, "FCC v. Pacifica Foundation", 1996.
[RNW93] Ross N. Williams, "A painless guide to CRC error detection algorithms", Aug 1993.
[JRG00] Raphael Finkel, Eric S. Raymond, et al., "The on-line hacker Jargon File, version 4.2.0", Jan 2000.

(c)2000 Eddy L O Jansson and Matthew Skala. All rights reserved. All trademarks acknowledged.

[END]

# distributed via <nettime>: no commercial use without permission
# <nettime> is a moderated mailing list for net criticism,
# collaborative text filtering and cultural politics of the nets
# more info: [email protected] and "info nettime-l" in the msg body
# archive: http://www.nettime.org contact: [email protected]