
Re: Odd characters






On Thu, 27 Aug 1998, Erich Schroeder wrote:

> 
> Hi,
>   No, I'm not asking about people;) I've been editing some html that I
> produced from text files exported from Win95 MS word translated from mac
> MS word (arrg) and I have been finding odd characters. In vi they look
> like ~U, and in netscape they come through as *, both on screen and in
> "view source". My preferred text editor (nedit) won't let me grab them for
> search&replace. What I really need is a script to strip them out of my
> html files, but I don't know how to identify them. I believe that you can
> see some of them at:
> 
> http://www.museum.state.il.us/RiverWeb/landings/Ambot/prehistory/mississippian/technology/stone-analysis.html
> 
> assuming that I can remember a url that long...
> 
> Thanks in case someone can figure out how to search&destroy these thingys

Here's a simple hex dumper, eight chars per line, with the text after, no
fancy substitutions on unprintables. Season to taste. Bugs are not my
fault, as I'm writing this off the top of my head, which has been known to
fog on occasion:

---
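#!/usr/bin/perl
# Hex dump of STDIN: eight bytes per line, followed by the raw text.

binmode STDIN;   # read raw bytes, no newline translation

while (read(STDIN, $ch, 1)) 
{
  printf("%02X ", ord $ch);   # two-digit hex value of this byte
  $count++; 
  push @chars, $ch; 
  if (!($count % 8)) 
  {
    print " ", join("", @chars), "\n"; 
    undef @chars; 
  }
}

# Flush the last partial line, padding the hex column so the text lines up.
if (@chars)
{
  print "   " x (8 - @chars), " ", join("", @chars), "\n";
}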
---
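
If reading through the whole dump gets tedious, a tally of the suspicious bytes may be quicker. What follows is only a sketch: the helper name count_odd_bytes is made up for illustration, and "odd" is assumed to mean anything outside printable ASCII other than tab, linefeed and carriage return.

```perl
#!/usr/bin/perl
# Tally every byte outside printable ASCII in the files named on the
# command line. Tab, linefeed and carriage return are treated as normal.
use strict;
use warnings;

# Hypothetical helper (not part of any module): takes a string of raw
# bytes and returns a hash ref mapping byte value => number of occurrences.
sub count_odd_bytes
{
  my ($data) = @_;
  my %seen;
  $seen{ord $1}++ while $data =~ /([^\x09\x0A\x0D\x20-\x7E])/g;
  return \%seen;
}

for my $file (@ARGV)
{
  open(my $fh, '<', $file) or die "Cannot open $file: $!";
  binmode $fh;                    # raw bytes, no translation
  local $/;                       # slurp the whole file at once
  my $data = <$fh>;
  close $fh;
  my $seen = count_odd_bytes(defined $data ? $data : "");
  printf "%s: %02X x %d\n", $file, $_, $seen->{$_}
    for sort { $a <=> $b } keys %$seen;
}
```

Saved under a name of your choosing (say oddbytes.pl), it would be run as "perl oddbytes.pl *.html" and prints one line per distinct odd byte in each file, with the hex value and how often it occurs.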

After you find the hex value of the character in question, do this:

  perl -pe 's/\xFF/ /g;' file > newfile

This converts every byte with hex value FF to a space; substitute the
value you found for FF.  Again, season to taste.
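
If the dump turns up more than one odd byte, a lookup table may be easier to manage than a pile of one-liners. The sketch below rests on an assumption I can't verify from here: that the files carry Windows-1252 punctuation (curly quotes, dashes, bullets) left over from Word. The table entries and the helper name clean_bytes are illustrative only, so swap in whatever values the dump actually shows.

```perl
#!/usr/bin/perl
# Replace selected high-bit bytes with plain ASCII stand-ins and write the
# cleaned text of each named file to STDOUT.
use strict;
use warnings;

# ASSUMED table: these are the Windows-1252 positions for Word-style
# punctuation. They are a guess about the files, not a finding.
my %replacement = (
  "\x85" => "...",   # ellipsis
  "\x91" => "'",     # left single quote
  "\x92" => "'",     # right single quote / apostrophe
  "\x93" => '"',     # left double quote
  "\x94" => '"',     # right double quote
  "\x95" => "*",     # bullet
  "\x96" => "-",     # en dash
  "\x97" => "--",    # em dash
);

# Hypothetical helper: returns a copy of $text with every byte listed in
# the table replaced; high-bit bytes not in the table are left untouched.
sub clean_bytes
{
  my ($text) = @_;
  $text =~ s/([\x80-\xFF])/exists $replacement{$1} ? $replacement{$1} : $1/ge;
  return $text;
}

binmode STDOUT;
for my $file (@ARGV)
{
  open(my $fh, '<', $file) or die "Cannot open $file: $!";
  binmode $fh;                    # raw bytes, no translation
  print clean_bytes($_) while <$fh>;
  close $fh;
}
```

Run it as "perl cleanbytes.pl file > newfile" (the script name is made up), same as the one-liner above. Bytes missing from the table pass through unchanged, so a second pass with the dumper shows what is left.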

I'm sure there are easier ways to do this, but this gets the job done.
Please don't pick on the wannabe Perl god!


