Re: Guy Macon's adventures with ASCII character frequency

From: Douglas A. Gwyn (DAGwyn_at_null.net)
Date: 01/22/05


Date: Sat, 22 Jan 2005 17:48:57 -0500

Guy Macon wrote:
> I want a list that includes the space, punctuation,
> numerals, upper case and lower case, not just letters.
> I strongly suspect that the space character is more
> common than E or e is, for example.

The letter E occurs about 17% of the time on the
average in "telegraphic" English text (which has
no nonalphabetic characters, and spells out some
punctuation). Since the average English word
size is about five characters, adding space as a
word separator makes about one character in six a
space. That means space occurs about 17% of the
time, and E drops to about 14% of the time.
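
To make that arithmetic explicit, here is a minimal sketch. It assumes exactly one separator per word and takes the 17% and five-character figures above as given; the function name rescale_with_space is only illustrative.

```python
def rescale_with_space(letter_share, avg_word_len):
    """Return (space_share, rescaled_letter_share).

    Assumes one space is added per word of avg_word_len letters,
    so each word-plus-separator unit is avg_word_len + 1 characters.
    """
    unit_len = avg_word_len + 1
    space_share = 1 / unit_len
    # Letters now fill only avg_word_len of every unit_len characters.
    rescaled = letter_share * avg_word_len / unit_len
    return space_share, rescaled


space, e = rescale_with_space(0.17, 5)
print(f"space: {space:.0%}, E: {e:.0%}")  # prints "space: 17%, E: 14%"
```

Real text also contains punctuation and digits, so measured figures will differ somewhat from this idealized estimate.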

> Does anyone know where I can find such a list?

If you want to determine relative character frequencies
in a corpus of representative text stored in files, it
is easy to do so with a simple computer program.
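
As a rough sketch of such a program (one possible approach, not the only one), the following Python script counts every character, including space, punctuation, digits, and both cases, in the files named on the command line. The function name char_frequencies and the UTF-8 decoding are assumptions made for illustration.

```python
import sys
from collections import Counter


def char_frequencies(text):
    """Map each character in text to its relative frequency,
    ordered from most to least common."""
    counts = Counter(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.most_common()}


if __name__ == "__main__":
    # Concatenate all files given as arguments into one corpus.
    chunks = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            chunks.append(f.read())
    for ch, freq in char_frequencies("".join(chunks)).items():
        # repr() keeps space, tab, and newline visible in the listing.
        print(f"{ch!r}\t{freq:.4f}")
```

Saved under a name of your choosing (say charfreq.py), it would be run as "python charfreq.py file1.txt file2.txt". Newlines are counted like any other character; filter them out first if they are not of interest.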

> How about one with all the possible Digraphs?

All digraphs are *possible*, QXIHML. If you want to
determine their relative frequencies in a corpus of
representative text stored in files, it is easy to do
so with a simple computer program.
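
A minimal sketch for the digraph case, under the assumption that a digraph is any pair of adjacent characters (so pairs spanning a space or punctuation mark are counted too). The name digraph_frequencies is illustrative only.

```python
from collections import Counter


def digraph_frequencies(text):
    """Map each adjacent character pair in text to its relative
    frequency, ordered from most to least common."""
    # zip(text, text[1:]) yields overlapping pairs: (t0,t1), (t1,t2), ...
    pairs = Counter(a + b for a, b in zip(text, text[1:]))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in pairs.most_common()}


if __name__ == "__main__":
    sample = "the theory of the thing"
    for pair, freq in digraph_frequencies(sample).items():
        print(f"{pair!r}\t{freq:.3f}")
```

To restrict the count to letter-only digraphs, skip any pair where either character fails str.isalpha() before counting.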


