## Critical Digital Humanities Bibliography


"The recent development of various methods of modulation such as PCM and PPM which exchange bandwidth for signal-to-noise ratio has intensified the interest in a general theory of communication. A basis for such a theory is contained in the important papers of Nyquist and Hartley on this subject. In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the information. The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design." p379

"<p>By a communication system we will mean a system of the type indicated schematically in Fig. 1. It consists of essentially five parts: </p><p> 1. An information source which produces a message or sequence of messages to be communicated to the receiving terminal... </p><p> 2. A transmitter which operates on the message in some way to produce a signal suitable for transmission over the channel. </p><p> 3. The channel is merely the medium used to transmit the signal from transmitter to receiver. </p><p> 4. The receiver ordinarily performs the inverse operation of that done by the transmitter, reconstructing the message from the signal. </p><p> 5. The destination is the person (or thing) for whom the message is intended. </p>" p380-382

"We can think of a discrete source as generating the message, symbol by symbol. It will choose successive symbols according to certain probabilities depending, in general, on preceding choices as well as the particular symbols in question." p385

"A more complicated structure is obtained if successive symbols are not chosen independently but their probabilities depend on preceding letters." p386

"<p>Second-Order Word Approximation. The word transition probabilities are correct but no further structure is included. </p><p>THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH </p><p>WRITER THAT THE CHARACTER OF THIS POINT IS </p><p>THEREFORE ANOTHER METHOD FOR THE LETTERS </p><p>THAT THE TIME OF WHO EVER TOLD THE PROBLEM </p><p>FOR AN UNEXPECTED </p><p>The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account in their construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. </p>" p388
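The second-order word approximation Shannon describes can be sketched as a bigram sampler: record which words follow which, then generate by repeatedly sampling a successor of the last word emitted. The tiny corpus and function names below are illustrative, not Shannon's.

```python
import random
from collections import defaultdict

def build_bigram_model(words):
    """Map each word to the list of words observed to follow it.

    Duplicates in the lists preserve the transition frequencies, so
    random.choice samples successors with the correct probabilities."""
    model = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model, start, n):
    """Emit up to n words, each drawn from the observed successors of the last."""
    out = [start]
    for _ in range(n - 1):
        successors = model.get(out[-1])
        if not successors:
            break  # reached a word with no observed successor
        out.append(random.choice(successors))
    return out

# Illustrative corpus (echoing Shannon's sample output, lowercased).
corpus = ("the head and in frontal attack on an english writer "
          "that the character of this point is").split()
model = build_bigram_model(corpus)
print(" ".join(generate(model, "the", 8)))
```

Every adjacent pair in the output is a pair seen in the corpus, which is exactly the "word transition probabilities are correct but no further structure" property of the second-order approximation.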

"<p>We have represented a discrete information source as a Markoff process. </p><p> Can we define a quantity which will measure, in some sense, how much information is "produced" by such a process, or better, at what rate information is produced? </p><p> Suppose we have a set of possible events whose probabilities of occurrence are p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>n</sub>. These probabilities are known but that is all we know concerning which event will occur. Can we find a measure of how much "choice" is involved in the selection of the event or of how uncertain we are of the outcome? </p>" p392

"Quantities of the form H = - K Σp<sub>i</sub> log p<sub>i</sub> (the constant K merely amounts to a choice of a unit of measure) play a central role in information theory as measures of information, choice and uncertainty. The form of H will be recognized as that of entropy as defined in certain formulations of statistical mechanics where p<sub>i</sub> is the probability of a system being in cell <i>i</i> of its phase space. <i>H</i> is then, for example, the H in Boltzmann's famous H theorem. We shall call H = - Σp<sub>i</sub> log p<sub>i</sub> the entropy of the set of probabilities" p393
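The entropy formula in the passage above is short enough to compute directly; a minimal sketch (with base 2, so H is measured in bits):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H = -sum(p_i * log(p_i)), in bits by default.

    Zero-probability events contribute nothing, matching the convention
    that p log p -> 0 as p -> 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # → 1.0  (a fair coin: one bit of uncertainty)
print(entropy([0.25] * 4))    # → 2.0  (four equally likely events: two bits)
```

H is maximized when all outcomes are equally likely and falls to zero when one outcome is certain, which is why the passage calls it a measure of "choice" and uncertainty.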

"The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols will be called its relative entropy. This is the maximum compression possible when we encode into the same alphabet. One minus the relative entropy is the redundancy. The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters is roughly 50%. This means that when we write English half of what we write is determined by the structure of the language and half is chosen freely. The figure 50% was found by several independent methods which all gave results in this neighborhood. One is by calculation of the entropy of the approximations to English. A second method is to delete a certain fraction of the letters from a sample of English text and then let someone attempt to restore them. If they can be restored when 50% are deleted the redundancy must be greater than 50%. A third method depends on certain known results in cryptography. Two extremes of redundancy in English prose are represented by Basic English and by James Joyce's book "Finnegans Wake." The Basic English vocabulary is limited to 850 words and the redundancy is very high. This is reflected in the expansion that occurs when a passage is translated into Basic English. Joyce on the other hand enlarges the vocabulary and is alleged to achieve a compression of semantic content. The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible, etc." p398-399
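The definitions in the passage (relative entropy as H/H_max, redundancy as one minus that ratio) can be sketched for single-letter statistics. This first-order estimate ignores the longer-range structure behind Shannon's 50% figure for English; the function name is mine.

```python
import math
from collections import Counter

def redundancy(text):
    """1 - H/H_max over single-symbol frequencies.

    H is the entropy of the observed letter distribution; H_max is
    log2(alphabet size), the entropy if all observed symbols were
    equally likely. First-order only: real English redundancy also
    comes from multi-letter structure."""
    counts = Counter(text)
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    h_max = math.log2(len(counts))
    return 1 - h / h_max

print(round(redundancy("abab"), 3))   # → 0.0  (both symbols equally likely)
print(round(redundancy("aaab"), 3))   # skewed distribution: some redundancy
```

A uniform distribution over its symbols gives zero redundancy, matching the passage's claim that at zero redundancy "any sequence of letters is a reasonable text."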

"The input to the transducer is a sequence of input symbols and its output a sequence of output symbols. The transducer may have an internal memory so that its output depends not only on the present input symbol but also on the past history. We assume that the internal memory is finite, i.e. there exists a finite number m of possible states of the transducer and that its output is a function of the present state and the present input symbol. The next state will be a second function of these two quantities." p399
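The transducer defined above reduces to two functions: output = f(input, state) and next state = g(input, state). A minimal sketch, with a one-symbol delay line as the example (the function names are mine):

```python
def run_transducer(output_fn, next_state_fn, start_state, symbols):
    """Drive a finite-state transducer over a symbol sequence.

    Per the quoted definition: the output is a function of the present
    state and input symbol, and the next state is a second function of
    the same two quantities."""
    state = start_state
    out = []
    for x in symbols:
        out.append(output_fn(x, state))
        state = next_state_fn(x, state)
    return out

# Example: a one-symbol delay line. The state stores the previous input;
# the output is the stored symbol, and the new state is the current input.
delayed = run_transducer(lambda x, s: s, lambda x, s: x, "_", "abc")
print(delayed)  # → ['_', 'a', 'b']
```

The delay line needs exactly one symbol of memory, so its state set is the alphabet itself (plus the start marker), illustrating the "finite number m of possible states."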

"If the channel is noisy it is not in general possible to reconstruct the original message or the transmitted signal with certainty by any operation on the received signal E. There are, however, ways of transmitting the information which are optimal in combating noise. This is the problem which we now consider." p407

"An approximation to the ideal would have the property that if the signal is altered in a reasonable way by the noise, the original can still be recovered. In other words the alteration will not in general bring it closer to another reasonable signal than the original. This is accomplished at the cost of a certain amount of redundancy in the coding. The redundancy must be introduced in the proper way to combat the particular noise structure involved. However, any redundancy in the source will usually help if it is utilized at the receiving point. In particular, if the source already has a certain redundancy and no attempt is made to eliminate it in matching to the channel, this redundancy will help combat noise. For example, in a noiseless telegraph channel one could save about 50% in time by proper encoding of the messages. This is not done and most of the redundancy of English remains in the channel symbols. This has the advantage, however, of allowing considerable noise in the channel. A sizable fraction of the letters can be received incorrectly and still reconstructed by the context." p414
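The simplest illustration of redundancy deliberately introduced to combat noise is a repetition code, one of the most basic instances of the idea in the passage (not the general scheme Shannon constructs): send each bit several times and decode by majority vote, so an altered signal is still closer to the original codeword than to any other.

```python
from collections import Counter

def encode(bits, n=3):
    """Add redundancy: transmit each bit n times."""
    return [b for b in bits for _ in range(n)]

def decode(received, n=3):
    """Majority vote over each block of n; recovers the bit as long as
    fewer than half of its copies were corrupted."""
    return [Counter(received[i:i + n]).most_common(1)[0][0]
            for i in range(0, len(received), n)]

message = [1, 0, 1]
received = encode(message)
received[1] ^= 1                 # the channel flips one transmitted bit
print(decode(received))          # → [1, 0, 1]  (the flip is voted away)
```

This is the same mechanism as reconstructing misprinted letters "by the context": the surviving redundant copies pin down what was sent.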