What Is 'Information Entropy' & Why Is It So Important To Data Analysis?
Entropy - The Mysterious Ingredient Of All Information
In a previous article, I discussed Information Theory and Claude Shannon’s contribution to our understanding of ‘information.’ In it, we discovered that ‘meaning’ has nothing to do with information, as paradoxical as that may seem. This was the genius of Shannon: he postulated that information is ‘noise’ and ‘surprise.’ Indeed, meaning only confuses the analysis of information. In this piece, I will explain what Shannon meant by ‘Information Entropy,’ a concept that has become central to all Big Data and AI systems.
I now hope to demystify Entropy and ‘Information Entropy.’ Though, as anyone who has studied Entropy and the second law of thermodynamics will tell you, it is impossible to pin the concept down to a single definition. Yet entropy plays a significant role in Chaos and Complexity theory, which lie at the heart of AI.
According to one (almost undoubtedly untrue) story, when grappling with the terminology to use in his paper, Shannon asked the legendary mathematician and physicist John von Neumann¹ ‘What should I call this thing?’. Von Neumann reputedly responded: ‘Say that information reduces “entropy.” It is a good, solid physics word. And more importantly, no one knows what entropy really is, so in a debate, you will always have the advantage’.
Any research on entropy will leave the researcher in stupefaction over the possible definitions of a concept that lies at the heart of humanity’s understanding of the universe. Entropy² is central to the Second Law of Thermodynamics,³ and few technological advances would have taken place without an understanding of its fundamental consequences. By all logical expectations, entropy should benefit from an uncontested, universally accepted definition. However, as von Neumann supposedly stated, this has proven impossible, and this crucial concept is defined differently depending on the system in which it is used. It has even entered the realm of buzzwords, where one can hear the term ‘entropy’ spoken with abandon and without any essential meaning.
The Second Law of Thermodynamics states:
‘The total entropy of a system will always increase until it reaches its maximum possible value; it will never decrease on its own unless an outside agent works to decrease it.’⁴
This law holds that there will always be an expenditure of energy in any system in the universe, and this expended energy cannot be reused. The entropy produced by all natural and biological systems will continue to grow until some outside agent interferes with this trajectory. For example, many systems release heat as part of their working process. ‘This so-called heat loss is measured by a quantity called entropy. Entropy is a measure of the energy that cannot be converted into additional work’.⁵
The image of the fire above is an example of entropy. The products of fire are composed mostly of gases such as carbon dioxide and water vapor, so the entropy of the system increases during most combustion reactions. The usable energy decreases while the entropy increases, and the dissipated energy cannot be reused or converted back into work. In short, it is lost to the system.
To address the misuse and misunderstanding of the term, Professor Arieh Ben-Naim has suggested abandoning the word ‘entropy’ altogether and replacing it with ‘missing information.’ He also suggests that interpretations of entropy fall into two major categories: ‘one is based on the interpretation of entropy in terms of the extent of disorder in a system; the second involves the interpretation of entropy in terms of the missing information on the system.’⁶
In information theory, Shannon recognized the order-disorder quandary. He realized that all information contains ‘noise’, something that adds nothing useful to the data. The ubiquitous term ‘signal-to-noise ratio’ comes directly from this formulation.
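For readers who have not met the term before, the signal-to-noise ratio is conventionally defined as the ratio of signal power to noise power, often quoted in decibels. This is the standard engineering definition, added here for context rather than taken from Shannon’s paper:

SNR = P_signal / P_noise, or, in decibels, SNR(dB) = 10 · log₁₀(P_signal / P_noise)

The larger the ratio, the more of what is received is useful signal rather than useless noise.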
‘In his classic 1948 paper, Shannon defined the information content in terms of the entropy of the message source. People have sometimes characterized Shannon’s definition of information content as the “average amount of surprise” a receiver experiences on receiving a message, in which “surprise” means something like the “degree of uncertainty” the receiver had about what the source would send next…
… Once again, the entropy (and thus information content) of a source is defined in terms of message probabilities and is not concerned with the ‘meaning’ of a message.’⁷
These bits of information are at the heart of what Shannon named ‘information entropy’ (also known as ‘Shannon entropy’). A firm grasp of information entropy is necessary to make sense of any information (data).
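Although this series deliberately keeps the mathematics to a minimum, it may help to see the definition Shannon gave in 1948. For a source that emits messages x with probabilities p(x), the ‘surprise’ of an individual message is −log₂ p(x), and the information entropy of the source is the average surprise, measured in bits:

H = − Σ p(x) · log₂ p(x)

A perfectly predictable source has zero entropy; a source whose messages are all equally likely has the maximum possible entropy.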
‘Information is entropy. This was the strangest and most powerful notion of all. Entropy — already a difficult and poorly understood concept — is a measure of disorder in thermodynamics, the science of heat and energy.’⁸
‘In essence, entropy is a measure of uncertainty. When our uncertainty is reduced, we gain information, so information and entropy are two sides of the same coin.’⁹
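As a small illustration of ‘uncertainty reduced equals information gained’, the short sketch below, written in Python and added here by way of example rather than drawn from any of the sources quoted above, computes the entropy of a fair coin and of a heavily biased coin. Learning the result of the fair toss removes a full bit of uncertainty; the biased coin is almost predictable, so its outcome carries far less information.

import math

def shannon_entropy(probabilities):
    # Average 'surprise', in bits, over all possible outcomes of a source.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin: two equally likely outcomes, maximum uncertainty -> 1 bit.
print(shannon_entropy([0.5, 0.5]))    # 1.0

# A heavily biased coin: the outcome is nearly certain -> about 0.08 bits.
print(shannon_entropy([0.99, 0.01]))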
The ‘noise’ is always abstracted away in any data analysis, be it machine learning (ML), pattern recognition (PR), deep learning (DL), natural language processing (NLP), or simple data comparison.
We analyze only the ‘real’ information available, also known as ‘the surprise within the message’. Without this understanding and ability, without a firm grasp of information entropy, one can never hope to achieve any sort of AI or analytics. Predictive analytics would never work without reducing the noise and disorder,¹⁰ and this reduction lies at the heart of all modern data analytics. It makes it possible to reduce uncertainty about the message.
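To make the link to predictive analytics concrete, one familiar place where Shannon entropy does exactly this work is decision-tree learning, where a feature is chosen by how much it reduces the entropy of the outcomes, a quantity usually called ‘information gain’. The sketch below is a minimal, self-contained illustration using made-up labels; it is not drawn from the article cited above or from any particular library.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy, in bits, of a collection of class labels.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(labels, groups):
    # Entropy removed by splitting the labels into the given subgroups.
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder

# Ten hypothetical customers labelled 'buy' or 'skip'. A split on some feature
# separates the classes almost perfectly, removing most of the uncertainty.
labels = ['buy'] * 5 + ['skip'] * 5
groups = [['buy'] * 5 + ['skip'], ['skip'] * 4]
print(information_gain(labels, groups))  # roughly 0.61 bits of uncertainty removed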
Summary
In summary, there is ‘surprise’ and there is ‘noise.’ To analyze data, we remove the noise (information entropy) and are left with the surprise in the message. The actual meaning of the message has no relevance or importance to us.
In the following article in this series, I hope to discuss ‘Bias,’ the Achilles’ heel of all data systems.
Please remember, these are only short introductions to complex subjects. At the end of this series, I will produce a comprehensive reading list for anyone interested.
References:
1. Wikipedia (n.d.) ‘John von Neumann’, available at: https://en.wikipedia.org/wiki/John_von_Neumann (accessed 20th August, 2021).
2. Wikipedia (n.d.) ‘Entropy’, available at: https://en.wikipedia.org/wiki/Entropy (accessed 29th July, 2021).
3. Wikipedia (n.d.) ‘Entropy in thermodynamics and information theory’, available at: https://en.wikipedia.org/wiki/Entropy_in_thermodynamics_and_information_theory (accessed 29th July, 2021).
4. Mitchell, M. (2009) ‘Complexity: A Guided Tour’, Oxford University Press, New York, NY, Kindle Edition, Location 744.
5. Ibid., Location 738.
6. Ben-Naim, A. (2008) ‘Entropy Demystified: The Second Law Reduced to Plain Common Sense’, World Scientific Publishing Company, London, Kindle Edition, Location 489.
7. Mitchell, ref. 4 above, Location 902–918.
8. Gleick, J. (2011) ‘The Information’, Pantheon Books, New York, NY, Kindle Edition, Location 3592.
9. Stone, J.V. (2018) ‘Information Theory: A Tutorial Introduction’, Sebtel Press, Kindle Edition, Location 603.
10. Gross, T.W. (2021) ‘Thesis and antithesis — Innovation and predictive analytics: Σ (Past + Present) Data ≠ Future Success’, Applied Marketing Analytics, Vol. 6, No. 3, pp. 22–36.