Data continuity, data sampling and discernible change
As I have mentioned in many previous blogs, data continuity is one of the many features that are lost when data is pre-collected and used for AI. But what is data continuity? Why do I say that losing continuity in data results in information loss? Is it possible to pre-collect continuous data at all? Is data continuity related in any way to time? Does a data-realised algorithm solve this problem?
The most obvious definition of continuous data collection seems to be collecting continuous streaming data: for example, collecting the analog version of a varying temperature rather than a digitised version sampled, say, every minute. I think this definition is wrong; it is not what data continuity really means. It would certainly burgeon the amount of data collected without adding any extra information. This, I believe, is why we have terabytes upon terabytes of data with no useful information in it. We seem to think that simply increasing the frequency of data collection increases the accuracy of the information present, and I find this, again, a very wrong notion. But if this is not data continuity, then what is? In fact, I should be calling it “information continuity” rather than data continuity.
Let’s take examples to understand the difference. When we drive a vehicle, we take data inputs in the form of visual, sound and thermal inputs. We do not see the scene around the vehicle at regular sampled intervals, such as every 1 minute, 2 minutes or 30 seconds. We tend to see the scene around us continuously, so that a continuous moving image is formed, and based on this continuously created scene, we react. But it is also not true that, to drive correctly, we need to keep looking at the road and reconstructing the scene without any interruption. In fact, if we keep looking at the road for too long without diverting our attention away for a few seconds, it leads to other problems related to the eyes. The same goes for the sound input: while we hear a continuous stream of sound coming from around the car, we do not intently listen to these sounds continuously. Yet we are able to form a “coherent continuous stream of information” from these visual and sound inputs that helps us react to the situation around us.
Let’s take another example. When a doctor is diagnosing a person’s heart condition, they take many measured parameters at the time the patient comes in for examination. They may observe the heart’s working continuously for a period of time using an echocardiogram, or perform an angiogram to look at the state of the arteries. Based on the information collected, they diagnose the condition and take action. Another example from the medical world is the blood test that collects various blood parameters; based on a comparison of the test results to normal levels, various actions can be taken. Here again, blood tests are done only once, at the time of diagnosis. But when a person has undergone surgery, regular periodic blood tests and other tests are done until discharge, or even after discharge depending on the severity. What is important to note here is that the data collection frequency and continuity are varied based on what is required to create “a coherent continuous stream of information” that the doctor needs to diagnose or monitor the patient’s condition.
What is important to note is that, practically speaking, in none of these situations is “continuous data” taken. Rather than looking for “continuous absolute data”, we are looking for “continuous discernible change”, i.e., “a change that is indicative of a situation”, so we can take a decision and act on it. So, continuity of data is dependent on the “rate of discernible change” present in the data to be collected. In the vehicle-driving scenario, another vehicle coming into our travel trajectory is more important than a vehicle travelling at a steady speed similar to ours. Here, the rate of discernible change in visuals and sound is high, and it increases or decreases with the speed at which we are driving and the traffic conditions in which we are driving. So, based on the speed and traffic conditions, we vary our alertness to ensure that we track the change and get continuous information.
When we look at the medical scenario, an echocardiogram, which needs to observe changes in the working of the heart such as the functioning of the valves, the blood pumping rate and so on, requires a continuous stream of data that can be mapped, because the rate of change in the heart is high. In contrast, when we look at blood tests, say a test that monitors cholesterol or blood sugar levels, we do not get a discernible change by the second; it would change, possibly, over a week or a month. So it does not make sense to collect these every second of every day.
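To make this concrete, here is a minimal sketch of a sampler that adapts its interval to the rate of discernible change it observes: it reads a fast-changing signal more often and a slow-changing one more rarely. This is purely my own illustration, not an implementation from any existing system; the class name, thresholds and intervals are assumptions chosen for the example.

```java
import java.util.function.DoubleSupplier;

/**
 * A minimal sketch (illustrative only) of sampling whose frequency follows
 * the "rate of discernible change" of the data: fast-changing signals are
 * read often, slow-changing ones rarely. All names and numbers are assumed.
 */
public class AdaptiveSampler {

    private final double discernibleChange; // smallest change we care about
    private final long minIntervalMs;       // fastest we are willing to sample
    private final long maxIntervalMs;       // slowest we are willing to sample

    private double lastValue = Double.NaN;
    private long intervalMs;

    AdaptiveSampler(double discernibleChange, long minIntervalMs, long maxIntervalMs) {
        this.discernibleChange = discernibleChange;
        this.minIntervalMs = minIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.intervalMs = maxIntervalMs;
    }

    /** Reads the signal once and adapts the next sampling interval. */
    long sampleOnce(DoubleSupplier signal) {
        double value = signal.getAsDouble();
        if (!Double.isNaN(lastValue)) {
            if (Math.abs(value - lastValue) >= discernibleChange) {
                // Discernible change: tighten the interval (more "alertness").
                intervalMs = Math.max(minIntervalMs, intervalMs / 2);
            } else {
                // Nothing discernible happened: relax the interval.
                intervalMs = Math.min(maxIntervalMs, intervalMs * 2);
            }
        }
        lastValue = value;
        return intervalMs; // caller waits this long before the next read
    }

    public static void main(String[] args) {
        // A heart-like signal (changing every beat) drives the sampler toward
        // its fastest interval; a cholesterol-like one toward its slowest.
        AdaptiveSampler sampler = new AdaptiveSampler(0.5, 100, 60_000);
        double[] simulated = {70, 70.1, 75, 82, 76, 70, 70, 70.05};
        for (double v : simulated) {
            long next = sampler.sampleOnce(() -> v);
            System.out.printf("value=%.2f  next sample in %d ms%n", v, next);
        }
    }
}
```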
Thus, we find that each kind of data has its own unique rate of change, based on which information continuity is maintained. Data continuity, thus, can be defined as “that rate of change of data which does not result in information disruption”. Since an AI application works with information, we need data collected such that information continuity is not lost. Hence, collecting data as a simple function of regular samples introduces errors, or burgeons the data with no value added. But if the continuity of data depends on the data itself, then how can we collect the data without knowing its “rate of change”? And how can we know the rate of change without first analysing it?
The reason we run into such a question is that we try to pre-collect data as a scalar at a given point and time and then process it. Thus we collect the temperature at a specific location at a specific time, say 26°C, or the humidity as 39%, as absolute values at that given location and time. So, to extract any usable information, we need the mathematical precision with which we have defined it. What is important to note in all this is that, for information, the “change of data” matters rather than the “actual value”. Almost always we find that the change indicates the information. It can be that there is no change, which is also information, i.e., there is zero change; but change is what contains knowledge. So, pre-collecting point values makes no sense unless we can compare them against some other value, either pre-defined normals or previously collected values, to extract information.
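A small sketch of what this could look like in code follows. Again, it is only my own illustration of the idea: a reading is kept only when it differs discernibly from the last kept reading, and each kept change is also compared against a pre-defined normal band. The names, thresholds and sample values are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * A minimal sketch (illustrative only) of collecting "change" rather than
 * absolute point values: a reading is stored only when it differs
 * discernibly from the last stored reading. "No discernible change" is
 * itself information and does not need to be stored at all.
 */
public class ChangeRecorder {

    record Change(int index, double previous, double current, boolean outsideNormal) {}

    public static void main(String[] args) {
        double discernible = 0.5;                    // smallest change worth keeping
        double normalLow = 36.0, normalHigh = 37.5;  // pre-defined "normal" band

        double[] readings = {36.6, 36.6, 36.7, 37.9, 38.4, 38.4, 36.9};
        List<Change> changes = new ArrayList<>();

        double lastKept = readings[0];
        for (int i = 1; i < readings.length; i++) {
            double r = readings[i];
            if (Math.abs(r - lastKept) >= discernible) {
                // A discernible change: keep it, noting how it compares to normals.
                boolean outsideNormal = r < normalLow || r > normalHigh;
                changes.add(new Change(i, lastKept, r, outsideNormal));
                lastKept = r;
            }
            // Otherwise nothing is stored: the flat stretch carries no new information.
        }
        changes.forEach(c -> System.out.printf("t=%d: %.1f -> %.1f%s%n",
                c.index(), c.previous(), c.current(),
                c.outsideNormal() ? "  (outside normal)" : ""));
    }
}
```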
But rather than doing this, what if we left the data as is and found techniques to create observers that output knowledge as different representations that can be directly used for intelligence, i.e., data-realised knowledge as I have indicated in my previous blog? What happens then? We need to understand that nature also reacts only to changing conditions. When left alone, everything stays as is. So, a chemical reaction occurs only if there is a change in the parameters surrounding it, a molecular bond changes only when the surrounding environment has changed, an electron flows only if there is a change in potential, a liquid flows only if there is a gradient in pressure, and so on. Another important point to note about natural processes is that they are accumulative in nature. None of them reset themselves back to an origin and keep starting over again. A chemical reaction starts at some given current state and changes from that state to the next; an electron flows from its current position to the next, it does not keep going back to its original position before flowing.
So, when we use these natural reactive processes to collect information, as I have indicated in the data-realised algorithms, they automatically react only to changes, and only when changes occur. And given that they are accumulative in nature, they accumulate based on continuous change in the data rather than working with absolute discrete values. This makes the information collected, as these changes in molecular bonds or chemical reactions, automatically encode continuity in the form of the accumulation that is present. Thus, if we can control the start state, the end state becomes indicative of the continuity over the duration in which the accumulation occurred. So, say we used amino acids coming together to form protein molecules as a way to encode data; then the structure of the proteins and the strength of the bonds formed between the amino acids become indicative of the data collected, both its continuity and the duration of accumulation.
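The closest software analogy I can sketch for this, purely as my own illustration and not the chemical or protein-based observer described above, is an accumulator that is driven only by changes in its input and never resets to an origin, so that its end state folds in both how much the input changed and for how long those changes kept accumulating:

```java
/**
 * A minimal sketch (illustrative only) of an "accumulative observer": it
 * reacts only when its input changes and never resets to an origin, so the
 * end state reflects both the changes and the duration of accumulation.
 * The class and field names are assumptions, not an API from the article.
 */
public class AccumulatingObserver {

    private double lastInput = Double.NaN;
    private double state;      // accumulated state, e.g. reaction extent
    private long activeSteps;  // how many steps actually carried a change

    AccumulatingObserver(double startState) {
        // The start state is controlled; the end state then encodes
        // the continuity and duration of the accumulation.
        this.state = startState;
    }

    /** Reacts only when the input changes; otherwise the state is untouched. */
    void observe(double input) {
        if (!Double.isNaN(lastInput) && input != lastInput) {
            state += input - lastInput; // accumulate the change, never reset
            activeSteps++;
        }
        lastInput = input;
    }

    public static void main(String[] args) {
        AccumulatingObserver observer = new AccumulatingObserver(0.0);
        double[] environment = {20, 20, 21, 23, 23, 23, 26, 26};
        for (double e : environment) {
            observer.observe(e);
        }
        // Reading the end state once is enough; the history is folded into it.
        System.out.printf("end state=%.1f after %d change steps%n",
                observer.state, observer.activeSteps);
    }
}
```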
Another example would be light-based scenes, such as the ones we need for driving a vehicle. Suppose we used photo-sensitive materials that accumulate ambient light as it reflects onto them, overlapped these materials on each other, and collected the change between the two overlapped photo-sensitive materials, reacting to that change rather than re-creating an accurate image every time and processing it in order to react. We would then have information rather than absolute data, and we would react only to changes, just as we actually do while driving.
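As a rough software analogy, and again my own assumption rather than the article’s proposal, this is close to differencing two successive exposures and reacting only where the difference is discernible, instead of reconstructing and analysing a full, accurate image every frame:

```java
/**
 * A minimal sketch (illustrative only) of reacting to the change between two
 * successive "exposures" of a scene rather than to the full image itself,
 * loosely analogous to the overlapped photo-sensitive materials above.
 */
public class ChangeDrivenScene {

    /** Returns true if any region changed discernibly between two exposures. */
    static boolean discernibleChange(double[][] previous, double[][] current,
                                     double threshold) {
        for (int y = 0; y < previous.length; y++) {
            for (int x = 0; x < previous[y].length; x++) {
                if (Math.abs(current[y][x] - previous[y][x]) > threshold) {
                    return true; // something moved into (or out of) the scene
                }
            }
        }
        return false; // steady scene: nothing to react to
    }

    public static void main(String[] args) {
        double[][] previous = {
                {0.1, 0.1, 0.1},
                {0.1, 0.1, 0.1},
        };
        double[][] current = {
                {0.1, 0.1, 0.9},   // a bright object entering on the right
                {0.1, 0.1, 0.8},
        };
        if (discernibleChange(previous, current, 0.3)) {
            System.out.println("react: something changed in the scene");
        } else {
            System.out.println("steady scene: keep cruising");
        }
    }
}
```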
Published on Java Code Geeks with permission by Raji Sankar, partner at our JCG program. See the original article here: Data continuity, data sampling and discernible change. Opinions expressed by Java Code Geeks contributors are their own.