The missing depth dimension in data

As I have been writing in my blogs on AI, the data representation we have today is not suited to building a knowledgeable, intelligent system. Data needs to be, and remain, the primary focus if we want to implement true AI. So, we need to find a different way of representing data, taking into consideration all the requirements that AI places on data. When we look at the data we collect for AI, we find that it lacks some necessary characteristics. As I wrote in my last blog, first and foremost, the data we collect lacks continuity. Blindly collecting data as regular samples does not mean we have collected the data required to draw intelligent conclusions. In fact, we find that the data we collect is sometimes insufficient even for very basic analysis, let alone AI. So, the collection of data has to focus on information-oriented collection rather than periodic collection of scalar values. As I mentioned in the previous blog, “information continuity” is the primary requirement, and “pure continuous data” does not guarantee it. Blindly collecting continuous data just bloats the data store without adding any value.

The case is similar when we look at the depth of the data collected. The data we collect lacks depth; in fact, we do not even have a representation that allows us to introduce depth into data. So, what do I mean by depth of data? To understand this, we need to look at the data we are collecting. As I have said before, when we use “electronic sensors” in the manner we have used them, we can collect only sampled, single-dimensional scalar values. This is because the sensors typically aim to translate the monitored quantity, such as pressure, temperature or light, into “a single flow of electrons in a conductor or grid of conductors”, which is then translated into digital values. Subsequently, we need to relate these disparate scalar values to one another. This is where the depth of data is lost. To understand depth of data, we have to look at nature, and at our understanding of the nature around us, to see the concept.

In nature we find that, very easily, “infinite can become finite and finite can become infinite”. What do I mean by this? Take space, for example: limit it by containing it within some defined boundary such as a box, and deal only with the macro elements within that box, and space is observed as finite. We can take that limited space in the box and keep dividing it into smaller and smaller parts, focusing on the smaller parts to get more details of the pieces that compose them. We can keep dividing infinitely, focusing down on smaller and smaller parts to form more observations, and from them more knowledge and intelligence. The case is similar with time. Limit it and view just that limited duration, and it becomes a finite, higher-level duration for a top-level event; keep splitting it into smaller and smaller time slices, and keep focusing on those slices, and the higher-level event breaks down into smaller and smaller triggers that come together to form the bigger event. It is as if I can take a small piece and zoom in and out infinitely, pan to a different piece, focus on it, and there again zoom in and out infinitely. This happens because of the infinitely recursive nature of everything around us.

Let us extend this concept to the data involved in driving. As we drive, what comes first is the macro-level scenery around us, which we keep panning across as we pass it. Based on our speed and intent, that macro-level information remains just that: macro-level information, with just enough absorbed to take decisions. We do not focus on any of the macro-level scenes as they pass by; we process just enough to know whether or not to react to a change in the scenery. But say we have to take a specific turn on a street, such as 4th cross or 2nd main; then we start focusing on more details related to that macro-level information. Or say there is a stop sign or a traffic light; we start processing more information about the change in the scenery. This is similar to adding bounds to the infinite data present and drilling down further and further within those bounds until we have as much information as we need to finish the task. That is, we have panned to the correct spot and zoomed in to get more details. So, in the case of a traffic light, we start asking: is the light red? Will it turn green by the time I get to the junction? Are there cars blocking my way? What is the state of traffic on the cross-road? And so on. We can keep asking more and more questions of the macro data, observe deeper and deeper data related to it, and get more information.
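To make the drill-down idea a little more concrete, here is a minimal sketch in Java. It is purely my own illustration, not an implementation from the article: the Observation class, the registerDetail/detail methods and the Supplier-based lazy loading are all assumed names. The point it shows is that deeper detail is only materialised when a goal actually zooms in, rather than being pre-collected.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * A macro-level observation that can be "zoomed into" on demand.
 * Finer-grained detail is materialised only when something asks for it,
 * mirroring the bounded drill-down described above. (Illustrative sketch.)
 */
class Observation {
    private final String name;
    private final Map<String, Supplier<Observation>> detailSources = new ConcurrentHashMap<>();
    private final Map<String, Observation> materialised = new ConcurrentHashMap<>();

    Observation(String name) { this.name = name; }

    /** Register where deeper detail can come from, without collecting it yet. */
    void registerDetail(String aspect, Supplier<Observation> source) {
        detailSources.put(aspect, source);
    }

    /** Zoom in: collect the detail only when it is actually needed. */
    Observation detail(String aspect) {
        return materialised.computeIfAbsent(aspect, key -> detailSources.get(key).get());
    }

    @Override public String toString() { return name; }
}

public class DrivingExample {
    public static void main(String[] args) {
        Observation scenery = new Observation("street scenery");
        scenery.registerDetail("traffic light",
                () -> new Observation("traffic light: red, ~8s to green"));

        // Nothing about the traffic light is collected until we zoom in.
        System.out.println(scenery.detail("traffic light"));
    }
}
```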

It should also be noted that these relations between data are not exclusive, meaning they are not tied to just one action or type of data. So, if there are wind gusts affecting the steering of my car, the same wind-gust data is also telling me about the current weather conditions. When I zoom into “the wind gust information”, whether from the logic that is driving the car or from the logic that is assessing the weather, I am accessing the same zoomed-in information. All data is connected in a multi-dimensional network of interconnected, continuous information that can be reached from anywhere based on need, zoomed into or out of as required, panned across at the top level and drilled down into at any given point. I call it multi-dimensional even though I have focused on depth in this article, because when we look at data there is no difference between the depth dimension and any other dimension. So, while the “wind gust speed” becomes a depth dimension for “driving”, it becomes a primary predictive parameter for weather conditions. Hence, I can pan across any set of information, drill down at a point and reach another piece of information that becomes the depth.
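Continuing the hypothetical Observation sketch above (again my own illustration, not the author's design), the non-exclusive relation can be expressed simply by letting two contexts register the same detail node, so that zooming in from either side reaches identical information:

```java
public class SharedDepthExample {
    public static void main(String[] args) {
        // One source of truth for the wind-gust detail.
        Observation windGust = new Observation("wind gust: 38 km/h, from NW");

        Observation driving = new Observation("driving context");
        Observation weather = new Observation("weather context");

        // Both contexts point at the same node; no relation is exclusive.
        driving.registerDetail("wind gust", () -> windGust);
        weather.registerDetail("wind gust", () -> windGust);

        // Zooming in from either context reaches the very same information.
        System.out.println(driving.detail("wind gust") == weather.detail("wind gust")); // true
    }
}
```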

The data we collect is not organised in this manner. All the data I have mentioned can be collected, but it is collected as unrelated, individual scalar pieces, and with our current representation it remains just that. We need to explicitly relate these pieces of data based on the knowledge we want to build into the system, which automatically implies that some pre-existing expert opinion is needed even to relate the data; the relations are not driven by what already exists when we collect it. In current systems we have a typical pattern of a “list of items such as orders” and a “detail for each item in the list”, but it stops at the first level. There is no such thing as an infinite drill-down of details, which is what is present in nature and what we have lost in collecting data. Hence, we have to write a whole lot of logic to learn and use these relations, and we end up creating permutation-and-combination style logic rather than the logical, sensible relations that were present in the data anyway, had we only collected them.
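A hypothetical illustration of the contrast (the record names are mine, and the sketch assumes Java 16+ records): the usual order/order-detail shape is fixed at one level by its schema, whereas a recursive detail type allows drill-down to any depth that was actually captured.

```java
import java.util.List;

// The usual, one-level-deep pattern: a list of orders, each with a flat detail.
record OrderSummary(String id, String customer) {}
record OrderDetail(String id, List<String> lineItems) {}

// A recursive alternative: every detail can itself carry further details,
// so drill-down is bounded only by what was captured, not by the schema.
record OrderNode(String label, List<OrderNode> details) {}

public class DrillDownShapes {
    public static void main(String[] args) {
        OrderNode order = new OrderNode("order #42", List.of(
                new OrderNode("line item: sensor kit", List.of(
                        new OrderNode("batch: B-17", List.of(
                                new OrderNode("calibration record", List.of())))))));
        System.out.println(depth(order)); // 4
    }

    /** Depth of the drill-down tree: 1 for the node plus its deepest detail chain. */
    static int depth(OrderNode node) {
        return 1 + node.details().stream().mapToInt(DrillDownShapes::depth).max().orElse(0);
    }
}
```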

As I said in my blog on “Data-realised algorithms vs logic-based algorithms”, the foremost need in building an AI system is to find an adaptable representation for knowledge. I have propounded that we need to leave data as is, without trying to collect it periodically, and instead create knowledge as and when required, in a form similar to a molecular structure, and utilise it appropriately. But what we need to realise is that while data-realisation allows us to encode continuity of information (all the data that plays a role in the information) into varying structures through accumulation, it is not sufficient to encode the depth of information. This is because the depth of information depends on the context of the goal rather than just on related data.

To implement this depth functionality in a system, we could either do what we do today, pre-collecting data and trying to relate it using logic, or find a way to extract it just in time, as and when required. As I said, the data is already related when it is present simply as data; it is just a question of using it correctly. What is needed, then, is to trigger the correct collection of data based on the requirement. As I have said before, I find that we can learn a lot from “protein creation” in the body. mRNA is formed from the DNA, encoding what protein needs to be created; subsequently this mRNA forms the encoded protein based on various parameters. Is this not similar to a trigger-based collection of data? Isn't this actually encoding what data needs to be collected? All we need is a similar trigger that causes the correct “molecular structure” to be created at the correct location, based on the goal that needs to be achieved. So, the “data-realised algorithm” needs a controllable, programmable trigger, and we will get the infinite depth we are looking for without actually collecting and storing sampled data.
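As a rough sketch of what such a programmable trigger might look like in code (entirely my own illustration; GoalTrigger, fire and the sensor names are assumptions, not the author's design), a goal declares which sources it needs, and the data is read and assembled only when the trigger fires, rather than being sampled periodically:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * A goal-driven trigger: instead of periodically sampling every sensor,
 * the goal declares which sources it needs, and the structure is
 * assembled only when the trigger fires. (Illustrative sketch.)
 */
class GoalTrigger {
    private final String goal;
    private final Map<String, Supplier<Double>> sources;

    GoalTrigger(String goal, Map<String, Supplier<Double>> sources) {
        this.goal = goal;
        this.sources = sources;
    }

    /** Fire the trigger: read only the sources this goal cares about, right now. */
    List<String> fire(List<String> neededSources) {
        List<String> assembled = new ArrayList<>();
        for (String name : neededSources) {
            assembled.add(goal + " <- " + name + " = " + sources.get(name).get());
        }
        return assembled;
    }
}

public class TriggerExample {
    public static void main(String[] args) {
        Map<String, Supplier<Double>> sensors = Map.of(
                "wind gust", () -> 38.0,          // stand-ins for live sensor reads
                "road temperature", () -> 21.5);

        GoalTrigger steering = new GoalTrigger("adjust steering", sensors);
        // Only the data the goal requires is collected, and only when asked.
        steering.fire(List.of("wind gust")).forEach(System.out::println);
    }
}
```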

Published on Java Code Geeks with permission by Raji Sankar, partner at our JCG program. See the original article here: The missing depth dimension in data

Opinions expressed by Java Code Geeks contributors are their own.
