Archive for Products & Services

Data & Computation Challenges in Machine Learning/Predictive Analytics – An Architect’s POV

While building complex machine learning and/or predictive analytic algorithmic models, huge amounts of historical/sampled data are consumed to train and tune the models. Typically, the data comprises past observations with qualitative and/or quantitative characteristics associated with each observation. Each such observation is commonly referred to as an ‘example instance’ or ‘example case’, and observation characteristics are commonly referred to as ‘features’ or ‘attributes’. For example, an observation about a certain type of customer may contain ‘height’, ‘weight’, ‘age’ and ‘gender’, and these form the attributes of that observation.

A set of past observations is assumed to be sampled and picked at random from a ‘data universe’. Some ‘attributes’ are numeric and others are ‘discrete’. Examples of numeric attributes are ‘height’, ‘weight’ and ‘age’. The attribute ‘gender’ is discrete, with ‘male’ and ‘female’ as values. Another example of a discrete attribute is customer ‘status’, with values such as ‘general’, ‘silver’, ‘gold’ and ‘platinum’. Discrete attributes can only take particular values. There may potentially be an infinite number of those values, but each is distinct and there is no grey area in between. Some numeric attributes can also be discrete, such as currency coin denominations – 1 cent, 5 cents (nickel), 10 cents (dime), 25 cents (quarter) in US currency. Non-discrete numeric attributes are considered ‘continuous’ valued attributes. Continuous attributes are not restricted to well-defined distinct values but can take any value over a continuous range; between any two continuous data values there may be an infinite number of others. Attributes such as ‘height’, ‘weight’ and ‘age’ fall under ‘continuous’ numeric attributes.

Before any algorithmic models are developed, the entire set of observations is transformed into an appropriate mathematical representation. This includes translating qualitative discrete attributes into quantitative forms. For example, a qualitative/discrete attribute such as customer status can be translated into quantitative form by representing the value ‘general’ as 0, ‘silver’ as 1, ‘gold’ as 2 and ‘platinum’ as 3. Numeric attribute values are typically normalized. Normalization involves adjusting values measured on different scales to a notionally common scale. The most commonly used normalization is unity-based normalization, where attribute values are scaled (down) to fall into a range between 0 and 1. Numeric continuous attributes are discretized. Discretization refers to the process of converting or partitioning continuous attributes into discrete or nominal values. For example, values of a continuous attribute such as ‘age’ can be partitioned into equal interval ranges or bins – ages falling between 1 and 10 years, 11-20, 21-30, 31-40 and so forth – which is typically referred to as ‘binning’. A finite number of such well-defined ‘bins’ is considered and, based on the ‘bin’ an attribute value falls into, a distinct ‘bin’ value is assigned to that attribute. For example, if ‘age’ falls in the 1-10 ‘bin’, a value of 1 may be assigned; if it falls in the 11-20 ‘bin’, a value of 2 may be assigned; and so forth.
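As a concrete illustration, here is a minimal sketch of unity-based normalization and equal-width binning using NumPy; the attribute values and bin edges are made-up examples, not taken from any real dataset:

```python
import numpy as np

# Illustrative 'age' values for a handful of observations
age = np.array([23.0, 45.0, 31.0, 67.0, 12.0])

# Unity-based normalization: scale values into the range [0, 1]
age_normalized = (age - age.min()) / (age.max() - age.min())

# Equal-width binning: 1-10 -> bin 1, 11-20 -> bin 2, 21-30 -> bin 3, and so on
bin_edges = np.arange(0, 101, 10)           # 0, 10, 20, ..., 100
age_binned = np.digitize(age, bin_edges)    # 1-based index of the bin each value falls into

print(age_normalized)   # [0.2  0.6  0.345...  1.  0.]
print(age_binned)       # [3 5 4 7 2]
```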

Other critical issues typically encountered in dealing with attribute values are ‘missing values’, ‘inliers’ and ‘outliers’. Incomplete data is an unavoidable problem in dealing with most real-world data sources: (i) a value is missing because it was forgotten or lost; (ii) a certain feature is not applicable for a given observation instance; (iii) for a given observation, the designer of a training set does not care about the value of a certain attribute (a so-called don’t-care value). Observations with missing attribute values are less commonly discarded; more often, techniques are applied to fill in the ‘closest possible missing value’ for the attribute. Such techniques are commonly referred to as missing data imputation methods. Similarly, both ‘inlier’ and ‘outlier’ values need to be resolved. An ‘inlier’ is a data value that lies in the interior of a statistical distribution and is in error; likewise, an ‘outlier’ is a data value that lies in the exterior of a statistical distribution and is in error. There are statistical techniques to detect and remove such deviating values, the simplest being removal using quartiles.
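A hedged sketch of the two techniques just mentioned – simple mean imputation for a missing value and quartile (IQR) based outlier removal – might look like this (made-up values, NumPy only):

```python
import numpy as np

# Illustrative 'weight' values with a missing entry (NaN) and an extreme value
weight = np.array([68.0, 72.0, np.nan, 70.0, 65.0, 180.0])

# Simple imputation: replace missing values with the mean of the observed values
weight_imputed = np.where(np.isnan(weight), np.nanmean(weight), weight)

# Quartile-based outlier removal: keep values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(weight_imputed, [25, 75])
iqr = q3 - q1
keep = (weight_imputed >= q1 - 1.5 * iqr) & (weight_imputed <= q3 + 1.5 * iqr)
weight_clean = weight_imputed[keep]   # the 180.0 outlier is dropped
```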

Data preprocessing also includes attribute value transformation, feature extraction and selection, dimensionality reduction, etc., based on the applicability of such techniques. Overall, data preprocessing can often have a significant impact on the generalization performance of a machine learning algorithm. Once observations are converted into appropriate quantitative forms, each observation is considered ‘vectorized’ – essentially, each observation with quantified attribute values describes a specific point in a multi-dimensional space, and each such point is assumed to represent a ‘position vector’ with its attributes as coordinate components in each dimension. In almost all cases of building machine learning models, matrix/vector algebra is extensively used to carry out the mathematical/algebraic operations that deduce the parametric values of the algorithm/model. As such, the entire set of observations, now converted into ‘position vectors’, is represented by a matrix of vectors – either as a column matrix where each observation vector forms a column, or as a row matrix where each observation vector forms a row, based on how the machine learning algorithm consumes such matrices.
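For example, a small set of quantified observations could be assembled into a row matrix (or transposed into a column matrix); the attribute ordering and values below are illustrative assumptions only:

```python
import numpy as np

# Each row is one observation vector:
# [normalized_age, normalized_weight, gender_code, status_code]
X = np.array([
    [0.20, 0.35, 0, 1],   # observation 1
    [0.60, 0.55, 1, 3],   # observation 2
    [0.35, 0.40, 0, 0],   # observation 3
])

print(X.shape)      # (3, 4): row matrix, one row per observation

# Column-matrix form: each observation vector becomes a column
X_col = X.T
print(X_col.shape)  # (4, 3)
```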

The following are key infrastructure/technology challenges that are encountered right away while dealing with such matrix forms:

- We are talking about not hundreds or thousands of observations but typically hundreds of thousands or millions of observations. Such a matrix – or the observations in whatever other format they are represented – may not fit into a single process’s memory. Besides, there could be hundreds or thousands of attributes associated with each observation, magnifying the size further – imagine a 1 million by 100K matrix or even larger! The underlying technology/infrastructure should address this issue meticulously and in a way that does NOT affect the runtime/performance of the machine learning algorithm

- Persisting chunks of such data/matrices and reading them from disk on demand may not be an option at all, and even where it looks feasible, it will not eliminate the problem adequately

- While training/tuning the model, intermediate/temporary matrices may be created, such as Hessian matrices, which would typically be of similar shape and size or even larger, thereby demanding equal or greater amounts of memory

- Even if clustering/partitioning techniques are applied to disperse memory across a network of server nodes, such memory partitions would need to be cloned for redundancy and fail-over – similar to how the Hadoop Distributed File System (HDFS) copies chunks of large file data across multiple data nodes

- Complex matrix algebra computations are typically carried out as part of model training and convergence. This includes not just so-called simple operations such as matrix-matrix multiplications, scalar operations, additions and subtractions, but also spatial transformations such as Givens rotations and Householder transformations, typically executed as part of eigenvalue extraction, determinant evaluation, or matrix decompositions such as QR, singular value decomposition, bi-diagonalization or tri-diagonalization. All such operations would need to be carried out with high efficiency, optimizing memory footprint and avoiding high latencies

- Most algorithms need several iterations of training/tuning, and each such iteration may take minutes, hours or days to complete. As such, the underlying infrastructure/technology, apart from being scalable, should be highly reliable and fault tolerant. For a model that inherently takes a long time to train/tune, it is undesirable and unacceptable for a system component crash to force the entire model build to restart from the ground up. Long-running models should therefore have appropriate ‘save’ points so that an operation can be recovered/restored safely without a full restart and without losing mathematical meaning/accuracy (a sketch of chunked computation with such save points follows this list)

- The underlying technology/infrastructure should help take advantage of high-speed multi-core processors or, even better, graphics processing units (GPUs), resulting in faster training/convergence of complex models

- And most importantly, the underlying technology should address all of the above concerns/issues in a seamless fashion, with a system that is easy to use and easy to configure from both the algorithm modeler’s and the system administrator’s point of view
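To make the chunking and save-point ideas above concrete, here is a minimal, hedged sketch using NumPy memory-mapped arrays; the file names, sizes and the gradient-style computation are illustrative assumptions, not a prescription for any particular framework:

```python
import os
import numpy as np

# Assume a large observation matrix already persisted on disk as a raw binary file,
# so it never has to fit entirely into a single process's memory.
n_rows, n_cols, chunk = 1_000_000, 1_000, 10_000    # illustrative sizes
X = np.memmap("observations.dat", dtype=np.float32, mode="r", shape=(n_rows, n_cols))
w = np.zeros(n_cols, dtype=np.float32)              # model parameters being fitted

checkpoint = "training_checkpoint.npz"
start_row, grad = 0, np.zeros(n_cols, dtype=np.float32)

# Resume from the last save point if a previous run was interrupted
if os.path.exists(checkpoint):
    saved = np.load(checkpoint)
    start_row, grad = int(saved["row"]), saved["grad"]

# Process the matrix chunk by chunk, accumulating a gradient-like quantity and
# writing a save point after each chunk so a crash never forces a full restart
for row in range(start_row, n_rows, chunk):
    block = np.asarray(X[row:row + chunk])          # only one chunk is loaded into RAM
    grad += block.T @ (block @ w)                   # partial matrix-vector computation
    np.savez(checkpoint, row=row + chunk, grad=grad)
```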

In the next blog, I will describe how Entrigna addresses many, if not all, of these issues/concerns leveraging Entrigna’s RTES technology solution.

From Real-Time Insights To Actionable Decisions - The Road Entrigna Paves

What problem does Entrigna solve?

With the tremendous increase in computing power and decrease in memory and storage costs, today’s businesses are weighed down with a deluge of mostly disconnected data, much of it highly relevant to effective decision-making; data associated with business applications, processes, transactions, operations, customers, customer insights, products, product insights, policies, systems, business partnerships, competition, etc. produces unmanageable amounts of information in many different formats. Such data is highly volatile and inherently contains time-sensitive business intelligence reflecting momentary business performance. If detected in real time, such ‘in-the-moment’ business intelligence can provide insights that dynamically determine an optimal action to be taken in real time. Essentially, such data needs to be ploughed through to discover valuable knowledge and instantaneous insights, to turn those insights into actionable decisions, and to feed those decisions in real time back into business applications, processes and operations in a way that drives business value and profitability. A few examples of such insights and decisions are: product recommendations based on customer actions and purchase behavior, offer optimization based on customer insights, customer churn prediction, customer disservice detection and recommendation of recovery actions, dynamic pricing of products and pricing optimization based on ‘in-the-moment’ shopping-versus-buying patterns, real-time predictions of perishable inventory shrink, and anomaly detection such as fraud with recommendation of course-correction actions.

To derive intelligence and insights in real time from such volatile and varying data, there is a real need for technology with capabilities such as Machine Learning, Data Mining, Rules Processing, Complex Event Processing, Predictive Analytics, Operations Research-type Optimizations, Artificial Intelligence and other intelligence-generating mathematical algorithmic capabilities, coupled with the flexibility to mix and match such capabilities for more complex decision orchestration. The breadth of decision frameworks is necessary because different business objectives require different analytical approaches. For example, a rules engine works great when recognizing a customer for a milestone. Likewise, event processing is well suited to identifying potential customer dis-service scenarios. Finally, optimization techniques work well when deciding which promotions to place in front of the customer.

Another challenge such technology would need to address is processing and correlating high-velocity data in live streams coming from disparate data sources and in a wide variety of formats. The technology should therefore be highly scalable and fault-tolerant, with extensive provisioning for large memory distribution and massive parallelization for executing complex CPU-bound operations. From a licensing and maintenance point of view, it should be cost-effective and economically viable. However, there is no single technology readily available that offers all these capabilities out of the box and with seamless implementation.

Why is this problem a big deal?

There are commercially available technologies that individually specialize in one required capability or another. For example, there are sophisticated Business Rules Engines, technologies that excel in Complex Event Processing, and those that excel in Data Mining, Operations Research, traditional Business Intelligence, etc. Each technology perhaps works well within its own realm of decision-making capability. In almost all cases, such specialized technologies come from different product vendors. A few vendors offer some of these technologies as part of a suite, but as independent products within the suite. As such, these technologies are not necessarily designed to talk to each other and inter-operate. However, for real-life complex business decision orchestration, such capabilities would need to be combined in a flexible and seamless fashion.


Many businesses leverage traditional Business Intelligence technologies alongside Data Warehouses and Data Marts with sophisticated data mining capabilities. Such traditional approaches are either time-driven or request-driven. Operational and transactional data is staged and analytically processed in batch mode. Data mining and model building are static, mostly purposed to create reports and feed results to dashboards. Human reasoning is still needed to understand the business insights and trends so that any course-correction actions can be suggested/implemented to adapt business processes to new insights. This entire process of extracting business insights and trends spans days to weeks, and by the time such business insights are applied to business processes, the business environment may have evolved further, making the insights stale and potentially counter-productive.

Other businesses procure individually specialized technologies, as mentioned before, and make them inter-operate by developing custom middleware solutions. Even then, deriving insights and actionable decisions is not comprehensive, because the required decision orchestration is still not seamless and not fully realized; it is effectively a split brain across disparate technologies. This approach demands large capital investment in procuring disparate technologies and in developing custom middleware solutions to make such heterogeneous technologies inter-operate. As such, many business initiatives with an urgent intent of exploiting maximum advantage from real-time insights are prone to long delays and long project timelines. Moreover, because decision orchestration cannot be fully realized, such initiatives get implemented with limited scope and many requirements de-scoped, resulting in businesses losing the original business value proposition – a loss typically measured in millions of dollars – besides losing competitive edge.

How does Entrigna solve this problem?

Entrigna developed a real-time decisions platform called RTES – Real Time Expert System. RTES enables real-time decision capabilities by offering a full range of decision frameworks packaged in one technology that work together seamlessly: Rules Engine, Complex Event Processing, Machine Learning, Artificial Intelligence, Optimization, and Clustering/Classification. Essentially, the RTES platform exposes such decision frameworks as built-in modularized services that can be combined and applied to an organization’s business data on a real-time basis to identify intelligent patterns that can lead to real-time business insights. By packaging such decision capabilities in one technology, RTES enables seamless orchestration of higher-level decision services – meaning a hybrid decision service can be configured as a network of individual decision services, as in an electrical circuit, in series and in parallel. Individual decision services can be rules based, machine learning based, classification based, segmentation/clustering based, predictive or regressive, or real-time optimization based.
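To illustrate the ‘electrical circuit’ idea of wiring decision services in series and in parallel, here is a purely hypothetical sketch in Python; it is not RTES’s actual API, and all names in it are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: a decision service is a function from a context dict
# to a dict of outputs, and hybrid services are built by wiring services together.
def in_series(*services):
    """Run services one after another, each seeing the previous outputs."""
    def composed(context):
        for service in services:
            context = {**context, **service(context)}
        return context
    return composed

def in_parallel(*services):
    """Run independent services concurrently and merge their outputs."""
    def composed(context):
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda s: s(context), services))
        merged = dict(context)
        for result in results:
            merged.update(result)
        return merged
    return composed

# Example wiring: segment and score the customer in parallel, then apply an offer rule
hybrid_service = in_series(
    in_parallel(
        lambda ctx: {"segment": "gold"},        # stand-in for a clustering-based service
        lambda ctx: {"value_score": 0.82},      # stand-in for a predictive service
    ),
    lambda ctx: {"offer": "upgrade" if ctx["value_score"] > 0.8 else "none"},  # rules service
)
print(hybrid_service({"customer_id": 123}))
```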

Since RTES works on live streams of data, it promotes event-driven data integration approaches. An event can be any number of things but is usually initiated by some action or activity; examples include a customer shopping for tennis rackets online, the sale of 1,000 winter gear items in the last hour, a snowstorm being predicted for the North-East, or a flight getting rescheduled. RTES attempts to extract events from live streams of data in order to initiate the real-time decision process.

RTES processes live data, a.k.a. events, as they occur, combining the event with other valuable data or other events, gaining intelligence from the data and deciding on an action to be taken. Sometimes, knowledge of the event alone is sufficient to derive an insight and take action. More often, additional data must be leveraged to improve intelligence – for example, customer profile, transaction/sales history, channel interaction history, social activity history, and external data like weather and traffic. RTES employs a data architecture strategy commonly referred to as Data Virtualization, which integrates disparate data sources in real time into a usable format while decoupling the data processing aspect from the intelligence derivation aspect.

To enable derivation of intelligence, RTES makes it easy to combine different decision frameworks. For example, to implement a specific offer optimization requirement, RTES enables the use of decision trees to determine a customer value score, clustering to segment customers based on customer attributes, neural networks to assess purchase trends by customer segment, optimization to rank the most value-generating products, additional rules to further personalize offers to a specific customer, and CEP to augment offers based on external events such as weather and national events – all of these orchestrated seamlessly within one single technology, i.e. RTES.
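As a rough, hedged illustration of how the outputs of one framework can feed the next (using scikit-learn here purely for illustration; this is not how RTES implements these services internally, and the data is made up):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.cluster import KMeans

# Illustrative customer feature matrix and historical spend
X = np.array([[0.2, 0.3, 1], [0.8, 0.7, 3], [0.5, 0.4, 2], [0.9, 0.9, 3]], dtype=float)
spend = np.array([120.0, 560.0, 300.0, 640.0])

# A decision tree estimates a customer value score from past spend
value_scores = DecisionTreeRegressor(max_depth=2).fit(X, spend).predict(X)

# Clustering segments customers based on their attributes
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A simple rule layer personalizes the offer from the value score;
# the segment is carried alongside and could refine the offer further
for score, segment in zip(value_scores, segments):
    offer = "premium upgrade" if score > 400 else "standard promo"
    print(score, segment, offer)
```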


Once actionable decisions are determined, RTES enables such decisions to be integrated with business applications, processes and operations so that action can be taken in real time to impact business outcomes – for example, presenting optimized and personalized offers to a customer in order to help that customer buy his/her choice of products more easily, such that the business objective of increased product sales is met. RTES makes actionable decisions accessible by means of web services, messaging, direct web sockets, database adapters and other custom adapters.


RTES-enabled machine learning and AI based predictive decision models can also learn and adapt based on feedback from actions taken. Online predictive models learn from action feedback in real time, while offline predictive models learn in batch mode from action feedback that is stored first and consumed later. Typically, such feedback is already enabled for other business purposes such as Enterprise Data Management & Warehousing, and RTES can tap into existing feedback channels without having to devise new ways of consuming feedback.

How does Entrigna engage clients?

Below are the high-level steps that Entrigna typically follows when initiating a client engagement.

Working collaboratively, Entrigna engages with the client by listening to the client’s requirements for leveraging real-time intelligence, paying close attention to business goals. This is a mandatory step, more so because the concept of real-time intelligence is relatively new; as a product and services provider, it is critical for Entrigna to streamline the client’s ideas, dotting the I’s and crossing the T’s, and in the process steering the client to realize much more business value from real-time insights than initially anticipated. This includes capturing the client’s thoughts around what they presumed to be possible solutions versus what they assumed to be infeasible, impracticable, anecdotal or imaginative – ideas they thought not implementable at all because the technologies they were aware of lacked the required capability. Of course, this step is very much preliminary, with the understanding that Entrigna would help uncover more cases and opportunities for real-time intelligence during the ensuing implementation.

Once the client’s initial requirements are discovered, the next natural step is to thoroughly understand two important business aspects: 1] the business processes, applications and operations where real-time insight-driven actions would be implemented, and 2] the data – the types of data, the potential sources of existing data and the sources of new data – that would come into play in deriving intelligence.


The next step is more involved: hypotheses for different intelligence scenarios are given shape in the form of algorithmic models. Entrigna employs elements of data science, applying them to the client’s specific needs. This is very much a collaborative step, wherein the client’s subject matter experts work hand-in-hand with Entrigna to vet the intelligence models. Entrigna would quickly prototype more than one model – trained, tuned and tested. Typically, the differences between the models are mostly in how the underlying decision frameworks are mixed and matched. Entrigna would then share and review the model results to verify whether what the models predicted is close to, or exceeds, what the client anticipated. If there is a need to increase result accuracy, Entrigna would either further fine-tune the underlying decision algorithm or replace it with a more advanced one, ensuring the applicability of such decision algorithms within the mathematical feasibility boundaries set by data science.

Finally, once the models are finalized, Entrigna determines how decisions would get integrated into and consumed by downstream applications, processes or operations; Entrigna then exposes the models as intelligent decision services with appropriate service access mechanisms such as web services, messaging, etc.

Tell us if this blog helped and please do share your comments!!