Just how “Data Intelligent” is your company?

Whilst terms like “Big Data”, “Data Analytics”, “Business Intelligence” and “Data Science” have seemingly been around for many years, not many companies have really understood the boundaries between them, or the interrelationships among them, well enough to turn their data efforts into genuine business impact.

Business impact is the key end goal from any investment in data initiatives in your company. Whilst data exploration is always a useful exercise, if it does not lead to benefit for either your internal organisation or your customers, then it can be a waste of company resources.


Although the specific approach to the application of analytics – either through BI, Data Science, or application building – may vary according to an enterprise’s needs, it is important to note the broad applicability of BI. Its capacities are constantly expanding to include greater access to more forms of data in intuitive, interactive ways that favor non-technical users. Consequently, the business can do more with the data accessed through these tools in less time than it used to, which makes applying discovery-based BI an excellent starting point for the deployment of analytics. A nice approach outlined by Michael Li of LinkedIn here shows an EOI model for driving business value.

[Figure: EOI model for driving business value]

According to Gartner: “By 2015, ‘smart data discovery,’ which includes natural-language query and search, automated, prescriptive advanced analytics and interactive data discovery capabilities, will be the most in-demand BI platform user experience paradigm, enabling mainstream business consumers to get insights (such as clusters, segments, predictions, outliers and anomalies) from data.”

Data Transformation is key

Companies around the globe normally have these questions to answer: Just where is all my data? What format is it in? Can I use it? A large part of the challenge in maximizing the business impact from your data is understanding what I like to call your “Data Atlas”, and it is normally a journey. The larger the company, the greater the size of this challenge. Multinationals, for example, having been in existence for a long period of time, have been accumulating data for a longer period of time, and it is common for them to have multiple data centers, hosting strategies, database types, data types, data formats, and ways in which the data is actually used. It can be difficult for these companies to get their data into the formats required for the latest data software platforms, and this can be a time-consuming exercise, which can delay the point at which the data starts to deliver business impact.

Looking at the industry, one company that is doing wonders in solving this type of challenge for companies is Analytics Engines, based out of Belfast. Their “Fetch Do Act” methodology offers a click-to-deploy, end-to-end big data analytics platform that enables rapid transformation of your data into business insights within a single environment. Check it out here. The major advantage of this approach is that it accelerates your data transformation, so you can focus more of your time on the “Act” element. Remember: Big Data is just a tool.

Defining Data Science?

Explore. Hypothesize. Test. Repeat.

That’s what scientists do. We explore the world around us, come up with hypotheses that generalize our observations, and then test those hypotheses through controlled experiments. The positive and negative outcomes of those experiments advance our understanding of reality. Now one of the best definitions for Data Science I have come across is described by DATAVERSITY™ as:

“Data Science combines the allure of Big Data, the fascination of Unstructured Data, the precision of advanced mathematics and statistics, the innovation of social media, the creativity of storytelling, the investigation and inquiry of forensics, and the ability to use all of those skills together while still being able to demonstrate the results to non-technical audiences.”

Just like in any other scientific discipline, everything you do with a sample, whether it be biological, chemical or physical, is part of the science: up-front analysis, sampling, applying statistics, and interpreting and securing the end results.

Beware the Hype

Industry indicates that the hype curve of analytics has peaked, but as it settles, terms like machine learning and predictive analytics are coming up the hype curve, and will have a huge role to play in the coming years. But ensure you only adopt them when the use cases require them. See past the buzz and ensure your strategy takes on board industry trends, but is somewhat unique to the personality of your company. Stay focused, and ensure simplicity is at the forefront of your mind. It is also becoming easier to outsource and partner on some of these advanced methods; typing “machine learning platform” into Google will return numerous results.

Customer Centric Analytics

Exploration and experimentation are an important part of your data journey. The key is not to let them become all you do, and to understand the difference between insight and impact. Insight does not result in improvement unless you can translate it into business impact. The “data to action” loop below does a nice job of visualizing the difference between data to insight and insight to action.

[Figure: the “data to action” loop – data to insight vs insight to action]

Know your customer. Every data custodian has one. The IT Manager’s customer is the Data Architect, whose customer is the Data Scientist. They in turn must ensure they meet the requirements of the business sponsor. Having a use case to solve or a KPI to meet will help you build comprehensive return on investment (ROI) statements, and ensure quicker acceptance of the importance of analytics in your company’s business future.


Numenta and MemComputing: Perfect AI Synergy


Let’s look at two forces of attraction that are happening in the technology space, specifically around creating truly artificially intelligent systems, utilizing advances in both software and hardware technologies.

For years, even decades, we have chased it. AI has been at the top of the list for research interest groups, and while there have been some advances, the pertinent challenge has been that when hardware electronics advanced in the 70s and 80s, software design was lagging behind. Then software advanced incredibly in the past decade. So now, in July 2015, we reach a key point of intersection of two “brain based technologies”, which could be built together in a way that may lead to “true AI”.

At no other point in history have we had both hardware and software technologies that can “learn” like we can, whose design is based on how our mind functions.

Numenta

First, let’s look at Numenta. Apart from having the pleasure of reading Jeff Hawkins’ excellent book “On Intelligence”, I have started to look at all the open source AI algorithms (GitHub here) that they provide. In a journey that started nine years ago, when Jeff Hawkins and Donna Dubinsky founded Numenta, the plan was to create software that was modeled on the way our human brain processes information. Whilst it’s been a long journey, the California-based startup has made accelerated progress lately.


Hawkins, the creator of the original Palm Pilot, is the brain expert and co-author of the 2004 book “On Intelligence.” Dubinsky and Hawkins met during their time building Handspring, and they pulled together again in 2005 with researcher Dileep George to start Numenta. The company is dedicated to reproducing the processing power of the human brain, and it shipped its first product, Grok, earlier this year to detect odd patterns in information technology systems. Those anomalies may signal a problem in a computer server, and detecting the problems early could save time, money or both. (Think power efficiency in servers.)

You might think, hmm, that’s not anything great for a first application of algorithms based on the mind, but it’s what we actually started doing as Neanderthals: pattern recognition. First it was objects, then it was patterns of events, and so on. Numenta is built on Hawkins’ theory of Hierarchical Temporal Memory (HTM), about how the brain has layers of memory that store data in time sequences, which explains why we easily remember the words and music of a song. (Try this in your head: try to start a song in the middle, or the alphabet. It takes a second longer to start it.) HTM became the formulation for Numenta’s code base, called the Cortical Learning Algorithm (CLA), which in turn forms the basis of applications such as Grok.
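
To make the idea of sequence memory concrete, here is a minimal, illustrative sketch in Python. It is emphatically not the CLA or HTM – just a simple first-order transition model – but it shows the basic pattern an application like Grok exploits: learn which transitions are normal over time, then score new transitions by how unexpected they are.

```python
from collections import defaultdict

class SimpleSequenceMemory:
    """Toy illustration of sequence-based anomaly detection.

    This is NOT Numenta's Cortical Learning Algorithm; it is a
    first-order transition model used only to illustrate the idea of
    learning temporal patterns and flagging values that break them.
    """

    def __init__(self):
        self.transitions = defaultdict(lambda: defaultdict(int))

    def learn(self, sequence):
        # Count how often each value follows another.
        for prev, curr in zip(sequence, sequence[1:]):
            self.transitions[prev][curr] += 1

    def anomaly_score(self, prev, curr):
        # 0.0 = fully expected transition, 1.0 = never seen before.
        seen = self.transitions[prev]
        total = sum(seen.values())
        if total == 0:
            return 1.0
        return 1.0 - seen[curr] / total


# Train on a "normal" pattern of server states, then score new readings.
memory = SimpleSequenceMemory()
memory.learn(["idle", "busy", "idle", "busy", "idle", "busy"])

print(memory.anomaly_score("idle", "busy"))   # 0.0: expected transition
print(memory.anomaly_score("busy", "crash"))  # 1.0: unseen, anomalous
```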

Still with me? Great. So that’s the software, designed and built on the layers of the cortex of our brains. Now let’s look at the hardware side.


Memcomputing

After reading this article in Scientific American recently, at the same time as reading Hawkins’ book, I really began to see how these two technologies could meet somewhere, silicon up, algorithms down.


A new computer prototype called a “memcomputer” works by mimicking the human brain, and could one day perform notoriously complex tasks like breaking codes, scientists say. These new, brain-inspired computing devices also could help neuroscientists better understand the workings of the human brain, researchers say.

In a conventional microchip, the processor, which executes computations, and the memory, which stores data, are separate entities, so data must be shuttled back and forth between them. This constant transfer of data between the processor and the memory consumes energy and time, thus limiting the performance of standard computers.

In contrast, Massimiliano Di Ventra, a theoretical physicist at the University of California, San Diego, and his colleagues are building “memcomputers,” made up of “memprocessors,” that can actually store and process data. This setup mimics the neurons that make up the human brain, with each neuron serving as both the processor and the memory.

I won’t go into the specifics of how the building blocks are designed, but they are based on three basic components of electronics – capacitors, resistors and inductors – or, more aptly, their memory counterparts: memcapacitors, memristors and meminductors. The paper describing this is here.

Di Ventra and his associates have built a prototype from standard microelectronics. The scientists investigated a class of problems known as NP-complete. With this type of problem, a person may be able to quickly confirm whether any given solution works, but cannot quickly find the best solution. One example of such a conundrum is the “traveling salesman problem,” in which someone is given a list of cities and asked to find the shortest route that visits every other city exactly once and returns to the starting city. Finding the best solution is a brute-force exercise.
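
To see why brute force collapses so quickly, here is a small illustrative Python sketch (nothing to do with the memcomputing prototype itself) that enumerates every possible tour. With n cities there are (n-1)! orderings to check, which is exactly the combinatorial explosion memcomputers aim to sidestep.

```python
from itertools import permutations

def shortest_tour(cities, dist):
    """Brute-force traveling salesman: try every ordering of the
    remaining cities, starting and ending at the first city."""
    start, *rest = cities
    best_tour, best_len = None, float("inf")
    for perm in permutations(rest):
        tour = (start, *perm, start)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len


# Toy distance matrix for 4 cities; (n-1)! tours must be checked,
# which is why the approach collapses as n grows.
dist = {
    "A": {"A": 0, "B": 2, "C": 9, "D": 10},
    "B": {"A": 2, "B": 0, "C": 6, "D": 4},
    "C": {"A": 9, "B": 6, "C": 0, "D": 3},
    "D": {"A": 10, "B": 4, "C": 3, "D": 0},
}
print(shortest_tour(["A", "B", "C", "D"], dist))
```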

The memprocessors in a memcomputer can work together to find every possible solution to such problems. “If we work with this paradigm shift in computation, those problems that are notoriously difficult to solve with current computers can be solved more efficiently with memcomputers,” Di Ventra said. In addition, memcomputers could tackle problems that scientists are exploring with quantum computers, such as code breaking.

Imagine running software that is designed based on our minds, on hardware that is designed on our minds. Yikes!

In a future blog, I will discuss what this means in the context of the internet of things.



Distributed Analytics in IoT – Why Positioning is Key


The current global focus on the “Internet of Things (IoT)” has highlighted the extreme importance of sensor-based, intelligent and ubiquitous systems in improving our lives and introducing increased efficiency into them. There is a natural challenge in this, as the load on our networks and cloud infrastructures from a data perspective continues to increase. Velocity, variety and volume are attributes to consider when designing your IoT solution, and it is then necessary to decide where the execution of analytical algorithms on the data sets should be placed.

Apart from classical data centers, there is huge potential in looking at the various compute sources across the IoT landscape. We live in a world where compute is at every juncture, from our mobile phones and our sensor devices and gateways to our cars. Leveraging this normally idle compute is important in meeting the data analytics requirements of IoT, and future research will attempt to consider these challenges. There are three main classical architecture principles that can be applied to analytics: 1) centralized, 2) decentralized, and 3) distributed.

The first, centralized, is the best known and understood today. It is a pretty simple concept: centralized compute across clusters of physical nodes acts as the landing zone (ingestion) for data coming from multiple locations, so the data sits in one place for analytics. By contrast, a decentralized architecture utilizes multiple large distributed clusters located hierarchically in a tree-like architecture. Consider the analogy of a tree, where the leaves, being closest to the sources, can compute the data earlier or distribute it more efficiently to perform the analysis. Some form of grouping can be applied to this, for example per geographical location, or some form of hierarchy set up to distribute the jobs.

Lastly, in a distributed architecture, which is the most suitable for devices in IoT, the compute is everywhere. Generally speaking, the further you move from centralized, the smaller the compute becomes, right down to the silicon on the devices themselves. Therefore, it should be possible to push analytics tasks closer to the device. In that way, these analytics jobs can act as a sort of data filter and decision maker, determining whether quick insight can be gained from smaller data sets at the edge or beyond, and whether to push the data to the cloud or discard it. Naturally, with this type of architecture there are more constraints and requirements around effective network management, security and monitoring of not only the devices, but the traffic itself. It makes more sense to bring the computation power to the data, rather than the data to a centralized processing location.
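
As a rough illustration of that filter-and-decide role, the sketch below (my own simplified example, not any particular vendor’s implementation) has an edge gateway summarise a window of readings and forward only the outliers to the cloud; the threshold and window are illustrative choices.

```python
import statistics

def edge_filter(readings, threshold=2.0):
    """Decide at the edge which sensor readings are worth forwarding.

    Returns a compact summary plus only those readings that deviate
    strongly from the local window; everything else stays local,
    saving bandwidth to the central cloud.
    """
    mean = statistics.mean(readings)
    stdev = statistics.pstdev(readings) or 1.0
    anomalies = [r for r in readings if abs(r - mean) / stdev > threshold]
    summary = {"count": len(readings), "mean": mean, "stdev": stdev}
    return summary, anomalies


# A gateway buffers a window of temperature readings, then forwards
# only the summary and the outliers instead of the raw stream.
window = [21.0, 21.2, 20.9, 21.1, 35.5, 21.0, 20.8]
summary, to_forward = edge_filter(window)
print(summary)
print(to_forward)  # [35.5] -> only this value is pushed to the cloud
```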

There is a direct relationship between the smartness of the devices and the selection and effectiveness of these three outlined architectures. As our silicon gets smarter, more powerful and more efficient, more and more compute will become available, which should result in less strain on the cloud. As we distribute the compute, it should mean more resilience in our solutions, as there is no single point of failure.

In summary, the “Intelligent Infrastructures” now form the crux of the IoT paradigm. This means that there will be more choice for IoT practitioners to determine where they place their analytics jobs to ensure they are best utilizing the compute that is available, and ensuring they control the latency for faster response, to meet the real time requirements for the business metamorphosis that is ongoing.

Why IoT practitioners need to “Wide Lens” the concept of a Data Lake

As we transition towards the vast quantity of devices that will be internet-enabled by 2020 (anything from 50-200 billion, experts estimate), it seems that the current cloud architectures being proposed are somewhat short on the features required to meet customers’ data requirements in 2020.

I won’t dive hugely into describing the technology stack of a Data Lake in this post (Ben Greene from Analytics Engines in Belfast, who I visit on Wednesday en route to Enter Conf, does a nice job of that in his blog here). As a quick side step, looking at the Analytics Engines website, I saw that customer choice and ease of use are among their architecture pillars in providing their AE Big Data Analytics Software Stack: quick to deploy, modular, configurable, with lots of optional high performance appliances. It’s neat to say the least, and I am looking forward to seeing more.

The concept of a Data Lake has a large reputation in current tech chatter, and rightly so. It has huge advantages in enterprise architecture scenarios. Consider the use case of a multinational company with 30,000+ employees, countless geographically spread locations and multiple business functions. So where is all the data? It is normally a challenging question, with multiple databases, repositories and, more recently, Hadoop-enabled technologies storing the company’s data. This is the very reason why a business data lake (BDL) is a huge advantage to the corporation. If a company has a Data Architect at its disposal, then it can develop a BDL architecture (such as shown below, ref – Pivotal) that can be used as a landing zone for all its enterprise data. This makes a huge amount of sense. Imagine being the CEO of that company: as we see changes in the Data Protection Act(s) over the next decade, the company can take the right steps towards managing, scaling and, most importantly, protecting its data sets. All of this leads to a more effective data governance strategy.

[Figure: business data lake architecture (ref – Pivotal)]

Now shift focus to 2020 (or even before). Let’s take a look at the customer landscape. The customers that will require what the concept of a BDL now provides will need far more choice, and won’t necessarily be willing to pay huge sums for that service. Whilst there is some customer choice today, such as Pivotal Cloud Foundry, Amazon Web Services, Google Cloud and Windows Azure, even these services are targeted at a consumer base from startups upwards in the business maturity life cycle. The vast majority of cloud services customers in the future will be everyone around us, the homes we live in and beyond, and the requirement to store data in a far-distant data center might not be as critical for them. It is expected they will need far more choice.

Consider the case of building monitoring data, which could be useful to a wider audience as part of a secure linked open data sets (LODs) topology. For example, a smart grid provider might be interested in energy data from all the buildings, to suggest optimal profiles for them and reduce impact on the grid. Perhaps the provider might even be willing to pay for that data? This is where data valuation discussions come into play, which is outside the scope of this blog. But the building itself, or its tenants, might not need to store all their humidity and temperature data, for example. They might want some quick insight up front, and then might choose to bin that data (based on some simple protocol describing the data usage) in their home.

Whilst a BDL is built on the premise of “Store Everything”, and that will bring value for the organisations monitoring consumers of their resources, individual consumers might not be willing to pay for this.

To close, the key enablers of these concepts are real-time edge analytics and increased data architecture choice. And this is beginning to happen. Cisco have introduced edge analytics services into their routers, and this is a valid approach to ensuring that the consumer has choice. They are taking the right approach, as there are even different services for different verticals (Retail, IT, Mobility).

In my next blog, Edge Analytics will be the focus area, where we will dive deeper into the question: “Where do we put our compute?”

Case Study: IoT Technology Platform – ThingWorx [10]

In my previous blog, I mentioned some platform design considerations at the outset. In this blog, I discuss one such Platform that has gained significant traction in the industry in recent times.

About ThingWorx10

ThingWorx is one of the first software platforms designed to build and run the applications of the connected IoT world. ThingWorx reduces the cost, time, and risk required to build innovative Machine-to-Machine (M2M) and Internet of Things (IoT) applications.

The ThingWorx platform provides a complete application design, runtime, and intelligence environment with the below features:

  • Modern and Complete Platform
  • Mashup People, Systems & Machines
  • Deploy 10X Faster with Model-based Development
  • Deploy How You Like
  • Evolve & Grow Your Application Over Time

What ThingWorx did that was really clever was to create a modelling environment based on a graph database that keeps track of thousands of devices that communicate with other devices and applications.

“There’s nothing new about gathering and using data to make something better. What is new, and complex, is getting these things that are now web-enabled to take better advantage of the IoT. This requires application developers to rethink how they collect, analyze, manipulate and interact with information,” said Russ Fadel, CEO, ThingWorx9. “ThingWorx is the first software platform on the market designed to build and run applications in the connected IoT world and offers a fully integrated and pre-architected solution that covers connectivity, event processing, analytics, storage and presentation of any kind of M2M and IoT data. Our goal is to provide customers with instant insight into collected data from these smart, connected things so they can be proactive and address issues before they happen in a smarter way than previously able.”10

Figure 7: ThingWorx Architecture [10]

Features10

ThingWorx Composer™

ThingWorx Composer is an end-to-end application modeling environment designed to help you easily build the unique applications of today’s connected world. Composer makes it easy to model the Things, Business Logic, Visualization, Data Storage, Collaboration, and Security required for a connected application.

Codeless Mashup Builder

ThingWorx “drag and drop” Mashup Builder empowers developers and business users to rapidly create rich, interactive applications, real-time dashboards, collaborative workspaces, and mobile interfaces without the need for coding. This next-generation application builder reduces development time and produces high quality, scalable connected applications which allows companies to accelerate the pace at which they can deliver value-add solutions, resulting in greater market share against new and existing competitors.

Event-Driven Execution and “3D” Storage

ThingWorx’s event-driven execution engine and 3-Dimensional storage allows companies to make business sense of the massive amounts of data from their people, systems, and connected “Things” – making the data useful and actionable. The platform supports scale requirements for millions of devices, and provides connectivity, storage, analysis, execution, and collaboration capabilities required for applications in today’s connected world. It also features a data collection engine that provides unified, semantic storage for time-series, structured, and social data at rates 10X faster than traditional RDBs.

Search-based Intelligence

ThingWorx SQUEAL™ (Search, Query, and Analysis) brings Search to the world of connected devices and distributed data. With SQUEAL’s interactive search capabilities, users can now correlate data that delivers answers to key business questions. Pertinent and related collaboration data, line-of-business system records, and equipment data get returned in a single search, speeding problem resolution and enabling innovation.

Collaboration

ThingWorx dynamically and virtually brings together people, systems, and connected equipment, and utilizes live collaboration sessions that help individuals or teams solve problems faster. The ThingWorx data store becomes the basis of context aware collaboration and interaction among the systems users, further enhancing its value. Additionally, the tribal knowledge exposed during the process is automatically captured and indexed for use in future troubleshooting activities.

End of Case Study

References 

10: ThingWorx: About ThingWorx

http://www.thingworx.com/

IoT meets Data Intelligence: Instant Chemistry

Even in the ideal world of a perfect network topology, a web of sensors, a security profile, a suitable data center design, and lots of applications for processing and analyzing, one thing is constant across all of these: the data itself. Data science is well talked about, and careers have been built on the concept. It is normally aimed at the low-hanging fruit of a data set, things that are easily measured. Science will take you so far, but it is data intelligence that will show the true value, with the capability to predict the impact of actions and track it over time, building modelling engines to solve future problems.

Even the data set is different for data intelligence as opposed to data science, which relies on lots and lots of data sets (Facebook working out the effectiveness of their changes and features, for example). It is more complex, smaller even, and can be a data set contained in a single process or building. Imagine a hospital’s set of machines producing live data for an analytics engine, which uses historical models to compare against the live data and gauge risk to patients. It can have a real, tangible benefit to quality of life. Commonly called “Operational Intelligence”, the idea is to apply real-time analytics to live data with very low latency. It’s all about creating that complete picture: historical data and models working with live data to provide a solution that can potentially transform all kinds of industry.

At the core of any system of this kind is decision making. Again, one must strive to make this as intelligent as possible. There are two types of decision making: the first is static decision making and the second is dynamic decision making. With the assistance of mathematical models and algorithms, it will be possible to analyze, for any IoT data set, the further implications of alternative actions. As such, one would predict that the efficiency of decision making would be increased.

At the IoT device level, there is scope to apply such a solution. Given the limited storage capacity on the devices themselves, a form of rolling deterministic algorithm could analyse a set of sensor readings and produce an output of whether or not to send a particular measurement to the intelligent gateway or cloud service.
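
A minimal sketch of what such a rolling deterministic check could look like, assuming a fixed-size window and a simple deviation margin (the window size and margin are illustrative parameters, not values from any real deployment):

```python
from collections import deque

class RollingSendDecision:
    """On-device rolling check: forward a reading only when it differs
    from the recent rolling average by more than a fixed margin.
    Memory use is bounded by the window size, which matters on
    constrained IoT silicon."""

    def __init__(self, window_size=20, margin=1.5):
        self.window = deque(maxlen=window_size)
        self.margin = margin

    def should_send(self, reading):
        if len(self.window) < self.window.maxlen:
            self.window.append(reading)
            return True  # not enough history yet: be conservative, send
        rolling_mean = sum(self.window) / len(self.window)
        self.window.append(reading)
        return abs(reading - rolling_mean) > self.margin


# Warm-up readings are sent; after that only the jump to 24.0 is forwarded.
decider = RollingSendDecision(window_size=5, margin=1.0)
for value in [20.0, 20.1, 19.9, 20.0, 20.2, 20.1, 24.0]:
    if decider.should_send(value):
        print("send", value)
```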

Another proposed on-device implementation might be to use a deviation-from-correctness model, such as the Mahalanobis-Taguchi System (MTS), an information pattern technology that has been used in different diagnostic applications to help make quantitative decisions by constructing a multivariate measurement scale using data analytic methods. In the MTS approach, the Mahalanobis distance (MD, a multivariate measure) is used to measure the degree of abnormality of patterns, and principles of Taguchi methods are used to evaluate the accuracy of predictions based on the scale constructed. The advantage of MD is that it considers correlations between the variables, which are essential in pattern analysis. Given that it can be used on a relatively small data set – the greater the number of historical samples, the better the model to compare against – it could be utilized in the example of hospital diagnosis. Perhaps the clinician might need a quick on-device prediction of how close a patient’s measurement is to a sample set of recent hospital measurements?
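
For illustration, the Mahalanobis distance itself is straightforward to compute from a small reference set. The sketch below uses NumPy and entirely hypothetical vital-sign numbers; it is not the full MTS (there is no Taguchi optimization of the variable set), just the distance measure that underpins it.

```python
import numpy as np

def mahalanobis_distance(sample, reference):
    """Distance of one measurement vector from a reference set,
    taking correlations between variables into account.

    reference: (n_samples, n_features) array of 'healthy' measurements
    sample:    (n_features,) vector to score
    """
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    cov_inv = np.linalg.inv(cov)
    diff = sample - mean
    return float(np.sqrt(diff @ cov_inv @ diff))


# Hypothetical vital signs (heart rate, systolic BP) from recent patients.
reference = np.array([
    [72, 118], [75, 121], [70, 115], [78, 124], [74, 119], [73, 120],
])
print(mahalanobis_distance(np.array([74, 120]), reference))   # small: normal
print(mahalanobis_distance(np.array([110, 180]), reference))  # large: abnormal
```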

Taking this one stage further, if we expanded this to multiple hospitals, could we start to think about creating linked data sets that would be pooled together to extract intelligence? What if a weather storm is coming? Will it affect my town or house? Imagine if we could have sensors on each house, tracking the storm in real time to predict its trajectory and track direction changes, and the service could then communicate directly with the home owners in its path.

With the premise of open source software in mind, consider now the concept of open data sets, linked or not. Imagine I was the CEO of a major company in oil and gas, and I was eager to learn from other companies in my sector, and in turn allow them to learn from us through data sets. Tagging data by type (financial, statistical, online statistical, manufacturing, sales, for example) allows a metadata search engine to be created, which can then be used to gain industry-wide insight at the click of a mouse. The tagging is critical, as the data is then not simply a format, but descriptive also.
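
A toy sketch of that idea, with made-up data set names and tags, might look like this: each data set carries descriptive tags, and a trivial search function stands in for the metadata search engine.

```python
# Hypothetical catalogue entries: every data set is tagged by type and
# owner so a metadata search engine can find it across companies.
catalogue = [
    {"name": "q3_well_output", "owner": "CompanyA",
     "tags": ["manufacturing", "statistical"]},
    {"name": "rig_sensor_daily", "owner": "CompanyB",
     "tags": ["manufacturing", "online statistical"]},
    {"name": "regional_sales_2015", "owner": "CompanyA",
     "tags": ["sales", "financial"]},
]

def search(catalogue, tag):
    """Return every data set, whoever owns it, carrying the given tag."""
    return [entry for entry in catalogue if tag in entry["tags"]]

for entry in search(catalogue, "manufacturing"):
    print(entry["owner"], entry["name"])
```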

Case Study: Waylay IoT and Artificial Intelligence11

Waylay, an online cloud native rules engine for any OEM maker, integrator or vendor of smart connected devices, proposes a strong link11 between IoT and Artificial Intelligence.

Waylay proposes a central concept for AI, called the rational agent. By definition, an agent is something that perceives its environment through sensors and acts upon it via actuators. An example of this is a robot that utilizes camera and sensor technology and performs an action, i.e. “move”, depending on its immediate environment (see Figure 8 below).

To extend the role of an agent, a rational agent then does the right thing. The right thing might depend on what has happened and what is currently happening in the environment.

Figure 8: Agent and Environment Diagram for AI [11]
Typically, Waylay outlines that an agent consists of an architecture and logic. The architecture allows it to ingest sensor data, run the logic on the data and act upon the outcome.

Waylay has developed a cloud-based agent architecture that observes the environment via software-defined sensors and acts on its environment through software-defined actuators rather than physical devices. A software-defined-sensor can correspond not only to a physical sensor but can also represent social media data, location data, generic API information, etc.

Figure 9: Waylay Cloud Platform and Environment Design [11]
For the logic, Waylay has chosen graph modeling technology, namely Bayesian networks, as the core logical component. Graph modeling is a powerful technology that provides flexibility to match the environmental conditions observed in IoT. Waylay exposes the complete agent as a Representational State Transfer (REST) service, which means the agent, sensors and actuators can be controlled from the outside, and the intelligent agent can be integrated as part of a bigger solution.
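
To make the agent pattern concrete, here is a minimal sketch of the observe-decide-act loop in Python. It is not Waylay’s implementation or API: the sensor and actuator are plain callables standing in for the software-defined sensors and actuators the platform exposes over REST, and the simple threshold rule stands in for the Bayesian-network logic.

```python
import random

def temperature_sensor():
    # Stand-in for a software-defined sensor (device reading, API call,
    # social media data, location data, etc.).
    return random.uniform(15.0, 35.0)

def cooling_actuator(turn_on):
    # Stand-in for a software-defined actuator (API call, SMS, device command).
    print("cooling", "ON" if turn_on else "OFF")

def agent_step(sensors, actuators, threshold=28.0):
    """One observe-decide-act cycle of a (very) simple rational agent."""
    reading = sensors["temperature"]()
    actuators["cooling"](reading > threshold)

agent_step({"temperature": temperature_sensor},
           {"cooling": cooling_actuator})
```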

In summary, Waylay has developed a real-time decision making service for IoT applications. It is based on powerful artificial intelligence technology and its API-driven architecture makes it compatible with modern SaaS development practices.

End of Case Study 

Reference:

11: Waylay: Case study AI and IoT

http://www.waylay.io/when-iot-meets-artificial-intelligence/