Apart from the sheer volume of data generated by IoT devices, there is also a huge range of data customer requirements, both known and unknown, that will need to be considered. In this regard, the platform technology will need to be agile enough to meet this variation. How will it scale both horizontally and vertically to ensure sustainability? I started to think about profiling requirements, and about giving a personality to each IoT customer type, so that the platform can morph and adjust itself based not only on the inputs (data type, frequency, format, lifetime), but also on the outputs it needs to provide.
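To make the "personality" idea a little more concrete, here is a minimal sketch of how such a profile might be modeled. All field names and values here are hypothetical, invented purely for illustration, not part of any real platform:

```python
from dataclasses import dataclass, field

@dataclass
class IoTCustomerProfile:
    """Hypothetical 'personality' of an IoT data customer."""
    name: str
    data_type: str       # e.g. "telemetry", "events", "images"
    frequency_hz: float  # how often the device reports
    data_format: str     # e.g. "json", "protobuf", "csv"
    lifetime_days: int   # how long the data stays relevant
    outputs: list = field(default_factory=list)  # e.g. ["dashboard", "alerts"]

# A platform could inspect a profile like this to adjust its ingestion,
# storage, and output pipelines per customer type.
bulb = IoTCustomerProfile("smart-bulb", "telemetry", 10.0, "json", 2, ["alerts"])
print(bulb.name, bulb.frequency_hz)  # smart-bulb 10.0
```

The point of the sketch is simply that once the inputs and required outputs are captured as data, the platform can make routing and scaling decisions from them rather than being hard-wired per customer.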
Data latency will also be a requirement that any platform will need to first understand, and then address, depending on the application and customer requirements. In an interesting discussion today in Silicon Valley with Jeff Davis (my original hiring manager in EMC, and now senior director of the xGMO group looking at operations cloud, analytics and infrastructure services), he mentioned that at a previous company in the sensor business, latency represented a huge challenge, especially as the amount of data grew exponentially. We chatted at length about how today's consumer wants their device and technology interactions to be instant. How long will people be willing to wait for smart light bulbs or switches to respond? What if my devices are distributed? More importantly, Jeff outlined a key question: "How much is the consumer willing to pay for the added services provided by adding 'smarts' to standard everyday sensors?" This is an "understand the market" question, and should be a consideration for anyone looking at building an IoT platform.
When one starts to consider that most applications in the IoT space might require more than one industry working together, cross-collaboration is key to making it all work. Consider some of the taxi apps in use today: the taxi company provides the car locations, the application offers information on pickup points, banking services are used to pay from your account, and perhaps advertising is shown on your receipt. If a suitable arrangement is not formed between the various IT companies involved, it becomes all too easy for the "blame game" to ruin the user's experience of the application when something goes wrong.
Central to satisfying both the varying requirements of customers and the management of latency will be the concept of a customer or business data lake, powered by Hadoop or Spark technology, which will form the primary storage and processing layer in the data center. There is also the option of tiering to help address the variation in requirements for the platform, with the possibility of sending the "big hitting" data, which brings the most value in close to real time, to an in-memory database that provides a fast cache for insightful analytics. In a later blog post, I will elaborate greatly on this paragraph, so stay tuned. If the same dataset is to be used by multiple applications in a multi-tenant schema, there will be clear orchestration challenges in ensuring that the data can still be processed in real time. Other features of any data architecture for IoT could also include:
- Multiple Data Format Support
- Real Time Processing
- High Volume Data Transfer
- Geographically Agnostic
- Data Lake Archival and Snipping
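One way to picture the tiering idea described above is as a simple routing rule: the highest-value, real-time data lands in the in-memory tier, the bulk goes to the data lake, and the rest is archived. This is a sketch only; the tier names, the value score, and the thresholds are all hypothetical:

```python
def route_record(value_score: float, needs_realtime: bool) -> str:
    """Return the storage tier for an incoming IoT record.

    value_score    -- estimated business value, 0.0 to 1.0 (hypothetical metric)
    needs_realtime -- whether consumers expect near real-time analytics
    """
    if needs_realtime and value_score >= 0.8:
        return "in-memory-cache"  # the "big hitting" data: fast cache analytics
    if value_score >= 0.3:
        return "data-lake"        # Hadoop/Spark primary storage and processing
    return "archive"              # archival tier, candidate for pruning

print(route_record(0.9, True))   # in-memory-cache
print(route_record(0.5, False))  # data-lake
```

In practice the scoring function would itself be driven by the customer profiles discussed earlier, but the shape of the decision stays the same.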
As with all technology, IoT will evolve, which means that we will build on top of previous technologies, and new technologies will add to the ecosystem. The enterprise data warehouse will continue to play an important role, but a series of technology platforms will be necessary. While numerous platforms have been and will be created, one such platform, ThingWorx, is the subject of a case study in my next blog.