A Maker’s take on “build versus buy” for data stream networks

Recently, I set out to build a smart thermostat to monitor the ambient room temperature in my home and chart the readings on my Android phone. I wanted to track the temperature of my house from anywhere in the world and get notified when the readings exceeded a threshold. Seems simple enough, right?

When I got down to the nitty-gritty details of the project, the sheer number of design choices made me reconsider whether I wanted to build the system at all. There are several options for everything from the board and sensor to the real-time network infrastructure to the charting library. That’s a good thing, but which one(s) do I pick?

To fully understand the project’s scope, I first needed to define the specific technologies that an IoT system like my smart thermostat requires.

Essential components of a real-time intelligent system

With the widespread use of mobile devices, there is a growing need to move data around in real time, be it for social chat applications, multi-player games, or financial streams. The dawn of IoT added to this phenomenon, but in its infancy the IoT consisted mostly of devices out in the field sending data to huge databases. The data flowed in one direction only, and was stored to be analyzed at a later time.

In this day and age, however, IoT consists of the ability to bi-directionally send and receive data between devices (lights, phones, sprinklers, servers, or databases) in real time. Whether it is social streams, sensor streams, or game player movements, real-time data needs to move between devices instantly, the moment it is generated. The infrastructure that routes and delivers this data in real time is called a data stream network (DSN).

A smart DSN is exactly what’s needed for the bi-directional data transfer between my connected home thermostat and my phone; the data in this instance is the temperature sensor value, sent every second. I could just as well use any other development board (Arduino, Raspberry Pi, or one of many others) to send any other information (for example, soil humidity or motion sensor values). Whatever the design choices, the infrastructure needs to deliver data streams in real time from the device that generates the data to the devices that need to receive it, which is what a DSN provides. When selecting a DSN, the options are to build your own network using one of several open-source technologies, such as Socket.io, or to take advantage of existing infrastructure to hook up devices.
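To make the idea concrete, here is a minimal sketch of the publish/subscribe pattern that a DSN provides, written in Python. It is a toy, in-process stand-in rather than a real network: the class name InMemoryDSN, the channel names, and the 27 °C threshold are all assumptions made up for illustration, and no vendor API is implied.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class InMemoryDSN:
    """Toy stand-in for a DSN: delivers each message to every subscriber of a channel."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: dict) -> int:
        """Deliver the message immediately; return the number of receivers."""
        handlers = list(self._subscribers.get(channel, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


THRESHOLD_C = 27.0  # hypothetical alert threshold, in degrees Celsius

dsn = InMemoryDSN()
chart_points: List[float] = []  # what the phone dashboard would plot
alerts: List[str] = []          # notifications raised on the phone
setpoints: List[float] = []     # commands received by the thermostat


def phone_dashboard(message: dict) -> None:
    """Phone side: chart every reading and alert when it crosses the threshold."""
    chart_points.append(message["celsius"])
    if message["celsius"] > THRESHOLD_C:
        alerts.append(f"Temperature {message['celsius']} C exceeds {THRESHOLD_C} C")


def thermostat(message: dict) -> None:
    """Device side: accept a new target temperature sent from the phone."""
    setpoints.append(message["target_celsius"])


# Data flows both ways: sensor -> phone and phone -> thermostat.
dsn.subscribe("home/temperature", phone_dashboard)
dsn.subscribe("home/setpoint", thermostat)

for reading in (21.5, 22.0, 28.3):
    dsn.publish("home/temperature", {"celsius": reading})
dsn.publish("home/setpoint", {"target_celsius": 20.0})
```

In this sketch everything runs in one process, so delivery is trivially instant and reliable. The hard parts discussed in the rest of this article begin when the publisher and the subscribers sit on different devices and different networks.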

DIY: building a real-time data stream network

The IoT makes building a DSN just a little harder than it already is because it involves several non-traditional devices talking to each other. Who would have thought that a dog collar would be able to periodically send messages to your phone with stats about the dog’s location, heart rate, and running speed? The same applies to my smart thermostat, in which a phone communicates with a connected device in my house.

I figured that since I only had a couple of devices, getting started building a real-time DSN would be pretty easy. However, whether it’s a few devices or millions of them, I quickly realized there is a lot of setup involved to ensure real-time, reliable communication between devices. What started off as an “easy” task of building a real-time IoT application quickly turned into intensive work building and, more importantly, maintaining the real-time network.

The problems I encountered while spending all of my time and energy on the DSN itself were principally around the scale, security, and management of the DSN build process:

  • Platforms supported: Today my application may use an Arduino attached to the temperature sensor and an Android phone to display the dashboard. Tomorrow, I might want to add a Raspberry Pi to control the lights in the house with an iPhone as the remote control. A real-time network must be able to support multiple platforms (be they mobile, web, or IoT devices) because you blink once and there are several more devices on the market; there are 16 types of Arduino alone, not counting Arduino-compatible boards. It isn’t sufficient to support just a couple of them since you never know which board someone (yourself included) might use in their next IoT project.
  • Latency: A real-time network must be able to route data between devices in real time, hence the name. These days we expect everything to happen in real time, be it chat, taxi dispatch, multi-player games, or home automation. The lights have to go on and off at the tap of a button, not 5 seconds later. This latency needs to be the same irrespective of where the devices are. I could be in Australia and still set the temperature at my house in California, instantly. There are plenty of Internet connectivity options for the devices (my Arduino or any other board can be connected to Wi-Fi, Ethernet, or a cellular network), but regardless of the choice, the system as a whole needs to send and receive messages in real time.
  • High availability: The above-mentioned low latency can only be achieved if the network is highly available. The network needs several points of presence so that devices can send and receive messages in real time by connecting to the datacenters closest to them. Even in the unfortunate case that several datacenters go down, the rest can continue operating the real-time network.
  • Scaling: Building a real-time application in the lab is easy: you have a couple of devices connected to a network, there are no security/firewall issues to deal with, no port forwarding required, no latency issues, etc. But scaling this setup to multiple devices that can be scattered around the globe is much more difficult. To start off with, you have to implement a method for detecting which datacenter and port to connect the client to, create mechanisms that allow different platforms to transmit data to and from each other, and figure out how to retrieve lost messages when a client loses its connection. No doubt, it’s really hard to build a real-time network that supports even a modest number of users, but maintaining that infrastructure is the real test. All your time and money are spent building the infrastructure to support your real-time application, instead of the app logic itself.
  • Security: Everybody talks about device security, and rightfully so. It is not something that can be overlooked or compromised on. The same applies to a real-time network. Security has to be baked into it, and if you are building this network for your applications, you have to do this yourself. Data that flows through the network shouldn’t be visible to anyone other than its intended recipients, and the payloads have to be securely encrypted. Building authentication and authorization systems for users is another key consideration for the network you build, and another hidden obstacle in the DIY approach, especially if you eventually need to deal with regulatory frameworks like Safe Harbor or HIPAA.
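As one small illustration of the work hiding in the scaling item above, here is a hedged Python sketch of one common way to let a client retrieve the messages it missed while disconnected: number every message and keep a bounded history to replay from. The class ReplayChannel, its method names, and the history size are hypothetical, and a real network would also have to persist and replicate this history across datacenters.

```python
from collections import deque
from typing import Deque, List, Tuple


class ReplayChannel:
    """Keeps the most recent messages so a reconnecting client can catch up."""

    def __init__(self, history_size: int = 100) -> None:
        self._next_seq = 1
        # Bounded buffer: once full, the oldest message is silently dropped.
        self._history: Deque[Tuple[int, dict]] = deque(maxlen=history_size)

    def publish(self, message: dict) -> int:
        """Store the message and return the sequence number assigned to it."""
        seq = self._next_seq
        self._next_seq += 1
        self._history.append((seq, message))
        return seq

    def messages_since(self, last_seen_seq: int) -> List[Tuple[int, dict]]:
        """Return the retained messages published after last_seen_seq."""
        return [(seq, msg) for seq, msg in self._history if seq > last_seen_seq]


channel = ReplayChannel(history_size=3)

# The phone receives the first reading, then loses its connection.
last_seen = channel.publish({"celsius": 21.5})

# The thermostat keeps publishing while the phone is offline.
for reading in (21.7, 22.1):
    channel.publish({"celsius": reading})

# On reconnect, the phone asks for everything after the last message it saw.
missed = channel.messages_since(last_seen)
```

The bounded history is the design trade-off here: a client that stays offline longer than the buffer covers will still lose messages, so the history size has to be chosen against memory cost and expected outage length.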

Using an existing real-time data stream network

On the other hand, there are several companies that provide a real-time network solution out of the box. Some of them provide the network to route data; some of them provide the ability to send text, voice, and VoIP messages in real time; while others help build comprehensive real-time applications. It really depends on what you are building, but partnering with a DSN provider will undoubtedly save you time during the development of your application.

I recently came across a post that gives a comprehensive list of the real-time network solutions out there. In part two of this article, I’ll share the perspective of a commercial vendor, Emberlight, who went through a similar DSN build-versus-buy evaluation.

Bhavana Srinivas is a developer evangelist at PubNub.