A startup working on an IoT home water usage and leak detection monitor hired us to build a crucial feature of their product. The purpose of the device is to help homeowners monitor and reduce their home water consumption by detecting inefficiencies and minor leaks, as well as to shut off the water in the case of a catastrophic leak. Equipped with sensors that measure flow rate, temperature, and pressure, the device is installed on the main water line of a house. The data it collects is transmitted wirelessly, and homeowners can then check their usage and receive reports and alerts via a mobile app.
We essentially designed the ‘smart’ support for this smart device in the form of a Machine Learning system that continually evolves with a household’s water consumption, detecting inefficiencies such as running toilets and dripping pipes while tracking big picture trends and changes in water usage rates.
When they came to us, the company had already designed the device and most of the backend that was unrelated to machine learning. What they needed was the data infrastructure to handle the volume of data being produced, as well as the Machine Learning algorithms that could analyze the data, consistently output results, and enable the device to deliver on its promise of helping users save money and prevent leaks.
Scalability to handle large volumes of data was one of our chief concerns in designing the data pipeline for this project. With each device providing 10 data points per second, a single device produces 864,000 points per day. That means that just 12 devices will produce over 10 million data points daily. What was needed was a pipeline meticulously engineered for scalability and performance to handle increasing load as the company grows and more devices are installed.
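The volume figures above follow from simple arithmetic, sketched below. The sampling rate of 10 points per second comes from the text; the function name `daily_points` is ours, chosen only for illustration.

```python
SAMPLES_PER_SECOND = 10         # per device, as stated above
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_points(devices: int) -> int:
    """Total data points produced per day by a fleet of `devices` devices."""
    return devices * SAMPLES_PER_SECOND * SECONDS_PER_DAY


print(daily_points(1))   # 864000
print(daily_points(12))  # 10368000
```

The load grows linearly with the number of installed devices, which is why the pipeline had to be sized for the fleet rather than for a single household.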
Additionally, the system required a combination of robustness and sensitivity. The initial ML system needed to be able to quickly estimate and recognize typical water usage patterns for a given house and, at the same time, be robust enough to recognize outliers for what they are: occasional spikes in water usage should not be confused with catastrophic leaks and lead to unnecessary shutoffs. The system also needed to be designed so that its individual components could be scaled independently to account for a wide range of loads, as different households have different baseline water usage depending on habits, efficiency of household appliances, and other factors.
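To illustrate the tension between robustness and sensitivity, here is a minimal sketch of one common approach: estimating a household baseline with the median and the median absolute deviation (MAD), which a single spike barely moves. This is an assumption chosen for illustration, not a description of the algorithm actually deployed; the function names, the sample readings, and the threshold of 5 are all hypothetical.

```python
from statistics import median


def robust_baseline(flows):
    """Return (median, MAD) for a list of flow-rate readings."""
    center = median(flows)
    mad = median([abs(x - center) for x in flows])
    return center, mad


def is_outlier(value, center, mad, threshold=5.0):
    """Flag a reading whose robust z-score exceeds `threshold` (hypothetical default)."""
    if mad == 0:
        return value != center
    # 1.4826 rescales the MAD so it is comparable to a standard deviation
    # when the underlying data is roughly normal.
    return abs(value - center) / (1.4826 * mad) > threshold


# Hypothetical readings with one short spike.
flows = [2.0, 2.1, 1.9, 2.2, 2.0, 2.1, 9.5, 2.0]
center, mad = robust_baseline(flows)
print(is_outlier(9.5, center, mad))  # True: the spike stands out
print(is_outlier(2.2, center, mad))  # False: normal variation
```

Because the baseline is built from medians, the spike is flagged without dragging the estimate of typical usage upward. Deciding whether a flagged reading is a harmless spike or a leak would still need further logic, such as how long the elevated flow persists.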
Finally, the entire system needed to be auditable. Each device may have up to 5 parameters at a time, each of which can be independently reviewed for proposed updates. The ML system also suggests new parameters based on changes in usage. These can be approved either automatically or manually, depending upon the system's configuration. All suggested changes to parameters, whether accepted or rejected, are tracked.
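The audit requirement can be sketched as an append-only log in which every suggestion produces a record, whether or not it is applied. This is a simplified in-memory illustration under our own assumptions; the class names, field names, and the parameter `max_flow_lpm` are hypothetical and do not reflect the production schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List, Optional

MAX_PARAMETERS = 5  # per device, as stated above


class Status(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class AuditEntry:
    device_id: str
    parameter: str
    old_value: Optional[float]
    proposed_value: float
    status: Status
    decided_by: str  # e.g. "auto" or a reviewer's name
    timestamp: datetime


@dataclass
class Device:
    device_id: str
    parameters: Dict[str, float] = field(default_factory=dict)
    audit_log: List[AuditEntry] = field(default_factory=list)

    def review(self, name: str, proposed: float, accept: bool, decided_by: str) -> AuditEntry:
        """Apply or discard one suggested change; the decision is logged either way."""
        is_new = name not in self.parameters
        if accept and is_new and len(self.parameters) >= MAX_PARAMETERS:
            raise ValueError("device already has the maximum number of parameters")
        entry = AuditEntry(
            device_id=self.device_id,
            parameter=name,
            old_value=self.parameters.get(name),
            proposed_value=proposed,
            status=Status.ACCEPTED if accept else Status.REJECTED,
            decided_by=decided_by,
            timestamp=datetime.now(timezone.utc),
        )
        if accept:
            self.parameters[name] = proposed
        self.audit_log.append(entry)
        return entry


device = Device("dev-1")
device.review("max_flow_lpm", 40.0, accept=True, decided_by="auto")
device.review("max_flow_lpm", 55.0, accept=False, decided_by="ops")
print(device.parameters)      # {'max_flow_lpm': 40.0}
print(len(device.audit_log))  # 2
```

Rejected suggestions leave the parameters untouched but still appear in the log, so the full history of what the ML system proposed can be reconstructed later.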
We designed and implemented the algorithms, data pipeline, and RESTful API needed for estimating and managing device parameters. The result is a system that continually tests for changes in the underlying data distribution, i.e., water usage, and makes or suggests appropriate updates to the device's parameters. The system is cloud-native and runs on AWS using managed services, which means that the client never has to deal with backups and can dynamically scale computing capacity.
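As an illustration of what testing for a change in the underlying data distribution can look like, the sketch below compares a reference window of usage readings with a recent one using a two-sample Kolmogorov-Smirnov statistic. The choice of test is our assumption, since the specific method used in the project is not described here; the function names and sample data are hypothetical. The critical value uses the standard large-sample approximation and assumes independent samples, which real time-series data may not satisfy.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def distribution_changed(reference, recent, alpha=0.05):
    """True if the two samples differ at significance level alpha (asymptotic test)."""
    n, m = len(reference), len(recent)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical


# Hypothetical usage readings: a reference window and a shifted recent window.
reference = [i / 10 for i in range(100)]
shifted = [x + 5 for x in reference]
print(distribution_changed(reference, reference))  # False
print(distribution_changed(reference, shifted))    # True
```

A detected change would then feed the approval flow: the system either updates the affected parameters automatically or raises a suggestion for manual review.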