[4/23] Data Sources: A Cost-Effective Unified Data Platform for Small Organizations

By Eric Burt on November 17, 2020 in Analytics, Data Engineering

Data Sources

The unified data platform consists of four parts, starting with the sources that produce data. This component is illustrated in the left column of the figure. Here, the data sources are split into three categories. The first category, ‘Applications & APIs’, consists of third-party web applications that collect data. For example, Google Analytics collects statistics about website traffic. The data in these applications can be accessed programmatically through application programming interfaces (APIs). APIs are sets of functions and interfaces that allow computer programs to interact with each other.
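
As a rough illustration of programmatic access, the sketch below builds a request URL, defines an authenticated fetch, and parses a JSON response. The endpoint `https://api.example.com/v1/traffic`, the query parameter names, and the response shape are all assumptions made for this example; they are not the Google Analytics API, whose own client libraries and request formats would be used in practice.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint, used only for illustration; not a real service.
BASE_URL = "https://api.example.com/v1/traffic"


def build_request_url(base_url, start_date, end_date):
    """Attach the reporting period to the endpoint as query parameters."""
    query = urllib.parse.urlencode({"start_date": start_date, "end_date": end_date})
    return f"{base_url}?{query}"


def fetch_report(url, token):
    """Send an authenticated GET request and return the response body as text."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def parse_report(body):
    """Turn the assumed JSON payload into a {page: sessions} dictionary."""
    payload = json.loads(body)
    return {row["page"]: row["sessions"] for row in payload["rows"]}


# Example payload in the assumed response shape (made-up numbers).
sample_body = '{"rows": [{"page": "/home", "sessions": 120}, {"page": "/about", "sessions": 45}]}'
url = build_request_url(BASE_URL, "2020-11-01", "2020-11-17")
sessions_by_page = parse_report(sample_body)
```

`fetch_report` is defined but not called here, since the endpoint does not exist; against a real API it would supply the text that `parse_report` consumes.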

The next category of data sources is the set of databases used by the small organization. This includes relational OLTP databases that store operational and transactional data such as sales or logistics. While the Bible recipient database is still under consideration, there are WordPress databases that hold data with clear analytical value. Most of the small organization’s websites are based on the WordPress content management system. WordPress uses MySQL, an open-source relational database, to store all the necessary website data. It contains simple information such as usernames and hashed passwords, in addition to analytically valuable data such as posts, pages, and comments. This data can be combined with Google Analytics web traffic as well as custom clickstream tracking data for significant analytical effect.
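
To make the idea of combining WordPress data with web traffic concrete, here is a minimal sketch. It assumes the default WordPress table names (`wp_posts`, `wp_comments`) with only a few of their columns, and it uses Python's built-in `sqlite3` as a stand-in for MySQL so the example runs without a server. The sample rows and the pageview numbers are made up, and the "comments per 100 pageviews" metric is just one possible way to join the two sources.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a simplified WordPress-like schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wp_posts (
            ID INTEGER PRIMARY KEY, post_title TEXT, post_name TEXT,
            post_type TEXT, post_status TEXT);
        CREATE TABLE wp_comments (
            comment_ID INTEGER PRIMARY KEY, comment_post_ID INTEGER,
            comment_approved TEXT);
        INSERT INTO wp_posts VALUES
            (1, 'Hello World', 'hello-world', 'post', 'publish'),
            (2, 'About Us', 'about-us', 'page', 'publish'),
            (3, 'Draft Idea', 'draft-idea', 'post', 'draft');
        INSERT INTO wp_comments VALUES
            (1, 1, '1'), (2, 1, '1'), (3, 1, '0'), (4, 2, '1');
    """)
    return conn


def comments_per_post(conn):
    """Count approved comments for each published post or page."""
    return conn.execute("""
        SELECT p.post_name, p.post_type, COUNT(c.comment_ID) AS n_comments
        FROM wp_posts AS p
        LEFT JOIN wp_comments AS c
          ON c.comment_post_ID = p.ID AND c.comment_approved = '1'
        WHERE p.post_status = 'publish'
        GROUP BY p.ID
        ORDER BY p.ID
    """).fetchall()


def comments_per_100_views(rows, pageviews):
    """Join comment counts with pageviews keyed by the post slug."""
    result = {}
    for slug, _post_type, n_comments in rows:
        views = pageviews.get(slug, 0)
        if views:  # skip pages with no recorded traffic
            result[slug] = round(100 * n_comments / views, 2)
    return result


conn = build_demo_db()
rows = comments_per_post(conn)
# Made-up pageview counts, standing in for a Google Analytics export.
pageviews = {"hello-world": 200, "about-us": 50}
engagement = comments_per_100_views(rows, pageviews)
```

Against a live site, the same query would run through a MySQL client library instead of `sqlite3`, and the pageview dictionary would come from the analytics source rather than being hard-coded.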

The last category is unstructured data. Unstructured data is a smaller part of the data sources and refers to non-columnar data files, such as those in Google Drive. Application logs belong to this category as well. Tracking application log data will allow us to understand our application usage, performance, and security. Log data can also be used by software engineers to identify and troubleshoot bugs quickly.
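
As an illustration of how log data might be turned into something analyzable, the sketch below parses log lines and counts them by severity. The log format (`date time LEVEL message`) and the sample lines are assumptions for this example; real application logs vary, and the pattern would need to be adapted to each source.

```python
import re
from collections import Counter

# Assumed line format: "YYYY-MM-DD HH:MM:SS LEVEL message"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def count_levels(lines):
    """Count how many well-formed lines appear at each severity level."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record:
            counts[record["level"]] += 1
    return counts


def error_messages(lines):
    """Collect the messages of ERROR lines, e.g. for troubleshooting."""
    records = (parse_line(line) for line in lines)
    return [r["message"] for r in records if r and r["level"] == "ERROR"]


# Made-up sample lines in the assumed format, plus one malformed line.
sample_lines = [
    "2020-11-17 10:00:01 INFO User login succeeded",
    "2020-11-17 10:00:05 ERROR Database connection timed out",
    "2020-11-17 10:01:12 WARNING Slow response: 2.3s",
    "2020-11-17 10:02:40 ERROR Payment callback failed",
    "malformed line without a timestamp",
]
level_counts = count_levels(sample_lines)
errors = error_messages(sample_lines)
```

Lines that do not match the pattern are skipped rather than raising an error, which keeps the sketch tolerant of the irregular entries that unstructured sources tend to contain.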

______________________________________________________________________________

This post is part of a 23-part mini-series about implementing a cost-effective modern data infrastructure for a small organization. It is a small part of a whitepaper that will be released at the end of this series.
