Effective data analytics provides enormous business value for many organizations. As ever-greater amounts of diverse data become available, analytics can provide even more value. But to benefit from this change, your organization must embrace the new approaches to data analytics that cloud computing makes possible.
Microsoft Azure provides a broad set of cloud technologies for data analysis, designed to help you derive more value from your data. These services include the following:
Azure SQL Data Warehouse, providing scalable relational data warehousing in the cloud.
Azure Blob Storage, commonly called just Blobs, providing low-cost cloud storage of binary data.
Azure Data Lake Store, implementing the Hadoop Distributed File System (HDFS) as a cloud service.
Azure Data Lake Analytics, offering U-SQL, a tool for distributed data analysis in Azure Data Lake Store.
Azure Analysis Services, a cloud offering based on SQL Server Analysis Services.
Azure HDInsight, with support for Hadoop technologies, such as Hive and Pig, along with Spark.
Azure Databricks, a Spark-based analytics platform.
Azure Machine Learning, a set of data science tools for finding patterns in existing data, then generating models that can recognize those patterns in new data.
You can combine these services as needed to analyze both relational and unstructured data.
Before you can analyze data, though, you often need to move it. This might require extracting data from where it originates (such as one or more operational databases), then loading it into where it needs to be for analysis (such as a data warehouse).
You might also need to transform the data in some ways during this process. And while all of these tasks can be done manually, it usually makes more sense to automate them.
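To make the extract, transform, and load steps concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how Azure Data Factory itself works: the two in-memory SQLite databases stand in for an operational database and a data warehouse, and the table names (orders, fact_orders), column names, and function names are all hypothetical.

```python
import sqlite3


def extract(source):
    """Read raw order rows from the (hypothetical) operational database."""
    return source.execute(
        "SELECT id, customer, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Tidy customer names and convert integer cents to currency units."""
    return [
        (order_id, customer.strip().title(), amount_cents / 100.0)
        for order_id, customer, amount_cents in rows
    ]


def load(warehouse, rows):
    """Write the transformed rows into the (hypothetical) warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


def run_etl(source, warehouse):
    """Run extract, transform, and load in order; return the row count."""
    rows = transform(extract(source))
    load(warehouse, rows)
    return len(rows)


def make_sample_source():
    """Build a small in-memory stand-in for an operational database."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "  alice ", 1250), (2, "BOB", 399)],
    )
    source.commit()
    return source


if __name__ == "__main__":
    source = make_sample_source()
    warehouse = sqlite3.connect(":memory:")
    print(run_etl(source, warehouse), "rows loaded")
```

Running these three functions by hand is the manual approach; scheduling, monitoring, and retrying them across many sources is the part that benefits from automation.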
Azure Data Factory (ADF) is designed to help you address challenges like these. This cloud-based data integration service is aimed at two distinct worlds: big data and traditional data warehousing.
The big data community, which relies on technologies for handling large amounts of diverse data.
For this audience, ADF offers a way to create and run ADF pipelines in the cloud. A pipeline can access both on-premises and cloud data services. It typically works with technologies such as Azure SQL Data Warehouse, Azure Blobs, Azure Data Lake, Azure HDInsight, Azure Databricks, and Azure Machine Learning.
The traditional relational data warehousing community, which relies on technologies such as SQL Server. These practitioners use SQL Server Integration Services (SSIS) to create SSIS packages. A package is analogous to an ADF pipeline; each defines a process to extract, load, transform, or otherwise work with data.
ADF allows this audience to run SSIS packages on Azure and access both on-premises and cloud data services.
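The idea shared by ADF pipelines and SSIS packages, a named process made up of steps that work with data, can be sketched as a small Python model. This is purely conceptual and assumes nothing about the real ADF or SSIS APIs: the Activity and Pipeline classes, the build_sample_pipeline function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Activity:
    """One named step; takes the previous step's output and returns its own."""
    name: str
    run: Callable[[Any], Any]


@dataclass
class Pipeline:
    """A named, ordered list of activities (a conceptual stand-in only)."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self, data=None):
        """Run each activity in order, passing results along the chain."""
        log = []
        for activity in self.activities:
            data = activity.run(data)
            log.append(activity.name)
        return data, log


def build_sample_pipeline():
    """Assemble a three-step pipeline over made-up string data."""
    return Pipeline(
        "copy_and_clean",
        [
            # Extract: pretend these strings came from a source system.
            Activity("extract", lambda _: [" 3 ", "4", "x", "5"]),
            # Transform: keep only numeric values and convert them to int.
            Activity(
                "transform",
                lambda rows: [int(r) for r in rows if r.strip().isdigit()],
            ),
            # Load: summarize what would be written to the destination.
            Activity(
                "load",
                lambda rows: {"row_count": len(rows), "total": sum(rows)},
            ),
        ],
    )


if __name__ == "__main__":
    result, log = build_sample_pipeline().execute()
    print(log, result)
```

In this model the process definition (the list of activities) is separate from the code that runs it. That separation is why a single service can plausibly host and manage both kinds of definition.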
The critical point is this: ADF is a single cloud service for data integration across all of your data sources, whether they’re on Azure, on-premises, or on another public cloud such as Amazon Web Services (AWS).
It provides a single set of tools and a common management experience for all of your data integration. What follows takes a closer look at ADF, starting with ADF pipelines.