Sharing Data with Azure

Sharing data is a challenge shaped by competing concerns across several dimensions, including security, integrity, auditing, monitoring, trust, compliance, accessibility and inference vulnerability.

This document takes a high-level view of what’s involved when considering sharing data with people inside your company and with business partners outside your company. It includes references to the Azure solutions available to address these considerations.

Data sharing is challenging


GDPR – Essential considerations from Safeguard individual privacy with the Microsoft Cloud

GDPR and Data Governance: Your guide to Microsoft tools and technologies


What data is to be shared?

Data classification

Azure SQL Database Data Discovery and Classification

Azure SQL Database Data Discovery and Classification: Data Discovery & Classification (currently in preview) provides advanced capabilities built into Azure SQL Database for discovering, classifying, labelling & protecting the sensitive data in your databases. Discovering and classifying your most sensitive data (business, financial, healthcare, personally identifiable information (PII), and so on) can play a pivotal role in your organisational information protection posture. It can serve as infrastructure for:

  1. Helping meet data privacy standards and regulatory compliance requirements.
  2. Various security scenarios, such as monitoring (auditing) and alerting on anomalous access to sensitive data.
  3. Controlling access to and hardening the security of databases containing highly sensitive data.

Azure Information Protection

Azure Information Protection: Configure policies to classify, label and protect data based on its sensitivity. Classification with Azure Information Protection can be fully automatic, driven by users, or based on recommendations.

Data Catalog

Data Catalog: Azure Data Catalog is a fully managed cloud service whose users can discover the data sources they need and understand the data sources they find. At the same time, Data Catalog helps organizations get more value from their existing investments.

With Data Catalog, any user (analyst, data scientist, or developer) can discover, understand, and consume data sources. Data Catalog includes a crowdsourcing model of metadata and annotations. It is a single, central place for all of an organization’s users to contribute their knowledge and build a community and culture of data.

Data provider and consumer security

Collaborate securely with partners and customers

Engage with users outside your organisation while maintaining control over your sensitive apps and data. It’s easier than ever for customers and partners to connect with your business using a simple sign-up experience that works with multiple identity providers, including Microsoft, LinkedIn, Google and Facebook. You can also use customisation options to modify the web and mobile experience to fit your brand.

Role-based access control (RBAC)

RBAC: Access management for cloud resources is a critical function for any organization that is using the cloud. Role-based access control (RBAC) helps you manage who has access to Azure resources, what they can do with those resources, and what areas they have access to.

RBAC is an authorization system built on Azure Resource Manager that provides fine-grained access management of resources in Azure.

Azure Key Vault

Azure Key Vault: Azure Key Vault helps solve the following problems:

  • Secrets Management – Azure Key Vault can be used to securely store and tightly control access to tokens, passwords, certificates, API keys, and other secrets.
  • Key Management – Azure Key Vault can also be used as a key management solution. Azure Key Vault makes it easy to create and control the encryption keys used to encrypt your data.
  • Certificate Management – Azure Key Vault is also a service that lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with Azure and your internal connected resources.
  • Store secrets backed by Hardware Security Modules – The secrets and keys can be protected either by software or by FIPS 140-2 Level 2 validated HSMs.

How is the data shared?

Azure Information Protection

Azure Information Protection: Share data safely with colleagues as well as your customers and partners. Define who can access data and what they can do with it – such as allowing them to view and edit files, but not print or forward them.

Database Access

This section references generic database objects and methods that are recognised and implemented by a wide variety of Relational Database Management Systems (RDBMS). However, the links and supporting text are specific to Microsoft SQL Server as an exemplar.

Row-Level Security

Row-Level Security enables customers to control access to rows in a database table based on the characteristics of the user executing a query (for example, group membership or execution context).

Row-Level Security (RLS) simplifies the design and coding of security in your application. RLS helps you implement restrictions on data row access. For example, you can ensure that workers access only those data rows that are pertinent to their department or restrict customers’ data access to only the data relevant to their company.

The access restriction logic is located in the database tier rather than away from the data in another application tier. The database system applies the access restrictions every time that data access is attempted from any tier. This makes your security system more reliable and robust by reducing the surface area of your security system.
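The filter-predicate idea behind RLS can be sketched in plain Python. This is an illustration of the concept only – the Order type, department field and sample rows below are invented, and real RLS is defined in T-SQL with an inline table-valued function and a security policy:

```python
# A plain-Python sketch of the row-level security idea: a filter
# predicate evaluated at the data tier for every query. The Order
# type, department field and sample rows are invented.
from dataclasses import dataclass

@dataclass
class Order:
    order_id: int
    department: str
    amount: float

ORDERS = [
    Order(1, "Sales", 100.0),
    Order(2, "Finance", 250.0),
    Order(3, "Sales", 75.0),
]

def security_predicate(row, user_department):
    """A row is visible only when the caller's execution context
    matches the row's department."""
    return row.department == user_department

def query_orders(user_department):
    """The restriction lives with the data, so it applies no matter
    which application tier issues the query."""
    return [r for r in ORDERS if security_predicate(r, user_department)]

print([r.order_id for r in query_orders("Sales")])  # [1, 3]
```

Because every query path goes through the predicate, a "Sales" caller can never see Finance rows, however the query arrives.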

SQL Database dynamic data masking

SQL Database dynamic data masking: SQL Database dynamic data masking limits sensitive data exposure by masking it to non-privileged users.

Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It’s a policy-based security feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.
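The masking idea can be illustrated with a short Python sketch (not the SQL Server implementation): a policy masks designated fields in the result set while the stored rows are untouched. The field names and masks are invented, though the email mask loosely mirrors the aXXX@XXXX.com shape of SQL Server's built-in email() masking function:

```python
# Illustration only: a policy that masks designated fields in the
# result set of a query, leaving the stored data unchanged.
def mask_email(value):
    # Keep the first letter, replace the rest (aXXX@XXXX.com shape).
    return value[:1] + "XXX@XXXX.com"

def mask_default(value):
    return "xxxx"

MASKING_POLICY = {"email": mask_email, "credit_card": mask_default}

def run_query(rows, privileged):
    """Privileged users see real data; everyone else gets the
    masked result set."""
    if privileged:
        return rows
    return [
        {field: MASKING_POLICY.get(field, lambda v: v)(value)
         for field, value in row.items()}
        for row in rows
    ]

CUSTOMERS = [{"name": "Alice", "email": "alice@contoso.com",
              "credit_card": "4111 1111 1111 1111"}]
print(run_query(CUSTOMERS, privileged=False)[0]["email"])  # aXXX@XXXX.com
```

Note that unmasked fields pass through unchanged, and a privileged caller sees the original data – exactly the behaviour the feature promises at the database tier.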

Database replicas

Readable Secondary Replicas (AlwaysOn Availability Groups):

Directing read-only connections to readable secondary replicas provides the following benefits:

  • Offloads your secondary read-only workloads from your primary replica, which conserves its resources for your mission-critical workloads. If you have a mission-critical read workload, or one that cannot tolerate latency, you should run it on the primary.
  • Improves your return on investment for the systems that host readable secondary replicas.

In addition, readable secondaries provide robust support for read-only operations, as follows:

  • Temporary statistics on readable secondary databases optimize read-only queries. For more information, see Statistics for Read-Only Access Databases in the SQL Server documentation.
  • Read-only workloads use row versioning to remove blocking contention on the secondary databases. All queries that run against the secondary databases are automatically mapped to the snapshot isolation transaction level, even when other transaction isolation levels are explicitly set. Also, all locking hints are ignored. This eliminates reader/writer contention.

PolyBase

PolyBase: PolyBase enables your SQL Server 2016 instance to process Transact-SQL queries that read data from Hadoop. The same query can also access relational tables in your SQL Server. PolyBase enables the same query to also join the data from Hadoop and SQL Server. In SQL Server, an external table or external data source provides the connection to Hadoop.

With the underlying help of PolyBase, T-SQL queries can also import and export data from Azure Blob Storage. Further, PolyBase enables Azure SQL Data Warehouse to import and export data from Azure Data Lake Store, and from Azure Blob Storage.

PolyBase thus allows data in the external repositories above to be shared with database and data warehouse users without granting those users explicit permissions on the repositories themselves.

Database Views

A database view is a virtual table whose contents are defined by a query. Like a table, a view consists of a set of named columns and rows of data. Unless indexed, a view does not exist as a stored set of data values in a database. The rows and columns of data come from tables referenced in the query defining the view and are produced dynamically when the view is referenced.

A view acts as a filter on the underlying tables referenced in the view. The query that defines the view can be from one or more tables or from other views in the current or other databases. Distributed queries can also be used to define views that use data from multiple heterogeneous sources. This is useful, for example, if you want to combine similarly structured data from different servers, each of which stores data for a different region of your organization.

Views are generally used to focus, simplify, and customize the perception each user has of the database. Views can be used as security mechanisms by letting users access data through the view, without granting the users permissions to directly access the underlying base tables of the view. Views can be used to provide a backward compatible interface to emulate a table that used to exist but whose schema has changed. Views can also be used when you copy data to and from SQL Server to improve performance and to partition data.
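A view acting as a filter is easy to demonstrate end to end using the SQLite engine from Python's standard library. The table and data below are invented; in SQL Server you would additionally GRANT SELECT on the view rather than on the base table to get the security benefit described above:

```python
# A runnable illustration of a view acting as a filter over a base
# table, using the stdlib SQLite engine.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('Alice', 'Sales',   50000),
        ('Bob',   'Finance', 60000),
        ('Carol', 'Sales',   55000);

    -- The view projects away the sensitive salary column and
    -- filters rows to a single department.
    CREATE VIEW sales_staff AS
        SELECT name, department FROM employees
        WHERE department = 'Sales';
""")

rows = conn.execute("SELECT name FROM sales_staff ORDER BY name").fetchall()
print(rows)  # [('Alice',), ('Carol',)]
```

The rows are produced dynamically from the base table each time the view is queried – nothing is materialised unless you index the view.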

Database extracts

Database extracts are data sets exported by the RDBMS and held in external files. The file types can be many and varied but are typically flat files, e.g. Comma-Separated Values (CSV) files. CSV has long been a standard format for data transfer and is universally recognised by common data tools.
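For example, a small extract can be written and re-read with Python's standard csv module (the column names here are illustrative):

```python
# A small extract written and re-read with the stdlib csv module.
import csv
import io

buffer = io.StringIO()  # stands in for a file on disk
writer = csv.DictWriter(buffer, fieldnames=["customer_id", "region", "revenue"])
writer.writeheader()
writer.writerow({"customer_id": "C001", "region": "EMEA", "revenue": "1250.50"})
writer.writerow({"customer_id": "C002", "region": "APAC", "revenue": "980.00"})

buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]["region"])  # EMEA
```

The header row makes the extract self-describing, which is part of why CSV travels so well between tools.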

API Management

API Management (APIM) helps organizations publish APIs to external, partner, and internal developers to unlock the potential of their data and services. Businesses everywhere are looking to extend their operations as a digital platform, creating new channels, finding new customers and driving deeper engagement with existing ones. API Management provides the core competencies to ensure a successful API program through developer engagement, business insights, analytics, security, and protection. You can use Azure API Management to take any backend and launch a full-fledged API program based on it.

Storage Access

Storage Accounts

Azure Storage Accounts: Azure Storage is Microsoft’s cloud storage solution for modern data storage scenarios. Azure Storage offers a massively scalable object store for data objects, a file system service for the cloud, a messaging store for reliable messaging, and a NoSQL store.

Access to storage accounts can be controlled via the previously described RBAC.

Delegating Access to Storage Accounts with a Shared Access Signature

Delegating Access with a Shared Access Signature. A shared access signature (SAS) is a URI that grants restricted access rights to Azure Storage resources. You can provide a shared access signature to clients who should not be trusted with your storage account key but to whom you wish to delegate access to certain storage account resources. By distributing a shared access signature URI to these clients, you can grant them access to a resource for a specified period of time, with a specified set of permissions.

Azure Data Lake

Azure Data Lake: Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics.

Access to Azure Data Lake accounts can be controlled via the previously described RBAC.

Azure Databricks

Azure Databricks: Accelerate big data analytics and artificial intelligence (AI) solutions with Azure Databricks, a fast, easy and collaborative Apache Spark–based analytics service.

Azure Databricks comes built in with the ability to connect to Azure Data Lake Storage, Cosmos DB, SQL DW, Event Hubs, IoT Hubs, and several other services. Connection strings and other secrets can now be stored in Azure Key Vault rather than in code.

Azure Key Vault can help you securely store and manage application secrets reducing the chances of accidental loss of security information by centralizing the storage of secrets.

Azure Databricks Secrets: Sometimes accessing data requires that you authenticate to external data sources through JDBC. Instead of directly entering your credentials into a notebook, use Azure Databricks secrets to store your credentials and reference them in notebooks and jobs.

When using Key Vault with Azure Databricks to create secret scopes, data scientists and developers no longer need to store security information such as SAS tokens or connection strings in their notebooks. Access to a key vault requires proper authentication and authorization before a user can get access. Authentication establishes the identity of the user, while authorization determines the operations that they are allowed to perform.

As a team lead, you might want to create different Secret Scopes for different data source credentials and then provide different subgroups in your team access to those scopes.

Sharing Data for Business Intelligence

Azure Analysis Services

Azure Analysis Services: Azure Analysis Services is a fully managed platform as a service (PaaS) that provides enterprise-grade data models in the cloud. Use advanced mashup and modeling features to combine data from multiple data sources, define metrics, and secure your data in a single, trusted tabular semantic data model. The data model provides an easier and faster way for users to browse massive amounts of data for ad-hoc data analysis.

Azure SQL Data Warehouse

Azure SQL Data Warehouse: SQL Data Warehouse is a cloud-based Enterprise Data Warehouse (EDW) that leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data. Use SQL Data Warehouse as a key component of a big data solution. Import big data into SQL Data Warehouse with simple PolyBase T-SQL queries, and then use the power of MPP to run high-performance analytics. As you integrate and analyze, the data warehouse will become the single version of truth your business can count on for insights.

Machine Learning Web Services

Once you deploy an Azure Machine Learning predictive model as a web service, you can use a REST API to send it data and get predictions from custom or third-party applications. You can send the data in real time or in batch mode. Excel and Power BI, as business productivity tools, are easily configured to consume Azure Machine Learning web services.
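A sketch of what such a REST call looks like, built (but deliberately not sent) with only Python's standard library. The endpoint and key below are placeholders, and the request body follows the classic Azure ML "Inputs" shape – check the sample code generated for your own deployed service:

```python
# Building (not sending) a scoring request with only the standard
# library. Endpoint and key are placeholders; the body follows the
# classic Azure ML "Inputs" request shape.
import json
import urllib.request

ENDPOINT = "https://example.azureml.net/workspaces/ws1/services/svc1/execute"
API_KEY = "<your api key>"

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["SensorId", "Speed"],
            "Values": [["101", "72"], ["102", "68"]],
        }
    },
    "GlobalParameters": {},
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    },
)

# urllib.request.urlopen(request) would send it; omitted here
# because the endpoint above is a placeholder.
print(request.get_header("Content-type"))  # application/json
```

The same bearer-token-plus-JSON pattern is what Excel and Power BI configure under the covers when they consume a published web service.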

What data has been shared and to whom?

Log Analytics

Log Analytics: Log data collected by Azure Monitor is stored in a Log Analytics workspace, which is based on Azure Data Explorer. It collects telemetry from a variety of sources and uses the Data Explorer query language to retrieve and analyze data.

Audit Logs

Audit Logs: Azure provides a wide array of configurable security auditing and logging options to help you identify gaps in your security policies and mechanisms. This article discusses generating, collecting, and analyzing security logs from services hosted on Azure.

Data Delivery: Trusted Devices

Intune

Intune: As an IT admin, you must ensure that managed devices are providing the resources that your users need to do their work, while protecting that data from risk.

The Devices workload gives you insights into the devices you manage, and lets you perform remote tasks on those devices.

Data Integrity

Tamper-proof logs, backed by blockchain technologies, provide data provenance through a chain of custody.

Blockchain

Blockchain: Blockchain is a transparent and verifiable system that will change the way people think about exchanging value and assets, enforcing contracts, and sharing data. The technology is a shared, secure ledger of transactions distributed among a network of computers, rather than resting with a single provider. Businesses are using blockchain as a common data layer to enable a new class of applications. Now, business processes and data can be shared across multiple organizations, which eliminates waste, reduces the risk of fraud, and creates new revenue streams.

Azure Blockchain Workbench

Azure Blockchain Workbench: Quickly start your blockchain projects with Azure Blockchain Workbench. Simplify development and ease experimentation with prebuilt networks and infrastructure. Accelerate time to value through integrations and extensions to the cloud services and consuming apps you already use, and innovate with confidence on an open, trusted, and globally available platform.

Posted in Uncategorized | Leave a comment

PowerShell Script for generating events for Azure Event Hub

A useful PowerShell script I use for sending events to Azure Event Hub in demos.

Login-AzureRmAccount
Select-AzureRmSubscription -SubscriptionId '<your subscription number>'
Set-ExecutionPolicy Unrestricted
Install-Module -Name Azure.EventHub
$Token = Get-AzureEHSASToken -URI "<your event hub account>.servicebus.windows.net/<your event hub name>" -AccessPolicyName "<your policy name>" -AccessPolicyKey "<your policy key>"

$EventHubUri = "<your event hub account>.servicebus.windows.net/<your event hub name>"
$EventHubTimer = New-TimeSpan -Minutes 30
$StopWatch = [Diagnostics.Stopwatch]::StartNew()
$APIUri = "https://" + $EventHubUri + "/messages"
While ($StopWatch.Elapsed -lt $EventHubTimer) {
    # Simulate three sensor readings between 65 and 85
    $RandomDetroit = Get-Random -Minimum 65 -Maximum 85
    $RandomChicago = Get-Random -Minimum 65 -Maximum 85
    $RandomKalamazoo = Get-Random -Minimum 65 -Maximum 85
    $LabData = '[{ "SensorId":"101", "Location":"Detroit, MI", "Speed": ' + $RandomDetroit + ' },
    { "SensorId":"102", "Location":"Chicago, IL", "Speed": ' + $RandomChicago + ' },
    { "SensorId":"103", "Location":"Kalamazoo, MI", "Speed": ' + $RandomKalamazoo + ' }]'
    Invoke-WebRequest -Method POST -Uri $APIUri -Headers @{ Authorization = $Token } -ContentType "application/json;type=entry;charset=utf-8" -Body $LabData
    Start-Sleep -Seconds 5
}
Write-Host "Event Hub data simulation ended"
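Get-AzureEHSASToken above comes from a community module; as a cross-check, the same Service Bus-style SAS token can be generated with only the Python standard library (the URI and policy placeholders mirror the script):

```python
# A stdlib cross-check for the Get-AzureEHSASToken step above:
# Service Bus / Event Hubs SAS tokens are an HMAC-SHA256 signature
# over the URL-encoded resource URI and an expiry timestamp.
import base64
import hashlib
import hmac
import time
import urllib.parse

def make_sas_token(resource_uri, policy_name, policy_key, ttl_seconds=3600):
    expiry = str(int(time.time()) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    signature = base64.b64encode(
        hmac.new(policy_key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    )
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        encoded_uri, urllib.parse.quote_plus(signature), expiry, policy_name
    )

token = make_sas_token(
    "<your event hub account>.servicebus.windows.net/<your event hub name>",
    "<your policy name>",
    "<your policy key>",
)
print(token.split("&")[-1])  # skn=<your policy name>
```

The resulting token goes in the Authorization header of the POST to the /messages endpoint, exactly as in the PowerShell loop.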


Getting Started With R

I’ve been helping a customer to get started with R – here’s my digest, which no doubt I’ll be adding to over time.


Getting Started with Power BI

As often happens – I meet a series of customers who are keen to learn about something, Power BI in this case, and I collate a growing list of useful links and things to get started with. So here’s my current list for Power BI.


Getting Started with Microsoft Azure AI

  • For a list of Microsoft Azure Cognitive Services and demonstrations you can use with your own inputs (voice, images, text etc), please see https://azure.microsoft.com/en-gb/services/cognitive-services/. (Exploring these services has to be one of the most fun activities you can have whilst maintaining a legitimate claim to be working diligently.)


  • Microsoft’s Machine Learning capability is very well established and ready to be incorporated into end-to-end solutions hosted in Azure. By way of evidence, demonstration and learning, at the click of a button you can deploy a range of ready-built examples, ML experiments and solutions from the Cortana Intelligence Gallery https://gallery.cortanaintelligence.com/. Deploying requires an Azure subscription. Free trial subscriptions are available, but anyone in your organisation with a Visual Studio Subscription will have a monthly Azure budget they can use in an Azure subscription associated with their Visual Studio subscription. If you peruse the gallery you will find many examples that play straight into the hot topics of today.


  • The real-time audio translator for PowerPoint from Microsoft Research (giving a good example of a technology coming down the line) is available here. Very impressive and well worth a look. It is the result of an experimental project that has immediate business use.


  • Intelligent Kiosk: Here you will find several demos showcasing the Perceptual Intelligence power of Azure Machine Learning, the Cortana Intelligence Suite and the Microsoft Cognitive Services. (Again, very amusing.)


  • Create a bot with the Azure Bot Service: The Azure Bot Service accelerates the process of developing a bot by providing an integrated environment that is purpose-built for bot development. This tutorial walks you through the process of creating and testing a bot by using the Azure Bot Service.
    • One of the template bots is the QnAMaker. This is a very easy-to-create question-and-answer bot that can be made without any coding (or an Azure subscription) from just an FAQ file – see https://qnamaker.ai/. A short video tells you all you need to know.


Other Useful links:


And a short video describing LUIS (Language Understanding Intelligent Service): March LUIS Hackfest https://channel9.msdn.com/events/UKDX/March-Uk-Hack-Fest/Luis-AI?term=LUIS


I’m back, but I’m not who I used to be

After an 11-year break I’m coming back to blogging – but I’m not who I used to be.

I started this blog way back in 2005, when I was a SQL Server Evangelist. In 2006 I returned to Enterprise sales and I couldn’t find the time to properly share my learnings on this blog.

In 2010 I left Microsoft and followed in the footsteps of Donald Farmer to join Qlik. If truth be known, I had begun to lose faith in Microsoft’s BI strategy, and at the time Qlik was blazing a trail with its ‘Associative’ technology in the newly identified data discovery market. This technology made developing very competent analytical dashboards both a breeze and a pleasure.

But, in April last year I was tempted back to work in the Azure team as a Data Solution Architect, to help customers realise the benefits of Azure’s Data Services. Of course, it hadn’t escaped me that while I had been away Microsoft had released Power BI, which, along with its close integration with the amazing advances in Azure’s data services, was clear evidence Microsoft was back on track with a coherent BI strategy and had become a serious player in the data discovery space.

So, right now I’m planning my first (proper) blog post, in which I will share my first experience with U-SQL in solving a customer’s challenge.

Matthew



‘Business Reflex’ A new term – a new concept?

My last blog post was in haste; in fact, creating a blog on WordPress was in haste. I found I couldn’t access my old blog on TechNet (http://blogs.technet.com/b/mat_stephen) – I couldn’t remember my password and, naturally, now that I no longer work for Microsoft, my registered email address is defunct.

But I wanted to try to coin the term ‘Business Reflex’ by publishing it on the web as soon as I thought of it, and I wanted to use it in a presentation at an Ovum Symposium in London shortly before I left Microsoft. As far as I’m aware that was its first outing. Well, I couldn’t find the term as I had defined it anywhere on the web – so I thought it was worth a go.

Business Reflex is not Business Agility.  Business Agility refers to the ability a business has to change its modus operandi according to changing market conditions.

Business Reflex does not require any change to a business’s modus operandi. Business Reflex refers to the ability a business has to respond to a sudden change in trading conditions, e.g. an ash cloud, a terrorist threat, a disruptive dump of snow, an industrial strike, or an internal systems failure.

The cornerstone of the concept is the idea that the advantages of ‘Business Intelligence’ are enhanced when the intelligence is distributed and disconnected from the executive. It is predicated on Business Intelligence being available to the masses, but it gives Business Intelligence for the masses a purpose and a reason to exist.


The Business Reflex

Consider for a moment the knee-jerk reflex. The knee-jerk reaction has evolved to stop you falling over if you miss a step. A sudden bending of the knee results in an involuntary kick. Why? Because when the knee bends a message is sent to the spinal cord, which sends a message straight back to the thigh muscle to produce a kick, in the hope of giving your leg the necessary muscle tension to prevent a fall. The spinal cord takes this ‘executive’ decision because the executive in the brain, ‘you’, will not have time to respond, owing to the innate latency in the nervous system. So with good business intelligence we expect to see the same type of reflex – an ability to react to sudden change – like an ash cloud perhaps, a bomb alert, or something more mundane like a road accident.

This is Distributed Business Intelligence – Everyone has a part to play – Business Intelligence for Everyone
