Ever wondered why organizations need a robust Data Strategy, or even a Data team, in the first place?
Well, wonder no longer: at the InvestOps Asia conference in Singapore, I had the opportunity to address this crucial question, among several others, in a fireside chat with Jason Inzer on Harnessing the Latest Data Innovations to build a Holistic Data Strategy.
Here’s what I had to say on the topic:
Data Team & Data Strategy: Why?
Let's rewind 15 years. Organizations operated with various departments (Front Office, Compliance, IT, etc.), each with a primary function, yet burdened by data as a byproduct.
Take IT as an example: they manage infrastructure and the applications that run on top of it; these happen to use and/or produce data. That data is a byproduct of the “main” function. If you push the reasoning a bit further (operators may key in erroneous data that crashes systems), it can even be argued to be a liability.
This is why Data Teams started to appear, and it is the goal of a Data Strategy: take all these siloed liabilities and turn them into an asset for your organization through transversal management.
The Core Challenges:
1. Understanding the Transformation: This is not a technical upgrade but an organizational transformation. It requires altering how everyone in the organization works, and driving this change through the “processes, people, tools” triangle at the same time.
2. Clarity of Purpose: Know why you're embarking on this journey. Avoid being driven by external pressures like regulations or trends (“CEO wants to do GenAI!”). Focus on defining very precisely the problems within your organization that you want to address.
3. Empowering the CDO: Your Chief Data Officer will ask a lot of people to make a lot of changes. In most (all?) organizations, this is a political minefield. Ensure they are positioned within decision-making centers and can sell their strategy and roadmap effectively.
Tackling siloed data in a firm-wide data strategy can feel like trying to piece together a puzzle with mismatched pieces: how do you bring together disparate processes, data stores, and analytics into a cohesive strategy?
Key Considerations:
– Start Small, Think Big: You cannot change everything everywhere all at once. Begin with a strategic area where success is likely, bring it within your framework (remember “people, processes & tools”!) and then learn/improve/iterate to expand to other chosen areas.
– Empower, Don't Dictate: Instead of imposing changes, use proven principles (think Agile) to create “freedom within boundaries.” Help the teams: add value to their day-to-day work rather than constraints.
Let me share an example from a previous life:
A trading chain was made up of numerous integrated systems. This created a persistent issue with tracking trades across this complex web (in particular, making sure no trade slipped through the cracks). My team stepped in, not just providing a technical solution using Azure Synapse, but more importantly working with the business-aligned IT team to build together a strategy-compliant, data-driven approach to solve this problem (teach a man to fish rather than give him a fish). I truly believe this collaborative effort is what will scale a data strategy within an organization.
I have a saying: “If only data people do data work, you're doing it wrong.” To me, it encapsulates the essence of a successful data strategy: it's a balance of top-down guidance and bottom-up empowerment. Cast a wide net with minimal constraints initially, and gradually normalize and standardize processes collaboratively.
Quick Tools & Frameworks Mention:
Data Modeling: Unify data models across silos so that data can be aggregated.
Data Lake: Utilize technologies like data lakes for flexible, scalable data management.
Next, we're diving into the heroes of data management, Data Quality and Data Governance, and why they're the building blocks of a successful Data Strategy.
Why Data Quality and Data Governance Matter:
“Data is the new oil”: this phrase has been overused to the point that it has become meaningless. Its forgotten core truth remains: your organization needs to truly treat its data as an asset.
Think about how a financial institution meticulously manages its trading books:
• Operations department controls and reconciles them daily against their source of truth (Prime Brokers, etc.)
• Risk department checks them against predetermined risk limits
• Finance department controls and certifies them monthly/quarterly/yearly
Data needs to be treated in the same way, and that is the purpose of Data Governance.
To put it simply, you want your data assets to be (1) Discoverable, (2) Trustable, and (3) Safely Usable:
• Policies and Processes: Define the framework that all departments should respect
• Empowerment and Responsibility: Distribute roles and responsibilities across your organization to ensure everyone is accountable for these new tasks.
• People Over Tools: Successful data governance is about empowering your team, not just deploying tools. Tools help automate and streamline your new processes, but people make the difference.
Data Quality is a critical component of Data Governance, focusing on Building, Ensuring Continued, and Demonstrating Trust in your data.
Assigning roles like Data Owners and Data Stewards (and empowering people for this responsibility) will maintain data hygiene and transparency throughout your organization.
Frameworks and Resources:
For those looking to dive deeper, consider exploring the DAMA (Data Management Association) DMBOK (Data Management Body of Knowledge) framework.
Finally, let’s explore some important considerations to strike a balance between Data Access, Privacy, and Security, and why a Data Catalog is essential.
The Ideal Scenario:
Imagine a central hub where everyone in your organization can discover data assets, evaluate their suitability for their purpose, and request access safely, compliantly, and efficiently.
Think about all the problems you could solve, and all the data products you could build!
Let's break down how to get started.
1. Know Your Assets
- Registry and Metadata: Start with a “registry” (= Data Catalog) of your main data assets, including data that describes your data (= Metadata).
Don't stop at datasets: catalog everything from applications to hardware (or build links to your existing registries).
While you eventually want to be exhaustive, I recommend starting with a smaller domain within your organization and doing a thorough job of metadata definition and asset identification, rather than addressing all domains in a mediocre way.
- Flexibility is Key: Especially in Asia, and in regulated industries like finance, you must comply with widely differing regulatory views.
Your metadata must therefore be able to adapt and answer many questions, e.g. does a dataset contain Personally Identifiable Information (PII), does it contain client data, from which geography, etc.
Without this, managing data will remain a manual, error-prone and (unsustainably) time-consuming task.
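To make the registry idea a bit more concrete, here is a minimal Python sketch of a catalog whose entries carry the kind of metadata mentioned above (PII flag, client data flag, geography). Every name in it (CatalogEntry, DataCatalog, the field names, the sample assets) is an assumption made up for illustration, not the schema of any particular catalog product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One asset plus the metadata describing it (hypothetical fields)."""
    name: str
    asset_type: str               # e.g. "dataset", "application", "hardware"
    owner: str                    # the accountable team or Data Owner
    contains_pii: bool
    contains_client_data: bool
    geographies: frozenset = frozenset()


class DataCatalog:
    """In-memory registry; a real catalog would persist and version entries."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def find(self, predicate):
        """Return the sorted names of all entries matching a metadata predicate."""
        return sorted(e.name for e in self._entries.values() if predicate(e))


catalog = DataCatalog()
catalog.register(CatalogEntry("trades_apac", "dataset", "front_office",
                              False, True, frozenset({"SG", "HK"})))
catalog.register(CatalogEntry("client_kyc", "dataset", "compliance",
                              True, True, frozenset({"SG"})))
catalog.register(CatalogEntry("order_router", "application", "it",
                              False, False))

# "Which assets hold PII for Singapore?" becomes a query instead of a manual hunt.
pii_in_sg = catalog.find(lambda e: e.contains_pii and "SG" in e.geographies)
print(pii_in_sg)
```

The point of the sketch is that regulatory questions are answered from metadata rather than from tribal knowledge; when a new regulatory view appears, you add a field or a predicate rather than launching a new manual review.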
2. Controlled Access and Usage Workflows
- Flexible Controls: Implement diverse access controls to meet the various regulatory requirements.
For the same reasons mentioned above, flexibility here is crucial to ensure compliance.
– ๐ผ๐ช๐๐๐ฉ๐๐๐๐ก๐๐ฉ๐ฎ ๐๐ฃ๐ ๐๐๐ฅ๐ค๐ง๐ฉ๐๐ฃ๐: Regulations often require detailed logs and reports on data access and usage (think GDPR for example).
Capture every request, workflow, and usage to maintain transparency and compliance.
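As an illustration of flexible controls combined with an audit trail, here is a small Python sketch. It assumes access rules are plain functions that can be added or removed per jurisdiction, and that every request is logged whether it is granted or refused. The two rules, the class names and the sample users are invented for the example and do not represent any actual regulation or product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Dataset:
    name: str
    contains_pii: bool
    geography: str


@dataclass(frozen=True)
class User:
    name: str
    location: str
    pii_cleared: bool


# Hypothetical rules: each returns a refusal reason, or None if it has no objection.
def pii_rule(user, dataset):
    if dataset.contains_pii and not user.pii_cleared:
        return "user is not cleared for PII"
    return None


def residency_rule(user, dataset):
    if dataset.contains_pii and user.location != dataset.geography:
        return "PII must be accessed from its own geography"
    return None


RULES = [pii_rule, residency_rule]   # extend this list as requirements change
audit_log = []                       # a real system would use durable storage


def request_access(user, dataset):
    """Evaluate every rule and record the decision, granted or not."""
    reasons = []
    for rule in RULES:
        reason = rule(user, dataset)
        if reason is not None:
            reasons.append(reason)
    granted = not reasons
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user.name,
        "dataset": dataset.name,
        "granted": granted,
        "reasons": reasons,
    })
    return granted


kyc = Dataset("client_kyc", contains_pii=True, geography="SG")
alice_ok = request_access(User("alice", "SG", pii_cleared=True), kyc)
bob_ok = request_access(User("bob", "HK", pii_cleared=False), kyc)
print(alice_ok, bob_ok, len(audit_log))
```

Keeping the rules as a list makes the controls easy to adapt, and logging refusals together with their reasons is what later lets you produce the reports a regulator may ask for.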
The Bigger Picture:
Embedding these processes into your daily operations across departments is your only option for scalability.
Start small: choose a department where you can make a significant impact and use it as a pilot to learn from, then refine and iterate.
As said in previous posts, these operations need to be part of your daily processes, and not an extra responsibility handled by a “data team” on the side.
By doing so, you’ll not only improve data management but also help break down organizational silos, creating a more cohesive and efficient data-driven culture.