Required for real-time artificial intelligence and data analysis across different subsets of business operations.
A scaled-down version of a data warehouse with a more limited scope, built for groups of end users in different business units or departments
Gives groups more control to customize, but also silos the data off from the rest of the company
Improves query response time for users because of the reduced data volume
Data is partitioned, allowing very granular access control privileges.
Data can be segmented and stored on different hardware/software platforms.
Schemas are the same as those used in data warehouses
Data Marts should each serve a unique purpose. But if you anticipate use cases where multiple marts are used simultaneously for a project, then the marts should share a common dimension table, as in the sketch below.
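A minimal sketch of such a shared (conformed) dimension; the table names dim_date, sales_fact, and support_fact are made up for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
    CREATE TABLE sales_fact (date_key INTEGER REFERENCES dim_date, revenue REAL);
    CREATE TABLE support_fact (date_key INTEGER REFERENCES dim_date, tickets INTEGER);
    INSERT INTO dim_date VALUES (20240501, '2024-05-01');
    INSERT INTO sales_fact VALUES (20240501, 120.0);
    INSERT INTO support_fact VALUES (20240501, 3);
    """)

    # Both marts key into the same dim_date, so a cross-mart query joins cleanly.
    for row in conn.execute("""
        SELECT d.full_date, s.revenue, t.tickets
        FROM dim_date d
        JOIN sales_fact s ON s.date_key = d.date_key
        JOIN support_fact t ON t.date_key = d.date_key
    """):
        print(row)  # ('2024-05-01', 120.0, 3)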
Types
Dependent - Gets populated from a single centralized Data Warehouse
Usually built using the same architecture as the data warehouse and may use the same data modeling techniques, such as star schema or snowflake schema (see the sketch after this list).
Can be built quickly and at a lower cost than other types of data marts.
Ensures consistency in data definitions and reduces the risk of data discrepancies.
Less flexible and not optimized for the individual group's needs
Independent - Built from various data sources, including transactional databases, external APIs, and flat files. Operates autonomously, apart from a data warehouse.
Can be faster than dependent data marts because they are optimized for a specific department or business unit.
May duplicate data that is already in the data warehouse, which can lead to inconsistencies and data quality issues
Require more resources to build and maintain compared to dependent data marts.
Suited for short-term projects or small groups.
Hybrid - This type of data mart can take data from data warehouses or operational systems.
Suitable for businesses with multiple databases and a need for quick turnaround.
Complex to manage due to the integration of data from various sources. Requires careful planning to ensure data consistency and integrity
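As referenced above under Dependent, a minimal sketch of a mart derived entirely from a central warehouse table; the names warehouse_orders and marketing_mart are hypothetical:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Stand-in for the central warehouse fact table.
    conn.execute("CREATE TABLE warehouse_orders (order_date TEXT, region TEXT, order_total REAL)")
    conn.executemany("INSERT INTO warehouse_orders VALUES (?, ?, ?)", [
        ("2024-05-01", "EMEA", 120.0),
        ("2024-05-01", "APAC", 80.0),
    ])

    # The dependent mart is populated entirely from the warehouse table,
    # scoped down to one business unit, inheriting the warehouse's definitions.
    conn.execute("""
    CREATE TABLE marketing_mart AS
    SELECT order_date, region, SUM(order_total) AS revenue
    FROM warehouse_orders
    WHERE region = 'EMEA'
    GROUP BY order_date, region
    """)
    print(conn.execute("SELECT * FROM marketing_mart").fetchall())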
Process
Create an initial list of questions the data mart will be used to answer
Create Schema documents
Include as much information as possible in the schema document; it can then serve as a reference for future questions about the data instead of those questions coming to you
Add any business logic that needs to be applied when reading in the data, such as filters and transformation logic, and note the time frame of data needed and the frequency of updates
Example
Notice the business logic for Field Name: wau (weekly active users) is the distinct count of users whose login date falls between current date-1 and six days before that. We use current date-1 because the most recent data is typically from yesterday, and taking yesterday minus 6 more days gives a 7-day window for calculating wau.
Field Name: update_date should be set to the last time the ETL was run for this table, to let the user know when the data was last updated. ETL jobs occasionally fail, and this field helps verify whether the table was refreshed for the day
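A minimal sketch of the wau and update_date logic above, assuming a hypothetical user_logins(user_id, login_date) source table and an engagement_mart target:

    import sqlite3
    from datetime import date, timedelta

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE user_logins (user_id INTEGER, login_date TEXT);
    CREATE TABLE engagement_mart (wau INTEGER, update_date TEXT);
    """)

    # Window: end = current date - 1 (yesterday, the most recent complete day);
    # start = 6 days before that, for an inclusive 7-day window.
    end = date.today() - timedelta(days=1)
    start = end - timedelta(days=6)

    wau = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM user_logins "
        "WHERE login_date BETWEEN ? AND ?",
        (start.isoformat(), end.isoformat()),
    ).fetchone()[0]

    # update_date records when this ETL last ran, so users can tell
    # whether the table was refreshed for the day.
    conn.execute(
        "INSERT INTO engagement_mart (wau, update_date) VALUES (?, ?)",
        (wau, date.today().isoformat()),
    )
    conn.commit()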
Create sample tables according to the schema document
Source production data so you can validate the tables
After the sample tables pass your QA (quality assurance) checks, you can work with the data engineer to backfill any history if needed and then have them put the ETL code into production
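A minimal sketch of what those QA checks might look like; table and column names are illustrative, and the real checks should come from the schema document:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE engagement_mart (wau INTEGER, update_date TEXT)")
    conn.execute("INSERT INTO engagement_mart VALUES (42, '2024-05-01')")

    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    checks = {
        # The sample table should not be empty.
        "has_rows": scalar("SELECT COUNT(*) FROM engagement_mart") > 0,
        # Key metrics should never be NULL.
        "no_null_wau": scalar(
            "SELECT COUNT(*) FROM engagement_mart WHERE wau IS NULL") == 0,
        # One row per update_date, i.e. no duplicate daily loads.
        "no_dupe_dates": scalar(
            "SELECT COUNT(*) - COUNT(DISTINCT update_date) FROM engagement_mart") == 0,
    }

    for name, passed in checks.items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")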
Transactional Store
Optimized for row-based operations such as reading and writing individual records while maintaining data integrity
Not specifically built for analytics, but can be used for analytic queries as well as low-latency information monitoring
ACID (atomicity, consistency, isolation, durability) compliant, meaning they guarantee data validity despite errors and ensure data does not become corrupted by failures.
Crucial for business use cases that require a high level of data integrity, such as banking transactions.
Row-based storage makes it better at writing data, in contrast to data warehouses, which are column-based and better for reading data
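A minimal sketch of the atomicity guarantee using Python's sqlite3 (the accounts table and amounts are made up): the transfer either fully commits or fully rolls back:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
    conn.commit()

    try:
        with conn:  # commit on success, roll back on error
            conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
            conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
            # If anything raised before this block exits, neither UPDATE
            # would be applied: the transfer is all-or-nothing.
    except sqlite3.Error:
        print("transfer rolled back, balances unchanged")

    print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 70.0), (2, 80.0)]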
Operational Data Stores
Like a staging area for data required for projects
Provide fine-grained non-aggregated data
Usually complementary to a data warehouse
General purpose is to integrate data from different sources into a single structure via data cleaning, resolving redundancies, and establishing business rules.
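A minimal sketch of that integration step with made-up source records: standardize fields (cleaning) and keep the most recent record per id (resolving redundancies):

    # Two hypothetical sources feeding the ODS.
    crm_rows = [
        {"id": 1, "email": "A@X.COM ", "updated": "2024-05-02"},
        {"id": 2, "email": "b@y.com", "updated": "2024-05-01"},
    ]
    billing_rows = [
        {"id": 1, "email": "a@x.com", "updated": "2024-05-03"},
    ]

    def clean(row):
        # Business rule: emails are stored lowercase and trimmed.
        return {**row, "email": row["email"].strip().lower()}

    merged = {}
    for row in map(clean, crm_rows + billing_rows):
        current = merged.get(row["id"])
        if current is None or row["updated"] > current["updated"]:
            merged[row["id"]] = row  # latest record wins

    print(list(merged.values()))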