Salesforce-Platform-Data-Architect Practice Test Questions

257 Questions


A company wants to document the data architecture of a Salesforce organization. What are two valid metadata types that should be included? (Choose two.)


A. RecordType

B. Document

C. CustomField

D. SecuritySettings





A.
  RecordType

C.
  CustomField



Explanation:

Data architecture focuses on how data is structured, stored, related, and governed within an organization. Therefore, the documentation must include metadata types that define the core structure and behavior of data.

Why A is Correct (RecordType):
Record Types control the business processes, page layouts, and picklist values available to a user for a specific record. They are a crucial part of data architecture as they define how different data segments are presented and managed within the same object. Documenting which Record Types exist and their criteria is essential for understanding data flow and user interaction.

Why C is Correct (CustomField):
Custom Fields are the fundamental building blocks of custom data structures in Salesforce. They define the attributes and data points (e.g., text, number, date, relationship) stored for each record. Documenting all Custom Fields, their data types, and their relationships is the very core of data architecture documentation.
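As an illustration (a minimal sketch, not part of the exam answer), the data model elements described above can be enumerated programmatically with Apex Schema describe calls; the use of the Account object here is an assumption for demonstration only:

    // Minimal sketch: list custom fields and record types for one object (Account is
    // assumed here purely for illustration) to feed a data architecture document.
    Map<String, Schema.SObjectField> fieldMap = Schema.SObjectType.Account.fields.getMap();
    for (String fieldName : fieldMap.keySet()) {
        Schema.DescribeFieldResult dfr = fieldMap.get(fieldName).getDescribe();
        if (dfr.isCustom()) {
            // Capture the API name, data type, and label of each custom field.
            System.debug(dfr.getName() + ' | ' + String.valueOf(dfr.getType()) + ' | ' + dfr.getLabel());
        }
    }
    // Record types defined on the object, including their developer names.
    for (Schema.RecordTypeInfo rti : Schema.SObjectType.Account.getRecordTypeInfos()) {
        System.debug(rti.getName() + ' (DeveloperName: ' + rti.getDeveloperName() + ')');
    }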

Why B is Incorrect (Document):
The "Document" metadata type refers to files stored in the Documents tab, which are used for branding (like images for email templates) or other static file storage. While important for an organization, these are content files, not structural elements of the data model, and are not a primary concern for data architecture documentation.

Why D is Incorrect (SecuritySettings):
While security is intrinsically linked to data (governing who can see what), Security Settings (e.g., password policies, network access) are part of the application's security and access architecture, not its data architecture. The data architecture document would reference security in the context of field-level security or sharing rules, not these org-wide settings.

Reference:
The core of the Data Architect exam revolves around data modeling, which is defined by objects (standard and custom), fields (standard and custom), and relationships. Record Types and Custom Fields are primary components of this model.

Due to security requirements, Universal Containers needs to capture specific user actions, such as login, logout, file attachment download, package install, etc. What is the recommended approach for defining a solution for this requirement?


A. Use a field audit trail to capture field changes.


B. Use a custom object and trigger to capture changes.


C. Use Event Monitoring to capture these changes.


D. Use a third-party AppExchange app to capture changes.





C.
  Use Event Monitoring to capture these changes.

Explanation:

This question tests the knowledge of native Salesforce tools designed for auditing and monitoring user activity, particularly at the event level.

Why C is Correct (Event Monitoring):
Event Monitoring is a native Salesforce capability (part of Salesforce Shield) that is specifically designed to capture and log detailed information about user interactions. It generates event log files for exactly the types of actions listed:

✔️ Login and logout are captured in the Login and Logout event log files.
✔️ File Attachment Download is captured in the URI log file.
✔️ Package Install is captured in the PackageInstallRequest log file.
This is the standard, out-of-the-box solution for this security and compliance requirement.
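As a minimal sketch (assuming Event Monitoring is licensed and the running user has the View Event Log Files permission), the generated log files can be located with a SOQL query against the EventLogFile object:

    // Minimal sketch: find yesterday's Login and Logout event log files.
    // Assumes Event Monitoring (Salesforce Shield or the Event Monitoring add-on) is enabled.
    List<EventLogFile> logs = [
        SELECT Id, EventType, LogDate, LogFileLength
        FROM EventLogFile
        WHERE EventType IN ('Login', 'Logout')
          AND LogDate = YESTERDAY
    ];
    for (EventLogFile log : logs) {
        // The LogFile field holds the CSV content; large files are usually
        // downloaded via the REST API rather than processed in Apex.
        System.debug(log.EventType + ' log, ' + log.LogFileLength + ' bytes, for ' + log.LogDate);
    }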

Why A is Incorrect (Field Audit Trail):
Field Audit Trail is designed to track changes to field values on a record. It is excellent for knowing what data was changed, from what value to what value, and by whom. It does not track broader user events like logins, logouts, or file downloads.

Why B is Incorrect (Custom object and trigger):
While technically possible to build a custom auditing solution, it would be incredibly complex, fragile, and unsustainable. It would require writing triggers for every possible object and action, could not track system-level events like login/logout, and would not be a recommended "best practice" when a robust, scalable, native tool like Event Monitoring exists.

Why D is Incorrect (Third-party AppExchange app):
While an AppExchange app might leverage Event Monitoring APIs or provide a user interface for it, the question asks for the "recommended approach for defining a solution." The core, underlying technology that any robust solution would be built upon is Salesforce's native Event Monitoring. Recommending a third-party app before evaluating the native tool is not the best architectural practice.

Reference:
The Event Monitoring unit on Trailhead and the Salesforce Help documentation on "Event Monitoring Overview" detail the specific event types that are logged, which align perfectly with the actions listed in the question.

DreamHouse Realty has a Salesforce deployment that manages Sales, Support, and Marketing efforts in a multi-system ERP environment. The company recently reached the limits of native reports and dashboards and needs options for providing more analytical insights. What are two approaches an Architect should recommend? (Choose two.)


A. Weekly Snapshots

B. Einstein Analytics

C. Setup Audit Trails

D. AppExchange Apps





B.
  Einstein Analytics

D.
  AppExchange Apps



Explanation:

DreamHouse Realty has reached the limits of Salesforce’s native reports and dashboards, which are constrained by factors like the number of records, complexity of calculations, and visualization capabilities. They need advanced analytical insights in a multi-system ERP environment, suggesting a need for robust reporting and integration capabilities. Let’s evaluate each option:

Option A: Weekly Snapshots
Weekly snapshots allow capturing point-in-time data for reporting trends over time, but they are still part of Salesforce’s native reporting framework. They are limited in handling complex analytics, cross-system data integration, or advanced visualizations, and they consume storage space. This does not fully address the need for more analytical insights beyond native limits.

Option B: Einstein Analytics (now Tableau CRM)
Einstein Analytics is a powerful Salesforce-native analytics platform designed for advanced data analysis, visualization, and predictive insights. It can handle large datasets, integrate with external ERP systems via connectors (e.g., Salesforce Connect or middleware), and provide interactive dashboards and AI-driven insights. This makes it an ideal solution for overcoming the limitations of native reports and dashboards, aligning with the company’s need for advanced analytics.

Option C: Setup Audit Trails
Setup Audit Trails track changes to Salesforce configuration and metadata, such as user permissions or field modifications. They are unrelated to analytical reporting or dashboards and do not help with providing insights from sales, support, or marketing data.

Option D: AppExchange Apps
AppExchange offers a variety of third-party analytics and reporting tools (e.g., Domo, Power BI integrations) that can extend Salesforce’s native capabilities. These apps often provide advanced visualization, cross-system data integration, and customizable dashboards, making them a viable option for addressing the limitations of native reports in a multi-system ERP environment.

Why B and D are Optimal:
Einstein Analytics (B) is a Salesforce-native solution that provides advanced analytics and seamless integration with Salesforce data and external systems, directly addressing the need for enhanced insights. AppExchange Apps (D) offer flexible, third-party solutions that can be tailored to specific analytical needs, especially for integrating with ERP systems. Together, these approaches provide robust options for overcoming the limitations of native reports and dashboards.

References:
Salesforce Documentation: Tableau CRM (Einstein Analytics) Overview
Salesforce AppExchange: Analytics Apps
Salesforce Help: Reporting Snapshots

A Salesforce customer has plenty of data storage. Sales Reps are complaining that searches are bringing back old records that aren't relevant any longer. Sales Managers need the data for their historical reporting. What strategy should a data architect use to ensure a better user experience for the Sales Reps?


A. Create a Permission Set to hide old data from Sales Reps.

B. Use Batch Apex to archive old data on a rolling nightly basis.

C. Archive and purge old data from Salesforce on a monthly basis.

D. Set data access to Private to hide old data from Sales Reps.





B.
  Use Batch Apex to archive old data on a rolling nightly basis.



Explanation:

The challenge is to improve the user experience for Sales Reps by ensuring searches return only relevant (recent) records, while preserving old data for Sales Managers’ historical reporting needs. The organization has sufficient storage, so the solution must focus on data visibility and performance rather than reducing storage usage. Let’s analyze each option:

Option A: Create a Permission Set to hide old data from Sales Reps.
Permission Sets control access to objects, fields, or features, but they cannot filter records based on criteria like age or relevance (e.g., created date). While you could use sharing rules or criteria-based sharing to limit access, this would require complex configurations and might not fully address search performance, as old records would still appear in searches unless explicitly filtered.

Option B: Use Batch Apex to archive old data on a rolling nightly basis.
This is the best approach. Batch Apex can be used to move old, irrelevant records (e.g., based on a date field) to a custom object, Big Object, or external system for archival purposes. This keeps the primary objects (e.g., Opportunities, Accounts) lean, improving search performance for Sales Reps. The archived data remains accessible for Sales Managers’ historical reporting via reports or integrations. A nightly batch process ensures the archive is updated regularly without manual intervention, balancing user experience and data retention needs.
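A minimal Batch Apex sketch of this archiving pattern is shown below; the Archived_Task__c custom object, its fields, and the two-year cutoff are illustrative assumptions rather than part of the question:

    // Minimal sketch: copy aged Task records to an assumed custom archive object,
    // then delete the originals so they no longer clutter searches.
    global class ArchiveOldTasksBatch implements Database.Batchable<SObject> {
        global Database.QueryLocator start(Database.BatchableContext bc) {
            // A two-year cutoff is an illustrative assumption.
            return Database.getQueryLocator(
                'SELECT Id, Subject, ActivityDate FROM Task WHERE ActivityDate < LAST_N_YEARS:2'
            );
        }
        global void execute(Database.BatchableContext bc, List<Task> scope) {
            List<Archived_Task__c> archives = new List<Archived_Task__c>();
            for (Task t : scope) {
                archives.add(new Archived_Task__c(
                    Subject__c          = t.Subject,
                    Activity_Date__c    = t.ActivityDate,
                    Source_Record_Id__c = t.Id
                ));
            }
            insert archives;   // Preserve the data for historical reporting.
            delete scope;      // Remove the originals from the active object.
        }
        global void finish(Database.BatchableContext bc) {}
    }

The archived records could equally be written to a Big Object or pushed to an external store via an integration; the key point is that the active object stays lean while the history remains reportable.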

Option C: Archive and purge old data from Salesforce on a monthly basis.
Purging (deleting) old data would make it unavailable for Sales Managers’ historical reporting, which conflicts with the requirement to retain the data. While archiving to an external system is viable, purging from Salesforce is not, as it would remove the data entirely. Additionally, a monthly cadence is less efficient than a nightly process for maintaining performance.

Option D: Set data access to Private to hide old data from Sales Reps.
Setting the organization-wide default (OWD) to Private restricts access to records based on ownership or sharing rules, but it doesn’t address the issue of old records cluttering searches. Old records would still appear in searches for users with access, and configuring sharing rules to hide old records based on age is impractical and doesn’t optimize search performance.

Why Option B is Optimal:
Using Batch Apex to archive old data on a nightly basis directly addresses the Sales Reps’ complaint by reducing the volume of records in active objects, improving search performance. It also ensures old data is preserved in an accessible format (e.g., Big Objects or external storage) for Sales Managers’ reporting needs. This aligns with Salesforce’s best practices for managing large data volumes (LDV) and optimizing user experience.

References:
Salesforce Documentation: Batch Apex
Salesforce Architect Guide: Large Data Volumes Best Practices
Salesforce Help: Archiving Data in Salesforce

UC has multiple Salesforce orgs distributed across regional branches. Each branch stores local customer data inside its org's Account and Contact objects. This creates a scenario where UC is unable to view customers across all orgs. UC has an initiative to create a 360-degree view of the customer, as UC would like to see Account and Contact data from all orgs in one place. What should a data architect suggest to achieve this 360-degree view of the customer?


A. Consolidate the data from each org into a centralized datastore


B. Use Salesforce Connect’s cross-org adapter.


C. Build a bidirectional integration between all orgs.


D. Use an ETL tool to migrate gap Accounts and Contacts into each org.





A.
  Consolidate the data from each org into a centralized datastore

Explanation:

A centralized datastore allows UC to bring customer data from all regional Salesforce orgs into one system, enabling a single source of truth and consistent reporting. This is a common pattern called multi-org consolidation. By creating a hub (such as a data warehouse, MDM system, or Customer 360), UC can aggregate Account and Contact data, resolve duplicates, and maintain consistency across branches. This design provides scalability and avoids messy point-to-point integrations.

Why not the others?

B. Use Salesforce Connect’s cross-org adapter:
While Connect can surface data from other orgs virtually, it doesn’t consolidate or normalize the data. Performance suffers when querying millions of records across orgs, and features like cross-org deduplication or unified reporting aren’t possible. It’s useful for lightweight access, but not for a true 360-degree customer view.

C. Build a bidirectional integration between all orgs:
This creates a complex web of integrations where each org pushes and pulls data to every other org. Maintenance quickly becomes unmanageable as the number of orgs grows (n² problem). Data consistency issues are also likely, since real-time sync across multiple orgs often introduces race conditions and conflicts.

D. Use an ETL tool to migrate gap Accounts and Contacts into each org:
Copying missing data into every org creates duplication instead of unification. Each org will have slightly different versions of the same record, which increases data quality problems. ETL is better suited to feed a central system, not to spread data redundantly across all orgs.

Reference:
Salesforce Architect Guide: Multi-Org Strategy

Universal Containers is setting up an external Business Intelligence (BI) system and wants to extract 1,000,000 Contact records. What should be recommended to avoid timeouts during the export process?


A. Use the SOAP API to export data.


B. Utilize the Bulk API to export the data.


C. Use GZIP compression to export the data.


D. Schedule a Batch Apex job to export the data.





B.
  Utilize the Bulk API to export the data.

Explanation:

✅ Correct Answer: B. Utilize the Bulk API to export the data
The Bulk API is designed for extracting or loading very large volumes of data (millions of records) asynchronously. It processes data in batches (up to 10,000 records per batch) and avoids the synchronous limits that cause timeouts with APIs like SOAP or REST. Using Bulk API ensures scalability, fault tolerance, and efficiency in exporting 1,000,000 Contact records to the BI system without hitting governor limits or timeouts.

❌ Why not the others?

A. Use the SOAP API to export data:
SOAP is synchronous and designed for smaller data sets. Extracting 1 million records over SOAP would lead to request timeouts, memory issues, or throttling. It’s not optimized for large-volume operations and is better suited for transactional integrations.

C. Use GZIP compression to export the data:
Compression reduces file size but does not address the core problem: synchronous processing limits. Even with smaller payloads, SOAP or REST APIs would still hit timeout and request size limits when exporting 1,000,000 records. Compression is helpful, but not a replacement for the correct API choice.

D. Schedule a Batch Apex job to export the data:
Batch Apex can process large volumes inside Salesforce, but it’s not designed to directly export massive datasets to external BI systems. You’d still need an API or integration layer to handle the transfer. This adds unnecessary complexity compared to using the Bulk API directly.

Reference:
Salesforce Developer Guide: Bulk API 2.0

Northern Trail Outfitters needs to implement an archive solution for Salesforce data. This archive solution needs to help NTO do the following:
1. Remove outdated information not required on a day-to-day basis.
2. Improve Salesforce performance.
Which solution should be used to meet these requirements?


A. Identify a location to store archived data and use scheduled batch jobs to migrate and purge the aged data on a nightly basis.


B. Identify a location to store archived data, and move data to the location using a time-based workflow.


C. Use a formula field that shows true when a record reaches a defined age and use that field to run a report and export a report into SharePoint.


D. Create a full copy sandbox, and use it as a source for retaining archived data.





A.
  Identify a location to store archived data and use scheduled batch jobs to migrate and purge the aged data on a nightly basis.

Explanation:

The correct approach to archiving is to establish a secure, external storage location (such as a data lake, EDW, or archive database) and use scheduled batch jobs to migrate aged Salesforce records there. Once archived, the records can be purged from Salesforce, freeing up storage and improving query performance. Batch jobs are scalable, automatable, and handle high-volume data, making this the best fit for performance and long-term data retention needs.
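As a sketch of the scheduling half of this pattern (assuming an archive batch class such as the hypothetical ArchiveOldTasksBatch from the earlier example exists), a Schedulable class can launch it nightly on a cron schedule:

    // Minimal sketch: run a hypothetical archive batch class every night at 1 AM.
    global class NightlyArchiveScheduler implements Schedulable {
        global void execute(SchedulableContext sc) {
            Database.executeBatch(new ArchiveOldTasksBatch(), 200);
        }
    }
    // One-time setup, e.g. from Execute Anonymous:
    // System.schedule('Nightly Archive', '0 0 1 * * ?', new NightlyArchiveScheduler());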

Why not the others?

B. Use a time-based workflow:
Workflows can trigger actions at specific times, but they aren’t designed for bulk data movement or archiving. They can only update fields, send emails, or create tasks. Moving thousands or millions of records out of Salesforce requires Batch Apex or ETL tools, not workflows. Attempting archiving with workflows is unscalable and unsupported.

C. Use a formula field and export reports to SharePoint:
Formula fields can flag records by age, and reports can export subsets of data. However, this is a manual, inefficient process. It doesn’t provide automated archiving, data integrity checks, or integration with enterprise-grade storage. SharePoint is also not designed for structured Salesforce data archives with relational dependencies.

D. Create a full copy sandbox for archives:
Sandboxes are meant for testing and development, not data archiving. A full copy sandbox replicates production data periodically, but it doesn’t reduce storage in production or improve performance. It also becomes stale quickly, requiring refreshes. Using sandboxes for archiving is expensive, inefficient, and does not meet compliance or retention goals.

Reference:
Salesforce Data Management Best Practices

Universal Containers (UC) has a data model as shown in the image. The Project object has a private sharing model, and it has Roll-Up Summary fields to calculate the number of resources assigned to the project, total hours for the project, and the number of work items associated with the project. What should the architect consider, knowing there will be a large number of Time Entry records to be loaded regularly from an external system into Salesforce?


A. Load all data using external IDs to link to parent records.


B. Use workflow to calculate summary values instead of Roll-Up.


C. Use triggers to calculate summary values instead of Roll-Up.


D. Load all data after deferring sharing calculations.





D.
  Load all data after deferring sharing calculations.

Explanation:

This question tests the understanding of performance optimization techniques for large data volume (LDV) operations, particularly in complex environments with private sharing and roll-up summary fields.

🟢 Why D is Correct:
When performing a large data load into an object that is a child in a master-detail relationship, several computationally expensive processes occur: the roll-up summary fields on the master record are recalculated, and sharing recalculations are triggered if the sharing model is private. For regular, bulk data loads, this can cause severe performance issues and timeouts. The defer sharing calculations setting allows you to postpone the sharing recalculation until after the data load is complete, significantly improving load performance. The roll-up summaries will still calculate, but the system isn't simultaneously trying to recalculate sharing for all affected master records.

🔴 Why A is Incorrect:
Using external IDs is a best practice for matching and upserting records to ensure data integrity. However, it does not address the core performance problem posed by the combination of a private sharing model and roll-up summary fields, which are the expensive operations mentioned in the scenario.

🔴 Why B is Incorrect:
Workflow rules are a legacy automation tool and are not a performance optimization technique. A workflow rule would likely perform worse than a roll-up summary field because it is not optimized for bulk aggregation and would fire for each individual record, potentially requiring a future update that could hit governor limits.

🔴 Why C is Incorrect:
Using triggers to calculate summaries is essentially building a custom roll-up summary. This is complex to code, prone to errors, difficult to maintain, and very likely to hit governor limits (such as the SOQL query limit) during a bulk data load of "a large amount" of records. Native roll-up summary fields are far more efficient for this purpose.

🔧 Reference:
Salesforce documentation on data loading best practices specifically recommends deferring sharing rule calculations (via the Defer Sharing Calculations feature) when inserting or updating a large number of records that might affect sharing rules. This is a key tactic for improving performance in LDV scenarios.

Universal Containers has a legacy system that captures Conferences and Venues. These Conferences can occur at any Venue. They create hundreds of thousands of Conferences per year. Historically, they have only used 20 Venues. Which two things should the data architect consider when denormalizing this data model into a single Conference object with a Venue picklist? (Choose 2 answers)


A. Limitations on master-detail relationships.


B. Org data storage limitations.


C. Bulk API limitations on picklist fields.


D. Standard list view in-line editing.





B.
  Org data storage limitations.

D.
  Standard list view in-line editing.

Explanation:

This question evaluates the trade-offs of denormalization, specifically replacing a lookup relationship with a picklist field.

✅ Why B is Correct (Org data storage limitations):
Denormalization (storing the Venue name directly on each Conference record as text in a picklist) trades storage efficiency for read performance. A normalized model using a lookup relationship to a Venue object would store the Venue name only once in the Venue record; all Conference records would just store the Venue's ID (18 characters). The denormalized model stores the full Venue name (which could be 50+ characters) on each of the hundreds of thousands of Conference records. This consumes significantly more data storage.

✅ Why D is Correct (Standard list view in-line editing):
A significant benefit of using a picklist field is that users can easily edit the value of the Venue field directly from a list view without having to open each individual Conference record. This improves user productivity and is a common reason for choosing picklists over lookups for low-volatility, small-value sets.

🔴 Why A is Incorrect (Limitations on master-detail relationships):
The scenario describes denormalizing away from a relationship model into a single object. Therefore, limitations of master-detail relationships (like the inability to reparent records if certain criteria are met) are no longer a concern. The question is about the consequences of not having a relationship.

🔴 Why C is Incorrect (Bulk API limitations on picklist fields):
The Bulk API works perfectly well with picklist fields. There are no specific limitations that would make loading data into a picklist field more difficult than loading data into a lookup relationship field. In fact, it might be simpler as it doesn't require resolving an external ID to link to a parent record.

✔️ Reference:
Data modeling decisions often involve the trade-off between normalization (reducing data redundancy) and denormalization (improving read performance and usability at the cost of storage). The pros (usability) and cons (storage) are core Data Architect concepts.

A large telecommunications provider that offers internet services to both residences and businesses has the following attributes:
➡️ A customer who purchases its services for their home will be created as an Account in Salesforce.
➡️ Individuals within the same house address will be created as Contact in Salesforce.
➡️ Businesses are created as Accounts in Salesforce.
➡️ Some of the customers have both services at their home and business.
What should a data architect recommend for a single view of these customers without creating multiple customer records?


A. Customers are created as Contacts and related to Business and Residential Accounts using the Account Contact Relationships.


B. Customers are created as Person Accounts and related to Business and Residential Accounts using the Account Contact relationship.


C. Customers are created as Individual objects and related to Accounts for Business and Residence accounts.


D. Customers are created as Accounts for the Residence Account and use the Parent Account field to relate the Business Account.





A.
  Customers are created as Contacts and related to Business and Residential Accounts using the Account Contact Relationships.

Explanation:

This question tests the understanding of the Salesforce Data Model, specifically how to model complex customer relationships without using Person Accounts.

Why A is Correct: This solution uses the standard Salesforce model correctly and flexibly.
✔️ Residence: A "Residential Account" represents the household (e.g., "The Smith Family"). Each individual person in the household is a Contact under this Account.
✔️ Business: A company is represented as a "Business Account". Individuals who work at that company are Contacts under this Account.
✔️ The Single View: A single person (a Contact) who has services at both their home (residential account) and their workplace (business account) can be related to both Accounts using Account Contact Relationships (ACR). ACR allows a single contact to have a relationship (e.g., "Employee" at the business, "Member" at the residence) with multiple accounts, providing the required "single view" of that customer.
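A minimal sketch of option A in Apex is shown below; the example Account name, Contact email, and the 'Employee' role value are assumptions for illustration, and the org must have "Allow users to relate a contact to multiple accounts" enabled in Setup:

    // Minimal sketch: relate an existing Contact (already a direct contact of the
    // residential Account) to a Business Account as an indirect relationship.
    Id businessAccountId = [SELECT Id FROM Account WHERE Name = 'Acme Corp' LIMIT 1].Id;         // assumed example
    Id customerContactId = [SELECT Id FROM Contact WHERE Email = 'jane@example.com' LIMIT 1].Id; // assumed example

    AccountContactRelation acr = new AccountContactRelation(
        AccountId = businessAccountId,
        ContactId = customerContactId,
        Roles     = 'Employee'   // Illustrative role value
    );
    insert acr;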

Why B is Incorrect (Person Accounts):
Person Accounts are a viable model for B2C, but they have significant limitations and are irreversible once enabled. The major problem here is relating them to other accounts. A Person Account record is an Account. You cannot relate one Account (the Person Account) to another Account (the Business Account) using standard Account Contact Relationship, which is designed to relate a Contact to an Account. This model would fail to create the required single view for a customer who is both a person and a business employee.

Why C is Incorrect (Individual object):
The Individual object is designed for consent and preference management, not for acting as the primary representation of a customer. It is not a standard object for core CRM functionality like service ownership and would create a highly non-standard, complex, and unsustainable data model.

Why D is Incorrect (Parent Account):
Using a Parent Account hierarchy is for representing corporate structures (e.g., "Global HQ" -> "EMEA Region" -> "UK Subsidiary"). It is not designed to model the relationship between a person and their employer. A person's residential account should not be the parent of their business account or vice-versa, as this would misrepresent the data and make reporting nonsensical.

Reference:
The Salesforce Data Model and the specific capabilities of Account Contact Relationships are fundamental topics. The limitations of Person Accounts, especially regarding relationships, are a critical consideration for a Data Architect. The recommended approach (Option A) is a standard industry pattern for this type of scenario.

NTO needs to extract 50 million records from a custom object every day from its Salesforce org. NTO is facing query timeout issues while extracting these records. What should a data architect recommend in order to get around the timeout issue?


A. Use a custom auto number and formula field and use that to chunk records while extracting data.


B. Use the REST API to extract data, as it automatically chunks records by 200.


C. Use ETL tool for extraction of records.


D. Ask SF support to increase the query timeout value.





C.
  Use ETL tool for extraction of records.

Explanation:

Extracting 50 million records daily from a Salesforce custom object is a large data volume (LDV) scenario, and query timeouts are a common issue due to Salesforce’s governor limits and query performance constraints. Let’s evaluate each option to determine the best approach to avoid timeouts:

🔴 Option A: Use a custom auto number and formula field and use that to chunk records while extracting data.
While chunking records (breaking the query into smaller batches) can help manage large data extractions, creating a custom auto number and formula field to facilitate this is not a recommended or scalable solution. This approach requires modifying the data model, adding complexity, and may not fully address timeout issues for 50 million records. Additionally, it’s less efficient than using purpose-built tools or APIs designed for LDV.

🔴 Option B: The REST API to extract data as it automatically chunks records by 200.
The REST API does not automatically chunk records by 200; this is a misconception. Batching is provided by the Bulk API (a separate, asynchronous API), whose batch size is configurable (up to 10,000 records per batch). While the Bulk API is suitable for LDV, it requires proper configuration and error handling, and an ETL tool is often a more efficient way to manage such extractions. The REST API alone doesn't inherently solve the timeout issue without additional setup.

🟢 Option C: Use ETL tool for extraction of records.
This is the best approach. ETL (Extract, Transform, Load) tools like Informatica, MuleSoft, or Salesforce Data Loader are designed to handle large-scale data extractions efficiently. These tools leverage Salesforce’s Bulk API, which is optimized for LDV, allowing for configurable batch sizes, parallel processing, and error handling. ETL tools can manage the extraction of 50 million records by breaking the process into manageable chunks, avoiding query timeouts, and providing logging and monitoring capabilities. This aligns with Salesforce’s best practices for LDV data extraction.

🔴 Option D: Ask SF support to increase the query timeout value.
Salesforce does not typically allow customers to increase query timeout limits, as these are governed by platform constraints to ensure performance and stability. While Salesforce Support can assist with optimizations (e.g., enabling skinny tables or indexes), increasing timeout values is not a standard or scalable solution for daily extractions of 50 million records.

🟢 Why Option C is Optimal:
An ETL tool, leveraging the Bulk API, is the most efficient and scalable solution for extracting 50 million records daily. It handles large data volumes by processing records in batches, avoids query timeouts, and integrates seamlessly with Salesforce’s platform. This approach also supports automation and error handling, making it suitable for NTO’s recurring extraction needs.

References:
Salesforce Documentation: Bulk API 2.0
Salesforce Architect Guide: Large Data Volumes Best Practices
Salesforce Help: Data Loader Guide

Northern Trail Outfitters (NTO) has a complex Salesforce org which has been developed over the past 5 years. Internal users are complaining about multiple data issues, including incomplete and duplicate data in the org. NTO has decided to engage a data architect to analyze and define data quality standards. Which 3 key factors should a data architect consider while defining data quality standards? (Choose 3 answers)


A. Define data duplication standards and rules


B. Define key fields in staging database for data cleansing


C. Measure data timeliness and consistency


D. Finalize an extract transform load (ETL) tool for data migration


E. Measure data completeness and accuracy





A.
  Define data duplication standards and rules

C.
  Measure data timeliness and consistency

E.
  Measure data completeness and accuracy

Explanation:

Defining data quality standards for a complex Salesforce org with issues like incomplete and duplicate data requires focusing on metrics and rules that directly address data integrity, usability, and reliability. Let’s analyze each option to identify the three key factors:

✅ Option A: Define data duplication standards and rules
This is a critical factor. Duplicate data is a common issue in Salesforce orgs and directly impacts user experience, reporting accuracy, and system performance. Defining standards and rules for identifying, preventing, and resolving duplicates (e.g., using matching rules, duplicate rules, or third-party deduplication tools) is essential for maintaining data quality. This addresses one of NTO's primary complaints.

Option B: Define key fields in staging database for data cleansing
While a staging database can be useful for data cleansing in migration or integration scenarios, it is not a core component of defining data quality standards within the Salesforce org. Staging databases are typically part of implementation processes, not ongoing data quality management. This option is less relevant to the goal of establishing standards for the existing org’s data issues.

✅ Option C: Measure data timeliness and consistency
Data timeliness (ensuring data is up-to-date) and consistency (ensuring data aligns across objects and systems) are key data quality metrics. For example, outdated records or inconsistent data between related objects (e.g., Accounts and Contacts) can lead to user dissatisfaction and errors in reporting. Measuring these factors helps NTO ensure data remains relevant and reliable, addressing user complaints about data issues.

Option D: Finalize an extract transform load (ETL) tool for data migration
While ETL tools are valuable for data migration or integration, selecting a tool is a tactical decision, not a factor in defining data quality standards. ETL tools may be used to implement data quality processes, but the question focuses on defining standards, not choosing tools.

✅ Option E: Measure data completeness and accuracy
This is another critical factor. Incomplete data (e.g., missing required fields) and inaccurate data (e.g., incorrect values) are explicitly mentioned as issues by NTO’s users. Measuring completeness (ensuring all necessary fields are populated) and accuracy (ensuring data reflects reality) is fundamental to establishing data quality standards and improving user trust in the system.

✅ Why A, C, and E are Optimal:
These three factors directly address NTO’s data quality issues (incomplete and duplicate data) and align with standard data quality frameworks. Defining duplication standards (A) tackles duplicate records, measuring timeliness and consistency (C) ensures data is current and coherent, and measuring completeness and accuracy (E) addresses missing or incorrect data. Together, these form a comprehensive approach to defining data quality standards for the Salesforce org.
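As one concrete illustration of measuring completeness (option E), here is a minimal sketch using an aggregate SOQL count in Apex, with Contact.Email assumed as the key field being measured:

    // Minimal sketch: percentage of Contacts missing an email address, as one
    // example completeness metric (the choice of Contact.Email is an assumption).
    Integer total   = [SELECT COUNT() FROM Contact];
    Integer missing = [SELECT COUNT() FROM Contact WHERE Email = null];
    Decimal pctIncomplete = (total == 0) ? 0.0 : (missing * 100.0) / total;
    System.debug('Contacts missing Email: ' + missing + ' of ' + total + ' (' + pctIncomplete + '%)');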

References:
Salesforce Documentation: Duplicate Management
Salesforce Architect Guide: Data Quality
Salesforce Help: Data Quality Best Practices

