Which of the following statements describe licensing in a clustered Splunk deployment? (Select all that apply.)
A. Free licenses do not support clustering.
B. Replicated data does not count against licensing.
C. Each cluster member requires its own clustering license.
D. Cluster members must share the same license pool and license master.
Explanation:
Licensing in a clustered Splunk deployment follows specific rules that ensure compliance and efficient resource usage:
A. Free licenses do not support clustering → Correct. Splunk’s free license is limited in functionality. It does not support advanced features such as indexer clustering or search head clustering. Clustering requires an Enterprise license.
B. Replicated data does not count against licensing → Correct. License usage is measured only on the original data ingested into the cluster. Replicated bucket copies created to satisfy the replication factor (RF) do not count toward license volume. This prevents inflated license usage in clustered environments.
C. Each cluster member requires its own clustering license → Incorrect. Cluster members do not need separate licenses. Instead, they all point to a license master and share a common license pool. The license master enforces compliance across the cluster.
D. Cluster members must share the same license pool and license master → Correct. All members in a clustered deployment must connect to the same license master and share the same license pool. This ensures consistent enforcement and prevents discrepancies in license usage reporting.
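For context on option D, each cluster member is pointed at the central license manager in its server.conf. A minimal sketch, with a placeholder hostname (the attribute is manager_uri on current releases; older releases use master_uri):

    # server.conf on each cluster member (hostname is a placeholder)
    [license]
    manager_uri = https://license-manager.example.com:8089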
Operational Note
For SPLK-2002 exam scenarios:
License counts ingestion, not replication.
Free license = no clustering.
License master = central enforcement.
References
Splunk Docs – About Splunk licenses
Summary:
In clustered Splunk deployments, free licenses do not support clustering, replicated data does not count against license volume, and all cluster members must share the same license pool and license master. Each member does not require its own separate license.
Which component in the splunkd.log will log information related to bad event breaking?
A. Audittrail
B. EventBreaking
C. IndexingPipeline
D. AggregatorMiningProcessor
Explanation:
When Splunk indexes data, one of the first and most critical stages is event breaking—the process of determining where each event begins and ends. This is controlled by event-parsing rules configured in props.conf, such as LINE_BREAKER, SHOULD_LINEMERGE, BREAK_ONLY_BEFORE, and timestamp recognition settings. If the incoming data does not match the expected breaking behaviors or if the configuration contains issues, Splunk will generate warnings or errors inside splunkd.log.
These messages originate from the AggregatorMiningProcessor, the internal pipeline component responsible for aggregating raw text into discrete events. It applies event-parsing logic and records problems like:
“Failed to detect timestamp”
“Event too large; truncating”
“Event not terminated before end of data stream”
“Possible malformed event”
Because event breaking occurs early in the ingestion pipeline, AggregatorMiningProcessor captures these warnings before indexing continues. It also logs errors caused by incorrect event-breaking configuration or unexpected data formats. Anytime Splunk cannot reliably detect event boundaries, you will find related entries under this component.
This behavior aligns with Splunk’s documented parsing pipeline, where AggregatorMiningProcessor performs part of the event parser stage. Splunk Support also commonly directs administrators to search splunkd.log for messages from this component when troubleshooting parsing issues.
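To inspect these messages directly, a search of the internal logs along the following lines is a reasonable starting point; the component filter is the key part, and the field names shown are the standard extractions for splunkd.log events:

    index=_internal sourcetype=splunkd component=AggregatorMiningProcessor (log_level=WARN OR log_level=ERROR)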
Why the Other Options Are Incorrect
A. Audittrail — Incorrect
The AuditTrail component does not perform or log event breaking. Instead, it records user-level administrative actions, such as:
configuration changes
login/logout attempts
role modifications
saved search edits
These logs are stored mostly for accountability and security auditing. AuditTrail provides no insight into indexing, parsing, or event formation. Therefore, it does not log issues related to malformed events or event boundary detection.
B. EventBreaking — Incorrect
There is no splunkd.log component named EventBreaking. While the term describes the concept of event boundary detection, Splunk does not use “EventBreaking” as an internal logging component. Any exam option listing non-existent splunkd components should be treated as incorrect by elimination.
Event breaking activity is instead logged under components such as AggregatorMiningProcessor, JsonLineBreaker, LineBreakingProcessor, or other specialized parsers—depending on data type. However, EventBreaking itself is not a real component and never appears in splunkd.log.
C. IndexingPipeline — Incorrect
The IndexingPipeline is a broad umbrella concept describing the movement of data through the parsing and indexing processes. While the pipeline includes the event-breaking stage, IndexingPipeline is not a specific component that logs errors. It is not used as a tag in splunkd.log messages.
Instead, the pipeline consists of several lower-level processors (such as AggregatorMiningProcessor, ParsingQueue, TcpInputProc), and only these processors emit log entries. Thus, although event breaking occurs within the overall indexing pipeline, the pipeline name itself does not appear in event-breaking-related log messages.
References
Splunk Docs – Splunk data pipeline overview
Splunk Docs – Troubleshooting parsing and event breaking
Splunk Docs – props.conf (event-breaking settings)
Which of the following statements about integrating with third-party systems is true? (Select all that apply.)
A. A Hadoop application can search data in Splunk.
B. Splunk can search data in the Hadoop File System (HDFS).
C. You can use Splunk alerts to provision actions on a third-party system.
D. You can forward data from Splunk forwarder to a third-party system without indexing it first.
Explanation:
Why C is Correct:
Splunk alerts are not just for internal notifications; they are a primary mechanism for automation and integration. An alert trigger can execute an action that interacts with an external system.
Mechanism: This is achieved through alert actions, specifically webhooks or custom scripts.
Process: When a saved search triggers an alert based on its conditions, it can call a REST API endpoint (webhook) on a third-party system (e.g., ServiceNow to create an incident, Palo Alto Networks to block an IP) or run a script that performs any programmable action.
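As a minimal sketch of this pattern, a webhook alert action can be attached to a saved search in savedsearches.conf; the search name and URL below are placeholders for whatever the third-party system exposes:

    # savedsearches.conf (stanza name and URL are placeholders)
    [Suspicious IP detected]
    action.webhook = 1
    action.webhook.param.url = https://thirdparty.example.com/api/incidents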
Why D is Correct:
This describes the fundamental role of a Heavy Forwarder in a data routing architecture.
Mechanism: A Heavy Forwarder runs the full Splunk parsing pipeline but is configured to forward the processed data to a non-Splunk destination. This is configured in outputs.conf using tcpout or syslog outputs directed at the third-party system's IP and port.
Key Point: The data is parsed and structured by Splunk but is sent directly to the external system, completely bypassing any Splunk index. This is a common pattern for sending data to a SIEM, data lake, or Kafka cluster.
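A minimal outputs.conf sketch on the heavy forwarder, assuming a hypothetical third-party receiver that accepts raw TCP; the host, port, and group name are placeholders:

    # outputs.conf on the heavy forwarder (host, port, group name are placeholders)
    [tcpout]
    defaultGroup = third_party

    [tcpout:third_party]
    server = siem.example.com:5514
    sendCookedData = false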
Why the Other Options Are Not Fully Correct
A. A Hadoop application can search data in Splunk.
Incorrect. The data flow and query capability are unidirectional in this context. Splunk can pull data from and search data in Hadoop (via Splunk Analytics for Hadoop, DB Connect, or similar apps), but Hadoop applications lack a native, supported connector to directly query or search the proprietary Splunk data store (the index). Splunk acts as the client, not the server, in these external data interactions.
B. Splunk can search data in the Hadoop File System (HDFS).
Clarification: Although this statement was not part of the provided answer, it is technically true. Splunk Analytics for Hadoop (formerly Hunk) lets Splunk search data residing in HDFS through virtual indexes, and DB Connect can reach Hive tables over JDBC. Since the question's specified correct answers were C and D, we adhere to that selection; strictly speaking, B, C, and D could all be valid in a select-all-that-apply reading.
Reference:
The Splunk Enterprise documentation on "Configure alert actions" explicitly covers how to set up webhooks and scripted responses, establishing this as a core feature for cross-platform orchestration.
The Splunk Enterprise documentation on "Forward data" and "Use Splunk Enterprise as a forwarder" details this exact use case, distinguishing it from indexing.
A customer plans to ingest 600 GB of data per day into Splunk. They will have six concurrent users, and they also want high data availability and high search performance. The customer is concerned about cost and wants to spend the minimum amount on the hardware for Splunk. How many indexers are recommended for this deployment?
A. Two indexers not in a cluster, assuming users run many long searches
B. Three indexers not in a cluster, assuming a long data retention period.
C. Two indexers clustered, assuming high availability is the greatest priority.
D. Two indexers clustered, assuming a high volume of saved/scheduled searches
Explanation:
This question requires balancing three key factors: data volume, high availability requirements, and cost. Let's analyze the requirements:
Data Volume (600 GB/day): A common sizing rule of thumb is that a single, modern indexer can handle roughly 100-300 GB/day under normal conditions. 600 GB/day is at the upper limit for two indexers but is a feasible starting point, especially if the hardware specifications (CPU, RAM, disk I/O) are robust.
High Availability & Performance: The customer explicitly requires "high data availability and high search performance." This is the most critical part of the requirement.
Cost: The customer wants to spend the minimum amount.
Why C is the Best Choice:
A two-indexer cluster is the minimum viable configuration that meets the core requirements for high availability and performance while controlling cost.
High Availability: In a cluster, you can set a replication factor (RF). With two indexers, you can set RF=2, meaning a full copy of all data exists on both servers. If one indexer fails, the other can continue serving search requests with no data loss, fulfilling the "high data availability" requirement. A non-clustered setup (options A and B) offers zero data redundancy; a single indexer failure leaves that indexer's data completely unavailable.
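For context, the replication and search factors that provide this redundancy are set on the cluster manager; a minimal server.conf sketch under the assumption of a two-peer cluster (values are illustrative):

    # server.conf on the cluster manager (illustrative two-peer cluster)
    [clustering]
    mode = manager          # "master" on older releases
    replication_factor = 2
    search_factor = 2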
Why the Other Options Are Incorrect
A. Two indexers not in a cluster, assuming users run many long searches
Incorrect. This setup provides zero high availability. If one indexer fails, all its data becomes unavailable, violating a core requirement. While it may help with search performance for long searches through distribution, the lack of redundancy makes it unsuitable.
B. Three indexers not in a cluster, assuming a long data retention period.
Incorrect. Data retention is primarily a function of storage capacity, not the number of indexers. More importantly, like option A, a non-clustered setup provides no high availability. The data is siloed on each indexer. Adding a third non-clustered indexer increases management complexity and cost without solving the availability problem.
D. Two indexers clustered, assuming a high volume of saved/scheduled searches
This is a distractor and is less correct than C. While a two-indexer cluster can certainly handle a sizable volume of saved and scheduled searches, the assumption it names is not aligned with the primary constraints. The question's paramount requirements are high availability and minimum cost. Option C directly and correctly identifies high availability as the driving priority for choosing a two-indexer cluster, which is the most accurate interpretation of the customer's stated needs. Option D describes a benefit (handling many scheduled searches) but misses the fundamental reason for clustering here, which is fault tolerance.
Reference
Splunk's official documentation on "Capacity planning for indexers and indexer clusters" provides the guidelines for data volume per indexer. Furthermore, the "About indexer clustering" documentation states that a cluster is mandatory for data redundancy and high availability. A two-peer cluster is the smallest supported production configuration to achieve a replication factor greater than 1, making it the cost-effective choice for meeting core availability needs without over-provisioning.
Which tool(s) can be leveraged to diagnose connection problems between an indexer and forwarder? (Select all that apply.)
A. telnet
B. tcpdump
C. splunk btool
D. splunk btprobe
Explanation:
Diagnosing connection problems between a forwarder and an indexer requires tools that can test or trace network connectivity and Splunk-specific communication.
A. telnet → Correct. Telnet can be used to test whether the forwarder can reach the indexer on the receiving port (default: 9997). If the telnet connection fails, it indicates a network or firewall issue (see the command sketch after this list).
B. tcpdump → Correct. Tcpdump allows packet-level inspection. It can confirm whether traffic from the forwarder is actually reaching the indexer, helping diagnose firewall, routing, or port-blocking problems.
C. splunk btool → Correct. btool displays the effective, merged configuration (for example, outputs.conf on the forwarder and inputs.conf on the receiver), so it can expose a wrong indexer address, port, or SSL setting that is preventing the connection.
D. splunk btprobe → Incorrect. btprobe queries and validates the fishbucket (the checkpoint database used by monitor inputs). It is useful for troubleshooting file monitoring, not for testing forwarder-to-indexer connectivity.
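A quick connectivity check might look like the following; the hostnames, interface, and receiving port are placeholders for your environment:

    # From the forwarder: can the receiving port be reached?
    telnet indexer01 9997

    # On the indexer: is traffic from the forwarder arriving on 9997?
    tcpdump -i eth0 host forwarder01 and port 9997

    # On the forwarder: what forwarding configuration is actually in effect?
    $SPLUNK_HOME/bin/splunk btool outputs list --debug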
Operational Note
For SPLK-2002 exam scenarios:
telnet/tcpdump = generic network connectivity tools
btool = verify the effective forwarding/receiving configuration
btprobe = fishbucket/monitor-input troubleshooting, not connectivity
Reference
Splunk Docs – Command line tools for use with Support (includes btprobe)
Splunk Docs – About btool
When using the props.conf LINE_BREAKER attribute to delimit multi-line events, the SHOULD_LINEMERGE attribute should be set to what?
A. Auto
B. None
C. True
D. False
Explanation:
When defining multi-line event boundaries in props.conf, the LINE_BREAKER attribute explicitly specifies the regex that determines where one event ends and the next begins (the text matched by its first capture group is discarded at the break point). Once LINE_BREAKER is set, Splunk must not also try to merge lines on its own; otherwise, it may incorrectly combine or split events.
For this reason, Splunk’s best practice is: when LINE_BREAKER is defined, set SHOULD_LINEMERGE = false, because you are telling Splunk to rely only on your regex for event breaking.
If you leave line merging enabled, Splunk will apply additional heuristics, which may cause unwanted merging of lines or broken event boundaries.
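A minimal props.conf sketch, assuming a hypothetical sourcetype whose events each begin with an ISO-style timestamp in brackets; the sourcetype name and regex are illustrative only:

    # props.conf (sourcetype name and regex are illustrative)
    [my:multiline:sourcetype]
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)(?=\[\d{4}-\d{2}-\d{2})
    TRUNCATE = 10000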
Why SHOULD_LINEMERGE = false is required
Disables the older line-merging logic.
Ensures the LINE_BREAKER regex alone controls event boundaries.
Prevents Splunk from using timestamp detection or similarity-based merging.
Ensures consistent, predictable multi-line event extraction.
This behavior is explicitly recommended in the official documentation:
Splunk states that when using advanced event-breaking methods (e.g., LINE_BREAKER, BREAK_ONLY_BEFORE), administrators should turn off SHOULD_LINEMERGE.
This makes D. False the correct answer.
Why the Other Options Are Incorrect
A. Auto — Incorrect
There is no valid SHOULD_LINEMERGE = auto setting.
Splunk only accepts true or false.
Therefore, this option is invalid and never used in props.conf.
B. None — Incorrect
“None” is not a valid boolean value for SHOULD_LINEMERGE.
Splunk expects:
true
false
Using “none” is syntactically invalid and will not disable line merging. Splunk will default to its internal behavior, which can incorrectly merge multi-line events.
C. True — Incorrect
SHOULD_LINEMERGE = true tells Splunk to use its own line-merging heuristics, even when LINE_BREAKER is configured. This causes Splunk to:
Look for timestamps
Attempt automatic merging
Potentially override LINE_BREAKER behavior
This often results in:
too many lines combined
broken event boundaries
inconsistent multi-line extraction
Therefore, when a LINE_BREAKER is explicitly defined, setting SHOULD_LINEMERGE to true contradicts proper Splunk configuration practice.
References
These references confirm Splunk’s guidance for using LINE_BREAKER with SHOULD_LINEMERGE:
1. Props.conf LINE_BREAKER documentation
2. SHOULD_LINEMERGE documentation
3. Event Processing & Line Breaking Best Practices
In a four site indexer cluster, which configuration stores two searchable copies at the origin site, one searchable copy at site2, and a total of four searchable copies?
A. site_search_factor = origin:2, site1:2, total:4
B. site_search_factor = origin:2, site2:1, total:4
C. site_replication_factor = origin:2, site1:2, total:4
D. site_replication_factor = origin:2, site2:1, total:4
Explanation:
The scenario is stated entirely in terms of searchable copies: two searchable copies at the origin site, one searchable copy at site2, and four searchable copies in total. The number and placement of searchable bucket copies is controlled by the site_search_factor; the site_replication_factor governs the total number of raw bucket copies, searchable or not.
The configuration in option B works as follows (see the configuration sketch below):
origin:2 keeps two searchable copies at whichever site originated the data.
site2:1 keeps one searchable copy at site2.
total:4 instructs the cluster to maintain four searchable copies overall. Since only three are explicitly assigned (2+1), the cluster automatically places the remaining searchable copy on another available site to meet the total. This fulfills the requirement.
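For context, these site factors live on the cluster manager; a minimal server.conf sketch for a four-site cluster, with illustrative site names and replication values:

    # server.conf on the cluster manager (illustrative values)
    [general]
    site = site1

    [clustering]
    mode = manager          # "master" on older releases
    multisite = true
    available_sites = site1,site2,site3,site4
    site_search_factor = origin:2, site2:1, total:4
    site_replication_factor = origin:2, total:4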
Why the Other Options Are Incorrect
A & B. site_search_factor = ...
These options are incorrect because they use the site_search_factor parameter. This parameter defines the minimum number of searchable copies per site for high availability during an outage. It does not control the initial creation or total number of raw data copies across the cluster, which is what the question describes.
C. site_replication_factor = origin:2, site1:2, total:4
This option uses the correct parameter but an invalid configuration. It specifies site1:2, which would store two copies at site1. This directly contradicts the question's requirement of storing only one copy at site2.
Reference
Splunk Enterprise documentation on multisite indexer cluster architecture states:
site_search_factor determines how many searchable bucket copies the cluster maintains and how they are distributed across sites.
site_replication_factor determines how many total copies of raw data are maintained across different sites.
Which two sections can be expanded using the Search Job Inspector?
A. Execution costs.
B. Saved search history.
C. Search job properties.
D. Optimization suggestions.
Explanation:
The Search Job Inspector in Splunk is a diagnostic tool that provides detailed information about how a search was executed. It is commonly used to troubleshoot performance issues and understand search behavior. Two key sections can be expanded:
A. Execution costs → Correct. This section shows the breakdown of how much time and resources each search processing phase consumed (parsing, streaming, filtering, etc.). It helps identify bottlenecks in search execution.
C. Search job properties → Correct. This section displays metadata about the search job itself, such as search ID, user, role, search string, time range, and other properties. It provides context for the search execution.
Why the other options are incorrect
B. Saved search history → Incorrect. The Search Job Inspector does not show saved search history. That information is managed separately in Splunk’s saved searches configuration.
D. Optimization suggestions → Incorrect. The Search Job Inspector does not provide optimization suggestions. It only reports execution details and properties. Optimization must be inferred by the user or administrator based on the execution costs.
Reference:
Splunk Docs – Use the Search Job Inspector
When troubleshooting monitor inputs, which command checks the status of the tailed files?
A. splunk cmd btool inputs list | tail
B. splunk cmd btool check inputs layer
C. curl https://serverhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus
D. curl https://serverhost:8089/services/admin/inputstatus/TailingProcessor:Tailstatus
Explanation:
When troubleshooting monitor inputs in Splunk, you often need to verify whether files being monitored are actively tailed by the TailingProcessor. Splunk exposes this information through its REST API.
C. FileStatus → Correct. The FileStatus endpoint under TailingProcessor provides detailed information about the status of tailed files, including whether they are being read, their current position, and any errors. This is the correct command to check tailed-file status.
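A complete invocation normally includes credentials and, for the default self-signed certificate, the -k flag; the host, port, and credentials below are placeholders:

    # Host, port, and credentials are placeholders
    curl -k -u admin:changeme \
      https://serverhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus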
Why the other options are incorrect
A. splunk cmd btool inputs list | tail → Incorrect. btool is used to troubleshoot configuration layering and precedence. It shows which stanzas are applied but does not provide the runtime status of tailed files.
B. splunk cmd btool check inputs layer → Incorrect. btool does have a check mode for validating configuration syntax, but this command string is not valid, and btool in any case inspects configuration files rather than file-tailing status.
D. curl ... Tailstatus → Incorrect. The correct REST endpoint is FileStatus, not Tailstatus. Tailstatus is not a valid endpoint.
Reference:
Splunk Docs – Monitor inputs troubleshooting
Splunk Docs – REST API inputstatus endpoints
Which of the following is a way to exclude search artifacts when creating a diag?
A. SPLUNK_HOME/bin/splunk diag --exclude
B. SPLUNK_HOME/bin/splunk diag --debug --refresh
C. SPLUNK_HOME/bin/splunk diag --disable=dispatch
D. SPLUNK_HOME/bin/splunk diag --filter-searchstrings
Explanation:
When generating a Splunk diag file, administrators often want to exclude certain artifacts to reduce file size or avoid including unnecessary search-related data. The dispatch directory contains search artifacts such as search results, summaries, and temporary files created during search execution. These artifacts can be very large and are not always required for troubleshooting. Splunk provides the option --disable=dispatch to explicitly exclude these search artifacts when creating a diag file.
This flag ensures that the diagnostic bundle focuses on the most relevant information for Splunk Support: configuration files, internal logs, and OS/environment details. By excluding dispatch, the diag remains smaller, easier to transfer, and avoids including potentially sensitive search results.
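For illustration, a diag without search artifacts could be generated as follows; the second command adds the separate file-glob filter discussed under option A below, and the pattern is purely an example:

    # Generate a diag without the dispatch directory (search artifacts)
    $SPLUNK_HOME/bin/splunk diag --disable=dispatch

    # Optionally also skip individual files matching a glob (pattern is an example)
    $SPLUNK_HOME/bin/splunk diag --disable=dispatch --exclude "*/private_notes*"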
Why the other options are incorrect
A. splunk diag --exclude → Incorrect. The --exclude flag is not the component-level switch for search artifacts: it filters individual files out of the bundle by glob pattern (for example, to keep a sensitive file out of the diag). To omit an entire component such as the dispatch directory, Splunk’s supported syntax is --disable=<component>.
B. splunk diag --debug --refresh → Incorrect. These flags control verbosity (--debug) and refresh behavior (--refresh) of the diag process. They do not exclude search artifacts; they adjust how the diag is generated and logged, not what content is included.
C. splunk diag --disable=dispatch → Correct. This is the supported and documented way to exclude search artifacts. By disabling dispatch, Splunk prevents search-related files from being bundled into the diag. This is the precise answer to the question.
D. splunk diag --filter-searchstrings → Incorrect. This flag relates to redacting search strings from logs included in the diag; it does not exclude the dispatch directory or other search artifacts from the bundle.
Operational Relevance
For administrators and exam scenarios (SPLK-2002):
Diag files are generally safe to share with Splunk Support because they do not include indexed customer data by default.
Dispatch exclusion is optional but recommended when search artifacts are not needed for troubleshooting.
Knowing the correct flag is important because exam questions often test familiarity with Splunk’s diagnostic tools and their exact syntax.
References
Splunk Docs – Generate a diagnostic file (diag)
To improve Splunk performance, parallelIngestionPipelines setting can be adjusted on which of the following components in the Splunk architecture? (Select all that apply.)
A. Indexers
B. Forwarders
C. Search head
D. Cluster master
Explanation:
The parallelIngestionPipelines setting in Splunk is used to improve ingestion throughput by enabling multiple parallel ingestion pipeline sets for parsing and indexing. It can usefully be adjusted only on components that actually perform data ingestion (a configuration sketch follows the option breakdown below):
A. Indexers → Correct
Indexers handle parsing, event breaking, and indexing of incoming data. Increasing parallelIngestionPipelines allows them to process multiple ingestion streams in parallel, improving throughput on heavy data loads.
B. Forwarders → Correct
Heavy Forwarders perform parsing before sending data to indexers. Adjusting parallelIngestionPipelines here can improve performance when the forwarder is responsible for significant parsing workloads.
C. Search head → Incorrect
Search heads do not ingest or parse data. They coordinate searches and manage results. The setting has no effect here.
D. Cluster master → Incorrect
The cluster master (cluster manager) regulates indexer clustering, replication, and search factor enforcement. It does not ingest data, so the setting is irrelevant.
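A minimal sketch of where the setting lives on an indexer or heavy forwarder, assuming the host has spare CPU and I/O capacity; the value shown is illustrative:

    # server.conf on an indexer or heavy forwarder (value is illustrative)
    [general]
    parallelIngestionPipelines = 2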
Operational Note
For SPLK-2002 exam scenarios:
parallelIngestionPipelines = ingestion performance tuning
Applies only to indexers and heavy forwarders.
Does not apply to search heads or cluster masters.
References
Splunk Docs – Parallel ingestion pipelines
Which of the following clarification steps should be taken if apps are not appearing on a deployment client? (Select all that apply.)
A. Check serverclass.conf of the deployment server.
B. Check deploymentclient.conf of the deployment client.
C. Check the content of SPLUNK_HOME/etc/apps of the deployment server.
D. Search for relevant events in splunkd.log of the deployment server.
Explanation:
When apps are not appearing on a deployment client, troubleshooting focuses on how the client phones home, what the deployment server is configured to push, and what the server actually did. The following clarification steps are valid (see the configuration sketch after this breakdown):
A. Check serverclass.conf of the deployment server → Correct. This file defines which apps are pushed to which clients. If the server class is misconfigured (wrong whitelist/blacklist, missing stanza, incorrect app assignment), apps will not be deployed.
B. Check deploymentclient.conf of the deployment client → Correct. This file tells the client which deployment server to phone home to. If the targetUri or other connection settings are wrong, the client will never receive apps.
D. Search for relevant events in splunkd.log of the deployment server → Correct. The deployment server logs client phone-home and app download activity, so splunkd.log shows whether the client is checking in and whether any apps were offered to it.
Why C is not correct here
C. Check the content of SPLUNK_HOME/etc/apps of the deployment server → Incorrect. The apps a deployment server distributes live in its repository location, which defaults to $SPLUNK_HOME/etc/deployment-apps, not etc/apps. The etc/apps directory holds the deployment server's own locally installed apps, so inspecting it does not clarify why clients are missing deployment apps. Therefore, A, B, and D are the correct answers.
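A minimal sketch of the two configuration files involved, with placeholder hostnames, server class, and app names:

    # serverclass.conf on the deployment server (names are placeholders)
    [serverClass:linux_web]
    whitelist.0 = web*.example.com

    [serverClass:linux_web:app:my_inputs_app]
    restartSplunkd = true

    # deploymentclient.conf on the deployment client (host is a placeholder)
    [target-broker:deploymentServer]
    targetUri = deploy.example.com:8089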
References
Splunk Docs – Configure deployment server and clients
Splunk Docs – About deployment server