Free CompTIA DY0-001 Practice Questions 2026 - Page 2

Which of the following image data augmentation techniques allows a data scientist to increase the size of a data set?

A. Clipping

B. Cropping

C. Masking

D. Scaling

B.   Cropping

Explanation:
The goal of data augmentation is to artificially expand the size and diversity of a training dataset without collecting new data. This is done by applying random-yet-realistic transformations to existing images.

How Cropping Increases Data Set Size:
When you apply random cropping to an image, you select a random portion of the original image (e.g., a 200x200 pixel section from a 250x250 image). This creates a new, unique image that is added to the training set. By performing this process multiple times on each original image, you can generate hundreds or thousands of new training examples from a single source image. This effectively "increases the size of the data set."
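A minimal pure-Python sketch of random cropping on a toy 6x6 "image" (a real pipeline would apply a library transform such as torchvision.transforms.RandomCrop to image tensors):

```python
import random

def random_crop(image, crop_h, crop_w, rng=random.Random(0)):
    """Return one random crop_h x crop_w window from a 2-D image (list of rows)."""
    top = rng.randint(0, len(image) - crop_h)        # random vertical offset
    left = rng.randint(0, len(image[0]) - crop_w)    # random horizontal offset
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

# One toy 6x6 image; every random 4x4 crop is a new, unique training example.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
augmented = [random_crop(image, 4, 4) for _ in range(5)]  # 5 examples from 1 source
```

Repeating this for every source image is exactly how cropping "increases the size of the data set."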

Why the Other Options Are Not Correct
A. Clipping
Purpose: In the context of image processing, clipping typically refers to restricting pixel values to a certain range (e.g., clipping values above 255 to 255 after a contrast adjustment) or clipping graphical elements that extend beyond a canvas boundary.
Why it's wrong: Clipping is an operation that constrains data; it does not create new data. It might be a step in a normalization process, but it does not generate new image files or variants that can be added to a dataset. Therefore, it does not increase the dataset size.

C. Masking
Purpose: Masking involves applying a binary mask to an image to isolate or hide specific regions. For example, you might mask out the background of a photo to focus on the main object.
Why it's wrong: While masking can be a useful pre-processing step to help a model focus on relevant features, it is not a primary data augmentation technique. Applying the same mask to an image does not create meaningfully new training examples in the way that geometric transformations do. It alters the image but doesn't typically generate multiple unique variants from a single source for the purpose of dataset expansion.

D. Scaling
Purpose: Scaling (or resizing) changes the dimensions of an image (e.g., from 500x500 pixels to 250x250 pixels).
Why it's misleading and not the best answer:
Scaling is a pre-processing requirement, not a standard data augmentation technique. Convolutional Neural Networks (CNNs) require all input images to be the same size. Therefore, you must scale all images to fixed dimensions (e.g., 224x224) before feeding them into the model. While you could technically create different scaled versions, the standard practice is to scale everything to one consistent size. It is not used to create new training examples but to standardize existing ones. Cropping, flipping, and rotating are used for augmentation; scaling is used for pre-processing.

References
Deep Learning Frameworks:
The documentation for popular deep learning libraries such as TensorFlow and PyTorch explicitly lists RandomCrop as a core image augmentation transformation. For instance, torchvision.transforms.RandomCrop is designed to "Crop the given image at a random location" to increase data diversity.
Image Data Augmentation Standards:
Foundational textbooks and resources on computer vision (e.g., Deep Learning for Computer Vision by Rajalingappaa Shanmugamani) consistently list cropping, along with rotating, flipping, and color jittering, as fundamental techniques for dataset expansion. Scaling is universally listed as a pre-processing step, not an augmentation step.

Which of the following distance metrics for KNN is best described as a straight line?

A. Radial

B. Euclidean

C. Cosine

D. Manhattan

B.   Euclidean

Explanation:
In geometry, the shortest path between two points is a straight line. The Euclidean distance formula calculates the length of this straight-line segment in Euclidean space.

Formula:
For two points in a 2-dimensional space, p = (p1, p2) and q = (q1, q2), the Euclidean distance is calculated as:
d(p, q) = √( (q1 - p1)² + (q2 - p2)² )
This is a direct application of the Pythagorean theorem.
Visualization:
If you plot two points on a graph and draw a line directly between them, measuring the length of that line gives you the Euclidean distance. This holds true for higher dimensions as well (3D, 4D, etc.), even if we can't visualize it.

Why the Other Options Are Not Correct
A. Radial
Explanation: "Radial" is not a standard, standalone distance metric in machine learning or mathematics. The term is often used as an adjective. For example:
Radial Basis Function (RBF): A kernel used in SVMs that uses the Euclidean distance in its calculation.
Radial Distance: This can sometimes refer to the distance from a central point (origin) in a polar coordinate system.
Why it's wrong: It is not the specific name of a distance metric and does not describe a straight-line distance between two arbitrary points.

C. Cosine
Purpose: Cosine distance (derived from Cosine Similarity) measures the cosine of the angle between two vectors. It is a measure of orientation, not magnitude or direct distance.
Why it's wrong: It does not measure a straight-line path. Two vectors can be pointing in the same direction (cosine distance = 0) but be very far apart in terms of straight-line Euclidean distance. It is best used for comparing documents in text analysis where the magnitude (length of the document) is less important than the content direction.

D. Manhattan
Purpose: Also known as "city block" distance or L1 norm, it measures the distance between two points by summing the absolute differences of their coordinates.
Why it's wrong: It explicitly does not measure a straight line. Imagine being in a city with a grid-like street layout. The Manhattan distance is the total length of the path you would walk along the streets to get from point A to point B (e.g., 3 blocks North and 4 blocks East). The straight-line (Euclidean) distance would be the length of the diagonal you could walk through the buildings, which is shorter (√(3² + 4²) = 5 blocks).
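Both metrics are a couple of lines of Python; using the 3-blocks-by-4-blocks example above, the straight-line distance is 5 while the city-block distance is 7:

```python
import math

def euclidean(p, q):
    """L2: the straight-line distance (Pythagorean theorem in any dimension)."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    """L1: sum of absolute coordinate differences (the 'city block' walk)."""
    return sum(abs(qi - pi) for pi, qi in zip(p, q))

a, b = (0, 0), (4, 3)          # 4 blocks East, 3 blocks North
print(euclidean(a, b))         # 5.0 — the diagonal through the buildings
print(manhattan(a, b))         # 7 — the walk along the streets
```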

Valid References:
Mathematics & Geometry:
Euclidean distance is a foundational concept in Euclidean geometry, named after the ancient Greek mathematician Euclid.
Machine Learning Textbooks:
Introductory texts like An Introduction to Statistical Learning (ISL) or Pattern Recognition and Machine Learning clearly define Euclidean distance as the straight-line distance between points in a feature space, especially in the context of K-Nearest Neighbors (KNN) and K-Means clustering algorithms.

Which of the following explains back propagation?

A. The passage of convolutions backward through a neural network to update weights and biases

B. The passage of accuracy backward through a neural network to update weights and biases

C. The passage of nodes backward through a neural network to update weights and biases

D. The passage of errors backward through a neural network to update weights and biases

D.   The passage of errors backward through a neural network to update weights and biases

Explanation:

What backpropagation is:
Backpropagation (“backward propagation of errors”) is the algorithm used to train neural networks.

It works by:
Forward pass
→ Input moves forward through the network to compute predictions.
Error calculation
→ Compare predictions to actual output using a loss function.
Backward pass
→ Compute the gradient of the error with respect to each weight (via chain rule / calculus).
Weight update
→ Adjust weights and biases to reduce error (using gradient descent).
Key concept: errors (not accuracy, not nodes, not convolutions) are propagated backward.
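The four steps can be sketched with a single neuron y = w*x + b fit by gradient descent; the toy target (y = 2x + 1), data, and learning rate are illustrative only:

```python
# Forward pass, error, backward pass (gradients), weight update — repeated.
data = [(x, 2 * x + 1) for x in range(-3, 4)]   # toy (input, target) pairs
w, b, lr = 0.0, 0.0, 0.05

def mse(w, b):
    """Mean squared error of predictions w*x + b over the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

loss_before = mse(w, b)
for _ in range(200):
    # Backward pass: gradient of the error w.r.t. each parameter (chain rule).
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    # Update: step opposite the gradient to reduce the error.
    w -= lr * grad_w
    b -= lr * grad_b
loss_after = mse(w, b)
```

After training, w and b approach 2 and 1 and the loss shrinks: the errors, propagated backward as gradients, are what drove the updates.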

Option Analysis:
A. The passage of convolutions backward… ❌
Convolutions are specific to CNNs (convolutional neural networks).
Backpropagation applies to all neural networks, not just CNNs.
Wrong scope.
B. The passage of accuracy backward… ❌
Accuracy is a performance metric, not what gets propagated.
What actually propagates is the error/gradient.
C. The passage of nodes backward… ❌
Nodes (neurons) are structural parts of the network, not what propagates backward.
Incorrect.
D. The passage of errors backward… ✅
Correct definition: backpropagation = passing errors/gradients backward to adjust weights and biases.

📝 Exam Tip:
When you see “backpropagation”, always think:
Error → Gradient → Weight update.
Key phrase: “Backward propagation of errors.”

📚 References:
CompTIA DataX DY0-001 Exam Objectives, Domain 4.0: Machine Learning and AI Concepts — explain neural network training (forward pass, backpropagation).
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning.
Stanford CS231n: Backpropagation Notes


A. INNER JOIN

B. LEFT OUTER JOIN

C. RIGHT OUTER JOIN

D. FULL OUTER JOIN

D.   FULL OUTER JOIN

Explanation:
Backpropagation, short for “backward propagation of errors,” is the fundamental algorithm used to train artificial neural networks. It enables the network to learn by systematically reducing the difference between predicted values and actual outcomes. The algorithm operates in two major stages: the forward pass and the backward pass.

1. Forward Pass
Input data flows forward through the layers of the network.
Each neuron applies weights and biases, then passes its activation to the next layer.
At the final layer, the model produces a prediction.
2. Error Calculation
A loss function (such as Mean Squared Error or Cross-Entropy) computes the difference between the model’s output and the true label.
This error quantifies how far off the prediction is.
3. Backward Pass
The error is then propagated backward through the network.
Using the chain rule of calculus, the algorithm computes partial derivatives (gradients) of the error with respect to each weight and bias.
These gradients show how much each parameter contributed to the overall error.
4. Weight and Bias Update
The gradients are used in an optimization algorithm, commonly gradient descent, to update weights and biases in the opposite direction of the error.
Over many iterations (epochs), the model improves and predictions become more accurate.
The key point: it is the passage of errors that flows backward, not convolutions, accuracy, or nodes.

Why Other Options Are Incorrect
A. The passage of convolutions backward…
Convolutions are specific operations used in Convolutional Neural Networks (CNNs). While convolutional filters are updated using backpropagation, the general definition of backpropagation does not refer to “convolutions” themselves. The process is not limited to CNNs but applies to all feedforward networks. Therefore, this option is too narrow and misleading.
B. The passage of accuracy backward…
Accuracy is a performance metric that indicates how many predictions are correct relative to total predictions. Accuracy is not a differentiable function suitable for optimization, so it is not what gets propagated. Instead, errors from a loss function (which are differentiable) are backpropagated. This option is incorrect because backpropagation optimizes loss, not accuracy.
C. The passage of nodes backward…
Nodes (neurons) are the structural units of a neural network. They do not “move backward.” What flows backward are the gradients of the error with respect to each parameter. Saying that nodes are passed backward is a misunderstanding of how neural networks work. Thus, this option is false.
D. The passage of errors backward…
This is the correct explanation. Backpropagation literally means propagating the error backward through the network to calculate how much each parameter needs to adjust. This enables learning and optimization.

References
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. [Chapter 6: Deep Feedforward Networks].
Stanford CS231n: Backpropagation Notes
Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer.

A data analyst wants to use compression on an analyzed data set and send it to a new destination for further processing. Which of the following issues will most likely occur?

A. Library dependency will be missing.

B. Server CPU usage will be too high

C. Operating system support will be missing

D. Server memory usage will be too high

B.   Server CPU usage will be too high

Explanation:
When a data analyst applies compression before transferring a dataset, the major benefit is reduced file size and faster transmission across networks. However, compression algorithms (e.g., gzip, bzip2, LZ4) are CPU-intensive because they must repeatedly scan the dataset, find patterns, and encode them efficiently.
Thus, the most likely issue from applying compression is increased CPU usage during both compression (at the sender side) and decompression (at the destination side).

This trade-off is common in data pipelines:
Smaller file sizes → faster transfer & lower storage cost.
Higher CPU usage → slower local processing if CPU is a bottleneck.
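The trade-off is easy to demonstrate with the standard library's gzip module (the payload below is illustrative repetitive CSV-style data):

```python
import gzip

# Repetitive, CSV-like data compresses very well — at the cost of CPU cycles.
payload = b"timestamp,value\n" + b"2024-01-01,42\n" * 50_000

compressed = gzip.compress(payload, compresslevel=9)  # CPU-heavy encode (sender)
restored = gzip.decompress(compressed)                # CPU again (destination)

ratio = len(compressed) / len(payload)                # size shrinks dramatically
```

Higher compresslevel values trade even more CPU time for a smaller output; level 1 makes the opposite trade.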

Option Analysis
A. Library dependency will be missing.
Compression typically uses well-established, widely supported libraries (gzip, zip, tar, etc.).
Missing dependencies are rare because most modern OSes and data tools natively support common compression formats.
❌ Not the most likely issue.
B. Server CPU usage will be too high.
Compression/decompression requires significant CPU cycles to encode and decode data.
This is the main expected drawback of compression.
✅ Correct.
C. Operating system support will be missing.
Major operating systems all support common compression algorithms (Linux, Windows, macOS, cloud environments).
❌ Very unlikely.
D. Server memory usage will be too high.
Some algorithms use extra memory, but compared to CPU usage, memory overhead is relatively minor.
The bigger and more predictable impact is CPU load, not RAM exhaustion.
❌ Not the best answer.

📝 Exam Tip
When you see “compression” in a CompTIA exam question, think:
Pros: Smaller size, faster transfer, cheaper storage.
Cons: Higher CPU usage for compressing/decompressing.
Don’t confuse with “encryption” (which stresses CPU + key libraries).

📚 References
CompTIA Data+ (DA0-001) Exam Objectives, Domain 2.0: Data Mining – Apply data transformation techniques (compression, normalization, encryption).
Microsoft Docs: Data compression impact on CPU
AWS Big Data Blog: Tradeoffs of data compression

A data scientist wants to digitize historical hard copies of documents. Which of the following is the best method for this task?

A. Word2vec

B. Optical character recognition

C. Latent semantic analysis

D. Semantic segmentation

B.   Optical character recognition

Explanation:
The task is to "digitize historical hard copies." This means converting a physical document into a digital format. The core requirement is to extract the textual content from the image of the document so it can be used in a computer (e.g., searched, edited, stored in a database).

How OCR works:
OCR software analyzes an image of a document. It identifies light and dark areas to determine shapes (characters, numbers, symbols). Using pattern recognition and feature detection algorithms, it then translates those shapes into actual text characters (e.g., ASCII or Unicode).
Best for the Task:
OCR is the direct technological solution to this exact problem. It transforms a picture of text into actual text data. Modern OCR systems can handle various fonts and even handwritten text, which is crucial for historical documents.

Why the Other Options Are Not Correct
A. Word2vec
Purpose: Word2vec is a Natural Language Processing (NLP) technique used to create "word embeddings." It converts words into high-dimensional vectors such that words with similar meanings have similar vector representations.
Why it's wrong: Word2vec requires text data as its input. It is a method for analyzing and representing text that is already digitized. It cannot process a hard copy or an image of a document. It is a step that would come after OCR has already performed the digitization.

C. Latent semantic analysis (LSA)
Purpose: LSA is another NLP technique used to analyze relationships between a set of documents and the terms they contain. It is used for topic modeling, document classification, and discovering hidden semantic structures (hence "latent semantic").
Why it's wrong: Like Word2vec, LSA operates on text that is already in a digital, machine-readable format. It is an analysis method, not a digitization method. It is completely incapable of extracting text from an image.

D. Semantic segmentation
Purpose:
Semantic segmentation is a computer vision technique where every pixel in an image is classified into a specific category (e.g., "road," "car," "building," "person").
Why it's wrong: While it works on images, its goal is completely different. It is used for understanding scenes and the spatial layout of objects within an image. It could, in theory, be used to identify which regions of a document image contain "text" versus "images" or "background," but it does not perform the actual conversion of image pixels to text characters. That is the exclusive job of OCR. Semantic segmentation is a potential pre-processing step for a complex OCR pipeline, but it is not the method for digitization itself.

Valid References

Industry Standard Tooling:
Software like Adobe Acrobat, Google Docs' "Open with Google Docs" feature for PDFs, and open-source tools like Tesseract OCR are built around OCR technology. They are the standard tools for this digitization task.

A company created a very popular collectible card set. Collectors attempt to collect the entire set, but the availability of each card varies, because some cards have higher production volumes than others. The set contains a total of 12 cards. The attributes of the cards are shown.
The data scientist is tasked with designing an initial model iteration to predict whether the animal on the card lives in the sea or on land, given the card's features: Wrapper color, Wrapper shape, and Animal.
Which of the following is the best way to accomplish this task?

A. ARIMA

B. Linear regression

C. Association rules

D. Decision trees

D.   Decision trees

Explanation:

Best Approach: Decision Trees
The problem is to predict whether an animal lives in the sea or on land using categorical features: Wrapper color, Wrapper shape, and Animal. This is a classification task, not a forecasting or continuous prediction problem. Decision trees are the best option here because they are designed for classification and regression, can easily handle categorical data, and are highly interpretable. A decision tree can split on attributes like Animal = Whale or Wrapper color = Red and directly predict the outcome (Sea or Land).
Decision trees are also advantageous because they can deal with non-linear relationships and interactions between features without requiring complex preprocessing. For an initial iteration of the model, interpretability is key, and decision trees provide a clear path of logic that can be communicated to non-technical stakeholders.
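A sketch of the approach, assuming scikit-learn is available; the card rows and labels below are invented for illustration, since the actual attribute table is not reproduced on this page:

```python
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical card attributes: (Wrapper color, Wrapper shape, Animal).
X = [["Red", "Circle", "Whale"],
     ["Blue", "Square", "Lion"],
     ["Red", "Circle", "Shark"],
     ["Green", "Square", "Tiger"],
     ["Blue", "Circle", "Dolphin"],
     ["Green", "Square", "Bear"]]
y = ["Sea", "Land", "Sea", "Land", "Sea", "Land"]

enc = OneHotEncoder(handle_unknown="ignore")      # categorical -> indicator columns
tree = DecisionTreeClassifier(random_state=0)
tree.fit(enc.fit_transform(X), y)

pred = tree.predict(enc.transform([["Red", "Circle", "Whale"]]))
```

The fitted tree's splits (e.g., on the Animal indicator columns) can be printed with sklearn.tree.export_text, which is the interpretability advantage mentioned above.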
📖 Reference:
Han, J., Pei, J., & Kamber, M. (2011). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
Scikit-learn: Decision Tree Classifier .

Why the Other Options Are Not Correct
A. ARIMA
ARIMA (AutoRegressive Integrated Moving Average) is used for time-series forecasting. It models patterns like trends, seasonality, and autocorrelation across time. In this dataset, there is no temporal component—cards are not sequential time-based observations. Predicting Sea vs Land is a classification task, while ARIMA outputs continuous numeric forecasts. Therefore, ARIMA does not apply.
📖 Reference:
Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice. OTexts.
B. Linear Regression
Linear regression is a method used to predict a continuous dependent variable from one or more independent variables. For example, predicting a house price from square footage and location. In this case, the dependent variable (Habitat) is categorical (Sea vs Land), not continuous. While logistic regression would be appropriate for binary classification, linear regression is not correct because it assumes numeric continuous output. Applying linear regression here would yield invalid probabilities and poor performance.
📖 Reference:
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R (2nd ed.). Springer.
C. Association Rules
Association rule mining (e.g., Apriori, FP-Growth) is used to identify relationships between items in transactional data. For example, in retail: “If a customer buys bread, they are likely to buy butter.” While association rules reveal interesting patterns in co-occurrence data, they are not predictive classification models. Here, the goal is to predict a single label (Sea or Land) based on features. Association rules would not directly output a class prediction for unseen data, making them unsuitable for this task.
📖 Reference:
Agrawal, R., Imieliński, T., & Swami, A. (1993). "Mining association rules between sets of items in large databases." ACM SIGMOD Record, 22(2), 207–216.

Why Decision Trees Are Best
Handles categorical data: Attributes like wrapper color, shape, and animal type can be directly split in a tree.
Performs classification: Directly outputs a class (Sea or Land).
Interpretability: The path from root to leaf provides clear decision logic.
Initial iteration suitability: Easy to implement, visualize, and explain before moving to more complex ensemble methods (Random Forests, Gradient Boosting).

A data scientist is deploying a model that needs to be accessed by multiple departments with minimal development effort by the departments. Which of the following APIs would be best for the data scientist to use?

A. SOAP

B. RPC

C. JSON

D. REST

D.   REST

Explanation:

Option Analysis
A. SOAP (Simple Object Access Protocol)
SOAP is an older web service protocol.
Uses XML, strict rules, and more overhead.
It’s more complex to implement and consumes more bandwidth.
❌ Not the best for minimal development effort.
B. RPC (Remote Procedure Call)
RPC allows executing code on another server like a local call.
It’s powerful but tightly coupled, requires more maintenance, and less flexible across multiple departments.
❌ Not ideal for interoperability and ease of use.
C. JSON (JavaScript Object Notation)
JSON is a data format, not an API itself.
It’s widely used for data exchange, especially within REST APIs.
On its own, JSON is not the mechanism for serving the model.
❌ Useful as payload format, but not the API standard.
D. REST (Representational State Transfer)
REST is the most widely used web API architecture.
Uses HTTP methods (GET, POST, PUT, DELETE).
Lightweight, stateless, scalable, and easy to consume across multiple platforms and departments.
Typically uses JSON for data exchange, making it very simple to integrate.
✅ Best choice for minimal development effort and maximum accessibility.
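A minimal stdlib-only sketch of exposing a model as a REST endpoint (the /predict path and the stand-in model are assumptions; a production deployment would more likely use Flask or FastAPI behind a proper server):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a trained model: returns the mean of the inputs."""
    return {"score": sum(features) / len(features)}

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":              # single REST resource
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        features = json.loads(self.rfile.read(length))["features"]
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):                # keep the sketch quiet
        pass
```

Any department can then call POST /predict with a JSON body over plain HTTP — no client SDK required, which is the "minimal development effort" point.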

✅ Final Answer:
D. REST

📖 References:
Fielding, R. T. (2000). Architectural Styles and the Design of Network-based Software Architectures (REST dissertation).
Richardson, L., & Ruby, S. (2007). RESTful Web Services. O’Reilly.
IBM Developer Docs: SOAP vs REST

A data scientist is building a model to predict customer credit scores based on information collected from reporting agencies. The model needs to automatically adjust its parameters to adapt to recent changes in the information collected. Which of the following is the best model to use?

A. Decision tree

B. Random forest

C. Linear discriminant analysis

D. XGBoost

D.   XGBoost

Explanation:
The key requirement in the question is that the model must "automatically adjust its parameters to adapt to recent changes." This describes a need for online learning or incremental learning, where a model can update itself efficiently as new data arrives, rather than being retrained from scratch on the entire dataset every time.

Why XGBoost is the best choice:
Incremental Learning (Warm Start):
XGBoost has a built-in capability for this. After an initial model is trained, you can provide new data and use the xgb.train() function with the xgb_model parameter set to the existing model. This allows the new trees to be built to correct the errors of the existing ensemble, effectively adapting the model to new patterns in the data.
Handles Dynamic Data:
Credit data is not static. Economic conditions, reporting agency practices, and consumer behaviors change. XGBoost's ability to update itself makes it well-suited for this dynamic environment where the underlying patterns may drift over time.
State-of-the-Art Performance:
For structured/tabular data like credit information, gradient boosting algorithms like XGBoost consistently rank among the top performers in machine learning competitions (like Kaggle) due to their high predictive power.

Why the Other Options Are Not Correct
A. Decision Tree & B. Random Forest
Standard Implementation is Batch Learning: Classic implementations of decision trees and Random Forests are batch learners. This means they are trained on the entire dataset at once. To incorporate new data, they typically need to be retrained from the beginning on the old data combined with the new data. This is computationally expensive and does not "automatically adjust" in the efficient, incremental way described.
No Native Incremental Update: While there are research techniques for incremental decision trees, they are not standard, robust, or easily implemented out-of-the-box like they are in XGBoost. Random Forest, by its nature of building independent trees, cannot easily update its ensemble without retraining.
C. Linear Discriminant Analysis (LDA)
Statistical Batch Method: LDA is a statistical method that calculates linear discriminants based on the mean and covariance of the entire training dataset. Like decision trees, its standard form is a batch learner.
Updating is Complex: To update an LDA model with new data, you would need to recalculate the overall mean and covariance matrices for the entire combined dataset. This is effectively a full retrain and does not constitute an automatic adjustment of parameters.
Key Distinction: Batch vs. Online Learning
Batch Learning (A, B, C): Train on the entire dataset at once. To learn from new data, you must retrain on the full dataset (old data + new data).
Online/Incremental Learning (D - XGBoost): The model can be updated sequentially with new data, adjusting its parameters without needing the entire historical dataset present. This is the requirement specified in the question.
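XGBoost's own continuation API (passing xgb_model to xgb.train) is described above; as a runnable stand-in, this sketch uses scikit-learn's SGDClassifier.partial_fit to show the same online-learning pattern (synthetic data; SGDClassifier is a substitute for illustration, not XGBoost itself):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Synthetic "credit" features; the true rule is x0 + x1 > 0 (illustrative only).
X_old = rng.normal(size=(200, 3))
y_old = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)

model = SGDClassifier(random_state=0)
model.partial_fit(X_old, y_old, classes=[0, 1])   # initial training batch

# Fresh data arrives later: update the same model in place, no full retrain.
X_new = rng.normal(size=(50, 3))
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)
model.partial_fit(X_new, y_new)

acc = model.score(X_new, y_new)
```

The second partial_fit call is the online-learning step: the model's parameters adjust to the new batch without revisiting the 200 historical rows.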

Valid References
XGBoost Documentation:
The official XGBoost documentation features a section on "Continual Training with Warm Start," which explicitly describes the process: "You can use warm start to continue training a model from the last iteration. This can be useful if you want to train a model for more iterations, or if you want to train a model on new data that has the same format as the old data."
Machine Learning Literature:
The concept of online learning is a well-established subfield. XGBoost's support for incremental learning is frequently cited as a key advantage over other ensemble methods like Random Forest for applications involving non-stationary data (data that changes over time), which is exactly the case with ongoing credit information updates.

A data analyst is analyzing data and would like to build conceptual associations. Which of the following is the best way to accomplish this task?

A. n-grams

B. NER

C. TF-IDF

D. POS

A.   n-grams

Explanation:

Option Analysis
A. n-grams
N-grams are sequences of n words that appear together in text (e.g., “credit score,” “machine learning”).
They are directly used to build conceptual associations because they capture word co-occurrence and context.
For example, bigrams (n=2) show pairs of words that are conceptually linked.
✅ Best fit.
B. NER (Named Entity Recognition)
NER extracts entities such as names, organizations, dates, locations.
It identifies proper nouns but does not directly build associations between concepts.
❌ Not correct here.
C. TF-IDF (Term Frequency–Inverse Document Frequency)
TF-IDF measures the importance of a word in a document relative to a corpus.
It’s great for ranking keywords but does not capture associations between terms.
❌ Not correct.
D. POS (Part of Speech tagging)
POS tagging labels words with grammatical roles (noun, verb, adjective, etc.).
Useful for linguistic analysis, but it doesn’t establish conceptual associations.
❌ Not correct.
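A pure-Python sketch of extracting n-grams and counting them to surface associated word pairs (the sample sentence is made up):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "machine learning improves credit score models and credit score checks"
bigrams = Counter(ngrams(text.split(), 2))

# ("credit", "score") co-occurs twice -> a conceptual association surfaces.
top_pair, count = bigrams.most_common(1)[0]
```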

✅ Final Answer:
A. n-grams

📖 References:
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
Jurafsky, D., & Martin, J. H. (2023). Speech and Language Processing (3rd ed. draft). Stanford University.

Page 2 out of 9 Pages