A Glossary of AI Jargon

This AI glossary offers concise definitions of terms crucial to understanding artificial intelligence and its sub-disciplines. It complements related glossaries like those in computer science, robotics, and machine vision. It serves as a reference hub for understanding key AI concepts to guide beginners and experts in navigating and communicating within the AI community.

Activation Function

A mathematical function used in artificial neural networks (ANNs). It determines whether a neuron in the network should be activated (i.e., fire) or not, based on the weighted sum of its inputs. Activation functions introduce non-linearity into the network, enabling it to learn complex patterns and relationships in the data.
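As a minimal sketch, two widely used activation functions, the sigmoid and ReLU, can be written in a few lines of Python. The function names and the sample inputs and weights below are illustrative assumptions, not part of any particular library.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Pass positive values through unchanged and clip negatives to 0."""
    return max(0.0, z)

def neuron_output(inputs, weights, bias, activation):
    """Apply an activation function to the weighted sum of the inputs."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical inputs and weights, chosen only for illustration.
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid))  # 0.5
print(neuron_output([1.0, 2.0], [0.5, -1.0], 0.0, relu))      # 0.0
```

Both functions are non-linear, which is what lets stacked layers represent more than a single linear transformation.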


Adapter

A technique that lets pre-trained AI models adapt quickly to new tasks without extensive retraining, saving time, money, and resources. Adapter modules efficiently repurpose existing models for tasks in fields such as natural language processing, computer vision, and robotics.


Algorithm

A step-by-step procedure or set of rules followed to solve a particular problem or perform a specific task. In essence, an algorithm is a finite sequence of well-defined instructions that takes some input, processes it, and produces an output.


Analytics

The process of analyzing data to gain insights, make informed decisions, and solve problems. It uses computational techniques, algorithms, and tools to extract meaningful patterns, trends, and relationships from large and complex datasets.


Annotation

The process of enriching data with contextual information so that machine learning algorithms can interpret and learn from it. Well-annotated data helps algorithms perform better across a variety of tasks.

Application Programming Interface (API)

It is a set of rules, protocols, and tools that allows different software applications to communicate with each other. It defines the methods and data formats that applications can use to request and exchange information, perform specific tasks, or access services provided by another software component, platform, or system.

Artificial General Intelligence (AGI)

AGI refers to AI systems with diverse, human-like cognitive abilities: learning, reasoning, adapting to novel scenarios, and solving problems creatively across multiple domains. AGI aims for broader applicability and versatility than task-specific AI in tackling complex challenges.

Artificial Intelligence (AI)

AI is a technology that enables computers and digital devices to think and act like humans. It helps them learn from data, recognize patterns, and make decisions without being explicitly programmed for every single task. Just as we learn from our experiences, AI learns from the information it is given. It is used in everyday technology such as voice assistants, recommendation systems, and self-driving cars.

Artificial Neural Network (ANN)

An artificial neural network is a computational model inspired by the structure and function of biological neural networks in the human brain. It consists of interconnected nodes, called neurons or artificial neurons, organized into layers. Each neuron receives input signals, processes them using an activation function, and produces an output signal, which may be passed to neurons in the next layer.

Auto Classification

The automated process of categorizing or labeling data into predefined classes or categories without human intervention. This task is commonly performed using machine learning techniques, particularly supervised learning algorithms, where the algorithm learns from labeled training data to predict the class labels of new, unseen instances.

Area Under the Curve (AUC)

Area Under the Curve is a metric used to evaluate the performance of a binary classification model. The ROC (Receiver Operating Characteristic) curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at different threshold values. The AUC is the area under this ROC curve; it ranges from 0 to 1, where 1 indicates perfect separation of the classes and 0.5 is no better than random guessing.
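One way to compute the AUC without drawing the curve relies on an equivalent interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below assumes binary labels (1 for positive, 0 for negative); the function name and sample scores are hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of examples, so it suits small illustrations rather than large datasets.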


Backpropagation

Short for "backward propagation of errors," backpropagation is a fundamental algorithm used to train artificial neural networks (ANNs). It is a supervised learning algorithm that adjusts the weights of the connections between neurons to minimize the difference between the predicted output and the true output for a given set of training examples.

Batch Normalization (BatchNorm)

A technique used in deep learning, particularly in neural networks, to improve the training stability and convergence speed.


BERT

Short for Bidirectional Encoder Representations from Transformers, BERT is a pre-trained natural language processing (NLP) model. It represents a significant advancement in deep learning for NLP tasks.

Binary Tree

A hierarchical data structure, composed of nodes, where each node has at most two children, referred to as the left child and the right child. This structure resembles a tree, where each node can have zero, one, or two branches.
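A binary tree can be sketched with a small node class. The `Node` class and the `in_order` helper below are illustrative names, not taken from a specific library.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A tree node with at most two children."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def in_order(node):
    """Visit the left subtree, then the node itself, then the right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)

# A root with two leaf children.
root = Node(2, left=Node(1), right=Node(3))
print(in_order(root))  # [1, 2, 3]
```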

Black Box AI

Artificial intelligence (AI) systems or models whose internal workings are opaque or not easily interpretable by humans. In other words, users may understand what inputs are given to the system and what outputs it produces, but they lack visibility into how the AI arrives at those outputs.


Chatbot

An interface that lets users pose questions and obtain responses, ranging from predefined replies to dynamic interactions powered by advanced AI. Chatbots span basic response systems through sophisticated conversational AI, streamlining issue resolution and enhancing user experience.

ChatGPT

A chat interface created by OpenAI and powered by the GPT-3.5 language model. Trained on vast amounts of internet text, the model handles a range of natural language tasks such as translation, summarization, and question answering.

Categorical Data

It is a type of data that represents categories or groups and cannot be measured on a numerical scale. Instead of having a numeric value, categorical data consists of labels or descriptions that classify observations into distinct groups or categories based on qualitative attributes.


Classification

The task of assigning data to predefined classes or categories based on their attributes or features. Classification is a supervised machine learning process: a model learns from labeled examples and then predicts the class of new inputs.

Cluster Analysis

A technique used in data mining and machine learning to group similar objects or data points into clusters based on their characteristics or features. The goal of cluster analysis is to partition a dataset into groups such that objects within the same group are more similar to each other than to those in other groups.

Convolutional Neural Network (CNN)

A class of deep neural networks commonly used in computer vision tasks, such as image classification, object detection, and image segmentation. CNNs are specifically designed to process and analyze visual data efficiently by leveraging the principles of convolution, pooling, and hierarchical feature learning.

Cognitive Science

It refers to the interdisciplinary study of how humans learn, reason, and solve problems, with the aim of informing the design and development of machine learning algorithms and models that mimic or augment human-like cognitive abilities.

Computer Vision

Computer vision focuses on enabling computers to interpret and understand visual information from the real world. It aims to replicate the human visual system's ability to perceive, analyze, and interpret images and videos, allowing computers to extract meaningful insights and make decisions based on visual input.

Constructed Language

A language that has been deliberately designed rather than having evolved naturally. In computing, the term covers programming languages, which are purposefully constructed by language designers to provide specific features, functionality, and syntax for expressing algorithms, implementing software systems, and interacting with computers.


Controllability

A fundamental property of control systems that plays a pivotal role in many control problems, including stabilizing unstable systems through feedback and achieving optimal control. It is closely intertwined with observability; the two are dual aspects of the same problem.

Conversational AI

A subfield of AI dedicated to creating systems capable of comprehending and generating human-like language, enabling seamless interactive exchanges. For instance, chatbots adeptly handle customer queries, mimicking natural human conversation for enhanced user engagement.

Confusion Matrix

A tabular representation that summarizes the performance of a classification model by comparing its predicted labels with the actual labels in a dataset. It is a widely used tool in machine learning and evaluation metrics to assess the accuracy and effectiveness of classification algorithms.
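For a binary classifier the table has four cells. The sketch below counts them, assuming labels are encoded as 1 (positive) and 0 (negative); the function name and the sample labels are illustrative.

```python
def confusion_matrix(actual, predicted):
    """Count outcomes for binary labels, where 1 is the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1   # correctly predicted positive
        elif a == 0 and p == 1:
            counts["FP"] += 1   # negative wrongly predicted as positive
        elif a == 1 and p == 0:
            counts["FN"] += 1   # positive wrongly predicted as negative
        else:
            counts["TN"] += 1   # correctly predicted negative
    return counts

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```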


Corpus

A large and structured collection of text or spoken language data that is used for linguistic analysis, research, or machine learning tasks. A corpus serves as a representative sample of language usage and provides researchers and practitioners with valuable insights into language patterns, usage, and variability.


DALL·E

A generative model developed by OpenAI that generates images from textual descriptions. It builds upon the success of OpenAI's GPT (Generative Pre-trained Transformer) models, which are adept at natural language understanding and generation. DALL·E extends this capability to the domain of visual creativity.

Data Augmentation

Data Augmentation enlarges and diversifies training sets by generating modified versions of existing data. It encompasses adjustments like flipping, resizing, or altering image brightness to enrich datasets, mitigating overfitting and improving model generalization.
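Two of the adjustments mentioned above, flipping and brightness changes, can be sketched on a tiny grayscale image. Representing the image as a nested list of 0-255 pixel values is an assumption made to keep the example dependency-free; real pipelines typically use an image library.

```python
def flip_horizontal(image):
    """Mirror each row of pixels left to right."""
    return [row[::-1] for row in image]

def adjust_brightness(image, delta):
    """Add delta to every pixel, clamping the result to the 0-255 range."""
    return [[min(255, max(0, px + delta)) for px in row] for row in image]

# A hypothetical 2x3 grayscale image.
image = [[0, 50, 100],
         [150, 200, 250]]

augmented = [image, flip_horizontal(image), adjust_brightness(image, 20)]
print(augmented[1])  # [[100, 50, 0], [250, 200, 150]]
print(augmented[2])  # [[20, 70, 120], [170, 220, 255]]
```

Each variant keeps the original label, so one labeled image yields several training examples.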

Data Ingestion

The process of collecting and transferring data from multiple sources into a centralized storage or processing system. It's a critical step in data processing, allowing organizations to aggregate data for analysis and decision-making.

Data Labelling

The process of assigning meaningful labels or tags to raw data to make it understandable, usable, and machine-readable is called data labelling. It is a crucial step in preparing data for training machine learning models, as labeled data serves as input for supervised learning algorithms.

Data Mining

The process of discovering patterns, relationships, and insights in large datasets to support informed decisions. It extracts valuable information from raw data using computational techniques such as clustering, classification, and regression.

Data Validation

The process of ensuring that data is accurate, consistent, and reliable before it is used for analysis or stored in a database. It involves checking data for errors, anomalies, and inconsistencies to ensure its quality and integrity. 

Data Science

An interdisciplinary field that combines domain knowledge, programming skills, and statistical and computational techniques to extract insights and knowledge from data. It involves collecting, processing, analyzing, and interpreting large volumes of data to uncover patterns, trends, and relationships that can aid decision-making and solve complex problems.

Deep Learning

A subset of ML that employs multi-layered neural networks to glean insights from data. For instance, a deep learning model can discern objects in images by passing the input through successive network layers, which makes it well suited to complex pattern recognition tasks.

Decision Tree

It is a supervised machine learning algorithm used for classification and regression tasks. It works by recursively partitioning the input space (or feature space) into smaller regions, based on the values of input features, to make predictions or decisions.


Embedding

A learned representation of words, phrases, or entities in a continuous vector space. Embeddings capture semantic relationships and contextual information, allowing machine learning models to better understand and process textual data.
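Semantic closeness between embeddings is commonly measured with cosine similarity. In the sketch below the three-dimensional vectors are made up for illustration only; real embeddings are learned by a model and usually have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, not produced by any real model.
embeddings = {
    "king":   [0.90, 0.80, 0.10],
    "queen":  [0.85, 0.82, 0.15],
    "banana": [0.10, 0.05, 0.90],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # close to 1
print(cosine_similarity(embeddings["king"], embeddings["banana"]))  # much lower
```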

Ensemble Learning

A machine learning technique that combines the predictions of multiple individual models (learners) to improve overall performance. Instead of relying on a single model, ensemble methods leverage the diversity of multiple models to produce more accurate and robust predictions.


ETL (Extract, Transform, Load)

A data integration process that collects, cleans, and loads data from various sources into a target destination such as a data warehouse. It is essential for consolidating data from multiple sources for analysis and decision-making in business intelligence and analytics.

Explainable AI

Explainability involves methods that render AI model decisions and predictions comprehensible to humans, enhancing transparency and trust in the technology's outcomes. These techniques facilitate understanding of the underlying reasoning behind AI-driven insights and actions.

Extractive Summarization

It is a text summarization technique that involves selecting and extracting key sentences or passages from a document to create a concise summary. Instead of generating new sentences, extractive summarization identifies the most important information from the original text and presents it in a condensed form.


Epoch

One complete pass through the entire training dataset during the training phase of a model. During each epoch, the model iteratively updates its parameters (weights and biases) based on the training data to minimize the loss function and improve its performance.

Expert System

A computer program that harnesses artificial intelligence (AI) to mimic the judgment and decision-making skills of human experts or organizations specialized in a specific field. It's like having a virtual expert on your computer, capable of offering advice and solutions based on its knowledge and experience in a particular domain.


Extraction

The capacity of generative models to analyze vast datasets and identify pertinent patterns, trends, and specific information for insightful analysis.

Federated Learning

A technique for training models across decentralized edge devices or servers that hold local data, without exchanging that data. Instead of sending raw data to a central server, training occurs locally on each device or server, and only model updates are shared centrally. This preserves data privacy and reduces communication overhead.

Foundation Model

A large AI model trained on broad data that can be adapted to many downstream tasks. The category includes large language, computer vision, and reinforcement learning models. Termed "foundation" models, they provide a base for building applications across various domains and accommodate a wide array of use cases.


Fine-Tuning

Fine-tuning involves adapting a pre-trained model to a specific task by training it further on a smaller, task-specific dataset. For instance, an image classification model trained on images of intersections can be fine-tuned to detect cars running red lights.


F-Score

The F-score (also called F1-score) is a metric used to evaluate the performance of a classification model. It is the harmonic mean of precision and recall and provides a single score that balances both measures.
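The harmonic mean works out to F1 = 2 x precision x recall / (precision + recall). A minimal sketch, starting from counts of true positives (tp), false positives (fp), and false negatives (fn); the function name and the sample counts are illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: precision is 0.8, recall is 0.5.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(round(f1, 3))  # 0.615
```

Because the harmonic mean is pulled toward the smaller value, the F1 of 0.615 sits below the arithmetic mean of 0.65, penalizing the imbalance between precision and recall.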

Generative AI

Generative AI models generate new data by discerning patterns in input or training data. For instance, crafting an original short story by analyzing published ones showcases their ability to create novel content based on learned patterns.

Generative Model

A type of statistical model used in machine learning that learns to generate new data samples similar to those in the training data. Generative models focus on understanding the underlying distribution of the data and can be used to generate new samples from that distribution.

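Generative Adversarial Network (GAN)

A GAN pairs two neural networks trained against each other: a generator that produces candidate data and a discriminator that tries to tell generated data from real training data. The result is a model adept at producing novel data that closely resembles the training data.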

Generative Pre-Trained Transformer (GPT)

GPT models are neural networks trained on vast datasets without supervision, proficient in generating text with remarkable coherence and diversity.

GPT-3

The third iteration in the GPT-n series, with 175 billion parameters. ChatGPT uses GPT-3.5, a subsequent version of this model, for enhanced conversational capabilities.


GPT-4

A milestone in OpenAI's deep learning work and a leap in scale. It is the first GPT model to accept multimodal inputs, handling both images and text to generate textual outputs, which expands its versatility and applications.

Hybrid AI

The integration of multiple artificial intelligence techniques or approaches to solve complex problems or tasks is called Hybrid AI. It combines different AI methodologies, such as symbolic AI, machine learning, deep learning, evolutionary algorithms, and expert systems, to leverage their respective strengths and compensate for their weaknesses.


Heuristic

A problem-solving strategy or rule of thumb used to find a solution more efficiently, although it may not guarantee an optimal one. Heuristics are often employed in situations where finding an optimal solution is computationally expensive or impractical.


Hyperparameter

Hyperparameters are settings chosen before training a model that govern the learning process, such as the learning rate or the number of epochs. Unlike model parameters, which are learned during training, hyperparameters are not updated based on the training data but are instead specified by the user.

Intelligent Document Processing (IDP)

Automating the process of manual data entry from paper-based documents or document images using artificial intelligence (AI) and machine learning (ML) techniques. It extracts relevant information from unstructured documents, such as invoices, receipts, contracts, forms, and emails, thereby reducing the need for manual data entry and increasing efficiency.

Image Recognition

A computer vision task that involves identifying and categorizing objects or patterns within digital images. The goal of image recognition is to develop algorithms and models that can accurately recognize and classify objects or scenes in images, similar to how the human visual system works.

Intelligent Agent

Autonomous software entities that can perceive their environment, make decisions, and take actions to achieve specific goals or objectives. These agents are equipped with artificial intelligence (AI) capabilities, such as reasoning, learning, and problem-solving, which enable them to interact with their environment and adapt to changing conditions.

Jaccard Similarity

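A measure of the overlap between two sets, calculated as the size of their intersection divided by the size of their union. Jaccard similarity is often used in natural language processing tasks such as text mining and information retrieval to compare two documents or sets of words.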
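Jaccard similarity is the size of the intersection of two sets divided by the size of their union. A minimal sketch comparing two short documents as sets of words; the function name, the sample sentences, and the choice to treat two empty sets as identical are assumptions for illustration.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(set_a & set_b) / len(union)

doc1 = "the cat sat on the mat".split()
doc2 = "the cat lay on the rug".split()

# Shared words: the, cat, on (3). Distinct words overall: 7.
print(jaccard_similarity(doc1, doc2))  # 3/7, about 0.43
```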

K-Nearest Neighbors

A non-parametric and instance-based learning approach widely used for classification and regression tasks in machine learning. It does not make assumptions about the underlying distribution of the data and instead relies on the data itself during the prediction phase.
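A minimal classification sketch: the prediction for a new point is the majority label among its k closest training points. Euclidean distance, k=3, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional training data with two classes.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.5, 4.5), "B"),
]

print(knn_predict(train, (1.1, 1.0)))  # A
print(knn_predict(train, (5.2, 4.9)))  # B
```

No model is fitted in advance; all the work happens at prediction time, which is why the approach is called instance-based.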

K-Means Clustering

A popular unsupervised machine learning algorithm used for clustering data points into groups or clusters based on similarity. It aims to partition the data into K clusters, where each data point belongs to the cluster with the nearest mean (centroid).
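The algorithm alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below assumes the caller supplies the starting centroids and a fixed iteration count; practical implementations usually pick starting centroids randomly and stop on convergence.

```python
import math

def k_means(points, centroids, iterations=10):
    """Run a fixed number of assignment and update steps."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical data: two obvious groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [(1.0, 1.5), (9.5, 9.0)]
```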

Labelled Data

A dataset in which each data point is associated with one or more labels that indicate its class or category. In supervised learning, labeled data is used to train machine learning models to make predictions or classifications.

Large Language Model (LLM)

LLMs such as BERT, GPT-2, GPT-3, and GPT-4 are deep learning models trained extensively on diverse datasets for natural language tasks. They vary in size, task versatility, and training data, enabling applications ranging from coding assistance to chatbots.


Logistic Regression

A supervised learning algorithm that makes use of logistic functions to predict the probability of a binary outcome. The model learns coefficients to fit the data during training, making it efficient and interpretable for binary classification tasks.

Machine Learning (ML)

Machine learning is a branch of AI focusing on algorithms enabling machines to enhance performance through experience. For instance, predicting customer churn based on past behavior showcases ML's ability to learn and improve over time, driving better decision-making.

Machine Learning Models

They are computational algorithms or mathematical representations that are trained on data to perform specific tasks without being explicitly programmed to do so. These models learn patterns and relationships from the input data and use that knowledge to make predictions or decisions on new, unseen data.

Mean Absolute Error (MAE)

Mean Absolute Error is a metric used to evaluate the performance of a regression model. MAE measures the average absolute difference between the predicted values and the actual values in a dataset.
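As a short sketch, MAE is the sum of the absolute errors divided by the number of predictions. The function name and the sample values are illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Absolute errors are 1, 0, and 2, so the mean is 3 / 3.
print(mean_absolute_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))  # 1.0
```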

Meta-Learning Algorithms

Algorithms designed to generalize across multiple learning tasks or datasets. Rather than focusing on solving a specific task, they aim to discover general patterns or principles across tasks, enabling them to adapt quickly to new tasks with minimal data.


Model

A mathematical representation or approximation of a real-world process or phenomenon. It is created based on observed data to make predictions, gain insights, or understand underlying patterns in the data.

Model Parameter

A configuration variable that is internal to the model and is learned from the training data during the model training process. These parameters define the behavior and characteristics of the model and directly affect its predictions.

Mean Squared Error (MSE)

Mean Squared Error is a commonly used metric for evaluating the performance of a regression model. MSE measures the average squared difference between the predicted values and the actual values in a dataset.

Naïve Bayes

A probabilistic classification algorithm based on Bayes' theorem with an assumption of independence between features. It is commonly used for text classification tasks, such as spam detection, sentiment analysis, and document categorization.

Natural Language Generation (NLG)

An AI subfield that creates written or spoken language in a human-like manner.

Natural Language Processing (NLP)

An AI subfield focused on enabling computers to process and analyze large volumes of human language data, for example by converting unstructured text into a structured format.

Natural Language Understanding (NLU)

A subset of NLP that interprets text to extract semantic meaning, including context, sentiment, and intent.

Natural Language Programming

A programming paradigm that allows developers to write code using natural language, such as English, instead of traditional programming languages like Python or Java. With natural language programming (not to be confused with natural language processing, which shares the abbreviation NLP), developers can express programming concepts, commands, and instructions in a human-readable format, making programming more accessible to non-programmers and reducing the learning curve associated with traditional coding languages.

Natural Language Query

It is a query that allows users to ask questions using everyday language instead of using technical data query languages like SQL or code. It makes it easier for users to interact with data by typing questions or commands in a way that feels natural and conversational.

Naive Bayes Classifier

A probabilistic machine learning algorithm commonly used for classification tasks. It is based on Bayes' theorem and assumes that the features in the dataset are independent of each other, hence the term "naive."

Neural Network

A neural network mimics the brain's structure and comprises interconnected nodes, or "neurons." For instance, neural networks can recognize handwritten digits with high accuracy.

No Code

No-code simplifies application development, enabling design and usage without coding or programming language expertise.

Numeric Data

Data that consists of numerical values, which can be represented and processed as numbers. In the context of data analysis and statistics, numeric data includes quantitative information that can be measured or counted.


OpenAI

OpenAI, the developer of ChatGPT, is dedicated to responsibly advancing friendly AI. Its GPT-3 model exemplifies this commitment and was among the largest and most capable language models for natural language processing tasks at the time of its release.

Optimization

Optimization adjusts a model's parameters to minimize discrepancies between predictions and true values. For instance, a neural network's parameters are optimized using gradient descent to decrease prediction errors.
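Gradient descent, the method named in the example above, repeatedly nudges a parameter in the direction that lowers the error. The sketch below minimizes a simple one-variable loss rather than a real network's loss; the loss function, learning rate, and step count are assumptions for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

# Minimize the loss (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 4))  # 3.0
```

The learning rate and the number of steps are hyperparameters: they are chosen by the user rather than learned from data.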


Ontology

A formal representation of knowledge that defines the concepts, relationships, and properties within a specific domain or subject area. It provides a structured and organized way to represent knowledge, allowing for clear and precise communication and reasoning within that domain.

Ordinal Data

A type of categorical data with a natural order or ranking between the categories, but the numerical difference between the categories is not meaningful or consistent. Ordinal data represents categories that can be ordered or ranked but do not have a fixed numerical value associated with each category.


Overfitting

Overfitting arises when an excessively complex model excels on training data but falters on new data. For instance, a model that memorizes its training data performs poorly on unseen data because it has not learned generalized patterns.


Parsing

The process of analyzing a string of symbols according to the rules of a formal grammar to determine its syntactic structure. In computer science, parsing is commonly used in the context of programming languages, where it involves breaking down source code into its constituent parts (such as tokens or syntactic elements) to understand its structure and meaning.
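The first stage of parsing, splitting text into tokens, can be sketched as follows. The tiny arithmetic grammar, the `TOKEN_PATTERN` regular expression, and the token names are assumptions made for illustration; a full parser would go on to build a syntax tree from these tokens.

```python
import re

# Skip leading whitespace, then match either a run of digits or any single character.
TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(expression):
    """Split an arithmetic expression into (kind, value) tokens."""
    tokens = []
    for number, symbol in TOKEN_PATTERN.findall(expression):
        if number:
            tokens.append(("NUMBER", int(number)))
        elif symbol.strip():  # ignore stray whitespace matches
            tokens.append(("SYMBOL", symbol))
    return tokens

print(tokenize("12 + 3*(4)"))
# [('NUMBER', 12), ('SYMBOL', '+'), ('NUMBER', 3), ('SYMBOL', '*'),
#  ('SYMBOL', '('), ('NUMBER', 4), ('SYMBOL', ')')]
```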

Parameter

A variable or value that is used to define or control the behavior of a function, algorithm, process, or system. Parameters play a crucial role in programming, mathematics, science, and engineering, as they allow for the customization and flexibility of various systems and processes.

Pattern Recognition

The process of identifying patterns, regularities, or structures within data or observations and making sense of them. It is a fundamental aspect of human cognition and is also a key area of study in various fields such as computer science, machine learning, psychology, and neuroscience.

Principal Component Analysis (PCA)

It is a widely used dimensionality reduction technique in machine learning and data analysis. It simplifies complex datasets by reducing the number of features while retaining most of the original information.

Predictive Analytics

It utilizes statistical algorithms, machine learning techniques, and data mining methods to analyze current and historical data in order to make predictions about future events or outcomes. It involves extracting patterns, trends, and relationships from data to forecast or anticipate future scenarios and make informed decisions.

Post-Processing

Post-processing involves refining, interpreting, and enhancing the results or outputs generated from an analysis. This typically includes tasks like visualizing the results to aid interpretation, refining the results to improve accuracy, and validating the results to ensure reliability. Post-processing is crucial for extracting meaningful insights and making informed decisions based on the analysis.


Precision

A measure of the accuracy of the positive predictions made by a model. It quantifies the proportion of correctly predicted positive instances among all instances that the model has predicted as positive.

Prescriptive Analytics

An advanced form of analytics that goes beyond descriptive and predictive analytics to provide recommendations and insights on the best course of action to achieve a specific goal or outcome. It uses a combination of data analysis, mathematical modeling, optimization techniques, and business rules to evaluate different decision options and prescribe the optimal or most effective actions to take.


Preprocessing

Preprocessing covers the preparatory steps and techniques applied to raw data before analysis or modeling. Tasks include cleaning, transforming, and integrating data to enhance its quality and usability. This involves error correction, variable scaling, feature selection, and data integration from multiple sources. Preprocessing ensures that data is clean, consistent, and suitably prepared for analysis or modeling tasks.

Prompt

A stimulus or cue that is provided to elicit a specific response or action. In generative AI, the prompt is the text or other input given to a model to elicit a response. More generally, prompts are used to guide behavior, facilitate communication, or trigger a particular reaction.

Prompt Engineering

The practice of crafting inputs, or prompts, to guide the outputs of LLMs, which cannot be fully controlled because of their complexity. For example, copywriting applications provide templates and wizards that direct the generation of tailored content. Prompt engineering helps LLM outputs align with user expectations and intended tasks, and improves LLM performance across applications and domains.

Quantum Computing

An advanced computing paradigm that leverages the principles of quantum mechanics to perform computations using quantum bits or qubits, which can represent and process information in multiple states simultaneously. Unlike classical bits, which can only be in a state of 0 or 1, qubits can exist in a superposition of both states. This enables quantum computers to perform parallel computations and solve certain problems much faster than classical computers.

Random Forest

A popular ensemble learning technique used in machine learning for classification and regression tasks. It operates by constructing multiple decision trees during training and outputting the mode of the classes (for classification) or the mean prediction (for regression) of the individual trees.
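The aggregation step described above can be sketched on its own. The per-tree predictions below are hypothetical stand-ins for the outputs of trained decision trees; training the trees is outside the scope of this sketch.

```python
from statistics import mean, mode

def forest_classify(tree_predictions):
    """Classification: return the most common class among the trees' votes."""
    return mode(tree_predictions)

def forest_regress(tree_predictions):
    """Regression: return the average of the trees' numeric predictions."""
    return mean(tree_predictions)

# Hypothetical outputs from three trees for a single input.
print(forest_classify(["spam", "ham", "spam"]))  # spam
print(forest_regress([2.0, 3.0, 4.0]))           # 3.0
```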


Reasoning

AI reasoning involves problem-solving, critical thinking, and knowledge creation by analyzing available information. This process enables artificial intelligence systems to make well-informed decisions across various tasks and domains.


Recall

Also known as sensitivity or true positive rate, recall is a metric used to assess the performance of a classification model, especially in situations where correctly identifying positive instances is crucial. It quantifies the proportion of actual positive instances that the model correctly identifies.


Regression

A statistical method used in machine learning and data analysis to model the relationship between a dependent variable (target) and one or more independent variables (features). It aims to predict the value of the dependent variable based on the values of the independent variables.
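The simplest case, one feature and a straight-line relationship, can be sketched with ordinary least squares. This is an illustrative example only: the `fit_line` name and the sample data are assumptions, and practical work would normally rely on a statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Sample data generated from y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
print(slope * 5 + intercept)  # prediction for x = 5: 11.0
```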

Reinforcement Learning (RL)

A machine learning approach in which an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. Models in the GPT family apply it through reinforcement learning from human feedback (RLHF): when tuning GPT-3, human annotators provided example responses and ranked model outputs to refine its behavior.

Root Mean Square Error (RMSE)

A commonly used metric to evaluate the performance of a regression model. RMSE measures the average magnitude of the errors (the differences between predicted values and actual values) made by the model. It is calculated as the square root of the mean of the squared differences between predicted and actual values.
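The calculation described above translates directly into code. The following is a minimal sketch using only the standard library, and the function name `rmse` is chosen here for illustration.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean of squared differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors are 1, 0, and -2, so the mean squared error is 5/3.
print(round(rmse([3, 5, 7], [2, 5, 9]), 4))  # 1.291
```

Because the errors are squared before averaging, large errors weigh more heavily in RMSE than small ones.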

Recurrent Neural Network (RNN)

RNNs are neural networks specialized in processing sequential data. With cyclic connections, they model dynamic temporal behavior, making them ideal for sequences like time series and natural language. RNNs find applications in speech recognition, language modeling, sentiment analysis, and more.

Sentiment Analysis

A natural language processing (NLP) technique used to determine the sentiment or opinion expressed in a piece of text. It involves analyzing the textual content to identify and categorize the sentiment as positive, negative, or neutral.

Simulated Annealing (SA)

A probabilistic optimization algorithm inspired by the annealing process in metallurgy. It is used to find approximate solutions to optimization problems by simulating the physical process of heating and cooling a material to reach a state of minimum energy.
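A minimal sketch of the idea for a one-dimensional problem is shown below. The parameter names, the geometric cooling schedule, and the default values are illustrative assumptions, not a canonical implementation. The key step is that a worse candidate is still accepted with probability exp(-delta / temperature), which shrinks as the temperature falls.

```python
import math
import random


def simulated_annealing(cost, start, step=1.0, t_start=10.0, t_end=1e-3,
                        cooling=0.95, steps_per_temp=20, seed=0):
    """Search for a low-cost value of a single real variable."""
    rng = random.Random(seed)
    current = best = start
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_temp):
            candidate = current + rng.uniform(-step, step)
            delta = cost(candidate) - cost(current)
            # Always accept improvements; accept worse moves with a
            # probability that decreases as the temperature drops.
            if delta < 0 or rng.random() < math.exp(-delta / temperature):
                current = candidate
                if cost(current) < cost(best):
                    best = current
        temperature *= cooling  # geometric cooling (an assumed schedule)
    return best


def example_cost(x):
    """Toy cost function with its minimum at x = 3."""
    return (x - 3) ** 2


result = simulated_annealing(example_cost, start=0.0)
print(result, example_cost(result))
```

Accepting occasional uphill moves early on is what lets the search escape local minima. As the temperature approaches zero, it behaves more like plain hill climbing.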

Speech Recognition

A technology that enables a computer or device to transcribe spoken language into text. It involves the process of analyzing and interpreting audio signals containing spoken words and converting them into written text.


Structured Data

Data that is organized and formatted in a way that is easily understandable by both humans and machines. In structured data, information is organized into a predefined format with well-defined fields, data types, and relationships between entities.



Supervised Learning

Supervised learning trains a model on labeled data to predict outcomes for new, unseen data. For instance, a supervised algorithm learns to classify handwritten digits from labeled training images and then assigns new images to the correct categories.

Support Vector Machines

A type of supervised learning algorithm used for classification and regression tasks. For classification, it finds the hyperplane that separates the classes with the largest margin. When the data is not linearly separable in its original form, kernel functions map it into a higher-dimensional space where it is.

Synthetic Data

Artificially generated data that mimics the statistical properties and characteristics of real-world data but is not derived from actual observations or measurements. Synthetic data is created using algorithms, models, or simulations to generate data points that resemble the patterns, distributions, and relationships present in the original data.

Supervised Machine Learning Models

Algorithms that learn patterns and relationships from labeled training data in order to make predictions or decisions about new, unseen data. In supervised learning, each example in the training dataset is accompanied by a target variable or label, which represents the correct answer or outcome that the model is expected to predict.


Testing Data

A portion of the dataset that is used to evaluate the performance and generalization ability of a machine learning model after it has been trained on the training data. The purpose of testing data is to assess how well the trained model can make predictions or classifications on new, unseen data.


Training Data

The portion of the dataset used to train a machine learning model. It consists of input features and corresponding target variables (labels) that are used to teach the model to make predictions or classifications based on patterns and relationships in the data.


Token

A token is a single, meaningful unit of text, typically a word, a subword fragment, or a punctuation symbol.



Tokenization

Tokenization involves breaking text into individual words or subwords for input into a language model. For example, tokenizing the sentence "I am ChatGPT" results in tokens like "I," "am," "Chat," "G," and "PT."
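A very simple word-level tokenizer can be sketched with a regular expression. Note the assumption: this splits only on words and punctuation, whereas production language models use learned subword vocabularies, so it will not reproduce subword splits such as "Chat," "G," and "PT."

```python
import re


def tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize("I am ChatGPT."))  # ['I', 'am', 'ChatGPT', '.']
```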

Transfer Learning

A machine learning technique where a model trained on one task or dataset is repurposed or transferred to another related task or dataset. Instead of starting the learning process from scratch, transfer learning leverages the knowledge or representations learned from the source task to improve the performance of the model on the target task.


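Transformer

A neural network architecture that processes sequential data such as text using self-attention, a mechanism that lets the model weigh every position in the input against every other position. For instance, models such as ChatGPT use the transformer architecture for natural language processing tasks.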

Turing Test

A measure of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human.

Time Series

A sequence of data points collected or recorded at successive time intervals. Each data point in a time series is associated with a specific timestamp or time period, which represents when the observation was made.


Unstructured Data

Unstructured data lacks a predefined model or structure, posing challenges in collection, processing, and analysis.


Unsupervised Learning

Unsupervised learning is a machine learning approach where a model is trained on unlabeled data to identify patterns or features independently. For instance, an unsupervised learning algorithm can group similar images of handwritten digits based solely on their visual characteristics, without requiring labeled examples for guidance.

Unsupervised Machine Learning Models

Algorithms used to analyze and interpret datasets without labeled target variables. They extract patterns, relationships, and structures from unlabeled data, aiming to uncover hidden insights, group similar data points, or reduce the dimensionality of the data without explicit guidance from labeled examples.

Validation Data

A portion of the dataset used to evaluate a machine learning model during the training process, for example to tune hyperparameters or decide when to stop training. It is kept separate from the training data so that it can indicate how well the model generalizes to new, unseen data.
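The relationship between training, validation, and testing data can be illustrated with a simple three-way split. The 70/15/15 proportions, the fixed seed, and the `split_dataset` name are assumptions made for this sketch. Suitable proportions depend on the dataset and task.

```python
import random


def split_dataset(rows, train=0.7, validation=0.15, seed=0):
    """Shuffle rows, then cut them into training, validation, and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    train_rows = shuffled[:n_train]
    validation_rows = shuffled[n_train:n_train + n_val]
    test_rows = shuffled[n_train + n_val:]  # whatever remains
    return train_rows, validation_rows, test_rows


train_rows, validation_rows, test_rows = split_dataset(range(100))
print(len(train_rows), len(validation_rows), len(test_rows))  # 70 15 15
```

Each row lands in exactly one part, so the model is never evaluated on data it was trained on.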

Variational Autoencoder

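A type of generative model used in unsupervised machine learning and deep learning. It is a variant of the traditional autoencoder architecture, which consists of an encoder and a decoder neural network. Unlike a traditional autoencoder, its encoder maps each input to a probability distribution over a latent space rather than to a single point, so new data can be generated by sampling from that space and decoding the sample.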

Voice Recognition

A process that enables computers or devices to interpret and understand human speech and convert it into text. Voice recognition systems analyze spoken language input and transcribe it into written text, allowing users to interact with devices, applications, and services using spoken commands or natural language.



Whisper

OpenAI's Whisper is an AI system designed for automatic speech recognition (ASR), converting spoken language into text.


Yield

The output or result produced by an AI model or algorithm. Yield represents what the AI system generates in response to input data or stimuli.


Z-score

Z-score is a measure of how many standard deviations a data point is from the mean of a dataset. It's used in various AI applications for outlier detection and normalization.
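In formula terms, the z-score of a value x is (x - mean) / standard deviation. The sketch below uses the population standard deviation, which is an assumption. Some applications use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value relative to the whole dataset."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed here)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


# Mean is 5 and population standard deviation is 2 for this dataset.
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

For outlier detection, a common rule of thumb is to flag values whose z-score has an absolute value above a chosen threshold such as 3.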

Zero-Shot Learning

Zero-shot learning enables a machine learning model to identify and categorize novel concepts without prior labeled examples, showcasing its ability to generalize knowledge and adapt to unforeseen scenarios.

Zero-to-One Problem

The zero-to-one problem denotes the challenge of initiating solutions for complex problems, often presenting a substantial hurdle compared to subsequent advancements.

XAI (Explainable Artificial Intelligence)

XAI refers to the methods and techniques used to make artificial intelligence systems more transparent and understandable to humans. It aims to provide insights into how AI models make decisions, especially in critical applications such as healthcare and finance, where interpretability is crucial.


XGBoost

XGBoost (eXtreme Gradient Boosting) is a popular and efficient open-source implementation of gradient boosted decision trees designed for speed and performance. It is widely used in machine learning competitions and is known for its accuracy and computational efficiency.

Cognitive Capture

The process whereby technology utilizes artificial intelligence, machine learning, and other advanced technologies to interpret and replicate human cognitive functions like perception, attention, memory, and decision-making. This term is commonly used in contexts aimed at enhancing human-computer interaction, refining user experience, and advancing intelligent systems.

RAG Capabilities

The capabilities of a Retrieval-Augmented Generation (RAG) model. RAG is a framework designed to improve the performance of language models by integrating them with information retrieval systems. RAG models improve accuracy and relevance by bringing external, up-to-date information into the response generation process.
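The retrieve-then-generate flow can be sketched with a toy example. Everything here is a simplifying assumption: retrieval is done by counting shared words rather than with a real search index or embeddings, the function names are hypothetical, and the final call to a language model is omitted. The sketch only shows how retrieved text is placed into the prompt.

```python
def retrieve(query, documents, k=1):
    """Rank documents by how many words they share with the query (toy scoring)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_augmented_prompt(query, documents, k=1):
    """Prepend the retrieved passages to the question before generation."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "RMSE measures regression error",
    "Whisper converts speech into text",
]
print(build_augmented_prompt("what converts speech", docs))
```

The resulting prompt would then be passed to a language model, which can ground its answer in the retrieved context instead of relying only on what it learned during training.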
