Unquestionably, we are in an age of data. Modern technology generates an ever-increasing amount of data, which can be processed to extract the knowledge it contains. Technologies, tools, and algorithms such as neural networks, decision trees, and support vector machines take on particular significance today because the volume of data generated daily makes it possible to exploit them. The latest Gartner reports state that AI will be a key technology with the greatest projection during 2022.
However, developing an AI-based system is different from developing any other. It involves costs, risks, and effort that are difficult to quantify in the initial conceptualization phase. In this situation, the design and execution of a PoC (Proof of Concept) can be the perfect ally.
Conducting a PoC is a process that seeks to provide a small-scale prototype capable of validating the viability of a project or product, both technically and economically. It is therefore an instrument widely used across industries and sectors.
However, a PoC for a project based on AI techniques, where managing data of different categories (structured, unstructured, and semi-structured) is crucial, cannot be approached in traditional ways. There are numerous reasons for this, among them the complexity and risk of working with entirely new data and its potential issues of completeness, credibility, accuracy, consistency, and interpretability.
Furthermore, it is essential to consider that an AI project does not scale linearly. Processing 50 times more data will not make the system 50 times more intelligent. Likewise, a 20-fold increase in the computational capacity of the machine that processes the data will not necessarily make the algorithm run 20 times faster.
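One common way to illustrate the point about computational capacity is Amdahl's law. It is not part of the method described here and is used only as a sketch: if just a fraction of a workload benefits from extra capacity, the overall speedup is bounded by the part that does not. The function name and the example fractions are hypothetical, not measurements.

```python
def amdahl_speedup(scalable_fraction: float, capacity_factor: float) -> float:
    """Theoretical speedup when only part of a workload benefits from extra capacity.

    scalable_fraction: share of the runtime that speeds up with more capacity (0..1).
    capacity_factor: how many times more capacity is available (> 0).
    """
    if not 0.0 <= scalable_fraction <= 1.0:
        raise ValueError("scalable_fraction must be between 0 and 1")
    if capacity_factor <= 0:
        raise ValueError("capacity_factor must be positive")
    fixed_fraction = 1.0 - scalable_fraction
    return 1.0 / (fixed_fraction + scalable_fraction / capacity_factor)


if __name__ == "__main__":
    # Hypothetical workloads: the fractions below are illustrative, not measured.
    for fraction in (0.50, 0.90, 0.99):
        speedup = amdahl_speedup(fraction, capacity_factor=20)
        print(f"{fraction:.0%} scalable -> {speedup:.1f}x faster with 20x capacity")
```

Under these assumptions, a workload that is 90% scalable gains less than 7x from a 20x increase in capacity.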
At AI Shepherds, we have designed a specific cyclical and incremental work model for developing a PoC in AI projects, combining different working methods and frameworks with our own experience. To that end, we analyzed and took into account all the stages and deliverables defined by CRISP-DM (Cross-Industry Standard Process for Data Mining), a professional, standardized, and widely practiced model for tackling data mining projects. To guarantee rapid generation of value and keep technical development aligned with business objectives, we also drew on concepts from agile methodologies such as Scrum and Kanban. Figure 1 shows a diagram of our method of working.
Figure 1. General Framework for AI PoC
The primary phases carried out by AI Shepherds for developing a PoC are described in detail below.
Business understanding – This phase ensures that the development will meet its objectives while avoiding rework. Business-level goals are determined by analyzing the company to establish the scope of the issues to be solved. These objectives are then broken down into small requirements such as user stories, giving rise to a work stack or “backlog.” In each iteration, only the items that contribute the greatest value and can be completed within 2 weeks are selected. Resources must be inventoried at the hardware, personnel, and data/knowledge levels. Finally, the various kinds of risk are studied, costs and benefits are estimated, and a project plan is developed.
Data understanding – Any AI process based on data exploitation requires understanding the data model and exploring the data to be analyzed. The phase starts by collecting the available data from the different information sources and documenting their nature, volume, and relationships. Next, a detailed exploration is performed, concluding with a report that indicates the type of each attribute, its meaning, the range of values observed, known correlations, and any hypotheses detected for later analysis. Finally, the quality of the data is analyzed against the required standards, and a report of the results is generated.
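The exploration report described above can be sketched with a small profiling routine. This is a minimal illustration using only the Python standard library; the `profile` function, the sample records, and the choice of statistics are assumptions made for the example, not the tooling actually used.

```python
from collections import Counter
from typing import Any


def profile(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each attribute: observed types, missing values, distinct values, numeric range."""
    columns = sorted({key for row in records for key in row})
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in records]
        # Treat None and empty strings as missing.
        present = [v for v in values if v is not None and v != ""]
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        report[col] = {
            "types": dict(Counter(type(v).__name__ for v in present)),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return report


# Hypothetical sample data, for illustration only.
SAMPLE = [
    {"customer_id": 1, "age": 34, "segment": "retail"},
    {"customer_id": 2, "age": None, "segment": "retail"},
    {"customer_id": 3, "age": 51, "segment": ""},
    {"customer_id": 4, "age": 27, "segment": "corporate"},
]

if __name__ == "__main__":
    for name, summary in profile(SAMPLE).items():
        print(name, summary)
```

A real report would add the meaning of each attribute and any known correlations, which require domain knowledge rather than code.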
Data preparation – In its original state, the data requires a series of transformations and preparations before it can be used in the PoC. The phase begins by selecting the data set to be used; the chosen subset should contain the largest possible number of characteristics and ideally the most representative ones. This is essential because different AI algorithms will be tested in order to select the most appropriate one based on the results attained. A data cleaning process must then be applied to eliminate noise and to detect, correct, or discard possible errors or faults. The outcome may make it necessary to reconsider the data set selected for the PoC. Finally, the chosen data set can be reshaped with transformation techniques, including combining, transposing, eliminating, or dividing records to adapt them to the desired processing mode.
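As a rough illustration of the cleaning step, the sketch below normalizes text fields, discards records that lack a required attribute, and removes exact duplicates. The `clean` function, its rules, and the sample records are hypothetical; a real PoC would define its own rules from the data quality report.

```python
from typing import Any


def clean(records: list[dict[str, Any]], required: list[str]) -> list[dict[str, Any]]:
    """Normalize strings, drop records missing a required field, and remove duplicates."""
    seen: set[tuple] = set()
    cleaned: list[dict[str, Any]] = []
    for row in records:
        # Correct: trim whitespace and lower-case every text value.
        normalized = {
            key: value.strip().lower() if isinstance(value, str) else value
            for key, value in row.items()
        }
        # Discard: skip records where a required field is absent or empty.
        if any(normalized.get(field) in (None, "") for field in required):
            continue
        # Eliminate: skip records identical to one already kept.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Hypothetical raw data, for illustration only.
RAW = [
    {"id": 1, "city": " Madrid "},
    {"id": 1, "city": "madrid"},   # duplicate once normalized
    {"id": 2, "city": None},       # missing required field
    {"id": 3, "city": "Bilbao"},
]

if __name__ == "__main__":
    print(clean(RAW, required=["id", "city"]))
```

Comparing the number of records before and after cleaning is one simple signal for whether the selected data set needs to be reconsidered.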
Modeling – Before starting to model and develop the core of the PoC, it is important to understand the problem at the business level. The phase begins by selecting the most appropriate AI technique and algorithms to solve the problem and defining the test data set used for validation. The model is then developed and parameterized according to the nature of the data and the objectives, and the results obtained on the test data are documented. If several parameterizations are applicable, the previous step is repeated for each one, and the model showing the best results is selected.
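The loop over parameterizations can be sketched as follows. A tiny k-nearest-neighbors classifier stands in for whatever technique a real PoC would select; the algorithm, the candidate values of `k`, and the data are all assumptions chosen only to keep the example self-contained.

```python
from collections import Counter

Sample = tuple[tuple[float, ...], str]  # (features, label)


def knn_predict(train: list[Sample], x: tuple[float, ...], k: int) -> str:
    """Predict the majority label among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train: list[Sample], validation: list[Sample], k: int) -> float:
    """Share of validation samples classified correctly with this parameterization."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


def select_parameterization(train, validation, candidates):
    """Score every candidate k on the validation set and return the best one with all scores."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical one-dimensional data, for illustration only.
TRAIN = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
         ((10.0,), "b"), ((11.0,), "b"), ((12.0,), "b")]
VALIDATION = [((0.5,), "a"), ((11.5,), "b")]

if __name__ == "__main__":
    best, scores = select_parameterization(TRAIN, VALIDATION, candidates=(1, 3, 5))
    print("scores per k:", scores)
    print("selected k:", best)
```

Keeping the full `scores` dictionary, not just the winner, gives the documented record of results that the phase calls for.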
Evaluation – The results obtained with the selected model are analyzed to verify whether they meet the business objectives and the success criteria established in the initial phase. It is also checked whether the goals set at the beginning have changed, or new ones have arisen, during development, which could imply a new iteration of the process. Additionally, the documentation generated and the experience gathered are reviewed to identify errors.
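Checking results against the success criteria can be as simple as the comparison below. The metric names and thresholds are hypothetical; in practice they would come from the criteria agreed in the business understanding phase.

```python
def check_success_criteria(metrics: dict[str, float],
                           criteria: dict[str, float]) -> dict[str, bool]:
    """For each criterion, report whether the measured value reaches the agreed minimum."""
    results = {}
    for name, minimum in criteria.items():
        value = metrics.get(name)
        # A metric that was never measured counts as not met.
        results[name] = value is not None and value >= minimum
    return results


def needs_new_iteration(results: dict[str, bool]) -> bool:
    """A further iteration is suggested when at least one criterion is not met."""
    return not all(results.values())


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    measured = {"accuracy": 0.91, "recall": 0.72}
    agreed = {"accuracy": 0.90, "recall": 0.80, "precision": 0.70}
    outcome = check_success_criteria(measured, agreed)
    print(outcome)
    print("new iteration needed:", needs_new_iteration(outcome))
```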
Deployment – If required, the PoC is deployed for use by the customer. A deployment plan is created that summarizes the results and information obtained. The possible deployment options and the different integration routes with other existing systems are assessed, identifying potential difficulties during commissioning. A maintenance plan is also drawn up, establishing how often the PoC is monitored and the criteria that will signal the need for adjustments or the impossibility of using the PoC further due to data obsolescence, changes in business objectives, or other reasons. Finally, a report is generated containing the conclusions gathered during the process, analyzing the results achieved and identifying opportunities for improvement.
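The alert criteria of a maintenance plan could be encoded along these lines. This is a sketch under assumed rules: the `maintenance_alerts` function, the data-age limit, and the minimum score are hypothetical placeholders for whatever the actual plan specifies.

```python
from datetime import date


def maintenance_alerts(last_data_refresh: date, today: date, max_data_age_days: int,
                       current_score: float, min_score: float) -> list[str]:
    """Return the reasons, if any, why the deployed PoC needs attention."""
    alerts = []
    # Data obsolescence: the data behind the model is older than the plan allows.
    age_days = (today - last_data_refresh).days
    if age_days > max_data_age_days:
        alerts.append(f"data is {age_days} days old (limit {max_data_age_days})")
    # Degradation: the monitored score has dropped below the agreed minimum.
    if current_score < min_score:
        alerts.append(f"score {current_score:.2f} is below minimum {min_score:.2f}")
    return alerts


if __name__ == "__main__":
    # Hypothetical monitoring run, for illustration only.
    for alert in maintenance_alerts(date(2022, 1, 1), date(2022, 4, 1),
                                    max_data_age_days=60,
                                    current_score=0.85, min_score=0.80):
        print("ALERT:", alert)
```

Running such a check at the periodicity set in the maintenance plan turns the plan's criteria into something that can be monitored automatically.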
Since the process is conducted from an agile development perspective, the objective is the rapid delivery of value together with continuous improvement. These 6 phases can be executed repeatedly, in a loop, until all the requirements defined at the outset have been implemented or until new needs appear.
The beginning of any project, especially those related to software and particularly those related to data management and application of AI techniques, is fraught with risks and uncertainties. However, the correct planning, design, and implementation of a specific PoC can be crucial in solving these problems and starting a larger project with the best guarantees.