Laptop computer on a wooden table with a modern look

OneCloud’s mission is to empower customers to Integrate enterprise systems, Automate the exchange of critical data, and Accelerate business processes that rely on accurate and timely data.  One of the common needs when sharing data between systems is to validate that the data in both systems reconcile.  Validating the data between these different systems is often an onerous task for a number of reasons.  Many of these systems have different data models whether it be the number/usage of segments such as Natural Account, Legal Entity, Customer, Product, Line of Business, or Functional Area or the level of grain within a given segment.  

The level of grain challenge is easily illustrated by comparing systems of record such as the General Ledger with Performance Management applications.  The General Ledger (G/L) system may contain a number of detailed natural accounts whereas the consolidation or financial planning system may summarize a collection of natural accounts into a single reporting or planning account.  

A common example is the Cash accounts within the Balance Sheet.  In the G/L system, there may be a natural account for each financial institution (e.g., Bank of America, Chase) in which an account is held.  In the consolidation or financial planning system, this level of detail is unnecessary and as a result, the collection of natural Cash accounts is represented by a single segment value of Cash.  When data is loaded from the General Ledger to either of these systems, the cash balances across all of the individual natural accounts are accumulated and loaded to the reporting segment value of Cash.

Another challenge with data exchange between systems is when the data models vary or there is a multi-segment relationship to determine the performance management or analytics view of the data.  Let’s consider an example that is common in manufacturing.  Cost centers capture costs associated with human capital - people.  In a manufacturing facility, the employee wage expenses are directly attributable to cost of manufacturing the products and are therefore recognized cost of good expense (sometimes called above the line expense).  In the same manufacturing facility, there are functions (e.g., finance, IT, management) whose costs are not able to be directly related to the manufacture of a product, and as such their costs are recognized as operating expenses.  In this instance, the relationship between natural account (employee wage expense) and the cost center (finance, IT, manufacturing) needs to be evaluated to determine the appropriate segment value in the downstream system.  

In both of these examples, we have functional transformations that need to be applied when exchanging data between systems.  Summarization introduces the potential for a variance between the systems.  Is the data mapped (summarized) correctly?  Have new natural segment values been added that are not accounted for in the mapping?  Does the logic for the multi-segment relationship capture new segment values or exceptions?  

These are just a few examples of reasons for variances between systems.  Realistically, any data movement, even within a system, can benefit from a data validation process being performed to ensure data quality.  Let’s explore how OneCloud can streamline the process and accelerate the resolution time.  

Data validation has four primary components:

  1) Data Extracts
  2) Data Alignment/Harmonization
  3) Variance Calculation
  4) Reporting/Auditability

Data extracts are the foundation of data validation.  The term data extract can invoke the idea that granular data needs to be extracted from each system.  While this is a valid definition, it is not the only definition.  Variance analysis can take on multiple forms:

  - Base level/input data
  - Calculated data
  - Aggregated/summarized data

OneCloud supports each of these.  When we refer to data extracts, we also include system reports that extract calculated and/or aggregated data.  Depending on the enterprise system, OneCloud can either connect directly to the system to retrieve data or natively process data or report extracts that may contain base, calculated, or aggregated data or some combination thereof.  

Once we identify the data extracts, we need to determine if data alignment is required in order to perform variance analysis.  Harmonizing the data is the process of making the file layout consistent across the data sets and transforming the segment values to align where possible.  This is necessary to enable data to be compared and variances to be calculated.  The powerful transformation capabilities of OneCloud provide the ability to easily and efficiently align data and create a combined data set upon which variances can be calculated.  

The following two data sets represent sample data generated as part of an allocation routine.  The first data set contains the data that was allocated to various segments while the second contains data for segments from which data was allocated.  

Allocation To (Inbound)

Allocation In

Allocation Out (Outbound)

Allocation Out

These two data sets are combined using the Stack Files Command of the Tabular Transformation BizApp to create a data set upon which the variance can be calculated.

After data is extracted from the various systems, aligned, and combined into a single data set, we leverage the power of OneCloud’s Advanced Query Command to quickly and easily generate our variances.  A sample variance calculation query is below.  With this query, we test that the allocation produces equal inbound and outbound allocation amounts for the between various segments.

Select Account,Rule
,printf('%.4f',Inbound_Alloc) as Inbound_Alloc
,printf('%.4f',Outbound_Alloc) as Outbound_Alloc
,printf('%.4f',Inbound_Alloc+Outbound_Alloc) as Variance
from (
Select Account,Rule
,Sum(Case When Movement = 'Allocation In' then Amount Else 0 End) as Inbound_Alloc
,Sum(Case When Movement = 'Allocation Out' then Amount Else 0 End) as Outbound_Alloc
from fact
Where
Period = 'June'
Group By
Account,Rule
)
Where
Abs(Inbound_Alloc+Outbound_Alloc) > 0.0001
Order By
Abs(Inbound_Alloc+Outbound_Alloc) desc

Let’s explore each section of this query.  First, let’s start with the interior Select statement.  Interior Select_Final

This query performs several actions.  First, it makes the various fields in the data set available for reporting purposes.  For the purpose of variance reporting, it is important to select only the fields for which an association between the data sets can be created.  In this example, both data sets (Allocation In, Allocation Out) contain a field called Customer; however, because the allocation logic is spreading data between customers, we should not include the Customer field in the select portion of the variance query because a variance will always exist at the individual customer level.  Having functional knowledge of the data is critical to creating an appropriate query.  In this data set, we know that data is able to be reconciled at individual values for the Account and Rule segments.  These segments are used later in the query to support the aggregation of data.  We’ll discuss this later in this article. 

Inner Select_1

 

Second, as part of the Select statement, we create individual data columns to align with each of the functional data sets represented and later calculate variances.  In this example, we have functional data sets that contain the detailed Allocation In and Allocation Out data.  The Sum and Case Statements allows us to aggregate only the data records that the Movement segment identifies as either an Allocation In or Out.     Inner Select_2

Third, if needed, the data set is filtered using a Where clause.  In this example, we only want to see the variances for the period June.  While the Period segment value is manually defined in this example, we could leverage OneCloud Variables, Command Outputs, or Runtime Prompts to ensure the filter is dynamic.  We filter within this interior Select statement to reduce the overall data volume that will be returned to the outer Select statement thereby ensuring optimal performance.Inner Select_3

Fourth, the Group By statement is used to support the data column aggregation described in the first section.  All fields specified in the Select portion of this query, except those on which the aggregation is being performed, must be included in the Group By portion of the query.  In functional terms, we can collapse the detail behind each of these segments into an aggregate value.  For example, if the allocation produces 100 data records for Account PER2100 and Rule R0005, the Sum and Group By operators will aggregate to a single data point that can be used to calculate the variance at that total Customer level.  Inner Select_4

The result of the interior Select statement can be leveraged as a subselect from an outer Select statement.  This allows subsets of the interior query results to be output, variances to be calculated, as well as additional filtering criteria and/or ordering operations to be applied.

First, in the outer Select statement, we retrieve all of the fields of the interior Select statement and calculate a variance between the data columns.  We assign formatting (printf) to the data columns for readability.  Outer Select_1

Second, we can apply optional filtering to the data set created by interior Select statement using a Where Clause.  In this example, we are retrieving data only when the variance is more than 0.0001.  We utilized the absolute value (Abs) function to allow us to use a simple greater than (>) operator.

Outer Select_2

 

Third, we sort the subset of data using the Order By operator.  In this example, we sorted to display non-zero variances first in the report. Outer Select_3

The result of the Advanced Query is an actionable report that quickly identifies variances.  Had we not included a filter on the variance in the outer Select, the sorting would enable us to stop evaluating the query output once a zero variance record is encountered.  This is especially impactful when calculating variances across large data sets. 

With Variance Filter

Report_Filter

Without Variance Filter

Report_NoFilter

 



The last step in the data validation process is reporting the results of the variance calculation.  OneCloud has a number of mechanisms that enable variance reporting.  In a simple deployment, the variance calculation result output can be emailed or saved to a file system including cloud-based such as Google Drive.  In more sophisticated examples, the output can be written to a database or analytics platform with a reporting dashboard attached.  Lastly, the results of the variance calculation can also be used in combination with a OneCloud Conditional where specific actions can be taken in the event of variances being detected such as generating a PagerDuty alert, performing a data rollback action, or generating a detailed drill report to support the variance investigation. 

The robust features of OneCloud to support data validation efforts empowers you to spend less time identifying where a variance exists and instead allows you to focus on why the variance exists and how to correct it.  The results are a more efficient integration process with higher confidence in the data. 

If you want to learn more about how OneCloud provide an automated solution for reconciling data exchange between systems, please contact us at info@onecloud.io.