The AI Auditor: What a Task Automation Experiment Reveals About GenAI’s Productivity Promise

7 min read · Sep 9, 2024

We previously explored the debate surrounding the macro-level impact of Generative AI (GenAI). In that post, the case was made that leaving the discussion at the macro level could obscure the transformative potential of the technology. Economists have long noted the difficulty in seeing technological benefits “bubble up” to macro-level statistics. This challenge was famously articulated by economist Robert Solow in 1987: “You can see the computer age everywhere but in the productivity statistics.”

Instead of looking at the technology solely from a macro view, the case was made to recognize how GenAI, at a task level, acts as a “language amplifier”: it converts a condensed set of words into detailed, structured outputs. Workers evidently find this useful: nearly 70% of the 5,000+ workers surveyed hid their use of GenAI from their superiors.

Ironically, even within Goldman Sachs, the potential of the technology is understood. Goldman Sachs’ Chief Information Officer, Marco Argenti, in an interview on Bloomberg’s Odd Lots podcast, highlighted several key areas where GenAI can make a difference:

1. Boosted developer productivity: GenAI increases developer productivity by 10 to 40% (averaging 20%), streamlining coding and expanding to other aspects of the software development lifecycle. Argenti further highlights that AI assistance extends beyond mere coding tasks. It can aid in crafting appropriate test cases, generating comprehensive documentation, and developing deployment scripts. This broader support enhances the overall software development process.

2. Improved ability to search and retrieve information: Traditionally, querying mountains of unstructured documents requires precise queries to locate the desired information. GenAI does not require that same level of precision due to its ability to recognize linguistic structures. Specifically, the technology can recognize patterns like phone numbers without rigid rules, tolerating deviations such as the letter “O” instead of the number “0”. Argenti also notes that the technology can pull citations from the identified documents, thereby managing the risk of hallucinations.

3. Prepare once and repurpose for different audiences: Argenti also points out the potential to leverage GenAI to repurpose content. He sees the technology reducing “repurposing” effort because one piece of content, like a pitchbook, can be transformed with little effort for internal users, external promotion, and other audiences.
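
To make the contrast in the second point concrete, the following minimal sketch shows what a rule-based search has to spell out by hand. The patterns and the `find_phone_numbers` helper are illustrative assumptions, not anything described in the interview. The point is only that every tolerated deviation (here, the letter “O” typed for the digit “0”) must be anticipated explicitly, which is the rigidity GenAI is said to relax.

```python
import re

# Rigid rule: exactly ddd-ddd-dddd, digits only.
STRICT_PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Hand-built tolerance: also accept the letter O/o wherever a digit is expected.
TOLERANT_PHONE = re.compile(r"\b[0-9Oo]{3}-[0-9Oo]{3}-[0-9Oo]{4}\b")


def find_phone_numbers(text: str) -> list[str]:
    """Return phone numbers found in text, with any letter O normalized to 0."""
    return [
        match.replace("O", "0").replace("o", "0")
        for match in TOLERANT_PHONE.findall(text)
    ]


# Hypothetical sample sentence containing a typo in the phone number.
sample = "Reach the controller at 555-O1O-2345 for the confirmation."
print(STRICT_PHONE.findall(sample))   # the rigid rule misses the typo
print(find_phone_numbers(sample))     # the tolerant rule recovers the number
```

Each additional deviation (spaces instead of hyphens, a lowercase “l” for “1”) would need another hand-written rule in this approach.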

With that background in mind, we dive deeper and explore how the technology can practically automate work done in a financial audit.

Harnessing GenAI for Financial Audits: The Custom GPT Experiment

To explore the practical applications of GenAI in a professional context, an experiment was conducted using a Custom GPT to document the results of Accounts Receivable (AR) testing, a common task in financial audits.

Setting up the Custom GPT

A Custom GPT was built to perform AR testing. The prompt instructed the AI to act as an expert financial auditor and CPA, conducting a year-end financial audit focused on accounts receivable. The AI was given specific steps to follow, including:

1. Request the AR sub-ledger, i.e., the AR working paper.
2. Analyze the AR confirmations received and update the working paper.
3. Request the cash receipt report to conduct the subsequent receipt testing.
4. Update the AR working paper with findings and conclusions.

The Custom GPT is only for experimental use and is available here, but you must be logged in to ChatGPT to access it.

The AR Subledger Working Paper

The experiment began with uploading the AR sub-ledger as the working paper:

Next, the AI was provided with the AR confirmations, including three PDF confirmations and one image confirmation:

The confirmations included various scenarios to see whether the Custom GPT would be able to handle different conditions:

Albatross Inc.: Amount paid in full
Bluebird Solutions: Amount too high, to be reduced by $674,927
Cardinal Enterprises: Amount too low, customer owes an additional $50,000
Dove Technologies: Blank confirmation with signature (image file)

AR Confirmation Testing Results

The AI processed these confirmations and updated the AR working paper:

As can be seen, the AI correctly accounted for each of the confirmations, accurately reflecting the confirmed amounts and noting discrepancies where they occurred.

Subsequent Receipt Testing

A cash receipt report was then provided to the Custom GPT to conduct subsequent receipt testing:

Subsequent Receipt Testing Results:

The AI correctly accounted for each of the subsequent receipts received:

(Note: The empty columns were hidden to make the image easier to read)

As can be seen, for customers without confirmations or subsequent receipts (Ibis and Jay), it noted “Difference of nan, further investigation needed.”

What is a “Difference of nan”?

This looked like an error, so ChatGPT was asked to explain the remark:

Prompt: “Why did you write ‘Difference of nan, further investigation needed’ for Ibis and Jay? What does ‘nan’ mean?”

Response: “The term ‘NaN’ stands for ‘Not a Number,’ which indicates missing or undefined values in the dataset. In the context of the AR Working Paper, ‘Difference of NaN, further investigation needed’ suggests that the calculation couldn’t be completed because necessary data — like confirmed amounts or cash receipts — was missing for those customers.”

It is notable that the AI used computing terminology in audit documentation. “NaN” (Not a Number) comes from floating-point arithmetic, as defined in the IEEE 754 standard, where it represents undefined or unrepresentable results. Data analysis tools also commonly use it to mark missing values.

Applying a technical computing term to an accounting context suggests that the AI grasped the underlying concept: it used “NaN” to flag instances where the audit evidence was missing and further investigation was required. That is, its lack of understanding of the full accounting lexicon was not an obstacle to completing the task correctly.
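
The following is a minimal sketch of how a remark like this can arise. It assumes, without confirmation from the experiment, that the tool computed the difference in Python code in which missing evidence was represented as NaN. The `difference`, `remark`, and `clearer_remark` helpers and the 1,000.00 balance are hypothetical. Python prints a NaN float as lowercase `nan`, which matches the wording in the working paper.

```python
import math
from typing import Optional


def difference(book_balance: float, evidence: Optional[float]) -> float:
    """Book balance minus supporting evidence; NaN when no evidence exists."""
    supported = math.nan if evidence is None else evidence
    return book_balance - supported  # any arithmetic with NaN yields NaN


def remark(book_balance: float, evidence: Optional[float]) -> str:
    """Naive remark that formats the difference straight into the text."""
    diff = difference(book_balance, evidence)
    return f"Difference of {diff}, further investigation needed"


def clearer_remark(book_balance: float, evidence: Optional[float]) -> str:
    """Same check, but missing evidence is named explicitly for the reviewer."""
    diff = difference(book_balance, evidence)
    if math.isnan(diff):
        return "No confirmation or subsequent receipt, further investigation needed"
    return f"Difference of {diff:,.2f}, further investigation needed"


# Hypothetical balance for a customer with no audit evidence on file.
print(remark(1000.0, None))
print(clearer_remark(1000.0, None))
```

Checking for NaN explicitly, as in `clearer_remark`, would let the working paper state the finding in audit terms instead of leaving the reviewer to decode a computing term.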

Analysis

The AI demonstrated promising capabilities in handling complex AR testing tasks:

It accurately processed various confirmation scenarios, including a balance paid in full, an overstated balance, and an understated balance.
It correctly updated the AR working paper with confirmation results.
For subsequent receipt testing, it accurately traced payments and identified items requiring further investigation.
It correctly did not perform subsequent receipt testing for the customers for which confirmations were received.
It recognized that there was an issue with the two customers for which there were no confirmations or subsequent receipts.
It reached the correct conclusions on the subsequent receipt testing.

Where did it fall short?

The Custom GPT did not complete the following columns for the subsequent receipt testing:

· Total amount traced to subsequent receipts (for those partially confirmed or with no confirmations)

· Amount not confirmed or not traced to subsequent receipts

· Testing Conclusion

These columns were intended to make the GenAI “show its work” so that the auditor could easily review what it had done.

Additionally, it did not carry the results from the “Conclusion on Amount Confirmed (i.e., are further procedures required)” column into the final “Conclusion” column. The idea was that this last column would include documentation from both the AR confirmations and the subsequent receipt testing. It only included the latter.
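
One way to make these columns harder to skip is to compute them deterministically and leave only the narrative to the GenAI tool. The sketch below is an assumption about how the three missing columns could be derived, not a description of how the experiment was built. The `ARLine` structure, the field names, and all amounts are hypothetical, and the treatment of each case would still need an auditor’s judgment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ARLine:
    customer: str
    book_balance: float
    confirmed: Optional[float] = None  # None means no confirmation was received
    receipts: List[float] = field(default_factory=list)  # subsequent cash receipts


def complete_testing_columns(line: ARLine) -> dict:
    """Derive the three subsequent receipt testing columns for one customer."""
    confirmed = line.confirmed if line.confirmed is not None else 0.0
    # Only the part of the balance not covered by a confirmation is traced.
    to_trace = max(line.book_balance - confirmed, 0.0)
    traced = min(sum(line.receipts), to_trace)
    unsupported = to_trace - traced
    if unsupported == 0:
        conclusion = "Balance supported; no further procedures required"
    else:
        conclusion = f"{unsupported:,.2f} unsupported; further investigation needed"
    return {
        "Total amount traced to subsequent receipts": traced,
        "Amount not confirmed or not traced to subsequent receipts": unsupported,
        "Testing Conclusion": conclusion,
    }


# Hypothetical customers and amounts, for illustration only.
partly_paid = ARLine("Customer A", 10_000.0, receipts=[4_000.0, 2_500.0])
no_evidence = ARLine("Customer B", 8_000.0)
for line in (partly_paid, no_evidence):
    print(line.customer, complete_testing_columns(line))
```

With the arithmetic fixed in code, the reviewer can check the figures against the cash receipt report and focus on whether the conclusions are appropriate.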

Conclusion

This experiment illustrates the potential of generative AI in automating basic auditing tasks. The Generative AI tool (Custom GPT) was capable of the following:

Compiling the AR working paper.
Accurately extracting information from both PDF and image file AR confirmations.
Correctly assigning subsequent receipts to customers.
Appropriately handling discrepancies.
Generating accurate initial conclusions about each customer.

While it demonstrated these capabilities in pulling the AR working paper together, it still needs a “human in the loop.” A CPA must review the working paper and ensure that the documentation is complete (e.g., completing the last column and determining the next steps for the receivables for which no information was received). The AI can automate the compilation of information and give a preliminary assessment. For the work to meet professional standards, the auditor must apply professional judgment and interpret what the GenAI tool has assembled.

In our next installment, we’ll explore how GenAI is just one piece of the puzzle in revolutionizing the audit process. Much like how digital photography required the convergence of multiple technologies to become ubiquitous, we’ll examine the suite of innovations poised to create the audit platform of the future. Additionally, we’ll delve into the transformative impact a GenAI-driven audit could have on the structure and composition of audit teams.

Written by Malik D.
