Catching tax cheats with graph databases

Another Tax Day has passed, and tens of millions of American citizens and corporations have filed returns showing they paid their fair share. Unfortunately, many did not.

Consider that the “tax gap” — the IRS’s standard measure of tax evasion — has exceeded $400 billion, according to the agency’s latest figures.

Tax evasion “is illegal and is an underappreciated problem in the United States,” the Brookings Institution says. “About one out of every six dollars owed in federal taxes is not paid. The amount of unpaid taxes every year is plausibly about three-quarters the size of the entire annual federal budget deficit.”

To make matters worse, the IRS pursues fewer cases of tax evasion than it used to. The number of criminal cases the IRS brought in 2017 in which tax fraud was the primary crime was down almost a quarter from 2010, the New York Times reported.

As the problem grows, it’s time for officials to explore the use of technology — especially the advanced analytics powered by graph databases.

Graph databases are used by multiple industries to perform deep, real-time analytics on massive datasets. When you receive a personalized recommendation from an online retailer, for example, chances are it was the product of graph databases.

The same technology can help catch tax cheats.

Side effects of a well-connected global economy

Conditions worldwide are ripe for tax evasion. Modern technology has facilitated easy movement of money across international borders, driving tremendous velocity and growth for the global economy.

Unfortunately, it also allows tax evaders to set up shell corporations with just a few clicks on the internet and an encrypted phone call to the criminals who make these corporations look like legitimate entities.

The local laws in tax havens — such as the Cayman Islands, Panama, and the Bahamas — further complicate the issue, limiting the amount of information shared by these governments with the U.S., EU, and other tax authorities.

Setting up shell corporations to evade taxes with crime-as-a-service

Shell corporations are entities set up with the express purpose of hiding income and avoiding the taxes on it. Crime-as-a-Service (organized online crime rings) has become a reality, with sophisticated fraudsters incorporating new companies whose fake or paid directors hide the actual beneficiary, the tax evader. They route the money through an intricate trail of accounts belonging to these shell corporations and pass the proceeds or income back via an equally complex path to avoid detection.

The result is a complicated hub of connections, with multiple layers of relationships hidden within data.

Traditional fraud investigation solutions built on relational databases struggle to go beyond two or three levels of data, because every additional level requires computationally expensive and time-consuming database joins.
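
To make the cost of those joins concrete, here is a minimal sketch using Python's built-in sqlite3 module. The transfers table, its columns, and the account names are invented for illustration and do not describe any real tax authority's schema. Following the money three levels out already means joining the table to itself twice, and each further level adds another self-join.

```python
import sqlite3


def three_hop_destinations(conn, start):
    """Return accounts reached exactly three transfers away from `start`.

    Each hop is one more self-join on the (hypothetical) transfers table.
    """
    query = """
        SELECT DISTINCT t3.receiver
        FROM transfers AS t1
        JOIN transfers AS t2 ON t2.sender = t1.receiver
        JOIN transfers AS t3 ON t3.sender = t2.receiver
        WHERE t1.sender = ?
        ORDER BY t3.receiver
    """
    return [row[0] for row in conn.execute(query, (start,))]


# Illustrative in-memory data; all account names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (sender TEXT, receiver TEXT)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?)",
    [
        ("Taxpayer", "ShellA"),
        ("ShellA", "ShellB"),
        ("ShellB", "HavenBank"),
        ("ShellA", "Supplier"),
    ],
)

destinations = three_hop_destinations(conn, "Taxpayer")
print(destinations)
```

A ten-level trail written this way would need nine self-joins, which is the pattern the paragraph above describes as expensive on large tables.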

First- and second-generation graph databases are good at following the money trail up to three levels, but struggle as the tax evasion trail expands to four or more levels.

The criminals set up new corporations or subsidiaries of an existing corporation and use them to launder money to and from the tax havens before shutting these subsidiaries down, making it very difficult to find or track the money moving through these “fireflies of tax evasion.”

This requires the tax fraud detection solution to understand the structure of corporate entities with three or more layers, identify changes to the structure over time (“temporal analysis”) and flag suspicious patterns where specific subsidiaries were used for a short period of time for routing money and were shut down after that period.
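
The temporal check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Subsidiary and Transfer records, the 180-day lifespan limit, and the 100,000 routing threshold are all hypothetical, and a real system would tune such thresholds and run the check inside the graph engine rather than over in-memory lists.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Subsidiary:
    name: str
    parent: str
    opened: date
    closed: Optional[date]  # None means the entity is still active


@dataclass(frozen=True)
class Transfer:
    sender: str
    receiver: str
    amount: float
    when: date


def flag_short_lived(subsidiaries, transfers, max_days=180, min_routed=100_000.0):
    """Flag closed subsidiaries that lived briefly yet moved a lot of money.

    Both thresholds are illustrative assumptions, not values from any agency.
    """
    flagged = []
    for sub in subsidiaries:
        if sub.closed is None:
            continue  # still active, so not a "firefly"
        if (sub.closed - sub.opened).days > max_days:
            continue  # lived too long to match the pattern
        # Total money sent or received by the subsidiary while it existed.
        routed = sum(
            t.amount
            for t in transfers
            if sub.name in (t.sender, t.receiver)
            and sub.opened <= t.when <= sub.closed
        )
        if routed >= min_routed:
            flagged.append(sub.name)
    return flagged


# Made-up example data.
subsidiaries = [
    Subsidiary("FireflyCo", "ParentCorp", date(2020, 1, 10), date(2020, 3, 15)),
    Subsidiary("SteadyCo", "ParentCorp", date(2015, 1, 1), None),
    Subsidiary("BriefButQuiet", "ParentCorp", date(2020, 2, 1), date(2020, 4, 1)),
]
transfers = [
    Transfer("ParentCorp", "FireflyCo", 250_000.0, date(2020, 1, 20)),
    Transfer("FireflyCo", "HavenBank", 250_000.0, date(2020, 1, 22)),
    Transfer("ParentCorp", "SteadyCo", 900_000.0, date(2020, 1, 20)),
    Transfer("BriefButQuiet", "ParentCorp", 500.0, date(2020, 2, 10)),
]

flagged = flag_short_lived(subsidiaries, transfers)
print(flagged)
```

In this toy data only FireflyCo is both short-lived and heavily used for routing; SteadyCo is still open and BriefButQuiet moved too little money to stand out.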

Finding tax evaders with native parallel graph databases and analytics

Native parallel graph databases are built to dig ten or more levels deep into the money trail and to identify shell corporations that have similar or identical addresses and contact numbers, share one or more directors, and have been created or administered from the same set of IP addresses.
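
As a rough illustration of how shared attributes tie companies together, the sketch below clusters company records that have any address, director, or IP address in common, using a simple union-find structure. The record layout and all names are hypothetical, and only exact matches are linked; matching "similar" addresses would need a normalization or fuzzy-matching step that is not shown here.

```python
from collections import defaultdict


def cluster_by_shared_attributes(companies):
    """Group companies that are linked, directly or transitively, by a shared value.

    `companies` maps a company name to a dict of attribute kind -> set of values.
    Returns only clusters containing more than one company.
    """
    parent = {name: name for name in companies}

    def find(x):
        # Walk up to the cluster representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (kind, value) -> first company seen with that value
    for name, attrs in companies.items():
        for kind, values in attrs.items():
            for value in values:
                key = (kind, value)
                if key in first_seen:
                    union(first_seen[key], name)
                else:
                    first_seen[key] = name

    groups = defaultdict(set)
    for name in companies:
        groups[find(name)].add(name)
    return [members for members in groups.values() if len(members) > 1]


# Made-up records; the IP addresses come from documentation-only ranges.
companies = {
    "Alpha Holdings": {"address": {"1 Harbor Rd"}, "director": {"J. Doe"}, "ip": {"203.0.113.7"}},
    "Beta Trading": {"address": {"9 Palm Ave"}, "director": {"J. Doe"}, "ip": {"198.51.100.4"}},
    "Gamma Ventures": {"address": {"9 Palm Ave"}, "director": {"R. Roe"}, "ip": {"192.0.2.10"}},
    "Delta Bakery": {"address": {"42 Main St"}, "director": {"M. Baker"}, "ip": {"192.0.2.99"}},
}

clusters = cluster_by_shared_attributes(companies)
print(clusters)
```

Alpha and Beta share a director, Beta and Gamma share an address, so all three land in one cluster even though Alpha and Gamma have nothing directly in common. That transitive linking is the kind of connection that is easy to miss when records are compared pair by pair.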

They are also adept at the temporal analysis of complex corporate hierarchies, identifying subsidiaries that are used for a very short period of time to pass funds back and forth among a related set of accounts that all seem to transact only with each other.

Native parallel graphs are also capable of incorporating data from multiple internal and external sources, such as OpenCorporates — the world’s largest open database of corporate information. This is useful in finding and connecting common directors among companies from multiple sources as well as common or similar addresses, phone numbers, and other contact information.

Lastly, native parallel graphs can analyze the money flow through accounts across 10 or more hops, trace the loopbacks through an equally complex path, and identify suspicious patterns that seem to indicate tax evasion. This is powered by the massively parallel processing in native parallel graphs.
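
The loopback idea can be illustrated with a small depth-limited search over a transfer graph. This is a single-threaded sketch in plain Python, not how a native parallel graph engine is implemented; the account names and the default limit of 10 hops are assumptions chosen to mirror the description above.

```python
from collections import defaultdict


def find_loopbacks(transfers, origin, max_hops=10):
    """Return paths that leave `origin` and return to it within `max_hops` transfers.

    `transfers` is an iterable of (sender, receiver) pairs. Intermediate
    accounts are never revisited, so every returned path is a simple cycle.
    """
    graph = defaultdict(set)
    for sender, receiver in transfers:
        graph[sender].add(receiver)

    loops = []

    def walk(node, path):
        # `path` holds the accounts visited so far, so len(path) - 1 hops are used.
        for nxt in sorted(graph[node]):
            if nxt == origin:
                loops.append(path + [origin])  # closing hop stays within the limit
            elif nxt not in path and len(path) < max_hops:
                walk(nxt, path + [nxt])

    walk(origin, [origin])
    return loops


# Made-up transfers: money leaves the taxpayer and returns through four shells.
transfers = [
    ("Taxpayer", "ShellA"),
    ("ShellA", "ShellB"),
    ("ShellB", "ShellC"),
    ("ShellC", "ShellD"),
    ("ShellD", "Taxpayer"),
    ("ShellB", "Supplier"),
    ("Taxpayer", "Grocer"),
]

loops = find_loopbacks(transfers, "Taxpayer")
print(loops)
```

With a limit of three hops the five-hop loop in this data goes unnoticed, which mirrors the point made earlier about tools that stop at two or three levels. Exhaustive enumeration like this grows quickly with graph size, which is why the work is spread across parallel workers in practice.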

As criminals deploy complex strategies and modern technology for tax evasion, this technology can be used effectively by the IRS and other agencies all over the world to catch the crooks.
