Customer Behavior Analysis with Neo4j in Banking

Connected Data on a Knowledge Graph
Traditional relational databases, while effective for transactional data, often struggle to capture the intricate web of relationships that define customer behavior. These systems can fall short when attempting to analyze complex patterns like product holding ratios across diverse customer segments, or the subtle correlations between transactional behavior and product adoption. The rigid, tabular structure of these databases makes it difficult to traverse the interconnectedness of customers, accounts, and financial products, leading to fragmented insights and missed opportunities for personalization.
Enter Neo4j, a graph database designed to model and query highly connected data. By representing banking data as a network of nodes and relationships, Neo4j allows for intuitive exploration of customer interactions and product affinities. This approach empowers analysts to uncover hidden patterns in product holding ratios, identify key behavioral drivers such as bill payment patterns and Apple Pay usage, and visualize the complex relationships that drive customer value. With Neo4j, banks can move beyond simple transactional analysis to gain a holistic view of their customers, unlocking insights that are difficult to obtain from tabular data alone.
The power of Neo4j lies in its ability to create knowledge graphs, which provide a rich, contextual understanding of customer behavior. These graphs enable banks to deliver truly personalized experiences by leveraging the interconnectedness of data to anticipate customer needs and offer tailored product recommendations. By understanding the ‘why’ behind customer actions, rather than just the ‘what,’ banks can foster stronger relationships, increase customer lifetime value, and drive sustainable growth. In the following sections, we will go into the practical aspects of modeling banking data in Neo4j, demonstrating how to leverage Cypher queries and graph algorithms to extract actionable insights and build powerful recommendation engines.
Modeling Banking Data in Neo4j: Products, Behaviors, and Relationships
To effectively utilize Neo4j for analyzing product holding ratios and customer behavior, we need to design a graph schema that accurately reflects the entities and relationships within the banking domain. This involves identifying key nodes and relationships, and defining their properties.
Designing the Graph Schema:
- Customers:
  - Node: (c:Customer)
  - Properties: customerId, firstName, lastName, dateOfBirth, location, customerSegment
- Accounts:
  - Node: (a:Account)
  - Properties: accountId, accountType (e.g., "Current", "Credit", "Mortgage"), accountBalance, openingDate
- Products:
  - Node: (p:Product)
  - Properties: productId, productName, productType (e.g., "Debit Card", "Credit Card", "Mortgage Loan"), interestRate
- Transactions:
  - Node: (t:Transaction)
  - Properties: transactionId, transactionDate, transactionAmount, transactionType (e.g., "Debit", "Credit"), merchant
- Bill Payments:
  - Node: (bp:BillPayment)
  - Properties: paymentId, paymentDate, amount, payee (e.g., "Electricity Company", "Internet Provider"), category
- Apple Pay Usage:
  - Node: (ap:ApplyPayUsage)
  - Properties: usageId, usageDate, amount, merchant, location
Relationships:
- (c)-[:OWNS]->(a): A Customer owns an Account.
- (c)-[:HOLDS]->(p): A Customer holds a Product.
- (a)-[:HAS_TRANSACTION]->(t): An Account has a Transaction.
- (c)-[:PERFORMS_BILL_PAYMENT]->(bp): A Customer performs a BillPayment.
- (c)-[:USES_APPLY_PAY]->(ap): A Customer uses Apple Pay, recorded as an ApplyPayUsage node.
- (t)-[:RELATED_TO]->(p): A Transaction is related to a Product.
- (bp)-[:RELATED_TO]->(p): A BillPayment is related to a Product.
- (ap)-[:RELATED_TO]->(p): An ApplyPayUsage is related to a Product.
Product holding ratios can be computed from the HOLDS relationships between Customer and Product nodes, for example as the share of customers in a segment who hold a given product. Customer interactions, such as transactions, bill payments, and Apple Pay usage, are represented by the corresponding relationships.
// Creating a Customer Node
CREATE (c:Customer {customerId: "C123", firstName: "Alice", lastName: "Smith", dateOfBirth: "1980-01-01"});
// Creating an Account Node and Relating it to a Customer
// (MATCH comes first: Cypher requires WITH between CREATE and a following MATCH)
MATCH (c:Customer {customerId: "C123"})
CREATE (c)-[:OWNS]->(:Account {accountId: "A456", accountType: "Current", accountBalance: 1000});
// Creating a Product Node and Relating it to a Customer
// (MERGE, because a product is shared by many customers and should exist only once)
MATCH (c:Customer {customerId: "C123"})
MERGE (p:Product {productId: "P789"})
  ON CREATE SET p.productName = "Credit Card", p.productType = "Credit Card"
CREATE (c)-[:HOLDS]->(p);
// Creating a Transaction Node and Relating it to an Account
MATCH (a:Account {accountId: "A456"})
CREATE (a)-[:HAS_TRANSACTION]->(:Transaction {transactionId: "T101", transactionDate: "2023-10-27", transactionAmount: 50, transactionType: "Debit"});
// Creating a BillPayment node and relating it to a customer
MATCH (c:Customer {customerId: "C123"})
CREATE (c)-[:PERFORMS_BILL_PAYMENT]->(:BillPayment {paymentId: "BP101", paymentDate: "2023-10-27", amount: 100, payee: "Electricity Company", category: "Utilities"});
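One common way to express a product holding ratio is the share of customers in a segment who hold a given product. The minimal Python sketch below computes that share outside the database. It assumes a hypothetical in-memory layout: a `segments` dict mapping each `customerId` to its `customerSegment`, and a list of `(customerId, productName)` pairs standing in for HOLDS relationships. The sample data is made up.

```python
from collections import defaultdict


def holding_ratios(segments, holds):
    """Share of customers in each segment who hold each product.

    segments: dict mapping customerId -> customerSegment (every customer, holder or not).
    holds: iterable of (customerId, productName) pairs, i.e. HOLDS relationships.
    Returns: dict mapping (segment, productName) -> ratio between 0 and 1.
    """
    segment_sizes = defaultdict(int)
    for segment in segments.values():
        segment_sizes[segment] += 1

    # A set per (segment, product) so a customer holding a product twice counts once.
    holders = defaultdict(set)
    for customer_id, product in holds:
        holders[(segments[customer_id], product)].add(customer_id)

    return {key: len(customers) / segment_sizes[key[0]] for key, customers in holders.items()}


# Hypothetical sample data: C1 appears twice with the same credit card product.
segments = {"C1": "Retail", "C2": "Retail", "C3": "Retail", "C4": "Premier"}
holds = [("C1", "Credit Card"), ("C2", "Credit Card"), ("C4", "Mortgage Loan"), ("C1", "Credit Card")]
print(holding_ratios(segments, holds))
```

The denominator is the whole segment, including customers who hold nothing, which is what makes the result a ratio rather than a raw count.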
Data Ingestion and Transformation: Bringing Banking Data into Neo4j
To build a robust knowledge graph, you need to integrate data from your core banking systems (or digital banking platforms such as Backbase) and from transaction logs. This involves extracting the raw data, transforming it into a graph-friendly format, and loading it efficiently into Neo4j.
Getting data ready for a graph database like Neo4j is a two-step process: extraction, then transformation. First, we gather data from the core banking systems, which hold customer profiles and account details; think of them as the central hub of all customer information. There are a few ways to get this data out. We can use APIs, which let us request specific pieces of information; we can query the source database directly with SQL; or we can use data export tools to produce common formats such as CSV or JSON.
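The SQL route can be sketched in a few lines of Python. Here an in-memory SQLite database stands in for the core banking system, and the table and column names (`customers`, `customer_id`, `first_name`, `last_name`, `birth_date`) are hypothetical; a real system would use its own database driver and schema. The query results are written as CSV with the headers the `LOAD CSV` import expects.

```python
import csv
import io
import sqlite3


def export_customers_csv(conn, out):
    """Query customer rows with SQL and write them as CSV; returns the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(["customerId", "firstName", "lastName", "dateOfBirth"])
    # Hypothetical source table and column names; adapt to the real core banking schema.
    cursor = conn.execute(
        "SELECT customer_id, first_name, last_name, birth_date FROM customers ORDER BY customer_id"
    )
    rows = 0
    for row in cursor:
        writer.writerow(row)
        rows += 1
    return rows


# An in-memory SQLite database stands in for the core banking system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, first_name TEXT, last_name TEXT, birth_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [("C123", "Alice", "Smith", "1980-01-01"), ("C124", "Bob", "Jones", "1975-06-30")],
)
buffer = io.StringIO()
count = export_customers_csv(conn, buffer)
print(count)
print(buffer.getvalue())
```

In practice `out` would be a file opened with `newline=""` inside Neo4j's import directory, so that a `file:///customers.csv` URL resolves to it.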
Then there are the transaction logs, which give us a detailed view of every transaction, bill payment, and Apple Pay payment. To get this data, we might parse log files, which is like reading through a long diary of events. We could also consume events from streaming or messaging platforms such as Kafka or RabbitMQ, which handle large volumes of data in real time. And sometimes we might replicate the database to get a copy of the transaction logs.
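Parsing a log file usually means splitting each line into fields, validating them, and setting aside lines that don't fit. The Python sketch below assumes a hypothetical pipe-delimited format (`timestamp|transactionId|accountId|type|amount|merchant`); real log layouts vary, so treat the field order and timestamp format as placeholders. Output keys follow the Transaction properties in the schema above.

```python
from datetime import datetime


def parse_transaction_log(lines):
    """Parse pipe-delimited log lines into transaction dicts, skipping malformed lines.

    Assumed (hypothetical) line format:
        timestamp|transactionId|accountId|type|amount|merchant
    """
    records, skipped = [], 0
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 6:
            skipped += 1  # wrong number of fields
            continue
        timestamp, txn_id, account_id, txn_type, amount, merchant = parts
        try:
            when = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
            value = float(amount)
        except ValueError:
            skipped += 1  # bad timestamp or amount
            continue
        records.append({
            "transactionId": txn_id,
            "accountId": account_id,
            "transactionDate": when.date().isoformat(),
            "transactionAmount": value,
            "transactionType": txn_type.capitalize(),  # "DEBIT" -> "Debit"
            "merchant": merchant,
        })
    return records, skipped


sample_log = [
    "2023-10-27T14:03:11|T101|A456|DEBIT|50.00|Coffee Shop",
    "corrupted line",
    "2023-10-28T09:15:00|T102|A456|CREDIT|abc|Employer",
]
records, skipped = parse_transaction_log(sample_log)
print(records, skipped)
```

Counting skipped lines, rather than silently dropping them, gives a simple data-quality signal to monitor between runs.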
There are a few things to keep in mind here. We'll likely encounter data in various formats, so we need to be prepared to handle those differences. We're also probably dealing with a lot of data, so extraction needs to be efficient, for example by pulling only the records that changed since the last run. And of course, security is paramount, so the data must be transferred and stored safely.
Once we have all the data, the next step is to clean it up and transform it into a format that Neo4j can use. This involves cleaning the data by removing duplicates, fixing missing values, and correcting any inconsistencies. We also need to standardize things like date formats and currency symbols. Then, we need to transform the relational data, which is typically in tables, into a graph structure with nodes and relationships. This means creating unique identifiers for each node, converting data types as needed, and calculating any new fields we might need.
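The cleaning and reshaping steps can be sketched in plain Python. This is a minimal illustration rather than a production pipeline: the input column names (`account_id`, `customer_id`, `account_type`, `balance`, `opened`), the accepted date formats, and the sample rows are all hypothetical. The code deduplicates accounts by a normalized ID, standardizes dates to ISO 8601 and amounts to plain numbers, fills missing values with defaults, and emits Account node properties plus `(customerId, accountId)` pairs for OWNS relationships.

```python
from datetime import datetime

# Assumed source date formats; day-first "27/10/2023" is an assumption about the source.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize_date(value):
    """Return an ISO-8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unparseable dates for manual review


def normalize_amount(value):
    """Strip currency symbols and thousands separators, e.g. "£1,000.50" -> 1000.5."""
    cleaned = "".join(ch for ch in value if ch.isdigit() or ch in ".-")
    return float(cleaned) if cleaned else 0.0


def to_graph(rows):
    """Turn flat account rows into deduplicated Account nodes and OWNS relationships."""
    accounts, owns = {}, set()
    for row in rows:
        account_id = row["account_id"].strip().upper()
        if account_id in accounts:
            continue  # duplicate row for an account we already have
        accounts[account_id] = {
            "accountId": account_id,
            "accountType": (row.get("account_type") or "Unknown").strip().title(),
            "accountBalance": normalize_amount(row.get("balance") or "0"),
            "openingDate": normalize_date(row.get("opened") or ""),
        }
        owns.add((row["customer_id"].strip().upper(), account_id))
    return list(accounts.values()), sorted(owns)


rows = [
    {"account_id": "a456", "customer_id": "c123", "account_type": "current",
     "balance": "£1,000.50", "opened": "27/10/2023"},
    {"account_id": "A456 ", "customer_id": "C123", "account_type": "Current",
     "balance": "1000.50", "opened": "2023-10-27"},
    {"account_id": "A789", "customer_id": "C123", "account_type": None,
     "balance": "", "opened": "n/a"},
]
nodes, relationships = to_graph(rows)
print(nodes)
print(relationships)
```

Normalizing the ID before deduplicating is the key design choice: "a456" and "A456 " would otherwise become two separate Account nodes.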
We can also enrich the data by adding contextual information, like geolocation data or merchant categories. This is a great opportunity to bring in external data that can enhance the graph and make it more insightful.
To do all this, we can use ETL tools like Apache NiFi or Talend, which are designed for moving and transforming data, or write custom transformation scripts in Python with pandas, or in Spark. Essentially, it's about taking raw data and shaping it into something that's ready for analysis in a graph database.
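// Uniqueness constraints keep MERGE fast and prevent duplicate nodes (Neo4j 4.4+/5 syntax)
CREATE CONSTRAINT customer_id IF NOT EXISTS FOR (c:Customer) REQUIRE c.customerId IS UNIQUE;
CREATE CONSTRAINT account_id IF NOT EXISTS FOR (a:Account) REQUIRE a.accountId IS UNIQUE;
CREATE CONSTRAINT transaction_id IF NOT EXISTS FOR (t:Transaction) REQUIRE t.transactionId IS UNIQUE;
// Importing Customer Data (MERGE keeps re-runs idempotent;
// customerSegment is only set if the CSV has that column)
LOAD CSV WITH HEADERS FROM 'file:///customers.csv' AS row
MERGE (c:Customer {customerId: row.customerId})
ON CREATE SET c.firstName = row.firstName, c.lastName = row.lastName, c.dateOfBirth = row.dateOfBirth, c.customerSegment = row.customerSegment;
// Importing Account Data and Creating Relationships
LOAD CSV WITH HEADERS FROM 'file:///accounts.csv' AS row
MERGE (a:Account {accountId: row.accountId})
ON CREATE SET a.accountType = row.accountType, a.accountBalance = toFloat(row.accountBalance)
WITH row, a
MATCH (c:Customer {customerId: row.customerId})
MERGE (c)-[:OWNS]->(a);
// Importing Transactions
LOAD CSV WITH HEADERS FROM 'file:///transactions.csv' AS row
MERGE (t:Transaction {transactionId: row.transactionId})
ON CREATE SET t.transactionDate = row.transactionDate, t.transactionAmount = toFloat(row.transactionAmount), t.transactionType = row.transactionType
WITH row, t
MATCH (a:Account {accountId: row.accountId})
MERGE (a)-[:HAS_TRANSACTION]->(t);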
Analyzing Product Holding Ratios and Customer Behavior with Cypher
Cypher, Neo4j’s query language, is your key tool for navigating the interconnectedness of your banking data. We’ll explore how to use it to calculate product holding ratios, identify behavioral patterns, and discover correlations.
Step 1. Cypher Queries to Calculate Product Holding Ratios Across Customer Segments
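// Calculating Overall Product Holding Ratio (share of all customers holding each product)
MATCH (c:Customer)
WITH count(c) AS totalCustomers
MATCH (c:Customer)-[:HOLDS]->(p:Product)
WITH totalCustomers, p.productName AS product, count(DISTINCT c) AS holders
RETURN product, holders, toFloat(holders) / totalCustomers AS holdingRatio
ORDER BY holdingRatio DESC;
// Calculating Product Holding Ratio by Customer Segment
MATCH (c:Customer)
WITH c.customerSegment AS segment, count(c) AS segmentSize
MATCH (c:Customer {customerSegment: segment})-[:HOLDS]->(p:Product)
WITH segment, segmentSize, p.productName AS product, count(DISTINCT c) AS holders
RETURN segment, product, holders, toFloat(holders) / segmentSize AS holdingRatio
ORDER BY segment, holdingRatio DESC;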
Step 2. Identifying Patterns in Bill Payment and Apple Pay Usage
// Finding Customers with Frequent Bill Payments
MATCH (c:Customer)-[:PERFORMS_BILL_PAYMENT]->(bp:BillPayment)
RETURN c.customerId AS customerId, count(bp) AS paymentCount
ORDER BY paymentCount DESC
LIMIT 10;
// Analyzing Apple Pay Usage by Location
MATCH (:Customer)-[:USES_APPLY_PAY]->(ap:ApplyPayUsage)
RETURN ap.location AS location, count(ap) AS usageCount
ORDER BY usageCount DESC;
// Analyzing Bill Payment Categories
MATCH (:Customer)-[:PERFORMS_BILL_PAYMENT]->(bp:BillPayment)
RETURN bp.category AS category, count(bp) AS paymentCount
ORDER BY paymentCount DESC;
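Counts show how often customers pay bills, but not whether payments are regular. One way to detect a recurring pattern is to look at the gaps between successive payments to the same payee. The Python sketch below assumes you have exported one customer's `(payee, date)` pairs, for example with a Cypher query over PERFORMS_BILL_PAYMENT; the "roughly monthly" thresholds (at least 3 payments, 25 to 35 days apart) are illustrative assumptions, not banking standards.

```python
from collections import defaultdict
from datetime import date


def recurring_payees(payments, min_payments=3, min_gap=25, max_gap=35):
    """Return payees that are paid at roughly monthly intervals.

    payments: iterable of (payee, date) pairs for one customer.
    Thresholds are illustrative assumptions.
    """
    by_payee = defaultdict(list)
    for payee, paid_on in payments:
        by_payee[payee].append(paid_on)

    recurring = []
    for payee, dates in by_payee.items():
        dates.sort()
        # Days between each payment and the next one to the same payee.
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        if len(dates) >= min_payments and all(min_gap <= g <= max_gap for g in gaps):
            recurring.append(payee)
    return sorted(recurring)


payments = [
    ("Electricity Company", date(2023, 8, 27)),
    ("Electricity Company", date(2023, 9, 27)),
    ("Electricity Company", date(2023, 10, 27)),
    ("Internet Provider", date(2023, 8, 1)),
    ("Internet Provider", date(2023, 10, 15)),
    ("Internet Provider", date(2023, 10, 20)),
]
print(recurring_payees(payments))
```

A recurring-payee flag like this can then be stored back on the Customer node or used as a feature alongside the raw counts.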
Step 3. Discovering Correlations Between Product Holdings and Behavioral Data
// Finding Customers with Mortgages and No Apple Pay Usage
MATCH (c:Customer)-[:HOLDS]->(:Product {productType: "Mortgage Loan"})
WHERE NOT (c)-[:USES_APPLY_PAY]->(:ApplyPayUsage)
RETURN DISTINCT c.customerId AS customerId;
// Finding Customers with High Apple Pay Usage and No Credit Card
MATCH (c:Customer)-[:USES_APPLY_PAY]->(ap:ApplyPayUsage)
WITH c, count(ap) AS applyPayCount
WHERE applyPayCount > 10 // Threshold for high usage
AND NOT (c)-[:HOLDS]->(:Product {productType: "Credit Card"})
RETURN c.customerId AS customerId, applyPayCount;
// Finding customers with high bill payment frequency and high current account balance
MATCH (c:Customer)-[:PERFORMS_BILL_PAYMENT]->(bp:BillPayment)
WITH c, count(bp) AS paymentCount
WHERE paymentCount > 5
MATCH (c)-[:OWNS]->(a:Account {accountType: "Current"})
WHERE a.accountBalance > 5000
RETURN DISTINCT c.customerId AS customerId, paymentCount;
This query identifies customers who pay bills frequently (more than five payments) and also hold a current account with a balance above 5,000.
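To turn a pattern like "heavy Apple Pay users tend not to hold a credit card" into a number, one option is the phi coefficient, which is Pearson's correlation applied to two yes/no flags. The Python sketch below assumes you have exported one `(holds_credit_card, high_apple_pay_usage)` pair per customer, for example using thresholds like those in the queries above; the flag names and sample data are illustrative.

```python
import math


def phi_coefficient(pairs):
    """Correlation between two binary flags, one (flag_a, flag_b) pair per customer.

    Returns a value in [-1, 1]; 0 means no linear association.
    """
    n11 = n10 = n01 = n00 = 0
    for a, b in pairs:
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return 0.0  # one flag never varies, so correlation is undefined; report 0
    return (n11 * n00 - n10 * n01) / denominator


# Hypothetical flags: (holds_credit_card, high_apple_pay_usage)
customers = [(True, False), (True, False), (False, True), (False, True), (True, True)]
print(phi_coefficient(customers))
```

A clearly negative value supports targeting heavy Apple Pay users who lack a credit card; a value near zero suggests the two behaviors are unrelated in the data.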
Enhancing Product Recommendations with Graph Algorithms and Machine Learning
Graph algorithms are useful for finding hidden connections in banking data. For example, the PageRank algorithm scores how central each node is, based on how many connections it has and how central its neighbors are. In banking, this can highlight the most influential customers: those with many connections to other people or with heavy transaction activity.
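This code calculates PageRank based on customer account ownership, bill payments, and Apple Pay usage. Because each account, bill payment, and Apple Pay event typically links to a single customer, the score mostly reflects how much activity surrounds each customer; high-scoring customers are treated as influential.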
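// Project customers and their activity nodes into an in-memory graph (GDS 2.x syntax).
// The relationships end at Account, BillPayment and ApplyPayUsage nodes, so those labels are projected too.
// Drop with CALL gds.graph.drop('customerActivity') before re-running.
CALL gds.graph.project(
  'customerActivity',
  ['Customer', 'Account', 'BillPayment', 'ApplyPayUsage'],
  {
    OWNS: {type: 'OWNS', orientation: 'UNDIRECTED'},
    PERFORMS_BILL_PAYMENT: {type: 'PERFORMS_BILL_PAYMENT', orientation: 'UNDIRECTED'},
    USES_APPLY_PAY: {type: 'USES_APPLY_PAY', orientation: 'UNDIRECTED'}
  }
);
// Run PageRank on the projection and keep only Customer nodes
CALL gds.pageRank.stream('customerActivity')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
WHERE node:Customer
RETURN node.customerId AS customerId, score
ORDER BY score DESC
LIMIT 10;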
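To see what PageRank computes on a projection like this, here is a textbook power-iteration version in Python on a tiny, made-up customer-activity graph. It uses the common normalized formulation (scores sum to 1) with the widely used damping factor of 0.85. GDS's implementation differs in details such as score scaling and convergence checks, so this illustrates the idea rather than reproducing GDS output.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration on an undirected graph.

    edges: list of (u, v) pairs; each edge is followed in both directions,
    like orientation 'UNDIRECTED' in a GDS projection.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets the "random jump" share, then rank flows along edges.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            share = damping * rank[node] / len(neighbors[node])
            for neighbor in neighbors[node]:
                new_rank[neighbor] += share
        rank = new_rank
    return rank


# Hypothetical graph: C1 owns two accounts and made two bill payments; C2 owns one account.
edges = [("C1", "A1"), ("C1", "A2"), ("C1", "BP1"), ("C1", "BP2"), ("C2", "A3")]
scores = pagerank(edges)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

C1 ends up with the highest score simply because more activity nodes point to it, which is the effect described for the Cypher projection above.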
Community detection algorithms such as Louvain or Label Propagation group customers by their behavior or product holdings, finding communities within the customer base. The bank can then tailor product recommendations to each community, making offers more relevant.
This code finds customer communities based on product holdings, account ownership, and behavioral data.
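// Project customers with their accounts, products and behavioral nodes (GDS 2.x syntax).
// Drop with CALL gds.graph.drop('customerBehavior') before re-running.
CALL gds.graph.project(
  'customerBehavior',
  ['Customer', 'Account', 'Product', 'BillPayment', 'ApplyPayUsage'],
  {
    OWNS: {type: 'OWNS', orientation: 'UNDIRECTED'},
    HOLDS: {type: 'HOLDS', orientation: 'UNDIRECTED'},
    PERFORMS_BILL_PAYMENT: {type: 'PERFORMS_BILL_PAYMENT', orientation: 'UNDIRECTED'},
    USES_APPLY_PAY: {type: 'USES_APPLY_PAY', orientation: 'UNDIRECTED'}
  }
);
// Run Louvain on the undirected projection and keep only Customer nodes in the output
CALL gds.louvain.stream('customerBehavior')
YIELD nodeId, communityId
WITH gds.util.asNode(nodeId) AS node, communityId
WHERE node:Customer
RETURN node.customerId AS customerId, communityId
ORDER BY communityId;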
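Louvain's modularity optimization is too long to sketch here, so the Python below illustrates the simpler Label Propagation approach also mentioned above. Assumptions: customers are linked when they hold at least one product in common (a hypothetical one-mode projection of HOLDS), nodes are visited in sorted order, and ties go to the smallest label so the output is deterministic. GDS's Label Propagation differs in these details.

```python
from collections import Counter, defaultdict
from itertools import combinations


def customer_graph(holds):
    """Link customers who hold at least one product in common."""
    holders = defaultdict(set)
    for customer, product in holds:
        holders[product].add(customer)
    neighbors = {customer: set() for customer, _ in holds}
    for customers in holders.values():
        for a, b in combinations(sorted(customers), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def label_propagation(neighbors, max_iterations=10):
    """Each customer repeatedly adopts the most common label among its neighbors."""
    labels = {node: node for node in neighbors}  # start: every customer is its own community
    for _ in range(max_iterations):
        changed = False
        for node in sorted(neighbors):
            if not neighbors[node]:
                continue  # isolated customers keep their own label
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            new_label = min(label for label, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged: no label changed in a full pass
    return labels


holds = [
    ("C1", "Credit Card"), ("C1", "Savings"), ("C2", "Credit Card"), ("C2", "Savings"),
    ("C3", "Savings"), ("C4", "Mortgage Loan"), ("C5", "Mortgage Loan"),
]
labels = label_propagation(customer_graph(holds))
print(labels)
```

Each resulting label identifies one community, which is the grouping a bank would attach tailored offers to.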
Product affinity analysis uses similarity and path-finding algorithms to find products that customers tend to hold together, that is, products that naturally pair up.
This code finds pairs of products that are held by overlapping sets of customers.
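// Project customers and products (GDS 2.x syntax). REVERSE orientation makes products the
// source nodes, so Node Similarity compares products by the customers who hold them.
CALL gds.graph.project(
  'productHoldings',
  ['Customer', 'Product'],
  {HOLDS: {type: 'HOLDS', orientation: 'REVERSE'}}
);
CALL gds.nodeSimilarity.stream('productHoldings')
YIELD node1, node2, similarity
// The stream can report a pair in both directions; keep one
WHERE node1 < node2
RETURN gds.util.asNode(node1).productName AS product1,
       gds.util.asNode(node2).productName AS product2,
       similarity
ORDER BY similarity DESC
LIMIT 10;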
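GDS Node Similarity uses the Jaccard metric by default: the number of customers holding both products divided by the number holding either. The plain-Python version below, on made-up data, shows what those scores mean. It compares every product pair, whereas GDS adds options such as `topK` and `similarityCutoff` to limit the output.

```python
from collections import defaultdict
from itertools import combinations


def product_similarity(holds):
    """Jaccard similarity between products, based on the sets of customers who hold them."""
    holders = defaultdict(set)
    for customer, product in holds:
        holders[product].add(customer)
    scores = []
    for p1, p2 in combinations(sorted(holders), 2):
        shared = holders[p1] & holders[p2]
        either = holders[p1] | holders[p2]
        score = len(shared) / len(either)
        if score > 0:  # skip pairs with no customer in common
            scores.append((p1, p2, score))
    return sorted(scores, key=lambda item: item[2], reverse=True)


holds = [
    ("C1", "Credit Card"), ("C1", "Savings"),
    ("C2", "Credit Card"), ("C2", "Savings"),
    ("C3", "Credit Card"), ("C3", "Mortgage Loan"),
]
for product1, product2, similarity in product_similarity(holds):
    print(product1, product2, round(similarity, 3))
```

Credit Card and Savings share two of their three holders, so they score 2/3; Credit Card and Mortgage Loan share one of three, so they score 1/3.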
To create advanced recommendation systems, it's beneficial to combine Neo4j with machine learning tools like scikit-learn or TensorFlow. Cypher queries can extract features directly from the graph, for instance the number of products a customer holds, their average transaction amount, or how often they pay bills. The Neo4j Python driver can then run these queries and pass the results on to machine learning pipelines.
import os

from neo4j import GraphDatabase

uri = os.environ.get("NEO4J_URI", "neo4j://localhost:7687")
user = os.environ.get("NEO4J_USER", "neo4j")
password = os.environ.get("NEO4J_PASSWORD", "your_password")


def get_customer_features(tx, customer_id):
    """
    Retrieves the count of products held by a customer from Neo4j.

    Args:
        tx (neo4j.ManagedTransaction): Neo4j transaction.
        customer_id (str): Customer ID.

    Returns:
        int: Count of products held by the customer (0 if none or not found).
    """
    # Exceptions are deliberately not caught here: letting them propagate allows
    # execute_read to retry transient errors instead of silently returning 0.
    result = tx.run(
        """
        MATCH (c:Customer {customerId: $customer_id})-[:HOLDS]->(p:Product)
        RETURN count(p) AS product_count
        """,
        customer_id=customer_id,
    )
    record = result.single()
    return record["product_count"] if record else 0


# The context managers close the session and the driver even if a query fails.
with GraphDatabase.driver(uri, auth=(user, password)) as driver:
    with driver.session() as session:
        product_count = session.execute_read(get_customer_features, "C123")
        print(f"Customer C123 holds {product_count} products.")
Once we’ve extracted features from the graph data, we can use them to train machine learning recommendation models. Techniques like collaborative filtering, content-based filtering, and matrix factorization are suitable for this purpose. For example, using scikit-learn in Python, we can implement collaborative filtering to build a recommendation engine based on customer similarities.
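Cosine similarity is calculated with the following formula, where A and B are vectors representing two customers' product holdings:

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$$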
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np


def recommend_products(customer_id, customer_ids, customer_product_matrix, product_names, top_k=10):
    """
    Recommends products for a given customer based on cosine similarity between
    customers' product-holding vectors (user-based collaborative filtering).

    Args:
        customer_id (str): The ID of the customer.
        customer_ids (list): A list of all customer IDs, in matrix row order.
        customer_product_matrix (np.array): A matrix where rows are customers,
            columns are products, and values indicate product holdings.
        product_names (list): A list of product names, in matrix column order.
        top_k (int): Number of most similar customers to draw recommendations from.

    Returns:
        list: Names of products the customer does not yet hold, best first.

    Raises:
        ValueError: If customer_id is not in customer_ids.
    """
    if customer_id not in customer_ids:
        raise ValueError(f"Customer ID not found: {customer_id}")
    customer_index = customer_ids.index(customer_id)

    # Similarity of this one customer to every customer (a single row, not the full matrix)
    target_row = customer_product_matrix[customer_index:customer_index + 1]
    similarity_scores = cosine_similarity(target_row, customer_product_matrix)[0]

    # Most similar customers first, explicitly excluding the customer themself
    ordered = similarity_scores.argsort()[::-1]
    similar_customers = [i for i in ordered if i != customer_index][:top_k]

    # Sum the neighbors' holdings, weighted by how similar each neighbor is
    recommended_products = np.zeros(len(product_names))
    for i in similar_customers:
        recommended_products += customer_product_matrix[i] * similarity_scores[i]

    # Skip products the customer already holds and products no similar customer holds
    already_held = customer_product_matrix[customer_index] > 0
    ranked = recommended_products.argsort()[::-1]
    return [product_names[i] for i in ranked if not already_held[i] and recommended_products[i] > 0]


# Example usage (with placeholder data)
customer_ids = ["C123", "C456", "C789", "C101"]
product_names = ["Product A", "Product B", "Product C", "Product D"]
customer_product_matrix = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])
recommendations = recommend_products("C123", customer_ids, customer_product_matrix, product_names)
print(recommendations)
To effectively train recommendation models, Cypher allows us to engineer features that capture the inherent graph structure of our data. We can derive path-based features, such as the length of the shortest path between a customer node and a product node, which quantifies the proximity or relationship strength between them. Furthermore, neighborhood-based features can be extracted by examining the immediate connections of a node; for example, counting the number of other customers who possess the same product as a given customer provides a measure of product popularity within a customer’s community. Aggregation-based features can be constructed by summarizing properties of related nodes, such as calculating the average transaction amount for a customer’s set of accounts, offering a holistic view of the customer’s financial activity.
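The three feature families can be illustrated in plain Python over in-memory edge lists instead of Cypher. The `(customer, product)`, `(customer, account)` and `(account, amount)` tuple layouts and the sample values are hypothetical, and customer IDs and product names are assumed not to collide; in practice each feature would more likely be computed with a Cypher query and exported.

```python
from collections import defaultdict, deque


def build_adjacency(holds):
    """Undirected adjacency over HOLDS edges: customers link to products and back."""
    adjacency = defaultdict(set)
    for customer, product in holds:
        adjacency[customer].add(product)
        adjacency[product].add(customer)
    return adjacency


def shortest_path_length(adjacency, start, goal):
    """Path-based feature: breadth-first hop count between two nodes, or None if unreachable."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def co_holder_count(holds, customer, product):
    """Neighborhood-based feature: other customers who hold the given product."""
    return len({c for c, p in holds if p == product and c != customer})


def average_transaction_amount(owns, transactions, customer):
    """Aggregation-based feature: mean amount over all transactions on the customer's accounts."""
    accounts = {a for c, a in owns if c == customer}
    amounts = [amount for account, amount in transactions if account in accounts]
    return sum(amounts) / len(amounts) if amounts else 0.0


holds = [("C1", "Credit Card"), ("C2", "Credit Card"), ("C2", "Mortgage Loan")]
owns = [("C1", "A1"), ("C1", "A2"), ("C2", "A3")]
transactions = [("A1", 50.0), ("A2", 150.0), ("A3", 10.0)]
adjacency = build_adjacency(holds)
print(shortest_path_length(adjacency, "C1", "Mortgage Loan"))
print(co_holder_count(holds, "C1", "Credit Card"))
print(average_transaction_amount(owns, transactions, "C1"))
```

In this customer-product graph a product the customer does not hold is at least 3 hops away (customer, shared product, other customer, target product), so shorter paths can be read as stronger affinity.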
In practical applications, we can leverage these features to implement collaborative filtering. For example, by identifying products that are frequently held together by customers, we can compute product co-holding similarity. This similarity metric allows us to build a recommendation system that suggests products to customers who possess similar portfolios. Alternatively, we can analyze customer transaction patterns, bill payment behaviors, and Apple Pay usage to establish behavioral similarity. By identifying clusters of customers with similar behaviors, we can recommend products that are popular among these similar customer segments, effectively tailoring recommendations to individual behavioral profiles.
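The co-holding idea can be sketched as item-based collaborative filtering in plain Python: count how many customers hold each pair of products, then score every product a customer does not yet hold by how often it co-occurs with the products they already hold. Raw co-occurrence counts are used here for simplicity, which is an assumption; a normalized measure such as Jaccard would dampen very popular products. The sample data is made up.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_holding_counts(holds):
    """Return each customer's portfolio and, for every ordered product pair, how many customers hold both."""
    portfolios = defaultdict(set)
    for customer, product in holds:
        portfolios[customer].add(product)
    counts = Counter()
    for products in portfolios.values():
        for a, b in permutations(sorted(products), 2):
            counts[(a, b)] += 1
    return portfolios, counts


def recommend_by_co_holding(holds, customer, top_n=3):
    """Suggest products most often held alongside the customer's current products."""
    portfolios, counts = co_holding_counts(holds)
    owned = portfolios.get(customer, set())
    scores = Counter()
    for (a, b), n in counts.items():
        if a in owned and b not in owned:
            scores[b] += n
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:top_n]]


holds = [
    ("C1", "Current"), ("C1", "Credit Card"),
    ("C2", "Current"), ("C2", "Credit Card"), ("C2", "Savings"),
    ("C3", "Current"), ("C3", "Savings"),
    ("C4", "Current"), ("C4", "Mortgage Loan"),
    ("C5", "Current"),
]
print(recommend_by_co_holding(holds, "C5"))
print(recommend_by_co_holding(holds, "C1"))
```

Unlike the customer-similarity approach, this scores products directly from the portfolio, so it still works for a customer like C5 who holds a single, very common product.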
Future Directions
I predict that by 2027, 70% of leading financial institutions will deploy graph database technology to enhance customer experience and drive revenue growth through proactive, personalized services.
I observe a significant trend in the financial services sector towards leveraging graph database technology, such as Neo4j, to overcome the limitations of traditional relational databases in handling complex relationships and contextual data. This shift is driven by the increasing need for banks to deliver hyper-personalized experiences and optimize customer lifetime value.
Key Drivers:
- Enhanced Customer Insights: Graph databases excel at revealing hidden patterns and connections within customer data, enabling banks to gain a holistic understanding of customer behavior and preferences.
- Proactive Recommendation Engines: The ability to identify product affinities and predict customer needs empowers banks to deliver timely and relevant recommendations, driving cross-selling and up-selling opportunities. For instance, analyzing transaction data to identify the correlation between savings account openings and subsequent credit card applications allows for proactive, targeted offers.
- Improved Churn Prediction: By modeling customer relationships and behaviors as a network, graph databases enable banks to identify at-risk customers with greater accuracy. Factors such as decreased transaction frequency, changes in product usage, and negative interactions can be analyzed to predict churn and implement targeted retention strategies.
- AI-Driven Personalization: Graph databases provide the rich contextual data required for advanced AI and machine learning models, enabling the development of highly personalized financial services.
Implications:
- Banks that adopt graph database technology will gain a competitive advantage by delivering superior customer experiences and driving revenue growth.
- Financial institutions will need to invest in skilled data scientists and engineers with expertise in graph database technology and machine learning.
- Data governance and security will become increasingly critical as banks handle sensitive customer data within graph databases.
Recommendation:
Financial institutions should evaluate and implement graph database technology to enhance customer insights, build proactive recommendation engines, and improve churn prediction. A phased approach, starting with targeted use cases and gradually expanding to broader applications, is recommended. Investing in training and development to build internal expertise in graph database technology is also crucial.