The Rdrools package aims to accomplish two main objectives:
Rule engines allow for efficient checking of rules against data for large rule sets [of the order of hundreds or even thousands of rules]. Drools [and other rule engines] implement an enhanced version of the Rete algorithm, which efficiently matches facts [data tuples] against conditions [rules]. This allows for codifying intuition and business context, which can be used to power intelligent systems.
Rdrools brings the efficiencies of large-scale production rule systems to data science users. Rule sets can be used alone, or in conjunction with machine learning models, to develop and operationalize intelligent systems. Rdrools allows rules defined through an R interface to be deployed into a production system. As data comes in [periodically or in real time], a pre-defined set of rules can be checked against the data, and actions can be triggered based on the result.
To provide data scientists with an intuitive interface for executing rules on datasets, the Rdrools package exposes the executeRulesOnDataset function. As input to this function, rules are defined using the typical language of data science, with verbs that correspond to the columns of the rules table: Filters, GroupBy, Column, Function, Operation, and Argument.
For ease of use, the rules can be defined in a csv format and imported into the R session through the usual read functions. The required format follows a familiar structure using the verbs discussed earlier. We take the example of the iris dataset and define rules on it. The sample rules for the iris dataset are defined in the irisRules data object [for the purpose of the example].
data("iris")
data("irisRules")
sampleRules <- irisRules
rownames(sampleRules) <- seq_len(nrow(sampleRules))
sampleRules[is.na(sampleRules)] <- ""
sampleRules
##                   Filters GroupBy       Column Function Operation    Argument
## 1     Species == 'setosa'
## 2                         Species Sepal.Length  average        >=         5.9
## 3                                 Sepal.Length  average         <           5
## 4         Sepal.Width > 3         Sepal.Length  average        >=           5
## 5       Petal.Width > 0.4 Species Petal.Length  average         <           5
## 6                                 Petal.Length  compare        >= Sepal.Width
## 7 Species == 'versicolor'         Petal.Length  compare        >= Sepal.Width
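As a rough sketch of the csv route mentioned earlier, the snippet below writes a small rules file and reads it back with read.csv. The file is a temporary one created only for illustration, and myRules is a hypothetical name; the column headers are assumed to match those of the irisRules table shown above.

```r
# Hypothetical rules file, written to a temporary location for illustration
rulesCsv <- tempfile(fileext = ".csv")
writeLines(c(
  "Filters,GroupBy,Column,Function,Operation,Argument",
  "Species == 'setosa',,,,,",
  ",Species,Sepal.Length,average,>=,5.9",
  "Sepal.Width > 3,,Sepal.Length,average,>=,5"
), rulesCsv)

# Read every column as text so that thresholds such as 5.9 stay as written
myRules <- read.csv(rulesCsv, stringsAsFactors = FALSE, colClasses = "character")

# Mirror the clean-up applied to irisRules: blank cells instead of NA
myRules[is.na(myRules)] <- ""
myRules
```

A data frame prepared this way has the same six columns as sampleRules, so it should be usable in the same manner, although this has not been verified against the package here.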
Through this function, various typical types of rules can be executed with a combination of the verbs described above.
Note: To plot the counts of facts passing or failing the rules, we use a function called ‘plotgraphs’, which is defined inside this vignette.
The first type of rule applies a simple filter based on a condition on a particular column. This is done by specifying the full condition under the Filters column.
In the case of the iris dataset, we filter for a specific Species. To illustrate this case, we apply only rule 1.
filterRule <- sampleRules[1,]
filterRule
Filters GroupBy Column Function Operation Argument
1 Species == 'setosa'
filterRuleOutput <- executeRulesOnDataset(iris, filterRule)
List of 1
$ :List of 20
..$ : chr "import java.util.HashMap"
..$ : chr "import java.lang.Double"
..$ : chr "global java.util.HashMap output"
..$ : chr ""
..$ : chr " dialect \"mvel\""
..$ : chr "rule \"Rule1\""
..$ : chr " salience 0"
..$ : chr " when"
..$ : chr " input: HashMap()"
..$ : chr "result: Double()\n from accumulate($condition:HashMap(),(Double.valueOf($conditio"| __truncated__
..$ : chr "then"
..$ : chr "output.put('SepalLength',input.get('SepalLength'));"
..$ : chr "output.put('SepalWidth',input.get('SepalWidth'));"
..$ : chr "output.put('PetalLength',input.get('PetalLength'));"
..$ : chr "output.put('PetalWidth',input.get('PetalWidth'));"
..$ : chr "output.put('Species',input.get('Species'));"
..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
..$ : chr "output.put(\"Rule1\",result);"
..$ : chr "output.put('Rule1Value',result);"
..$ : chr "end"
str(filterRuleOutput)
List of 1
$ :List of 3
..$ input :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 7 variables:
.. ..$ Filters : chr "Species == 'setosa'"
.. ..$ GroupBy : chr ""
.. ..$ Column : chr ""
.. ..$ Function : chr ""
.. ..$ Operation: chr ""
.. ..$ Argument : chr ""
.. ..$ ruleNum : int 1
..$ intermediateOutput: list()
..$ output :'data.frame': 150 obs. of 3 variables:
.. ..$ Group : int [1:150] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ Indices: int [1:150] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ IsTrue : chr [1:150] "true" "true" "true" "true" ...
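The output has three objects: input, which holds the rule that was executed along with its rule number; intermediateOutput, which is empty for this rule; and output, which lists, for each row of the dataset, whether the rule holds [IsTrue].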
Plotting graphs of the result obtained
The output obtained can be visualized by plotting the distribution of true and false values in the output. Here, true represents the points which satisfy the rule, i.e., Species == 'setosa', and false represents the points which do not.
anomaliesCountGraph <- plotgraphs(result=filterRuleOutput, plotName="Plot of points distribution")
anomaliesCountGraph[[1]][[1]]
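As a sanity check that does not depend on Rdrools, the same condition can be evaluated directly in base R. This is only a cross-check of what rule 1 expresses, not a description of how the package evaluates it; isSetosa is a name introduced here for illustration.

```r
data("iris")

# Logical vector: TRUE where the row satisfies the filter in rule 1
isSetosa <- iris$Species == "setosa"

# Number of rows failing [FALSE] and passing [TRUE] the condition
table(isSetosa)
```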
The second type of rule applies a condition to the aggregated value of a metric for different groups. In the case of the iris dataset, we aggregate the Sepal.Length variable across different Species, and identify the Species which have an average Sepal.Length greater than or equal to a threshold value.
To illustrate this case, we apply only rule 2 from the set of sample rules.
groupedAggregationRule <- sampleRules[2,]
groupedAggregationRule
Filters GroupBy Column Function Operation Argument
2 Species Sepal.Length average >= 5.9
groupedAggregationRuleOutput <- executeRulesOnDataset(iris, groupedAggregationRule)
List of 1
$ :List of 20
..$ : chr "import java.util.HashMap"
..$ : chr "import java.lang.Double"
..$ : chr "global java.util.HashMap output"
..$ : chr ""
..$ : chr " dialect \"mvel\""
..$ : chr "rule \"Rule1\""
..$ : chr " salience 0"
..$ : chr " when"
..$ : chr " input: HashMap()"
..$ : chr "result: Double()\n from accumulate($condition:HashMap(Species == input.get(\"Sp"| __truncated__
..$ : chr "then"
..$ : chr "output.put('SepalLength',input.get('SepalLength'));"
..$ : chr "output.put('SepalWidth',input.get('SepalWidth'));"
..$ : chr "output.put('PetalLength',input.get('PetalLength'));"
..$ : chr "output.put('PetalWidth',input.get('PetalWidth'));"
..$ : chr "output.put('Species',input.get('Species'));"
..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
..$ : chr "output.put(\"Rule1\",result>=5.9);"
..$ : chr "output.put('Rule1Value',result);"
..$ : chr "end"
str(groupedAggregationRuleOutput)
List of 1
$ :List of 3
..$ input :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 7 variables:
.. ..$ Filters : chr ""
.. ..$ GroupBy : chr "Species"
.. ..$ Column : chr "SepalLength"
.. ..$ Function : chr "average"
.. ..$ Operation: chr ">="
.. ..$ Argument : chr "5.9"
.. ..$ ruleNum : int 1
..$ intermediateOutput:Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 3 variables:
.. ..$ Species : chr [1:3] "setosa" "versicolor" "virginica"
.. ..$ Rule1 : chr [1:3] "false" "true" "true"
.. ..$ Rule1Value: num [1:3] 5.01 5.94 6.59
..$ output :Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 3 variables:
.. ..$ Group : chr [1:3] "setosa" "versicolor" "virginica"
.. ..$ Indices: chr [1:3] "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,"| __truncated__ "51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,"| __truncated__ "101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128"| __truncated__
.. ..$ IsTrue : chr [1:3] "false" "true" "true"
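The output has three objects: input, intermediateOutput, and output. Because the rule is grouped, intermediateOutput holds the aggregated value for each Species [Rule1Value] together with whether it satisfies the rule, and output lists each group with the row indices it contains and the result [IsTrue].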
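For grouped rules, the str output above shows Indices stored as comma-separated strings and IsTrue stored as text. A minimal sketch of turning these into usable R values follows; mockOutput is a hand-made, shortened stand-in with the same three columns, not something produced by Rdrools.

```r
data("iris")

# Mock of the `output` element, shortened by hand for illustration
mockOutput <- data.frame(
  Group   = c("setosa", "versicolor"),
  Indices = c("1,2,3", "51,52"),
  IsTrue  = c("false", "true"),
  stringsAsFactors = FALSE
)

# IsTrue is stored as text, so convert it to a logical vector
passed <- mockOutput$IsTrue == "true"

# Split each comma-separated string into integer row numbers
rowIndices <- lapply(strsplit(mockOutput$Indices, ",", fixed = TRUE), as.integer)
names(rowIndices) <- mockOutput$Group

# Rows of iris that belong to the groups satisfying the rule
passingRows <- iris[unlist(rowIndices[passed]), ]
passingRows
```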
Plotting graphs of the result obtained
anomalousSetGraph<-plotgraphs(result=groupedAggregationRuleOutput, plotName="Plot of groups")
anomalousSetGraph[[1]][[1]]
The above graph shows the groups, i.e., the Species, for which the average of Sepal.Length is greater than or equal to 5.9. The Y-axis shows the average Sepal.Length for each Species.
The plot below shows the number of groups which satisfied the rule. As we can see from above, 2 of the 3 groups satisfy the rule, and hence true has a count of 2.
anomaliesCountGraph<-plotgraphs(result=groupedAggregationRuleOutput, plotName="Plot of points distribution")
anomaliesCountGraph[[1]][[1]]
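The per-group values reported in intermediateOutput can be cross-checked with a base R aggregation. The sketch below uses aggregate from the stats package; groupMeans is an illustrative name, and the code only mirrors the logic of rule 2 rather than the package's internals.

```r
data("iris")

# Average Sepal.Length per Species
groupMeans <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# Apply the same comparison as rule 2
groupMeans$IsTrue <- groupMeans$Sepal.Length >= 5.9
groupMeans
```

Rounded to two decimals, the averages agree with the Rule1Value column shown earlier.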
This type of rule allows the data scientist to aggregate an entire column and compare the result with a threshold value. In the case of the iris dataset, we aggregate the Sepal.Length variable across all cases, and check if it is less than a threshold value.
To illustrate this case, we apply only rule 3 from the set of sample rules.
columnAggregationRule <- sampleRules[3,]
columnAggregationRule
Filters GroupBy Column Function Operation Argument
3 Sepal.Length average < 5
columnAggregationRuleOutput <- executeRulesOnDataset(iris, columnAggregationRule)
List of 1
$ :List of 20
..$ : chr "import java.util.HashMap"
..$ : chr "import java.lang.Double"
..$ : chr "global java.util.HashMap output"
..$ : chr ""
..$ : chr " dialect \"mvel\""
..$ : chr "rule \"Rule1\""
..$ : chr " salience 0"
..$ : chr " when"
..$ : chr " input: HashMap()"
..$ : chr "result: Double()\n from accumulate($condition:HashMap(),average(Double.valueOf($c"| __truncated__
..$ : chr "then"
..$ : chr "output.put('SepalLength',input.get('SepalLength'));"
..$ : chr "output.put('SepalWidth',input.get('SepalWidth'));"
..$ : chr "output.put('PetalLength',input.get('PetalLength'));"
..$ : chr "output.put('PetalWidth',input.get('PetalWidth'));"
..$ : chr "output.put('Species',input.get('Species'));"
..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
..$ : chr "output.put(\"Rule1\",result<5);"
..$ : chr "output.put('Rule1Value',result);"
..$ : chr "end"
str(columnAggregationRuleOutput)
List of 1
$ :List of 3
..$ input :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 7 variables:
.. ..$ Filters : chr ""
.. ..$ GroupBy : chr ""
.. ..$ Column : chr "SepalLength"
.. ..$ Function : chr "average"
.. ..$ Operation: chr "<"
.. ..$ Argument : chr "5"
.. ..$ ruleNum : int 1
..$ intermediateOutput: list()
..$ output :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 3 variables:
.. ..$ Group : num 1
.. ..$ Indices: chr "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,"| __truncated__
.. ..$ IsTrue : chr "false"
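The output has three objects: input, intermediateOutput, and output. Since the whole column is aggregated into a single value, output contains one row, and IsTrue is false because the average Sepal.Length is not less than 5.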
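The single value behind this result can be reproduced in base R as a quick check. The variable names are illustrative only.

```r
data("iris")

# Average of the whole Sepal.Length column
overallMean <- mean(iris$Sepal.Length)

# Same comparison as rule 3
ruleHolds <- overallMean < 5

overallMean
ruleHolds
```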
In this case, we apply a filter and then, on the filtered data, aggregate a column and compare it to a threshold value. In the case of the iris dataset, we check whether, for cases with Sepal.Width > 3, the average Sepal.Length is greater than or equal to 5.
To illustrate this case, we apply only rule 4 from the set of sample rules.
filterColAggregationRule <- sampleRules[4,]
filterColAggregationRule
Filters GroupBy Column Function Operation Argument
4 Sepal.Width > 3 Sepal.Length average >= 5
filterColAggregationRuleOutput <- executeRulesOnDataset(iris, filterColAggregationRule)
List of 1
$ :List of 20
..$ : chr "import java.util.HashMap"
..$ : chr "import java.lang.Double"
..$ : chr "global java.util.HashMap output"
..$ : chr ""
..$ : chr " dialect \"mvel\""
..$ : chr "rule \"Rule1\""
..$ : chr " salience 0"
..$ : chr " when"
..$ : chr " input: HashMap()"
..$ : chr "result: Double()\n from accumulate($condition:HashMap(),average(Double.valueOf($c"| __truncated__
..$ : chr "then"
..$ : chr "output.put('SepalLength',input.get('SepalLength'));"
..$ : chr "output.put('SepalWidth',input.get('SepalWidth'));"
..$ : chr "output.put('PetalLength',input.get('PetalLength'));"
..$ : chr "output.put('PetalWidth',input.get('PetalWidth'));"
..$ : chr "output.put('Species',input.get('Species'));"
..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
..$ : chr "output.put(\"Rule1\",result>=5);"
..$ : chr "output.put('Rule1Value',result);"
..$ : chr "end"
str(filterColAggregationRuleOutput)
List of 1
$ :List of 3
..$ input :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 7 variables:
.. ..$ Filters : chr "SepalWidth > 3"
.. ..$ GroupBy : chr ""
.. ..$ Column : chr "SepalLength"
.. ..$ Function : chr "average"
.. ..$ Operation: chr ">="
.. ..$ Argument : chr "5"
.. ..$ ruleNum : int 1
..$ intermediateOutput: list()
..$ output :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 3 variables:
.. ..$ Group : num 1
.. ..$ Indices: chr "1,3,4,5,6,7,8,10,11,12,15,16,17,18,19,20,21,22,23,24,25,27,28,29,30,31,32,33,34,35,36,37,38,40,41,43,44,45,47,4"| __truncated__
.. ..$ IsTrue : chr "true"
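The output has three objects: input, intermediateOutput, and output. Here output contains one row whose Indices are the rows that pass the filter, and IsTrue is true because their average Sepal.Length is at least 5.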
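A base R sketch of the same two steps, filter first and aggregate second, is given below as a cross-check. The names filtered and filteredMean are introduced here for illustration.

```r
data("iris")

# Step 1: keep only the rows that satisfy the filter
filtered <- iris[iris$Sepal.Width > 3, ]

# Step 2: aggregate the column on the filtered rows and compare
filteredMean <- mean(filtered$Sepal.Length)
ruleHolds <- filteredMean >= 5

ruleHolds
```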
We now combine all types of verbs into one rule. In the iris dataset, we check whether, for all cases with Petal.Width greater than a threshold value, each type of Species [which is a group] has an average Petal.Length less than another threshold.
To illustrate this case, we apply only rule 5 from the set of sample rules.
filterGroupByAggrRule <- sampleRules[5,]
filterGroupByAggrRule
Filters GroupBy Column Function Operation Argument
5 Petal.Width > 0.4 Species Petal.Length average < 5
filterGroupByAggrRuleOutput <- executeRulesOnDataset(iris, filterGroupByAggrRule)
List of 1
$ :List of 20
..$ : chr "import java.util.HashMap"
..$ : chr "import java.lang.Double"
..$ : chr "global java.util.HashMap output"
..$ : chr ""
..$ : chr " dialect \"mvel\""
..$ : chr "rule \"Rule1\""
..$ : chr " salience 0"
..$ : chr " when"
..$ : chr " input: HashMap()"
..$ : chr "result: Double()\n from accumulate($condition:HashMap(Species == input.get(\"Sp"| __truncated__
..$ : chr "then"
..$ : chr "output.put('SepalLength',input.get('SepalLength'));"
..$ : chr "output.put('SepalWidth',input.get('SepalWidth'));"
..$ : chr "output.put('PetalLength',input.get('PetalLength'));"
..$ : chr "output.put('PetalWidth',input.get('PetalWidth'));"
..$ : chr "output.put('Species',input.get('Species'));"
..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
..$ : chr "output.put(\"Rule1\",result<5);"
..$ : chr "output.put('Rule1Value',result);"
..$ : chr "end"
str(filterGroupByAggrRuleOutput)
List of 1
$ :List of 3
..$ input :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 7 variables:
.. ..$ Filters : chr "PetalWidth > 0.4"
.. ..$ GroupBy : chr "Species"
.. ..$ Column : chr "PetalLength"
.. ..$ Function : chr "average"
.. ..$ Operation: chr "<"
.. ..$ Argument : chr "5"
.. ..$ ruleNum : int 1
..$ intermediateOutput:Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 3 variables:
.. ..$ Species : chr [1:3] "setosa" "versicolor" "virginica"
.. ..$ Rule1 : chr [1:3] "true" "true" "false"
.. ..$ Rule1Value: num [1:3] 1.65 4.26 5.55
..$ output :Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 3 variables:
.. ..$ Group : chr [1:3] "setosa" "versicolor" "virginica"
.. ..$ Indices: chr [1:3] "24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44" "51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,"| __truncated__ "101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128"| __truncated__
.. ..$ IsTrue : chr [1:3] "true" "true" "false"
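The output has three objects: input, intermediateOutput, and output. As in the grouped case, intermediateOutput holds the average Petal.Length of the filtered rows for each Species [Rule1Value], and output lists each group with its result [IsTrue].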
anomalousSetGraph<-plotgraphs(result=filterGroupByAggrRuleOutput, plotName="Plot of groups")
anomalousSetGraph[[1]][[1]]
The above graph shows the groups, i.e., the Species, for which the average of Petal.Length is less than 5. The Y-axis shows the average Petal.Length for each Species.
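The combination of filter, group, and aggregate in rule 5 can be mirrored in base R with subset and aggregate. This is a cross-check of the rule's logic under the assumption that the filter is applied before grouping; the variable names are illustrative.

```r
data("iris")

# Filter: keep rows with Petal.Width above the threshold
filtered <- subset(iris, Petal.Width > 0.4)

# Group by Species and average Petal.Length within each group
groupMeans <- aggregate(Petal.Length ~ Species, data = filtered, FUN = mean)

# Same comparison as rule 5
groupMeans$IsTrue <- groupMeans$Petal.Length < 5
groupMeans
```

The averages obtained this way agree with the Rule1Value column in the output above.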
Here we compare values of two columns. In the case of the iris dataset, we compare the Petal.Length with Sepal.Width, and identify the rows which have a Petal.Length greater than or equal to Sepal.Width.
To illustrate this case, we apply only rule 6 from the set of sample rules.
compareColumnsRule <- sampleRules[6,]
compareColumnsRule
Filters GroupBy Column Function Operation Argument
6 Petal.Length compare >= Sepal.Width
compareColumnsRuleOutput <- executeRulesOnDataset(iris, compareColumnsRule)
List of 1
$ :List of 20
..$ : chr "import java.util.HashMap"
..$ : chr "import java.lang.Double"
..$ : chr "global java.util.HashMap output"
..$ : chr ""
..$ : chr " dialect \"mvel\""
..$ : chr "rule \"Rule1\""
..$ : chr " salience 0"
..$ : chr " when"
..$ : chr " input: HashMap()"
..$ : chr "result: Double()\n from accumulate($condition:HashMap(),compare(Double.valueOf($c"| __truncated__
..$ : chr "then"
..$ : chr "output.put('SepalLength',input.get('SepalLength'));"
..$ : chr "output.put('SepalWidth',input.get('SepalWidth'));"
..$ : chr "output.put('PetalLength',input.get('PetalLength'));"
..$ : chr "output.put('PetalWidth',input.get('PetalWidth'));"
..$ : chr "output.put('Species',input.get('Species'));"
..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
..$ : chr "output.put(\"Rule1\",result>=SepalWidth);"
..$ : chr "output.put('Rule1Value',result);"
..$ : chr "end"
str(compareColumnsRuleOutput)
List of 1
$ :List of 3
..$ input :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 7 variables:
.. ..$ Filters : chr ""
.. ..$ GroupBy : chr ""
.. ..$ Column : chr "PetalLength"
.. ..$ Function : chr "compare"
.. ..$ Operation: chr ">="
.. ..$ Argument : chr "SepalWidth"
.. ..$ ruleNum : int 1
..$ intermediateOutput: list()
..$ output :'data.frame': 150 obs. of 3 variables:
.. ..$ Group : int [1:150] 51 52 53 54 55 56 57 58 59 60 ...
.. ..$ Indices: int [1:150] 51 52 53 54 55 56 57 58 59 60 ...
.. ..$ IsTrue : chr [1:150] "true" "true" "true" "true" ...
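The output has three objects: input, intermediateOutput, and output. Since the comparison is made row by row, intermediateOutput is empty and output has one entry per row of the dataset.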
anomaliesCountGraph<-plotgraphs(result=compareColumnsRuleOutput, plotName="Plot of points distribution")
anomaliesCountGraph[[1]][[1]]
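The row-wise comparison expressed by rule 6 corresponds to a vectorized comparison in base R. The sketch below is a cross-check only; passes is an illustrative name.

```r
data("iris")

# TRUE for rows where Petal.Length is at least Sepal.Width
passes <- iris$Petal.Length >= iris$Sepal.Width

# Break the result down by Species to see where the rule holds
table(iris$Species, passes)
```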
Here we compare values of two columns after filtering the dataset. In the case of the iris dataset, we first keep only the rows where Species is versicolor, then compare the Petal.Length with Sepal.Width and identify the rows which have a Petal.Length greater than or equal to Sepal.Width.
To illustrate this case, we apply only rule 7 from the set of sample rules.
compareFilterRule <- sampleRules[7,]
compareFilterRule
Filters GroupBy Column Function Operation
7 Species == 'versicolor' Petal.Length compare >=
Argument
7 Sepal.Width
compareFilterRuleOutput <- executeRulesOnDataset(iris, compareFilterRule)
List of 1
$ :List of 20
..$ : chr "import java.util.HashMap"
..$ : chr "import java.lang.Double"
..$ : chr "global java.util.HashMap output"
..$ : chr ""
..$ : chr " dialect \"mvel\""
..$ : chr "rule \"Rule1\""
..$ : chr " salience 0"
..$ : chr " when"
..$ : chr " input: HashMap()"
..$ : chr "result: Double()\n from accumulate($condition:HashMap(),compare(Double.valueOf($c"| __truncated__
..$ : chr "then"
..$ : chr "output.put('SepalLength',input.get('SepalLength'));"
..$ : chr "output.put('SepalWidth',input.get('SepalWidth'));"
..$ : chr "output.put('PetalLength',input.get('PetalLength'));"
..$ : chr "output.put('PetalWidth',input.get('PetalWidth'));"
..$ : chr "output.put('Species',input.get('Species'));"
..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
..$ : chr "output.put(\"Rule1\",result>=SepalWidth);"
..$ : chr "output.put('Rule1Value',result);"
..$ : chr "end"
str(compareFilterRuleOutput)
List of 1
$ :List of 3
..$ input :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 7 variables:
.. ..$ Filters : chr "Species == 'versicolor'"
.. ..$ GroupBy : chr ""
.. ..$ Column : chr "PetalLength"
.. ..$ Function : chr "compare"
.. ..$ Operation: chr ">="
.. ..$ Argument : chr "SepalWidth"
.. ..$ ruleNum : int 1
..$ intermediateOutput: list()
..$ output :'data.frame': 150 obs. of 3 variables:
.. ..$ Group : int [1:150] 51 52 53 54 55 56 57 58 59 60 ...
.. ..$ Indices: int [1:150] 51 52 53 54 55 56 57 58 59 60 ...
.. ..$ IsTrue : chr [1:150] "true" "true" "true" "true" ...
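The output has three objects: input, intermediateOutput, and output. As with the unfiltered comparison, intermediateOutput is empty and output holds the row-wise result [IsTrue].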
anomaliesCountGraph <- plotgraphs(result=compareFilterRuleOutput, plotName="Plot of points distribution")
anomaliesCountGraph[[1]][[1]]
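For comparison, the filtered variant can be sketched in base R by subsetting before comparing. The names below are illustrative, and the code mirrors only the logic of rule 7.

```r
data("iris")

# Filter: keep only the versicolor rows
versicolor <- iris[iris$Species == "versicolor", ]

# Row-wise comparison on the filtered rows
passes <- versicolor$Petal.Length >= versicolor$Sepal.Width
table(passes)
```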
We now consider a more business-specific problem, where such a rule system might be deployed.
Consider the customers of a retail bank, who make transactions against their bank account for different purposes such as shopping, money transfers, etc. In the banking system, there is a huge potential for fraud. Typically, abnormal transaction behavior is a strong indicator of fraud.
We explore how such transactions can be monitored intelligently to detect fraud using Rdrools by applying business rules.
We use the following dataset, which provides transaction data for multiple customers of the retail bank (identified by their Account IDs). Every transaction that a user (account) makes is recorded with the details shown in the structure below:
data("transactionData")
transactionData$Date <- lubridate::ymd(transactionData$Date)
transactionData <- transactionData[1:500,]
'data.frame': 500 obs. of 16 variables:
$ Account_ID : chr "2266 97472609" "2266 97472609" "2266 97472609" "2266 97472609" ...
$ Customer_ID : chr "HS 10003669" "HS 10003669" "HS 10003669" "HS 10003669" ...
$ Month : chr "Trans_Month1" "Trans_Month1" "Trans_Month1" "Trans_Month1" ...
$ Product_Type : chr "Savings Account" "Savings Account" "Savings Account" "Savings Account" ...
$ No_of_transactions : int 5 5 5 5 5 5 5 5 5 5 ...
$ Account_open_date : chr "2016-03-22" "2016-03-22" "2016-03-22" "2016-03-22" ...
$ Transaction_ID : int 81993859 50847914 58383961 31707922 14904755 23169362 26156823 59730045 83134275 8863921 ...
$ Date : Date, format: "2016-10-03" "2016-10-06" ...
$ account_month : chr "226697472609Trans_Month1" "226697472609Trans_Month1" "226697472609Trans_Month1" "226697472609Trans_Month1" ...
$ trans_tender_type : chr "Overseas Transfer" "Domestic Transfer" "Cash Withdrawal" "Bill Payment" ...
$ Credit_Debit_Indicator : chr "Credit" "Credit" "Debit" "Debit" ...
$ Total_Transactions : int 7 7 7 7 7 7 1 7 2 3 ...
$ Transaction_Amount : num 1262 141 700 739 600 ...
$ Balance : num 1262 1403 791 52 -512 ...
$ risk_level : chr "Low" "Low" "Low" "Low" ...
$ Credit_Card_Monthly_Expenditure: num 0 0 0 0 0 0 0 0 0 0 ...
There might be certain cases where we simply want to check the behavior of customers based on a constant benchmark value. These might be cases such as compliance and policy violations, etc.
In our case we check rules like:
data("transactionRules")
rownames(transactionRules) <- seq_len(nrow(transactionRules))
transactionRules[is.na(transactionRules)] <- ""
transactionRules
                                                   Filters
1                  Credit_Card_Monthly_Expenditure > 28000
2
3
4                                      risk_level == 'Low'
5 Date > '2017-05-01' && Credit_Debit_Indicator == 'Debit'
           GroupBy             Column Function Operation Argument
1
2 Account_ID,Month Transaction_Amount      sum        >=    10000
3                  Total_Transactions      max         >        5
4                  Transaction_Amount  average         <     1400
5       Account_ID Transaction_Amount      sum        >=    40000
One example of the rules to mark anomalous transactions from the above list is
\[\textsf{For an account, the total Transaction_Amount } \\ \textsf{should be greater than or equal to USD 40,000}\]
We now take the entire set of rules and execute it on the transaction data as follows:
transactionDataOutput <- executeRulesOnDataset(transactionData, transactionRules)
## List of 1
## $ :List of 31
## ..$ : chr "import java.util.HashMap"
## ..$ : chr "import java.lang.Double"
## ..$ : chr "global java.util.HashMap output"
## ..$ : chr ""
## ..$ : chr " dialect \"mvel\""
## ..$ : chr "rule \"Rule1\""
## ..$ : chr " salience 0"
## ..$ : chr " when"
## ..$ : chr " input: HashMap()"
## ..$ : chr "result: Double()\n from accumulate($condition:HashMap(),(Double.valueOf($conditio"| __truncated__
## ..$ : chr "then"
## ..$ : chr "output.put('AccountID',input.get('AccountID'));"
## ..$ : chr "output.put('CustomerID',input.get('CustomerID'));"
## ..$ : chr "output.put('Month',input.get('Month'));"
## ..$ : chr "output.put('ProductType',input.get('ProductType'));"
## ..$ : chr "output.put('Nooftransactions',input.get('Nooftransactions'));"
## ..$ : chr "output.put('Accountopendate',input.get('Accountopendate'));"
## ..$ : chr "output.put('TransactionID',input.get('TransactionID'));"
## ..$ : chr "output.put('Date',input.get('Date'));"
## ..$ : chr "output.put('accountmonth',input.get('accountmonth'));"
## ..$ : chr "output.put('transtendertype',input.get('transtendertype'));"
## ..$ : chr "output.put('CreditDebitIndicator',input.get('CreditDebitIndicator'));"
## ..$ : chr "output.put('TotalTransactions',input.get('TotalTransactions'));"
## ..$ : chr "output.put('TransactionAmount',input.get('TransactionAmount'));"
## ..$ : chr "output.put('Balance',input.get('Balance'));"
## ..$ : chr "output.put('risklevel',input.get('risklevel'));"
## ..$ : chr "output.put('CreditCardMonthlyExpenditure',input.get('CreditCardMonthlyExpenditure'));"
## ..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
## ..$ : chr "output.put(\"Rule1\",result);"
## ..$ : chr "output.put('Rule1Value',result);"
## ..$ : chr "end"
## List of 1
## $ :List of 31
## ..$ : chr "import java.util.HashMap"
## ..$ : chr "import java.lang.Double"
## ..$ : chr "global java.util.HashMap output"
## ..$ : chr ""
## ..$ : chr " dialect \"mvel\""
## ..$ : chr "rule \"Rule2\""
## ..$ : chr " salience 0"
## ..$ : chr " when"
## ..$ : chr " input: HashMap()"
## ..$ :List of 1
## .. ..$ : chr "result: Double()\n from accumulate($condition:HashMap(AccountID==input.get(\"Acco"| __truncated__
## ..$ : chr "then"
## ..$ : chr "output.put('AccountID',input.get('AccountID'));"
## ..$ : chr "output.put('CustomerID',input.get('CustomerID'));"
## ..$ : chr "output.put('Month',input.get('Month'));"
## ..$ : chr "output.put('ProductType',input.get('ProductType'));"
## ..$ : chr "output.put('Nooftransactions',input.get('Nooftransactions'));"
## ..$ : chr "output.put('Accountopendate',input.get('Accountopendate'));"
## ..$ : chr "output.put('TransactionID',input.get('TransactionID'));"
## ..$ : chr "output.put('Date',input.get('Date'));"
## ..$ : chr "output.put('accountmonth',input.get('accountmonth'));"
## ..$ : chr "output.put('transtendertype',input.get('transtendertype'));"
## ..$ : chr "output.put('CreditDebitIndicator',input.get('CreditDebitIndicator'));"
## ..$ : chr "output.put('TotalTransactions',input.get('TotalTransactions'));"
## ..$ : chr "output.put('TransactionAmount',input.get('TransactionAmount'));"
## ..$ : chr "output.put('Balance',input.get('Balance'));"
## ..$ : chr "output.put('risklevel',input.get('risklevel'));"
## ..$ : chr "output.put('CreditCardMonthlyExpenditure',input.get('CreditCardMonthlyExpenditure'));"
## ..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
## ..$ : chr "output.put(\"Rule2\",result>=10000);"
## ..$ : chr "output.put('Rule2Value',result);"
## ..$ : chr "end"
## List of 1
## $ :List of 31
## ..$ : chr "import java.util.HashMap"
## ..$ : chr "import java.lang.Double"
## ..$ : chr "global java.util.HashMap output"
## ..$ : chr ""
## ..$ : chr " dialect \"mvel\""
## ..$ : chr "rule \"Rule3\""
## ..$ : chr " salience 0"
## ..$ : chr " when"
## ..$ : chr " input: HashMap()"
## ..$ : chr "result: Double()\n from accumulate($condition:HashMap(),max(Double.valueOf($condi"| __truncated__
## ..$ : chr "then"
## ..$ : chr "output.put('AccountID',input.get('AccountID'));"
## ..$ : chr "output.put('CustomerID',input.get('CustomerID'));"
## ..$ : chr "output.put('Month',input.get('Month'));"
## ..$ : chr "output.put('ProductType',input.get('ProductType'));"
## ..$ : chr "output.put('Nooftransactions',input.get('Nooftransactions'));"
## ..$ : chr "output.put('Accountopendate',input.get('Accountopendate'));"
## ..$ : chr "output.put('TransactionID',input.get('TransactionID'));"
## ..$ : chr "output.put('Date',input.get('Date'));"
## ..$ : chr "output.put('accountmonth',input.get('accountmonth'));"
## ..$ : chr "output.put('transtendertype',input.get('transtendertype'));"
## ..$ : chr "output.put('CreditDebitIndicator',input.get('CreditDebitIndicator'));"
## ..$ : chr "output.put('TotalTransactions',input.get('TotalTransactions'));"
## ..$ : chr "output.put('TransactionAmount',input.get('TransactionAmount'));"
## ..$ : chr "output.put('Balance',input.get('Balance'));"
## ..$ : chr "output.put('risklevel',input.get('risklevel'));"
## ..$ : chr "output.put('CreditCardMonthlyExpenditure',input.get('CreditCardMonthlyExpenditure'));"
## ..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
## ..$ : chr "output.put(\"Rule3\",result>5);"
## ..$ : chr "output.put('Rule3Value',result);"
## ..$ : chr "end"
## List of 1
## $ :List of 31
## ..$ : chr "import java.util.HashMap"
## ..$ : chr "import java.lang.Double"
## ..$ : chr "global java.util.HashMap output"
## ..$ : chr ""
## ..$ : chr " dialect \"mvel\""
## ..$ : chr "rule \"Rule4\""
## ..$ : chr " salience 0"
## ..$ : chr " when"
## ..$ : chr " input: HashMap()"
## ..$ : chr "result: Double()\n from accumulate($condition:HashMap(),average(Double.valueOf($c"| __truncated__
## ..$ : chr "then"
## ..$ : chr "output.put('AccountID',input.get('AccountID'));"
## ..$ : chr "output.put('CustomerID',input.get('CustomerID'));"
## ..$ : chr "output.put('Month',input.get('Month'));"
## ..$ : chr "output.put('ProductType',input.get('ProductType'));"
## ..$ : chr "output.put('Nooftransactions',input.get('Nooftransactions'));"
## ..$ : chr "output.put('Accountopendate',input.get('Accountopendate'));"
## ..$ : chr "output.put('TransactionID',input.get('TransactionID'));"
## ..$ : chr "output.put('Date',input.get('Date'));"
## ..$ : chr "output.put('accountmonth',input.get('accountmonth'));"
## ..$ : chr "output.put('transtendertype',input.get('transtendertype'));"
## ..$ : chr "output.put('CreditDebitIndicator',input.get('CreditDebitIndicator'));"
## ..$ : chr "output.put('TotalTransactions',input.get('TotalTransactions'));"
## ..$ : chr "output.put('TransactionAmount',input.get('TransactionAmount'));"
## ..$ : chr "output.put('Balance',input.get('Balance'));"
## ..$ : chr "output.put('risklevel',input.get('risklevel'));"
## ..$ : chr "output.put('CreditCardMonthlyExpenditure',input.get('CreditCardMonthlyExpenditure'));"
## ..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
## ..$ : chr "output.put(\"Rule4\",result<1400);"
## ..$ : chr "output.put('Rule4Value',result);"
## ..$ : chr "end"
## List of 1
## $ :List of 31
## ..$ : chr "import java.util.HashMap"
## ..$ : chr "import java.lang.Double"
## ..$ : chr "global java.util.HashMap output"
## ..$ : chr ""
## ..$ : chr " dialect \"mvel\""
## ..$ : chr "rule \"Rule5\""
## ..$ : chr " salience 0"
## ..$ : chr " when"
## ..$ : chr " input: HashMap()"
## ..$ : chr "result: Double()\n from accumulate($condition:HashMap(AccountID == input.get(\""| __truncated__
## ..$ : chr "then"
## ..$ : chr "output.put('AccountID',input.get('AccountID'));"
## ..$ : chr "output.put('CustomerID',input.get('CustomerID'));"
## ..$ : chr "output.put('Month',input.get('Month'));"
## ..$ : chr "output.put('ProductType',input.get('ProductType'));"
## ..$ : chr "output.put('Nooftransactions',input.get('Nooftransactions'));"
## ..$ : chr "output.put('Accountopendate',input.get('Accountopendate'));"
## ..$ : chr "output.put('TransactionID',input.get('TransactionID'));"
## ..$ : chr "output.put('Date',input.get('Date'));"
## ..$ : chr "output.put('accountmonth',input.get('accountmonth'));"
## ..$ : chr "output.put('transtendertype',input.get('transtendertype'));"
## ..$ : chr "output.put('CreditDebitIndicator',input.get('CreditDebitIndicator'));"
## ..$ : chr "output.put('TotalTransactions',input.get('TotalTransactions'));"
## ..$ : chr "output.put('TransactionAmount',input.get('TransactionAmount'));"
## ..$ : chr "output.put('Balance',input.get('Balance'));"
## ..$ : chr "output.put('risklevel',input.get('risklevel'));"
## ..$ : chr "output.put('CreditCardMonthlyExpenditure',input.get('CreditCardMonthlyExpenditure'));"
## ..$ : chr "output.put('rowNumber',input.get('rowNumber'));"
## ..$ : chr "output.put(\"Rule5\",result>=40000);"
## ..$ : chr "output.put('Rule5Value',result);"
## ..$ : chr "end"
length(transactionDataOutput)
[1] 5
str(transactionDataOutput[[5]]) #Rule 5 output
List of 3
$ input :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 7 variables:
..$ Filters : chr "Date > '2017-05-01' && CreditDebitIndicator == 'Debit'"
..$ GroupBy : chr "AccountID"
..$ Column : chr "TransactionAmount"
..$ Function : chr "sum"
..$ Operation: chr ">="
..$ Argument : chr "40000"
..$ ruleNum : int 5
$ intermediateOutput:Classes 'tbl_df', 'tbl' and 'data.frame': 11 obs. of 3 variables:
..$ AccountID : chr [1:11] "1300 41463086" "3077 81314800" "3256 22875398" "3335 81433260" ...
..$ Rule5 : chr [1:11] "true" "true" "true" "false" ...
..$ Rule5Value: num [1:11] 129946 122968 118012 32400 281931 ...
$ output :Classes 'tbl_df', 'tbl' and 'data.frame': 11 obs. of 3 variables:
..$ Group : chr [1:11] "1300 41463086" "3077 81314800" "3256 22875398" "3335 81433260" ...
..$ Indices: chr [1:11] "78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,1"| __truncated__ "311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338"| __truncated__ "434,435,436" "438,439,440" ...
..$ IsTrue : chr [1:11] "true" "true" "true" "false" ...
Let us take the results obtained for Rule 5 to understand the applications of Rdrools. Rule 5 was
\[\textsf{For a fraudulent/ anomalous account, the sum of Transaction_Amount } \\ \textsf{should be greater than or equal to USD 40,000 for all the debit transactions done after 2017-05-01}\]
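The output has three objects: input, intermediateOutput, and output. Here intermediateOutput holds the summed Transaction_Amount for each Account_ID [Rule5Value], and output marks each Account_ID as true or false.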
The distribution of points, i.e., the Account_IDs that are true or false, is shown in the graph below. In this case, the true values can be regarded as anomalous Account_IDs and the false values as non-anomalous Account_IDs.
anomaliesCountGraph<-plotgraphs(result=transactionDataOutput, plotName="Plot of points distribution")
anomaliesCountGraph[[5]][[5]]
The above graph shows that there are 4 anomalous Account_IDs which satisfy the rule given and 7 Account_IDs that are non-anomalous.
anomalousSetGraph<-plotgraphs(result=transactionDataOutput, plotName="Plot of groups")
anomalousSetGraph[[5]][[5]]
The above graph gives more information about the anomalous Account_IDs. The graph shows the sum of Transaction_Amount for each anomalous Account_ID.
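To make the logic of Rule 5 concrete, the sketch below applies the same filter, grouping, and threshold to a small toy table in base R. The data are invented for illustration and are not taken from transactionData; only the column names and the rule's conditions follow the rules table above.

```r
# Toy data, invented for illustration; not taken from transactionData
toyTransactions <- data.frame(
  Account_ID = c("A", "A", "A", "B", "B", "B"),
  Date = as.Date(c("2017-04-20", "2017-05-10", "2017-06-01",
                   "2017-05-15", "2017-05-20", "2017-06-02")),
  Credit_Debit_Indicator = c("Debit", "Debit", "Debit",
                             "Debit", "Credit", "Debit"),
  Transaction_Amount = c(50000, 25000, 20000, 30000, 90000, 5000),
  stringsAsFactors = FALSE
)

# Filter: debit transactions made after 2017-05-01
filtered <- subset(
  toyTransactions,
  Date > as.Date("2017-05-01") & Credit_Debit_Indicator == "Debit"
)

# Group by account and sum the transaction amounts
accountSums <- aggregate(Transaction_Amount ~ Account_ID,
                         data = filtered, FUN = sum)

# Compare each account's sum with the threshold from Rule 5
accountSums$IsTrue <- accountSums$Transaction_Amount >= 40000
accountSums
```

In this toy table, account A is flagged because its filtered debits sum to 45,000, while account B is not, since its large credit and its earlier transactions are excluded by the filter.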