Is there a data structure / library in the memory OLAP / pivot table in Java / Scala?
Related issues
This question is very relevant, but 2 years old: in memory OLAP engine in Java
background
I want to create a PivotTable like a table in memory from a given table dataset
For example, age by marital status (row is age and column is marital status)
>Input: personnel list, containing age and some Boolean attributes (e.g. married), > required output: number of people, age (row) and married (column)
What did I try (Scala)
case class Person(val age:Int,val isMarried:Boolean) ... val people:List[Person] = ... // val peopleByAge = people.groupBy(_.age) //only by age val peopleByMaritalStatus = people.groupBy(_.isMarried) //only by marital status
I tried to do this in a naive way, first grouped by age, then counted according to marital status, output the results, and then I foldright summarized
TreeMap(peopleByAge.toSeq: _*).map(x => {
val age = x._1
val rows = x._2
val numMarried = rows.count(_.isMarried())
val numNotMarried = rows.length - numMarried
(age,numMarried,numNotMarried)
}).foldRight(List[FinalResult]())(row,list) => {
val cumMarried = row._2+
(if (list.isEmpty) 0 else list.last.cumMarried)
val cumNotMarried = row._3 +
(if (list.isEmpty) 0 else l.last.cumNotMarried)
list :+ new FinalResult(row._1,row._2,row._3,cumMarried,cumNotMarried)
}.reverse
I don't like the above code. It's not efficient and difficult to read. I'm sure there's a better way
problem
How do I groupby "two"? And how to count each subgroup, for example
Another question is, how can I make a total to answer this question:
Edit:
Thank you for all your good answers
For clarity, I want the output to include a "table" with the following columns
>Age (in ascending order) > num married > num unmarried > fully married > total number of unmarried
Answer not only these specific questions, but also make a report that can answer all such questions
Solution
This is a more verbose option, but it is in a generic way rather than using strict data types You can certainly use generics to make this better, but I think you see
/** Creates a new pivot structure by finding correlated values
* and performing an operation on these values
*
* @param accuOp the accumulator function (e.g. sum,max,etc)
* @param xCol the "x" axis column
* @param yCol the "y" axis column
* @param accuCol the column to collect and perform accuOp on
* @return a new Pivot instance that has been transformed with the accuOp function
*/
def doPivot(accuOp: List[String] => String)(xCol: String,yCol: String,accuCol: String) = {
// create list of indexes that correlate to x,y,accuCol
val colsIdx = List(xCol,yCol,accuCol).map(headers.getOrElse(_,1))
// group by x and y,sending the resulting collection of
// accumulated values to the accuOp function for post-processing
val data = body.groupBy(row => {
(row(colsIdx(0)),row(colsIdx(1)))
}).map(g => {
(g._1,accuOp(g._2.map(_(colsIdx(2)))))
}).toMap
// get distinct axis values
val xAxis = data.map(g => {g._1._1}).toList.distinct
val yAxis = data.map(g => {g._1._2}).toList.distinct
// create result matrix
val newRows = yAxis.map(y => {
xAxis.map(x => {
data.getOrElse((x,y),"")
})
})
// collect it with axis labels for results
Pivot(List((yCol + "/" + xCol) +: xAxis) :::
newRows.zip(yAxis).map(x=> {x._2 +: x._1}))
}
My pivot type is very basic:
class Pivot(val rows: List[List[String]]) {
val headers = rows.head.zipWithIndex.toMap
val body = rows.tail
...
}
To test it, you can do this:
val marriedP = Pivot(
List(
List("Name","Age","Married"),List("Bill","42","TRUE"),List("Heloise","47",List("Thelma","34","FALSE"),List("Bridget",List("Robert",List("Eddie","TRUE")
)
)
def accum(values: List[String]) = {
values.map(x => {1}).sum.toString
}
println(marriedP + "\n")
println(marriedP.doPivot(accum)("Age","Married","Married"))
Output:
Name Age Married Bill 42 TRUE Heloise 47 TRUE Thelma 34 FALSE Bridget 47 TRUE Robert 42 FALSE Eddie 42 TRUE Married/Age 47 42 34 TRUE 2 2 FALSE 1 1
The advantage is that you can use any function that passes values, just like in a traditional excel PivotTable
More information can be found here: https://github.com/vinsonizer/pivotfun
