Is there a data structure / library in the memory OLAP / pivot table in Java / Scala?
Related issues
This question is very relevant, but 2 years old: in memory OLAP engine in Java
background
I want to create a PivotTable like a table in memory from a given table dataset
For example, age by marital status (row is age and column is marital status)
>Input: personnel list, containing age and some Boolean attributes (e.g. married), > required output: number of people, age (row) and married (column)
What did I try (Scala)
case class Person(val age:Int,val isMarried:Boolean) ... val people:List[Person] = ... // val peopleByAge = people.groupBy(_.age) //only by age val peopleByMaritalStatus = people.groupBy(_.isMarried) //only by marital status
I tried to do this in a naive way, first grouped by age, then counted according to marital status, output the results, and then I foldright summarized
TreeMap(peopleByAge.toSeq: _*).map(x => { val age = x._1 val rows = x._2 val numMarried = rows.count(_.isMarried()) val numNotMarried = rows.length - numMarried (age,numMarried,numNotMarried) }).foldRight(List[FinalResult]())(row,list) => { val cumMarried = row._2+ (if (list.isEmpty) 0 else list.last.cumMarried) val cumNotMarried = row._3 + (if (list.isEmpty) 0 else l.last.cumNotMarried) list :+ new FinalResult(row._1,row._2,row._3,cumMarried,cumNotMarried) }.reverse
I don't like the above code. It's not efficient and difficult to read. I'm sure there's a better way
problem
How do I groupby "two"? And how to count each subgroup, for example
Another question is, how can I make a total to answer this question:
Edit:
Thank you for all your good answers
For clarity, I want the output to include a "table" with the following columns
>Age (in ascending order) > num married > num unmarried > fully married > total number of unmarried
Answer not only these specific questions, but also make a report that can answer all such questions
Solution
This is a more verbose option, but it is in a generic way rather than using strict data types You can certainly use generics to make this better, but I think you see
/** Creates a new pivot structure by finding correlated values * and performing an operation on these values * * @param accuOp the accumulator function (e.g. sum,max,etc) * @param xCol the "x" axis column * @param yCol the "y" axis column * @param accuCol the column to collect and perform accuOp on * @return a new Pivot instance that has been transformed with the accuOp function */ def doPivot(accuOp: List[String] => String)(xCol: String,yCol: String,accuCol: String) = { // create list of indexes that correlate to x,y,accuCol val colsIdx = List(xCol,yCol,accuCol).map(headers.getOrElse(_,1)) // group by x and y,sending the resulting collection of // accumulated values to the accuOp function for post-processing val data = body.groupBy(row => { (row(colsIdx(0)),row(colsIdx(1))) }).map(g => { (g._1,accuOp(g._2.map(_(colsIdx(2))))) }).toMap // get distinct axis values val xAxis = data.map(g => {g._1._1}).toList.distinct val yAxis = data.map(g => {g._1._2}).toList.distinct // create result matrix val newRows = yAxis.map(y => { xAxis.map(x => { data.getOrElse((x,y),"") }) }) // collect it with axis labels for results Pivot(List((yCol + "/" + xCol) +: xAxis) ::: newRows.zip(yAxis).map(x=> {x._2 +: x._1})) }
My pivot type is very basic:
class Pivot(val rows: List[List[String]]) { val headers = rows.head.zipWithIndex.toMap val body = rows.tail ... }
To test it, you can do this:
val marriedP = Pivot( List( List("Name","Age","Married"),List("Bill","42","TRUE"),List("Heloise","47",List("Thelma","34","FALSE"),List("Bridget",List("Robert",List("Eddie","TRUE") ) ) def accum(values: List[String]) = { values.map(x => {1}).sum.toString } println(marriedP + "\n") println(marriedP.doPivot(accum)("Age","Married","Married"))
Output:
Name Age Married Bill 42 TRUE Heloise 47 TRUE Thelma 34 FALSE Bridget 47 TRUE Robert 42 FALSE Eddie 42 TRUE Married/Age 47 42 34 TRUE 2 2 FALSE 1 1
The advantage is that you can use any function that passes values, just like in a traditional excel PivotTable
More information can be found here: https://github.com/vinsonizer/pivotfun