Is there a data structure / library in the memory OLAP / pivot table in Java / Scala?

Related issues

This question is very relevant, but 2 years old: in memory OLAP engine in Java

background

I want to create a PivotTable like a table in memory from a given table dataset

For example, age by marital status (row is age and column is marital status)

>Input: personnel list, containing age and some Boolean attributes (e.g. married), > required output: number of people, age (row) and married (column)

What did I try (Scala)

case class Person(val age:Int,val isMarried:Boolean)

...
val people:List[Person] = ... //

val peopleByAge = people.groupBy(_.age)  //only by age
val peopleByMaritalStatus = people.groupBy(_.isMarried) //only by marital status

I tried to do this in a naive way, first grouped by age, then counted according to marital status, output the results, and then I foldright summarized

TreeMap(peopleByAge.toSeq: _*).map(x => {
    val age = x._1
    val rows = x._2
    val numMarried = rows.count(_.isMarried())
    val numNotMarried = rows.length - numMarried
    (age,numMarried,numNotMarried)
}).foldRight(List[FinalResult]())(row,list) => {
     val cumMarried = row._2+ 
        (if (list.isEmpty) 0 else list.last.cumMarried) 
     val cumNotMarried = row._3 + 
        (if (list.isEmpty) 0 else l.last.cumNotMarried) 
     list :+ new FinalResult(row._1,row._2,row._3,cumMarried,cumNotMarried) 
}.reverse

I don't like the above code. It's not efficient and difficult to read. I'm sure there's a better way

problem

How do I groupby "two"? And how to count each subgroup, for example

Another question is, how can I make a total to answer this question:

Edit:

Thank you for all your good answers

For clarity, I want the output to include a "table" with the following columns

>Age (in ascending order) > num married > num unmarried > fully married > total number of unmarried

Answer not only these specific questions, but also make a report that can answer all such questions

Solution

This is a more verbose option, but it is in a generic way rather than using strict data types You can certainly use generics to make this better, but I think you see

/** Creates a new pivot structure by finding correlated values 
  * and performing an operation on these values
  *
  * @param accuOp the accumulator function (e.g. sum,max,etc)
  * @param xCol the "x" axis column
  * @param yCol the "y" axis column
  * @param accuCol the column to collect and perform accuOp on
  * @return a new Pivot instance that has been transformed with the accuOp function
  */
def doPivot(accuOp: List[String] => String)(xCol: String,yCol: String,accuCol: String) = {
  // create list of indexes that correlate to x,y,accuCol
  val colsIdx = List(xCol,yCol,accuCol).map(headers.getOrElse(_,1))

  // group by x and y,sending the resulting collection of
  // accumulated values to the accuOp function for post-processing
  val data = body.groupBy(row => {
    (row(colsIdx(0)),row(colsIdx(1)))
  }).map(g => {
    (g._1,accuOp(g._2.map(_(colsIdx(2)))))
  }).toMap

  // get distinct axis values
  val xAxis = data.map(g => {g._1._1}).toList.distinct
  val yAxis = data.map(g => {g._1._2}).toList.distinct

  // create result matrix
  val newRows = yAxis.map(y => {
    xAxis.map(x => {
      data.getOrElse((x,y),"")
    })
  })

 // collect it with axis labels for results
 Pivot(List((yCol + "/" + xCol) +: xAxis) :::
   newRows.zip(yAxis).map(x=> {x._2 +: x._1}))
}

My pivot type is very basic:

class Pivot(val rows: List[List[String]]) {

  val headers = rows.head.zipWithIndex.toMap
  val body    = rows.tail
  ...
}

To test it, you can do this:

val marriedP = Pivot(
  List(
    List("Name","Age","Married"),List("Bill","42","TRUE"),List("Heloise","47",List("Thelma","34","FALSE"),List("Bridget",List("Robert",List("Eddie","TRUE")

  )
)

def accum(values: List[String]) = {
    values.map(x => {1}).sum.toString
}
println(marriedP + "\n")
println(marriedP.doPivot(accum)("Age","Married","Married"))

Output:

Name     Age      Married  
Bill     42       TRUE     
Heloise  47       TRUE     
Thelma   34       FALSE    
Bridget  47       TRUE     
Robert   42       FALSE    
Eddie    42       TRUE     

Married/Age  47           42           34           
TRUE         2            2                         
FALSE                     1            1

The advantage is that you can use any function that passes values, just like in a traditional excel PivotTable

More information can be found here: https://github.com/vinsonizer/pivotfun

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>