Read and write Parquet from and to Protobuf
Using the original Java Parquet library, you can read and write Parquet to and from Protobuf. Parquet4S has custom functions in its API that could be leveraged for that. However, Parquet Protobuf can only be used with Java models, not to mention other issues that make it hard to use, especially in Scala. You would prefer to use ScalaPB in Scala projects, right? Thanks to Parquet4S, you can! Import the ScalaPB extension into any Parquet4S project, whether it is Akka / Pekko, FS2 or plain Scala:
"com.github.mjakubowski84" %% "parquet4s-scalapb" % "2.20.0"
Follow the ScalaPB documentation to generate your Scala model from .proto files.
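For orientation, a typical sbt setup for ScalaPB code generation might look like the sketch below. It mirrors the installation steps in the ScalaPB documentation; the version numbers are assumptions, so check the ScalaPB docs for the ones that match your build.

```scala
// project/plugins.sbt
// sbt-protoc runs the Protobuf compiler as part of the build (version is an assumption)
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
// ScalaPB code generator used by sbt-protoc (version is an assumption)
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.11"

// build.sbt
// Generate Scala case classes from .proto files into the managed sources directory
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)
```

With a setup like this, .proto files placed in src/main/protobuf are compiled to Scala case classes whenever the project is compiled.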
Then, import the Parquet4S type classes tailored for Protobuf. The rest of the code stays the same as in regular Parquet4S, whether you use Akka / Pekko, FS2 or core!
import com.github.mjakubowski84.parquet4s.ScalaPBImplicits._
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}
import scala.util.Using
// stands in for a model class generated by ScalaPB from your .proto files
case class GeneratedProtobufData(someField: Int)
val data: Iterable[GeneratedProtobufData] = ??? // your data
val path: Path = ??? // path to write to / to read from
// write
ParquetWriter.of[GeneratedProtobufData].writeAndClose(path.append("data.parquet"), data)
// read
Using.resource(ParquetReader.as[GeneratedProtobufData].read(path))(_.foreach(println))
Please follow the examples to learn more.