How to View ORC File Contents using the Java ORC Tools Jar
Suppose we want to print the data of an ORC file to validate its contents.
The process is quite simple with Java ORC Tools.
First things first. Let’s check if Java is installed on our machine by running java -version in a terminal.
If Java is not installed, we’ll get output like this (on Windows):
'java' is not recognized as an internal or external command, operable program or batch file.
In this case, we’ll need to download and install Java before continuing.
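The Java check above can also be scripted. The following is a minimal sketch that assumes a POSIX shell (Linux, macOS, WSL, or Git Bash) rather than the Windows command prompt; the helper name have_cmd is made up for illustration.

```shell
#!/bin/sh
# have_cmd NAME: succeeds if NAME resolves to a command in the current shell.
# (have_cmd is a hypothetical helper, not part of Java or ORC Tools.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # Print the installed Java version (java writes this to stderr).
    java -version
else
    echo "java was not found on PATH; install Java before continuing" >&2
fi
```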
Download the JAR
Let’s go to the Maven Central directory of orc-tools jar files: https://repo1.maven.org/maven2/org/apache/orc/orc-tools
Select the latest version available, then download the uber jar (the file whose name ends in -uber.jar).
Alternatively, if we already know the version number (e.g. 1.7.0), we can download the file from the CLI.
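A minimal sketch of such a download, assuming a POSIX shell with curl available. The URL follows the standard Maven repository layout and matches the jar name used later in this post; the function names orc_tools_url and download_orc_tools are invented for illustration.

```shell
#!/bin/sh
# orc_tools_url VERSION: print the Maven Central URL of the uber jar.
# (Hypothetical helper; the URL layout is the usual Maven convention.)
orc_tools_url() {
    version="$1"
    printf '%s\n' "https://repo1.maven.org/maven2/org/apache/orc/orc-tools/${version}/orc-tools-${version}-uber.jar"
}

# download_orc_tools VERSION: fetch the jar into the current directory.
download_orc_tools() {
    # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name.
    curl -fLO "$(orc_tools_url "$1")"
}
```

Running download_orc_tools 1.7.0 should leave orc-tools-1.7.0-uber.jar in the current directory. Passing the same URL to wget should work just as well.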
Use the JAR to view file contents
Suppose we’ve navigated to a directory containing both the jar and the ORC file, which we’ll call orcfile.
We can view the metadata of this file, such as its schema, row count, compression, and stripe statistics.
java -jar orc-tools-1.7.0-uber.jar meta orcfile
We can also view the contents of this file.
java -jar orc-tools-1.7.0-uber.jar data orcfile
We’ll be able to see all the rows represented as JSON documents.
Much better than reading binary.
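For large files, printing every row can be overwhelming. A minimal sketch for previewing just the first few rows, assuming a POSIX shell and assuming the data command writes one JSON document per line; the ORC_JAR variable and the preview_rows function are made up for illustration.

```shell
#!/bin/sh
# Path to the jar; adjust to match the version that was downloaded.
ORC_JAR="${ORC_JAR:-orc-tools-1.7.0-uber.jar}"

# preview_rows FILE [COUNT]: print the first COUNT rows (default 5) of FILE.
# (Hypothetical helper; assumes one JSON document per output line.)
preview_rows() {
    java -jar "$ORC_JAR" data "$1" | head -n "${2:-5}"
}
```

For example, preview_rows orcfile 10 would show at most the first ten rows.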