Monday, 17 September 2012

[bash / find] How to list the 10 largest jar files?

Sometimes you need to list the largest or smallest files of a given type. find is a good tool for such tasks. Let's say I want to find the 10 largest .jar files in my local Maven repository. The repository contains 2498 jars.

gt ~/.m2/repository find . -name "*.jar" | wc -l
2498
This task can be completed in 4 simple steps:

1. Find all jar files:
gt ~/.m2/repository find . -type f -name "*.jar"
./commons-pool/commons-pool/1.6/commons-pool-1.6.jar
./commons-pool/commons-pool/1.5.7/commons-pool-1.5.7.jar
./xalan/xalan/2.7.1/xalan-2.7.1.jar
./xalan/serializer/2.7.1/serializer-2.7.1.jar
./joda-time/joda-time/2.3/joda-time-2.3.jar
./joda-time/joda-time/2.1/joda-time-2.1.jar
./xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar
...
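-type f - restricts find to regular files (directories and symlinks are skipped)
-name "*.jar" - matches files whose names end with .jar (the pattern is quoted so that the shell does not expand it before find sees it)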

2. Print out the size of each file:
gt ~/.m2/repository find . -type f -name "*.jar" -exec du -h {} \;
112K ./commons-pool/commons-pool/1.6/commons-pool-1.6.jar
100K ./commons-pool/commons-pool/1.5.7/commons-pool-1.5.7.jar
3,1M ./xalan/xalan/2.7.1/xalan-2.7.1.jar
272K ./xalan/serializer/2.7.1/serializer-2.7.1.jar
568K ./joda-time/joda-time/2.3/joda-time-2.3.jar
560K ./joda-time/joda-time/2.1/joda-time-2.1.jar
1,2M ./xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar
1,4M ./xerces/xercesImpl/2.11.0/xercesImpl-2.11.0.jar
...
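-exec du -h {} \; - runs du -h on each result ({} is the placeholder for the found path, \; marks the end of the command). Note that du reports disk usage, which can differ slightly from the file's size in bytes.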
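
A possible variation, not part of the original steps: ending -exec with + instead of \; lets find pass many paths to a single du call, which usually means far fewer processes on a repository with thousands of jars. The jar_sizes name below is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: print the disk usage of every .jar file under a directory.
# Ending -exec with '+' batches many paths into each du invocation,
# whereas '\;' starts one du process per file.
jar_sizes() {
    local dir="${1:-.}"
    find "$dir" -type f -name "*.jar" -exec du -h {} +
}
```

Calling jar_sizes ~/.m2/repository should print the same kind of list as the command above.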

3. Sort files by size:
gt ~/.m2/repository find . -type f -name "*.jar" -exec du -h {} \; | sort -hr
46M ./org/glassfish/extras/glassfish-embedded-all/3.0.1/glassfish-embedded-all-3.0.1.jar
25M ./com/vaadin/vaadin-client-compiler-deps/1.0.2/vaadin-client-compiler-deps-1.0.2.jar
22M ./com/censored
21M ./com/liferay/portal/portal-impl/6.1.0/portal-impl-6.1.0.jar
20M ./org/robotframework/robotframework/2.8.3/robotframework-2.8.3.jar
16M ./com/vaadin/vaadin-client/7.1.0/vaadin-client-7.1.0.jar
...
In this step all the results returned by find are piped to sort.
-r - reverses the order, so the largest files come first
-h - compares human-readable sizes such as 112K or 46M (a GNU sort option)
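
Rounded sizes such as 14M cannot distinguish files that are close in size, so an alternative is to sort on exact byte counts instead. The following is a minimal sketch under the assumption that GNU find is available (its -printf action is not part of POSIX); jars_by_bytes is a hypothetical name.

```shell
# Hypothetical helper: list .jar files under a directory, largest first,
# one per line as "<size in bytes><TAB><path>".
# Assumes GNU find, which provides -printf (%s = size in bytes, %p = path).
jars_by_bytes() {
    local dir="${1:-.}"
    find "$dir" -type f -name "*.jar" -printf '%s\t%p\n' | sort -nr
}
```

For example, jars_by_bytes ~/.m2/repository | head -n 10 would show the ten largest jars with their exact sizes. Plain sort -n is enough here because the sizes are ordinary integers.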

4. Show only 10 files:
gt ~/.m2/repository find . -type f -name "*.jar" -exec du -h {} \; | sort -hr | head -n 10
46M ./org/glassfish/extras/glassfish-embedded-all/3.0.1/glassfish-embedded-all-3.0.1.jar
25M ./com/vaadin/vaadin-client-compiler-deps/1.0.2/vaadin-client-compiler-deps-1.0.2.jar
22M ./com/censored
21M ./com/liferay/portal/portal-impl/6.1.0/portal-impl-6.1.0.jar
20M ./org/robotframework/robotframework/2.8.3/robotframework-2.8.3.jar
16M ./com/vaadin/vaadin-client/7.1.0/vaadin-client-7.1.0.jar
16M ./com/censored
14M ./org/scala-lang/scala-compiler/2.11.5/scala-compiler-2.11.5.jar
14M ./org/scala-lang/scala-compiler/2.10.3/scala-compiler-2.10.3.jar
13M ./com/cenqua/clover/clover/3.1.2/clover-3.1.2.jar
I'm pretty sure there's a Linux command that does the same thing, but this short example shows how powerful find is.
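
The four steps can also be wrapped in a small shell function so the pipeline does not have to be retyped. This is only a sketch: the largest_files name and its argument order are my own choices, and it assumes GNU sort for the -h option.

```shell
# Hypothetical helper wrapping the find | du | sort | head pipeline.
# Usage: largest_files [pattern] [count] [dir]
# Defaults: pattern "*.jar", count 10, dir "." (current directory).
largest_files() {
    local pattern="${1:-*.jar}"
    local count="${2:-10}"
    local dir="${3:-.}"
    # du -h prints human-readable sizes; sort -hr orders them largest first
    # (sort -h is a GNU extension); head keeps the first $count lines.
    find "$dir" -type f -name "$pattern" -exec du -h {} + \
        | sort -hr \
        | head -n "$count"
}
```

With this defined, largest_files "*.jar" 10 ~/.m2/repository should reproduce the listing from step 4, and largest_files "*.pom" 5 would show the five largest .pom files below the current directory.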