How To Remove Duplicate Lines While Maintaining Order in Linux
The most elegant solutions.
# Order not preserved (lines sorted)
sort file.txt | uniq

# Display first occurrence
awk '!v[$0]++' file.txt

# Display last occurrence
tac file.txt | awk '!v[$0]++' | tac
Without Preserving Order
If order doesn’t matter, here are two options for removing duplicate lines:
sort file.txt | uniq
sort -u file.txt
uniq only removes adjacent duplicate lines, which is why we sort first.
The -u option makes sort output only unique lines while sorting, so no separate uniq is needed.
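To see why sorting has to come first, it helps to run uniq on the sample lines used below with and without sort. This is a minimal sketch assuming a POSIX shell with printf, sort and uniq on the PATH; file.txt is just the placeholder name used throughout this article.

```shell
#!/bin/sh
# Recreate the sample input: 111, 222, 222, 111 (one value per line).
printf '111\n222\n222\n111\n' > file.txt

# uniq alone only collapses *adjacent* repeats, so the two 111 lines
# (lines 1 and 4) both survive: prints 111, 222, 111.
uniq file.txt

# Sorting first makes every duplicate adjacent: prints 111, 222.
sort file.txt | uniq

# sort -u does the same in a single command: prints 111, 222.
sort -u file.txt
```

Both sort-based commands lose the original order, which is what the rest of this article works around.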
Given the following file.txt:
111
222
222
111
We can print either the first or the last occurrence of each duplicated line:
# First
111
222

# Last
222
111
Print First Occurrence of Duplicates
cat -n file.txt | sort -uk2 | sort -nk1 | cut -f2-
- cat -n prefixes each line with its line number (followed by a TAB), recording the original order.
- sort -uk2 sorts the lines starting from the second field (-k2), which is the original line, and keeps only the first occurrence of each duplicate (-u).
- sort -nk1 restores the original order by sorting on the line numbers in the first field (-k1), treating them as numbers (-n).
- cut -f2- prints everything from the second TAB-separated field onward (-f2-), which is the original line itself.
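The steps above can be run one stage at a time to watch the line numbers carry the original order through the sort. This is a sketch assuming GNU coreutils: cat -n is not required by POSIX, and which copy sort -u keeps is only documented by GNU sort ("the first of an equal run"). The helper name dedupe_keep_first is hypothetical and simply wraps the one-liner.

```shell
#!/bin/sh
printf '111\n222\n222\n111\n' > file.txt

# Stage 1: cat -n prefixes each line with its number and a TAB.
cat -n file.txt

# Stage 2: one line per distinct text; the key runs from field 2 to the
# end of the line. GNU sort -u keeps line 1 (111) and line 2 (222) here.
cat -n file.txt | sort -uk2

# Stages 3 and 4: put the survivors back in line-number order,
# then strip the number column (everything before the first TAB).
dedupe_keep_first() {
    cat -n "$1" | sort -uk2 | sort -nk1 | cut -f2-
}

dedupe_keep_first file.txt   # prints 111, 222
```

Because cut -f2- keeps every field after the first, lines that themselves contain TABs come out unchanged.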
Another way to achieve this is to use awk:
awk '!v[$0]++' file.txt
This command uses an associative array (also called a dictionary or map) named v to record how many times each line has appeared so far in the file.
- The expression !v[$0]++ is evaluated once for every line in the file.
- $0 holds the current line being processed.
- v[$0] is the number of times the current line has been seen so far; an element that has never been set evaluates to 0.
- !v[$0] is true when v[$0] == 0, that is, when the current line has not been seen before. Only then is the line printed: in awk, a pattern with no action defaults to printing the line, so no print statement is needed.
- v[$0]++ then increments the count for the current line by one. Because ++ is a post-increment, the test sees the count from before the increment, so every later copy of the line evaluates to false and is skipped.
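One way to convince yourself of the post-increment behaviour is to print the counter next to each line, then write the default action out explicitly. This is a small sketch assuming any POSIX awk (gawk, mawk and BSD awk behave the same here); file.txt holds the article's sample lines.

```shell
#!/bin/sh
printf '111\n222\n222\n111\n' > file.txt

# Print the count *before* it is incremented, next to each line.
# A 0 marks a first occurrence; those are exactly the lines !v[$0]++ keeps.
# Prints: "0 111", "0 222", "1 222", "1 111".
awk '{ print v[$0]++, $0 }' file.txt

# The same filter with the implicit { print $0 } action written out.
# Prints: 111, 222.
awk '!v[$0]++ { print $0 }' file.txt
```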
Print Last Occurrence of Duplicates
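To print the last occurrence of each duplicated line, we can use tac, which prints a file with its lines in reverse order.
Reversing the file, keeping the first occurrences, and reversing the result again leaves the last occurrences in their original order.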
tac file.txt | cat -n | sort -uk2 | sort -nk1 | cut -f2- | tac
tac file.txt | awk '!v[$0]++' | tac
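Both approaches can be wrapped in small functions and checked against the expected "Last" output. This sketch assumes GNU coreutils (tac is not installed by default on macOS, and the sort-based variant relies on GNU sort -u keeping the first of equal lines); keep_last_awk, keep_last_sort and other.txt are hypothetical names used only for illustration.

```shell
#!/bin/sh
printf '111\n222\n222\n111\n' > file.txt

# Reverse, keep first occurrences, reverse back = keep last occurrences.
keep_last_awk() {
    tac "$1" | awk '!v[$0]++' | tac
}

# Same idea using the cat -n / sort / cut pipeline, without temp files.
keep_last_sort() {
    tac "$1" | cat -n | sort -uk2 | sort -nk1 | cut -f2- | tac
}

keep_last_awk file.txt    # prints 222, 111
keep_last_sort file.txt   # prints 222, 111

# A sample that is not a palindrome: a, b, a, c
printf 'a\nb\na\nc\n' > other.txt
keep_last_awk other.txt   # prints b, a, c
```

The sample file.txt happens to read the same forwards and backwards, so the second input is a more telling check that the reversal really is undone.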
Useful Tricks To Know
# Display only unique or only duplicated lines
sort file.txt | uniq -u  # Unique
sort file.txt | uniq -d  # Duplicate

# Prefix each line with its number of occurrences
sort file.txt | uniq -uc
sort file.txt | uniq -dc

# Skip the first 10 characters when comparing lines
uniq -s 10 file.txt

# Compare no more than the first 10 characters
uniq -w 10 file.txt
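The sample file.txt is not very revealing for these options (every line occurs twice), so here is a sketch on small hypothetical inputs (fruits.txt, log.txt and ids.txt are made up for illustration). It assumes GNU uniq: -u, -d, -c and -s are POSIX, but -w is a GNU extension.

```shell
#!/bin/sh
# Hypothetical sample data, chosen so each option has a visible effect.
printf 'apple\nbanana\nbanana\ncherry\n' > fruits.txt

sort fruits.txt | uniq -u    # lines seen once: apple, cherry
sort fruits.txt | uniq -d    # lines seen more than once: banana
sort fruits.txt | uniq -c    # each line prefixed by its count (1, 2, 1)

# -s 3 ignores a 3-character prefix such as "01 " when comparing.
printf '01 error\n02 error\n03 warn\n' > log.txt
uniq -s 3 log.txt            # prints "01 error", "03 warn"

# -w 2 compares at most the first 2 characters of each line.
printf 'aa1\naa2\nbb1\n' > ids.txt
uniq -w 2 ids.txt            # prints aa1, bb1
```

Like plain uniq, -s and -w only merge adjacent lines, so unsorted input may need a sort first.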