sort, uniq, cut, tr, wc – Linux 技术杂谈

1. sort (sorting)

sort [OPTION]... [FILE]...

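Options:
  -r: reverse the sort order
  -n: compare as numbers
  -t: field separator (default: runs of blanks)
  -k: sort key, given as a field or a field.character position, optionally with an end position (e.g. -k1,1 or -k3.1,3.3)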

# Test file
[root@test ~]# cat test.txt 
e:11
a:2
c:3
p:5
d:1
f:7

# Sort the file above with sort
[root@test ~]# sort test.txt 
a:2
c:3
d:1
e:11
f:7
p:5

# The result is ordered by the letters in the first column.
# Use -t to set the field separator and -k to choose the column to sort on.
[root@test ~]# sort -t ":" -k2 test.txt 
d:1
e:11
a:2
c:3
p:5
f:7

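# The second line of the output has 11 in column 2, which is clearly not what we want.
# By default sort compares keys as strings, character by character: "11" starts with 1, which sorts before 2.
# To compare the column as numbers, add the -n option.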
[root@test ~]# sort -t ":" -k2 -n test.txt 
d:1
a:2
c:3
p:5
f:7
e:11
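
The `-k2` key used above runs from field 2 to the end of the line. A slightly more explicit variant limits the key to field 2 only (`-k2,2`) and attaches the `n` and `r` modifiers directly to it. The following is a minimal sketch on the same sample data; the variable names `data`, `asc`, and `desc` are illustrative only, and `LC_ALL=C` is set just to keep the ordering independent of the locale.

```shell
# Recreate the sample data from the test file above (illustrative variable name).
data='e:11
a:2
c:3
p:5
d:1
f:7'

# -k2,2n: the key starts and ends at field 2 and is compared numerically.
asc=$(printf '%s\n' "$data" | LC_ALL=C sort -t ':' -k2,2n)

# -k2,2nr: same key, numeric, in reverse order (largest value first).
desc=$(printf '%s\n' "$data" | LC_ALL=C sort -t ':' -k2,2nr)

printf '%s\n' "$asc"
echo '---'
printf '%s\n' "$desc"
```

Attaching the modifiers to the key rather than passing global `-n` or `-r` becomes useful once several `-k` options with different orderings are combined.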

2. uniq (removing duplicates)

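If a file contains lines that are exactly identical and we want to drop the duplicates and count how many times each line occurs, combining sort and uniq solves the problem.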

uniq [OPTION]... [INPUT [OUTPUT]]

Options:
  -c: prefix each line with the number of times it occurred

# Test file
[root@test ~]# cat test.txt 
abc
123
abc
123
efg
efg

# Used alone, uniq does not give the result we want: it only collapses duplicate lines that are adjacent, so the input has to be sorted first.
[root@test ~]# cat test.txt | uniq -c
      1 abc
      1 123
      1 abc
      1 123
      2 efg

# Combine uniq with sort.
# Sort first so that duplicate lines end up next to each other, then let uniq collapse the adjacent duplicates and count them.
[root@test ~]# cat test.txt | sort | uniq -c
      2 123
      2 abc
      2 efg
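
When the counts are not needed, a few related forms may be handy: `sort -u` deduplicates while sorting, `uniq -d` prints only the lines that are repeated, and `uniq -u` prints only the lines that are not. The sketch below reuses the sample lines from above and adds one extra line, `xyz`, purely for illustration; the variable names are made up.

```shell
# Sample lines from the test file above, plus one unique line (xyz) for illustration.
lines='abc
123
abc
123
efg
efg
xyz'

# sort -u: sort and drop duplicates in a single step (no counts).
distinct=$(printf '%s\n' "$lines" | LC_ALL=C sort -u)

# uniq -d: print one copy of each line that occurs more than once.
repeated=$(printf '%s\n' "$lines" | LC_ALL=C sort | uniq -d)

# uniq -u: print only the lines that occur exactly once.
singles=$(printf '%s\n' "$lines" | LC_ALL=C sort | uniq -u)

printf 'distinct:\n%s\n' "$distinct"
printf 'repeated:\n%s\n' "$repeated"
printf 'singles:\n%s\n' "$singles"
```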

3. cut (extracting fields)

The cut command extracts selected columns from each line of text.

cut OPTION... [FILE]...

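Options:
  -d: field delimiter
  -f: field numbers to select. For example, "-f3,6" selects fields 3 and 6.
  -c: select by character position (spaces count as characters)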

# Extract each user name and its login shell from /etc/passwd
[root@test ~]# cut -d ":" -f 1,7 /etc/passwd
root:/bin/bash
bin:/sbin/nologin
daemon:/sbin/nologin
adm:/sbin/nologin
......

# Select by character position
[root@test ~]# echo "abcdefg" > test.txt
[root@test ~]# cut -c1,3 test.txt 
ac
[root@test ~]# cut -c1-3 test.txt
abc
[root@test ~]# cut -c1- test.txt    # from the first character to the end of the line
abcdefg
[root@test ~]# cut -c-3 test.txt    # the first 3 characters
abc
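
One behaviour worth keeping in mind with `-d` and `-f`: cut treats every single delimiter as a separator, so two consecutive spaces produce an empty field between them. A common workaround, sketched here on a made-up sample line, is to squeeze repeated delimiters with `tr -s` first. Field lists also accept the same open-ended ranges shown for `-c`. The variable names are illustrative.

```shell
# A made-up line with two spaces between the columns.
line='alice  /bin/bash'

# Every space is a separator, so field 2 is the empty string between the two spaces.
raw=$(printf '%s\n' "$line" | cut -d ' ' -f2)

# Squeezing repeated spaces into one with tr -s gives the expected second column.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f2)

# Field ranges work like character ranges: -f2- means field 2 through the last field.
rest=$(printf 'a:b:c:d\n' | cut -d ':' -f2-)

printf 'raw=[%s] squeezed=[%s] rest=[%s]\n' "$raw" "$squeezed" "$rest"
```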

4. tr (translating characters)

The tr command translates (replaces) or deletes characters read from standard input.

tr [OPTION]... SET1 [SET2]

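Options:
  -c: use the complement of SET1, that is, characters in SET1 are left alone and all other characters are translated
  -d: delete the characters in SET1

Commonly used escape sequences:
  \\: backslash
  \b: backspace
  \f: form feed
  \n: newline
  \r: carriage return
  \t: horizontal tab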

# Test file
[root@test ~]# cat test.txt 
abcd

# Replace a with 1
[root@test ~]# cat test.txt | tr a 1
1bcd

# Convert lowercase to uppercase (quote the ranges so the shell cannot expand them as glob patterns)
[root@test ~]# cat test.txt | tr 'a-z' 'A-Z'
ABCD

# Turn a column into a row (the sed step re-adds the final newline that tr replaced)
[root@test ~]# cat test.txt 
a
b
c
d
[root@test ~]# cat test.txt |tr "\n" " " | sed 's/$/\n/'
a b c d 
[root@test ~]# cat test.txt |tr "\n" " "| sed 's/$/\n/' | tr -d " "
abcd
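
The `-c` and `-d` options listed above are not exercised by the examples so far. The following sketch shows them separately and combined on a made-up input string; the variable names are illustrative.

```shell
# -d deletes every character that appears in SET1.
no_digits=$(printf 'a1b2c3\n' | tr -d '0-9')

# -c complements SET1. Combined with -d, it deletes everything that is NOT
# a digit or a newline, so only the digits remain.
only_digits=$(printf 'a1b2c3\n' | tr -cd '0-9\n')

# With -c and a SET2, every character outside SET1 is replaced by SET2.
masked=$(printf 'a1b2c3\n' | tr -c '0-9\n' '_')

printf '%s %s %s\n' "$no_digits" "$only_digits" "$masked"
```

Including `\n` in SET1 for the `-c` forms keeps the trailing newline from being deleted or replaced along with the other characters.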

5. wc (counting lines, words, and bytes)

wc [OPTION]... [FILE]...

Options:
  -l: print the number of lines
  -c: print the number of bytes
  -w: print the number of words

# Count the lines in /etc/passwd
[root@test ~]# wc -l /etc/passwd    
18 /etc/passwd

# Other ways to count lines
[root@test ~]# grep -n ".*" /etc/passwd  | tail -1 | cut -d ":" -f1
18
[root@test ~]# awk '{print NR}' /etc/passwd | tail -1
18
[root@test ~]# cat -n /etc/passwd | tail -n1 | cut -f1 | tr -d " "
18
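
In scripts it is often more convenient to get the bare number without the file name, which wc omits when it reads from standard input. The sketch below uses a temporary file with made-up content; the variable names are illustrative, and `tr -d ' '` is there only because some wc implementations pad the number with spaces.

```shell
# Temporary sample file with two lines, three words, and 14 bytes.
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

# Reading from stdin makes wc print only the number, without the file name.
n_lines=$(wc -l < "$sample" | tr -d ' ')
n_words=$(wc -w < "$sample" | tr -d ' ')
n_bytes=$(wc -c < "$sample" | tr -d ' ')

# wc -l counts newline characters, so a last line without a trailing
# newline is not included in the count.
no_final_newline=$(printf 'a\nb' | wc -l | tr -d ' ')

rm -f "$sample"
printf 'lines=%s words=%s bytes=%s\n' "$n_lines" "$n_words" "$n_bytes"
```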

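6. Example

Below is an excerpt from an nginx access log. Count the number of requests made by each IP address and sort the result by count.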

[root@test ~]# cat nginx-access.log 
101.132.175.77 - - [22/Feb/2021:11:47:41 +0800] "GET / HTTP/1.1" 302 145 "-" "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D)" "-"
101.132.175.77 - - [22/Feb/2021:11:47:41 +0800] "GET / HTTP/1.1" 200 9409 "-" "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D)" "-"
47.98.103.245 - - [22/Feb/2021:11:59:04 +0800] "GET / HTTP/1.1" 200 9409 "-" "Mozilla/5.0 (Android Mobile Safari) android mobile safari" "-"
3.122.195.44 - - [22/Feb/2021:12:09:24 +0800] "GET /.git/HEAD HTTP/1.1" 302 145 "-" "curl/7.61.1" "-"
3.122.195.44 - - [22/Feb/2021:12:09:25 +0800] "GET /.git/HEAD HTTP/1.1" 404 27413 "-" "curl/7.61.1" "-"
101.132.175.77 - - [22/Feb/2021:11:47:41 +0800] "GET / HTTP/1.1" 302 145 "-" "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D)" "-"
3.122.195.44 - - [22/Feb/2021:12:09:24 +0800] "GET /.git/HEAD HTTP/1.1" 302 145 "-" "curl/7.61.1" "-"
47.98.103.245 - - [22/Feb/2021:11:59:04 +0800] "GET / HTTP/1.1" 200 9409 "-" "Mozilla/5.0 (Android Mobile Safari) android mobile safari" "-"
3.122.195.44 - - [22/Feb/2021:12:09:24 +0800] "GET /.git/HEAD HTTP/1.1" 302 145 "-" "curl/7.61.1" "-"

[root@test ~]# cat nginx-access.log | cut -d " " -f1 | sort | uniq -c | sort -nr
      4 3.122.195.44
      3 101.132.175.77
      2 47.98.103.245
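
The pipeline above can be wrapped in a small shell function so that it is reusable and limited to the top N entries. This is only a sketch: the function name `top_ips`, its arguments, and the sample log lines (which use documentation-range addresses) are assumptions made for illustration and are not part of the original log.

```shell
# top_ips FILE [N]: print the N most frequent first-column values (default 10).
# Hypothetical helper; it assumes the client IP is the first space-separated field.
top_ips() {
    cut -d ' ' -f1 "$1" | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr | head -n "${2:-10}"
}

# Made-up sample log; only the first column matters here.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - "GET / HTTP/1.1" 200
192.0.2.2 - - "GET / HTTP/1.1" 200
192.0.2.1 - - "GET /a HTTP/1.1" 404
192.0.2.3 - - "GET / HTTP/1.1" 200
192.0.2.1 - - "GET /b HTTP/1.1" 200
192.0.2.2 - - "GET /c HTTP/1.1" 200
EOF

# Keep only the two busiest addresses.
top2=$(top_ips "$log" 2)
rm -f "$log"
printf '%s\n' "$top2"
```

Passing the file name to cut directly avoids the extra `cat` process, and `head` keeps the output short on large logs.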
