How to search the contents of multiple PDF files on Linux?

2023-06-20 16:44:02

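The pdfgrep command on Linux searches one or more PDF files for a given character pattern. It works much like grep: it prints the lines of text that contain the pattern we are searching for.

The pattern we search for is usually written as a regular expression.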

Installing pdfgrep

For Ubuntu/Debian

sudo apt-get update -y

sudo apt-get install -y pdfgrep

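For CentOS (the package is typically provided by the EPEL repository, which may need to be enabled first)

sudo yum install -y pdfgrep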
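
After installing, it can help to confirm that the tool is actually on the PATH. The following is a minimal sketch; have_cmd is a hypothetical helper name used only for this example, not part of pdfgrep.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME can be found on the PATH.
# (have_cmd is a hypothetical helper introduced for this sketch.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdfgrep; then
    # Print the installed version.
    pdfgrep --version
else
    echo "pdfgrep not found; install it with your package manager" >&2
fi
```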

Syntax

pdfgrep [options...] pattern [files]

Although many options are available, the most commonly used ones are:

-c : count the number of matches per input file
-h : suppress the file name prefix on output
-i : ignore case when matching
-H : print the file name for each match
-n : prefix each match with the number of the page where it is found
-r : recursively search all files
-R : same as -r, but also follow all symlinks
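
The options above can be combined. The sketch below wraps two calls in a small function: one that lists matches with page numbers and one that prints only the count. pdf_summary and report.pdf are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# pdf_summary PATTERN FILE: show matches with page numbers, then the match count.
# (pdf_summary is a hypothetical helper introduced for this sketch.)
pdf_summary() {
    ps_pattern=$1
    ps_file=$2

    if ! command -v pdfgrep >/dev/null 2>&1; then
        echo "pdfgrep is not installed" >&2
        return 127
    fi
    if [ ! -f "$ps_file" ]; then
        echo "no such file: $ps_file" >&2
        return 2
    fi

    # -i: ignore case, -n: prefix each match with its page number
    pdfgrep -in "$ps_pattern" "$ps_file"
    # -c: print only the number of matches in the file
    pdfgrep -ic "$ps_pattern" "$ps_file"
}

# Example call (report.pdf is a placeholder file name):
# pdf_summary linux report.pdf
```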

Now, let's consider a case where we want to find a particular pattern in all the PDF files in a particular directory, say dir1.

Syntax

pdfgrep -HiR "word" *

In the above command, replace the "word" placeholder with the pattern you want to search for.

For this, we use the command shown below:

pdfgrep -HiR "func main()" *

The above command looks for the string "func main()" in all files in the current directory and its subdirectories.

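Output (illustrative; with -H and without -n, each match is printed as the file name followed by the matching line)

main.pdf:func main() {}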
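
One detail worth keeping in mind: pdfgrep treats the pattern as an extended regular expression by default, and in that syntax unescaped parentheses are grouping operators rather than literal characters. The sketch below uses grep -E on made-up sample text to show how escaping the parentheses restricts the match to the literal "()"; the same escaped pattern can be passed to pdfgrep. Recent pdfgrep versions should also accept -F (--fixed-strings) for literal matching, but treat that as an assumption and check pdfgrep --help on your system.

```shell
#!/bin/sh
# Sample lines standing in for text extracted from a PDF (made up for this sketch).
sample='func main() {}
func mainline() {}'

# Escaped parentheses match only a literal "()" directly after "main".
matches=$(printf '%s\n' "$sample" | grep -E 'func main\(\)')
printf '%s\n' "$matches"
# prints: func main() {}

# The same escaped pattern with pdfgrep:
# pdfgrep -HiR 'func main\(\)' .
```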

If we want to find the pattern only in a single directory and not in its subdirectories, we use the command shown below:

pdfgrep -i "func main()" *

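In the above command we left out the -R flag, so only the files in the directory where the command is run are searched and its subdirectories are not descended into. The -i flag still makes the match case-insensitive.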

Output (illustrative)

main.pdf:func main() {}

Another option is to combine the find command with pdftotext and grep.

Command

find /path -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "func main()"' sh {} \;

Output (illustrative)

/path/main.pdf:func main() {
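
If only the names of the matching files are needed, the same find and pdftotext approach can be wrapped in a function. This is a sketch under a few assumptions: list_matching_pdfs is a hypothetical helper name, pdftotext is installed, and the pattern is matched as a fixed string (grep -F), ignoring case.

```shell
#!/bin/sh
# list_matching_pdfs DIR PATTERN: print every PDF under DIR whose text contains PATTERN.
# (list_matching_pdfs is a hypothetical helper introduced for this sketch.)
list_matching_pdfs() {
    lm_dir=$1
    lm_pattern=$2

    # The pattern is passed as the first positional argument of the inner
    # shell, followed by the batch of PDF paths that find collected.
    find "$lm_dir" -type f -name '*.pdf' -exec sh -c '
        pattern=$1
        shift
        for f in "$@"; do
            # -q: no output, -i: ignore case, -F: treat the pattern literally
            if pdftotext "$f" - 2>/dev/null | grep -qiF -- "$pattern"; then
                printf "%s\n" "$f"
            fi
        done
    ' sh "$lm_pattern" {} +
}

# Example call (/path is a placeholder directory):
# list_matching_pdfs /path "func main()"
```

Passing the file names as positional arguments, instead of embedding {} inside the quoted script, keeps file names with spaces or quotes from being interpreted by the inner shell.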
