Diffstat (limited to 'bin/man1')
-rw-r--r-- bin/man1/dehtml.1 35
1 files changed, 35 insertions, 0 deletions
diff --git a/bin/man1/dehtml.1 b/bin/man1/dehtml.1
new file mode 100644
index 00000000..a0c5a8c4
--- /dev/null
+++ b/bin/man1/dehtml.1
@@ -0,0 +1,35 @@
+.Dd September  7, 2021
+.Dt DEHTML 1
+.Os
+.
+.Sh NAME
+.Nm dehtml
+.Nd extract text from HTML
+.
+.Sh SYNOPSIS
+.Nm
+.Op Fl s
+.Op Ar
+.
+.Sh DESCRIPTION
+The
+.Nm
+utility extracts text
+from HTML documents.
+Text inside
+.Sy <title> ,
+.Sy <style>
+and
+.Sy <script>
+tags is discarded.
+Numeric and common named HTML entities
+are converted.
+.
+.Pp
+The arguments are as follows:
+.Bl -tag -width Ds
+.It Fl s
+Collapse whitespace outside of
+.Sy <pre>
+tags.
+.El