author    June McEnroe <june@causal.agency> 2021-09-07 16:53:43 -0400
committer June McEnroe <june@causal.agency> 2021-09-07 16:53:43 -0400
commit    a64fd47f6196f769f19a205885a8ca5a4a0388c5
tree      e7f362c7ec69a84d0bd431fab2b874cbc8115192 /bin/man1/dehtml.1
parent    Show about path in page title
Add dehtml
Diffstat
-rw-r--r-- bin/man1/dehtml.1 35
1 files changed, 35 insertions, 0 deletions
diff --git a/bin/man1/dehtml.1 b/bin/man1/dehtml.1
new file mode 100644
index 00000000..a0c5a8c4
--- /dev/null
+++ b/bin/man1/dehtml.1
@@ -0,0 +1,35 @@
+.Dd September  7, 2021
+.Dt DEHTML 1
+.Os
+.
+.Sh NAME
+.Nm dehtml
+.Nd extract text from HTML
+.
+.Sh SYNOPSIS
+.Nm
+.Op Fl s
+.Op Ar
+.
+.Sh DESCRIPTION
+The
+.Nm
+utility extracts text
+from HTML documents.
+Text inside
+.Sy <title> ,
+.Sy <style>
+and
+.Sy <script>
+tags is discarded.
+Numeric and common named HTML entities
+are converted.
+.
+.Pp
+The arguments are as follows:
+.Bl -tag -width Ds
+.It Fl s
+Collapse whitespace outside of
+.Sy <pre>
+tags.
+.El
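
The behavior described in the man page can be sketched in Python.
This is not the dehtml source, which this commit view does not show;
it is a minimal illustration built on the standard html.parser module.
The names TextExtractor and extract_text are hypothetical, and the
collapse parameter stands in for the -s flag. Whitespace is collapsed
per text run here, which may differ from what dehtml actually does.

```python
import re
from html.parser import HTMLParser

# Tags whose text content is discarded, per the DESCRIPTION section.
SKIPPED = {"title", "style", "script"}


class TextExtractor(HTMLParser):
    """Hypothetical sketch: collect text, skipping SKIPPED elements."""

    def __init__(self, collapse=False):
        # convert_charrefs=True makes the parser decode numeric and
        # named entities before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.collapse = collapse  # stands in for the -s flag
        self.skip_depth = 0       # > 0 while inside a skipped element
        self.pre_depth = 0        # > 0 while inside <pre>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag == "pre":
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "pre" and self.pre_depth:
            self.pre_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        if self.collapse and not self.pre_depth:
            # Collapse each whitespace run to one space, outside <pre>.
            data = re.sub(r"\s+", " ", data)
        self.parts.append(data)


def extract_text(html, collapse=False):
    parser = TextExtractor(collapse)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling extract_text(doc, collapse=True) corresponds roughly to
running dehtml -s on the same input, under the assumptions above.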