.Dd September  7, 2021
.Dt DEHTML 1
.Os
.
.Sh NAME
.Nm dehtml
.Nd extract text from HTML
.
.Sh SYNOPSIS
.Nm
.Op Fl s
.Op Ar
.
.Sh DESCRIPTION
The
.Nm
utility extracts text
from HTML documents.
Text inside
.Sy <title> ,
.Sy <style>
and
.Sy <script>
tags is discarded.
Numeric and common named HTML entities
are converted.
.
.Pp
The arguments are as follows:
.Bl -tag -width Ds
.It Fl s
Collapse whitespace outside of
.Sy <pre>
tags.
.El
.
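.Sh EXAMPLES
As a sketch,
assuming a hypothetical file named
.Pa index.html ,
extract its text with whitespace collapsed:
.Bd -literal -offset indent
dehtml -s index.html
.Ed
.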
.Sh BUGS
There is no way to extract image alt text.