.Dd September  7, 2021
.Dt DEHTML 1
.Os
.
.Sh NAME
.Nm dehtml
.Nd extract text from HTML
.
.Sh SYNOPSIS
.Nm
.Op Fl s
.Op Ar
.
.Sh DESCRIPTION
The
.Nm
utility extracts text
from HTML documents.
Text inside
.Sy <title> ,
.Sy <style>
and
.Sy <script>
tags is discarded.
Numeric and common named HTML entities
are converted to the characters they represent.
.
.Pp
The arguments are as follows:
.Bl -tag -width Ds
.It Fl s
Collapse whitespace outside of
.Sy <pre>
tags.
.El
.
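.Sh EXAMPLES
A minimal sketch, assuming a hypothetical file named
.Pa index.html :
extract its text and collapse whitespace outside of
.Sy <pre>
tags.
.Bd -literal -offset indent
dehtml -s index.html
.Ed
.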
.Sh BUGS
There is no way to extract image alt text.