Academic year: 2022

(1)

HTML

MPRI 2.26.2: Web Data Management

Antoine Amarilli

Friday, December 7th

(2)

General presentation

• HyperText Markup Language

• Describes a Web page (not the only kind of Web content...)

• Standardized by the W3C (industry + academia) and WHATWG

• Language with tags, giving the structure and content of the document

• For presentation, we will see CSS

• For dynamic behavior, we will see JavaScript

• Main version: HTML5 (also older versions, XHTML, etc.)

• Official W3C standard: HTML5 specification (548 pages, 2014)

• Now also a living standard (WHATWG)

(3)

Table of Contents

• General notions

• Structure

• Text, lists, tables

• Multimedia

• Forms

(4)

Markup principles

• Opening (<em>) and closing (</em>) tags:

Here is an <em>example</em>

• Consecutive spaces are ignored:

Doesn't      matter!

• Self-closing tags, e.g., <br> (line break).

(Or <br/> in XHTML.)

• Attributes:

<p>A <a href="https://example.com/">link</a>.</p>

• Comments:

<!-- not shown -->

(5)

Nesting

• Tags should be opened and closed in the correct order:

<a><b></b></a> and not <a><b></a></b>

• Rules about which tags can go where (in theory at least)

• Obvious tree interpretation (see next class about XML)

<table>
  <tr>
    <th>A1</th>
    <th>B1</th>
  </tr>
  <tr>
    <td>A2</td>
    <td>B2</td>
  </tr>
</table>

table
├── tr
│   ├── th: A1
│   └── th: B1
└── tr
    ├── td: A2
    └── td: B2
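The tree interpretation can be made concrete with a short Python sketch built on the standard `html.parser` module. The class name `TreePrinter` and the helper `outline` are illustrative names, not part of the course material, and the sketch assumes well-nested markup in which every element has an explicit closing tag.

```python
from html.parser import HTMLParser


class TreePrinter(HTMLParser):
    """Collects an indented outline of the element tree (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Entering an element: record it, then go one level deeper.
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        # Leaving an element: come back up one level.
        self.depth -= 1

    def handle_data(self, data):
        # Text nodes are leaves; whitespace-only text is skipped.
        text = data.strip()
        if text:
            self.lines.append("  " * self.depth + repr(text))


def outline(markup):
    parser = TreePrinter()
    parser.feed(markup)
    parser.close()
    return parser.lines


TABLE = "<table><tr><th>A1</th><th>B1</th></tr><tr><td>A2</td><td>B2</td></tr></table>"
print("\n".join(outline(TABLE)))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the markup.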

(6)

General structure

• DOCTYPE to indicate which HTML version is used (here, HTML5)

• Optional lang attribute (language)

<!DOCTYPE html>
<html lang="fr">
  <head>
    <!-- meta-information -->
  </head>
  <body>
    <!-- main document contents -->
  </body>
</html>

(7)

Header

• The <head> tag contains metadata:

<head>
  <meta charset="utf-8">
  <title>Title of the page</title>
  <meta name="description" content="blah">
  <!-- ... -->
</head>

• Indicate the character encoding (e.g., UTF-8)

• Indicate the title

• Other information for search engines, caching

• Also scripts and CSS (see later)

(8)

Tag soup

• HTML was not historically written by programmers, so documents in the wild are often broken

• Web browsers are very resilient

• Validators allow you to check if your markup is correct (e.g., validator.w3.org)
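To give a feel for one thing a validator looks at, here is a minimal Python sketch that only checks whether tags are closed in the right order. It is far weaker than a real validator such as validator.w3.org: the `VOID` set is a partial list chosen for illustration, and the check wrongly rejects valid HTML5 that omits optional closing tags (e.g., `</li>`).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, for illustration).
VOID = {"br", "img", "meta", "input", "hr", "link"}


class NestingChecker(HTMLParser):
    """Tracks open elements on a stack and records mismatched closing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # e.g. <br/> is reported as a start tag then an end tag
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def is_well_nested(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Well nested: no mismatched closing tag and nothing left open.
    return not checker.errors and not checker.stack


print(is_well_nested("<a><b></b></a>"))
print(is_well_nested("<a><b></a></b>"))
```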


(10)

Text

• Text should go in paragraphs delimited by <p> ... </p>

• Line break with <br>

• To stress some content, use:

<em> ... </em>
<strong> ... </strong>
<mark> ... </mark>

(11)

Titles and structure

• Titles: from <h1> (main title) to <h6> (lowest title)

• Other tags to indicate the page structure (semantically):

<header>
<footer>
<nav>: navigation element
<article>: e.g., on a blog
<main>: main page content
<dialog>: dialog box

(12)

Lists

• A
• B
• C

<ul>
  <li>A</li>
  <li>B</li>
  <li>C</li>
</ul>

1. A
2. B
3. C

<ol>
  <li>A</li>
  <li>B</li>
  <li>C</li>
</ol>

X
    A
Y
    B
Z
    C

<dl>
  <dt>X</dt> <dd>A</dd>
  <dt>Y</dt> <dd>B</dd>
  <dt>Z</dt> <dd>C</dd>
</dl>

(13)

Tables

<table>
  <tr>
    <th>X1</th>
    <th>X2</th>
  </tr>
  <tr>
    <td>A1</td>
    <td>A2</td>
  </tr>
  <tr>
    <td>B1</td>
    <td>B2</td>
  </tr>
</table>

X1    X2
A1    A2
B1    B2

(14)

Hypertext links (hence HTTP, HTML)

<a href="https://www.telecom-paristech.fr/">Telecom</a>

The URL can be absolute (as above) or relative.

On the page https://example.com/foo/bar.html:

Link             Resolution
/quux            https://example.com/quux
..               https://example.com/
bar2.html        https://example.com/foo/bar2.html
baz/toto.html    https://example.com/foo/baz/toto.html
#top             https://example.com/foo/bar.html#top

• The fragment #top will make the browser scroll to the element
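The resolution rules for relative links can be reproduced in Python with `urllib.parse.urljoin` from the standard library. This is only a way to experiment with the rules: browsers use their own URL parser, which may differ in corner cases.

```python
from urllib.parse import urljoin

# The page on which the links appear.
BASE = "https://example.com/foo/bar.html"

for link in ["/quux", "..", "bar2.html", "baz/toto.html", "#top"]:
    # urljoin resolves a (possibly relative) reference against the base URL.
    print(f"{link:15} {urljoin(BASE, link)}")
```

An absolute URL passed as the second argument is returned unchanged, which matches the behavior of an absolute href.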

(15)

iFrames

• Display another page in the current page:

<iframe src="https://en.wikipedia.org/">
  <p>Sorry, your browser does not support iFrames.</p>
</iframe>

• Not very flexible and confusing for the user, plus security issues


(17)

Images

<img src="logo_telecom.png" alt="Logo Telecom">

• src gives the URL of the image (absolute or relative)

• alt indicates a text to replace the image (required):

  - if the image is missing
  - for vision-impaired users
  - for text browsers, robots, mobile browsers with images disabled, etc.

• To get the image, the browser will make a new query, possibly to a completely different server

• Main image formats: JPEG, PNG, GIF, SVG, WebP
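As an illustration of the extra queries triggered by images, the following Python sketch lists the absolute URL that would be fetched for each `<img>` of a page, together with its alt text. The names `ImageCollector` and `image_requests`, and the host `cdn.example.org`, are hypothetical and chosen for the example only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Records (absolute URL, alt text or None) for every <img> with a src."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            src = attributes.get("src")
            if src:
                # Relative src values are resolved against the page URL.
                url = urljoin(self.base_url, src)
                self.images.append((url, attributes.get("alt")))


def image_requests(page_url, markup):
    collector = ImageCollector(page_url)
    collector.feed(markup)
    collector.close()
    return collector.images


PAGE = "https://example.com/foo/bar.html"
MARKUP = (
    '<img src="logo_telecom.png" alt="Logo Telecom">'
    '<img src="https://cdn.example.org/pic.webp">'
)
for url, alt in image_requests(PAGE, MARKUP):
    print(url, "| alt:", alt)
```

The second image has no alt attribute, so the sketch reports `None` for it, which is the kind of omission a validator would flag.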

(18)

Sound and video

<audio src="audio.ogg">
  <p>No audio support, you can
  <a href="audio.ogg">download audio.ogg</a>.</p>
</audio>

• Can be autoplayed, show controls, etc.

• Video is similar, with <video>

• Main sound formats: MP3, Opus

• Main video formats: H.264, WebM

• Videos require large bandwidth!

Host them on a platform like YouTube


(20)

Basics

• General structure of a form:

<form action="action.php" method="get">
  <!-- form contents go here -->
  <input type="submit">
</form>

<input>    Control (here, a button to submit the form)
action     URL to which we should submit
method     The HTTP method to use when submitting
enctype    The encoding for the form data

→ When submitting the form, the browser will query the action

(21)

Back to HTTP

• GET: the data is given in the path:

GET action.php?first=Jean&last=Dupont HTTP/1.1
Host: example.com

• POST and application/x-www-form-urlencoded (default): ditto but the data is in the request body:

POST action.php HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded

first=Jean&last=Dupont
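The two requests above can be assembled in Python with `urllib.parse.urlencode`, which produces the `application/x-www-form-urlencoded` string. This is a sketch rather than what a browser literally does: the leading `/` in the request target assumes that action.php sits at the root of the site, and the `Content-Length` header is added because a real POST request needs to announce the size of its body.

```python
from urllib.parse import urlencode

fields = {"first": "Jean", "last": "Dupont"}

# Names and values are percent-encoded and joined with "&".
payload = urlencode(fields)

# GET: the encoded data goes after "?" in the request target.
get_request = (
    f"GET /action.php?{payload} HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "\r\n"
)

# POST: the same encoded data goes in the request body.
post_request = (
    "POST /action.php HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(payload.encode('ascii'))}\r\n"
    "\r\n"
    f"{payload}"
)

print(get_request)
print(post_request)
```

Special characters are escaped by the encoding: for instance a space becomes `+` and `&` becomes `%26`, so that values cannot be confused with field separators.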

(22)

Back to HTTP (cont’d)

• POST and multipart/form-data: less concise but more efficient when the data is large:

POST action.php HTTP/1.1
Host: example.com
Content-Type: multipart/form-data; boundary=--e06

----e06
Content-Disposition: form-data; name="first"

Jean
----e06
Content-Disposition: form-data; name="last"
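The layout of a multipart/form-data body can be sketched in Python as follows. The function `multipart_body` is a hypothetical helper written for this illustration: it handles plain text fields only, does no escaping of names, and does not check that the boundary is absent from the values. Each part starts with `--` followed by the boundary, and the body ends with the boundary followed by an extra `--`.

```python
def multipart_body(fields, boundary):
    """Encode text fields as a multipart/form-data body (illustrative only)."""
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")  # delimiter opening a new part
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line between part headers and part content
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    return "\r\n".join(lines)


# Same boundary as in the request above, hence the four dashes in the output.
body = multipart_body({"first": "Jean", "last": "Dupont"}, "--e06")
print(body)
```

The per-part overhead is why this encoding is less concise for small forms, while raw file contents can be sent as-is without percent-encoding.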

(23)

Text fields

<form action="action.php" method="get">
  <label for="firstname">First name</label>
  <input name="first" id="firstname" type="text"><br>
  <label for="lastname">Last name</label>
  <input name="last" id="lastname" type="text">
  <input type="submit">
</form>

• The <label> element describes the field; its for attribute points to the id attribute of the field

• The name attribute indicates the name in the HTTP query:

POST action.php HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
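To show the role of the name attribute, here is a Python sketch that reads the form above, collects the name of each `<input>`, and builds the submitted data string from what a user typed. The names `FieldNames` and `form_data` are illustrative. The sketch ignores everything a browser also handles (other controls, default values, disabled fields); it only shows that the keys come from name, not from id, and that the submit button, which has no name, contributes nothing.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

FORM = """
<form action="action.php" method="get">
  <label for="firstname">First name</label>
  <input name="first" id="firstname" type="text"><br>
  <label for="lastname">Last name</label>
  <input name="last" id="lastname" type="text">
  <input type="submit">
</form>
"""


class FieldNames(HTMLParser):
    """Collects the name attribute of each <input> that has one."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("name")
        if tag == "input" and name:
            self.names.append(name)


def form_data(markup, typed):
    """Build the urlencoded data from the typed values, keyed by name."""
    parser = FieldNames()
    parser.feed(markup)
    parser.close()
    return urlencode([(name, typed.get(name, "")) for name in parser.names])


print(form_data(FORM, {"first": "Jean", "last": "Dupont"}))
```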

(24)

Some attributes of a text field

placeholder    Help text indicated when the field is empty
value          Indicate the default value
required       Must be filled in to submit the form
type           Several kinds:

  → hidden (not visible)
  → password (stars)
  → email
  → range (with min and max), tel, number, color...

pattern        Check with a regular expression (also in JS)

→ In all cases, client constraints (pattern, required, hidden, etc.)
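Constraints such as pattern and required are applied by the browser, and a client is free to send any request it likes, so a server would typically repeat the checks on its side. The Python sketch below illustrates the idea under stated assumptions: the `RULES` table, the field names, and the regular expressions are all hypothetical, and `re.fullmatch` is used because the pattern attribute must match the whole value.

```python
import re

# Hypothetical server-side rules mirroring the client-side attributes.
RULES = {
    "first": {"required": True, "pattern": r"[A-Za-z' -]+"},
    "last": {"required": True, "pattern": r"[A-Za-z' -]+"},
    "email": {"required": False, "pattern": r"[^@\s]+@[^@\s]+"},
}


def validate(fields):
    """Return a dict mapping each invalid field name to an error message."""
    errors = {}
    for name, rule in RULES.items():
        value = fields.get(name, "")
        if not value:
            # Empty or absent: only an error if the field is required.
            if rule["required"]:
                errors[name] = "missing"
            continue
        if not re.fullmatch(rule["pattern"], value):
            errors[name] = "bad format"
    return errors


print(validate({"first": "Jean", "last": "Dupont"}))
print(validate({"first": "J3an"}))
```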

(25)

Uploading files

• Attaching a file to the form:

<label for="thefile">File to submit</label>
<input type="file" name="thefile" id="thefile">

• Must use post and multipart/form-data!

• The maximal size should be imposed on the server

• The expected MIME type can be specified but should be revalidated
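The last two points can be sketched in Python as a server-side check that enforces a size limit and compares the declared MIME type with the first bytes of the file. The limit `MAX_SIZE`, the function `check_upload`, and the choice of accepted types are assumptions made for the example; checking leading bytes is only a basic sanity test, not a full validation of the file.

```python
MAX_SIZE = 2 * 1024 * 1024  # hypothetical limit of 2 MiB

# Leading bytes ("magic numbers") of a few image formats.
SIGNATURES = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
    "image/gif": b"GIF8",
}


def check_upload(data, declared_type):
    """Return "ok" or a short reason for rejecting the uploaded bytes."""
    if len(data) > MAX_SIZE:
        return "too large"
    signature = SIGNATURES.get(declared_type)
    if signature is None:
        return "type not accepted"
    # The declared type comes from the client, so compare it to the content.
    if not data.startswith(signature):
        return "content does not match declared type"
    return "ok"


print(check_upload(b"\x89PNG\r\n\x1a\n" + b"\x00" * 10, "image/png"))
print(check_upload(b"GIF89a", "image/png"))
```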

(26)

Other controls

• <input type="submit">: Button to submit the form

• <textarea>: Multiline text area

• Radio buttons (type="radio"), checkboxes (type="checkbox"), dropdown lists (<select>)...

(27)

Shorthand

Various shorthands to avoid writing HTML documents, e.g., Markdown

# My title

This is a *document*
with `monospace`
and here is a list:

* One
* Two

and here is a link:
[A](https://a3nm.net/)

<h1>My title</h1>
<p>This is a <em>document</em>
with <code>monospace</code>
and here is a list:</p>
<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<p>and here is a link: <a href="https://a3nm.net/">A</a></p>
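To make the correspondence between the two notations concrete, here is a toy Python converter for the handful of constructs used above (one-line `#` titles, `*` lists, `*emphasis*`, `` `code` ``, and links). It is a sketch under strong assumptions, not a Markdown implementation: blocks must be separated by blank lines, nesting is not supported, and real converters follow much more detailed rules.

```python
import html
import re


def inline(text):
    """Convert `code`, *emphasis* and [text](url) inside one block of text."""
    text = html.escape(text, quote=False)  # escape &, < and > first
    text = re.sub(r"`([^`]+)`", r"<code>\1</code>", text)
    text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r'<a href="\2">\1</a>', text)
    return text


def to_html(markdown):
    out = []
    # Blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", markdown.strip()):
        lines = block.split("\n")
        if lines[0].startswith("# "):
            # Title: only the first line of the block is used.
            out.append(f"<h1>{inline(lines[0][2:])}</h1>")
        elif all(line.startswith("* ") for line in lines):
            items = "".join(f"<li>{inline(line[2:])}</li>" for line in lines)
            out.append(f"<ul>{items}</ul>")
        else:
            # Anything else is a paragraph; its lines are joined with spaces.
            out.append(f"<p>{inline(' '.join(lines))}</p>")
    return "\n".join(out)


SOURCE = "# My title\n\nThis is a *document*\nwith `monospace`\n\n* One\n* Two\n\n[A](https://a3nm.net/)"
print(to_html(SOURCE))
```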

(28)

Credits

• Course material inspired by notes by Pierre Senellart

• Thanks to Marc Jeanmougin, Antonin Delpeuch, and Pierre Senellart for proofreading
