• Aucun résultat trouvé

Server-Side Technologies MPRI 2.26.2: Web Data Management

N/A
N/A
Protected

Academic year: 2022

Partager "Server-Side Technologies MPRI 2.26.2: Web Data Management"

Copied!
36
0
0

Texte intégral

(1)

Server-Side Technologies

MPRI 2.26.2: Web Data Management

Antoine Amarilli Friday, December 7th

1/36

(2)

Servers

Handle the query(simple case) orroute it to another program (complex case)

Apache Free and open-source, released in 1995 IIS Provided with Windows, proprietary

nginx High performance, free and open-source (but commercial Plus version), released in 2002 GWS Google Web Server, internal

lighttpd Lightweight alternative to Apache

(3)

Market share

3/36

(4)

Simple static websites

• Each resource is stored in afileon disk:

/var/www/page.html,/var/www/style.css...

• Pages are organized as ahierarchyof folders

• The requestedpathscorrespond to the folders

GET /a/b.htmlcorresponds to/var/www/a/b.html

• If adirectoryis queried:

• Serveindex.htmlif it exists

• Otherwise, serve alistof the files in the folder

(5)

Apache extensions

• Parameter Apache (and other servers) with.htaccessfiles:

deny from allto block clients from accessing a directory

HTTP Basic authentication

• URL rewriting:

RewriteRule (.*\.png) /images/$1

• Server Side Includes:

<!–#include virtual="/footer.html" –>

5/36

(6)

Logging and statistics

Traditional Web serverslogqueries (NCSA common log format):

• IP addressof the client

• Date and time

• First lineof the HTTP query (includes the path)

• HTTP status code

• Sizeof the response

• Often:User-AgentandReferer

208.115.113.88 - - [22/Jan/2012:06:27:00 +0100]

(7)

What do logs divulge

• Most visitedweb pages

• Number ofunique visitors

• Time spenton website and paths (but complicated by tabs)

• Browser market share, most commonbots

• Geographical originof visitors (IP)

• Linksto the website, search engineterms

7/36

(8)

Modern logging

• The historical way is to process these logs directly

• Also: PHP scripts that will do their own logging (e.g.,Matomo)

• Also: Javascript analytics, e.g.,Google Analytics

• Also: third-party info, e.g.,Google Search Console

(9)

Table of Contents

Web servers

Server-side languages

Frameworks Practical aspects

9/36

(10)

CGI

• Historical way: the Web server calls anexternal program

• The program is given theparametersof the query

• The query result is what the programreturns

• Drawback:it’s heavy to create one process per query:

FastCGIand other such mechanisms, or

Integratethe programming language to the server, e.g., PHP

(11)

PHP

• Released in 1995 and used byhundreds of millionsof websites1

• Fromdirty hacks(Personal Home Page) to afulllanguage

• Addedto HTML pages and run by the server

<ul>

<?php

$from = intval($_POST['from']);

$to = intval($_POST['to']);

for ($i = $from; $i < $to; $i++) { echo "<li>$i</li>";

}

?>

</ul>

1https://secure.php.net/usage.php

11/36

(12)

Drawbacks of PHP

• Not initially designed as acompleteprogramming language

• Historically encouraged badsecuritypractices

e.g., making$_POST[’from’]available as$from

• Interpretedlanguage so bad performance

Now,virtual machineswith JIT compilation (HHVM by Facebook)

(13)

Other server-side languages

ASP.NET Microsoft

ColdFusion Adobe, commercial and proprietary

JSP Integrating Java and a Web server (e.g., Apache Tomcat) node.js Chrome’s JavaScript engine (V8) plus a Web server Python Web frameworks:Django, CherryPy, Flask

Ruby Web frameworks:Ruby on Rails, Sinatra

13/36

(14)

Databases

• SQL databases:MySQL, PostgreSQL, or proprietary solutions

• NoSQL databases(e.g., MongoDB) for documents, graphs, key-value pairs, triples, etc.

• Distributed databases

→ SeePierre’s class

(15)

Table of Contents

Web servers

Server-side languages

Frameworks

Practical aspects

15/36

(16)

Frameworks

• Set offunctionsandtools, organized around alanguage, for Web applications

• AJAX integration, Javascript code generation...

• MVC:

Model Thestructureof application data and how to manipulate it

View Thepresentationof data seen on the client Controller Thecontrolof the interaction between the model

(17)

Templates again

• Templatesfor HTML pages with instantiablefields

• Example(with Jinja2), note that it isprocedural(for):

<h1>Results for "{{ query }}"</h1>

<ul>

{% for object in results %}

<li>

<a href="details/{{ object.id }}">

{{ object.name }}

</a>

</li>

{% endfor %}

</ul>

17/36

(18)

URL routing

• Routing depending on thepathandmethod

• Example(with Flask):

@app.route('/') def index():

pass # Prepare the index page

@app.route('/message/<int:message_id>') def message(message_id):

pass # Prepare the display of message <message_id>

(19)

CMS

• Content Management System

• Allows users to design websiteswithout programming:

• Edit pages with arich text editoror shorthand (Markdown, etc.)

• File hosting

• User management

• Predefined themes

• Differentkinds:

Wikis MediaWiki, MoinMoin, PmWiki...

Forums phpBB, PunBB, Phorum, vBulletin...

Blogs WordPress, Movable Type, Drupal, Blogger...

QA (like StackOverflow): Shapado, OSQA, AskBot Shops Magento, PrestaShop...

• Otherhosted services: Weebly, Wix, etc.

19/36

(20)

Market share

0 5 10 15 20 25

(21)

node.js

• TheMEANplatform:

MongoDB(see Pierre’s class)

Express.js(minimal framework for Node.js)

Angular

Node.js

• Advantage:Isomorphic JS: same code on the client and server:

Idea:first run the code on theserverto compute the view

Then:send the complete Javascript code in background and run it on the client

• Otheradvantagesfor code reuse (e.g., validating input)

• Package manager for libraries:npm(oryarn); alsobower

21/36

(22)

Meteor

• Integrated solution on top ofnode.js

• Database everywhere:Have apartial cacheof the database on the client which is transparently synchronized with the server

More efficient and simpler to code

Access rules to limit what the client can edit

• Latency compensation:optimistically performchangeson the client and then sync them with the server in the background

• Sessionandusermanagement

(23)

Javascript tooling

• Javascriptminification andobfuscation

Make it reversible for debugging withsource maps

• Linting

• Documentation:JSDoc, likeDoxygen

• Packing together code and dependencies:webpack

• Elimination ofdead codeortree shaking

• Transpiling to other Javascript variants, e.g.,Babel,Google Closure Tools

• Task running system:grunt,gulp

23/36

(24)

Identifying server technologies

Client technologies are easy to identify (up to JS minification and obfuscation), but server technologies are harder to spot

• whois: database which (often) has information about the owner of a domain name

• Geolocation of servers based on their IPs

• tracerouteto know thenetwork pathto a host

• Scanning tools (nmap) to find machines and fingerprint their OS

• Serverheader (may be missing or wrong)

• Format for cookies, session identifiers...

(25)

Table of Contents

Web servers

Server-side languages Frameworks

Practical aspects

25/36

(26)

How to host your website

• You can use ahosted solution, e.g., Wordpress, Weebly (simplest)

• ISPs often propose hosting withlimited space, sometimes PHP/MySQL

• You can rent aVPSordedicated serverto run your own (a few EUR/month)

• You can host a websitein your homebehind an optic fiber connection with a fixed IP address

• You can host a web applicationon the cloud(serverless

(27)

Hosting infrastructure

• Server: machine which is always on and answers Web queries

• Datacenter: building containing servers with a good Internet connection, reliable electricity, air conditioning, physical security

• VPS:Virtual Private Server: a virtual machine that pretends to be a true machine

• Cloud: easy way to rent machines at a large scale

Can adapt the number of rented machines depending on load

• CDN:Content delivery network, acts as a proxy

27/36

(28)

How to self-host your website

• Rent adomain name(around 15 EUR/year)

• Have aserver(your own machine, or a VPS or dedicated server)

• ConfigureSSHto connect to the machine and set it up

• Install a Web server, your framework, your CMS, etc.

(29)

Concrete example: Wikimedia

• wikipedia.org,5th most visited Websitein 2018.2

• Not-for-profit charity, around300 employeesin 2018

• 81.9 millionUSD in revenue in 2016 (mostly donations)

• Technical costs in 2016: a fewmillionUSD

• Aboutone thousandservers in total (2013)

2http://www.alexa.com/topsites

29/36

(30)

General statistics

• 137 115active users onen.wikipedia.org3

• Around1000edits per minute in total4

• 16 billionpage views per month on all projects5

• Around7 000per second on average with peaks at50 0006

• 825 millionunique devices per month on the English Wikipedia7

3

(31)

General infrastructure

• Datacenters:

Main site: Ashburn, Virginia (Equinix)

For Europe (network and cache),Amsterdam(EvoSwitch, Kennisnet)

Cache: San Francisco (United Layer), Singapore (Equinix)

Other sites: Dallas, Chicago

Backup datacenter: Carrollton, Texas (CyrusOne)

• Dellservers runningUbuntu8and Debian

• puppetto manage the server configuration

• Monitoring software: Icinga, Grafana

grafana.wikimedia.org

status.wikimedia.org

8https://insights.ubuntu.com/2010/10/04/

wikimedia-chooses-ubuntu-for-all-of-its-servers/

31/36

(32)

Main tasks (as of 2013)

• Wiki management software:MediaWiki, in PHP

• Apacheserver, using HHVM9

192machines (in Ashburn)

• Database:MariaDB

54database machines

10storage machines with 12 hard drives of 2 TB in RAID10

• Distributed file system:Ceph(previouslySwift)

12servers

• Asynchronous task servers (NoSQL databaseRedis)

(33)

Caches (as of 2013)

• Squid: 40 machines

8machines for multimedia

32machines for text

• Varnish

8machines

• Cache invalidationwith MediaWiki

• Memcachedbetween MediaWiki and the database

16machines

→ 90%of trafficonly uses the cacheand not Apache.10

10https://blog.wikimedia.org/2013/01/19/

wikimedia-sites-move-to-primary-data-center-in-ashburn-virginia/

33/36

(34)

Other services (as of 2013)

• SSL termination proxies withnginx:

9machines

• Load balancingwith LVS (Linux Virtual Server):

6machines

• Indexationfor the search function:

Lucene 25 machines Solr 3 machines

• Media fileresizing: Images 8 machines

Videos 2 machines

(35)

Example of a rack (2015)

Image credits:https://meta.wikimedia.org/wiki/File:Wikimedia_Foundation_Servers_2015- 90.jpg

35/36

(36)

Crédits

• Matériel de cours inspiré de notes par Pierre Senellart

• Merci à Pierre Senellart pour sa relecture

• Transparent 3: chiffres Netcrafthttps:

//news.netcraft.com/archives/category/web-server-survey, graphe arichnad, licence CC-BY-SA, cf

https://commons.wikimedia.org/wiki/File:

Usage_share_of_web_servers_(Source_Netcraft).svg

Références

Documents relatifs

In this paper we present a novel approach towards ephemeral Web personalization consisting in a client- side semantic user model built by aggregating RDF data encountered by the user

This paper describes the process of developing and using visualisation techniques to disseminate site usage information to non-technical stakeholders, in order to identify

Firefox Gecko, and (work-in-progress) Servo, using Rust Safari WebKit engine. Chrome Blink (fork of Webkit, in

placeholder Help text indicated when the field is empty value Indicate the default value. required Must be filled in to submit the form type

• CSS3 makes it possible to position elements in two dimensions using CSS Grid. • Use display: grid on the element which will serve as

SVG+JS Javascript manipulation of vector graphics WebGL Javascript API for accelerated 3D using a GPU. WebVR Describe scenes for virtual reality headsets with a-scene WebRTC Voice

• Information extraction: creating structured information out of existing Web Data

  Des pages HTML avec du code JScript ou VBScript pour accéder aux composants DCOM côté serveur.. INF347 - Java côté serveur-V3.0