Evaluating espresso coffee quality by means of time-series feature engineering
Daniele Apiletti, Eliana Pastor, Riccardo Callà, Elena Baralis
Department of Control and Computer Engineering, Politecnico di Torino, Italy [email protected]
ABSTRACT
Espresso quality attracts the interest of many stakeholders: from consumers to local business activities, from coffee-machine ven- dors to international coffee industries. So far, it has been mostly addressed by means of human experts, electronic noses, and chem- ical approaches. The current work, instead, proposes a data- driven analysis exploiting time-series feature engineering. We an- alyze a real-world dataset of espresso brewing by professional coffee-making machines. The novelty of the proposed work is provided by the focus on the brewing time series, from which we propose to engineer features able to improve previous data-driven metrics determining the quality of the espresso. Thanks to the exploitation of the proposed features, better quality-evaluation predictions are achieved with respect to previous data-driven approaches that relied solely on metrics describing each brewing as a whole (e.g., average flow, total amount of water). Yet, the engineered features are simple to compute and add a very limited workload to the coffee-machine sensor-data collection device, hence being suitable for large-scale IoT installations on-board of professional coffee machines, such as those typically installed in consumer-oriented business activities, shops, and workplaces.
To the best of the authors’ knowledge, this is the first attempt to perform a data-driven analysis of real-world espresso-brewing time series. Presented results yield to three-fold improvements in classification accuracy of high-quality espresso coffees with respect to current data-driven approaches (from 30% to 100%), exploiting simple threshold-based quality evaluations, defined in the newly proposed feature space.
1 INTRODUCTION
Espresso is an almost syrupy beverage generated by a machine, typically using a motor-driven pump, forcing pressurized hot water through finely ground coffee. Each espresso shot in a bar can generate one or two cups of coffee, being called, respectively, single or double, and requiring proportional amounts of ground coffee.
Drinking espresso coffee is a ritual rooted in the pleasure of its taste. In some countries, such as Italy, where 97% of adults drink espresso daily [18], espresso quality is a main driver for consumers’ habits and a primary focus of coffee industries.
In 2018, each Italian had 2.2 daily espresso cups on average, i.e., 6 kg yearly, in one of the 150 thousand bars, with each bar using 1.2 kg of ground coffee daily to serve almost 200 coffees on average, and most of them were espresso, representing approxi- mately one third of a medium bar turnover [18].
According to common knowledge and online sources [12, 18], such as the Italian Espresso National Institute, a perfect espresso depends on different variables: (i) the coffee blend, (ii) the grinder
© 2020 Copyright for this paper by its author(s). Published in the Workshop Proceed- ings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons License At- tribution 4.0 International (CC BY 4.0)
settings, i.e., the weight of coffee grounds and how fine it is ground; (iii) the espresso machine, with professional machine makers improving such technology over and over to promise the perfect espresso all the time; (iv) the barista, i.e., the human-in- the-loop preparing the espresso in the bar, from blend choice, to manual grinder settings, and to proper usage of the coffee machine and its brewing procedure.
In the current work, among the different quality-influencing variables, we focus on (i) coffee ground size, (ii) ground amount, and (iii) water pressure. Regarding the quality-evaluation vari- ables, we exploit the following common metrics as selected by domain experts and related works: (i) total extraction time, (ii) the total volume of coffee in cup, and (iii) the derived average flow of the extraction [5].
The ideal portion [12] of ground coffee for each cup is declared to be 7±0.5 g, while the water pressure should be 9±1 bar, the extraction time 25±5 s, and the volume in cup 25±5 ml.
The coffee ground derives from the process of coffee grinding from coffee beans. Small changes in the grind size can drastically affect the taste and the quality of the brewed espresso. In general, if the coffee is ground too coarse, the espresso can be under- extracted and less flavorful. On the other hand, too fine ground may result in an over-extracted and bitter coffee. The amount of ground itself impacts on quality, resulting in a too watery or bitter coffee. Water pressure must be set to brew the right coffee amount in a proper time, thus leading to the right flow rate determining an intense flavour.
The novelty of the proposed work is provided by the exploita- tion of the brewing time series, from which we propose to engi- neer features able to improve the standard data-driven metrics determining the quality of the espresso, i.e., extraction time, vol- ume, and flow (as the ratio of volume and time). The proposed features are applied on a real-world dataset where we show that they can provide better quality-evaluation predictions, by allow- ing to reduce the false positives, i.e., apparently good coffees, without any loss in true positives.
Since the engineered features are simple to compute and add a very limited workload to the coffee-machine sensor-data collec- tion device, they are also suitable for large-scale IoT installations on-board of professional coffee machines, such as those typically installed in consumer-oriented business activities, shops, and workplaces.
Presented results uncover insights into the espresso quality evaluation, its relationships with the main quality variables, lead- ing to positive impacts on both coffee consumers and coffee- making industries, respectively enjoying and providing more pleasure in drinking higher-quality espresso coffee.
The rest of the paper is structured as follows. Section 2 dis- cusses related works, Section 3 describes the dataset and the ex- perimental design, Section 4 introduces the time-series feature en- gineering algorithm, and Section 5 presents experimental results.
Finally, Section 6 draws conclusions and outlines future works.
2 RELATED WORK
Espresso quality assessment is traditionally performed with sen- sory analysis, the scientific discipline that statistically and ex- perimentally analyze reactions to stimuli perceived through the human senses (sight, smell, taste, touch and hearing). Sensory evaluation is however time-consuming and affected by subjec- tiveness and low-reproducibility due to the human component.
Considering these limitations, objective analysis as chemical techniques, electronic noises and data-driven approaces are com- monly exploited for coffee quality control. Different chemical techniques adopt Gas Chromatography (GC) and Mass Spec- troscopy (MS) analysis. Several works study the effect of external variables (e.g. water pressure, water temperature) or of coffee characteristics on the final espresso quality. Some works are fo- cused on the influence of water, as its composition, pressure [1], temperature [2] and of water pressure and temperature com- bined [6]. Others studies instead consider the impact of coffee features themselves, as the roasting conditions [19] or the type of coffee and roast combined [3].
However, GC and MS analysis often require a significant amount of time and human intervention. Many studies exploit Electronic Nose (EN) systems to overcome the complexity and cost of GS/MS techniques. An electronic nose is a device intended to mimic human olfaction. It consists of an array of chemical sensors for chemical detection and a pattern recognition system capable of identifying the specific components of an odor [11]. EN are frequently exploited for determining and discriminating cof- fee characteristics. Several works aim at determining the roasting degree [17], using PCA and Neural Networks (NN) coupled with GRNN, while others focus on distinguishing coffee blends, explot- ing both NN [15] and Support Vector Machines techniques [16].
EN systems are also used in conjunction with GS analysis, as in [14], to characterize roasting degree and coffee beans from different countries. The analysis in [20] studies espresso chemical attributes when the extraction time and grinding level are varied.
The work emphasizes the importance of the first 8 seconds of the espresso brew, because in this range the major amount of organic acids, solids and caffeine are extracted. This result confirms the relevance of analyzing the entire trend of coffee extractions to characterize their quality.
Finally, data-driven approaches can be applied for large-scale and real-time espresso quality assessment, exploiting Internet of Things (IoT) sensors in place of the more sensitive and unstable EN devices. Recently, a data-driven approach that exploits asso- ciation rule mining has been proposed to analyze the correlation of coffee-making machine parameters and espresso quality [5].
The work relies solely on metrics describing each espresso brew- ing as a whole (e.g., average flow, total amount of water). In the proposed work, instead, we focus on the brewing time series to fully characterize the coffee extractions.
Time series analysis is a popular and well-known approach in many application fields [10, 13], from physiological data [4] to energy and weather data [9]. However, in our work, we exploit a basic intuition on the time series trend and resort to feature engineering to avoid a direct analysis of the time series itself.
Feature engineering from time series has been extensively ad- dressed for different applications, as in [7] for industrial one in the context of IoT and Industry 4.0, or for pattern matching of technical patterns in financial applications [8].
With respect to the state of the art, the current work con- tributes by cleverly transferring known and simple time-series
feature engineering techniques into the espresso quality evalua- tion domain, leading to significant improvement in classification performance with respect to the state of the art. To the best of the authors’ knowledge, this is the first attempt to perform a data- driven analysis of real-world espresso-brewing time series, as until now the focus has been limited to whole-extraction metrics.
3 DATASET DESCRIPTION
The dataset under analysis consists of real-world espresso brew- ing data. Since the dataset is provided by a leading coffee com- pany, we cannot disclose exact details of the real-world settings (e.g., the coffee-machine maker and model, the precise location and name of the involved business activities). Each espresso extraction has been performed on professional coffee-making machines and the values of the quality-evaluation variables have been collected every 300 ms. In particular, our time series consist of the values of the amount of water at each time interval, as provided by flow-meter pulse counter, then deriving the instant flow rate (i.e., the ratio of the amount of water and the time).
Each extraction has been performed with specific values of the quality-influencing variables, hence allowing us to know the ground-truth labels of high-quality espresso coffees, i.e., those having all optimal settings for (i) coffee ground size, (ii) ground amount, and (iii) water pressure. An exhaustive set of coffees has been produced to observe the effect of non-optimal values on the espresso quality. For each quality-influencing variable, different values are considered: ground size can be coarse, optimal, or fine;
ground amount can be high, optimal, or low; brewing water pres- sure can be high, optimal, or low. All possible combinations of the three external-variable values (e.g., optimal, high, low) have been included in the dataset, hence generating 33= 27 possible input configurations. For each configuration among the 27 com- binations of external variables (for instance: coarse ground size, optimal ground amount, and high water pressure), 20 espresso ex- tractions have been performed. Experiments have been repeated on a professional coffee-making machine, generating a datasets consisting of 540 espresso extractions.
The domain-expert quality thresholds used in our experiments are as follows: espresso volume from 20–30 ml, extraction time from 20–30 s. The values have been selected according to public literature, e.g., those published by the Specialty Coffee Associa- tion of Europe [5, 12]. The flow rate thresholds derive from the above-mentioned ones, as the flow rate is the ratio of the volume by the time, hence obtaining the range 0.67–1.50 ml/s.
Given such thresholds, espresso extractions can be labelled with their quality assessment. Quality labels areoptimal, too lowor toohighfor each of the quality variables: volume, time, and flow. Table 1 recaps the domain-based threshold values and corresponding labels.
Table 1: Domain-based quality thresholds.
Quality Variable Low Optimal High extraction time (s) <20 [20–30] >30 volume (ml) <20 [20–30] >30 flow rate (ml/s) <0.67 [0.67–1.50] >1.50
The problem tackled by this work stems from the fact that analyzing the standard quality-evaluation variables without the additional time-series novel features, many false positives are
provided: some espresso extractions are characterized by high- quality values in terms of water amount, flow rate and extraction time, however, their ground size, ground amount or water pres- sure were not optimal (compensation effect [5]).
4 TIME-SERIES FEATURE ENGINEERING
Feature engineering refers to the process of extracting features from raw data. It is typically executed to improve the performance of predictive or classification models. In the current work, we exploit feature engineering to leverage the coffee-brewing time series with the aim of improving the espresso quality assessment.
For each coffee extraction, the time series of the flow-meter pulses is stored, with sampling time equal to 300 ms. Flow-meter pulses are firstly converted to quantity of brewed waterq, as follows:
q=nump∗pulseq
numc (1)
wherenumpis the number of pulses of the flow-meter,pulseq
represents the quantity of brewed water per pulse of the flow- meter andnumcrepresents the number of brewed coffees. In the experimental data under analysis,pulseq=0.5 ml, as given by the coffee-machine datasheet, andnumc=2, since two espresso cof- fees are brewed for each extraction. The time series captures the water quantity over time, hence the instant flow rate is known.
Figure 1 shows an example of a real time series from the dataset.
We notice a clear two-segment trend that is observable for any arbitrary extraction: a first steeper phase is followed by a second part having a lower flow rate. This phenomenon is known by domain experts. In the first, transient, phase of coffee brewing, water is forced in the coffee panel inside the filter holder, and coffee grounds do not slow the water flow yet. On the contrary, in the second phase, water penetrate and dampen coffee grounds yielding the actual coffee extraction.
We propose to extract the following new features to capture the two-fold behavior of the extraction. We firstly determine the point where a significant flow variation is observed. We refer to this point astrend point. The trend point is used to approxi- mate the water quantity time series as a polygonal chain. The approximate polygonal chain is constituted by two line segments that represent the two phases of the water flow and its vertex of intersection is the trend point. The trend point is estimated by considering the maximum variation of the slope average of the points in two consecutive not-overlapping sliding windows of sizeW. Theslopesi (orgradient) of two consecutive points pi =(ti,qi)andpj =(tj,qj)is computed as follows.
si =qj−qi
tj−ti (2)
In Equation 2,tis the time reference andqis the water quantity, and they represent the axes of Figure 1. The slopesdescribes the steepness of the water flow.
The procedure for the trend point estimation is reported in Algorithm 1.
The maximum variation of the slope and the corresponding points are initialized in Lines 1 and 2. In Lines 4 and 5, two con- secutive not-overlapping sliding windows of sizeWare defined.
Letwkbe a time window of sizeW. The slope averagewkme an of all consecutive points of the time window is computed as follows
wkme an = 1 W−1
W−1
Õ
j=1
qj−qj−1
tj−tj−1 (3)
0 5 10 15 20 25
Time(s) 0
25 50 75 100 125 150 175
Quantity of water (ml)
Figure 1: A real sample time series of the total water quan- tity of an espresso coffee brewing.
Algorithm 1:Trend point computation Result:Trend point
1maxt d =0.0;
2 pointmaxt d =(0.0,0.0);
3 fori=0toN−2Wdo
4 w1=ranдe(i,i+W);
5 w2=ranдe(i+W,i+2W);
6 w1me an =mean(compute_slopes(w1));
7 w2me an =mean(compute_slopes(w2));
8 trend_di f f =w2me an−w1me an;
9 maxt d,pointmaxt d =updateMax(trend_di f f);
10 end
11 trend_point=pointmaxt d;
12 returntrend_point
wherepj =(tj,qj)andpj−1=(tj−1,qj−1)are consecutive points of the time window.
The slope average is estimated for the two sliding windows, as reported in Lines 6 and 7. The two terms capture the average flow rate in the corresponding time window. The difference of the two slope averages is computed in Line 8. The maximum slope variation and the corresponding point are updated in Line 9.
The point of maximum variation corresponds to the intersec- tion point of the two considered sliding windows. The process is repeated until allN points of the time series are considered.
Finally, the trend point is returned (Line 12).
The trend pointpt p=(tt p,qt p)represents the intersect vertex of an approximate polygonal chain of the water quantity time series. It is exploited to compute two features that capture the two phases of the espresso extraction. Let bep0=(t0,q0)andpN = (tN,wqN)the first and last points of the time series, respectively.
We defines1ands2as follows.
s1=qt p−q0
tt p−t0 (4)
0 5 10 15 20 25 Time (s)
0 25 50 75 100 125 150 175
Quantity of water (ml)
Real Flow Average Flow Slope 1 Slope 2 Trend Point
Figure 2: Features engineered from the espresso extrac- tion time series with Trend Point, Slope 1, and Slope 2.
s2=qN−qt p
tN−tt p (5)
In Figure 2, the approximate polygonal chain of a coffee ex- traction time series is reported. The dashed line indicates the average water flow. The slopes1represents the average flow of the first phase of the espresso brewing while slopes2the average flow of the second phase. These two features are exploited in the analysis to better characterize the coffee extraction, providing additional information with respect to the overall average flow.
The extracted features will also be exploited to compute new ranges for the optimal quality parameters, hence improving the recognition of high-quality coffees.
5 EXPERIMENTAL RESULTS
This section provides a description of the data cleaning proce- dures applied to the dataset (Section 5.1), a discussion of the data characterization of the extracted features (Section 5.2), and their contribution to the espresso quality assessment improvement (Section 5.3).
5.1 Data cleaning
The dataset has been pre-processed by applying the data clean- ing steps described in [5]. The original dataset consists of 1080 coffees, corresponding to 540 extractions. Among them, 30 extrac- tions were missing the time series data due to low-level hardware issues. Domain-driven thresholds, aimed at removing values be- ing unacceptable for the phenomena under exam, lead to other 38 extractions to be discarded. As described in [5], domain-driven threshold values of valid espresso extractions have been set to 10–40 ml and 10–40 s, according to leading industrial domain experts. Finally, the statistical-based outlier removal approach of [5] removed 15 additional samples from the dataset. After the cleaning procedure, 457 extraction time series remain out of the 540 original records.
5.2 Data characterization
We firstly analyze the relationship between the extracted features and the quality-evaluation variables (i.e., total extraction time, average flow rate, total water amount). The trend point and the consequent slope values have been computed with a window size W set to 10.
The correlation analysis shows that slopes2is highly cor- related with the average flow rate (over the whole extraction), with a Pearson correlation coefficient equal to 0.95, and the total brewing time, with a correlation coefficient of -0.94. As expected, lower flow rates lead to longer extraction times, since the total amount of coffee is an almost constant goal of the coffee machine.
We then investigate the relationship between the two aver- age flows (i.e.s1ands2) and the three external quality-influencing variables: water pressure, coffee ground amount and coffee ground size, also known as grinding setting).
Figure 3 shows the pressure behavior with respect tos1ands2. The pressure values (low, optimal, and high) are represented by the label in the scatter plot. We can observe that coffee extractions in the (s1,s2) space are clearly divided in three macro-areas, determined bys1value. The central partition is characterized by an optimal pressure, while the first and last areas by low and high values of pressure respectively. Hence, the value of the external variable highly influence the first phase of coffee extractions, when water is forced into the coffee panel. To a low pressure corresponds a low water flow in the initial phase and vice versa for the high pressure. The flow in the second phase is instead almost independent from the pressure value.
Regarding the total amount of water, we report in Figure 4 the coffee extractions as a function ofs1ands2. Differently from the pressure-labeled scatter plot, it is not observable a sharp dis- tinction. We can however identify a relationship withs2. Higher amounts of coffee ground lead to lower values of the flows2. In this case, the average flow in the second phase of the extraction is hindered by the higher amount of coffee ground. Hence, the water flow is reduced. Likewise, the lower quantity of coffee ground facilitates the flow of water, with a consequent increase in flows2. The coffee ground amount, instead, do not influences1, since it captures the average flow of the water when it is forced in the coffee panel and before the coffee ground tampering.
Finally, we observe a similar behavior when considering the coffee ground size (i.e., grinding settings), hence we do not report the plot. A coarser grinding generally corresponds to a higher flow. The finer coffee grinding instead hinders the water flow.
This results in a lower flows2in the second phase of the coffee extraction.
5.3 Quality Evaluation
In this section, we evaluate the extracted feature ability to char- acterize espresso quality and to improve the detection of high- quality espresso coffees. All the three external variables are under the barista control. However, brew pressure is set at first in the espresso machine calibration phase and it is periodically checked and configured, typically with the support of technicians. On the other hand, the grinding settings and the amount of coffee ground are determined by the barista at each espresso brewing.
Hence, it is particularly relevant to control that these two exter- nal variables are set properly by the barista. In existing works, domain-experts and data-driven thresholds on quality indexes, such as espresso volume, extraction time and brewing flow rate, have been applied to evaluate coffee quality. The analysis in [5]
4.0 4.5 5.0 5.5 6.0 6.5 Slope 1
1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
Slope 2
Low Pressure Optimal Pressure High Pressure
Figure 3: Extractions in the proposed feature space, la- beled according to the water pressure value.
4.0 4.5 5.0 5.5 6.0 6.5
Slope 1 1.5
2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
Slope 2
Low amount erogations Optimal amount erogations High amount erogations
Figure 4: Extractions in the proposed feature space, la- beled according to the coffee ground amount.
explored the phenomena of compensating sub-optimal values of different external variables. A compensation effect is observ- able when configurations of values of external variables allow to achieve apparently high-quality coffees, in terms of quality indexes, despite one or more values are, in fact, not optimal. Inter- pretable exploration techniques highlighted that high amounts of coffee ground, that generally hinder the water flow and lead to long percolation times, could be compensated by a coarser grinding that, on the other hand, facilitates the flow [5]. Simi- larly, the low amounts of coffee ground could be compensated by a finer grinding. Despite the optimal quality-index values, the low amount of coffee has generally a negative impact on coffee intensity and body, and therefore on the final customer
experience, hence possibly affecting also the brand image of the coffee supplier. To this aim, we exploit the time-series features to better characterize the quality of espressos so that false high- quality coffees can be detected and, if not totally avoided, at least significantly reduced.
As a reference, we consider domain-driven thresholds on cof- fee quality indexes. In Figure 5 the espresso extractions with optimal values of quality indexes are reported in thes1ands2
space. They can be grouped as follows. (i) True high-quality ex- tractions present optimal values for both the quality-evaluation indexes and, in particular, for all external variables. (ii) False high-quality extractions present optimal quality-index values with respect to domain-expert thresholds, but at least an external variable has a sub-optimal value [5]. Such espresso extractions (ii) are the result of compensation effects.
We refer to true high-quality extractions asoptimal, and we characterize them as a function of the proposed time-series fea- turess1ands2. LetObe the set ofoptimalextractions{o1,o2, ...,oN}, where each pointoi ∈ Ois defined in terms ofs1ands2, i.e., oi =(oi_s1,oi_s2). We define novel quality thresholds for optimal extractionsTo_minandTo_max in the (s1,s2) space as follows:
To_min=(min(oi_s1),min(oi_s2)) (6) To_max=(max(oi_s1),max(oi_s2)) (7) Among the whole set of espresso extractionsE={e1,e2, ...,eM}, a generic sampleej = (ej_s1,ej_s2) ∈ Eis labeled as optimal e ∈ O, withO ⊆ E, if its values of flow rate (ej_s1,ej_s2) are within the thresholdsTo_minandTo_max.
In Figure 5 two rectangular areas are shown. The green area contains theoptimalextractions. Its boundaries are defined by the thresholdsTo_minandTo_max. The orange dashed area contains the false high-quality extractions, which current state-of-the- art solutions would (incorrectly) classify as high-quality coffees.
Exploiting the proposed thresholds in the new feature space, we can detect many false positives (orange squared points in the plot). Specifically, instead of assigning an optimal label to the overall 67 extractions (green and orange ones), we can correctly detect the 20 true optimal extractions (green ones), and we can discard 31 out of 47 false positives (orange ones). State of the art thresholds would lead to the same true positive detection (20 out of 67), while the proposed approach leads to a drastically better accuracy (76% instead of 30%) and precision of high-quality extractions (56% instead of 30%).
To drill down the analysis, we further distinguished two types of false positives, stemming from different compensation effects:
(i) low amount of coffee ground with fine grinding and (ii) high amount of coffee ground with coarse grinding. The former is less common, since very few baristas intentionally use higher amounts of coffee ground, being a cost for them. On the contrary, the latter is much more frequent, because it brings savings on coffee ground costs. For this reason, extractions affected by the latter are of greater interest.
In Figure 6 three areas are shown. The green one still contains the true optimal extractions, the blue one contains the extractions belonging to the first type of compensation and the orange one now contains only the extractions belonging to the second type of compensation. Again, exploiting thresholds in the new feature space, the target extractions can be correctly classified and the compensation effect can be detected. Results show that all 23 extractions from type-(ii) compensation can be correctly detected, besides 8 extractions out of 24 from type-(i) compensation, which means improving from 30% accuracy of data-driven state of the
4.0 4.5 5.0 5.5 6.0 6.5 Slope 1
1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
Slope 2
False high-quality extractions True optimal extractions
Figure 5: True optimal extractions and false high-quality extractions in the proposed feature space.
4.0 4.5 5.0 5.5 6.0 6.5
Slope 1 1.5
2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
Slope 2
(High amount - Coarse grinding) extractions (Low amount - Fine grinding) extractions True optimal extractions
Figure 6: Optimal extractions in the proposed feature space and false high-quality extractions due to different compensation effects.
art to 100% accuracy considering only true optimal and type- (ii) compensation extractions. To this aim, in our dataset, the new feature thresholds have been set as 5.19<s1 <5.48 and 2.64<s2<3.73.
6 CONCLUSIONS
This work presented a data-driven analysis of a real-world time- series dataset of espresso brewing by professional coffee-making machines. The proposed feature space, despite being simple and easy to compute, brought a large improvement in the classifica- tion accuracy of high-quality espresso with respect to current
state-of-the-art data-driven approaches: results yielded to three- fold improvements in accuracy, from 30% to 100%, with specific focus on currently misclassified extractions due to common com- pensation effects. The proposed methodology can be applied in similar contexts to improve current data-driven analyses of espresso quality.
Future works aim to widen the scope of the analysis includ- ing additional quality variables, definitely different models of professional coffee-making machines, diverse coffee blends, and environmental variables. Furthermore, we plan to apply cluster- ing techniques for determining the quality-index thresholds.
ACKNOWLEDGMENTS
This work is partially funded by the SmartData@PoliTO center.
REFERENCES
[1] S. Andueza, L. Maeztu, B. Dean, M. P. de Peña, J. Bello, and C. Cid. 2002.
Influence of Water Pressure on the Final Quality of Arabica Espresso Coffee.
Application of Multivariate Analysis.J. Agric. Food Chem50, 25 (2002), 7426–
7431. https://doi.org/10.1021/jf0206623 PMID: 12452670.
[2] S. Andueza, L. Maeztu, L. Pascual, C. Ibáñez, M Paz de Peña, and C. Cid. 2003.
Influence of extraction temperature on the final quality of espresso coffee.J.
Sci. Food Agric.83, 3 (2003), 240–248. https://doi.org/10.1002/jsfa.1304 [3] S. Andueza, M. A. Vila, M. Paz de Peña, and C. Cid. 2007. Influence of cof-
fee/water ratio on the final quality of espresso coffee.J. Sci. Food Agric.87, 4 (2007), 586–592. https://doi.org/10.1002/jsfa.2720
[4] D. Apiletti, E. Baralis, G. Bruno, and T. Cerquitelli. 2009. Real-time analysis of physiological data to support medical applications.IEEE Trans. Inf. Technol.
Biomed.13, 3 (2009), 313–321. https://doi.org/10.1109/TITB.2008.2010702 [5] D. Apiletti and E. Pastor. 2020. Correlating Espresso Quality with Coffee-
Machine Parameters by Means of Association Rule Mining.Electronics9, 1 (2020), 100.
[6] G. Caprioli, M. Cortese, G. Cristalli, F. Maggi, L. Odello, M. Ricciutelli, G.
Sagratini, V. Sirocchi, G. Tomassoni, and S. Vittori. 2012. Optimization of espresso machine parameters through the analysis of coffee odorants by HS- SPME–GC/MS.Food Chemistry135, 3 (2012), 1127 – 1133.
[7] M. Christ, A. W Kempa-Liehr, and M. Feindt. 2016. Distributed and parallel time series feature extraction for industrial big data applications.arXiv preprint arXiv:1610.07717(2016).
[8] T. Chung, F.and Fu, R. Luk, and V. Ng. 2001. Flexible time series pattern matching based on perceptually important points. (2001).
[9] E. Di Corso, T. Cerquitelli, and D. Apiletti. 2018. METATECH: METeorological data analysis for thermal energy characterization by means of self-learning transparent models.Energies11, 6 (2018). https://doi.org/10.3390/en11061336 [10] P. Esling and C. Agon. 2012. Time-series data mining.ACM Computing Surveys
(CSUR)45, 1 (2012), 1–34.
[11] J. W. Gardner and 1956 Bartlett, Philip N. 1999.Electronic noses : principles and applications. Oxford ; New York : Oxford University Press.
[12] Istituto Nazionale Espresso Italiano. [n.d.]. Espresso Italiano Certificato. http:
//www.espressoitaliano.org/files/File/istituzionale_inei_hq_en.pdf/. [Online;
accessed January-2020].
[13] E Keogh, S. Chu, D. Hart, and M. Pazzani. 2004. Segmenting time series: A survey and novel approach. InData mining in time series databases. World Scientific, 1–21.
[14] T. Michishita, M. Akiyama, Y. Hirano, M. Ikeda, Y. Sagara, and T. Araki. 2010.
Gas chromatography/olfactometry and electronic nose analyses of retronasal aroma of espresso and correlation with sensory evaluation by an artificial neural network.J. Food Sci.75, 9 (2010), S477–S489.
[15] M. Pardo, G. Niederjaufner, G. Benussi, E. Comini, G. Faglia, G. Sberveglieri, M. Holmberg, and I. Lundstrom. 2000. Data preprocessing enhances the classification of different brands of Espresso coffee with an electronic nose.
Sensors and Actuators B: Chemical69, 3 (2000), 397–403.
[16] M. Pardo and G. Sberveglieri. 2005. Classification of electronic nose data with support vector machines.Sensors and Actuators B: Chemical107, 2 (2005), 730 – 737. https://doi.org/10.1016/j.snb.2004.12.005
[17] S. Romani, C. Cevoli, A. Fabbri, L. Alessandrini, and M. Dalla Rosa. 2012.
Evaluation of coffee roasting degree by using electronic nose and artificial neural network for off-line quality control.J. Food Sci.77, 9 (2012), C960–C965.
[18] Rossi writes. [n.d.]. Coffee in Italy or 101 Facts about Ital- ian Coffee Culture. http://rossiwrites.com/italy/italy-for-foodies/
coffee-in-italy-italian-coffee-culture. [Online; accessed January-2020].
[19] S. Schenker, C. Heinemann, M. Huber, R. Pompizzi, R. Perren, and R Escher.
2002. Impact of Roasting Conditions on the Formation of Aroma Compounds in Coffee Beans.J. Food Sci.67, 1 (2002), 60–66.
[20] C. Severini, I. Ricci, M. Marone, A. Derossi, and T. De Pilli. 2015. Changes in the Aromatic Profile of Espresso Coffee as a Function of the Grinding Grade and Extraction Time: A Study by the Electronic Nose System.J. Agric. Food Chem 63, 8 (2015), 2321–2327. https://doi.org/10.1021/jf505691u PMID: 25665600.