IMPACT OF THE HARVESTING YEAR ON THE POSSIBLE AUTHENTICATION OF GEOGRAPHICAL ORIGIN OF GREEN COFFEA ARABICA USING PROFILE OF VOLATILES

Authentication of coffee origin is in high demand. This study aimed to understand how the relative abundance of volatiles in green coffee varies between two harvesting years, focusing on identifying geographical origin from volatile profiles (GC-MS) and Linear Discriminant Analysis (LDA). We analyzed samples of green Coffea arabica from Africa, Central America, and South America harvested in 2018, and samples from the same growing areas harvested in 2019. A total of 215 different volatiles were detected. Based on their chemical structure and functional group, they were divided into the following categories: furan derivatives, aldehydes, ketones, alcohols, organic acids and their esters, hydrocarbons (alkanes, alkenes, alkynes, aromatic hydrocarbons), terpenoids, heterocyclic compounds, nitriles, and amines. Green Arabica contained mostly organic acids and esters, aldehydes, hydrocarbons, and alcohols. Comparing the volatile profiles of African coffee beans collected in 2018 and 2019, we observed significant differences in aromatic hydrocarbons and furan derivatives. The profile of South American samples was homogeneous across both years, with no significant differences observed. The aroma profile of Central American coffees differed significantly in aromatic hydrocarbons and alkanes (p-value < 0.05). When LDA was applied, Rao's approximation and Bartlett's test indicated a significant difference between the three continents. More than 94% of the variability between African, Central American, and South American coffees harvested in 2018 was explained by organic acids and esters, alkenes, aldehydes, and ketones. After the 2019 samples were added, LDA reduced the input parameters to ketones, organic acids and esters, alkenes, terpenoids, and aldehydes. These appear to be useful for geographical authentication regardless of the harvesting year.


Determination of volatile compounds using gas chromatography-mass spectrometry (GC-MS)
Twenty grams of homogenized green coffee beans were placed in 40 mL glass vials sealed with Archon caps with PTFE/silicone septa. The samples were warmed to 35 °C for 15 minutes in a metal thermoblock (Liebisch Labortechnik). Headspace sorption was then performed with a 2 cm Carboxen®/PDMS (CAR/PDMS) fiber for 30 minutes at 35 °C, followed by GC-MS analysis. Volatiles were determined according to the method previously described by Sádecká et al. (2014), with modifications. An Agilent Technologies 6890 gas chromatograph (Agilent Technologies, Palo Alto, USA) equipped with an Agilent Technologies 5973 mass selective detector (MSD) was used. Volatiles were separated on a J&W 122-7333 DB-WAXetr capillary column (30 m × 0.25 mm × 0.5 μm). Helium was used as the carrier gas. The injector temperature was set to 250 °C. The oven temperature was held isothermal at 50 °C for 1 minute and then ramped to 250 °C at 5 °C min⁻¹. Inlet parameters were as follows: splitless mode, initial temperature 250 °C, pressure 88.9 kPa, flow rate 20.0 mL min⁻¹, purge time 1.00 min, and total flow rate 24.6 mL min⁻¹. Electron ionization (EI) was set to 70 eV. The transfer line and ion source temperatures were both 280 °C. The mass spectrometer acquired data in full scan mode. Compounds were identified by comparing their mass spectra and chromatographic data with reference spectra in the NIST 14 library.
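The oven program stated above fixes the chromatographic run time: a 1-minute hold plus a (250 - 50)/5 = 40-minute ramp. The short sketch below makes that arithmetic explicit. The helper name is our own, and it models only the single hold and single ramp given in the method (no final hold or cool-down, which the method does not specify).

```python
def oven_run_time(initial_temp_c, hold_min, final_temp_c, rate_c_per_min):
    """Total GC oven program time (min) for one isothermal hold followed by one linear ramp."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    ramp_min = (final_temp_c - initial_temp_c) / rate_c_per_min
    return hold_min + ramp_min

# Program from the method: 50 °C held for 1 min, then 5 °C/min up to 250 °C
total = oven_run_time(50, 1, 250, 5)
print(f"Total run time: {total:.0f} min")  # Total run time: 41 min
```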

Statistical Analysis
Descriptive statistics were used to summarize and describe the results. To identify significant differences within the selected geographical groups of green coffee beans, ANOVA with Duncan's test and the REGWQ test was applied, using Microsoft Office Excel 365 for iOS. Linear Discriminant Analysis was used to build a model able to determine the geographical origin of single-origin green Coffea arabica, using XLSTAT for iOS (Addinsoft) and Tanagra (Lyon 2 University, Berges du Rhône Campus).
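As an illustration of the ANOVA step, the sketch below computes the one-way ANOVA F statistic from scratch with Python's standard library. It is not the Excel workflow used in the study, it covers only the omnibus F test (Duncan's and REGWQ post hoc comparisons are not shown), and the abundance values are hypothetical.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across the given groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical relative abundances (%) of one chemical group in two harvest years
year_2018 = [1.0, 2.0, 3.0]
year_2019 = [4.0, 5.0, 6.0]
f_stat, df_b, df_w = one_way_anova(year_2018, year_2019)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(1, 4) = 13.50
```

The p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom; if SciPy is available, `scipy.stats.f_oneway` returns both the F statistic and the p-value directly.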

Profile of volatiles in green coffee harvested in 2018
The detected volatiles were divided into several categories based on their chemical composition and functional group: furan derivatives, aldehydes, alcohols, organic acids and their esters, terpenoids, hydrocarbons (subdivided into alkanes, alkenes, alkynes, and aromatic hydrocarbons), heterocyclic compounds, nitriles, ketones, and amines. In the 2018 African samples, the most abundant groups were alcohols, organic acids and their esters, aldehydes, terpenoids, and aromatic hydrocarbons. These observations agree with Tsegay et al. (2019), who reported that the most abundant groups in green coffee are alcohols and organic acids. Among organic acids and esters, the compounds with the highest concentrations were acetic acid; hexanoic acid; 2-butenoic acid, 3-methyl-; pentanoic acid; and ethyl acetate. The most abundant alcohols were ethanol; 1-pentanol; 1-hexanol; maltol; and 2,3-butanediol. Because furan is considered a potential carcinogen and is subject to strict legislation, furan derivatives were evaluated separately. In the 2018 African samples, their concentration ranged from 3.26 to 12.19%. The highest value by far was detected in a sample from Burundi, in which furfural; 2-methylfuran; 3-methylfuran; and 2-furanmethanol were the most abundant. In contrast, the furan derivative concentrations of African samples harvested in 2019 differed significantly (p-value < 0.05), ranging from 3.26 to 5.61%; however, the highest concentration that year was again measured in the Burundi sample. The concentration of aromatic hydrocarbons was significantly higher in African coffees in 2019. In 2018, GC-MS detected mainly benzene, 1,3-dimethyl-; toluene; and o-xylene, and only in minor concentrations. In 2019, by contrast, this group reached 19.81%, consisting mainly of methylbenzene. Average concentrations of each chemical group are shown in Figure 2.
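The group contents above are reported as percentages, but the method does not state how they were derived. A common approach, assumed here, is area normalization: each compound's peak area is divided by the total peak area of the chromatogram, and the resulting shares are summed per chemical category. The function name and the peak areas below are hypothetical.

```python
from collections import defaultdict

def category_percentages(peak_areas, category_of):
    """Return the share (%) of total peak area contributed by each chemical category.

    peak_areas: compound name -> integrated peak area (arbitrary units)
    category_of: compound name -> chemical category; unmapped compounds go to "unassigned"
    """
    total_area = sum(peak_areas.values())
    if total_area <= 0:
        raise ValueError("total peak area must be positive")
    shares = defaultdict(float)
    for compound, area in peak_areas.items():
        shares[category_of.get(compound, "unassigned")] += 100.0 * area / total_area
    return dict(shares)

# Hypothetical peak areas for one sample (not measured values)
areas = {"acetic acid": 400.0, "ethyl acetate": 100.0, "ethanol": 300.0, "furfural": 200.0}
categories = {
    "acetic acid": "organic acids and esters",
    "ethyl acetate": "organic acids and esters",
    "ethanol": "alcohols",
    "furfural": "furan derivatives",
}
result = category_percentages(areas, categories)
print(result)
```

Averaging such per-sample category shares within each origin and year gives group means of the kind compared in the figures.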

[Figure: Volatiles profile of green African coffee harvested in 2018 and 2019 (panels A and B)]

Within South American green coffee, none of the selected chemical groups differed significantly (p-value > 0.05) between the two harvesting years (Figure 3). Procida et al. (2020) profiled the volatiles of green Arabica coffee from various continents, including South America. They reported that 4-methyl-2,3-dihydrofuran, n-hexanol, limonene, and nonanal appear to be involved in characterizing the geographical origin of the analyzed samples. Apart from 4-methyl-2,3-dihydrofuran, these compounds were also identified in our samples. However, a direct comparison is not possible, because that study did not examine whether these compounds remain robust indicators of geographical origin across two harvesting years. The last group examined was Central America. As with the African samples, ANOVA (REGWQ) showed significant differences (p-value < 0.05) in the average values of aromatic hydrocarbons. In 2018, this group reached its highest concentration in Costa Rica (17.67%), mainly methylbenzene (11.19%), and its lowest in Guatemala (0.30%). In 2019, Guatemala showed a similar value (0.71%), but Costa Rica contained only 0.37%, with methylbenzene present only in minor concentrations. Another significant difference was found in alkanes. Samples from 2018 contained mainly pentane, 1-chloro-; butane, 1-chloro-; and undecane, whereas samples from 2019 contained undecane, 5-ethyl-; decane; and cyclopropane, 1,1-dimethyl-. Average values of this group are shown in Figure 5. Importantly, all three geographical groups showed the same overall pattern in their volatile profiles regardless of the harvesting year.
The highest concentrations were measured for alcohols and for organic acids and esters, while the other groups reached relatively similar concentrations. This suggests that the volatile profile may be stable enough to be combined with advanced statistical methods to build a model for identifying the geographical origin of green Coffea arabica. First, applying LDA to the volatile profiles of samples harvested in 2018, we found that Bartlett's test was significant (p-value < 0.05). This test indicates the significance of the model, that is, how well each discriminant function separates cases into groups, and indicates that the within-class covariance matrices differ. Moreover, two factors were sufficient to explain 100% of the variability between the three geographical groups; this is expected, because with three groups LDA yields at most two discriminant functions. The variables representation shows which initial variables (groups of volatiles) are correlated with these factors. Figure 6 shows that the F1 factor, correlated with alcohols, organic acids, ketones, aldehydes, and alkenes, explains 94.22% of the variability. The remaining variables correlate with the F2 factor, which accounts for the other 5.78%. Furthermore, LDA performed in Tanagra identified the most significant input parameters as organic acids and esters (p-value = 0.07), alkenes (p-value = 0.0144), and ketones (p-value = 0.00025). These data indicate that the groups are discriminated and the continents are well separated; the African samples in particular are clearly separated. However, the LDA map (Figure 6) shows that the Central and South American centroids lie close to each other, which may be explained by the geographical proximity of these regions and their similar climatic and environmental conditions and growing altitudes.
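The factor shares above follow from the LDA eigenvalues: with g groups and p variables, LDA yields at most min(g - 1, p) discriminant functions, and each function's share of the variability is its eigenvalue divided by the sum of all eigenvalues. The sketch below computes these shares together with Wilks' lambda and Bartlett's chi-square approximation, assuming that the "Bartlett's test" reported here is this common approximation (XLSTAT's exact implementation may differ, and Rao's F approximation is not shown). The eigenvalues and sample sizes are hypothetical, not the study's values.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); exact closed form, valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # accumulates (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def discriminant_summary(eigenvalues, n_samples, n_variables, n_groups):
    """Explained variability, Wilks' lambda and Bartlett's chi-square for LDA eigenvalues."""
    max_functions = min(n_groups - 1, n_variables)
    if len(eigenvalues) > max_functions:
        raise ValueError(f"LDA yields at most {max_functions} discriminant functions")
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    wilks_lambda = math.prod(1.0 / (1.0 + ev) for ev in eigenvalues)
    # Bartlett: V = -(n - 1 - (p + g) / 2) * ln(Lambda), chi-square with p * (g - 1) df
    v_stat = -(n_samples - 1 - (n_variables + n_groups) / 2) * math.log(wilks_lambda)
    df = n_variables * (n_groups - 1)
    return proportions, wilks_lambda, v_stat, df, chi2_sf_even_df(v_stat, df)

# Hypothetical eigenvalues for a three-group problem (not the study's values)
props, wilks, v, df, p_value = discriminant_summary([3.0, 1.0], n_samples=20, n_variables=5, n_groups=3)
print([round(s, 4) for s in props], wilks, round(v, 2), df, p_value)
```

With three groups, df = 2p is always even, which is why a closed-form chi-square tail suffices in this sketch; for other designs with odd df, a general routine such as `scipy.stats.chi2.sf` would be needed.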
To assess the effect of the harvesting year on identifying the geographical origin of green coffee, we added samples harvested in 2019 to the LDA model. In this case, Rao's approximation and Bartlett's test were significant (p-value < 0.05), indicating that the model could distinguish the selected continents at a significant level. As in 2018, the most significant parameter was ketones (p-value < 0.05). However, with the 2019 samples included, organic acids, alcohols, and alkenes no longer reached the same level of significance; they were replaced by aldehydes (p-value = 0.045) and terpenoids (p-value = 0.048). This indicates that the harvesting year, or more precisely its climatic conditions, may affect the volatile profile of green Coffea arabica. Bertrand et al. (2012) likewise reported that climate change, which generally involves a substantial increase in average temperatures in mountainous tropical regions, could affect the quality of coffee aroma and thus its volatiles. To quantify the model's performance, and given the small number of samples, leave-one-out cross-validation was used: each sample is left out in turn, the model is built from the remaining samples, and the left-out sample is then classified, so the procedure is repeated once per sample. As output, the confusion matrix for the training samples, the cross-validation of prior and posterior classification, and the membership probabilities are calculated. The confusion matrix tabulates observed against predicted groups and thus summarizes classification accuracy. In our study, the confusion matrix for the training samples reached 100% accuracy of origin identification regardless of the harvesting year; that is, LDA correctly assigned all training samples from all three continents.
However, the cross-validation of prior and posterior classification and the membership probabilities revealed misclassifications: two African samples were identified as South American coffee, one South American sample was identified as African coffee, and one Central American sample was identified as South American coffee. As a result, the predicted overall accuracy of the model decreased to 77.78% (Table 2). A predicted overall accuracy of 88.89% is reported in Table 3.
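The leave-one-out procedure described above can be sketched from scratch as follows. This is a minimal, hypothetical implementation of the standard linear discriminant classification rule (pooled within-class covariance, priors equal to class proportions) wrapped in a leave-one-out loop that fills a confusion matrix. XLSTAT and Tanagra may use different defaults, and the two-variable data are synthetic, not the study's measurements.

```python
import math
from collections import Counter, defaultdict

def solve_linear(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def fit_lda(samples, labels):
    """Return {class: (weights, bias)} for the linear discriminant score x.w + bias."""
    by_class = defaultdict(list)
    for x, label in zip(samples, labels):
        by_class[label].append(x)
    n, p, g = len(samples), len(samples[0]), len(by_class)
    means = {c: [sum(col) / len(xs) for col in zip(*xs)] for c, xs in by_class.items()}
    # Pooled within-class covariance matrix (divisor n - g)
    pooled = [[0.0] * p for _ in range(p)]
    for c, xs in by_class.items():
        for x in xs:
            d = [xi - mi for xi, mi in zip(x, means[c])]
            for i in range(p):
                for j in range(p):
                    pooled[i][j] += d[i] * d[j] / (n - g)
    model = {}
    for c, xs in by_class.items():
        w = solve_linear(pooled, means[c])  # Sigma^-1 mu_c
        bias = -0.5 * sum(wi * mi for wi, mi in zip(w, means[c])) + math.log(len(xs) / n)
        model[c] = (w, bias)
    return model

def predict(model, x):
    """Assign x to the class with the highest linear discriminant score."""
    return max(model, key=lambda c: sum(wi * xi for wi, xi in zip(model[c][0], x)) + model[c][1])

def leave_one_out(samples, labels):
    """Classify each sample with a model fitted on all the other samples."""
    confusion = Counter()  # keys: (observed, predicted)
    for i in range(len(samples)):
        model = fit_lda(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        confusion[(labels[i], predict(model, samples[i]))] += 1
    correct = sum(k for (obs, pred), k in confusion.items() if obs == pred)
    return confusion, correct / len(samples)

# Synthetic two-variable data (e.g. hypothetical % of two volatile groups)
X = [(1.0, 5.0), (1.5, 5.5), (1.2, 4.6), (0.8, 5.2),
     (4.0, 2.0), (4.4, 2.6), (3.7, 2.2), (4.2, 1.7),
     (7.0, 6.0), (7.3, 6.4), (6.6, 5.8), (7.1, 6.7)]
y = ["Africa"] * 4 + ["Central America"] * 4 + ["South America"] * 4
confusion, accuracy = leave_one_out(X, y)
print(f"Leave-one-out accuracy: {accuracy:.2%}")
for (observed, predicted), count in sorted(confusion.items()):
    print(f"{observed} -> {predicted}: {count}")
```

In practice, scikit-learn's `LinearDiscriminantAnalysis` combined with `LeaveOneOut` and `cross_val_predict` performs the same loop; the from-scratch version only makes explicit that each left-out sample is classified by a model that never saw it, which is why cross-validated accuracy can fall below the 100% obtained on the training samples.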

CONCLUSION
Given coffee's popularity, geographical authentication of green beans is much needed in the coffee industry. Our research showed that GC-MS analysis of volatiles provides sufficient information for further statistical analysis. Comparing the volatile profiles of African coffee beans collected in 2018 and 2019, we observed significant differences in aromatic hydrocarbons and furan derivatives. The profile of South American coffee beans was homogeneous across the two years, with no significant differences observed. The aroma profile of Central American coffees showed significant differences in aromatic hydrocarbons and alkanes (p-value < 0.05). LDA was used to create a model suitable for geographical origin identification. The samples from 2018 suggest that the most significant volatile parameters are organic acids and esters, alkenes, and ketones. According to LDA, these parameters explained most of the variability between African, South American, and Central American coffee. Coffee is a biological material, so the impact of the harvesting year must be considered. Therefore, samples from the same growing areas harvested in 2019 were added to the LDA model. The confusion matrix showed that the training samples were correctly identified in all geographical groups regardless of the harvesting year. However, leave-one-out cross-validation revealed several misclassified samples that lowered the model's overall accuracy. Furthermore, LDA allowed us to reduce the input parameters and identified organic acids and esters, alkenes, terpenoids, aldehydes, and ketones as the most useful for geographical authentication regardless of the harvesting year. The results showed that the harvesting year might affect the volatile profile of coffee. More importantly, samples from both years suggest that ketones were the most significant parameter for geographical origin identification regardless of the year.
Therefore, a more detailed investigation of ketones in green coffee is needed, as this group may contain individual markers suitable for origin identification that could simplify the whole identification process.