internet as data source in the istat survey on ict in enterprises
The Istat sampling survey on ICT in enterprises aims at producing information on
the use of ICT and in particular on the use of Internet by Italian enterprises for various purposes (e-commerce, e-recruitment, advertisement, e-tendering, e-procurement, egovernment). To such a scope, data are collected by means of the traditional instrument of the questionnaire. Istat began to explore the possibility to use web scraping techniques, associated, in the estimation phase, to text and data mining algorithms, with the aim to replace traditional instruments of data collection and estimation, or to combine them in an integrated approach. The 8,600 websites, indicated by the 19,000 enterprises responding to ICT survey of year 2013, have been scraped and the acquired texts have been processed in order to try to reproduce the same information collected via questionnaire. Preliminary results are encouraging, showing in some cases a satisfactory predictive capability of fitted
models (mainly those obtained by using the Naive Bayes algorithm). Also the method known as Content Analysis has been applied, and its results compared to those obtained with classical learners. In order to improve the overall performance, an advanced system for scraping and mining is being adopted, based on the open source Apache suite Nutch-Solr-Lucene. On the basis of the nal results of this test, an integrated system harnessing both survey data and data collected from Internet to produce the required estimates will be implemented, based on systematic scraping of the near 100,000 websites related to the whole population of Italian enterprises with 10 persons employed and more, operating in industry and services. This new approach, based on Internet as Data source (IaD), is characterized by advantages and drawbacks that need to be carefully analysed.
Reference Key |
barcaroli2015austrianinternet
Use this key to autocite in the manuscript while using
SciMatic Manuscript Manager or Thesis Manager
|
---|---|
Authors | ;Guilio Barcaroli;Alessandra Nurra;Sergio Salamone;Monica Scannapieco;Marco Scarnò;Donato Summa |
Journal | austrian journal of statistics |
Year | 2015 |
DOI | 10.17713/ajs.v44i2.53 |
URL | |
Keywords |
Citations
No citations found. To add a citation, contact the admin at info@scimatic.org
Comments
No comments yet. Be the first to comment on this article.