Data Mining the SDSS SkyServer Database

From Simple Sci Wiki
Revision as of 04:08, 24 December 2023 by SatoshiNakamoto

Title: Data Mining the SDSS SkyServer Database

Abstract: This research built a database and interfaces to support both the query load and ad-hoc access for the Sloan Digital Sky Survey (SDSS) data. The paper describes the database design, the data loading pipeline, and the implementation and performance of the queries. Each query was translated into a single SQL statement, and most ran in less than 20 seconds, allowing scientists to explore the database interactively.

Introduction: The Sloan Digital Sky Survey (SDSS) is a five-year survey that uses a ground-based telescope to observe about 200 million objects in five optical bands. The survey also measures the spectra of a million objects, producing a large, high-quality catalog of the northern sky and a small stripe of the southern sky. The raw telescope data are processed by a data analysis pipeline at Fermilab, which analyzes the images and extracts attributes for each celestial object. The pipeline also processes the spectra, extracting absorption and emission lines and other attributes. It is a significant part of the SDSS project, accounting for approximately 25% of the total cost and effort.

The research focused on building a database and interfaces that support both the query load and ad-hoc access to the SDSS data. The paper covers the database design and the data loading pipeline, along with query implementation and performance. The queries were designed to be efficient and user-friendly, so that scientists can explore the data interactively.

Results: The research produced a working database and interface system that supports both the query load and ad-hoc access. Each query was translated into a single SQL statement, and most ran in less than 20 seconds, which is fast enough for scientists to explore the database interactively. The system was designed to be efficient and user-friendly, making it easier for scientists to analyze the data.
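To make the idea of "one query, one SQL statement" concrete, here is a minimal sketch in Python. It is not taken from the paper: the table name `photo_obj`, its columns, the sample rows, and the helper functions are all hypothetical, and the in-memory SQLite database is used only so the example runs anywhere. It shows a typical exploratory question (which objects in a small patch of sky are redder than a given g - r color?) expressed as a single SQL statement and timed.

```python
import sqlite3
import time


def build_toy_catalog(conn):
    """Create a tiny, made-up photometric table (NOT the real SkyServer schema)."""
    conn.execute(
        "CREATE TABLE photo_obj ("
        " obj_id INTEGER PRIMARY KEY,"
        " ra_deg REAL, dec_deg REAL,"           # sky position in degrees
        " u REAL, g REAL, r REAL, i REAL, z REAL)"  # magnitudes in five bands
    )
    rows = [  # invented values, for illustration only
        (1, 180.10, 0.20, 19.8, 18.5, 17.9, 17.6, 17.4),
        (2, 180.30, -0.10, 22.1, 21.0, 20.7, 20.5, 20.4),
        (3, 185.00, 1.50, 18.2, 17.9, 17.8, 17.8, 17.7),
        (4, 180.25, 0.05, 20.4, 19.0, 18.3, 18.0, 17.8),
    ]
    conn.executemany(
        "INSERT INTO photo_obj VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )


def red_objects_in_box(conn, ra_min, ra_max, dec_min, dec_max, min_g_minus_r):
    """Answer the whole question with one SQL statement."""
    sql = (
        "SELECT obj_id, g - r AS g_minus_r FROM photo_obj"
        " WHERE ra_deg BETWEEN ? AND ?"
        " AND dec_deg BETWEEN ? AND ?"
        " AND g - r > ?"
        " ORDER BY obj_id"
    )
    params = (ra_min, ra_max, dec_min, dec_max, min_g_minus_r)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
build_toy_catalog(conn)

start = time.perf_counter()
result = red_objects_in_box(conn, 180.0, 180.5, -0.5, 0.5, 0.5)
elapsed = time.perf_counter() - start

print([obj_id for obj_id, _ in result], f"{elapsed:.4f} s")
```

On a four-row table the timing is of course meaningless; the point is only the shape of the workflow, where the position cut, the color cut, and the ordering all live in one statement that the database engine can optimize as a whole.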

Conclusion: This research developed a database and interface system that supports both the query load and ad-hoc access for the SDSS data. The system is efficient and user-friendly, making it easier for scientists to analyze the data and explore the database interactively. The work matters for astronomy because it provides a model for data management and analysis that can be applied to other large-scale astronomical surveys.

Link to Article: https://arxiv.org/abs/0202014v1

Authors:

arXiv ID: 0202014v1