Data Science at the Command Line

Facing the Future with Time-Tested Tools

Author: Jeroen Janssens

Publisher: "O'Reilly Media, Inc."


Category: Computers

Page: 212

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, whether you’re on Windows, OS X, or Linux, author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools. Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

Obtain data from websites, APIs, databases, and spreadsheets
Perform scrub operations on plain text, CSV, HTML/XML, and JSON
Explore data, compute descriptive statistics, and create visualizations
Manage your data science workflow using Drake
Create reusable tools from one-liners and existing Python or R code
Parallelize and distribute data-intensive pipelines using GNU Parallel
Model data with dimensionality reduction, clustering, regression, and classification algorithms

Hands-On Data Science with the Command Line

Automate everyday data science tasks using command-line tools

Author: Jason Morris

Publisher: Packt Publishing Ltd


Category: Computers

Page: 124

Big data processing and analytics at speed and scale using command-line tools.

Key Features
Perform string processing, numerical computations, and more using CLI tools
Understand the essential components of the data science development workflow
Automate data pipeline scripts and visualization with the command line

Book Description
The command line has been in existence on UNIX-based OSes in the form of the Bash shell for over 3 decades. However, few developers know how command-line tools can be OSEMN (pronounced "awesome" and standing for Obtaining, Scrubbing, Exploring, Modeling, and iNterpreting data) for carrying out simple-to-advanced data science tasks at speed. This book starts with the requisite concepts and installation steps for carrying out data science tasks using the command line. You will learn to create a data pipeline to solve the problem of working with small- to medium-sized files on a single machine. You will understand the power of the command line, learn how to edit files using a text-based and an. You will not only learn how to automate jobs and scripts, but also learn how to visualize data using the command line. By the end of this book, you will have learned how to speed up the process and perform automated tasks using command-line tools.

What you will learn
Understand how to set up the command line for data science
Use AWK programming language commands to search quickly in large datasets
Work with files and APIs using the command line
Share and collect data with CLI tools
Perform visualization with commands and functions
Uncover machine-level programming practices with a modern approach to data science

Who this book is for
This book is for data scientists and data analysts who have little to no knowledge of the command line but do have an understanding of data science. Perform everyday data science tasks using the power of command-line tools.

Algorithms for Data Science

Author: Brian Steele

Publisher: Springer


Category: Computers

Page: 430

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses. This book has three parts:
(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter.
(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in utilizing the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.
(c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.

Data Science from Scratch

First Principles with Python

Author: Joel Grus

Publisher: O'Reilly Media


Category: Computers

Page: 406

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. With this updated second edition, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.

The FAST Mission

Author: Robert F. Pfaff

Publisher: Springer Science & Business Media


Category: Science

Page: 219

The FAST Mission contains a detailed discussion of the design philosophy of a new breed of satellite to measure particles and fields in the magnetosphere. It is the only publicly available resource to provide complete and authoritative documentation of the FAST satellite and instruments. The book contains detailed examples and descriptions of data gathered by the satellite's instruments and will thus be an invaluable source to those working with results from this new observatory. FAST's "snapshot" data-gathering approach, which uses an onboard computer to recognize acceleration physics events and store them in the onboard "burst memory," has revolutionized our understanding of auroral microphysics. Such unique capabilities are described in full in The FAST Mission. The information included herein is unique and not available elsewhere. The book is intended for space physics researchers as well as satellite engineers.

Numerical Methods for the Life Scientist

Binding and Enzyme Kinetics Calculated with GNU Octave and MATLAB

Author: Heino Prinz

Publisher: Springer Science & Business Media


Category: Science

Page: 149

Enzyme kinetics, binding kinetics, and pharmacological dose-response curves are currently analyzed by a few standard methods. Some of these, like Michaelis-Menten enzyme kinetics, use plausible approximations; others, like Hill equations for dose-response curves, are outdated. Calculating realistic reaction schemes requires numerical mathematical routines, which are usually not covered in life science curricula. This textbook will give a step-by-step introduction to numerical solutions of non-linear and differential equations. It will be accompanied by a set of programs to calculate any reaction scheme on any personal computer. Typical examples from analytical biochemistry and pharmacology can be used as versatile templates. When a reaction scheme is applied for data fitting, the resulting parameters may not be unique. Correlation of parameters will be discussed and simplification strategies will be offered.

Python Scripting for Computational Science

Author: Hans Petter Langtangen

Publisher: Springer Science & Business Media


Category: Computers

Page: 756

With a primary focus on examples and applications of relevance to computational scientists, this brilliantly useful book shows them how to develop tailored, flexible, and human-efficient working environments built from small scripts written in the easy-to-learn, high-level Python language. All the tools and examples in this book are open source codes. This third edition features lots of new material. It is also released after a comprehensive reorganization of the text. The author has inserted improved examples and tools, updated information, and corrected errors that crept into the first imprint.

The Science of Bombing

Operational Research in RAF Bomber Command

Author: Randall Thomas Wakelam

Publisher: University of Toronto Press


Category: History

Page: 384

After suffering devastating losses in the early stages of the Second World War, the United Kingdom's Royal Air Force established an Operational Research Section within Bomber Command in order to drastically improve the efficiency of bombing missions targeting Germany. In The Science of Bombing, Randall Wakelam explores the work of civilian scientists who found critical solutions to the navigational and target-finding problems and crippling losses that initially afflicted the RAF. Drawing on previously unexamined files that re-assess the efficacy of strategic bombing from tactical and technical perspectives, Wakelam reveals the important role scientific research and advice played in operational planning and the remarkable intellectual flexibility that existed at Bomber Command. A fascinating glimpse into military strategy and decision-making, The Science of Bombing will find a wide audience among those interested in air power history as well as military strategists, air force personnel, and aviation historians.

Web and Network Data Science

Modeling Techniques in Predictive Analytics

Author: Thomas W. Miller

Publisher: FT Press


Category: Computers

Page: 384

Master modern web and network data modeling: both theory and applications. In Web and Network Data Science, a top faculty member of Northwestern University’s prestigious analytics program presents the first fully-integrated treatment of both the business and academic elements of web and network modeling for predictive analytics. Some books in this field focus either entirely on business issues (e.g., Google Analytics and SEO); others are strictly academic (covering topics such as sociology, complexity theory, ecology, applied physics, and economics). This text gives today's managers and students what they really need: integrated coverage of concepts, principles, and theory in the context of real-world applications. Building on his pioneering Web Analytics course at Northwestern University, Thomas W. Miller covers usability testing, Web site performance, usage analysis, social media platforms, search engine optimization (SEO), and many other topics. He balances this practical coverage with accessible and up-to-date introductions to both social network analysis and network science, demonstrating how these disciplines can be used to solve real business problems.