Pro Python Data Wrangling
Author: Wes McKinney

Pro Python Data Wrangling is about turning the messy data we have to work with in the real world into a clean dataset. Missing or incomplete data, varying time series, and inconsistent definitions are just a few of the things that can turn data into a mess. Data wrangling is the process of resolving these problems, and this book shows you the techniques you need, as a Python developer, to resolve the data messes you're likely to encounter and transform them into clean datasets. So whether you're facing missing data with gaps that need to be imputed or estimated, severe inconsistencies across datasets, or time series that don't seem to want to talk to each other, this book shows you the creative ways you can work with your data to get back to a clean dataset.
The good news is that, as a Python programmer, you have a wealth of libraries at your disposal for wrangling data. This book shows you how to analyze patterns in your datasets, whether the domain is large or small. You will learn how to use Python's language features and libraries to manipulate and analyze data: first without recourse to libraries outside the standard Python distribution, and then with a wider set of libraries should your wrangling needs demand it.
Along the way, you'll discover two of the richest Python libraries a data wrangler could wish for: SciPy and NumPy. You don't need a degree in statistics or mathematics to use these libraries for everyday data analysis, but knowing their data structures and fundamental algorithms helps enormously as you wrangle your data.
You'll also learn how to parse and store data optimally, and which databases to use for large datasets. The choice between relational and NoSQL databases is examined carefully as you consider your data wrangling options. Large datasets also impose their own requirements, so the book also covers how to design new libraries and how to optimize them.

What you'll learn
- What data wrangling is, and the Python resources available for it
- How to establish a clean dataset from messy data
- How to use Python's built-in features for data analysis
- How to make the Python standard library useful for large datasets
- How to dive into NumPy and SciPy without being a statistician
- Data storage, parsing, and serialization
- How to cope with NoSQL databases, and when to use relational databases
- Designing and optimizing Python code for large datasets

Who this book is for
This is a book for all Python programmers who are interested in analyzing data, whatever its source. If you're dealing with data and having some problems, you're probably already data wrangling, and this book will show you all the data wrangling tips you need.