当前位置:主页 > 资料 >

One process programming
栏目分类:资料   发布日期:2018-08-03   浏览次数:

导读:本文为去找网小编(www.7zhao.net)为您推荐的One process programming,希望对您有所帮助,谢谢! One process programming notes (with Go and SQLite) 2018 July 30 Blog-ified version of a talk I gave at . This content cover

本文为去找网小编(www.7zhao.net)为您推荐的One process programming,希望对您有所帮助,谢谢! www.7zhao.net



One process programming notes (with Go and SQLite)

2018 July 30 本文来自去找www.7zhao.net

Blog-ified version of a talk I gave at . 欢迎访问www.7zhao.net

This content covers my recent exploration of writing internet services, iOS apps, and macOS programs as an indie developer. 欢迎访问www.7zhao.net

There are several topics here that should each have their own blog post. But as I have a lot of programming to do I am going to put these notes up as is and split the material out some time later. 欢迎访问www.7zhao.net

My focus has been on how to adapt the lessons I have learned working in teams at Google to a single programmer building small business work. There are many great engineering practices in Silicon Valley's big companies and well-capitalized VC firms, but one person does not have enough bandwidth to use them all and write software. The exercise for me is: what to keep and what must go.

内容来自www.7zhao.net

If I have been doing it right, the technology and techniques described here will sound easy. I have to fit it all in my head while having enough capacity left over to write software people want. Every extra thing has great cost, especially rarely touched software that comes back to bite in the middle of the night six months later. 欢迎访问www.7zhao.net

Two key technologies I have decided to use are Go and SQLite.

去找(www.7zhao.net欢迎您

A brief introduction to SQLite

SQLite is an implementation of SQL. Unlike traditional database implementations like PostgreSQL or MySQL, SQLite is a self-contained C library designed to be embedded into programs. It has been built by D. Richard Hipp since its release in 2000, and in the past 18 years other open source contributors have helped. At this point it has been around most of the time I have been programming and is a core part of my programming toolbox. copyright www.7zhao.net

Hands-on with the SQLite command line tool

Rather than talk through SQLite in the abstract, let me show it to you.

本文来自去找www.7zhao.net

A kind person on Kaggle has of the plays of Shakespeare. Let's build an SQLite database out of it. 去找(www.7zhao.net欢迎您

$ head shakespeare_data.csv
"Dataline","Play","PlayerLinenumber","ActSceneLine","Player","PlayerLine"
"1","Henry IV",,,,"ACT I"
"2","Henry IV",,,,"SCENE I. London. The palace."
"3","Henry IV",,,,"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
"4","Henry IV","1","1.1.1","KING HENRY IV","So shaken as we are, so wan with care,"
"5","Henry IV","1","1.1.2","KING HENRY IV","Find we a time for frighted peace to pant,"
"6","Henry IV","1","1.1.3","KING HENRY IV","And breathe short-winded accents of new broils"
"7","Henry IV","1","1.1.4","KING HENRY IV","To be commenced in strands afar remote."
"8","Henry IV","1","1.1.5","KING HENRY IV","No more the thirsty entrance of this soil"
"9","Henry IV","1","1.1.6","KING HENRY IV","Shall daub her lips with her own children's blood," 
copyright www.7zhao.net

First, let's use the sqlite command line tool to create a new database and import the CSV.

欢迎访问www.7zhao.net

$ sqlite3 shakespeare.db
sqlite> .mode csv
sqlite> .import shakespeare_data.csv import 

copyright www.7zhao.net

Done! A couple of SELECTs will let us quickly see if it worked.

copyright www.7zhao.net

sqlite> SELECT count(*) FROM import;
111396
sqlite> SELECT * FROM import LIMIT 10;
1,"Henry IV","","","","ACT I"
2,"Henry IV","","","","SCENE I. London. The palace."
3,"Henry IV","","","","Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
4,"Henry IV",1,1.1.1,"KING HENRY IV","So shaken as we are, so wan with care,"
5,"Henry IV",1,1.1.2,"KING HENRY IV","Find we a time for frighted peace to pant,"
6,"Henry IV",1,1.1.3,"KING HENRY IV","And breathe short-winded accents of new broils"
7,"Henry IV",1,1.1.4,"KING HENRY IV","To be commenced in strands afar remote."
8,"Henry IV",1,1.1.5,"KING HENRY IV","No more the thirsty entrance of this soil"
9,"Henry IV",1,1.1.6,"KING HENRY IV","Shall daub her lips with her own children's blood," 

www.7zhao.net

Looks good! Now we can do a little cleanup. The original CSV contains a column called AceSceneLine that uses dots to encode Act number, Scene number, and Line number. Those would look much nicer as their own columns.

内容来自www.7zhao.net

sqlite> CREATE TABLE plays (rowid INTEGER PRIMARY KEY, play, linenumber, act, scene, line, player, text);
sqlite> .schema
CREATE TABLE import (rowid primary key, play, playerlinenumber, actsceneline, player, playerline);
CREATE TABLE plays (rowid primary key, play, linenumber, act, scene, line, player, text);
sqlite> INSERT INTO plays SELECT
	row AS rowid,
	play,
	playerlinenumber AS linenumber,
	substr(actsceneline, 1, 1) AS act,
	substr(actsceneline, 3, 1) AS scene,
	substr(actsceneline, 5, 5) AS line,
	player,
	playerline AS text
	FROM import; 

copyright www.7zhao.net

(The substr above can be improved by using instr to find the '.' characters. Exercise left for the reader.) 本文来自去找www.7zhao.net

Here we used the INSERT ... SELECT syntax to build a table out of another table. The ActSceneLine column was split apart using the builtin SQLite function substr , which slices strings.

本文来自去找www.7zhao.net

The result:

www.7zhao.net

sqlite> SELECT * FROM plays LIMIT 10;
1,"Henry IV","","","","","","ACT I"
2,"Henry IV","","","","","","SCENE I. London. The palace."
3,"Henry IV","","","","","","Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"
4,"Henry IV",1,1,1,1,"KING HENRY IV","So shaken as we are, so wan with care,"
5,"Henry IV",1,1,1,2,"KING HENRY IV","Find we a time for frighted peace to pant,"
6,"Henry IV",1,1,1,3,"KING HENRY IV","And breathe short-winded accents of new broils"
7,"Henry IV",1,1,1,4,"KING HENRY IV","To be commenced in strands afar remote."
8,"Henry IV",1,1,1,5,"KING HENRY IV","No more the thirsty entrance of this soil"
9,"Henry IV",1,1,1,6,"KING HENRY IV","Shall daub her lips with her own children's blood," 

www.7zhao.net

Now we have our data, let us search for something:

copyright www.7zhao.net

sqlite> SELECT * FROM plays WHERE text LIKE "whether tis nobler%";
sqlite> 内容来自www.7zhao.net 

That did not work. Hamlet definitely says that, but perhaps the text formatting is slightly off. SQLite to the rescue. It ships with a Full Text Search extension compiled in. Let us index all of Shakespeare with FTS5: copyright www.7zhao.net

sqlite> CREATE VIRTUAL TABLE playsearch USING fts5(playsrowid, text);
sqlite> INSERT INTO playsearch SELECT rowid, text FROM plays; 内容来自www.7zhao.net 

Now we can search for our soliloquy:

欢迎访问www.7zhao.net

sqlite> SELECT rowid, text FROM playsearch WHERE text MATCH "whether tis nobler";
34232|Whether 'tis nobler in the mind to suffer 
copyright www.7zhao.net

Success! The act and scene can be acquired by joining with our original table. copyright www.7zhao.net

sqlite> SELECT play, act, scene, line, player, plays.text
	FROM playsearch
	INNER JOIN plays ON playsearch.playsrowid = plays.rowid
	WHERE playsearch.text MATCH "whether tis nobler";
Hamlet|3|1|65|HAMLET|Whether 'tis nobler in the mind to suffer 去找(www.7zhao.net欢迎您 

Let's clean up. 欢迎访问www.7zhao.net

sqlite> DROP TABLE import;
sqlite> VACUUM; 内容来自www.7zhao.net 

Finally, what does all of this look like on the file system?

copyright www.7zhao.net

$ ls -l
<a href="/cdn-cgi/l/email-protection" data-cfemail="577a25202f257a2f257a2f17">[email protected]</a> 1 crawshaw  staff  10188854 Apr 27  2017 shakespeare_data.csv
-rw-r--r--  1 crawshaw  staff  22286336 Jul 25 22:05 shakespeare.db 去找(www.7zhao.net欢迎您 

There you have it. The SQLite database contains two full copies of the plays of Shakespeare, one with a full text search index, and stores both of them in about twice the space it takes the original CSV file to store one. Not bad.

copyright www.7zhao.net

That should give you a feel for the i-t-e of SQLite.

copyright www.7zhao.net

And scene.

去找(www.7zhao.net欢迎您

Using SQLite from Go

The standard database/sql

There are a number of cgo-based drivers available for SQLite. The most popular one appears to be . It gets the job done and is probably what you want. 去找(www.7zhao.net欢迎您

Using the database/sql package it is straightforward to open an SQLite database and execute SQL statements on it. For example, we can run the FTS query from earlier using this Go code:

本文来自去找www.7zhao.net

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "shakespeare.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	stmt, err := db.Prepare(`
		SELECT play, act, scene, plays.text
		FROM playsearch
		INNER JOIN plays ON playsearch.playrowid = plays.rowid
		WHERE playsearch.text MATCH ?;`)
	if err != nil {
		log.Fatal(err)
	}
	var play, text string
	var act, scene int
	err = stmt.QueryRow("whether tis nobler").Scan(&play, &act, &scene, &text)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s %d:%d: %q\n", play, act, scene, text)
} 

copyright www.7zhao.net

Executing it yields: 去找(www.7zhao.net欢迎您

Hamlet 3:1 "Whether 'tis nobler in the mind to suffer" 
内容来自www.7zhao.net

A low-level wrapper: crawshaw.io/sqlite

Just as SQLite steps beyond the basics of SELECT, INSERT, UPDATE, DELETE with full-text search, it has several other interesting features and extensions that cannot be accessed by SQL statements alone. These need specialized interfaces, and many of the interfaces are not supported by any of the existing drivers.

内容来自www.7zhao.net

So I wrote my own. You can get it fromcrawshaw.io/sqlite. In particular, it supports the streaming blob interface, the , and implements the necessary sqlite_unlock_notify machinery to make good use of the for connection pools. I am going to cover these features through two use case studies: the client and the cloud. 欢迎访问www.7zhao.net

cgo

All of these approaches rely on cgo for integrating C into Go. This is straightforward to do, but adds some operational complexity. Building a Go program using SQLite requires a C compiler for the target.

内容来自www.7zhao.net

In practice, this means if you develop on macOS you need to install a cross-compiler for linux. 本文来自去找www.7zhao.net

Typical concerns about the impact on software quality of adding C code to Go do not apply to SQLite as it has an extraordinary degree of testing. The quality of the code is exceptional. 欢迎访问www.7zhao.net

Go and SQLite for the client

I am building an , with almost all the code written in Go and the UI provided by a web view. This app has a full copy of the user data, it is not a thin view onto an internet server. This means storing a large amount of local, structured data, on-device full text searching, background tasks working on the database in a way that does not disrupt the UI, and syncing DB changes to a backup in the cloud. 欢迎访问www.7zhao.net

That is a lot of moving parts for a client. More than I want to write in JavaScript, and more than I want to write in Swift and then have to promptly rewrite if I ever manage to build an Android app. More importantly, the server is in Go, and I am one independent developer. It is absolutely vital I reduce the number of moving pieces in my development environment to the smallest possible number. Hence the effort to build (the big bits) of a client using the exact same technology as my server. 本文来自去找www.7zhao.net

The Session extension

The session extension lets you start a session on an SQLite connection. All changes made to the database through that connection are bundled into a patchset blob. The extension also provides method for applying the generated patchset to a table. 内容来自www.7zhao.net

func (conn *Conn) CreateSession(db string) (*Session, error)

func (s *Session) Changeset(w io.Writer) error

func (conn *Conn) ChangesetApply(
	r          io.Reader,
	filterFn   func(tableName string) bool,
	conflictFn func(ConflictType, ChangesetIter) ConflictAction,
) error 

copyright www.7zhao.net

This can be used to build a very simple client-sync system. Collect the changes made in a client, periodically bundle them up into a changeset and upload it to the server where it is applied to a backup copy of the database. If another client changes the database then the server advertises it to the client, who downloads a changeset and applies it.

去找(www.7zhao.net欢迎您

This requires a bit of care in the database design. The reason I kept the FTS table separate in the Shakespeare example is I keep my FTS tables in a separate attached database (which in SQLite, means a different file). The cloud backup database never generates the FTS tables, the client is free to generate the tables in a background thread and they can lag behind data backups. 去找(www.7zhao.net欢迎您

Another point of care is minimizing conflicts. The biggest one is AUTOINCREMENT keys. By default the primary key of a rowid table is incremented, which means if you have multiple clients generating rowids you will see lots of conflicts.

www.7zhao.net

I have been trialing two different solutions. The first is having each client register a rowid range with the server and only allocate from its own range. It works. The second is randomly generating int64 values, and relying on the low collision rate. So far it works too. Both strategies have risks, and I haven't decided which is better.

www.7zhao.net

In practice, I have found I have to limit DB updates to a single connection to keep changeset quality high. (A changeset does not see changes made on other connections.) To do this I maintain a read-only pool of connections and a single guarded read-write connection in a pool of 1. The code only grabs the read-write connection when it needs it, and the read-only connections are enforced by the read-only bit on the SQLite connection. www.7zhao.net

Nested Transactions

The database/sql driver encourages the use of SQL transactions with its Tx type, but this does not appear to play well with nested transactions. This is a concept implemented by SAVEPOINT / RELEASE in SQL, and it makes for surprisingly composable code. 本文来自去找www.7zhao.net

If a function needs to make multiple statements in a transaction, it can open with a SAVEPOINT , then defer a call to RELEASE if the function produces no Go return error, or if it does instead call ROLLBACK and return the error.

去找(www.7zhao.net欢迎您

func f(conn *sqlite.Conn) (err error) {
	conn...SAVEPOINT
	defer func() {
		if err == nil {
			conn...RELEASE
		} else {
			conn...ROLLBACK
		}
	}()
} 欢迎访问www.7zhao.net 

Now if this transactional function f needs to call another transactional function g , then g can use exactly the same strategy and f can call it in a very traditional Go way:

copyright www.7zhao.net

if err := g(conn); err != nil {
	return err // all changes in f will be rolled back by the defer
} 欢迎访问www.7zhao.net 

The function g is also perfectly safe to use in its own right, as it has its own transaction.

欢迎访问www.7zhao.net

I have been using this SAVEPOINT + defer RELEASE or return an error semantics for several months now and find it invaluable. It makes it easy to safely wrap code in SQL transactions. 本文来自去找www.7zhao.net

The example above however is a bit bulky, and there are some edge cases that need to be handled. (For example, if the RELEASE fails, then an error needs to be returned.) So I have wrapped this up in a utility: 去找(www.7zhao.net欢迎您

func f(conn *sqlite.Conn) (err error) {
	defer sqliteutil.Save(conn)(&err)

	// Code is transactional and can be stacked
	// with other functions that call sqliteutil.Save.
} 

内容来自www.7zhao.net

The first time you see sqliteutil.Save in action it can be a little off-putting, at least it was for me when I first created it. But I quickly got used to it, and it does a lot of heavy lifting. The first call to sqliteutil.Save opens a SAVEPOINT on the conn and returns a closure that either RELEASE s or ROLLBACK s depending on the value of err, and sets err if necessary.

去找(www.7zhao.net欢迎您

Go and SQLite in the cloud

I have spent several months now redesigning services I have encountered before and designing services for problems I would like to work on going forward. The process has led me to a general design that works for many problems and I quite enjoy building. 去找(www.7zhao.net欢迎您

It can be summarized as 1 VM, 1 Zone, 1 process programming .

本文来自去找www.7zhao.net

If this sounds ridiculously simplistic to you, I think that's good! It is simple. It does not meet all sorts of requirements that we would like our modern fancy cloud services to meet. It is not "serverless", which means when a service is extremely small it does not run for free, and when a service grows it does not automatically scale. Indeed, there is an explicit scaling limit. Right now the best server you can get from Amazon is roughly: www.7zhao.net

  • 128 CPU threads at ~4GHz
  • 4TB RAM
  • 25 Gbit ethernet
  • 10 Gbps NAS
  • hours of yearly downtime

That is a huge potential downside of of one process programming. However, I claim that is a livable limit. www.7zhao.net

I claim typical services do not hit this scaling limit. www.7zhao.net

If you are building a small business, most products can grow and become profitable well under this limit for years. When you see the limit approaching in the next year or two, you have a business with revenue to hire more than one engineer, and the new team can, in the face of radically changing business requirements, rewrite the service.

www.7zhao.net

Reaching this limit is a good problem to have because when it comes you will have plenty of time to deal with it and the human resources you need to solve it well.

欢迎访问www.7zhao.net

Early in the life of a small business you don't, and every hour you spend trying to work beyond this scaling limit is an hour that would have been better spent talking to your customers about their needs. 欢迎访问www.7zhao.net

The principle at work here is:

欢迎访问www.7zhao.net

Don't use N computers when 1 will do.

To go into a bit more technical detail,

本文来自去找www.7zhao.net

I run a single VM on AWS, in a single availability zone. The VM has three EBS volumes (this is Amazon name for NAS). The first holds the OS, logs, temporary files, and any ephemeral SQLite databases that are generated from the main databases, e.g. FTS tables. The second the primary SQLite database for the main service. The third holds the customer sync SQLite databases. copyright www.7zhao.net

The system is configured to periodically snapshot the system EBS volume and the customer EBS volumes to S3, the Amazon geo-redundant blob store. This is a relatively cheap operation that can be scripted, because only blocks that change are copied.

本文来自去找www.7zhao.net

The main EBS volume is backed up to S3 very regularly, by custom code that flushes the WAL cache. I'll explain that in a bit. copyright www.7zhao.net

The service is a single Go binary running on this VM. The machine has plenty of extra RAM that is used by linux's disk cache. (And that can be used by a second copy of the service spinning up for low down-time replacement.)

本文来自去找www.7zhao.net

The result of this is a service that has at most tens of hours of downtime a year, about as much change of suffering block loss as a physical computer with a RAID5 array, and active offsite backups being made every few minutes to a distributed system that is built and maintained by a large team. 去找(www.7zhao.net欢迎您

This system is astonishingly simple. I shell into one machine. It is a linux machine. I have a deploy script for the service that is ten lines long. Almost all of my performance work is done with pprof.

copyright www.7zhao.net

On a medium sized VM I can clock 5-6 thousand concurrent requests with only a few hours of performance tuning. On the largest machine AWS has, tens of thousands.

去找(www.7zhao.net欢迎您

Now to talk a little more about the particulars of the stack:

www.7zhao.net

Shared cache and WAL

To make the server extremely concurrent there are two important SQLite features I use. The first is the shared cache, which lets me allocate one large pool of memory to the database page cache and many concurrent connections can use it simultaneously. This requires some support in the driver for sqlite_unlock_notify so user code doesn't need to deal with locking events, but that is transparent to end user code.

欢迎访问www.7zhao.net

The second is the Write Ahead Log. This is a mode SQLite can be knocked into at the beginning of connection which changes the way it writes transactions to disk. Instead of locking the database and making modifications along with a rollback journal, it appends the new change to a separate file. This allows readers to work concurrently with the writer. The WAL has to be flushed periodically by SQLite, which involves locking the database and writing the changes from it. There are default settings for doing this.

欢迎访问www.7zhao.net

I override these and execute WAL flushes manually from a package that, when it is done, also triggers an S3 snapshot. This package is called reallyfsync , and if I can work out how to test it properly I will make it open source. 本文来自去找www.7zhao.net

Incremental Blob API

Another smaller, but important to my particular server feature, is SQLite's . This allows a field of bytes to be read and written in the DB without storing all the bytes in memory simultaneously, which matters when it is possible for each request to be working with hundreds of megabytes, but you want tens of thousands of potential concurrent requests.

内容来自www.7zhao.net

This is one of the places where the driver deviates from being a close-to-cgo wrapper to be moreGo-like: 本文来自去找www.7zhao.net

type Blob
    func (blob *Blob) Close() error
    func (blob *Blob) Read(p []byte) (n int, err error)
    func (blob *Blob) ReadAt(p []byte, off int64) (n int, err error)
    func (blob *Blob) Seek(offset int64, whence int) (int64, error)
    func (blob *Blob) Size() int64
    func (blob *Blob) Write(p []byte) (n int, err error)
    func (blob *Blob) WriteAt(p []byte, off int64) (n int, err error) 
www.7zhao.net

This looks a lot like a file, and indeed can be used like a file, with one caveat: the size of a blob is set when it is created. (As such, I still find temporary files to be useful.) 欢迎访问www.7zhao.net

Designing with one process programming

I start with: Do you really need N computers? copyright www.7zhao.net

Some problems really do. For example, you cannot build a low-latency index of the public internet with only 4TB of RAM. You need a lot more. These problems are great fun, and we like to talk a lot about them, but they are a relatively small amount of all the code written. So far all the projects I have been developing post-Google fit on 1 computer. 本文来自去找www.7zhao.net

There are also more common sub-problems that are hard to solve with one computer. If you have a global customer base and need low-latency to your server, the speed of light gets in the way. But many of these problems can be solved with relatively straightforward CDN products.

去找(www.7zhao.net欢迎您

Another great solution to the speed of light is geo-sharding. Have complete and independent copies of your service in multiple datacenters, move your user's data to the service near them. This can be as easy as having one small global redirect database (maybe SQLite on geo-redundant NFS!) redirecting the user to a specific DNS name like {us-east, us-west}.mservice.com.

本文来自去找www.7zhao.net

Most problems do fit in one computer, up to a point. Spend some time determining where that point is. If it is years away there is a good chance one computer will do.

去找(www.7zhao.net欢迎您

Indie dev techniques for the corporate programmer

Even if you do not write code in this particular technology stack and you are not an independent developer, there is value here. Use the one big VM, one zone, one process Go, SQLite, and snapshot backup stack as a hypothetical tool to test your designs. copyright www.7zhao.net

So add a hypothetical step to your design process: If you solved your problem on this stack with one computers, how far could you get? How many customers could you support? At what size would you need to rewrite your software?

内容来自www.7zhao.net

If this indie mini stack would last your business years, you might want to consider delaying the adoption of modern cloud software. www.7zhao.net

If you are a programmer at a well-capitalized company, you may also want to consider what development looks like for small internal or experimental projects. Do your coworkers have to use large complex distributed systems for policy reasons? Many of these projects will never need to scale beyond one computer, or if they do they will need a rewrite to deal with shifting requirements. In which case, find a way to make an indie stack, linux VMs with a file system, available for prototyping and experimentation. 欢迎访问www.7zhao.net

copyright www.7zhao.net


本文原文地址:https://crawshaw.io/blog/one-process-programming-notes

以上为One process programming文章的全部内容,若您也有好的文章,欢迎与我们分享! 欢迎访问www.7zhao.net

Copyright ©2008-2017去找网版权所有   皖ICP备12002049号-2 皖公网安备 34088102000435号   关于我们|联系我们| 免责声明|友情链接|网站地图|手机版